Mikio Braun
@mikiobraun
Replying to @caglarml
IIRC the KL divergence measures the extra coding length you pay if you build a code assuming one probability distribution while the other one is the real one. Makes total sense that the assumed probability being small where the real probability is large is bad, but the other way around not so much.
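A minimal sketch of the coding-length reading, under the usual assumptions (discrete distributions, log base 2 so lengths are in bits; KL(P‖Q) = H(P, Q) − H(P) is the expected extra bits per symbol when data from P is coded with a code optimized for Q):

```python
import math

def kl_divergence(p, q):
    """KL(P||Q) in bits: expected extra code length when samples
    from P are coded with a code built assuming Q."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]  # real distribution
q = [0.9, 0.1]  # assumed distribution used to build the code

# Cross-entropy H(P, Q): average bits per symbol when coding
# P-distributed data with Q-optimal code lengths -log2(q_i).
entropy_p = -sum(pi * math.log2(pi) for pi in p if pi > 0)
cross_entropy = -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

# The overhead is exactly the KL divergence.
print(kl_divergence(p, q))             # extra bits paid for the wrong assumption
print(cross_entropy - entropy_p)       # same quantity via H(P, Q) - H(P)

# The asymmetry in the tweet: swapping which distribution is
# "assumed" and which is "real" changes the penalty.
print(kl_divergence(q, p))
```

Assuming q_i small where p_i is large, the term p_i * log2(p_i / q_i) blows up, which matches the intuition that betting on a rare symbol that is actually common costs many bits.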