title: Problem Set 3
(4.1)
{:.question} Verify that the entropy function satisfies the required properties of continuity, non-negativity, monotonicity, and independence.
I will prove these properties for the discrete case.
First, I want to show that \lim_{x \to 0^+} x \log_b x = 0 for any b > 1, since otherwise entropy would not be defined for distributions that assign zero probability to some outcome. This can be done using L'Hôpital's Rule.
\begin{align*}
\lim_{x \to 0^+} x \log_b x &= \lim_{x \to 0^+} \frac{\log_b x}{x^{-1}} \\
&= \lim_{x \to 0^+} \frac{\frac{d}{d x} \log_b x}{\frac{d}{d x} x^{-1}} \\
&= \lim_{x \to 0^+} \frac{1 / (x \ln b)}{-x^{-2}} \\
&= \lim_{x \to 0^+} \frac{-x}{\ln b} \\
&= 0
\end{align*}
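As a quick numerical sanity check of this limit (a sketch, not part of the proof), the snippet below evaluates x \log_2 x for shrinking x; base 2 is an arbitrary choice of b > 1.

```python
import numpy as np

# x * log_b(x) should shrink toward 0 as x -> 0+; base 2 is arbitrary.
for x in [1e-2, 1e-4, 1e-8, 1e-16]:
    print(f"x = {x:.0e}:  x*log2(x) = {x * np.log2(x):.3e}")
```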
Continuity
To talk about the continuity of entropy, we need to define a topology for probability distributions. Let \Omega be a set of finite cardinality n. The space of probability distributions over \Omega can be viewed as the set of vectors in \mathbb{R}_{\geq 0}^n with L^1 norm 1. In this way entropy is a function that maps a subset of \mathbb{R}_{\geq 0}^n to \mathbb{R}. So I will prove the continuity of entropy with respect to the topologies of \mathbb{R}_{\geq 0}^n and \mathbb{R}.
First let's show that x \log x is continuous. I take as given that \log(x) is a continuous function on its domain (after all it's the inverse of e^x, which is strictly monotonic and C^\infty). Then x \log(x) is also continuous, since finite products of continuous functions are continuous. This suffices for x > 0. At zero, x \log x is continuous because we have defined it to be equal to the limit we found above.
Thus each term of the entropy function is a continuous function from \mathbb{R}_{\geq 0} to \mathbb{R}. But we can also view each term as a function from \mathbb{R}_{\geq 0}^n to \mathbb{R}. Each one ignores most of its inputs, but this doesn't change its continuity. (The epsilon-delta proof follows easily from the triangle inequality, since the only part of the distance between inputs that matters is that along the active coordinate.) So entropy is a sum of continuous functions, and is thus continuous.
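To illustrate this continuity numerically (a rough sketch; the entropy helper and perturbation scheme are my own choices, using the natural log), the snippet below nudges a distribution by progressively smaller amounts in the L^1 norm and watches the change in entropy shrink accordingly.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats, with the convention 0*log(0) = 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.2])
for eps in [1e-1, 1e-3, 1e-6]:
    # Build a perturbation with zero sum (so p + d stays normalized)
    # and L1 norm eps; for these eps values p + d also stays positive.
    d = rng.normal(size=p.size)
    d -= d.mean()
    d *= eps / np.abs(d).sum()
    print(f"|p - q|_1 = {eps:.0e}:  |H(p) - H(q)| = {abs(entropy(p) - entropy(p + d)):.3e}")
```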
Non-negativity
The probability of each individual outcome lies between zero and one, so \log p_i \leq 0 and hence -p_i \log p_i \geq 0 for all i. Entropy is therefore a sum of non-negative terms, and is non-negative. Since x \log x is zero only when x is zero or one, and the probabilities must sum to one, the entropy can only be zero when a single outcome has probability one.
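A small numerical check of both claims (again a sketch with my own entropy helper, not part of the argument): randomly sampled distributions all have non-negative entropy, and only a point mass reaches zero.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats, with the convention 0*log(0) = 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

rng = np.random.default_rng(0)
# Randomly sampled distributions all have non-negative entropy ...
for _ in range(1000):
    assert entropy(rng.dirichlet(np.ones(4))) >= 0.0
# ... and only a point mass reaches zero.
assert entropy([1.0, 0.0, 0.0, 0.0]) == 0.0
assert entropy([0.5, 0.5, 0.0, 0.0]) > 0.0   # equals log(2)
print("non-negativity checks passed")
```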
Monotonicity
Note that \partial H / \partial p_i = -\log(p_i) - 1 for any i (using the natural log). This is strictly decreasing in p_i, so the Hessian of H is diagonal with negative entries -1/p_i, and entropy is strictly concave on the interior of \mathbb{R}_{\geq 0}^n. The constraint that \sum p_i equals one is linear, so entropy remains strictly concave when restricted to this subset. Thus there is a unique global maximum.
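The concavity claim can also be spot-checked numerically (a sketch with my own entropy helper; the sample sizes are arbitrary): for random pairs of distributions, the entropy of a convex combination is never below the corresponding combination of entropies.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats, with the convention 0*log(0) = 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

rng = np.random.default_rng(1)
# Concavity: H(t*p + (1-t)*q) >= t*H(p) + (1-t)*H(q) for any two
# distributions p, q over the same outcomes and any t in [0, 1].
for _ in range(1000):
    p, q = rng.dirichlet(np.ones(5)), rng.dirichlet(np.ones(5))
    t = rng.uniform()
    assert entropy(t * p + (1 - t) * q) >= t * entropy(p) + (1 - t) * entropy(q) - 1e-12
print("concavity inequality holds on all sampled pairs")
```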
We can locate it using a Lagrange multiplier. Our Lagrange function is
-\sum_{i = 1}^n p_i \log p_i + \lambda \left( \sum_{i = 1}^n p_i - 1 \right)
The partial derivative with respect to any p_i is -\log p_i - 1 + \lambda. Setting it to zero gives p_i = e^{\lambda - 1}, which depends only on \lambda and not on i, so all the p_i must be equal. Taking our constraint into account, this leaves only one possibility: p_i = 1/n for all i. This is the maximum entropy distribution that we seek.
Call this distribution p_*. Its entropy is -\sum_{i = 1}^n \frac{1}{n} \log \frac{1}{n} = -\log \frac{1}{n} = \log n. Thus
H(p) \leq H(p_*) = \log n
for all probability distributions p over n outcomes. Equality is achieved only by p_* itself, since the maximum is unique. Finally, \log n is strictly increasing in n, so the maximum entropy grows without bound as the number of outcomes increases; this is the monotonicity property.
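As a final numerical illustration (a sketch; the sample counts and entropy helper are my own choices), the snippet below samples many random distributions for several values of n, confirms none exceeds \log n, and shows the bound itself increasing with n.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats, with the convention 0*log(0) = 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

rng = np.random.default_rng(2)
for n in [2, 4, 8, 16, 32]:
    # No sampled distribution beats the uniform one, whose entropy is log(n),
    # and the bound log(n) itself increases with n.
    h_max = max(entropy(p) for p in rng.dirichlet(np.ones(n), size=1000))
    print(f"n = {n:2d}:  max sampled H = {h_max:.4f}  <=  log(n) = {np.log(n):.4f}")
```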