diff --git a/_psets/3.md b/_psets/3.md
index 7da5512020f8bdcaf01a08ab4a2f6f292b6ace3c..49b6645d0bb1760e9f442454812380f139d6eea3 100644
--- a/_psets/3.md
+++ b/_psets/3.md
@@ -8,12 +8,141 @@ title: Problem Set 3
 
 Verify that the entropy function satisfies the required properties of continuity, non-negativity,
 monotonicity, and independence.
 
+I will prove these properties for the discrete case.
+
+First, I want to show that $$\lim_{x \to 0^+} x \log_b x = 0$$ for any $$b > 1$$, since otherwise
+entropy would not be defined for distributions that assign zero probability to some outcome. This
+can be done using [L'Hôpital's Rule](https://en.wikipedia.org/wiki/L%27H%C3%B4pital%27s_rule).
+
+$$
+\begin{align*}
+\lim_{x \to 0^+} x \log_b x &= \lim_{x \to 0^+} \frac{\log_b x}{x^{-1}} \\
+&= \lim_{x \to 0^+} \frac{\frac{d}{d x} \log_b x}{\frac{d}{d x} x^{-1}} \\
+&= \lim_{x \to 0^+} \left( \frac{1}{x \ln b} \right) \left( \frac{-1}{x^{-2}} \right) \\
+&= \lim_{x \to 0^+} \frac{-x}{\ln b} \\
+&= 0
+\end{align*}
+$$
+
+### Continuity
+
+To talk about the continuity of entropy, we need a topology on the space of probability
+distributions. Let $$\Omega$$ be a set of finite cardinality $$n$$. The space of probability
+distributions over $$\Omega$$ can be viewed as the set of vectors in $$\mathbb{R}_{\geq 0}^n$$ with
+$$L^1$$ norm 1. In this way entropy is a function from a subset of $$\mathbb{R}_{\geq 0}^n$$ to
+$$\mathbb{R}$$, so I will prove continuity with respect to the standard topologies on
+$$\mathbb{R}_{\geq 0}^n$$ and $$\mathbb{R}$$.
+
+First let's show that $$x \log x$$ is continuous. I take as given that $$\log x$$ is continuous on
+its domain. Then $$x \log x$$ is continuous for $$x > 0$$, since finite products of continuous
+functions are continuous. At zero, $$x \log x$$ is continuous (from the right) because we have
+defined it there to equal the limit found above.
+
+Thus each term $$-p_i \log p_i$$ of the entropy function is a continuous function from
+$$\mathbb{R}_{\geq 0}$$ to $$\mathbb{R}$$. By the lemma below, applied once per term, negative
+entropy is continuous. Thus entropy is continuous, since negation is a continuous function and
+finite compositions of continuous functions are continuous.
+
+The necessary lemma is easy to prove, if symbol-heavy. I will use the
+[$$L^1$$ norm](https://en.wikipedia.org/wiki/Norm_(mathematics)#p-norm), but this is without loss
+of generality because all norms on a finite-dimensional vector space induce the same topology. Let
+$$f : \mathbb{R}^n \to \mathbb{R}$$ and $$g : \mathbb{R} \to \mathbb{R}$$ be continuous functions,
+and define $$h : \mathbb{R}^{n + 1} \to \mathbb{R}$$ as
+
+$$h(x_1, \ldots, x_{n + 1}) = f(x_1, \ldots, x_n) + g(x_{n + 1})$$
+
+Fix any $$x = (x_1, \ldots, x_{n + 1}) \in \mathbb{R}^{n + 1}$$, and any $$\epsilon > 0$$. Since
+$$f$$ is continuous, there exists some positive $$\delta_f$$ such that for any $$y \in
+\mathbb{R}^n$$, $$\lVert (x_1, \ldots, x_n) - (y_1, \ldots, y_n) \rVert < \delta_f$$ implies
+$$\lVert f(x_1, \ldots, x_n) - f(y_1, \ldots, y_n) \rVert < \epsilon / 2$$. For the same reason
+there is a similar $$\delta_g$$ for $$g$$. Let $$\delta$$ be the smaller of $$\delta_f$$ and
+$$\delta_g$$. Now fix any $$y \in \mathbb{R}^{n + 1}$$ such that $$\lVert x - y \rVert < \delta$$.
+Note that
+
+$$
+\begin{align*}
+\lVert (x_1, \ldots, x_n) - (y_1, \ldots, y_n) \rVert &= \sum_{i = 1}^n \lVert x_i - y_i \rVert \\
+&\leq \sum_{i = 1}^{n + 1} \lVert x_i - y_i \rVert \\
+&= \lVert x - y \rVert \\
+&< \delta \leq \delta_f
+\end{align*}
+$$
+
+and similarly for the projections of $$x$$ and $$y$$ along the $$(n + 1)$$st dimension. Thus
+
+$$
+\begin{align*}
+\lVert h(x) - h(y) \rVert
+&= \lVert f(x_1, \ldots, x_n) + g(x_{n + 1}) - f(y_1, \ldots, y_n) - g(y_{n + 1}) \rVert \\
+&\leq \lVert f(x_1, \ldots, x_n) - f(y_1, \ldots, y_n) \rVert
+   + \lVert g(x_{n + 1}) - g(y_{n + 1}) \rVert \\
+&< \frac{\epsilon}{2} + \frac{\epsilon}{2} \\
+&= \epsilon
+\end{align*}
+$$
+
+It follows that $$h$$ is continuous.
+
+### Non-negativity
+
+The probability of each individual outcome must be between zero and one, so $$\log p_i \leq 0$$
+and hence $$-p_i \log p_i \geq 0$$ for all $$i$$. Since each term $$-p_i \log p_i$$ is zero only
+when $$p_i$$ is zero or one, the entropy is zero only when a single outcome has probability one.
+
+### Monotonicity
+
+Note that $$\partial H(p) / \partial p_i = -\log p_i - 1$$ for any $$i$$, working in natural
+logarithms (a different base only rescales the entropy). Each of these partial derivatives is
+strictly decreasing in $$p_i$$, so entropy is strictly concave on the interior of
+$$\mathbb{R}_{\geq 0}^n$$. The constraint that $$\sum p_i$$ equals one is linear, so entropy is
+strictly concave on this subset of $$\mathbb{R}_{\geq 0}^n$$ as well. Since this subset, the
+probability simplex, is compact and entropy is continuous, a global maximum exists, and by strict
+concavity it is unique.
+
+We can locate it using a [Lagrange multiplier](https://en.wikipedia.org/wiki/Lagrange_multiplier).
+Our Lagrange function is
+
+$$
+-\sum_{i = 1}^n p_i \log p_i + \lambda \left( \sum_{i = 1}^n p_i - 1 \right)
+$$
+
+The partial derivative with respect to any $$p_i$$ is $$-\log p_i - 1 + \lambda$$. Setting this to
+zero gives $$p_i = e^{\lambda - 1}$$, which depends only on $$\lambda$$, so all the $$p_i$$ must be
+equal. Taking our constraint into account, this leaves only one possibility: $$p_i = 1/n$$ for all
+$$i$$.
+
+Call this distribution $$p_*$$. Its entropy is $$-\sum_{i = 1}^n 1/n \log 1/n = -\log 1/n = \log n$$.
+Thus
+
+$$
+H(p) \leq H(p_*) = -\log \frac{1}{n} = \log n
+$$
+
+for all probability distributions $$p$$ over $$n$$ outcomes. Equality is achieved only by $$p_*$$
+itself, by the strict concavity of entropy. Since $$\log n$$ is strictly increasing in $$n$$,
+$$H(p_*)$$ grows monotonically, and without bound, as the number of outcomes increases.
+
+### Independence
+
+Let $$p$$ and $$q$$ be independent distributions over $$n$$ and $$m$$ outcomes respectively, so the
+joint distribution assigns probability $$p_i q_j$$ to the outcome pair $$(i, j)$$. Then
+
+$$
+\begin{align*}
+H(p, q) &= -\sum_{i = 1}^n \sum_{j = 1}^m p_i q_j \log(p_i q_j) \\
+&= -\sum_{i = 1}^n \sum_{j = 1}^m p_i q_j \left( \log p_i + \log q_j \right) \\
+&= -\sum_{i = 1}^n \sum_{j = 1}^m p_i q_j \log p_i -
+   \sum_{i = 1}^n \sum_{j = 1}^m p_i q_j \log q_j \\
+&= -\sum_{i = 1}^n p_i \log p_i \sum_{j = 1}^m q_j -
+   \sum_{j = 1}^m q_j \log q_j \sum_{i = 1}^n p_i \\
+&= -\sum_{i = 1}^n p_i \log p_i - \sum_{j = 1}^m q_j \log q_j \\
+&= H(p) + H(q)
+\end{align*}
+$$
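+
+As a quick numerical sanity check on these claims (separate from the proofs above), here is a
+small Python sketch. It assumes NumPy is available; the `entropy` helper, the random seed, and the
+tolerances are my own choices, and it works in nats.
+
+```python
+import numpy as np
+
+
+def entropy(p):
+    """Shannon entropy (in nats) of a discrete distribution, with 0 log 0 taken to be 0."""
+    p = np.asarray(p, dtype=float)
+    p = p[p > 0]  # drop zero-probability outcomes, consistent with the limit computed above
+    return float(-(p * np.log(p)).sum())
+
+
+rng = np.random.default_rng(0)
+
+# Non-negativity, and the upper bound log(n) attained by the uniform distribution.
+for n in (2, 5, 10):
+    p = rng.random(n)
+    p /= p.sum()  # normalize to a probability distribution over n outcomes
+    uniform = np.full(n, 1.0 / n)
+    assert entropy(p) >= 0.0
+    assert entropy(p) <= entropy(uniform) + 1e-12
+    assert np.isclose(entropy(uniform), np.log(n))
+
+# Monotonicity: the entropy of the equiprobable distribution increases with n.
+uniform_entropies = [entropy(np.full(n, 1.0 / n)) for n in range(1, 8)]
+assert all(a < b for a, b in zip(uniform_entropies, uniform_entropies[1:]))
+
+# Independence: the entropy of a product distribution is H(p) + H(q).
+p = rng.random(4)
+p /= p.sum()
+q = rng.random(3)
+q /= q.sum()
+joint = np.outer(p, q)  # joint distribution of two independent variables
+assert np.isclose(entropy(joint), entropy(p) + entropy(q))
+
+print("all checks passed")
+```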
+
 ## (4.2)
 {:.question}
 
 Prove the relationships in Equation (4.10).
 
+$$
+\begin{align*}
+\end{align*}
+$$
+
 ## (4.3)