## (4.1)
{:.question}
Verify that the entropy function satisfies the required properties of continuity, non-negativity,
monotonicity, and independence.
I will prove these properties for the discrete case.
First, I want to show that $$\lim_{x \to 0^+} x \log_b x = 0$$ for any $$b > 1$$, since otherwise
entropy is not defined for distributions that assign zero probability to any outcome. This can be
done using [L'Hôpital's Rule](https://en.wikipedia.org/wiki/L%27H%C3%B4pital%27s_rule).
$$
\begin{align*}
\lim_{x \to 0^+} x \log_b x &= \lim_{x \to 0^+} \frac{\log_b x}{x^{-1}} \\
&= \lim_{x \to 0^+} \frac{\frac{d}{d x} \log_b x}{\frac{d}{d x} x^{-1}} \\
&= \lim_{x \to 0^+} \left( \frac{1}{x \ln b} \right) \left( \frac{-1}{x^{-2}} \right) \\
&= \lim_{x \to 0^+} \frac{-x}{\ln b} \\
&= 0
\end{align*}
$$
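
As a quick numerical sanity check (separate from the proof above), the product $$x \log_b x$$ can
be evaluated for shrinking $$x$$. This is a minimal Python sketch; the base-2 logarithm is just one
choice of $$b > 1$$.

```python
# Numerical sanity check (not a proof): x * log_b(x) shrinks toward 0 as x -> 0+.
# Base 2 is used here; any base b > 1 behaves the same way up to a constant factor.
import math

for k in range(1, 7):
    x = 10.0 ** (-2 * k)
    print(f"x = {x:.0e},  x * log2(x) = {x * math.log2(x):.3e}")
```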
### Continuity
To talk about the continuity of entropy, we need to define a topology for probability distributions.
Let $$\Omega$$ be a set of finite cardinality $$n$$. The space of probability distributions over
$$\Omega$$ can be viewed as the set of vectors in $$\mathbb{R}_{\geq 0}^n$$ with $$L^1$$ norm 1. In
this way entropy is a function that maps a subset of $$\mathbb{R}_{\geq 0}^n$$ to $$\mathbb{R}$$. So
I will prove the continuity of entropy with respect to the topologies of $$\mathbb{R}_{\geq 0}^n$$
and $$\mathbb{R}$$.
First let's show that $$x \log x$$ is continuous. I take as given that $$\log(x)$$ is a continuous
function on its domain. Then $$x \log(x)$$ is also continuous, since finite products of continuous
functions are continuous. This suffices for $$x > 0$$. At zero, $$x \log x$$ is continuous because
we have defined it to be equal to the limit we found above.
Thus each term of the entropy function is a continuous function from $$\mathbb{R}_{\geq 0}$$ to
$$\mathbb{R}$$. By the lemma below, their sum (i.e. negative entropy) is continuous as well. Thus
entropy is continuous, since negation is a continuous function, and finite compositions of
continuous functions are continuous.
The necessary lemma is easy to prove, if symbol heavy. I will use the
[$$L^1$$ norm](https://en.wikipedia.org/wiki/Norm_(mathematics)#p-norm), but this is without loss of
generality because all norms on a finite dimensional vector space induce the same topology. Let
$$f : \mathbb{R}^n \to \mathbb{R}$$ and $$g : \mathbb{R} \to \mathbb{R}$$ be continuous functions,
and define $$h : \mathbb{R}^{n + 1} \to \mathbb{R}$$ as
$$h(x_1, \ldots, x_{n + 1}) = f(x_1, \ldots, x_n) + g(x_{n + 1})$$
Fix any $$x = (x_1, \ldots, x_{n + 1}) \in \mathbb{R}^{n + 1}$$, and any $$\epsilon > 0$$. Since
$$f$$ is continuous, there exists some positive $$\delta_f$$ such that for any $$y \in
\mathbb{R}^n$$, $$\lVert (x_1, \ldots, x_n) - (y_1, \ldots, y_n) \rVert < \delta_f$$ implies
$$\lVert f(x_1, \ldots, x_n) - f(y_1, \ldots, y_n) \rVert < \epsilon / 2$$. For the same reason
there is a similar $$\delta_g$$ for $$g$$. Let $$\delta$$ be the smaller of $$\delta_f$$ and
$$\delta_g$$. Now fix any $$y \in \mathbb{R}^{n + 1}$$ such that $$\lVert x - y \rVert < \delta$$.
Note that
$$
\begin{align*}
\lVert (x_1, \ldots, x_n) - (y_1, \ldots, y_n) \rVert &= \sum_{i = 1}^n \lVert x_i - y_i \rVert \\
&\leq \sum_{i = 1}^{n + 1} \lVert x_i - y_i \rVert \\
&< \delta_f
\end{align*}
$$
and similarly for the projections of $$x$$ and $$y$$ along the $$n + 1$$st dimension. Thus
$$
\begin{align*}
\lVert h(x) - h(y) \rVert
&= \lVert f(x_1, \ldots, x_n) + g(x_{n + 1}) - f(y_1, \ldots, y_n) - g(y_{n + 1}) \rVert \\
&\leq \lVert f(x_1, \ldots, x_n) - f(y_1, \ldots, y_n) \rVert +
\lVert g(x_{n + 1}) - g(y_{n + 1}) \rVert \\
&< \frac{\epsilon}{2} + \frac{\epsilon}{2} \\
&= \epsilon
\end{align*}
$$
It follows that $$h$$ is continuous.
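
The lemma is abstract, so here is a small numerical illustration (not part of the proof): two
distributions that are close in the $$L^1$$ sense have nearby entropies. The sketch assumes Python
with NumPy; the helper `entropy` and the base-2 logarithm are my own choices, not anything fixed by
the problem.

```python
# Illustration of continuity: mixing a distribution p with the uniform
# distribution by a small amount eps moves it at most 2 * eps in L1 norm,
# and the entropy changes by a correspondingly small amount.
import numpy as np

def entropy(p):
    """Shannon entropy in bits, using the convention 0 * log 0 = 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log2(p[nz])))

rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(5))    # a random distribution over 5 outcomes
u = np.full(5, 1 / 5)            # the uniform distribution
for eps in [1e-1, 1e-3, 1e-6]:
    q = (1 - eps) * p + eps * u  # still a valid distribution, ||p - q||_1 <= 2 * eps
    print(f"eps = {eps:.0e},  |H(p) - H(q)| = {abs(entropy(p) - entropy(q)):.3e}")
```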
### Non-negativity
The probability of each individual outcome lies between zero and one, so $$\log p_i \leq 0$$ and
hence $$-p_i \log p_i \geq 0$$ for all $$i$$. Thus $$H(p) \geq 0$$. Since $$x \log x$$ is zero only
when $$x$$ is zero or one, the entropy is zero only when a single outcome has probability one.
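
A quick spot check of non-negativity (again assuming Python with NumPy; `entropy` is my own
helper): sampled distributions never dip below zero, and a point mass lands exactly at zero.

```python
# Spot check of non-negativity: entropy is >= 0 for random distributions
# and 0 for a point mass (a single outcome with probability one).
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log2(p[nz])))

rng = np.random.default_rng(1)
samples = rng.dirichlet(np.ones(8), size=1000)   # 1000 random distributions over 8 outcomes
print("min entropy over samples:", min(entropy(p) for p in samples))
print("entropy of a point mass: ", entropy([1, 0, 0, 0]))  # 0 (may print as -0.0)
```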
### Monotonicity
Note that $$\partial/\partial p_i H(p) = -\log(p_i) - 1$$ for any $$i$$. This is a strictly
decreasing function, so entropy is strictly concave on all of $$\mathbb{R}_{\geq 0}^n$$. The
constraint that $$\sum p_i$$ is one is linear, so entropy is concave on this subset of
$$\mathbb{R}_{\geq 0}^n$$ as well. Thus there is a unique global maximum.
We can locate it using a [Lagrange multiplier](https://en.wikipedia.org/wiki/Lagrange_multiplier).
Our Lagrange function is
$$
-\sum_{i = 1}^n p_i \log p_i + \lambda \left( \sum_{i = 1}^n p_i - 1 \right)
$$
The partial derivative with respect to any $$p_i$$ is $$-\log p_i - 1 + \lambda$$. Setting this to
zero gives $$p_i = e^{\lambda - 1}$$, which depends only on $$\lambda$$, so all the $$p_i$$ must be
equal. Taking our constraint into account, this means there's only one possibility: $$p_i = 1/n$$
for all $$i$$.
Call this distribution $$p_*$$. Its entropy is $$-\sum_{i = 1}^n 1/n \log 1/n = -\log 1/n$$. Thus
$$
H(p) \leq H(p_*) = - \log \frac{1}{n}
$$
for all probability distributions $$p$$ over $$n$$ outcomes. Equality is achieved only by $$p_*$$
itself, by the strict concavity of entropy. In particular $$H(p_*) = \log n$$ is strictly
increasing in $$n$$ and grows without bound, which is the required monotonicity property.
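
As a sanity check on this bound (not a proof), one can sample many random distributions and
compare their entropies to $$\log_2 n$$. The sketch below assumes Python with NumPy and a base-2
`entropy` helper of my own.

```python
# Spot check: random distributions over n outcomes never exceed the uniform
# entropy log2(n), and that bound grows with n.
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log2(p[nz])))

rng = np.random.default_rng(2)
for n in [2, 4, 8, 16]:
    best = max(entropy(p) for p in rng.dirichlet(np.ones(n), size=2000))
    print(f"n = {n:2d}:  max sampled H = {best:.4f},  log2(n) = {np.log2(n):.4f}")
```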
### Independence

Let $$p$$ and $$q$$ be independent distributions over $$n$$ and $$m$$ outcomes, so the probability
of the joint outcome $$(i, j)$$ is $$p_i q_j$$. Then
$$
\begin{align*}
H(p, q) &= -\sum_{i = 1}^n \sum_{j = 1}^m p_i q_j \log(p_i q_j) \\
&= -\sum_{i = 1}^n \sum_{j = 1}^m p_i q_j \left( \log p_i + \log q_j \right) \\
&= -\sum_{i = 1}^n \sum_{j = 1}^m p_i q_j \log p_i -
\sum_{i = 1}^n \sum_{j = 1}^m p_i q_j \log q_j \\
&= -\sum_{i = 1}^n p_i \log p_i \sum_{j = 1}^m q_j -
\sum_{j = 1}^m q_j \log q_j \sum_{i = 1}^n p_i \\
&= -\sum_{i = 1}^n p_i \log p_i - \sum_{j = 1}^m q_j \log q_j \\
&= H(p) + H(q)
\end{align*}
$$
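
The same identity is easy to confirm numerically for a random pair of distributions. This sketch
assumes Python with NumPy, and `entropy` is my own base-2 helper.

```python
# Numerical check of the independence property: for the product distribution
# with joint probabilities p_i * q_j, the joint entropy equals H(p) + H(q).
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log2(p[nz])))

rng = np.random.default_rng(3)
p = rng.dirichlet(np.ones(4))
q = rng.dirichlet(np.ones(6))
joint = np.outer(p, q)   # joint distribution of two independent variables
print(f"H(p, q)     = {entropy(joint.ravel()):.6f}")
print(f"H(p) + H(q) = {entropy(p) + entropy(q):.6f}")
```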
## (4.2)
{:.question}
Prove the relationships in Equation (4.10).
$$
\begin{align*}
\end{align*}
$$
## (4.3)