## (4.1)
{:.question}
Verify that the entropy function satisfies the required properties of continuity, non-negativity, monotonicity, and independence.
I will prove these properties for the discrete case.
First, I want to show that $$\lim_{x \to 0^+} x \log_b x = 0$$ for any $$b > 1$$, since otherwise entropy is not defined for distributions that assign zero probability to any outcome. This can be done using [L'Hôpital's Rule](https://en.wikipedia.org/wiki/L%27H%C3%B4pital%27s_rule).
$$
\begin{align*}
\lim_{x \to 0^+} x \log_b x &= \lim_{x \to 0^+} \frac{\log_b x}{x^{-1}} \\
&= \lim_{x \to 0^+} \frac{\frac{d}{d x} \log_b x}{\frac{d}{d x} x^{-1}} \\
&= \lim_{x \to 0^+} \left( \frac{1}{x \ln b} \right) \left( \frac{-1}{x^{-2}} \right) \\
&= \lim_{x \to 0^+} \frac{-x}{\ln b} \\
&= 0
\end{align*}
$$
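
As a quick numerical sanity check (a sketch, not part of the proof), we can watch $$x \log_2 x$$ shrink toward zero as $$x$$ approaches zero from the right; base 2 is an arbitrary choice of $$b$$ here.

```python
import math

# Evaluate x * log_2(x) for x approaching 0 from the right.
# The products should shrink toward 0, consistent with the limit above.
for k in range(1, 8):
    x = 10.0 ** (-k)
    print(f"x = {x:.0e}  x*log2(x) = {x * math.log2(x):.6e}")
```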
### Continuity
To talk about the continuity of entropy, we need to define a topology for probability distributions.
Let $$\Omega$$ be a set of finite cardinality $$n$$. The space of probability distributions over $$\Omega$$ can be viewed as the set of vectors in $$\mathbb{R}_{\geq 0}^n$$ with $$L^1$$ norm 1. In this way entropy is a function that maps a subset of $$\mathbb{R}_{\geq 0}^n$$ to $$\mathbb{R}$$. So I will prove the continuity of entropy with respect to the topologies of $$\mathbb{R}_{\geq 0}^n$$ and $$\mathbb{R}$$.
First let's show that $$x \log x$$ is continuous. I take as given that $$\log(x)$$ is a continuous function on its domain. Then $$x \log(x)$$ is also continuous, since finite products of continuous functions are continuous. This suffices for $$x > 0$$. At zero, $$x \log x$$ is continuous because we have defined it to be equal to the limit we found above.

Thus each term of the entropy function is a continuous function from $$\mathbb{R}$$ to $$\mathbb{R}$$. This suffices to show that negative entropy is continuous, based on the lemma below. Thus entropy is continuous, since negation is a continuous function, and finite compositions of continuous functions are continuous.
The necessary lemma is easy to prove, if symbol heavy. I will use the [$$L^1$$ norm](https://en.wikipedia.org/wiki/Norm_(mathematics)#p-norm), but this is without loss of generality because all norms on a finite dimensional vector space induce the same topology. Let $$f : \mathbb{R}^n \to \mathbb{R}$$ and $$g : \mathbb{R} \to \mathbb{R}$$ be continuous functions, and define $$h : \mathbb{R}^{n + 1} \to \mathbb{R}$$ as

$$h(x_1, \ldots, x_{n + 1}) = f(x_1, \ldots, x_n) + g(x_{n + 1})$$
Fix any $$x = (x_1, \ldots, x_{n + 1}) \in \mathbb{R}^{n + 1}$$, and any $$\epsilon > 0$$. Since $$f$$ is continuous, there exists some positive $$\delta_f$$ such that for any $$y \in \mathbb{R}^n$$, $$\lVert (x_1, \ldots, x_n) - (y_1, \ldots, y_n) \rVert < \delta_f$$ implies $$\lVert f(x_1, \ldots, x_n) - f(y_1, \ldots, y_n) \rVert < \epsilon / 2$$. For the same reason there is a similar $$\delta_g$$ for $$g$$. Let $$\delta$$ be the smaller of $$\delta_f$$ and $$\delta_g$$. Now fix any $$y \in \mathbb{R}^{n + 1}$$ such that $$\lVert x - y \rVert < \delta$$.
Note that
$$
\begin{align*}
\lVert (x_1, \ldots, x_n) - (y_1, \ldots, y_n) \rVert &= \sum_{i = 1}^n \lVert x_i - y_i \rVert \\
&\leq \sum_{i = 1}^{n + 1} \lVert x_i - y_i \rVert \\
&= \lVert x - y \rVert \\
&< \delta \\
&\leq \delta_f
\end{align*}
$$
and similarly for the projections of $$x$$ and $$y$$ along the $$n + 1$$st dimension. Thus
$$
\begin{align*}
\lVert h(x) - h(y) \rVert
&= \lVert f(x_1, \ldots, x_n) + g(x_{n + 1}) - f(y_1, \ldots, y_n) - g(y_{n + 1}) \rVert \\
&\leq \lVert f(x_1, \ldots, x_n) - f(y_1, \ldots, y_n) \rVert + \lVert g(x_{n + 1}) - g(y_{n + 1}) \rVert \\
&< \frac{\epsilon}{2} + \frac{\epsilon}{2} \\
&= \epsilon
\end{align*}
$$
It follows that $$h$$ is continuous.
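
To illustrate (not prove) this numerically, here is a small sketch that perturbs a probability distribution by shrinking amounts and watches the change in entropy shrink too; the distribution, the perturbation, and the helper `entropy` function are my own choices for illustration.

```python
import math

def entropy(p):
    """Shannon entropy in bits, using the convention 0*log(0) = 0."""
    return sum(-x * math.log2(x) for x in p if x > 0)

p = [0.5, 0.25, 0.25]  # arbitrary example distribution
for k in range(1, 7):
    eps = 10.0 ** (-k)
    # Shift a little probability mass between two outcomes; q still sums to 1.
    q = [p[0] - eps, p[1] + eps, p[2]]
    print(f"eps = {eps:.0e}  |H(p) - H(q)| = {abs(entropy(p) - entropy(q)):.3e}")
```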
### Non-negativity
The probability of each individual outcome must be between zero and one. Thus $$-p_i \log p_i \geq 0$$ for all $$i$$. Since $$x \log x$$ is only equal to zero when $$x$$ is zero or one, the entropy can only be zero when a single outcome has probability one.
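
Here is a small numerical illustration (a sketch with example distributions I chose, not part of the argument): entropy is non-negative for every example, and only the degenerate distribution gives zero.

```python
import math

def entropy(p):
    """Shannon entropy in bits, using the convention 0*log(0) = 0."""
    return sum(-x * math.log2(x) for x in p if x > 0)

# Example distributions chosen for illustration.
examples = [
    [1.0, 0.0, 0.0],     # degenerate: a single outcome has probability one
    [0.5, 0.5, 0.0],
    [0.9, 0.05, 0.05],
    [1/3, 1/3, 1/3],
]
for p in examples:
    print(p, "->", entropy(p))
```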
### Monotonicity
Note that $$\partial/\partial p_i H(p) = -\log(p_i) - 1$$ for any $$i$$. This is a strictly decreasing function of $$p_i$$, so entropy is strictly concave on all of $$\mathbb{R}_{\geq 0}^n$$. The constraint that $$\sum p_i$$ is one is linear, so entropy is concave on this subset of $$\mathbb{R}_{\geq 0}^n$$ as well. Thus there is a unique global maximum. We can locate it using a [Lagrange multiplier](https://en.wikipedia.org/wiki/Lagrange_multiplier).
Our Lagrange function is
$$
-\sum_{i = 1}^n p_i \log p_i + \lambda \left( \sum_{i = 1}^n p_i - 1 \right)
$$
The partial derivative with respect to any $$p_i$$ is $$-\log p_i - 1 + \lambda$$. Setting this to zero gives $$p_i = e^{\lambda - 1}$$, which depends only on $$\lambda$$, so all the $$p_i$$ must be the same. Taking our constraint into account, this means there's only one possibility: $$p_i = 1/n$$ for all $$i$$.
Call this distribution $$p_*$$. Its entropy is $$-\sum_{i = 1}^n 1/n \log 1/n = -\log 1/n$$. Thus

$$
H(p) \leq H(p_*) = -\log \frac{1}{n}
$$

for all probability distributions $$p$$ over $$n$$ outcomes. Equality is only achieved for $$p_*$$ itself, by the strict concavity of entropy. Note that $$H(p_*)$$ grows without bound as $$n$$ increases.
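
As a numerical illustration of this maximum (a sketch only; the value of $$n$$ and the random distributions are arbitrary choices), entropy measured in bits never exceeds $$\log_2 n$$, and the uniform distribution attains it.

```python
import math
import random

def entropy(p):
    """Shannon entropy in bits, using the convention 0*log(0) = 0."""
    return sum(-x * math.log2(x) for x in p if x > 0)

n = 5
uniform = [1.0 / n] * n
print("H(uniform) =", entropy(uniform), " log2(n) =", math.log2(n))

# Random distributions over n outcomes never exceed log2(n) bits.
random.seed(0)
for _ in range(5):
    weights = [random.random() for _ in range(n)]
    total = sum(weights)
    p = [w / total for w in weights]
    print("H(random)  =", entropy(p))
```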
### Independence
For independent distributions $$p$$ over $$n$$ outcomes and $$q$$ over $$m$$ outcomes, the joint probability of outcome $$(i, j)$$ is $$p_i q_j$$, so

$$
\begin{align*}
H(p, q) &= -\sum_{i = 1}^n \sum_{j = 1}^m p_i q_j \log(p_i q_j) \\
&= -\sum_{i = 1}^n \sum_{j = 1}^m p_i q_j \left( \log p_i + \log q_j \right) \\
&= -\sum_{i = 1}^n \sum_{j = 1}^m p_i q_j \log p_i - \sum_{i = 1}^n \sum_{j = 1}^m p_i q_j \log q_j \\
&= -\sum_{i = 1}^n p_i \log p_i \sum_{j = 1}^m q_j - \sum_{j = 1}^m q_j \log q_j \sum_{i = 1}^n p_i \\
&= -\sum_{i = 1}^n p_i \log p_i - \sum_{j = 1}^m q_j \log q_j \\
&= H(p) + H(q)
\end{align*}
$$
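
A quick numerical check of this identity (a sketch with distributions chosen arbitrarily): the entropy of the independent joint distribution matches the sum of the marginal entropies up to floating point error.

```python
import math

def entropy(p):
    """Shannon entropy in bits, using the convention 0*log(0) = 0."""
    return sum(-x * math.log2(x) for x in p if x > 0)

p = [0.2, 0.3, 0.5]                        # distribution over n = 3 outcomes
q = [0.6, 0.4]                             # distribution over m = 2 outcomes
joint = [pi * qj for pi in p for qj in q]  # independent joint distribution

print("H(p) + H(q) =", entropy(p) + entropy(q))
print("H(p, q)     =", entropy(joint))
```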
## (4.2)
{:.question}
Prove the relationships in Equation (4.10).
$$
\begin{align*}
\end{align*}
$$
## (4.3)