Answer 15.7

Commit bc48ba10 authored 6 years ago by Erik Strand
--- a/_psets/12.md
+++ b/_psets/12.md
@@ -222,12 +222,20 @@ the convolutional encoder in Figure 15.20, what data were transmitted?
 {:.question}
 This problem is harder than the others.

+My code for all sections of this problem is
+[here](https://gitlab.cba.mit.edu/erik/compressed_sensing). I wrote it in C++ using
+[Eigen](http://eigen.tuxfamily.org) for vector and matrix operations.
+
 ### (a)

 {:.question}
 Generate and plot a periodically sampled time series {$$t_j$$} of N points for the sum of two sine
 waves at 697 and 1209 Hz, which is the DTMF tone for the number 1 key.

+Here's a plot of 250 samples taken over one tenth of a second.
+
+![samples](../assets/img/pset12_fig_a.png)
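
A minimal sketch of how samples like these could be generated. It is not the code from the linked repository (which uses Eigen); it uses only the standard library, and the function name `dtmf_one` and its parameters are illustrative assumptions.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Sum of the two DTMF tones for the "1" key (697 Hz and 1209 Hz),
// sampled uniformly at times j * duration / n_samples.
// The name and parameters are hypothetical, not taken from the repository.
std::vector<double> dtmf_one(std::size_t n_samples, double duration) {
    const double pi = std::acos(-1.0);
    const double f_low = 697.0;    // Hz
    const double f_high = 1209.0;  // Hz
    std::vector<double> t(n_samples);
    for (std::size_t j = 0; j < n_samples; ++j) {
        const double time =
            duration * static_cast<double>(j) / static_cast<double>(n_samples);
        t[j] = std::sin(2.0 * pi * f_low * time) + std::sin(2.0 * pi * f_high * time);
    }
    return t;
}
```

Calling `dtmf_one(250, 0.1)` would correspond to the 250 samples over one tenth of a second described above.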
+
 ### (b)

 {:.question}
@@ -244,18 +252,28 @@ D_{ij} =
 \end{align*}
 $$

+![samples](../assets/img/pset12_fig_b.png)
+
 ### (c)

 {:.question}
 Plot the inverse transform of the {$$f_i$$} by multiplying them by the inverse of the DCT matrix
 (which is equal to its transpose) and verify that it matches the time series.

+The original samples are recovered.
+
+![samples](../assets/img/pset12_fig_c.png)
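
The round trip can be sketched as follows. The matrix definition is cut off in this diff, so the orthonormal DCT-II convention used here is an assumption, chosen because its inverse equals its transpose as the question states. The names `dct_matrix`, `dct`, and `inverse_dct` are illustrative and do not come from the repository.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Vector = std::vector<double>;
using Matrix = std::vector<Vector>;

// Orthonormal DCT-II matrix (assumed convention, see the note above).
Matrix dct_matrix(std::size_t n) {
    const double pi = std::acos(-1.0);
    Matrix d(n, Vector(n));
    for (std::size_t i = 0; i < n; ++i) {
        const double scale = std::sqrt((i == 0 ? 1.0 : 2.0) / static_cast<double>(n));
        for (std::size_t j = 0; j < n; ++j) {
            d[i][j] = scale * std::cos(pi * (2.0 * j + 1.0) * i / (2.0 * n));
        }
    }
    return d;
}

// Forward transform: f_i = sum_j D_ij t_j.
Vector dct(const Matrix& d, const Vector& t) {
    Vector f(d.size(), 0.0);
    for (std::size_t i = 0; i < d.size(); ++i)
        for (std::size_t j = 0; j < t.size(); ++j)
            f[i] += d[i][j] * t[j];
    return f;
}

// Inverse transform via the transpose: t_j = sum_i D_ij f_i.
Vector inverse_dct(const Matrix& d, const Vector& f) {
    Vector t(d.size(), 0.0);
    for (std::size_t j = 0; j < t.size(); ++j)
        for (std::size_t i = 0; i < f.size(); ++i)
            t[j] += d[i][j] * f[i];
    return t;
}
```

Under that assumption, `inverse_dct(d, dct(d, t))` should reproduce `t` up to floating point rounding.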
+
 ### (d)

 {:.question}
 Randomly sample and plot a subset of M points {$$t^\prime_k$$} of the {$$t_j$$}; you’ll later
 investigate the dependence on the sample size.

+Here I've selected 100 samples from the original 250. The plot is recognizable but very distorted.
+
+![samples](../assets/img/pset12_fig_d.png)
+
 ### (e)

 {:.question}
@@ -270,6 +288,19 @@ $$
 {:.question}
 and plot the resulting estimated coefficients.

+Gradient descent very quickly drives the loss function to zero. However, it's not reconstructing the
+true DCT coefficients.
+
+![samples](../assets/img/pset12_fig_e.png)
+
+To make sure I don't have a bug in my code, I plotted the samples we get by performing the inverse
+DCT on the estimated coefficients.
+
+![samples](../assets/img/pset12_fig_e_2.png)
+
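+Sure enough, all samples in the subset are matched exactly, but the others are way off the mark.
+With fewer samples than coefficients the problem is underdetermined, so many coefficient vectors
+fit the subset perfectly. The one we found adds a lot of high-frequency content and is clearly
+overfitting.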
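
The behavior described above can be reproduced with a small sketch. It assumes the loss is the sum of squared residuals at the sampled points and that the DCT matrix follows the orthonormal DCT-II convention (the definitions are cut off in this diff). The step size, tolerance, and all function names are illustrative choices, not the values or code from the repository.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Vector = std::vector<double>;
using Matrix = std::vector<Vector>;

// Orthonormal DCT-II matrix (an assumed convention, see the note above).
Matrix orthonormal_dct(std::size_t n) {
    const double pi = std::acos(-1.0);
    Matrix d(n, Vector(n));
    for (std::size_t i = 0; i < n; ++i) {
        const double scale = std::sqrt((i == 0 ? 1.0 : 2.0) / static_cast<double>(n));
        for (std::size_t j = 0; j < n; ++j)
            d[i][j] = scale * std::cos(pi * (2.0 * j + 1.0) * i / (2.0 * n));
    }
    return d;
}

// Residuals at the sampled indices: r_k = sum_i D[i][idx[k]] * f[i] - samples[k].
Vector residuals(const Matrix& d, const Vector& f,
                 const std::vector<std::size_t>& idx, const Vector& samples) {
    Vector r(idx.size());
    for (std::size_t k = 0; k < idx.size(); ++k) {
        double reconstructed = 0.0;
        for (std::size_t i = 0; i < f.size(); ++i) reconstructed += d[i][idx[k]] * f[i];
        r[k] = reconstructed - samples[k];
    }
    return r;
}

// Assumed loss: sum of squared residuals.
double squared_error(const Vector& r) {
    double total = 0.0;
    for (double v : r) total += v * v;
    return total;
}

// Fixed-step gradient descent starting from f = 0.
Vector fit_unregularized(const Matrix& d, const std::vector<std::size_t>& idx,
                         const Vector& samples, double step, double tolerance,
                         std::size_t max_iterations) {
    Vector f(d.size(), 0.0);
    for (std::size_t it = 0; it < max_iterations; ++it) {
        const Vector r = residuals(d, f, idx, samples);
        if (squared_error(r) < tolerance) break;
        for (std::size_t i = 0; i < f.size(); ++i) {
            double gradient = 0.0;
            for (std::size_t k = 0; k < idx.size(); ++k)
                gradient += 2.0 * d[i][idx[k]] * r[k];
            f[i] -= step * gradient;
        }
    }
    return f;
}
```

Under these assumptions the columns of the matrix are orthonormal, so every step scales each residual by `1 - 2 * step`. That would be consistent with the loss reaching zero in a few dozen iterations while the coefficients themselves stay far from the true ones.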
+
 ### (f)

 {:.question}
@@ -287,6 +318,20 @@ $$
 {:.question}
 and plot the resulting estimated coefficients.

+With L2 regularization, we remove some of the high frequency content. This makes the real peaks a
+little more prominent.
+
+![samples](../assets/img/pset12_fig_f.png)
+
+However, it comes at a cost: gradient descent no longer drives the loss to zero. As such, the loss
+itself isn't a good termination condition. In its place I terminate when the squared norm of the
+gradient is less than $$\num{1e-6}$$. The final loss for the coefficients in the plot above is
+around 50.
+
+You can easily see that the loss is nonzero from the reconstructed samples.
+
+![samples](../assets/img/pset12_fig_f_2.png)
+
 ### (g)

 {:.question}
@@ -301,3 +346,33 @@ $$
 {:.question}
 Plot the resulting estimated coefficients, compare to the L2 norm estimate, and compare the
 dependence of the results on M to the Nyquist sampling limit of twice the highest frequency.
+
+With L1 regularization the DCT coefficients are recovered pretty well. There is no added high
+frequency noise.
+
+![samples](../assets/img/pset12_fig_g.png)
+
+It still can't drive the loss to zero. Additionally, it's hard to drive the squared norm of the
+gradient to zero, since each absolute value term contributes a gradient of 1 or -1 no matter how
+small its coefficient is. (Though to help prevent oscillation I actually drop this contribution if
+the absolute value of the coefficient in question is less than $$\num{1e-3}$$.) So here I terminate
+when the relative change in the loss falls below $$\num{1e-9}$$.
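
The two rules described above could be sketched like this. It is not the repository's code. The function names are illustrative, and `lambda` is a hypothetical name for the regularization weight, whose value isn't stated here.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Adds the subgradient of lambda * sum_i |f_i| to grad.
// Coefficients with |f_i| < threshold contribute nothing, which is the
// oscillation guard described in the text (threshold 1e-3 there).
void add_l1_subgradient(const std::vector<double>& f, double lambda,
                        double threshold, std::vector<double>& grad) {
    for (std::size_t i = 0; i < f.size(); ++i) {
        if (std::fabs(f[i]) < threshold) continue;
        grad[i] += (f[i] > 0.0 ? lambda : -lambda);
    }
}

// True when the relative change in the loss falls below the tolerance
// (1e-9 in the text). Measuring the change relative to the previous loss
// is an assumption about how "relative" is defined.
bool converged(double previous_loss, double current_loss, double tolerance) {
    return std::fabs(previous_loss - current_loss) <
           tolerance * std::fabs(previous_loss);
}
```

A descent loop would call `add_l1_subgradient` after computing the gradient of the data term, then check `converged` once per iteration.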
+
+The final loss is around 40, smaller than we were able to find with L2 regularization. However, it
+did take more effort: this version converged after 21,060 iterations, as opposed to 44 (for L2) or
+42 (for unregularized).
+
+The recovered samples are also much more recognizable. The amplitude of our waveform seems overall
+a bit diminished, but unlike our previous attempts it looks similar to the original.
+
+![samples](../assets/img/pset12_fig_g_2.png)
+
+This technique can recover the signal from substantially fewer samples than the Nyquist limit
+requires. The highest frequency in the signal is 1209 Hz, so with traditional techniques we'd have
+to sample faster than 2418 Hz to avoid artifacts. Since I'm only plotting over one tenth of a
+second, I thus need at least 242 samples. So my original 250 is (not coincidentally) near here. But
+even with a subset of only 50 samples, the L1 regularized gradient descent does an admirable job
+of recovering the DCT coefficients and samples:
+
+![samples](../assets/img/pset12_fig_g_3.png)
+![samples](../assets/img/pset12_fig_g_4.png)
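
For reference, the Nyquist arithmetic for the 0.1 s window from part (a) works out as follows, writing $$f_s$$ for the sampling rate and $$N$$ for the number of samples in the window (symbols introduced here for illustration):

$$
\begin{align*}
f_s &> 2 \cdot 1209 \text{ Hz} = 2418 \text{ Hz} \\
N &> 2418 \text{ Hz} \cdot 0.1 \text{ s} = 241.8 \quad \Rightarrow \quad N \geq 242
\end{align*}
$$

A subset of 50 samples is therefore roughly one fifth of what the Nyquist limit would call for.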
--- a/assets/img/pset12_fig_a.png
+++ b/assets/img/pset12_fig_a.png
--- a/assets/img/pset12_fig_b.png
+++ b/assets/img/pset12_fig_b.png
--- a/assets/img/pset12_fig_c.png
+++ b/assets/img/pset12_fig_c.png
--- a/assets/img/pset12_fig_d.png
+++ b/assets/img/pset12_fig_d.png
--- a/assets/img/pset12_fig_e.png
+++ b/assets/img/pset12_fig_e.png
--- a/assets/img/pset12_fig_e_2.png
+++ b/assets/img/pset12_fig_e_2.png
--- a/assets/img/pset12_fig_f.png
+++ b/assets/img/pset12_fig_f.png
--- a/assets/img/pset12_fig_f_2.png
+++ b/assets/img/pset12_fig_f_2.png
--- a/assets/img/pset12_fig_g.png
+++ b/assets/img/pset12_fig_g.png
--- a/assets/img/pset12_fig_g_2.png
+++ b/assets/img/pset12_fig_g_2.png
--- a/assets/img/pset12_fig_g_3.png
+++ b/assets/img/pset12_fig_g_3.png
--- a/assets/img/pset12_fig_g_4.png
+++ b/assets/img/pset12_fig_g_4.png