Imagine that you have a sparsely sampled signal (with many missing measurements), and you want to recover a densely sampled version of the same signal.
If you can find a function that is capable of transforming the original signal into a representation that is also sparse (with many zero or trivially non-zero coefficients), then you have a good heuristic that you can use to find an approximation of the original signal:- maximising sparsity in the transform domain. This is really just Occam's razor - if you have good reason to suppose that the data can be fit by a simple model, then choosing the simplest model that fits the data is an optimal guessing strategy.
Good examples of such functions include the Fourier transform or various wavelet transforms - indeed, any parametric approximation or model would probably do the job.
If the transform domain really is sparse, and each coefficient in the transform domain impacts multiple measurements in the original domain, then (with high probability) you will be able to guess the "right" coefficients in the transform domain by simply
picking the set of coefficients in the transform domain which both reproduces the original signal and minimizes the number of non-zero components in the transform domain.
However, choosing the most parsimonious model is only one kind of regularization that we can (and should) deploy. We should also think about the set of coefficients that represents the most physically and statistically plausible model.
Thus, when choosing our approach to regularization, we need to consider three things:
* Parsimony (As represented by sparsity, approximated by the L0 and L1 norms).
* Physical plausibility (A transform domain closely related to a physical model is needed here).
* Statistical plausibility (A transform domain where coefficients are as independent as possible is needed here - to reduce the number and dimensionality of the joint distributions that we need to calculate).
These three requirements are not necessarily harmonious. Does that mean we need three different transforms? How do we reconcile differences when they crop up?