# Paper of the Day (Po'D): Recovering Sparse Speech Excitations

Hello, and welcome to the Paper of the Day (Po'D): Signal Processing Edition. Today's paper comes from more of our colleagues: Daniele Giacobello, Mads Christensen, Manohar N. Murthi, Søren Holdt Jensen, and Marc Moonen: Retrieving Sparse Patterns Using a Compressed Sensing Framework: Applications to Speech Coding Based on Sparse Linear Prediction, IEEE Sig. Proc. Lett., vol. 17, no. 1, pp. 103-106, Jan. 2010.

The usual approach to linear prediction in speech coding is to minimize the squared norm of the error signal given a predictor order $$P$$, i.e., $$\mathbf{a}_\textrm{2opt} = \arg \min_{\mathbf{a} \in \mathbb{R}^P} || \mathbf{x} - \mathbf{X}\mathbf{a} ||_2$$, where $$\mathbf{X}$$ is a matrix whose rows are composed of past elements of the signal $$\mathbf{x}$$. Though this does guarantee that the error will be distributed more uniformly than the original signal, it does not guarantee that the error will be sparse, i.e., that the number of nonzero samples in $$\mathbf{r} = \mathbf{x} - \mathbf{X}\mathbf{a}_\textrm{2opt}$$ will be small. The authors propose instead to find a predictor that will recover the "sparse patterns" of speech, i.e., the glottal impulses in voiced speech. Thus, instead of minimizing the $$\ell_2$$-norm of the residual, they minimize the $$\ell_1$$-norm, which has been shown capable of recovering sparse solutions to underdetermined systems: $$\mathbf{a}_\textrm{1opt} = \arg \min_{\mathbf{a} \in \mathbb{R}^P} || \mathbf{x} - \mathbf{X}\mathbf{a} ||_1.$$

We can return to an approximation of the original signal with either set of predictor coefficients by filtering the error signal, i.e., $$\hat{\mathbf{x}} = \mathbf{H} (\mathbf{x} - \mathbf{X}\mathbf{a}_\textrm{1opt}) = \mathbf{H} \mathbf{r}$$, where $$\mathbf{H}$$ is the $$N\times N$$ synthesis matrix constructed from shifted copies of the impulse response of the all-pole filter given by $$\mathbf{a}_\textrm{1opt}$$ (truncated to length $$N$$). We can see in this form that if we consider $$\mathbf{r}$$ to be sparse, and $$\mathbf{H}$$ to be a dictionary, then we have succeeded in representing the non-sparse signal $$\mathbf{x}$$ in a sparse way!
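As a quick sketch of the $$\ell_1$$ predictor (my own illustration, not the authors' code), the problem $$\min_{\mathbf{a}} || \mathbf{x} - \mathbf{X}\mathbf{a} ||_1$$ can be posed as a linear program by introducing slack variables bounding the magnitude of each residual sample. The helper name `lp_l1` is hypothetical; the LP is solved with SciPy's `linprog`:

```python
import numpy as np
from scipy.optimize import linprog

def lp_l1(x, P):
    """Sparse (l1-norm) linear prediction: minimize ||x_t - X a||_1 as an LP.

    Rows of X hold the P previous samples of x, the usual autoregressive
    setup. Variables are z = [a (P), t (m)]; we minimize sum(t) subject to
    |x_t - X a| <= t elementwise.
    """
    N = len(x)
    xt = x[P:]                                  # prediction targets
    X = np.column_stack([x[P - k:N - k] for k in range(1, P + 1)])
    m = len(xt)
    c = np.concatenate([np.zeros(P), np.ones(m)])
    # x_t - X a <= t  and  -(x_t - X a) <= t, written as A_ub z <= b_ub
    A_ub = np.block([[-X, -np.eye(m)], [X, -np.eye(m)]])
    b_ub = np.concatenate([-xt, xt])
    bounds = [(None, None)] * P + [(0, None)] * m
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:P]
```

On a toy AR signal driven by a sparse excitation, this predictor tends to leave a residual that is exactly zero at most samples, concentrating the error on the few excitation impulses.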
However, we do not just want sparsity in the solution; we also want to guarantee the error will be less than some maximum $$\epsilon > 0$$, i.e., we want the residual to satisfy $$\mathbf{r}_\textrm{best} = \arg \min_{\mathbf{r}\in\mathbb{R}^N} || \mathbf{r} ||_1 \; \textrm{subject to} \; || \mathbf{x} - \mathbf{H} \mathbf{r} ||_2^2 \le \epsilon.$$ Using an $$\ell_1$$-norm minimization of the residual for speech coding has been proposed before and revisited numerous times. What the authors do here, however, is bring it into a compressed sensing framework. Given that we assume our error signal $$\mathbf{r}$$ has some sparsity, we can construct an $$M \times N$$ measurement matrix $$\mathbf{\Phi}$$ with $$M \ll N$$ such that the problem above becomes solvable with fewer constraints. The best solution is then given with high probability by $$\mathbf{r}_\textrm{best} = \arg \min_{\mathbf{r}\in\mathbb{R}^N} || \mathbf{r} ||_1 \; \textrm{subject to} \; || \mathbf{\Phi}\mathbf{x} - \mathbf{\Phi}\mathbf{H} \mathbf{r} ||_2^2 \le \epsilon.$$ Now the problem is reduced to one with $$M \ll N$$ constraints. There are, of course, caveats. First, we must assume the sparsity $$K$$ of the error signal; however, given the limitations of the human voice, we can reasonably gauge this for $$N$$ samples. Second, based on this sparsity, our measurement matrix should have $$M = O(K \log N)$$ rows to give a high probability that the best solution $$\mathbf{r}_\textrm{best}$$ will be found. Their experimental results indicate that for voiced speech they are not only recovering sparse solutions with signal errors close to those of an optimal method, but they are doing so 1000 times quicker! Good work, fellows.
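To make the compressed-sensing step concrete, here is a minimal sketch (again mine, not the paper's implementation, and simplified to the $$\epsilon \to 0$$ equality-constrained case, i.e., plain basis pursuit): draw a Gaussian $$\mathbf{\Phi}$$, and recover $$\mathbf{r}$$ from only $$M \ll N$$ projections by splitting $$\mathbf{r} = \mathbf{u} - \mathbf{v}$$ with $$\mathbf{u}, \mathbf{v} \ge 0$$ so the $$\ell_1$$ problem becomes a linear program. The function name `bp_recover` and the choice of Gaussian measurements are my assumptions:

```python
import numpy as np
from scipy.optimize import linprog

def bp_recover(x, H, M, rng):
    """Basis-pursuit sketch of the compressed-sensing step: recover a sparse
    residual r from M << N random projections of x.

    Solves min ||r||_1 subject to Phi H r = Phi x (the eps -> 0 limit of the
    paper's inequality-constrained problem) as an LP with r = u - v.
    """
    N = len(x)
    Phi = rng.standard_normal((M, N)) / np.sqrt(M)   # Gaussian measurement matrix
    B = Phi @ H
    b = Phi @ x
    c = np.ones(2 * N)                # sum(u) + sum(v) = ||r||_1 at optimum
    A_eq = np.hstack([B, -B])
    res = linprog(c, A_eq=A_eq, b_eq=b, bounds=[(0, None)] * (2 * N),
                  method="highs")
    u, v = res.x[:N], res.x[N:]
    return u - v
```

With $$K$$ small relative to $$M$$ (on the order of $$K \log N$$ measurements), the LP recovers the sparse excitation with high probability despite seeing only the compressed measurements.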

## 4 Comments

Bob,

Do you guys have an RSS feed?

Cheers,

Igor.

Bob,

In case MT does not provide an easy way, you can always use the Google/Feedburner solution.

Cheers,

Igor.



### About this Entry

This page contains a single entry by Bob L. Sturm published on 12.03.2010 14:50.
