We deal with the problem of sparsity-based audio inpainting, i.e., filling in missing segments of an audio signal. A consequence of the approaches based on mathematical optimization is that the signal restored in the gaps has insufficient amplitude. Staying within the framework of sparsity and convex optimization, we propose improvements to audio inpainting that aim to compensate for this energy loss. The new ideas are based on different types of weighting, in both the coefficient domain and the time domain. We show that our proposals improve inpainting performance in terms of both SNR and ODG.
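One way to express weighting in the coefficient domain is to replace the plain ℓ1 norm with a weighted norm Σᵢ wᵢ|cᵢ|. Its proximal operator is soft thresholding with a per-coefficient threshold. Soft thresholding shrinks every magnitude, which is one intuitive source of the amplitude loss described above, and smaller weights mean less shrinkage. The Python sketch below shows this operator together with the reweighting rule wᵢ = 1/(|cᵢ| + ε) known from iteratively reweighted ℓ1 minimization (Candès, Wakin and Boyd). The function names are hypothetical, and the rule is assumed only for illustration; the specific weights proposed in the paper are not reproduced here.

```python
import numpy as np

def weighted_soft_threshold(c, thr, w):
    """Proximal operator of thr * sum_i w_i |c_i| for real or complex c.

    Each coefficient is shrunk towards zero by its own threshold thr * w_i;
    a smaller weight means less shrinkage of that coefficient.
    """
    mag = np.abs(c)
    # Guard against division by zero for coefficients that are exactly 0.
    scale = np.maximum(1.0 - thr * w / np.maximum(mag, 1e-12), 0.0)
    return c * scale

def reweight(c, eps=1e-3):
    """Illustrative reweighting rule (assumption): w_i = 1 / (|c_i| + eps).

    Large coefficients get small weights, so they are shrunk less
    in the next weighted l1 problem.
    """
    return 1.0 / (np.abs(c) + eps)

c = np.array([3.0, -0.5, 0.2])
print(weighted_soft_threshold(c, 0.4, np.ones(3)))  # plain (unweighted) shrinkage
print(weighted_soft_threshold(c, 0.4, reweight(c)))  # large coefficient kept closer to 3
```

With unit weights, the largest coefficient drops from 3.0 to 2.6. With the reweighted thresholds, it stays at about 2.87, while the small coefficients are still set to zero.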
A preprint of the paper is available here.
Following the idea of reproducible research, we make all implementations freely available in the GitHub repository.
Please note that the LTFAT toolbox (version 2.4.0 or later, available here) must be installed and loaded in order to run the scripts and reproduce the data.
The following table lists the abbreviations and full names of the algorithms used in the overall evaluation. The last four were included as representatives of the current state of the art.
Abbreviation | Full name |
---|---|
ℓ1 DR | Simple synthesis model, ℓ1 minimization using Douglas–Rachford |
ℓ1 CP | Simple analysis model, ℓ1 minimization using Chambolle–Pock |
ℓ1 wDR | Weighted synthesis model, ℓ1 minimization using Douglas–Rachford |
ℓ1 wCP | Weighted analysis model, ℓ1 minimization using Chambolle–Pock |
ℓ1 reDR | Iteratively reweighted synthesis model, repeatedly using Douglas–Rachford |
ℓ1 reCP | Iteratively reweighted analysis model, repeatedly using Chambolle–Pock |
TDC | Time-domain compensation approach |
S-SPAIN_H | Synthesis Sparse Audio Inpainter (with hard thresholding) |
A-SPAIN | Analysis Sparse Audio Inpainter |
OMP | Orthogonal Matching Pursuit |
Janssen | Janssen autoregressive method |
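The simplest baseline in the table, ℓ1 DR, minimizes the ℓ1 norm of synthesis coefficients subject to the reconstruction matching the reliable samples, and solves this with Douglas–Rachford. The Python sketch below illustrates that structure under a simplifying assumption: a unitary DFT stands in for the Gabor frame (LTFAT) used in the actual scripts. Because the DFT is unitary, projecting onto the consistency constraint amounts to synthesizing, overwriting the reliable samples and analyzing again. The function names and the parameters `gamma` and `n_iter` are hypothetical choices for illustration, not the repository's implementation.

```python
import numpy as np

def soft_threshold(c, thr):
    """Prox of thr * ||c||_1 for complex coefficients: shrink magnitudes by thr."""
    mag = np.abs(c)
    return c * np.maximum(1.0 - thr / np.maximum(mag, 1e-12), 0.0)

def project_consistent(c, y, mask):
    """Project c onto {c : (D c)[mask] = y[mask]}, D = unitary inverse DFT.

    Since D is unitary: synthesize, overwrite the reliable samples with the
    observed ones, and analyze again.
    """
    x = np.fft.ifft(c, norm="ortho")
    x[mask] = y[mask]
    return np.fft.fft(x, norm="ortho")

def l1_inpaint_dr(y, mask, gamma=0.1, n_iter=2000):
    """Synthesis-model l1 inpainting solved with Douglas-Rachford.

    y    : observed signal (values inside the gap are ignored)
    mask : boolean array, True for reliable samples
    """
    # Start from the spectrum of the zero-filled signal.
    z = np.fft.fft(np.where(mask, y, 0.0), norm="ortho")
    for _ in range(n_iter):
        c = project_consistent(z, y, mask)            # prox of the constraint
        z = z + soft_threshold(2 * c - z, gamma) - c  # reflected l1 prox step
    c = project_consistent(z, y, mask)
    # Real-valued input keeps the spectrum conjugate-symmetric; drop rounding residue.
    return np.real(np.fft.ifft(c, norm="ortho"))

n = np.arange(64)
x0 = np.cos(2 * np.pi * 5 * n / 64)
mask = np.ones(64, dtype=bool)
mask[28:36] = False                                   # an 8-sample gap
x_rec = l1_inpaint_dr(np.where(mask, x0, 0.0), mask)
print(np.linalg.norm(x_rec[~mask] - x0[~mask]), np.linalg.norm(x0[~mask]))
```

The test tone is exactly sparse in this DFT, so the toy gap is filled well. The amplitude drop discussed in the abstract concerns real recordings and Gabor frames, which this toy example does not attempt to reproduce. For any `gamma > 0`, the Douglas–Rachford fixed points yield the same minimizer; `gamma` only affects the speed of convergence.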
The audio database used for the evaluation consists of 10 mono musical excerpts sampled at 44.1 kHz, each approximately 7 seconds long. They were extracted from the EBU SQAM database. The excerpts were carefully selected to cover a wide range of audio signal characteristics. Since many of the methods are based on signal sparsity, the selection deliberately includes signals with different levels of sparsity (with respect to the Gabor transform).
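To make "different levels of sparsity" concrete, one could measure the fraction of time-frequency coefficients needed to capture, say, 99% of a signal's energy. The Python sketch below does this with a plain NumPy STFT using a Hann window as a stand-in for a Gabor transform. The measure, the window length, the hop size and the function names are all hypothetical. This is not the criterion actually used to select the excerpts.

```python
import numpy as np

def stft_coefficients(x, win_len=256, hop=64):
    """Plain numpy STFT with a Hann window (stand-in for a Gabor transform)."""
    win = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop : i * hop + win_len] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def fraction_for_energy(coefs, energy=0.99):
    """Fraction of coefficients needed to capture the given share of total energy.

    Smaller values indicate a sparser time-frequency representation.
    """
    e = np.sort(np.abs(coefs).ravel() ** 2)[::-1]  # energies, largest first
    cum = np.cumsum(e) / e.sum()
    k = int(np.searchsorted(cum, energy)) + 1      # first k reaching the target share
    return k / e.size

n = np.arange(8192)
tone = np.sin(2 * np.pi * 440 * n / 44100)         # very sparse in time-frequency
noise = np.random.default_rng(0).standard_normal(8192)  # not sparse at all
print(fraction_for_energy(stft_coefficients(tone)),
      fraction_for_energy(stft_coefficients(noise)))
```

A stationary tone concentrates its energy in a few frequency bins per frame, whereas white noise spreads it over nearly all coefficients, so the two values bracket the range such a measure can take.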
Playback can be started by clicking on one of the table cells (the cells turn light blue when the cursor hovers over them). Your browser must support the HTML5 audio player. Alternatively, the file path is shown just below the player, and the file can be downloaded via Save Link As...
Select gap length: 5, 10, 15, 20, 25, 30, 35, 40, 45, or 50 ms.
Select evaluation metric: None, SNR, or PEMO-Q ODG.
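The gap lengths above are given in milliseconds, while the algorithms work on samples. The SNR can be evaluated on the gap samples only. The Python sketch below converts a gap length to a sample mask and computes the SNR restricted to the gap, plus the ratio of restored to reference energy in the gap, which directly reflects the amplitude loss discussed in the abstract. Restricting the SNR to the gap is an assumption here (common in the inpainting literature, but not stated on this page), and all function names are hypothetical. PEMO-Q ODG requires the PEMO-Q software and is not reproduced.

```python
import numpy as np

def gap_mask(n_samples, fs, start_s, gap_ms):
    """Boolean mask that is False inside a single gap (True = reliable sample)."""
    start = int(round(start_s * fs))
    length = int(round(gap_ms * fs / 1000))
    mask = np.ones(n_samples, dtype=bool)
    mask[start : start + length] = False
    return mask

def gap_snr_db(reference, reconstruction, mask):
    """SNR in dB evaluated only on the missing samples (~mask)."""
    ref = reference[~mask]
    err = ref - reconstruction[~mask]
    return 10 * np.log10(np.sum(ref ** 2) / np.sum(err ** 2))

def gap_energy_ratio(reference, reconstruction, mask):
    """Energy of the reconstruction in the gap relative to the reference (1 = no loss)."""
    return np.sum(reconstruction[~mask] ** 2) / np.sum(reference[~mask] ** 2)

fs = 44100
mask = gap_mask(2 * fs, fs, start_s=1.0, gap_ms=20)   # 20 ms -> 882 samples
x = np.sin(2 * np.pi * 440 * np.arange(2 * fs) / fs)
zero_filled = np.where(mask, x, 0.0)
half_amplitude = np.where(mask, x, 0.5 * x)
print(gap_snr_db(x, zero_filled, mask))               # 0 dB for zero filling
print(gap_snr_db(x, half_amplitude, mask), gap_energy_ratio(x, half_amplitude, mask))
```

Zero filling gives exactly 0 dB. A gap restored with the right waveform but only half the amplitude reaches about 6 dB while keeping only 25% of the energy. At 44.1 kHz, 5 ms corresponds to 220.5 samples, so a rounding convention has to be fixed; Python's `round()` rounds halves to even (220).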
Signal / method | 01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | 09 | 10 |
---|---|---|---|---|---|---|---|---|---|---|
Original | × | × | × | × | × | × | × | × | × | × |
With gaps | × | × | × | × | × | × | × | × | × | × |
ℓ1 DR | × | × | × | × | × | × | × | × | × | × |
ℓ1 CP | × | × | × | × | × | × | × | × | × | × |
ℓ1 wDR | × | × | × | × | × | × | × | × | × | × |
ℓ1 wCP | × | × | × | × | × | × | × | × | × | × |
ℓ1 reDR | × | × | × | × | × | × | × | × | × | × |
ℓ1 reCP | × | × | × | × | × | × | × | × | × | × |
TDC | × | × | × | × | × | × | × | × | × | × |
S-SPAIN_H | × | × | × | × | × | × | × | × | × | × |
A-SPAIN | × | × | × | × | × | × | × | × | × | × |
OMP | × | × | × | × | × | × | × | × | × | × |
Janssen | × | × | × | × | × | × | × | × | × | × |