Audio Inpainting: Revisited and Reweighted

Abstract

We address sparsity-based audio inpainting, i.e., filling in missing segments of an audio signal. Approaches based on mathematical optimization tend to produce insufficient signal amplitude in the filled gaps. Staying within the framework of sparsity and convex optimization, we propose improvements to audio inpainting that aim to compensate for this energy loss. The new ideas are based on different types of weighting, in both the coefficient and the time domains. We show that our proposals improve inpainting performance in terms of both SNR and ODG.

Reproducible Research

A preprint version of the paper is available here.
Following the idea of reproducible research, we make all the implementations freely available in the GitHub repository.
Please note that the LTFAT toolbox (version 2.4.0 or later, available here) must be installed and loaded in order to run the scripts and reproduce the data.

Algorithms

The following table lists the abbreviations and full names of the algorithms used in the overall evaluation. The last four were included as candidates for the current state of the art.

Abbreviation	Full name
ℓ₁ DR	Simple synthesis model, ℓ₁ minimization using Douglas–Rachford
ℓ₁ CP	Simple analysis model, ℓ₁ minimization using Chambolle–Pock
ℓ₁ wDR	Weighted synthesis model, ℓ₁ minimization using Douglas–Rachford
ℓ₁ wCP	Weighted analysis model, ℓ₁ minimization using Chambolle–Pock
ℓ₁ reDR	Iteratively reweighted synthesis model, repeatedly using Douglas–Rachford
ℓ₁ reCP	Iteratively reweighted analysis model, repeatedly using Chambolle–Pock
TDC	Time-domain compensation approach
S-SPAIN_H	Synthesis Sparse Audio Inpainter (with hard thresholding)
A-SPAIN	Analysis Sparse Audio Inpainter
OMP	Orthogonal Matching Pursuit
Janssen	Janssen autoregressive method
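
As a rough illustration of what the ℓ₁ DR and ℓ₁ wDR entries refer to, the sketch below runs Douglas–Rachford on a weighted ℓ₁ synthesis problem under the constraint that the reliable samples are kept unchanged. It is not the implementation from the repository: an orthonormal DCT stands in for the Gabor frame, and the function names, the step `gamma`, the iteration count, and the `weights` argument are all assumptions made for this example. Setting all weights to 1 gives the unweighted case.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix; an assumed stand-in for the Gabor frame."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)])
    return rows


def analyze(C, x):
    """Coefficients c = C x."""
    return [sum(row[i] * x[i] for i in range(len(x))) for row in C]


def synthesize(C, c):
    """Signal x = C^T c (exact inverse of analyze, since C is orthonormal)."""
    n = len(c)
    return [sum(C[k][i] * c[k] for k in range(n)) for i in range(n)]


def soft_threshold(c, thresholds):
    """Prox of the weighted l1 norm: shrink each coefficient by its own threshold."""
    return [math.copysign(max(abs(v) - t, 0.0), v) for v, t in zip(c, thresholds)]


def project_onto_reliable(C, c, y, reliable):
    """Project c onto the set of coefficients whose signal matches y on reliable samples."""
    x = synthesize(C, c)
    x = [y[i] if reliable[i] else x[i] for i in range(len(x))]
    return analyze(C, x)


def douglas_rachford_inpaint(y, reliable, weights, gamma=0.1, iterations=200):
    """Sketch of weighted l1 synthesis inpainting via Douglas-Rachford."""
    n = len(y)
    C = dct_matrix(n)
    # Start from the coefficients of the zero-filled signal.
    z = analyze(C, [y[i] if reliable[i] else 0.0 for i in range(n)])
    thresholds = [gamma * w for w in weights]
    for _ in range(iterations):
        c = project_onto_reliable(C, z, y, reliable)
        reflected = [2.0 * ci - zi for ci, zi in zip(c, z)]
        shrunk = soft_threshold(reflected, thresholds)
        z = [zi + si - ci for zi, si, ci in zip(z, shrunk, c)]
    # The final projection guarantees that reliable samples are reproduced.
    c = project_onto_reliable(C, z, y, reliable)
    return synthesize(C, c)


if __name__ == "__main__":
    n = 32
    y = [math.cos(math.pi * (i + 0.5) * 3 / n) for i in range(n)]
    reliable = [not (12 <= i < 16) for i in range(n)]  # hypothetical 4-sample gap
    restored = douglas_rachford_inpaint(y, reliable, [1.0] * n)
    print([round(restored[i], 3) for i in range(12, 16)])
```

Only the thresholds change between the plain and the weighted variant, which is why the weights enter through `soft_threshold` alone. How the weights are chosen is left open here.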

Audio Excerpts

The audio database used for the evaluation consists of 10 mono musical excerpts, sampled at 44.1 kHz, each approximately 7 seconds long. They were extracted from the EBU SQAM database. The excerpts were carefully selected to cover a wide range of audio signal characteristics. Since a significant number of the methods are based on signal sparsity, the selection took care to include signals with different levels of sparsity (w.r.t. the Gabor transform).

01. violin

02. clarinet

03. bassoon

04. harp

05. glockenspiel

06. celesta

07. accordion

08. guitar

09. piano

10. wind ensemble

Playback can be started by clicking on one of the table cells (the cells turn light blue when the cursor hovers over them). Your browser must support an HTML5 audio player. Alternatively, the file path is shown just below the player and the file can be downloaded using Save Link As ...

Select gap length: 5 ms 10 ms 15 ms 20 ms 25 ms 30 ms 35 ms 40 ms 45 ms 50 ms
Select evaluation metric: None SNR PEMO-Q ODG

Loaded file: None

	01	02	03	04	05	06	07	08	09	10
Original	×	×	×	×	×	×	×	×	×	×
With gaps	×	×	×	×	×	×	×	×	×	×
ℓ₁ DR	×	×	×	×	×	×	×	×	×	×
ℓ₁ CP	×	×	×	×	×	×	×	×	×	×
ℓ₁ wDR	×	×	×	×	×	×	×	×	×	×
ℓ₁ wCP	×	×	×	×	×	×	×	×	×	×
ℓ₁ reDR	×	×	×	×	×	×	×	×	×	×
ℓ₁ reCP	×	×	×	×	×	×	×	×	×	×
TDC	×	×	×	×	×	×	×	×	×	×
S-SPAIN_H	×	×	×	×	×	×	×	×	×	×
A-SPAIN	×	×	×	×	×	×	×	×	×	×
OMP	×	×	×	×	×	×	×	×	×	×
Janssen	×	×	×	×	×	×	×	×	×	×

Results

The following figures present the final evaluation of several proposed methods, with conventional methods included for comparison. The comparison is done in terms of two objective metrics: SNR (signal-to-noise ratio) and PEMO-Q ODG (Objective Difference Grade).
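
For reference, the SNR of a restored signal relative to the original can be computed as 10·log₁₀(‖x‖² / ‖x − x̂‖²). The sketch below is a minimal version of that formula. The function name `snr_db` and the optional `mask` argument are assumptions for illustration, and restricting the evaluation to the gap samples is one possible convention, not necessarily the one used for the figures.

```python
import math


def snr_db(original, restored, mask=None):
    """SNR in dB between original and restored samples.

    mask (assumed convention): True marks samples included in the evaluation,
    e.g. only the samples inside the gaps. None means all samples.
    """
    if mask is None:
        mask = [True] * len(original)
    signal = sum(o * o for o, m in zip(original, mask) if m)
    noise = sum((o - r) ** 2 for o, r, m in zip(original, restored, mask) if m)
    if noise == 0.0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal / noise)


if __name__ == "__main__":
    original = [3.0, 4.0, 1.0]
    restored = [3.0, 4.5, 1.0]
    print(snr_db(original, restored))                        # all samples
    print(snr_db(original, restored, [False, True, False]))  # gap sample only
```

Evaluating only inside the gaps gives lower values than evaluating the whole signal, because the untouched reliable samples add signal energy but no error.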