Audio Inpainting: Revisited and Reweighted

Ondřej Mokrý, Pavel Rajmic

Abstract

We deal with the problem of sparsity-based audio inpainting, i.e. filling in the missing segments of audio. A consequence of the approaches based on mathematical optimization is the insufficient amplitude of the signal in the filled gaps. Remaining in the framework based on sparsity and convex optimization, we propose improvements to audio inpainting, aiming at compensating for such an energy loss. The new ideas are based on different types of weighting, both in the coefficient and the time domains. We show that our propositions improve the inpainting performance in terms of both the SNR and ODG.

Reproducible Research

A preprint version of the paper is available here.
Following the idea of reproducible research, we make all the implementations freely available in the GitHub repository.
Please note that the LTFAT toolbox (version 2.4.0 or later, available here) must be installed and loaded in order to run the scripts and reproduce the data.

Algorithms

The following table lists the abbreviations and full names of the algorithms used in the overall evaluation. The last four were included as candidates for the current state of the art.

Abbreviation    Full name
ℓ1 DR           Simple synthesis model, ℓ1 minimization using Douglas–Rachford
ℓ1 CP           Simple analysis model, ℓ1 minimization using Chambolle–Pock
ℓ1 wDR          Weighted synthesis model, ℓ1 minimization using Douglas–Rachford
ℓ1 wCP          Weighted analysis model, ℓ1 minimization using Chambolle–Pock
ℓ1 reDR         Iteratively reweighted synthesis model, repeatedly using Douglas–Rachford
ℓ1 reCP         Iteratively reweighted analysis model, repeatedly using Chambolle–Pock
TDC             Time-domain compensation approach
S-SPAIN_H       Synthesis Sparse Audio Inpainter (with hard thresholding)
A-SPAIN         Analysis Sparse Audio Inpainter
OMP             Orthogonal Matching Pursuit
Janssen         Janssen autoregressive method
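
To make the synthesis-model entries more concrete, the following is a minimal Python sketch of ℓ1 inpainting solved with the Douglas–Rachford algorithm, with optional coefficient weights and an outer reweighting loop. It is an illustration under stated assumptions, not the implementation from the repository: an orthonormal DCT stands in for the Gabor frame, the function names (soft_threshold, inpaint_dr, inpaint_reweighted) and parameters (gamma, eps, n_iter, n_outer) are hypothetical, and the reweighting rule 1/(|c| + eps) is the generic textbook choice, which may differ from the weights used in the paper.

```python
import numpy as np
from scipy.fft import dct, idct


def soft_threshold(c, t):
    """Proximal operator of the (weighted) l1 norm; t may be a scalar or a vector."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)


def inpaint_dr(y, mask, weights=None, gamma=1.0, n_iter=300):
    """Sketch of synthesis-model l1 inpainting via Douglas-Rachford.

    y       : observed signal (values inside the gap are ignored)
    mask    : boolean array, True where the sample is reliable
    weights : optional per-coefficient weights (assumption: chosen by the caller)
    Assumption: an orthonormal DCT replaces the Gabor frame used in the paper.
    """
    w = np.ones(len(y)) if weights is None else np.asarray(weights, dtype=float)
    # Start from the coefficients of the zero-filled signal.
    z = dct(np.where(mask, y, 0.0), norm="ortho")
    for _ in range(n_iter):
        # Projection onto coefficients whose synthesis matches y on reliable samples
        # (exact for an orthonormal transform).
        x = idct(z, norm="ortho")
        x[mask] = y[mask]
        c = dct(x, norm="ortho")
        # Douglas-Rachford update with the prox of the weighted l1 norm.
        z = z + soft_threshold(2.0 * c - z, gamma * w) - c
    return idct(c, norm="ortho")


def inpaint_reweighted(y, mask, n_outer=5, eps=1e-2, **kwargs):
    """Sketch of iterative reweighting: solve, update weights, solve again."""
    w = np.ones(len(y))
    for _ in range(n_outer):
        x = inpaint_dr(y, mask, weights=w, **kwargs)
        # Generic reweighting rule (an assumption): small coefficients get large weights.
        w = 1.0 / (np.abs(dct(x, norm="ortho")) + eps)
    return x
```

Calling inpaint_dr(y, mask) on a signal with a short gap returns a full-length signal that agrees with y on the reliable samples; passing weights, or calling inpaint_reweighted, penalizes the coefficients unevenly, which is the general mechanism behind the weighted and reweighted variants listed above.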

Audio Excerpts

The audio database used for the evaluation consists of 10 musical excerpts in mono, sampled at 44.1 kHz, each approximately 7 seconds long. They were extracted from the EBU SQAM database. The excerpts were carefully selected to cover a wide range of audio signal characteristics. Since a significant number of the methods are based on signal sparsity, the selection deliberately includes signals with different levels of sparsity (w.r.t. the Gabor transform).

01. violin
02. clarinet
03. bassoon
04. harp
05. glockenspiel
06. celesta
07. accordion
08. guitar
09. piano
10. wind ensemble

Audio examples are provided for the original excerpts, the excerpts with gaps, and the reconstructions by each of the algorithms listed above, for gap lengths from 5 ms to 50 ms in steps of 5 ms. The examples can be accompanied by the corresponding SNR or PEMO-Q ODG values.

Results

The following figures present the final evaluation of several proposed methods. Conventional methods are included as well. The comparison is done in terms of two objective metrics: SNR (signal-to-noise ratio) and PEMO-Q ODG (objective difference grade).

Average SNR results.
Average PEMO-Q ODG results.
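
For reference, the SNR of a reconstruction with respect to the original signal can be computed as in the following Python sketch. The function name snr_db is hypothetical, and the page does not state over which samples the metric is evaluated (whole signal or gaps only), so the function simply uses whatever samples it is given.

```python
import numpy as np


def snr_db(reference, estimate):
    """SNR in dB: 10*log10(||reference||^2 / ||reference - estimate||^2)."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    noise = reference - estimate
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(noise ** 2))
```

Under this definition, a reconstruction that has the correct shape but too small an amplitude is penalized: the lower the amplitude of the filled segment relative to the original, the lower the SNR. This is how the energy loss described in the abstract shows up in the metric.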