Efficient bandwidth extension of musical signals using a differentiable harmonic plus noise model

Pierre-Amaury Grumiaux, Mathieu Lagrange

Contact: mathieu dot lagrange at ls2n dot fr

Abstract

The task of bandwidth extension addresses the generation of missing high frequencies of audio signals based on knowledge of the low-frequency part of the sound. This task applies to various problems, such as audio coding or audio restoration. In this article, we focus on bandwidth extension of monophonic and polyphonic musical signals using a differentiable digital signal processing (DDSP) model. Such a model is composed of a neural network part trained to infer the parameters of a differentiable digital signal processing model, which efficiently generates the output full-band audio signal.
We first address bandwidth extension of monophonic signals, and then propose two methods to handle polyphonic signals. The benefits of the proposed models are shown on monophonic and polyphonic synthetic data against a baseline and a deep-learning-based state-of-the-art model. The models are then evaluated on real data, in both monophonic and polyphonic contexts, and for a wide variety of instruments and musical genres. We show that all proposed models surpass the baseline and the state-of-the-art model on an objective metric computed in the frequency domain. A MUSHRA listening test confirms the superiority of the proposed approach.
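To make the harmonic plus noise idea concrete, here is a minimal sketch of such a synthesizer in plain Python. It is an illustration under simplifying assumptions and not the code used in the paper: the function name `harmonic_plus_noise` and all of its parameters are hypothetical, the parameters are held constant over time, and the noise part is plain white noise scaled by a gain rather than filtered noise. In a DDSP model, a neural network would instead infer time-varying values for these parameters from the low-band input.

```python
import math
import random


def harmonic_plus_noise(f0, harmonic_amps, noise_gain,
                        sample_rate=16000, n_samples=1600, seed=0):
    """Return a list of samples: a sum of harmonics of f0 plus white noise.

    Hypothetical helper for illustration only; parameters are stationary.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    nyquist = sample_rate / 2
    signal = []
    for n in range(n_samples):
        t = n / sample_rate
        # Sum sinusoids at integer multiples of f0; harmonics at or above
        # the Nyquist frequency are skipped to avoid aliasing.
        harmonic = sum(
            amp * math.sin(2 * math.pi * (k + 1) * f0 * t)
            for k, amp in enumerate(harmonic_amps)
            if (k + 1) * f0 < nyquist
        )
        # Assumption: unfiltered white noise stands in for the noise part.
        noise = noise_gain * rng.uniform(-1.0, 1.0)
        signal.append(harmonic + noise)
    return signal


# Example: a 440 Hz tone with three decaying harmonics and a little noise.
audio = harmonic_plus_noise(440.0, [0.5, 0.25, 0.125], noise_gain=0.01)
```

Because the synthesizer is only a few sums and products, it is cheap to run, which is the intuition behind the efficiency claim: the network predicts a small set of parameters rather than every audio sample.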

Experimental code is available at this GitHub repo.

Sound examples

The following samples let you listen to the high-frequency reconstruction produced by the proposed models and the baselines. For each example, the low-band input signal and the ground-truth signal are provided, along with the outputs of the baseline models and the DDSP-based models.

Monophonic samples

The references are taken from the Medley-solos-DB dataset.

Instrument	Input signal	Ground-truth signal	SBR	Resnet	DDSP-noise	DDSP-mono-dec	DDSP-mono-dec-cyclic	DDSP-poly-dec
Clarinet
Distorted electric guitar
Female singer
Flute
Piano
Tenor saxophone
Trumpet
Violin

Polyphonic samples

The references are taken from the GTZAN dataset.

Genre	Input signal	Ground-truth signal	SBR	Resnet	DDSP-noise	DDSP-mono-dec	DDSP-mono-dec-cyclic	DDSP-poly-dec
Blues
Classical
Country
Disco
Hip-hop
Jazz
Metal
Pop
Reggae
Rock