Time-Frequency Networks for Audio Super-Resolution

Teck-Yian Lim, Raymond A. Yeh, Yijia Xu, Minh N. Do, Mark Hasegawa-Johnson

Accepted at ICASSP 2018

Abstract

Audio super-resolution (a.k.a. bandwidth extension) is the challenging task of increasing the temporal resolution of audio signals. Recent deep network approaches have achieved promising results by modeling the task as a regression problem in either the time or the frequency domain. In this paper, we introduce the Time-Frequency Network (TFNet), a deep network that utilizes supervision in both the time and frequency domains. We propose a novel model architecture that allows the two domains to be jointly optimized. Results demonstrate that our method outperforms the state of the art both quantitatively and qualitatively.

Implementation Details

ICASSP 2018 Poster (Sigport alternate link)

ICASSP 2018 Paper

Code

Github

Sampling of Results

This is a sampling of 4x bandwidth expansion results from our model. Similar to Kuleshov et al., we trained our model on 99 speakers of the VCTK dataset. The speakers in this sampling were not seen by the network during training.
Compared to the time-domain-only method of Kuleshov et al., our model appears to produce fewer artifacts in the form of occasional pops and squeaks in the higher frequency ranges.
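For readers unfamiliar with the setup, a 4x bandwidth expansion task starts from audio that keeps only the lowest quarter of the original frequency range. The sketch below shows one common way such a low-resolution input could be simulated from a full-band recording: an ideal low-pass filter followed by keeping every fourth sample. It is an illustration only and is not taken from our training code; the function name `simulate_low_resolution`, the FFT-based filter, and the 16 kHz sample rate in the usage lines are all assumptions made for the example.

```python
import numpy as np


def simulate_low_resolution(signal, factor=4):
    """Low-pass filter a 1-D signal and keep every `factor`-th sample.

    Hypothetical helper for illustration; an FFT brick-wall filter is
    assumed here in place of whatever filter a real pipeline would use.
    """
    signal = np.asarray(signal, dtype=np.float64)
    # Trim so the length is a multiple of the downsampling factor.
    usable = len(signal) - (len(signal) % factor)
    signal = signal[:usable]
    spectrum = np.fft.rfft(signal)
    # Zero every bin at or above the Nyquist frequency of the new rate.
    cutoff = usable // (2 * factor)
    spectrum[cutoff:] = 0.0
    filtered = np.fft.irfft(spectrum, n=usable)
    # Subsample the band-limited signal.
    return filtered[::factor]


# Usage: a 500 Hz tone survives 4x downsampling, a 6000 Hz tone does not.
sample_rate = 16000  # assumed rate, chosen only for this example
t = np.arange(sample_rate) / sample_rate
full_band = np.sin(2 * np.pi * 500 * t) + 0.5 * np.sin(2 * np.pi * 6000 * t)
low_res = simulate_low_resolution(full_band, factor=4)
```

Here `low_res` has one quarter as many samples as `full_band` and contains only the 500 Hz component. A super-resolution model is asked to recover the missing upper band from an input of this kind.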

"The incidents are not believed to be linked." (Speaker p360, Utterance 059)