Time-Frequency Networks for Audio Super-Resolution

Teck-Yian Lim*, Raymond A. Yeh*, Yijia Xu, Minh N. Do, Mark Hasegawa-Johnson

Accepted at ICASSP 2018

Abstract

Audio super-resolution (a.k.a. bandwidth extension) is the challenging task of increasing the temporal resolution of audio signals. Recent deep network approaches have achieved promising results by modeling the task as a regression problem in either the time or the frequency domain. In this paper, we introduce the Time-Frequency Network (TFNet), a deep network that uses supervision in both the time and frequency domains. We propose a novel model architecture that allows the two domains to be jointly optimized. Results demonstrate that our method outperforms the state of the art both quantitatively and qualitatively.
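To make the idea of supervision in both domains concrete, here is a minimal sketch of an objective that penalizes errors in the waveform and in its magnitude spectrum at the same time. It is an illustration only and is not taken from the paper: the function name `time_frequency_loss`, the `freq_weight` parameter, and the choice of a plain sum of two mean-squared errors are all assumptions.

```python
import numpy as np

def time_frequency_loss(pred, target, freq_weight=1.0):
    """Hypothetical joint loss (an assumption, not the TFNet objective):
    time-domain MSE plus a weighted MSE between magnitude spectra."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)

    # Time-domain term: sample-by-sample squared error.
    time_term = np.mean((pred - target) ** 2)

    # Frequency-domain term: squared error between magnitude spectra
    # of the real-input FFT.
    pred_mag = np.abs(np.fft.rfft(pred))
    target_mag = np.abs(np.fft.rfft(target))
    freq_term = np.mean((pred_mag - target_mag) ** 2)

    return time_term + freq_weight * freq_term
```

Setting `freq_weight=0.0` reduces this to a purely time-domain regression loss, which is one way to see what the frequency term adds.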

Implementation Details

ICASSP 2018 Poster (Sigport alternate link)

ICASSP 2018 Paper

Code

Github

Sampling of Results

The samples below illustrate 4x bandwidth extension using our model. Similar to Kuleshov et al., we trained our model on 99 speakers of the VCTK dataset. The speakers in the samples below were not seen by the network during training.
Compared to the time-domain-only method of Kuleshov et al., our model appears to produce fewer artifacts, such as occasional pops and squeaks, in the higher frequency ranges.
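For readers unfamiliar with the setup, a 4x low-resolution input can be simulated by removing the top three quarters of the spectrum and keeping every fourth sample. The sketch below shows one way to do this with an ideal FFT low-pass filter. It is an assumption for illustration, not the preprocessing used for the samples on this page, and the name `make_low_res` is hypothetical.

```python
import numpy as np

def make_low_res(x, factor=4):
    """Simulate a low-resolution signal (illustrative assumption):
    ideal FFT low-pass filter, then keep every `factor`-th sample."""
    x = np.asarray(x, dtype=np.float64)

    # Zero every frequency bin at or above the new Nyquist frequency
    # so that the subsampling step does not alias.
    spectrum = np.fft.rfft(x)
    cutoff = len(spectrum) // factor
    spectrum[cutoff:] = 0.0
    filtered = np.fft.irfft(spectrum, n=len(x))

    # Subsample: the sampling rate drops by `factor`.
    return filtered[::factor]

# Usage: one second at 16 kHz containing a 440 Hz and a 6 kHz tone.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
high_res = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 6000 * t)
low_res = make_low_res(high_res, factor=4)  # effective rate: 4 kHz
```

In this example the 6 kHz tone lies above the 2 kHz cutoff, so it is absent from `low_res`. Recovering that kind of missing high-frequency content is the job of a super-resolution model.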

The incidents are not believed to be linked. (Speaker p360, Utterance 059)

High Resolution

Low Resolution

Kuleshov et al. (*)

Ours

One is investment, one is reform (Speaker p362, Utterance 087)

High Resolution

Low Resolution

Kuleshov et al. (*)

Ours

The difference in the rainbow depends... (Speaker p347, Utterance 021)

High Resolution

Low Resolution

Kuleshov et al. (*)

Ours

* The audio samples for Kuleshov et al. were obtained from the author's website, and the volume was normalized to match ours.

We plan to share more results together with our implementation at a later date.

References

Kuleshov et al., Audio Super Resolution using Neural Networks