Time-Frequency Networks for Audio Super-Resolution
Teck-Yian Lim*, Raymond A. Yeh*, Yijia Xu, Minh N. Do, Mark Hasegawa-Johnson
Accepted at ICASSP 2018
Abstract
Audio super-resolution (a.k.a. bandwidth extension) is the challenging task of increasing the temporal resolution of audio signals. Recent deep network approaches have achieved promising results by modeling the task as a regression problem in either the time or the frequency domain. In this paper, we introduce the Time-Frequency Network (TFNet), a deep network that utilizes supervision in both the time and frequency domains. We propose a novel model architecture that allows the two domains to be jointly optimized. Results demonstrate that our method outperforms the state of the art both quantitatively and qualitatively.
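To make the idea of supervising in both domains concrete, here is a minimal sketch of a combined objective. It is an illustration only, not TFNet's actual training loss: the function name `time_frequency_loss`, the `spectral_weight` parameter, and the use of a plain magnitude-spectrum error are all assumptions made for this example.

```python
import numpy as np

def time_frequency_loss(predicted, target, spectral_weight=1.0):
    """Hypothetical combined loss: waveform error plus magnitude-spectrum error.

    This is an assumed formulation for illustration, not the objective
    used in the paper.
    """
    predicted = np.asarray(predicted, dtype=float)
    target = np.asarray(target, dtype=float)

    # Time-domain term: mean squared error between the waveforms.
    time_loss = np.mean((predicted - target) ** 2)

    # Frequency-domain term: mean squared error between magnitude spectra.
    predicted_mag = np.abs(np.fft.rfft(predicted))
    target_mag = np.abs(np.fft.rfft(target))
    frequency_loss = np.mean((predicted_mag - target_mag) ** 2)

    return time_loss + spectral_weight * frequency_loss

# Usage: a half-amplitude prediction is penalized in both domains.
t = np.arange(1024) / 1024.0
target_wave = np.sin(2 * np.pi * 8 * t)
perfect_loss = time_frequency_loss(target_wave, target_wave)
scaled_loss = time_frequency_loss(0.5 * target_wave, target_wave)
```

A single scalar that mixes both terms lets one optimizer update be driven by errors in either domain; how the terms are weighted is a design choice left open here.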
Implementation Details
ICASSP 2018 Poster (Sigport alternate link)
Code
Sampling of Results
This is a sampling of 4x bandwidth expansion results from our model.
Similar to Kuleshov et al., we trained our model on 99 speakers of the
VCTK dataset. The speakers in these samples were not seen by the network during training.
Compared to the time-domain-only method of Kuleshov et al., our
model appears to produce fewer artifacts, such as the occasional pops and squeaks in
the higher frequency ranges.
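For readers unfamiliar with what 4x bandwidth expansion starts from, the sketch below simulates a low-resolution input by discarding the upper three quarters of a signal's spectrum. It is a simplified stand-in and an assumption of this example: the function `simulate_low_resolution`, the ideal FFT-based low-pass filter, and the 16 kHz sample rate are not taken from our data pipeline.

```python
import numpy as np

def simulate_low_resolution(signal, factor=4):
    """Keep only the lowest 1/factor of the spectrum (ideal low-pass).

    Hypothetical helper for illustration; a real pipeline would typically
    use a proper anti-aliasing filter followed by decimation.
    """
    spectrum = np.fft.rfft(signal)
    # Index of the highest frequency bin that is kept.
    cutoff = (len(spectrum) - 1) // factor
    spectrum[cutoff + 1:] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

# Usage: one second of audio at an assumed 16 kHz sample rate.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
low_tone = np.sin(2 * np.pi * 500 * t)    # below the 2 kHz cutoff, kept
high_tone = np.sin(2 * np.pi * 6000 * t)  # above the cutoff, removed
low_res = simulate_low_resolution(low_tone + high_tone, factor=4)
```

With a factor of 4 and a 16 kHz sample rate, everything above 2 kHz is removed; a super-resolution model is asked to predict that missing band.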
[Audio samples: three utterances, each presented as High Resolution, Low Resolution, Kuleshov et al. (*), and Ours]
* The audio samples for Kuleshov et al. were obtained from the authors' website, and the volume was normalized to match ours.
We plan to share more results together with our implementation at a later date.
References
Kuleshov et al., Audio Super Resolution using Neural Networks