Audio super-resolution (a.k.a. bandwidth extension) is the challenging task of increasing the temporal resolution of audio signals. Recent deep network approaches have achieved promising results by modeling the task as a regression problem in either the time or the frequency domain. In this paper, we introduce the Time-Frequency Network (TFNet), a deep network that uses supervision in both the time and frequency domains. We propose a novel model architecture that allows the two domains to be jointly optimized. Results demonstrate that our method outperforms the state of the art both quantitatively and qualitatively.
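The idea of supervising in both domains can be made concrete with a combined training objective: a mean-squared error on the waveform plus a mean-squared error on log STFT magnitudes. The numpy sketch below is an illustrative assumption, not TFNet's exact loss or architecture; the function names, the weight `alpha`, and the frame and hop sizes are hypothetical.

```python
import numpy as np

def stft_magnitude(x, frame_len=512, hop=256):
    """Magnitude STFT with a Hann window.

    Assumes len(x) >= frame_len; trailing samples that do not fill a
    whole frame are dropped (no padding).
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # One row per frame, frame_len // 2 + 1 frequency bins per row.
    return np.abs(np.fft.rfft(frames, axis=1))

def time_frequency_loss(y_pred, y_true, alpha=0.5, eps=1e-8):
    """Weighted sum of a time-domain MSE and a log-spectral MSE.

    alpha is a hypothetical weight: 1.0 keeps only the waveform term,
    0.0 keeps only the spectral term.
    """
    # Time-domain supervision: sample-wise squared error.
    time_loss = np.mean((y_pred - y_true) ** 2)
    # Frequency-domain supervision: squared error of log magnitudes.
    log_pred = np.log(stft_magnitude(y_pred) + eps)
    log_true = np.log(stft_magnitude(y_true) + eps)
    freq_loss = np.mean((log_pred - log_true) ** 2)
    return alpha * time_loss + (1.0 - alpha) * freq_loss
```

Setting `alpha` to 1.0 reduces this to a purely time-domain objective; lowering it shifts weight onto spectral accuracy, which is where high-frequency errors become most visible.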
ICASSP 2018 Poster (SigPort alternate link)
This is a selection of 4x bandwidth extension samples produced by our model. Similar to Kuleshov et al., we trained our model on 99 speakers of the VCTK dataset. The speakers in the samples below were held out and were not seen by the network during training.
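One common way to build 4x training pairs is to low-pass filter and decimate the high-resolution audio, then interpolate it back to the original length so that input and target align sample by sample. The sketch below uses an ideal FFT brick-wall filter and linear interpolation; both choices, and the helper names, are assumptions for illustration and not necessarily the preprocessing used by either paper.

```python
import numpy as np

def make_low_resolution(x, factor=4):
    """Simulate a low-resolution recording of x.

    Hypothetical preprocessing: zero every FFT bin at or above the new
    Nyquist frequency (fs / (2 * factor)), then keep every factor-th sample.
    """
    spectrum = np.fft.rfft(x)
    cutoff = len(spectrum) // factor  # approximately the bin of fs / (2 * factor)
    spectrum[cutoff:] = 0.0
    filtered = np.fft.irfft(spectrum, n=len(x))
    return filtered[::factor]

def upsample_linear(x_low, factor=4):
    """Linearly interpolate back to the original rate.

    Samples past the last known point hold its value (np.interp behavior).
    """
    t_low = np.arange(len(x_low)) * factor
    t_high = np.arange(len(x_low) * factor)
    return np.interp(t_high, t_low, x_low)
```

A network would then be trained to map `upsample_linear(make_low_resolution(x))` back to `x`.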
Compared to the time-domain-only method of Kuleshov et al., our model appears to produce fewer artifacts, such as the occasional pops and squeaks in the higher frequency ranges.
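The comparison above is perceptual. One hedged way to quantify such differences is with signal-to-noise ratio (SNR) and log-spectral distance (LSD), metrics commonly reported in the audio super-resolution literature. The sketch below is an assumption about how they might be computed; the frame size, hop, and `eps` are hypothetical choices, not values taken from this page.

```python
import numpy as np

def snr_db(y_pred, y_true):
    """Signal-to-noise ratio of a reconstruction, in decibels."""
    noise = y_true - y_pred
    return 10.0 * np.log10(np.sum(y_true ** 2) / np.sum(noise ** 2))

def log_spectral_distance(y_pred, y_true, frame_len=2048, hop=512, eps=1e-10):
    """Per-frame RMS difference of log10 power spectra, averaged over frames.

    Assumes both signals have the same length, at least frame_len.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(y_true) - frame_len) // hop

    def log_power(x):
        frames = np.stack([x[i * hop : i * hop + frame_len] * window
                           for i in range(n_frames)])
        return np.log10(np.abs(np.fft.rfft(frames, axis=1)) ** 2 + eps)

    diff = log_power(y_true) - log_power(y_pred)
    # RMS over frequency bins within each frame, then mean over frames.
    return float(np.mean(np.sqrt(np.mean(diff ** 2, axis=1))))
```

Because LSD compares log spectra bin by bin, spurious high-frequency energy such as pops and squeaks raises it even when the waveform SNR barely changes.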
[Audio samples: three examples, each presented as High Resolution | Low Resolution | Kuleshov et al. (*) | Ours]
* The audio samples for Kuleshov et al. were obtained from the authors' website, and their volume was normalized to match ours.
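The page does not say which normalization was applied. A minimal sketch, assuming RMS (loudness) matching rather than peak matching, could look like this; the function name `match_rms` is hypothetical.

```python
import numpy as np

def match_rms(x, reference, eps=1e-12):
    """Scale x so that its root-mean-square level equals that of reference.

    eps guards against division by zero for silent input.
    """
    rms_x = np.sqrt(np.mean(x ** 2))
    rms_ref = np.sqrt(np.mean(reference ** 2))
    return x * (rms_ref / (rms_x + eps))
```

RMS matching equalizes average loudness; peak matching would instead align the loudest sample, which can leave quieter recordings sounding softer.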
We plan to share more results together with our implementation at a later date.
Kuleshov et al., Audio Super Resolution with Neural Networks