Demo - Low-complexity Real-time Neural Network for Blind Bandwidth Extension of Wideband Speech

Low-complexity Real-time Neural Network for Blind Bandwidth Extension of Wideband Speech

Esteban Gómez^*†, Mohammad Hassan Vali and Tom Bäckström

^*Department of Information and Communications Engineering, Aalto University, Espoo, Finland
^†Voicemod S.L., Valencia, Spain

{esteban.gomezmellado, mohammad.vali, tom.backstrom}@aalto.fi

Figure: BBWEXNet overview.

Abstract

Speech is streamed at 16 kHz or lower sample rates in many applications (e.g., VoIP, Bluetooth headsets). Extending its bandwidth can produce significant quality improvements. We introduce BBWEXNet, a lightweight neural network that performs blind bandwidth extension of speech from 16 kHz (wideband) to 48 kHz (fullband) in real time on a CPU. Our low-latency approach allows the model to run with a maximum algorithmic delay of 16 ms, enabling end-to-end communication in streaming services and in scenarios where the GPU is busy or unavailable. We propose a series of optimizations that take advantage of the U-Net architecture and of vector quantization methods commonly used in speech coding to produce a model whose performance is comparable to previous real-time solutions while approximately halving the memory footprint and computational cost. Moreover, we show that the model complexity can be further reduced with a marginal impact on the perceived output quality.

Bandwidth extension, Speech processing, Real-time, Deep learning
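
To put the abstract's figures in concrete terms, the short sketch below converts the stated sample rates and the 16 ms maximum algorithmic delay into sample counts and representable bandwidths. It is only illustrative arithmetic: the helper `samples_in` and the constant names are hypothetical, and nothing here reflects how BBWEXNet itself frames or buffers audio.

```python
# Illustrative arithmetic only; names are hypothetical, not taken from the paper.

def samples_in(duration_ms: float, sample_rate_hz: int) -> int:
    """Number of samples spanned by `duration_ms` at `sample_rate_hz`."""
    return round(duration_ms * sample_rate_hz / 1000)

WIDEBAND_HZ = 16_000   # input sample rate stated in the abstract
FULLBAND_HZ = 48_000   # output sample rate stated in the abstract
MAX_DELAY_MS = 16      # maximum algorithmic delay stated in the abstract

# Ratio between output and input sample rates.
upsampling_factor = FULLBAND_HZ // WIDEBAND_HZ

# The same 16 ms expressed in samples at each rate.
delay_in_samples = samples_in(MAX_DELAY_MS, WIDEBAND_HZ)
delay_out_samples = samples_in(MAX_DELAY_MS, FULLBAND_HZ)

# Highest representable frequency (Nyquist) at each rate.
nyquist_in_hz = WIDEBAND_HZ / 2
nyquist_out_hz = FULLBAND_HZ / 2

print(f"upsampling factor: {upsampling_factor}")
print(f"16 ms at 16 kHz: {delay_in_samples} samples")
print(f"16 ms at 48 kHz: {delay_out_samples} samples")
print(f"bandwidth: {nyquist_in_hz:.0f} Hz -> {nyquist_out_hz:.0f} Hz")
```

In other words, the extension has to synthesize the content between 8 kHz and 24 kHz, which is absent from the wideband input, while staying within a delay budget of a few hundred samples.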

Paper

Click here to download the paper.