VoCodec: An Efficient Lightweight Low-Bitrate Neural Speech Codec

Leyan Yang1,2, Ronghui Hu1,2, Yang Xu1,2, Jing Lu1,2

1Key Laboratory of Modern Acoustics, Nanjing University, Nanjing 210093, Jiangsu, China

2NJU-Horizon Intelligent Audio Lab, Horizon Robotics, Beijing 100094, China

Abstract:

Recent advances in end-to-end neural speech codecs have made it possible to compress audio at extremely low bitrates while maintaining high-fidelity reconstruction. However, real-time communication also requires low computational complexity and low latency. In this paper, we propose VoCodec, a neural speech codec with a computational complexity of only 349.29 M multiply-accumulate operations per second (MAC/s) and a latency of 30 ms. VoCodec ranked fourth in Track 1 of the 2025 LRAC Challenge and achieved the highest subjective evaluation score (MUSHRA) on the clean speech test set.
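To put the figures above in perspective, the following minimal sketch converts a bitrate into a payload size and a MAC/s rate into a total operation count for one clip. The 10-second clip length and the helper names are hypothetical choices for illustration only, and "kbps" is assumed to mean 1000 bits per second. The sketch is plain arithmetic and does not describe how VoCodec is implemented.

```python
# Back-of-the-envelope arithmetic for the numbers quoted in the abstract.
# CLIP_SECONDS and both helper functions are hypothetical, for illustration only.

MACS_PER_SECOND = 349.29e6  # complexity quoted in the abstract (MAC/s)
CLIP_SECONDS = 10.0         # assumed clip length, not taken from the page


def payload_bytes(bitrate_bps: float, duration_s: float) -> float:
    """Size in bytes of a stream at a constant bitrate (8 bits per byte)."""
    return bitrate_bps * duration_s / 8


def total_macs(macs_per_second: float, duration_s: float) -> float:
    """Total multiply-accumulate operations needed for a clip of this length."""
    return macs_per_second * duration_s


for kbps in (1, 6):  # the two bitrates demonstrated on this page
    size = payload_bytes(kbps * 1000, CLIP_SECONDS)  # assumes 1 kbps = 1000 bit/s
    print(f"{kbps} kbps for {CLIP_SECONDS:.0f} s: {size:.0f} bytes")

print(f"MACs for {CLIP_SECONDS:.0f} s: {total_macs(MACS_PER_SECOND, CLIP_SECONDS):.4e}")
```

Under these assumptions, a 10-second utterance occupies on the order of a kilobyte at 1 kbps and several kilobytes at 6 kbps.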

This page is for research demonstration purposes only.

Audio Samples

Different Audio Conditions

Clean Speech
Ground Truth 1
VoCodec (6 kbps)
VoCodec (1 kbps)
Ground Truth 2
VoCodec (6 kbps)
VoCodec (1 kbps)
Ground Truth 3
VoCodec (6 kbps)
VoCodec (1 kbps)
Noisy Speech
Ground Truth 1
VoCodec (6 kbps)
VoCodec (1 kbps)
Ground Truth 2
VoCodec (6 kbps)
VoCodec (1 kbps)
Reverberant Speech
Ground Truth 1
VoCodec (6 kbps)
VoCodec (1 kbps)
Ground Truth 2
VoCodec (6 kbps)
VoCodec (1 kbps)