VoCodec: An Efficient Lightweight Low-Bitrate Neural Speech Codec


Leyan Yang¹,², Ronghui Hu¹,², Yang Xu¹,², Jing Lu¹,²

¹Key Laboratory of Modern Acoustics, Nanjing University, Nanjing 210093, Jiangsu, China

²NJU-Horizon Intelligent Audio Lab, Horizon Robotics, Beijing 100094, China

Abstract:

Recent advances in end-to-end neural speech codecs enable audio compression at extremely low bitrates while maintaining high-fidelity reconstruction. However, real-time communication also requires low computational complexity and low latency. In this paper, we propose VoCodec, a speech codec with a computational complexity of only 349.29 M multiply-accumulate operations per second (MAC/s) and a latency of 30 ms. The proposed method ranked fourth in Track 1 of the 2025 LRAC Challenge and achieved the highest subjective evaluation score (MUSHRA) on the clean speech test set.

This page is for research demonstration purposes only.

Audio Samples

Different Audio Conditions

Clean Speech

Ground Truth 1

VoCodec (6 kbps)

VoCodec (1 kbps)

Ground Truth 2

VoCodec (6 kbps)

VoCodec (1 kbps)

Ground Truth 3

VoCodec (6 kbps)

VoCodec (1 kbps)

Noisy Speech

Ground Truth 1

VoCodec (6 kbps)

VoCodec (1 kbps)

Ground Truth 2

VoCodec (6 kbps)

VoCodec (1 kbps)

Reverberant Speech

Ground Truth 1

VoCodec (6 kbps)

VoCodec (1 kbps)

Ground Truth 2

VoCodec (6 kbps)

VoCodec (1 kbps)