:: Library Catalog

Bibliographic Details
Main Authors:	Mack, Wolfgang; Mustafa, Ahmed; Łaganowski, Rafał; Hijazy, Samer
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing; Machine Learning
Online Access:	https://arxiv.org/abs/2502.04770

Similar Items

Low-Resource Audio Codec (LRAC): 2025 Challenge Description
by: Wojcicki, Kamil, et al.
Published: (2025)

SNAC: Multi-Scale Neural Audio Codec
by: Siuzdak, Hubert, et al.
Published: (2024)

RepCodec: A Speech Representation Codec for Speech Tokenization
by: Huang, Zhichao, et al.
Published: (2023)

Latent Granular Resynthesis using Neural Audio Codecs
by: Tokui, Nao, et al.
Published: (2025)

Generating Sample-Based Musical Instruments Using Neural Audio Codec Language Models
by: Nercessian, Shahan, et al.
Published: (2024)

A Closer Look at Neural Codec Resynthesis: Bridging the Gap between Codec and Waveform Generation
by: Liu, Alexander H., et al.
Published: (2024)

Learning Source Disentanglement in Neural Audio Codec
by: Bie, Xiaoyu, et al.
Published: (2024)

Towards Audio Codec-based Speech Separation
by: Yip, Jia Qi, et al.
Published: (2024)

Speech Enhancement Using Continuous Embeddings of Neural Audio Codec
by: Li, Haoyang, et al.
Published: (2025)

On the Relation Between Speech Quality and Quantized Latent Representations of Neural Codecs
by: Halimeh, Mhd Modar, et al.
Published: (2025)

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
by: Ji, Shengpeng, et al.
Published: (2024)

Enhancing Noise Robustness for Neural Speech Codecs through Resource-Efficient Progressive Quantization Perturbation Simulation
by: Zheng, Rui-Chen, et al.
Published: (2025)

EnCodecMAE: Leveraging neural codecs for universal audio representation learning
by: Pepino, Leonardo, et al.
Published: (2023)

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer
by: Wang, Xiaofei, et al.
Published: (2023)

MagiCodec: Simple Masked Gaussian-Injected Codec for High-Fidelity Reconstruction and Generation
by: Song, Yakun, et al.
Published: (2025)

TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Language Modeling
by: Wang, Yuancheng, et al.
Published: (2025)

TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models
by: Ji, Shengpeng, et al.
Published: (2023)

SAC: Neural Speech Codec with Semantic-Acoustic Dual-Stream Quantization
by: Chen, Wenxi, et al.
Published: (2025)

PTQ4ADM: Post-Training Quantization for Efficient Text Conditional Audio Diffusion Models
by: Vora, Jayneel, et al.
Published: (2024)

ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech
by: Shi, Jiatong, et al.
Published: (2024)

ERVQ: Enhanced Residual Vector Quantization with Intra-and-Inter-Codebook Optimization for Neural Audio Codecs
by: Zheng, Rui-Chen, et al.
Published: (2024)

SwitchCodec: A High-Fidelity Neural Audio Codec With Sparse Quantization
by: Wang, Jin, et al.
Published: (2025)

NDVQ: Robust Neural Audio Codec with Normal Distribution-Based Vector Quantization
by: Niu, Zhikang, et al.
Published: (2024)

A Context-Based Numerical Format Prediction for a Text-To-Speech System
by: Darwesh, Yaser, et al.
Published: (2024)

Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations
by: Dhawan, Kunal, et al.
Published: (2024)

VoCodec: An Efficient Lightweight Low-Bitrate Speech Codec
by: Yang, Leyan, et al.
Published: (2026)

Gull: A Generative Multifunctional Audio Codec
by: Luo, Yi, et al.
Published: (2024)

LiVeAction: a Lightweight, Versatile, and Asymmetric Neural Codec Design for Real-time Operation
by: Jacobellis, Dan, et al.
Published: (2026)

Towards Generalized Source Tracing for Codec-Based Deepfake Speech
by: Chen, Xuanjun, et al.
Published: (2025)

Distinctive Feature Codec: An Adaptive Efficient Speech Representation for Depression Detection
by: Zhang, Xiangyu, et al.
Published: (2025)

Variable Bitrate Residual Vector Quantization for Audio Coding
by: Chae, Yunkee, et al.
Published: (2024)

CoDiCodec: Unifying Continuous and Discrete Compressed Representations of Audio
by: Pasini, Marco, et al.
Published: (2025)

Investigating Disentanglement in a Phoneme-level Speech Codec for Prosody Modeling
by: Karapiperis, Sotirios, et al.
Published: (2024)

Bringing Interpretability to Neural Audio Codecs
by: Sadok, Samir, et al.
Published: (2025)

Evaluation of Neural Surrogates for Physical Modelling Synthesis of Nonlinear Elastic Plates
by: Martin, Carlos De La Vega, et al.
Published: (2025)

FocalCodec: Low-Bitrate Speech Coding via Focal Modulation Networks
by: Della Libera, Luca, et al.
Published: (2025)

MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer
by: Wang, Yuancheng, et al.
Published: (2024)

Towards Efficient and Real-Time Piano Transcription Using Neural Autoregressive Models
by: Kwon, Taegyun, et al.
Published: (2024)

SECP: A Speech Enhancement-Based Curation Pipeline For Scalable Acquisition Of Clean Speech
by: Sabra, Adam, et al.
Published: (2024)

SegINR: Segment-wise Implicit Neural Representation for Sequence Alignment in Neural Text-to-Speech
by: Kim, Minchan, et al.
Published: (2024)