:: Library Catalog

Cover Image

Saved in:

Bibliographic Details
Main Authors:	Gu, Yuzhe, Diao, Enmao
Format:	Preprint
Published:	2024
Subjects:	Sound Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2404.19441
Tags:	Add Tag No Tags, Be the first to tag this record!

Similar Items

Variable Bitrate Residual Vector Quantization for Audio Coding
by: Chae, Yunkee, et al.
Published: (2024)

Vector Quantized Diffusion Model Based Speech Bandwidth Extension
by: Fang, Yuan, et al.
Published: (2024)

SAC: Neural Speech Codec with Semantic-Acoustic Dual-Stream Quantization
by: Chen, Wenxi, et al.
Published: (2025)

Autoregressive Speech Synthesis without Vector Quantization
by: Meng, Lingwei, et al.
Published: (2024)

Neural Speech Coding for Real-time Communications using Constant Bitrate Scalar Quantization
by: Brendel, Andreas, et al.
Published: (2024)

Scale This, Not That: Investigating Key Dataset Attributes for Efficient Speech Enhancement Scaling
by: Zhang, Leying, et al.
Published: (2024)

SpeechFake: A Large-Scale Multilingual Speech Deepfake Dataset Incorporating Cutting-Edge Generation Methods
by: Huang, Wen, et al.
Published: (2025)

Efficient Sparse Coding with the Adaptive Locally Competitive Algorithm for Speech Classification
by: Bahadi, Soufiyan, et al.
Published: (2024)

SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models
by: Yang, Dongchao, et al.
Published: (2024)

AUV: Teaching Audio Universal Vector Quantization with Single Nested Codebook
by: Chen, Yushen, et al.
Published: (2025)

Towards Bitrate-Efficient and Noise-Robust Speech Coding with Variable Bitrate RVQ
by: Chae, Yunkee, et al.
Published: (2025)

CAMEL: Cross-Attention Enhanced Mixture-of-Experts and Language Bias for Code-Switching Speech Recognition
by: Wang, He, et al.
Published: (2024)

FLASepformer: Efficient Speech Separation with Gated Focused Linear Attention Transformer
by: Wang, Haoxu, et al.
Published: (2025)

USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models
by: Ding, Shaojin, et al.
Published: (2023)

GALD-SE: Guided Anisotropic Lightweight Diffusion for Efficient Speech Enhancement
by: Wang, Chengzhong, et al.
Published: (2024)

NDVQ: Robust Neural Audio Codec with Normal Distribution-Based Vector Quantization
by: Niu, Zhikang, et al.
Published: (2024)

Progressive Residual Extraction based Pre-training for Speech Representation Learning
by: Wang, Tianrui, et al.
Published: (2024)

PURE Codec: Progressive Unfolding of Residual Entropy for Speech Codec Learning
by: Shi, Jiatong, et al.
Published: (2025)

A Neural Speech Codec for Noise Robust Speech Coding
by: Huang, Jiayi, et al.
Published: (2023)

Post-Training Quantization for Audio Diffusion Transformers
by: Khandelwal, Tanmay, et al.
Published: (2025)

SpatialCodec: Neural Spatial Speech Coding
by: Xu, Zhongweiyang, et al.
Published: (2023)

Effective and Efficient Mixed Precision Quantization of Speech Foundation Models
by: Xu, Haoning, et al.
Published: (2025)

ED-TTS: Multi-Scale Emotion Modeling using Cross-Domain Emotion Diarization for Emotional Speech Synthesis
by: Tang, Haobin, et al.
Published: (2024)

Speaking from Coarse to Fine: Improving Neural Codec Language Model via Multi-Scale Speech Coding and Generation
by: Guo, Haohan, et al.
Published: (2024)

Causal Speech Enhancement with Predicting Semantics based on Quantized Self-supervised Learning Features
by: Tsunoo, Emiru, et al.
Published: (2024)

Cross-Talk Speech Reduction, by Separation, for Separation
by: Wang, Zhong-Qiu, et al.
Published: (2026)

Attention-Guided Adaptation for Code-Switching Speech Recognition
by: Aditya, Bobbi, et al.
Published: (2023)

Vision-Integrated High-Quality Neural Speech Coding
by: Guo, Yao, et al.
Published: (2025)

ShiftySpeech: A Large-Scale Synthetic Speech Dataset with Distribution Shifts
by: Garg, Ashi, et al.
Published: (2025)

DQ-Whisper: Joint Distillation and Quantization for Efficient Multilingual Speech Recognition
by: Shao, Hang, et al.
Published: (2023)

Rethinking Mean Opinion Scores in Speech Quality Assessment: Aggregation through Quantized Distribution Fitting
by: Kondo, Yuto, et al.
Published: (2025)

Accelerating Diffusion Transformer-Based Text-to-Speech with Transformer Layer Caching
by: Sakpiboonchit, Siratish
Published: (2025)

Ideal-LLM: Integrating Dual Encoders and Language-Adapted LLM for Multilingual Speech-to-Text
by: Xue, Hongfei, et al.
Published: (2024)

Efficient Long-Form Speech Recognition for General Speech In-Context Learning
by: Yen, Hao, et al.
Published: (2024)

Speaker-agnostic Emotion Vector for Cross-speaker Emotion Intensity Control
by: Murata, Masato, et al.
Published: (2025)

Rare Word Recognition and Translation Without Fine-Tuning via Task Vector in Speech Models
by: Jing, Ruihao, et al.
Published: (2025)

UBGAN: Enhancing Coded Speech with Blind and Guided Bandwidth Extension
by: Gupta, Kishan, et al.
Published: (2025)

Fewer-token Neural Speech Codec with Time-invariant Codes
by: Ren, Yong, et al.
Published: (2023)

Metadata-Enhanced Speech Emotion Recognition: Augmented Residual Integration and Co-Attention in Two-Stage Fine-Tuning
by: Wan, Zixiang, et al.
Published: (2024)

LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes
by: Dang, Trung, et al.
Published: (2024)