| Main Authors: | Gu, Yuzhe, Diao, Enmao |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2404.19441 |
Similar Items
Variable Bitrate Residual Vector Quantization for Audio Coding
by: Chae, Yunkee, et al.
Published: (2024)
Vector Quantized Diffusion Model Based Speech Bandwidth Extension
by: Fang, Yuan, et al.
Published: (2024)
SAC: Neural Speech Codec with Semantic-Acoustic Dual-Stream Quantization
by: Chen, Wenxi, et al.
Published: (2025)
Autoregressive Speech Synthesis without Vector Quantization
by: Meng, Lingwei, et al.
Published: (2024)
Neural Speech Coding for Real-time Communications using Constant Bitrate Scalar Quantization
by: Brendel, Andreas, et al.
Published: (2024)
Scale This, Not That: Investigating Key Dataset Attributes for Efficient Speech Enhancement Scaling
by: Zhang, Leying, et al.
Published: (2024)
SpeechFake: A Large-Scale Multilingual Speech Deepfake Dataset Incorporating Cutting-Edge Generation Methods
by: Huang, Wen, et al.
Published: (2025)
Efficient Sparse Coding with the Adaptive Locally Competitive Algorithm for Speech Classification
by: Bahadi, Soufiyan, et al.
Published: (2024)
SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models
by: Yang, Dongchao, et al.
Published: (2024)
AUV: Teaching Audio Universal Vector Quantization with Single Nested Codebook
by: Chen, Yushen, et al.
Published: (2025)
Towards Bitrate-Efficient and Noise-Robust Speech Coding with Variable Bitrate RVQ
by: Chae, Yunkee, et al.
Published: (2025)
CAMEL: Cross-Attention Enhanced Mixture-of-Experts and Language Bias for Code-Switching Speech Recognition
by: Wang, He, et al.
Published: (2024)
FLASepformer: Efficient Speech Separation with Gated Focused Linear Attention Transformer
by: Wang, Haoxu, et al.
Published: (2025)
USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models
by: Ding, Shaojin, et al.
Published: (2023)
GALD-SE: Guided Anisotropic Lightweight Diffusion for Efficient Speech Enhancement
by: Wang, Chengzhong, et al.
Published: (2024)
NDVQ: Robust Neural Audio Codec with Normal Distribution-Based Vector Quantization
by: Niu, Zhikang, et al.
Published: (2024)
Progressive Residual Extraction based Pre-training for Speech Representation Learning
by: Wang, Tianrui, et al.
Published: (2024)
PURE Codec: Progressive Unfolding of Residual Entropy for Speech Codec Learning
by: Shi, Jiatong, et al.
Published: (2025)
A Neural Speech Codec for Noise Robust Speech Coding
by: Huang, Jiayi, et al.
Published: (2023)
Post-Training Quantization for Audio Diffusion Transformers
by: Khandelwal, Tanmay, et al.
Published: (2025)
SpatialCodec: Neural Spatial Speech Coding
by: Xu, Zhongweiyang, et al.
Published: (2023)
Effective and Efficient Mixed Precision Quantization of Speech Foundation Models
by: Xu, Haoning, et al.
Published: (2025)
ED-TTS: Multi-Scale Emotion Modeling using Cross-Domain Emotion Diarization for Emotional Speech Synthesis
by: Tang, Haobin, et al.
Published: (2024)
Speaking from Coarse to Fine: Improving Neural Codec Language Model via Multi-Scale Speech Coding and Generation
by: Guo, Haohan, et al.
Published: (2024)
Causal Speech Enhancement with Predicting Semantics based on Quantized Self-supervised Learning Features
by: Tsunoo, Emiru, et al.
Published: (2024)
Cross-Talk Speech Reduction, by Separation, for Separation
by: Wang, Zhong-Qiu, et al.
Published: (2026)
Attention-Guided Adaptation for Code-Switching Speech Recognition
by: Aditya, Bobbi, et al.
Published: (2023)
Vision-Integrated High-Quality Neural Speech Coding
by: Guo, Yao, et al.
Published: (2025)
ShiftySpeech: A Large-Scale Synthetic Speech Dataset with Distribution Shifts
by: Garg, Ashi, et al.
Published: (2025)
DQ-Whisper: Joint Distillation and Quantization for Efficient Multilingual Speech Recognition
by: Shao, Hang, et al.
Published: (2023)
Rethinking Mean Opinion Scores in Speech Quality Assessment: Aggregation through Quantized Distribution Fitting
by: Kondo, Yuto, et al.
Published: (2025)
Accelerating Diffusion Transformer-Based Text-to-Speech with Transformer Layer Caching
by: Sakpiboonchit, Siratish
Published: (2025)
Ideal-LLM: Integrating Dual Encoders and Language-Adapted LLM for Multilingual Speech-to-Text
by: Xue, Hongfei, et al.
Published: (2024)
Efficient Long-Form Speech Recognition for General Speech In-Context Learning
by: Yen, Hao, et al.
Published: (2024)
Speaker-agnostic Emotion Vector for Cross-speaker Emotion Intensity Control
by: Murata, Masato, et al.
Published: (2025)
Rare Word Recognition and Translation Without Fine-Tuning via Task Vector in Speech Models
by: Jing, Ruihao, et al.
Published: (2025)
UBGAN: Enhancing Coded Speech with Blind and Guided Bandwidth Extension
by: Gupta, Kishan, et al.
Published: (2025)
Fewer-token Neural Speech Codec with Time-invariant Codes
by: Ren, Yong, et al.
Published: (2023)
Metadata-Enhanced Speech Emotion Recognition: Augmented Residual Integration and Co-Attention in Two-Stage Fine-Tuning
by: Wan, Zixiang, et al.
Published: (2024)
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes
by: Dang, Trung, et al.
Published: (2024)