Saved in:
| Main Authors: | Li, Zehan, Yang, Yan, Li, Xueqing, Kang, Jian, Zhang, Xiao-Lei, Li, Jie |
|---|---|
| Format: | Preprint |
| Published: |
2025
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.01900 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Similar Items
Bridging the Gap between Continuous and Informative Discrete Representations by Random Product Quantization
by: Li, Xueqing, et al.
Published: (2025)
by: Li, Xueqing, et al.
Published: (2025)
EmoSSLSphere: Multilingual Emotional Speech Synthesis with Spherical Vectors and Discrete Speech Tokens
by: Park, Joonyong, et al.
Published: (2025)
by: Park, Joonyong, et al.
Published: (2025)
Phone-purity Guided Discrete Tokens for Dysarthric Speech Recognition
by: Wang, Huimeng, et al.
Published: (2025)
by: Wang, Huimeng, et al.
Published: (2025)
Audio-Cogito: Towards Deep Audio Reasoning in Large Audio Language Models
by: Li, Longhao, et al.
Published: (2026)
by: Li, Longhao, et al.
Published: (2026)
SELM: Speech Enhancement Using Discrete Tokens and Language Models
by: Wang, Ziqian, et al.
Published: (2023)
by: Wang, Ziqian, et al.
Published: (2023)
Leveraging LLM and Self-Supervised Training Models for Speech Recognition in Chinese Dialects: A Comparative Analysis
by: Xu, Tianyi, et al.
Published: (2025)
by: Xu, Tianyi, et al.
Published: (2025)
BoSS: Beyond-Semantic Speech
by: Wang, Qing, et al.
Published: (2025)
by: Wang, Qing, et al.
Published: (2025)
Fusion of Discrete Representations and Self-Augmented Representations for Multilingual Automatic Speech Recognition
by: Wang, Shih-heng, et al.
Published: (2024)
by: Wang, Shih-heng, et al.
Published: (2024)
GOAT-TTS: Expressive and Realistic Speech Generation via A Dual-Branch LLM
by: Song, Yaodong, et al.
Published: (2025)
by: Song, Yaodong, et al.
Published: (2025)
Exploring SSL Discrete Tokens for Multilingual ASR
by: Cui, Mingyu, et al.
Published: (2024)
by: Cui, Mingyu, et al.
Published: (2024)
Children's Speech Recognition through Discrete Token Enhancement
by: Sukhadia, Vrunda N., et al.
Published: (2024)
by: Sukhadia, Vrunda N., et al.
Published: (2024)
Streaming Decoder-Only Automatic Speech Recognition with Discrete Speech Units: A Pilot Study
by: Chen, Peikun, et al.
Published: (2024)
by: Chen, Peikun, et al.
Published: (2024)
High Fidelity Text-to-Speech Via Discrete Tokens Using Token Transducer and Group Masked Language Model
by: Lee, Joun Yeop, et al.
Published: (2024)
by: Lee, Joun Yeop, et al.
Published: (2024)
Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation
by: Kim, Minsu, et al.
Published: (2024)
by: Kim, Minsu, et al.
Published: (2024)
Confidence-based Filtering for Speech Dataset Curation with Generative Speech Enhancement Using Discrete Tokens
by: Yamauchi, Kazuki, et al.
Published: (2026)
by: Yamauchi, Kazuki, et al.
Published: (2026)
Discrete Diffusion for Generative Modeling of Text-Aligned Speech Tokens
by: Ku, Pin-Jui, et al.
Published: (2025)
by: Ku, Pin-Jui, et al.
Published: (2025)
A Composite Predictive-Generative Approach to Monaural Universal Speech Enhancement
by: Zhang, Jie, et al.
Published: (2025)
by: Zhang, Jie, et al.
Published: (2025)
Acoustic BPE for Speech Generation with Discrete Tokens
by: Shen, Feiyu, et al.
Published: (2023)
by: Shen, Feiyu, et al.
Published: (2023)
Selective Invocation for Multilingual ASR: A Cost-effective Approach Adapting to Speech Recognition Difficulty
by: Xue, Hongfei, et al.
Published: (2025)
by: Xue, Hongfei, et al.
Published: (2025)
Exploring Cross-Utterance Speech Contexts for Conformer-Transducer Speech Recognition Systems
by: Cui, Mingyu, et al.
Published: (2025)
by: Cui, Mingyu, et al.
Published: (2025)
Low Bitrate High-Quality RVQGAN-based Discrete Speech Tokenizer
by: Shechtman, Slava, et al.
Published: (2024)
by: Shechtman, Slava, et al.
Published: (2024)
Lessons Learnt: Revisit Key Training Strategies for Effective Speech Emotion Recognition in the Wild
by: Tzeng, Jing-Tong, et al.
Published: (2025)
by: Tzeng, Jing-Tong, et al.
Published: (2025)
Continuous Speech Tokenizer in Text To Speech
by: Li, Yixing, et al.
Published: (2024)
by: Li, Yixing, et al.
Published: (2024)
Benchmarking Large Pretrained Multilingual Models on Québec French Speech Recognition
by: Serrand, Coralie, et al.
Published: (2025)
by: Serrand, Coralie, et al.
Published: (2025)
SSHR: Leveraging Self-supervised Hierarchical Representations for Multilingual Automatic Speech Recognition
by: Xue, Hongfei, et al.
Published: (2023)
by: Xue, Hongfei, et al.
Published: (2023)
Enhancing Fully Formatted End-to-End Speech Recognition with Knowledge Distillation via Multi-Codebook Vector Quantization
by: You, Jian, et al.
Published: (2025)
by: You, Jian, et al.
Published: (2025)
Recent Advances in Discrete Speech Tokens: A Review
by: Guo, Yiwei, et al.
Published: (2025)
by: Guo, Yiwei, et al.
Published: (2025)
SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models
by: Zhang, Xin, et al.
Published: (2023)
by: Zhang, Xin, et al.
Published: (2023)
Prosody as Supervision: Bridging the Non-Verbal--Verbal for Multilingual Speech Emotion Recognition
by: Girish, et al.
Published: (2026)
by: Girish, et al.
Published: (2026)
Bridging the Modality Gap: Softly Discretizing Audio Representation for LLM-based Automatic Speech Recognition
by: Yang, Mu, et al.
Published: (2025)
by: Yang, Mu, et al.
Published: (2025)
Speech Token Prediction via Compressed-to-fine Language Modeling for Speech Generation
by: Liu, Wenrui, et al.
Published: (2025)
by: Liu, Wenrui, et al.
Published: (2025)
Identifying and Calibrating Overconfidence in Noisy Speech Recognition
by: Huo, Mingyue, et al.
Published: (2025)
by: Huo, Mingyue, et al.
Published: (2025)
Speech Emotion Recognition with ASR Integration
by: Li, Yuanchao
Published: (2026)
by: Li, Yuanchao
Published: (2026)
Cross-lingual Embedding Clustering for Hierarchical Softmax in Low-Resource Multilingual Speech Recognition
by: Yang, Zhengdong, et al.
Published: (2025)
by: Yang, Zhengdong, et al.
Published: (2025)
Speaker Contrastive Learning for Source Speaker Tracing
by: Wang, Qing, et al.
Published: (2024)
by: Wang, Qing, et al.
Published: (2024)
Accent Normalization Using Self-Supervised Discrete Tokens with Non-Parallel Data
by: Bai, Qibing, et al.
Published: (2025)
by: Bai, Qibing, et al.
Published: (2025)
Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based Speech Language Model
by: Wang, Siyang, et al.
Published: (2024)
by: Wang, Siyang, et al.
Published: (2024)
Rare Word Recognition and Translation Without Fine-Tuning via Task Vector in Speech Models
by: Jing, Ruihao, et al.
Published: (2025)
by: Jing, Ruihao, et al.
Published: (2025)
S2ST-Omni: Hierarchical Language-Aware SpeechLLM Adaptation for Multilingual Speech-to-Speech Translation
by: Pan, Yu, et al.
Published: (2025)
by: Pan, Yu, et al.
Published: (2025)
Multi-Scale Temporal Transformer For Speech Emotion Recognition
by: Li, Zhipeng, et al.
Published: (2024)
by: Li, Zhipeng, et al.
Published: (2024)
Similar Items
-
Bridging the Gap between Continuous and Informative Discrete Representations by Random Product Quantization
by: Li, Xueqing, et al.
Published: (2025) -
EmoSSLSphere: Multilingual Emotional Speech Synthesis with Spherical Vectors and Discrete Speech Tokens
by: Park, Joonyong, et al.
Published: (2025) -
Phone-purity Guided Discrete Tokens for Dysarthric Speech Recognition
by: Wang, Huimeng, et al.
Published: (2025) -
Audio-Cogito: Towards Deep Audio Reasoning in Large Audio Language Models
by: Li, Longhao, et al.
Published: (2026) -
SELM: Speech Enhancement Using Discrete Tokens and Language Models
by: Wang, Ziqian, et al.
Published: (2023)