| Main Authors: | Cao, Di; Fu, Dongjie; Yu, Hai; Zheng, Siqi; Tan, Xu; Jin, Tao |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2603.24596 |
Similar Items
TTA: Transcribe, Translate and Alignment for Cross-lingual Speech Representation
by: Liu, Wei, et al.
Published: (2025)
Improving Speech Emotion Recognition Through Cross Modal Attention Alignment and Balanced Stacking Model
by: Ueda, Lucas, et al.
Published: (2025)
Dynamic Frequency-Adaptive Knowledge Distillation for Speech Enhancement
by: Yuan, Xihao, et al.
Published: (2025)
UME: Upcycling Mixture-of-Experts for Scalable and Efficient Automatic Speech Recognition
by: Fu, Li, et al.
Published: (2024)
TASU: Text-Only Alignment for Speech Understanding
by: Peng, Jing, et al.
Published: (2025)
SSR: Alignment-Aware Modality Connector for Speech Language Models
by: Tan, Weiting, et al.
Published: (2024)
Dual-Branch Knowledge Distillation for Noise-Robust Synthetic Speech Detection
by: Fan, Cunhang, et al.
Published: (2023)
ToneUnit: A Speech Discretization Approach for Tonal Language Speech Synthesis
by: Tao, Dehua, et al.
Published: (2024)
Cross-Modal Bottleneck Fusion For Noise Robust Audio-Visual Speech Recognition
by: Ok, Seaone, et al.
Published: (2026)
Reducing Linguistic Hallucination in LM-Based Speech Enhancement via Noise-Invariant Acoustic-Semantic Distillation
by: Wang, Zheng, et al.
Published: (2026)
Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment
by: Choi, Jeongsoo, et al.
Published: (2025)
Speech-Omni-Lite: Portable Speech Interfaces for Vision-Language Models
by: Tao, Dehua, et al.
Published: (2026)
Complex Recurrent Variational Autoencoder with Application to Speech Enhancement
by: Xie, Yuying, et al.
Published: (2022)
WhisperVC: Decoupled Cross-Domain Alignment and Speech Generation for Low-Resource Whisper-to-Normal Conversion
by: Liu, Dong, et al.
Published: (2025)
ARTT: Augmented Reverberant-Target Training for Unsupervised Monaural Speech Dereverberation
by: Song, Siqi, et al.
Published: (2026)
Continuous Speech Tokens Makes LLMs Robust Multi-Modality Learners
by: Yuan, Ze, et al.
Published: (2024)
TASU2: Controllable CTC Simulation for Alignment and Low-Resource Adaptation of Speech LLMs
by: Peng, Jing, et al.
Published: (2026)
Exploring the Capability of Mamba in Speech Applications
by: Miyazaki, Koichi, et al.
Published: (2024)
SpeechEditBench: A Bilingual Multi-Attribute Benchmark for Instruction-Guided Speech Editing
by: Zhang, Hanlin, et al.
Published: (2026)
Speech Token Prediction via Compressed-to-fine Language Modeling for Speech Generation
by: Liu, Wenrui, et al.
Published: (2025)
Multi-Distillation from Speech and Music Representation Models
by: Wei, Jui-Chiang, et al.
Published: (2025)
Adaptive Duration Model for Text Speech Alignment
by: Cao, Junjie
Published: (2025)
FlexSpeech: Towards Stable, Controllable and Expressive Text-to-Speech
by: Ma, Linhan, et al.
Published: (2025)
Robust One-step Speech Enhancement via Consistency Distillation
by: Xu, Liang, et al.
Published: (2025)
DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment
by: Lu, Ke-Han, et al.
Published: (2024)
Distil-DCCRN: A Small-footprint DCCRN Leveraging Feature-based Knowledge Distillation in Speech Enhancement
by: Han, Runduo, et al.
Published: (2024)
AV-CrossNet: an Audiovisual Complex Spectral Mapping Network for Speech Separation By Leveraging Narrow- and Cross-Band Modeling
by: Kalkhorani, Vahid Ahmadi, et al.
Published: (2024)
USpeech: Ultrasound-Enhanced Speech with Minimal Human Effort via Cross-Modal Synthesis
by: Yu, Luca Jiang-Tao, et al.
Published: (2024)
LLMs and Speech: Integration vs. Combination
by: Schmitt, Robin, et al.
Published: (2026)
Attention-weighted Centered Kernel Alignment for Knowledge Distillation in Large Audio-Language Models Applied to Speech Emotion Recognition
by: Yang, Qingran, et al.
Published: (2026)
Knowledge Distillation for Speech Denoising by Latent Representation Alignment with Cosine Distance
by: Luong, Diep, et al.
Published: (2025)
Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision
by: Chen, Yafeng, et al.
Published: (2024)
SecoustiCodec: Cross-Modal Aligned Streaming Single-Codecbook Speech Codec
by: Qiang, Chunyu, et al.
Published: (2025)
Anatomy of the Modality Gap: Dissecting the Internal States of End-to-End Speech LLMs
by: Hsu, Ming-Hao, et al.
Published: (2026)
Group Relative Policy Optimization for Speech Recognition
by: Shivakumar, Prashanth Gurunath, et al.
Published: (2025)
CUEMPATHY: A Counseling Speech Dataset for Psychotherapy Research
by: Tao, Dehua, et al.
Published: (2024)
Text-aware Speech Separation for Multi-talker Keyword Spotting
by: Li, Haoyu, et al.
Published: (2024)
AS-Speech: Adaptive Style For Speech Synthesis
by: Li, Zhipeng, et al.
Published: (2024)
Enhancing Code-switched Text-to-Speech Synthesis Capability in Large Language Models with only Monolingual Corpora
by: Xu, Jing, et al.
Published: (2024)
DISPATCH: Distilling Selective Patches for Speech Enhancement
by: Kim, Dohwan, et al.
Published: (2025)