| Main Authors: | Wang, Tianrui, Ma, Ziyang, Peng, Yizhou, Wang, Haoyu, Niu, Zhikang, Huang, Zikang, Wu, Yihao, Chao, Yi-Wen, Jiang, Yu, Lu, Yuheng, Yang, Guanrou, Li, Xuanchen, Liu, Hexin, Qiang, Chunyu, Gong, Cheng, Yang, Yifan, Liu, Tianchi, Wang, Junyu, Hou, Nana, Ge, Meng, You, Fuming, Yang, Wei, Sun, Zhongqian, Hu, Haifeng, Wang, Xiaobao, Chng, Eng Siong, Chen, Xie, Wang, Longbiao, Dang, Jianwu |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.09413 |
Similar Items
Word-Level Emotional Expression Control in Zero-Shot Text-to-Speech Synthesis
by: Wang, Tianrui, et al.
Published: (2025)
MSR-HuBERT: Self-supervised Pre-training for Adaptation to Multiple Sampling Rates
by: Huang, Zikang, et al.
Published: (2026)
Bi-directional Context-Enhanced Speech Large Language Models for Multilingual Conversational ASR
by: Peng, Yizhou, et al.
Published: (2025)
Separate First, Fuse Later: Mitigating Cross-Modal Interference in Audio-Visual LLMs Reasoning with Modality-Specific Chain-of-Thought
by: Li, Xuanchen, et al.
Published: (2026)
Expressive Prompting: Improving Emotion Intensity and Speaker Consistency in Zero-Shot TTS
by: Wang, Haoyu, et al.
Published: (2024)
Efficient Emotion and Speaker Adaptation in LLM-Based TTS via Characteristic-Specific Partial Fine-Tuning
by: Wang, Tianrui, et al.
Published: (2025)
ASDA: Audio Spectrogram Differential Attention Mechanism for Self-Supervised Representation Learning
by: Wang, Junyu, et al.
Published: (2025)
POTSA: A Cross-Lingual Speech Alignment Framework for Speech-to-Text Translation
by: Li, Xuanchen, et al.
Published: (2025)
Prosodic Boundary-Aware Streaming Generation for LLM-Based TTS with Streaming Text Input
by: Liu, Changsong, et al.
Published: (2026)
VocalParse: Towards Unified and Scalable Singing Voice Transcription with Large Audio Language Models
by: Chen, Yukun, et al.
Published: (2026)
LORT: Locally Refined Convolution and Taylor Transformer for Monaural Speech Enhancement
by: Wang, Junyu, et al.
Published: (2025)
Mamba-SEUNet: Mamba UNet for Monaural Speech Enhancement
by: Wang, Junyu, et al.
Published: (2024)
Bridging Speech and Text: Enhancing ASR with Pinyin-to-Character Pre-training in LLMs
by: Yuhang, Yang, et al.
Published: (2024)
Pay More Attention To Audio: Mitigating Imbalance of Cross-Modal Attention in Large Audio Language Models
by: Wang, Junyu, et al.
Published: (2025)
EASY: Emotion-aware Speaker Anonymization via Factorized Distillation
by: Yao, Jixun, et al.
Published: (2025)
Enriching Multimodal Sentiment Analysis through Textual Emotional Descriptions of Visual-Audio Content
by: Wu, Sheng, et al.
Published: (2024)
Zero-shot Context Biasing with Trie-based Decoding using Synthetic Multi-Pronunciation
by: Liu, Changsong, et al.
Published: (2025)
CECOR: Correction-oriented synthetic data construction for factual error correction
by: Zhu, Lei, et al.
Published: (2026)
Improving Code-Switching Speech Recognition with TTS Data Augmentation
by: Yeo, Yue Heng, et al.
Published: (2026)
Integration of Old and New Knowledge for Generalized Intent Discovery: A Consistency-driven Prototype-Prompting Framework
by: Wei, Xiao, et al.
Published: (2025)
NTU Speechlab LLM-Based Multilingual ASR System for Interspeech MLC-SLM Challenge 2025
by: Peng, Yizhou, et al.
Published: (2025)
Perturbation Self-Supervised Representations for Cross-Lingual Emotion TTS: Stage-Wise Modeling of Emotion and Speaker
by: Gong, Cheng, et al.
Published: (2025)
AIMDiT: Modality Augmentation and Interaction via Multimodal Dimension Transformation for Emotion Recognition in Conversations
by: Wu, Sheng, et al.
Published: (2024)
Progressive Residual Extraction based Pre-training for Speech Representation Learning
by: Wang, Tianrui, et al.
Published: (2024)
Rethinking Contrastive Learning in Graph Anomaly Detection: A Clean-View Perspective
by: Jin, Di, et al.
Published: (2025)
GenSE: Generative Speech Enhancement via Language Models using Hierarchical Modeling
by: Yao, Jixun, et al.
Published: (2025)
Language-Aware Distillation for Multilingual Instruction-Following Speech LLMs with ASR-Only Supervision
by: Gopal, Shreyas, et al.
Published: (2026)
Explainable Disentanglement on Discrete Speech Representations for Noise-Robust ASR
by: Gopal, Shreyas, et al.
Published: (2025)
InstructAudio: Unified speech and music generation with natural language instruction
by: Qiang, Chunyu, et al.
Published: (2025)
Chronological Thinking in Full-Duplex Spoken Dialogue Language Models
by: Wu, Donghang, et al.
Published: (2025)
Breaking Data Efficiency Dilemma: A Federated and Augmented Learning Framework For Alzheimer's Disease Detection via Speech
by: Wei, Xiao, et al.
Published: (2026)
Adapting Whisper for Code-Switching through Encoding Refining and Language-Aware Decoding
by: Zhao, Jiahui, et al.
Published: (2024)
Hierarchical Self-Supervised Representation Learning for Depression Detection from Speech
by: Li, Yuxin, et al.
Published: (2025)
Code-switching Speech Recognition Under the Lens: Model- and Data-Centric Perspectives
by: Liu, Hexin, et al.
Published: (2025)
Audio-CoT: Exploring Chain-of-Thought Reasoning in Large Audio Language Model
by: Ma, Ziyang, et al.
Published: (2025)
Text-based Talking Video Editing with Cascaded Conditional Diffusion
by: Han, Bo, et al.
Published: (2024)
Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization
by: Hu, Yuchen, et al.
Published: (2024)
QAMO: Quality-aware Multi-centroid One-class Learning For Speech Deepfake Detection
by: Truong, Duc-Tuan, et al.
Published: (2025)
Addressing Gradient Misalignment in Data-Augmented Training for Robust Speech Deepfake Detection
by: Truong, Duc-Tuan, et al.
Published: (2025)
UniArray: Unified Spectral-Spatial Modeling for Array-Geometry-Agnostic Speech Separation
by: Chen, Weiguang, et al.
Published: (2025)