| Main Authors: | Wang, Siyi, Tan, Shihong, Liu, Siyi, Jia, Hong, Huang, Gongping, Bailey, James, Dang, Ting |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.03420 |
Similar Items
Emotion-Aware Quantization for Discrete Speech Representations: An Analysis of Emotion Preservation
by: Zhou, Haoguang, et al.
Published: (2026)
EmoSteer-TTS: Fine-Grained and Training-Free Emotion-Controllable Text-to-Speech via Activation Steering
by: Xie, Tianxin, et al.
Published: (2025)
Token-Level Logits Matter: A Closer Look at Speech Foundation Models for Ambiguous Emotion Recognition
by: Halim, Jule Valendo, et al.
Published: (2025)
Edge-Cloud Collaborative Speech Emotion Captioning via Token-Level Speculative Decoding in Audio-Language Models
by: Xue, Xiangyuan, et al.
Published: (2026)
Scaling Ambiguity: Augmenting Human Annotation in Speech Emotion Recognition with Audio-Language Models
by: Zhang, Wenda, et al.
Published: (2026)
EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech
by: Cho, Deok-Hyeon, et al.
Published: (2024)
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
by: Deng, Wei, et al.
Published: (2025)
EmoShift: Lightweight Activation Steering for Enhanced Emotion-Aware Speech Synthesis
by: Zhou, Li, et al.
Published: (2026)
IndexTTS2: A Breakthrough in Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech
by: Zhou, Siyi, et al.
Published: (2025)
Rethinking Continual Learning for Speech and Audio: A Representation-Centric Taxonomy and Open Problems
by: Xiao, Yang, et al.
Published: (2026)
Why Can't They Remember? Uncovering Representation and Retrieval Bottlenecks in Multi-Turn Acoustic Memory
by: Xiao, Yang, et al.
Published: (2026)
IndexTTS 2.5 Technical Report
by: Li, Yunpei, et al.
Published: (2026)
EMORL-TTS: Reinforcement Learning for Fine-Grained Emotion Control in LLM-based TTS
by: Li, Haoxun, et al.
Published: (2025)
DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech
by: Cho, Deok-Hyeon, et al.
Published: (2025)
Test-Time Adaptation for Speech Emotion Recognition
by: Dong, Jiaheng, et al.
Published: (2026)
Lightweight Front-end Enhancement for Robust ASR via Frame Resampling and Sub-Band Pruning
by: Zhao, Siyi, et al.
Published: (2025)
CLAIP-Emo: Parameter-Efficient Adaptation of Language-supervised models for In-the-Wild Audiovisual Emotion Recognition
by: Chen, Yin, et al.
Published: (2025)
Diffusion-based Speech Enhancement with Schrödinger Bridge and Symmetric Noise Schedule
by: Wang, Siyi, et al.
Published: (2024)
Perturbation Self-Supervised Representations for Cross-Lingual Emotion TTS: Stage-Wise Modeling of Emotion and Speaker
by: Gong, Cheng, et al.
Published: (2025)
EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing
by: Cong, Gaoxiang, et al.
Published: (2024)
TED-TTS: Training-Free Intra-Utterance Emotion and Duration Control for Text-to-Speech Synthesis
by: Liang, Qifan, et al.
Published: (2026)
EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector
by: Cho, Deok-Hyeon, et al.
Published: (2024)
E-BATS: Efficient Backpropagation-Free Test-Time Adaptation for Speech Foundation Models
by: Dong, Jiaheng, et al.
Published: (2025)
Efficient Emotion and Speaker Adaptation in LLM-Based TTS via Characteristic-Specific Partial Fine-Tuning
by: Wang, Tianrui, et al.
Published: (2025)
Scaling Auditory Cognition via Test-Time Compute in Audio Language Models
by: Dang, Ting, et al.
Published: (2025)
EmoOmni: Bridging Emotional Understanding and Expression in Omni-Modal LLMs
by: Tian, Wenjie, et al.
Published: (2026)
Emo-DPO: Controllable Emotional Speech Synthesis through Direct Preference Optimization
by: Gao, Xiaoxue, et al.
Published: (2024)
Genre Controlled Music Generation via Activation Steering
by: Narashiman, Swathi, et al.
Published: (2025)
Disentangling Reasoning in Large Audio-Language Models for Ambiguous Emotion Prediction
by: Yu, Xiaofeng, et al.
Published: (2026)
EmoSURA: Towards Accurate Evaluation of Detailed and Long-Context Emotional Speech Captions
by: Jing, Xin, et al.
Published: (2026)
EmoFake: An Initial Dataset for Emotion Fake Audio Detection
by: Zhao, Yan, et al.
Published: (2022)
EmoTransCap: Dataset and Pipeline for Emotion Transition-Aware Speech Captioning in Discourses
by: Xu, Shuhao, et al.
Published: (2026)
DTT-BSR: GAN-based DTTNet with RoPE Transformer Enhancement for Music Source Restoration
by: Tan, Shihong, et al.
Published: (2026)
EmoAttack: Utilizing Emotional Voice Conversion for Speech Backdoor Attacks on Deep Speech Classification Models
by: Yao, Wenhan, et al.
Published: (2024)
VoxEmo: Benchmarking Speech Emotion Recognition with Speech LLMs
by: Zhang, Hezhao, et al.
Published: (2026)
EmoHRNet: High-Resolution Neural Network Based Speech Emotion Recognition
by: Muppidi, Akshay, et al.
Published: (2025)
Expressive Prompting: Improving Emotion Intensity and Speaker Consistency in Zero-Shot TTS
by: Wang, Haoyu, et al.
Published: (2024)
EmoQ: Speech Emotion Recognition via Speech-Aware Q-Former and Large Language Model
by: Yang, Yiqing, et al.
Published: (2025)
EME-TTS: Unlocking the Emphasis and Emotion Link in Speech Synthesis
by: Li, Haoxun, et al.
Published: (2025)
EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control
by: Chen, Haozhe, et al.
Published: (2024)