Library Catalog

Bibliographic Details
Main Authors:	Wang, Siyi; Tan, Shihong; Liu, Siyi; Jia, Hong; Huang, Gongping; Bailey, James; Dang, Ting
Format:	Preprint
Published:	2026
Subjects:	Sound; Machine Learning
Online Access:	https://arxiv.org/abs/2602.03420

Similar Items

Emotion-Aware Quantization for Discrete Speech Representations: An Analysis of Emotion Preservation
by: Zhou, Haoguang, et al.
Published: (2026)

EmoSteer-TTS: Fine-Grained and Training-Free Emotion-Controllable Text-to-Speech via Activation Steering
by: Xie, Tianxin, et al.
Published: (2025)

Token-Level Logits Matter: A Closer Look at Speech Foundation Models for Ambiguous Emotion Recognition
by: Halim, Jule Valendo, et al.
Published: (2025)

Edge-Cloud Collaborative Speech Emotion Captioning via Token-Level Speculative Decoding in Audio-Language Models
by: Xue, Xiangyuan, et al.
Published: (2026)

Scaling Ambiguity: Augmenting Human Annotation in Speech Emotion Recognition with Audio-Language Models
by: Zhang, Wenda, et al.
Published: (2026)

EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech
by: Cho, Deok-Hyeon, et al.
Published: (2024)

IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
by: Deng, Wei, et al.
Published: (2025)

EmoShift: Lightweight Activation Steering for Enhanced Emotion-Aware Speech Synthesis
by: Zhou, Li, et al.
Published: (2026)

IndexTTS2: A Breakthrough in Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech
by: Zhou, Siyi, et al.
Published: (2025)

Rethinking Continual Learning for Speech and Audio: A Representation-Centric Taxonomy and Open Problems
by: Xiao, Yang, et al.
Published: (2026)

Why Can't They Remember? Uncovering Representation and Retrieval Bottlenecks in Multi-Turn Acoustic Memory
by: Xiao, Yang, et al.
Published: (2026)

IndexTTS 2.5 Technical Report
by: Li, Yunpei, et al.
Published: (2026)

EMORL-TTS: Reinforcement Learning for Fine-Grained Emotion Control in LLM-based TTS
by: Li, Haoxun, et al.
Published: (2025)

DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech
by: Cho, Deok-Hyeon, et al.
Published: (2025)

Test-Time Adaptation for Speech Emotion Recognition
by: Dong, Jiaheng, et al.
Published: (2026)

Lightweight Front-end Enhancement for Robust ASR via Frame Resampling and Sub-Band Pruning
by: Zhao, Siyi, et al.
Published: (2025)

CLAIP-Emo: Parameter-Efficient Adaptation of Language-supervised models for In-the-Wild Audiovisual Emotion Recognition
by: Chen, Yin, et al.
Published: (2025)

Diffusion-based Speech Enhancement with Schrödinger Bridge and Symmetric Noise Schedule
by: Wang, Siyi, et al.
Published: (2024)

Perturbation Self-Supervised Representations for Cross-Lingual Emotion TTS: Stage-Wise Modeling of Emotion and Speaker
by: Gong, Cheng, et al.
Published: (2025)

EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing
by: Cong, Gaoxiang, et al.
Published: (2024)

TED-TTS: Training-Free Intra-Utterance Emotion and Duration Control for Text-to-Speech Synthesis
by: Liang, Qifan, et al.
Published: (2026)

EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector
by: Cho, Deok-Hyeon, et al.
Published: (2024)

E-BATS: Efficient Backpropagation-Free Test-Time Adaptation for Speech Foundation Models
by: Dong, Jiaheng, et al.
Published: (2025)

Efficient Emotion and Speaker Adaptation in LLM-Based TTS via Characteristic-Specific Partial Fine-Tuning
by: Wang, Tianrui, et al.
Published: (2025)

Scaling Auditory Cognition via Test-Time Compute in Audio Language Models
by: Dang, Ting, et al.
Published: (2025)

EmoOmni: Bridging Emotional Understanding and Expression in Omni-Modal LLMs
by: Tian, Wenjie, et al.
Published: (2026)

Emo-DPO: Controllable Emotional Speech Synthesis through Direct Preference Optimization
by: Gao, Xiaoxue, et al.
Published: (2024)

Genre Controlled Music Generation via Activation Steering
by: Narashiman, Swathi, et al.
Published: (2025)

Disentangling Reasoning in Large Audio-Language Models for Ambiguous Emotion Prediction
by: Yu, Xiaofeng, et al.
Published: (2026)

EmoSURA: Towards Accurate Evaluation of Detailed and Long-Context Emotional Speech Captions
by: Jing, Xin, et al.
Published: (2026)

EmoFake: An Initial Dataset for Emotion Fake Audio Detection
by: Zhao, Yan, et al.
Published: (2022)

EmoTransCap: Dataset and Pipeline for Emotion Transition-Aware Speech Captioning in Discourses
by: Xu, Shuhao, et al.
Published: (2026)

DTT-BSR: GAN-based DTTNet with RoPE Transformer Enhancement for Music Source Restoration
by: Tan, Shihong, et al.
Published: (2026)

EmoAttack: Utilizing Emotional Voice Conversion for Speech Backdoor Attacks on Deep Speech Classification Models
by: Yao, Wenhan, et al.
Published: (2024)

VoxEmo: Benchmarking Speech Emotion Recognition with Speech LLMs
by: Zhang, Hezhao, et al.
Published: (2026)

EmoHRNet: High-Resolution Neural Network Based Speech Emotion Recognition
by: Muppidi, Akshay, et al.
Published: (2025)

Expressive Prompting: Improving Emotion Intensity and Speaker Consistency in Zero-Shot TTS
by: Wang, Haoyu, et al.
Published: (2024)

EmoQ: Speech Emotion Recognition via Speech-Aware Q-Former and Large Language Model
by: Yang, Yiqing, et al.
Published: (2025)

EME-TTS: Unlocking the Emphasis and Emotion Link in Speech Synthesis
by: Li, Haoxun, et al.
Published: (2025)

EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control
by: Chen, Haozhe, et al.
Published: (2024)