| Main Authors: | Li, Aoduo, Lv, Haoran, Xu, Hongjian, Li, Shengmin, Qin, Sihao, Li, Zimeng, Pun, Chi Man, Chen, Xuhang |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.19055 |
Similar Items
VEDAL: Variational Error-Driven Asynchronous Learning for 3D Gaussian Splatting Pruning
by: Li, Aoduo, et al.
Published: (2026)
Self-Attention and Hybrid Features for Replay and Deep-Fake Audio Detection
by: Huang, Lian, et al.
Published: (2024)
PROEMO: Prompt-Driven Text-to-Speech Synthesis Based on Emotion and Intensity Control
by: Zhang, Shaozuo, et al.
Published: (2025)
Hierarchical Control of Emotion Rendering in Speech Synthesis
by: Inoue, Sho, et al.
Published: (2024)
EME-TTS: Unlocking the Emphasis and Emotion Link in Speech Synthesis
by: Li, Haoxun, et al.
Published: (2025)
Hierarchical Emotion Prediction and Control in Text-to-Speech Synthesis
by: Inoue, Sho, et al.
Published: (2024)
FS-RWKV: Leveraging Frequency Spatial-Aware RWKV for 3T-to-7T MRI Translation
by: Lei, Yingtie, et al.
Published: (2025)
Robust and Efficient Autoregressive Speech Synthesis with Dynamic Chunk-wise Prediction Policy
by: Li, Bohan, et al.
Published: (2025)
Multi-Channel Speech Enhancement for Cocktail Party Speech Emotion Recognition
by: Chen, Youjun, et al.
Published: (2026)
When Tone and Words Disagree: Towards Robust Speech Emotion Recognition under Acoustic-Semantic Conflict
by: Huang, Dawei, et al.
Published: (2026)
Speech Emotion Recognition with ASR Integration
by: Li, Yuanchao
Published: (2026)
AffectSpeech: A Large-Scale Emotional Speech Dataset with Fine-Grained Textual Descriptions for Speech Emotion Captioning and Synthesis
by: Qi, Tianhua, et al.
Published: (2026)
Task Vector in TTS: Toward Emotionally Expressive Dialectal Speech Synthesis
by: Feng, Pengchao, et al.
Published: (2025)
Multi-Step Prediction and Control of Hierarchical Emotion Distribution in Text-to-Speech Synthesis
by: Inoue, Sho, et al.
Published: (2025)
EmoQ: Speech Emotion Recognition via Speech-Aware Q-Former and Large Language Model
by: Yang, Yiqing, et al.
Published: (2025)
EmotionThinker: Prosody-Aware Reinforcement Learning for Explainable Speech Emotion Reasoning
by: Wang, Dingdong, et al.
Published: (2026)
AST: Adaptive, Seamless, and Training-Free Precise Speech Editing
by: Lv, Sihan, et al.
Published: (2026)
EmoShift: Lightweight Activation Steering for Enhanced Emotion-Aware Speech Synthesis
by: Zhou, Li, et al.
Published: (2026)
DTEA: Dynamic Topology Weaving and Instability-Driven Entropic Attenuation for Medical Image Segmentation
by: Li, Weixuan, et al.
Published: (2025)
BridgeCode: A Dual Speech Representation Paradigm for Autoregressive Zero-Shot Text-to-Speech Synthesis
by: Xing, Jingyuan, et al.
Published: (2025)
Affectron: Emotional Speech Synthesis with Affective and Contextually Aligned Nonverbal Vocalizations
by: Cho, Deok-Hyeon, et al.
Published: (2026)
MSF-SER: Enriching Acoustic Modeling with Multi-Granularity Semantics for Speech Emotion Recognition
by: Li, Haoxun, et al.
Published: (2025)
Multi-Loss Learning for Speech Emotion Recognition with Energy-Adaptive Mixup and Frame-Level Attention
by: Wang, Cong, et al.
Published: (2025)
ProMist-5K: A Comprehensive Dataset for Digital Emulation of Cinematic Pro-Mist Filter Effects
by: Lei, Yingtie, et al.
Published: (2026)
SFormer: SNR-guided Transformer for Underwater Image Enhancement from the Frequency Domain
by: Tian, Xin, et al.
Published: (2025)
A Comprehensive Study on the Effectiveness of ASR Representations for Noise-Robust Speech Emotion Recognition
by: Shi, Xiaohan, et al.
Published: (2023)
Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization
by: Hu, Yuchen, et al.
Published: (2024)
Persian Speech Emotion Recognition by Fine-Tuning Transformers
by: Shayaninasab, Minoo, et al.
Published: (2024)
Fine-Grained Quantitative Emotion Editing for Speech Generation
by: Inoue, Sho, et al.
Published: (2024)
AffectCodec: Emotion-Preserving Neural Speech Codec with Block-Diagonal Residual FSQ
by: Meng, Zhaoyang, et al.
Published: (2026)
DUET: Unified Dual-Space Emotion Control for Diffusion and Flow-Matching Driven Text-to-Speech
by: Zhang, Xu, et al.
Published: (2026)
Adaptive Speech Emotion Representation Learning Based On Dynamic Graph
by: Gao, Yingxue, et al.
Published: (2024)
SOLIDO: A Robust Watermarking Method for Speech Synthesis via Low-Rank Adaptation
by: Li, Yue, et al.
Published: (2025)
Explainable Transformer-CNN Fusion for Noise-Robust Speech Emotion Recognition
by: Chakrabarty, Sudip, et al.
Published: (2025)
ED-TTS: Multi-Scale Emotion Modeling using Cross-Domain Emotion Diarization for Emotional Speech Synthesis
by: Tang, Haobin, et al.
Published: (2024)
Testing Correctness, Fairness, and Robustness of Speech Emotion Recognition Models
by: Derington, Anna, et al.
Published: (2023)
TED-TTS: Training-Free Intra-Utterance Emotion and Duration Control for Text-to-Speech Synthesis
by: Liang, Qifan, et al.
Published: (2026)
MERaLiON-SER: Robust Speech Emotion Recognition Model for English and SEA Languages
by: Sailor, Hardik B., et al.
Published: (2025)
IO-RAE: Information-Obfuscation Reversible Adversarial Example for Audio Privacy Protection
by: Zhu, Jiajie, et al.
Published: (2026)
Scaling Speech-Text Pre-training with Synthetic Interleaved Data
by: Zeng, Aohan, et al.
Published: (2024)