:: Library Catalog

Bibliographic Details
Main Authors:	Li, Aoduo; Lv, Haoran; Xu, Hongjian; Li, Shengmin; Qin, Sihao; Li, Zimeng; Pun, Chi Man; Chen, Xuhang
Format:	Preprint
Published:	2026
Subjects:	Sound
Online Access:	https://arxiv.org/abs/2604.19055

Similar Items

VEDAL: Variational Error-Driven Asynchronous Learning for 3D Gaussian Splatting Pruning
by: Li, Aoduo, et al.
Published: (2026)

Self-Attention and Hybrid Features for Replay and Deep-Fake Audio Detection
by: Huang, Lian, et al.
Published: (2024)

PROEMO: Prompt-Driven Text-to-Speech Synthesis Based on Emotion and Intensity Control
by: Zhang, Shaozuo, et al.
Published: (2025)

Hierarchical Control of Emotion Rendering in Speech Synthesis
by: Inoue, Sho, et al.
Published: (2024)

EME-TTS: Unlocking the Emphasis and Emotion Link in Speech Synthesis
by: Li, Haoxun, et al.
Published: (2025)

Hierarchical Emotion Prediction and Control in Text-to-Speech Synthesis
by: Inoue, Sho, et al.
Published: (2024)

FS-RWKV: Leveraging Frequency Spatial-Aware RWKV for 3T-to-7T MRI Translation
by: Lei, Yingtie, et al.
Published: (2025)

Robust and Efficient Autoregressive Speech Synthesis with Dynamic Chunk-wise Prediction Policy
by: Li, Bohan, et al.
Published: (2025)

Multi-Channel Speech Enhancement for Cocktail Party Speech Emotion Recognition
by: Chen, Youjun, et al.
Published: (2026)

When Tone and Words Disagree: Towards Robust Speech Emotion Recognition under Acoustic-Semantic Conflict
by: Huang, Dawei, et al.
Published: (2026)

Speech Emotion Recognition with ASR Integration
by: Li, Yuanchao
Published: (2026)

AffectSpeech: A Large-Scale Emotional Speech Dataset with Fine-Grained Textual Descriptions for Speech Emotion Captioning and Synthesis
by: Qi, Tianhua, et al.
Published: (2026)

Task Vector in TTS: Toward Emotionally Expressive Dialectal Speech Synthesis
by: Feng, Pengchao, et al.
Published: (2025)

Multi-Step Prediction and Control of Hierarchical Emotion Distribution in Text-to-Speech Synthesis
by: Inoue, Sho, et al.
Published: (2025)

EmoQ: Speech Emotion Recognition via Speech-Aware Q-Former and Large Language Model
by: Yang, Yiqing, et al.
Published: (2025)

EmotionThinker: Prosody-Aware Reinforcement Learning for Explainable Speech Emotion Reasoning
by: Wang, Dingdong, et al.
Published: (2026)

AST: Adaptive, Seamless, and Training-Free Precise Speech Editing
by: Lv, Sihan, et al.
Published: (2026)

EmoShift: Lightweight Activation Steering for Enhanced Emotion-Aware Speech Synthesis
by: Zhou, Li, et al.
Published: (2026)

DTEA: Dynamic Topology Weaving and Instability-Driven Entropic Attenuation for Medical Image Segmentation
by: Li, Weixuan, et al.
Published: (2025)

BridgeCode: A Dual Speech Representation Paradigm for Autoregressive Zero-Shot Text-to-Speech Synthesis
by: Xing, Jingyuan, et al.
Published: (2025)

Affectron: Emotional Speech Synthesis with Affective and Contextually Aligned Nonverbal Vocalizations
by: Cho, Deok-Hyeon, et al.
Published: (2026)

MSF-SER: Enriching Acoustic Modeling with Multi-Granularity Semantics for Speech Emotion Recognition
by: Li, Haoxun, et al.
Published: (2025)

Multi-Loss Learning for Speech Emotion Recognition with Energy-Adaptive Mixup and Frame-Level Attention
by: Wang, Cong, et al.
Published: (2025)

ProMist-5K: A Comprehensive Dataset for Digital Emulation of Cinematic Pro-Mist Filter Effects
by: Lei, Yingtie, et al.
Published: (2026)

SFormer: SNR-guided Transformer for Underwater Image Enhancement from the Frequency Domain
by: Tian, Xin, et al.
Published: (2025)

A Comprehensive Study on the Effectiveness of ASR Representations for Noise-Robust Speech Emotion Recognition
by: Shi, Xiaohan, et al.
Published: (2023)

Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization
by: Hu, Yuchen, et al.
Published: (2024)

Persian Speech Emotion Recognition by Fine-Tuning Transformers
by: Shayaninasab, Minoo, et al.
Published: (2024)

Fine-Grained Quantitative Emotion Editing for Speech Generation
by: Inoue, Sho, et al.
Published: (2024)

AffectCodec: Emotion-Preserving Neural Speech Codec with Block-Diagonal Residual FSQ
by: Meng, Zhaoyang, et al.
Published: (2026)

DUET: Unified Dual-Space Emotion Control for Diffusion and Flow-Matching Driven Text-to-Speech
by: Zhang, Xu, et al.
Published: (2026)

Adaptive Speech Emotion Representation Learning Based On Dynamic Graph
by: Gao, Yingxue, et al.
Published: (2024)

SOLIDO: A Robust Watermarking Method for Speech Synthesis via Low-Rank Adaptation
by: Li, Yue, et al.
Published: (2025)

Explainable Transformer-CNN Fusion for Noise-Robust Speech Emotion Recognition
by: Chakrabarty, Sudip, et al.
Published: (2025)

ED-TTS: Multi-Scale Emotion Modeling using Cross-Domain Emotion Diarization for Emotional Speech Synthesis
by: Tang, Haobin, et al.
Published: (2024)

Testing Correctness, Fairness, and Robustness of Speech Emotion Recognition Models
by: Derington, Anna, et al.
Published: (2023)

TED-TTS: Training-Free Intra-Utterance Emotion and Duration Control for Text-to-Speech Synthesis
by: Liang, Qifan, et al.
Published: (2026)

MERaLiON-SER: Robust Speech Emotion Recognition Model for English and SEA Languages
by: Sailor, Hardik B., et al.
Published: (2025)

IO-RAE: Information-Obfuscation Reversible Adversarial Example for Audio Privacy Protection
by: Zhu, Jiajie, et al.
Published: (2026)

Scaling Speech-Text Pre-training with Synthetic Interleaved Data
by: Zeng, Aohan, et al.
Published: (2024)