| Main Authors: | Huang, Dawei, Lv, Yongjie, Xiong, Ruijie, Jin, Chunxiang, Peng, Xiaojiang |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2601.04564 |
Similar Items
VividVoice: A Unified Framework for Scene-Aware Visually-Driven Speech Synthesis
by: Ma, Chengyuan, et al.
Published: (2026)
MSF-SER: Enriching Acoustic Modeling with Multi-Granularity Semantics for Speech Emotion Recognition
by: Li, Haoxun, et al.
Published: (2025)
Investigation on the Robustness of Acoustic Foundation Models on Post Exercise Speech
by: Xue, Xiangyuan, et al.
Published: (2026)
SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition
by: Cheng, Zebang, et al.
Published: (2024)
Layer-Wise Analysis of Self-Supervised Acoustic Word Embeddings: A Study on Speech Emotion Recognition
by: Saliba, Alexandra, et al.
Published: (2024)
MATER: Multi-level Acoustic and Textual Emotion Representation for Interpretable Speech Emotion Recognition
by: Jon, Hyo Jin, et al.
Published: (2025)
XY-Tokenizer: Mitigating the Semantic-Acoustic Conflict in Low-Bitrate Speech Codecs
by: Gong, Yitian, et al.
Published: (2025)
CAT-Net: A Cross-Attention Tone Network for Cross-Subject EEG-EMG Fusion Tone Decoding
by: Zhuang, Yifan, et al.
Published: (2025)
Testing Correctness, Fairness, and Robustness of Speech Emotion Recognition Models
by: Derington, Anna, et al.
Published: (2023)
Explainable Transformer-CNN Fusion for Noise-Robust Speech Emotion Recognition
by: Chakrabarty, Sudip, et al.
Published: (2025)
Before the Mic: Physical-Layer Voiceprint Anonymization with Acoustic Metamaterials
by: Ning, Zhiyuan, et al.
Published: (2026)
Large Speech Model Enabled Semantic Communication
by: Tian, Yun, et al.
Published: (2025)
Multi-Channel Speech Enhancement for Cocktail Party Speech Emotion Recognition
by: Chen, Youjun, et al.
Published: (2026)
Explaining Deep Learning Embeddings for Speech Emotion Recognition by Predicting Interpretable Acoustic Features
by: Dixit, Satvik, et al.
Published: (2024)
Semantic-Emotional Resonance Embedding: A Semi-Supervised Paradigm for Cross-Lingual Speech Emotion Recognition
by: Zhao, Ya, et al.
Published: (2026)
ATRIE: Adaptive Tuning for Robust Inference and Emotion in Persona-Driven Speech Synthesis
by: Li, Aoduo, et al.
Published: (2026)
Speech Emotion Recognition with Phonation Excitation Information and Articulatory Kinematics
by: Zhang, Ziqian, et al.
Published: (2025)
Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation
by: Yan, Canxiang, et al.
Published: (2025)
Cross-Corpus Validation of Speech Emotion Recognition in Urdu using Domain-Knowledge Acoustic Features
by: Talpur, Unzela, et al.
Published: (2025)
Speech Emotion Recognition with ASR Integration
by: Li, Yuanchao
Published: (2026)
MERaLiON-SER: Robust Speech Emotion Recognition Model for English and SEA Languages
by: Sailor, Hardik B., et al.
Published: (2025)
Beyond Acoustic Emotion Recognition: Multimodal Pathos Analysis in Political Speech Using LLM-Based and Acoustic Emotion Models
by: Dietrich, Juergen
Published: (2026)
Dataset-Distillation Generative Model for Speech Emotion Recognition
by: Ritter-Gutierrez, Fabian, et al.
Published: (2024)
WESR: Scaling and Evaluating Word-level Event-Speech Recognition
by: Yang, Chenchen, et al.
Published: (2026)
EMO-SUPERB: An In-depth Look at Speech Emotion Recognition
by: Wu, Haibin, et al.
Published: (2024)
Word-Level Emotional Expression Control in Zero-Shot Text-to-Speech Synthesis
by: Wang, Tianrui, et al.
Published: (2025)
From Human Speech to Ocean Signals: Transferring Speech Large Models for Underwater Acoustic Target Recognition
by: Huang, Mengcheng, et al.
Published: (2026)
Emotion Neural Transducer for Fine-Grained Speech Emotion Recognition
by: Shen, Siyuan, et al.
Published: (2024)
EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs
by: Zhang, Yuhao, et al.
Published: (2025)
Toward Efficient Speech Emotion Recognition via Spectral Learning and Attention
by: Lee, HyeYoung, et al.
Published: (2025)
Word Level Timestamp Generation for Automatic Speech Recognition and Translation
by: Hu, Ke, et al.
Published: (2025)
Speech Emotion Recognition with ASR Transcripts: A Comprehensive Study on Word Error Rate and Fusion Techniques
by: Li, Yuanchao, et al.
Published: (2024)
Color-based Emotion Representation for Speech Emotion Recognition
by: Nagase, Ryotaro, et al.
Published: (2026)
Amplifying Emotional Signals: Data-Efficient Deep Learning for Robust Speech Emotion Recognition
by: Vu, Tai
Published: (2025)
ToneUnit: A Speech Discretization Approach for Tonal Language Speech Synthesis
by: Tao, Dehua, et al.
Published: (2024)
From Coarse to Fine: Recursive Audio-Visual Semantic Enhancement for Speech Separation
by: Xue, Ke, et al.
Published: (2025)
VoxEmo: Benchmarking Speech Emotion Recognition with Speech LLMs
by: Zhang, Hezhao, et al.
Published: (2026)
TRNet: Two-level Refinement Network leveraging Speech Enhancement for Noise Robust Speech Emotion Recognition
by: Chen, Chengxin, et al.
Published: (2024)
ReStyle-TTS: Relative and Continuous Style Control for Zero-Shot Speech Synthesis
by: Li, Haitao, et al.
Published: (2026)
Interactive ASR: Towards Human-Like Interaction and Semantic Coherence Evaluation for Agentic Speech Recognition
by: Wang, Peng, et al.
Published: (2026)