:: Library Catalog

Bibliographic Details
Main Authors:	Huang, Dawei; Lv, Yongjie; Xiong, Ruijie; Jin, Chunxiang; Peng, Xiaojiang
Format:	Preprint
Published:	2026
Subjects:	Sound
Online Access:	https://arxiv.org/abs/2601.04564

Similar Items

VividVoice: A Unified Framework for Scene-Aware Visually-Driven Speech Synthesis
by: Ma, Chengyuan, et al.
Published: (2026)

MSF-SER: Enriching Acoustic Modeling with Multi-Granularity Semantics for Speech Emotion Recognition
by: Li, Haoxun, et al.
Published: (2025)

Investigation on the Robustness of Acoustic Foundation Models on Post Exercise Speech
by: Xue, Xiangyuan, et al.
Published: (2026)

SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition
by: Cheng, Zebang, et al.
Published: (2024)

Layer-Wise Analysis of Self-Supervised Acoustic Word Embeddings: A Study on Speech Emotion Recognition
by: Saliba, Alexandra, et al.
Published: (2024)

MATER: Multi-level Acoustic and Textual Emotion Representation for Interpretable Speech Emotion Recognition
by: Jon, Hyo Jin, et al.
Published: (2025)

XY-Tokenizer: Mitigating the Semantic-Acoustic Conflict in Low-Bitrate Speech Codecs
by: Gong, Yitian, et al.
Published: (2025)

CAT-Net: A Cross-Attention Tone Network for Cross-Subject EEG-EMG Fusion Tone Decoding
by: Zhuang, Yifan, et al.
Published: (2025)

Testing Correctness, Fairness, and Robustness of Speech Emotion Recognition Models
by: Derington, Anna, et al.
Published: (2023)

Explainable Transformer-CNN Fusion for Noise-Robust Speech Emotion Recognition
by: Chakrabarty, Sudip, et al.
Published: (2025)

Before the Mic: Physical-Layer Voiceprint Anonymization with Acoustic Metamaterials
by: Ning, Zhiyuan, et al.
Published: (2026)

Large Speech Model Enabled Semantic Communication
by: Tian, Yun, et al.
Published: (2025)

Multi-Channel Speech Enhancement for Cocktail Party Speech Emotion Recognition
by: Chen, Youjun, et al.
Published: (2026)

Explaining Deep Learning Embeddings for Speech Emotion Recognition by Predicting Interpretable Acoustic Features
by: Dixit, Satvik, et al.
Published: (2024)

Semantic-Emotional Resonance Embedding: A Semi-Supervised Paradigm for Cross-Lingual Speech Emotion Recognition
by: Zhao, Ya, et al.
Published: (2026)

ATRIE: Adaptive Tuning for Robust Inference and Emotion in Persona-Driven Speech Synthesis
by: Li, Aoduo, et al.
Published: (2026)

Speech Emotion Recognition with Phonation Excitation Information and Articulatory Kinematics
by: Zhang, Ziqian, et al.
Published: (2025)

Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation
by: Yan, Canxiang, et al.
Published: (2025)

Cross-Corpus Validation of Speech Emotion Recognition in Urdu using Domain-Knowledge Acoustic Features
by: Talpur, Unzela, et al.
Published: (2025)

Speech Emotion Recognition with ASR Integration
by: Li, Yuanchao
Published: (2026)

MERaLiON-SER: Robust Speech Emotion Recognition Model for English and SEA Languages
by: Sailor, Hardik B., et al.
Published: (2025)

Beyond Acoustic Emotion Recognition: Multimodal Pathos Analysis in Political Speech Using LLM-Based and Acoustic Emotion Models
by: Dietrich, Juergen
Published: (2026)

Dataset-Distillation Generative Model for Speech Emotion Recognition
by: Ritter-Gutierrez, Fabian, et al.
Published: (2024)

WESR: Scaling and Evaluating Word-level Event-Speech Recognition
by: Yang, Chenchen, et al.
Published: (2026)

EMO-SUPERB: An In-depth Look at Speech Emotion Recognition
by: Wu, Haibin, et al.
Published: (2024)

Word-Level Emotional Expression Control in Zero-Shot Text-to-Speech Synthesis
by: Wang, Tianrui, et al.
Published: (2025)

From Human Speech to Ocean Signals: Transferring Speech Large Models for Underwater Acoustic Target Recognition
by: Huang, Mengcheng, et al.
Published: (2026)

Emotion Neural Transducer for Fine-Grained Speech Emotion Recognition
by: Shen, Siyuan, et al.
Published: (2024)

EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs
by: Zhang, Yuhao, et al.
Published: (2025)

Toward Efficient Speech Emotion Recognition via Spectral Learning and Attention
by: Lee, HyeYoung, et al.
Published: (2025)

Word Level Timestamp Generation for Automatic Speech Recognition and Translation
by: Hu, Ke, et al.
Published: (2025)

Speech Emotion Recognition with ASR Transcripts: A Comprehensive Study on Word Error Rate and Fusion Techniques
by: Li, Yuanchao, et al.
Published: (2024)

Color-based Emotion Representation for Speech Emotion Recognition
by: Nagase, Ryotaro, et al.
Published: (2026)

Amplifying Emotional Signals: Data-Efficient Deep Learning for Robust Speech Emotion Recognition
by: Vu, Tai
Published: (2025)

ToneUnit: A Speech Discretization Approach for Tonal Language Speech Synthesis
by: Tao, Dehua, et al.
Published: (2024)

From Coarse to Fine: Recursive Audio-Visual Semantic Enhancement for Speech Separation
by: Xue, Ke, et al.
Published: (2025)

VoxEmo: Benchmarking Speech Emotion Recognition with Speech LLMs
by: Zhang, Hezhao, et al.
Published: (2026)

TRNet: Two-level Refinement Network leveraging Speech Enhancement for Noise Robust Speech Emotion Recognition
by: Chen, Chengxin, et al.
Published: (2024)

ReStyle-TTS: Relative and Continuous Style Control for Zero-Shot Speech Synthesis
by: Li, Haitao, et al.
Published: (2026)

Interactive ASR: Towards Human-Like Interaction and Semantic Coherence Evaluation for Agentic Speech Recognition
by: Wang, Peng, et al.
Published: (2026)