:: Library Catalog

Cover Image

Saved in:

Bibliographic Details
Main Authors:	Su, Xun, Wang, Huamin, Zhang, Qi
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence Sound
Online Access:	https://arxiv.org/abs/2602.08240
Tags:	Add Tag No Tags, Be the first to tag this record!

Similar Items

Temporal Information Reconstruction and Non-Aligned Residual in Spiking Neural Networks for Speech Classification
by: Zhang, Qi, et al.
Published: (2024)

Persian Speech Emotion Recognition by Fine-Tuning Transformers
by: Shayaninasab, Minoo, et al.
Published: (2024)

TS-SNN: Temporal Shift Module for Spiking Neural Networks
by: Yu, Kairong, et al.
Published: (2025)

Efficient Finetuning for Dimensional Speech Emotion Recognition in the Age of Transformers
by: Sampath, Aneesha, et al.
Published: (2025)

Speech Emotion Recognition via Entropy-Aware Score Selection
by: Chua, ChenYi, et al.
Published: (2025)

Focal Loss based Residual Convolutional Neural Network for Speech Emotion Recognition
by: Tripathi, Suraj, et al.
Published: (2019)

Color-based Emotion Representation for Speech Emotion Recognition
by: Nagase, Ryotaro, et al.
Published: (2026)

Toward Efficient Speech Emotion Recognition via Spectral Learning and Attention
by: Lee, HyeYoung, et al.
Published: (2025)

MERaLiON-SER: Robust Speech Emotion Recognition Model for English and SEA Languages
by: Sailor, Hardik B., et al.
Published: (2025)

Deep Learning for Speech Emotion Recognition: A CNN Approach Utilizing Mel Spectrograms
by: Penumajji, Niketa
Published: (2025)

Cross-Learning Fine-Tuning Strategy for Dysarthric Speech Recognition Via CDSD database
by: Xiao, Qing, et al.
Published: (2025)

SyncSpeech: Efficient and Low-Latency Text-to-Speech based on Temporal Masked Transformer
by: Sheng, Zhengyan, et al.
Published: (2025)

Breaking Through the Spike: Spike Window Decoding for Accelerated and Precise Automatic Speech Recognition
by: Zhang, Wei, et al.
Published: (2025)

Prompt Tuning of Deep Neural Networks for Speaker-adaptive Visual Speech Recognition
by: Kim, Minsu, et al.
Published: (2023)

Active Learning with Task Adaptation Pre-training for Speech Emotion Recognition
by: Li, Dongyuan, et al.
Published: (2024)

Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition
by: Ma, Ziyang, et al.
Published: (2023)

Amplifying Emotional Signals: Data-Efficient Deep Learning for Robust Speech Emotion Recognition
by: Vu, Tai
Published: (2025)

Hybrid CNN-Transformer Architecture for Arabic Speech Emotion Recognition
by: Gheffari, Youcef Soufiane, et al.
Published: (2026)

EMO-TTA: Improving Test-Time Adaptation of Audio-Language Models for Speech Emotion Recognition
by: Shi, Jiacheng, et al.
Published: (2025)

Learning Physiology-Informed Vocal Spectrotemporal Representations for Speech Emotion Recognition
by: Zhang, Xu, et al.
Published: (2026)

PSCodec: A Series of High-Fidelity Low-bitrate Neural Speech Codecs Leveraging Prompt Encoders
by: Pan, Yu, et al.
Published: (2024)

Prompt-Unseen-Emotion: Zero-shot Expressive Speech Synthesis with Prompt-LLM Contextual Knowledge for Mixed Emotions
by: Gao, Xiaoxue, et al.
Published: (2025)

VoxEmo: Benchmarking Speech Emotion Recognition with Speech LLMs
by: Zhang, Hezhao, et al.
Published: (2026)

ABHINAYA -- A System for Speech Emotion Recognition In Naturalistic Conditions Challenge
by: Dutta, Soumya, et al.
Published: (2025)

Searching for Effective Preprocessing Method and CNN-based Architecture with Efficient Channel Attention on Speech Emotion Recognition
by: Kim, Byunggun, et al.
Published: (2024)

Adapting Whisper for Parameter-efficient Code-Switching Speech Recognition via Soft Prompt Tuning
by: Yang, Hongli, et al.
Published: (2025)

SpikeVox: Towards Energy-Efficient Speech Therapy Framework with Spike-driven Generative Language Models
by: Putra, Rachmad Vidya Wicaksana, et al.
Published: (2025)

Jointly Fine-Tuning "BERT-like" Self Supervised Models to Improve Multimodal Speech Emotion Recognition
by: Siriwardhana, Shamane, et al.
Published: (2020)

MSAC: Multiple Speech Attribute Control Method for Reliable Speech Emotion Recognition
by: Pan, Yu, et al.
Published: (2023)

Unifying EEG and Speech for Emotion Recognition: A Two-Step Joint Learning Framework for Handling Missing EEG Data During Inference
by: Tiwari, Upasana, et al.
Published: (2025)

Breaking Resource Barriers in Speech Emotion Recognition via Data Distillation
by: Chang, Yi, et al.
Published: (2024)

Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition
by: Jiang, Yicong, et al.
Published: (2024)

Bimodal Connection Attention Fusion for Speech Emotion Recognition
by: Luo, Jiachen, et al.
Published: (2025)

Multi-Loss Learning for Speech Emotion Recognition with Energy-Adaptive Mixup and Frame-Level Attention
by: Wang, Cong, et al.
Published: (2025)

EmoShift: Lightweight Activation Steering for Enhanced Emotion-Aware Speech Synthesis
by: Zhou, Li, et al.
Published: (2026)

MFHCA: Enhancing Speech Emotion Recognition Via Multi-Spatial Fusion and Hierarchical Cooperative Attention
by: Jiao, Xinxin, et al.
Published: (2024)

MuSpike: A Benchmark and Evaluation Framework for Symbolic Music Generation with Spiking Neural Networks
by: Liang, Qian, et al.
Published: (2025)

Scaling Ambiguity: Augmenting Human Annotation in Speech Emotion Recognition with Audio-Language Models
by: Zhang, Wenda, et al.
Published: (2026)

MATER: Multi-level Acoustic and Textual Emotion Representation for Interpretable Speech Emotion Recognition
by: Jon, Hyo Jin, et al.
Published: (2025)

Enabling Automatic Disordered Speech Recognition: An Impaired Speech Dataset in the Akan Language
by: Wiafe, Isaac, et al.
Published: (2026)