Saved in:
| Main Authors: | Gao, Yingying, Zhang, Shilei, Yang, Runyan, Cui, Zihao, Feng, Junlan |
|---|---|
| Format: | Preprint |
| Published: |
2026
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.06290 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Similar Items
HarmoniFuse: A Component-Selective and Prompt-Adaptive Framework for Multi-Task Speech Language Modeling
by: Si, Yuke, et al.
Published: (2025)
by: Si, Yuke, et al.
Published: (2025)
Plugin Speech Enhancement: A Universal Speech Enhancement Framework Inspired by Dynamic Neural Network
by: Chen, Yanan, et al.
Published: (2024)
by: Chen, Yanan, et al.
Published: (2024)
Teaching Audio Models to Reason: A Unified Framework for Source- and Layer-wise Distillation
by: Yang, Runyan, et al.
Published: (2025)
by: Yang, Runyan, et al.
Published: (2025)
MFSN: Multi-perspective Fusion Search Network For Pre-training Knowledge in Speech Emotion Recognition
by: Sun, Haiyang, et al.
Published: (2023)
by: Sun, Haiyang, et al.
Published: (2023)
PolySpeech: Exploring Unified Multitask Speech Models for Competitiveness with Single-task Models
by: Yang, Runyan, et al.
Published: (2024)
by: Yang, Runyan, et al.
Published: (2024)
On Calibration of Speech Classification Models: Insights from Energy-Based Model Investigations
by: Hao, Yaqian, et al.
Published: (2024)
by: Hao, Yaqian, et al.
Published: (2024)
GenDistiller: Distilling Pre-trained Language Models based on an Autoregressive Generative Model
by: Gao, Yingying, et al.
Published: (2024)
by: Gao, Yingying, et al.
Published: (2024)
CEC: A Noisy Label Detection Method for Speaker Recognition
by: Shen, Yao, et al.
Published: (2024)
by: Shen, Yao, et al.
Published: (2024)
Exploring Energy-Based Models for Out-of-Distribution Detection in Dialect Identification
by: Hao, Yaqian, et al.
Published: (2024)
by: Hao, Yaqian, et al.
Published: (2024)
Group Relative Policy Optimization for Speech Recognition
by: Shivakumar, Prashanth Gurunath, et al.
Published: (2025)
by: Shivakumar, Prashanth Gurunath, et al.
Published: (2025)
OneVoice: One Model, Triple Scenarios-Towards Unified Zero-shot Voice Conversion
by: Wang, Zhichao, et al.
Published: (2026)
by: Wang, Zhichao, et al.
Published: (2026)
Group Relative Policy Optimization for Text-to-Speech with Large Language Models
by: Liu, Chang, et al.
Published: (2025)
by: Liu, Chang, et al.
Published: (2025)
VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark
by: Lin, Yuke, et al.
Published: (2024)
by: Lin, Yuke, et al.
Published: (2024)
Emotion Neural Transducer for Fine-Grained Speech Emotion Recognition
by: Shen, Siyuan, et al.
Published: (2024)
by: Shen, Siyuan, et al.
Published: (2024)
VoiceGRPO: Modern MoE Transformers with Group Relative Policy Optimization GRPO for AI Voice Health Care Applications on Voice Pathology Detection
by: Togootogtokh, Enkhtogtokh, et al.
Published: (2025)
by: Togootogtokh, Enkhtogtokh, et al.
Published: (2025)
F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization
by: Sun, Xiaohui, et al.
Published: (2025)
by: Sun, Xiaohui, et al.
Published: (2025)
Multi-Scale Temporal Transformer For Speech Emotion Recognition
by: Li, Zhipeng, et al.
Published: (2024)
by: Li, Zhipeng, et al.
Published: (2024)
Unsupervised Online Continual Learning for Automatic Speech Recognition
by: Eeckt, Steven Vander, et al.
Published: (2024)
by: Eeckt, Steven Vander, et al.
Published: (2024)
Few-shot Personalization via In-Context Learning for Speech Emotion Recognition based on Speech-Language Model
by: Ihori, Mana, et al.
Published: (2025)
by: Ihori, Mana, et al.
Published: (2025)
Investigating Group Relative Policy Optimization for Diffusion Transformer based Text-to-Audio Generation
by: Gu, Yi, et al.
Published: (2026)
by: Gu, Yi, et al.
Published: (2026)
DiffStyleTTS: Diffusion-based Hierarchical Prosody Modeling for Text-to-Speech with Diverse and Controllable Styles
by: Liu, Jiaxuan, et al.
Published: (2024)
by: Liu, Jiaxuan, et al.
Published: (2024)
Towards Unsupervised Speech Recognition Without Pronunciation Models
by: Ni, Junrui, et al.
Published: (2024)
by: Ni, Junrui, et al.
Published: (2024)
Leveraging Content and Acoustic Representations for Speech Emotion Recognition
by: Dutta, Soumya, et al.
Published: (2024)
by: Dutta, Soumya, et al.
Published: (2024)
Speech Emotion Recognition with ASR Integration
by: Li, Yuanchao
Published: (2026)
by: Li, Yuanchao
Published: (2026)
Speech Emotion Recognition Via CNN-Transformer and Multidimensional Attention Mechanism
by: Tang, Xiaoyu, et al.
Published: (2024)
by: Tang, Xiaoyu, et al.
Published: (2024)
Color-based Emotion Representation for Speech Emotion Recognition
by: Nagase, Ryotaro, et al.
Published: (2026)
by: Nagase, Ryotaro, et al.
Published: (2026)
Reasoning Beyond Majority Vote: An Explainable SpeechLM Framework for Speech Emotion Recognition
by: Su, Bo-Hao, et al.
Published: (2025)
by: Su, Bo-Hao, et al.
Published: (2025)
How Attention Shapes Emotion: A Comparative Study of Attention Mechanisms for Speech Emotion Recognition
by: Casals-Salvador, Marc, et al.
Published: (2026)
by: Casals-Salvador, Marc, et al.
Published: (2026)
Revisiting Modeling and Evaluation Approaches in Speech Emotion Recognition: Considering Subjectivity of Annotators and Ambiguity of Emotions
by: Chou, Huang-Cheng, et al.
Published: (2025)
by: Chou, Huang-Cheng, et al.
Published: (2025)
Bridging Speech Emotion Recognition and Personality: Dataset and Temporal Interaction Condition Network
by: Gao, Yuan, et al.
Published: (2025)
by: Gao, Yuan, et al.
Published: (2025)
PEFT-SER: On the Use of Parameter Efficient Transfer Learning Approaches For Speech Emotion Recognition Using Pre-trained Speech Models
by: Feng, Tiantian, et al.
Published: (2023)
by: Feng, Tiantian, et al.
Published: (2023)
HYFuse: Aligning Heterogeneous Speech Pre-Trained Representations in Hyperbolic Space for Speech Emotion Recognition
by: Phukan, Orchid Chetia, et al.
Published: (2025)
by: Phukan, Orchid Chetia, et al.
Published: (2025)
EmoQ: Speech Emotion Recognition via Speech-Aware Q-Former and Large Language Model
by: Yang, Yiqing, et al.
Published: (2025)
by: Yang, Yiqing, et al.
Published: (2025)
EMO-SUPERB: An In-depth Look at Speech Emotion Recognition
by: Wu, Haibin, et al.
Published: (2024)
by: Wu, Haibin, et al.
Published: (2024)
Dataset-Distillation Generative Model for Speech Emotion Recognition
by: Ritter-Gutierrez, Fabian, et al.
Published: (2024)
by: Ritter-Gutierrez, Fabian, et al.
Published: (2024)
THAI Speech Emotion Recognition (THAI-SER) corpus
by: Wongpithayadisai, Jilamika, et al.
Published: (2025)
by: Wongpithayadisai, Jilamika, et al.
Published: (2025)
Iterative Prototype Refinement for Ambiguous Speech Emotion Recognition
by: Sun, Haoqin, et al.
Published: (2024)
by: Sun, Haoqin, et al.
Published: (2024)
Prosody as Supervision: Bridging the Non-Verbal--Verbal for Multilingual Speech Emotion Recognition
by: Girish, et al.
Published: (2026)
by: Girish, et al.
Published: (2026)
Emo-DPO: Controllable Emotional Speech Synthesis through Direct Preference Optimization
by: Gao, Xiaoxue, et al.
Published: (2024)
by: Gao, Xiaoxue, et al.
Published: (2024)
Improving Speech-based Emotion Recognition with Contextual Utterance Analysis and LLMs
by: Zhang, Enshi, et al.
Published: (2024)
by: Zhang, Enshi, et al.
Published: (2024)
Similar Items
-
HarmoniFuse: A Component-Selective and Prompt-Adaptive Framework for Multi-Task Speech Language Modeling
by: Si, Yuke, et al.
Published: (2025) -
Plugin Speech Enhancement: A Universal Speech Enhancement Framework Inspired by Dynamic Neural Network
by: Chen, Yanan, et al.
Published: (2024) -
Teaching Audio Models to Reason: A Unified Framework for Source- and Layer-wise Distillation
by: Yang, Runyan, et al.
Published: (2025) -
MFSN: Multi-perspective Fusion Search Network For Pre-training Knowledge in Speech Emotion Recognition
by: Sun, Haiyang, et al.
Published: (2023) -
PolySpeech: Exploring Unified Multitask Speech Models for Competitiveness with Single-task Models
by: Yang, Runyan, et al.
Published: (2024)