| Main Authors: | Ahmed, Faria; Chowdhury, Rafi Hassan; Moon, Fatema Tuz Zohora; Ahmed, Sabbir |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.00746 |
Similar Items
MangoLeafViT: Leveraging Lightweight Vision Transformer with Runtime Augmentation for Efficient Mango Leaf Disease Classification
by: Chowdhury, Rafi Hassan, et al.
Published: (2025)
Explainable Transformer-CNN Fusion for Noise-Robust Speech Emotion Recognition
by: Chakrabarty, Sudip, et al.
Published: (2025)
ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer
by: Liu, Huadai, et al.
Published: (2023)
Deep Learning for Speech Emotion Recognition: A CNN Approach Utilizing Mel Spectrograms
by: Penumajji, Niketa
Published: (2025)
Enhancing Speech Emotion Recognition with Multi-Task Learning and Dynamic Feature Fusion
by: Wang, Honghong, et al.
Published: (2025)
Real-Time Speech Enhancement via a Hybrid ViT: A Dual-Input Acoustic-Image Feature Fusion
by: Bahmei, Behnaz, et al.
Published: (2025)
Re-Parameterization of Lightweight Transformer for On-Device Speech Emotion Recognition
by: Zhang, Zixing, et al.
Published: (2024)
DMF2Mel: A Dynamic Multiscale Fusion Network for EEG-Driven Mel Spectrogram Reconstruction
by: Fan, Cunhang, et al.
Published: (2025)
Bimodal Connection Attention Fusion for Speech Emotion Recognition
by: Luo, Jiachen, et al.
Published: (2025)
LPGNet: A Lightweight Network with Parallel Attention and Gated Fusion for Multimodal Emotion Recognition
by: He, Zhining, et al.
Published: (2025)
Leveraging Cross-Attention Transformer and Multi-Feature Fusion for Cross-Linguistic Speech Emotion Recognition
by: Zhao, Ruoyu, et al.
Published: (2025)
Enhancing Speech Emotion Recognition with Graph-Based Multimodal Fusion and Prosodic Features for the Speech Emotion Recognition in Naturalistic Conditions Challenge at Interspeech 2025
by: Ferreira, Alef Iury Siqueira, et al.
Published: (2025)
Emotion Detection in Speech Using Lightweight and Transformer-Based Models: A Comparative and Ablation Study
by: Onyekwelu-Udoka, Lucky, et al.
Published: (2025)
MFHCA: Enhancing Speech Emotion Recognition Via Multi-Spatial Fusion and Hierarchical Cooperative Attention
by: Jiao, Xinxin, et al.
Published: (2024)
MFSN: Multi-perspective Fusion Search Network For Pre-training Knowledge in Speech Emotion Recognition
by: Sun, Haiyang, et al.
Published: (2023)
MelShield: Robust Mel-Domain Audio Watermarking for Provenance Attribution of AI Generated Synthesized Speech
by: Jin, Yutong, et al.
Published: (2026)
CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR
by: Shao, Nian, et al.
Published: (2025)
Mel-McNet: A Mel-Scale Framework for Online Multichannel Speech Enhancement
by: Yang, Yujie, et al.
Published: (2025)
Computation and Parameter Efficient Multi-Modal Fusion Transformer for Cued Speech Recognition
by: Liu, Lei, et al.
Published: (2024)
Speech Emotion Recognition with ASR Transcripts: A Comprehensive Study on Word Error Rate and Fusion Techniques
by: Li, Yuanchao, et al.
Published: (2024)
Graph Embedding with Mel-spectrograms for Underwater Acoustic Target Recognition
by: Feng, Sheng, et al.
Published: (2025)
Persian Speech Emotion Recognition by Fine-Tuning Transformers
by: Shayaninasab, Minoo, et al.
Published: (2024)
Recovering Performance in Speech Emotion Recognition from Discrete Tokens via Multi-Layer Fusion and Paralinguistic Feature Integration
by: Sun, Esther, et al.
Published: (2026)
Hybrid CNN-Transformer Architecture for Arabic Speech Emotion Recognition
by: Gheffari, Youcef Soufiane, et al.
Published: (2026)
WavFusion: Towards wav2vec 2.0 Multimodal Speech Emotion Recognition
by: Li, Feng, et al.
Published: (2024)
Multi-Channel Speech Enhancement for Cocktail Party Speech Emotion Recognition
by: Chen, Youjun, et al.
Published: (2026)
Spectro-Temporal Modulation Representation Framework for Human-Imitated Speech Detection
by: Zaman, Khalid, et al.
Published: (2026)
Fusion of Discrete Representations and Self-Augmented Representations for Multilingual Automatic Speech Recognition
by: Wang, Shih-heng, et al.
Published: (2024)
dMel: Speech Tokenization made Simple
by: Bai, Richard He, et al.
Published: (2024)
Speech Representation Analysis based on Inter- and Intra-Model Similarities
by: Kheir, Yassine El, et al.
Published: (2024)
MF-AED-AEC: Speech Emotion Recognition by Leveraging Multimodal Fusion, ASR Error Detection, and ASR Error Correction
by: He, Jiajun, et al.
Published: (2024)
Efficient Finetuning for Dimensional Speech Emotion Recognition in the Age of Transformers
by: Sampath, Aneesha, et al.
Published: (2025)
Speech Emotion Recognition with ASR Integration
by: Li, Yuanchao
Published: (2026)
Emotion Recognition in Multi-Speaker Conversations through Speaker Identification, Knowledge Distillation, and Hierarchical Fusion
by: Li, Xiao, et al.
Published: (2025)
Enhancing Modal Fusion by Alignment and Label Matching for Multimodal Emotion Recognition
by: Li, Qifei, et al.
Published: (2024)
Efficient VoIP Communications through LLM-based Real-Time Speech Reconstruction and Call Prioritization for Emergency Services
by: Venkateshperumal, Danush, et al.
Published: (2024)
A Joint Spectro-Temporal Relational Thinking Based Acoustic Modeling Framework
by: Nan, Zheng, et al.
Published: (2024)
Bridging ASR and LLMs for Dysarthric Speech Recognition: Benchmarking Self-Supervised and Generative Approaches
by: Aboeitta, Ahmed, et al.
Published: (2025)
An LSTM-Based Chord Generation System Using Chroma Histogram Representations
by: Hardwick, Jack
Published: (2024)
Emotion Neural Transducer for Fine-Grained Speech Emotion Recognition
by: Shen, Siyuan, et al.
Published: (2024)