Library Catalog

Bibliographic Details
Main Authors:	Ahmed, Faria; Chowdhury, Rafi Hassan; Moon, Fatema Tuz Zohora; Ahmed, Sabbir
Format:	Preprint
Published:	2026
Subjects:	Sound; Machine Learning
Online Access:	https://arxiv.org/abs/2603.00746

Similar Items

MangoLeafViT: Leveraging Lightweight Vision Transformer with Runtime Augmentation for Efficient Mango Leaf Disease Classification
by: Chowdhury, Rafi Hassan, et al.
Published: (2025)

Explainable Transformer-CNN Fusion for Noise-Robust Speech Emotion Recognition
by: Chakrabarty, Sudip, et al.
Published: (2025)

ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer
by: Liu, Huadai, et al.
Published: (2023)

Deep Learning for Speech Emotion Recognition: A CNN Approach Utilizing Mel Spectrograms
by: Penumajji, Niketa
Published: (2025)

Enhancing Speech Emotion Recognition with Multi-Task Learning and Dynamic Feature Fusion
by: Wang, Honghong, et al.
Published: (2025)

Real-Time Speech Enhancement via a Hybrid ViT: A Dual-Input Acoustic-Image Feature Fusion
by: Bahmei, Behnaz, et al.
Published: (2025)

Re-Parameterization of Lightweight Transformer for On-Device Speech Emotion Recognition
by: Zhang, Zixing, et al.
Published: (2024)

DMF2Mel: A Dynamic Multiscale Fusion Network for EEG-Driven Mel Spectrogram Reconstruction
by: Fan, Cunhang, et al.
Published: (2025)

Bimodal Connection Attention Fusion for Speech Emotion Recognition
by: Luo, Jiachen, et al.
Published: (2025)

LPGNet: A Lightweight Network with Parallel Attention and Gated Fusion for Multimodal Emotion Recognition
by: He, Zhining, et al.
Published: (2025)

Leveraging Cross-Attention Transformer and Multi-Feature Fusion for Cross-Linguistic Speech Emotion Recognition
by: Zhao, Ruoyu, et al.
Published: (2025)

Enhancing Speech Emotion Recognition with Graph-Based Multimodal Fusion and Prosodic Features for the Speech Emotion Recognition in Naturalistic Conditions Challenge at Interspeech 2025
by: Ferreira, Alef Iury Siqueira, et al.
Published: (2025)

Emotion Detection in Speech Using Lightweight and Transformer-Based Models: A Comparative and Ablation Study
by: Onyekwelu-Udoka, Lucky, et al.
Published: (2025)

MFHCA: Enhancing Speech Emotion Recognition Via Multi-Spatial Fusion and Hierarchical Cooperative Attention
by: Jiao, Xinxin, et al.
Published: (2024)

MFSN: Multi-perspective Fusion Search Network For Pre-training Knowledge in Speech Emotion Recognition
by: Sun, Haiyang, et al.
Published: (2023)

MelShield: Robust Mel-Domain Audio Watermarking for Provenance Attribution of AI Generated Synthesized Speech
by: Jin, Yutong, et al.
Published: (2026)

CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR
by: Shao, Nian, et al.
Published: (2025)

Mel-McNet: A Mel-Scale Framework for Online Multichannel Speech Enhancement
by: Yang, Yujie, et al.
Published: (2025)

Computation and Parameter Efficient Multi-Modal Fusion Transformer for Cued Speech Recognition
by: Liu, Lei, et al.
Published: (2024)

Speech Emotion Recognition with ASR Transcripts: A Comprehensive Study on Word Error Rate and Fusion Techniques
by: Li, Yuanchao, et al.
Published: (2024)

Graph Embedding with Mel-spectrograms for Underwater Acoustic Target Recognition
by: Feng, Sheng, et al.
Published: (2025)

Persian Speech Emotion Recognition by Fine-Tuning Transformers
by: Shayaninasab, Minoo, et al.
Published: (2024)

Recovering Performance in Speech Emotion Recognition from Discrete Tokens via Multi-Layer Fusion and Paralinguistic Feature Integration
by: Sun, Esther, et al.
Published: (2026)

Hybrid CNN-Transformer Architecture for Arabic Speech Emotion Recognition
by: Gheffari, Youcef Soufiane, et al.
Published: (2026)

WavFusion: Towards wav2vec 2.0 Multimodal Speech Emotion Recognition
by: Li, Feng, et al.
Published: (2024)

Multi-Channel Speech Enhancement for Cocktail Party Speech Emotion Recognition
by: Chen, Youjun, et al.
Published: (2026)

Spectro-Temporal Modulation Representation Framework for Human-Imitated Speech Detection
by: Zaman, Khalid, et al.
Published: (2026)

Fusion of Discrete Representations and Self-Augmented Representations for Multilingual Automatic Speech Recognition
by: Wang, Shih-heng, et al.
Published: (2024)

dMel: Speech Tokenization made Simple
by: Bai, Richard He, et al.
Published: (2024)

Speech Representation Analysis based on Inter- and Intra-Model Similarities
by: Kheir, Yassine El, et al.
Published: (2024)

MF-AED-AEC: Speech Emotion Recognition by Leveraging Multimodal Fusion, ASR Error Detection, and ASR Error Correction
by: He, Jiajun, et al.
Published: (2024)

Efficient Finetuning for Dimensional Speech Emotion Recognition in the Age of Transformers
by: Sampath, Aneesha, et al.
Published: (2025)

Speech Emotion Recognition with ASR Integration
by: Li, Yuanchao
Published: (2026)

Emotion Recognition in Multi-Speaker Conversations through Speaker Identification, Knowledge Distillation, and Hierarchical Fusion
by: Li, Xiao, et al.
Published: (2025)

Enhancing Modal Fusion by Alignment and Label Matching for Multimodal Emotion Recognition
by: Li, Qifei, et al.
Published: (2024)

Efficient VoIP Communications through LLM-based Real-Time Speech Reconstruction and Call Prioritization for Emergency Services
by: Venkateshperumal, Danush, et al.
Published: (2024)

A Joint Spectro-Temporal Relational Thinking Based Acoustic Modeling Framework
by: Nan, Zheng, et al.
Published: (2024)

Bridging ASR and LLMs for Dysarthric Speech Recognition: Benchmarking Self-Supervised and Generative Approaches
by: Aboeitta, Ahmed, et al.
Published: (2025)

An LSTM-Based Chord Generation System Using Chroma Histogram Representations
by: Hardwick, Jack
Published: (2024)

Emotion Neural Transducer for Fine-Grained Speech Emotion Recognition
by: Shen, Siyuan, et al.
Published: (2024)