:: Library Catalog

Bibliographic Details
Main Authors:	Girish; Akhtar, Mohd Mujtaba; Sheth, Farhan; Singh, Muskaan
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2511.10790

Similar Items

Bridging Attribution and Open-Set Detection using Graph-Augmented Instance Learning in Synthetic Speech
by: Akhtar, Mohd Mujtaba, et al.
Published: (2026)

Curved Worlds, Clear Boundaries: Generalizing Speech Deepfake Detection using Hyperbolic and Spherical Geometry Spaces
by: Sheth, Farhan, et al.
Published: (2025)

Prosody as Supervision: Bridging the Non-Verbal–Verbal for Multilingual Speech Emotion Recognition
by: Girish, et al.
Published: (2026)

HCFD: A Benchmark for Audio Deepfake Detection in Healthcare
by: Akhtar, Mohd Mujtaba, et al.
Published: (2026)

DIVINE: Coordinating Multimodal Disentangled Representations for Oro-Facial Neurological Disorder Assessment
by: Akhtar, Mohd Mujtaba, et al.
Published: (2026)

NeuRO: An Application for Code-Switched Autism Detection in Children
by: Akhtar, Mohd Mujtaba, et al.
Published: (2024)

Towards Machine Unlearning for Paralinguistic Speech Processing
by: Phukan, Orchid Chetia, et al.
Published: (2025)

Indic-CodecFake meets SATYAM: Towards Detecting Neural Audio Codec Synthesized Speech Deepfakes in Indic Languages
by: Girish, et al.
Published: (2026)

Rethinking Cross-Corpus Speech Emotion Recognition Benchmarking: Are Paralinguistic Pre-Trained Representations Sufficient?
by: Phukan, Orchid Chetia, et al.
Published: (2025)

Enhancing In-Domain and Out-Domain EmoFake Detection via Cooperative Multilingual Speech Foundation Models
by: Phukan, Orchid Chetia, et al.
Published: (2025)

HYFuse: Aligning Heterogeneous Speech Pre-Trained Representations in Hyperbolic Space for Speech Emotion Recognition
by: Phukan, Orchid Chetia, et al.
Published: (2025)

Source Tracing of Synthetic Speech Systems Through Paralinguistic Pre-Trained Representations
by: Girish, et al.
Published: (2025)

Towards Neural Audio Codec Source Parsing
by: Phukan, Orchid Chetia, et al.
Published: (2025)

Investigating Polyglot Speech Foundation Models for Learning Collective Emotion from Crowds
by: Phukan, Orchid Chetia, et al.
Published: (2025)

PARROT: Synergizing Mamba and Attention-based SSL Pre-Trained Models via Parallel Branch Hadamard Optimal Transport for Speech Emotion Recognition
by: Phukan, Orchid Chetia, et al.
Published: (2025)

Towards Source Attribution of Singing Voice Deepfake with Multimodal Foundation Models
by: Phukan, Orchid Chetia, et al.
Published: (2025)

Towards Fusion of Neural Audio Codec-based Representations with Spectral for Heart Murmur Classification via Bandit-based Cross-Attention Mechanism
by: Phukan, Orchid Chetia, et al.
Published: (2025)

Are Mamba-based Audio Foundation Models the Best Fit for Non-Verbal Emotion Recognition?
by: Akhtar, Mohd Mujtaba, et al.
Published: (2025)

Beyond Speech and More: Investigating the Emergent Ability of Speech Foundation Models for Classifying Physiological Time-Series Signals
by: Phukan, Orchid Chetia, et al.
Published: (2024)

Speech Recognition Transformers: Topological-lingualism Perspective
by: Singh, Shruti, et al.
Published: (2024)

Investigating the Reasonable Effectiveness of Speaker Pre-Trained Models and their Synergistic Power for SingMOS Prediction
by: Phukan, Orchid Chetia, et al.
Published: (2025)

Semantic-Emotional Resonance Embedding: A Semi-Supervised Paradigm for Cross-Lingual Speech Emotion Recognition
by: Zhao, Ya, et al.
Published: (2026)

VECL-TTS: Voice identity and Emotional style controllable Cross-Lingual Text-to-Speech
by: Gudmalwar, Ashishkumar, et al.
Published: (2024)

Are Multimodal Foundation Models All That Is Needed for Emofake Detection?
by: Akhtar, Mohd Mujtaba, et al.
Published: (2025)

Fine-Tuning ASR for Stuttered Speech: Personalized vs. Generalized Approaches
by: Mujtaba, Dena, et al.
Published: (2025)

XEmoRAG: Cross-Lingual Emotion Transfer with Controllable Intensity Using Retrieval-Augmented Generation
by: Zuo, Tianlun, et al.
Published: (2025)

A Layer-Anchoring Strategy for Enhancing Cross-Lingual Speech Emotion Recognition
by: Upadhyay, Shreya G., et al.
Published: (2024)

Strong Alone, Stronger Together: Synergizing Modality-Binding Foundation Models with Optimal Transport for Non-Verbal Emotion Recognition
by: Phukan, Orchid Chetia, et al.
Published: (2024)

Quantifying Cross-Lingual Transfer in Paralinguistic Speech Tasks
by: Buitrago, Pol, et al.
Published: (2026)

Towards Improved Speech Recognition through Optimized Synthetic Data Generation
by: Perrin, Yanis, et al.
Published: (2025)

ED-TTS: Multi-Scale Emotion Modeling using Cross-Domain Emotion Diarization for Emotional Speech Synthesis
by: Tang, Haobin, et al.
Published: (2024)

Towards Frame-level Quality Predictions of Synthetic Speech
by: Kuhlmann, Michael, et al.
Published: (2025)

Improved Cross-Lingual Transfer Learning For Automatic Speech Translation
by: Khurana, Sameer, et al.
Published: (2023)

ASR for Affective Speech: Investigating Impact of Emotion and Speech Generative Strategy
by: Wu, Ya-Tse, et al.
Published: (2026)

Towards Explainable Spoofed Speech Attribution and Detection: a Probabilistic Approach for Characterizing Speech Synthesizer Components
by: Mishra, Jagabandhu, et al.
Published: (2025)

XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception
by: Han, HyoJung, et al.
Published: (2024)

One Voice, Many Tongues: Cross-Lingual Voice Cloning for Scientific Speech
by: Abebe, Amanuel Gizachew, et al.
Published: (2026)

Toward using Speech to Sense Student Emotion in Remote Learning Environments
by: Vyas, Sargam, et al.
Published: (2026)

Emotion-Aware Contrastive Adaptation Network for Source-Free Cross-Corpus Speech Emotion Recognition
by: Zhao, Yan, et al.
Published: (2024)

ComFeAT: Combination of Neural and Spectral Features for Improved Depression Detection
by: Phukan, Orchid Chetia, et al.
Published: (2024)