| Main Authors: | Girish; Akhtar, Mohd Mujtaba; Sheth, Farhan; Singh, Muskaan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.10790 |
Similar Items
Bridging Attribution and Open-Set Detection using Graph-Augmented Instance Learning in Synthetic Speech
by: Akhtar, Mohd Mujtaba, et al.
Published: (2026)
Curved Worlds, Clear Boundaries: Generalizing Speech Deepfake Detection using Hyperbolic and Spherical Geometry Spaces
by: Sheth, Farhan, et al.
Published: (2025)
Prosody as Supervision: Bridging the Non-Verbal–Verbal for Multilingual Speech Emotion Recognition
by: Girish, et al.
Published: (2026)
HCFD: A Benchmark for Audio Deepfake Detection in Healthcare
by: Akhtar, Mohd Mujtaba, et al.
Published: (2026)
DIVINE: Coordinating Multimodal Disentangled Representations for Oro-Facial Neurological Disorder Assessment
by: Akhtar, Mohd Mujtaba, et al.
Published: (2026)
NeuRO: An Application for Code-Switched Autism Detection in Children
by: Akhtar, Mohd Mujtaba, et al.
Published: (2024)
Towards Machine Unlearning for Paralinguistic Speech Processing
by: Phukan, Orchid Chetia, et al.
Published: (2025)
Indic-CodecFake meets SATYAM: Towards Detecting Neural Audio Codec Synthesized Speech Deepfakes in Indic Languages
by: Girish, et al.
Published: (2026)
Rethinking Cross-Corpus Speech Emotion Recognition Benchmarking: Are Paralinguistic Pre-Trained Representations Sufficient?
by: Phukan, Orchid Chetia, et al.
Published: (2025)
Enhancing In-Domain and Out-Domain EmoFake Detection via Cooperative Multilingual Speech Foundation Models
by: Phukan, Orchid Chetia, et al.
Published: (2025)
HYFuse: Aligning Heterogeneous Speech Pre-Trained Representations in Hyperbolic Space for Speech Emotion Recognition
by: Phukan, Orchid Chetia, et al.
Published: (2025)
Source Tracing of Synthetic Speech Systems Through Paralinguistic Pre-Trained Representations
by: Girish, et al.
Published: (2025)
Towards Neural Audio Codec Source Parsing
by: Phukan, Orchid Chetia, et al.
Published: (2025)
Investigating Polyglot Speech Foundation Models for Learning Collective Emotion from Crowds
by: Phukan, Orchid Chetia, et al.
Published: (2025)
PARROT: Synergizing Mamba and Attention-based SSL Pre-Trained Models via Parallel Branch Hadamard Optimal Transport for Speech Emotion Recognition
by: Phukan, Orchid Chetia, et al.
Published: (2025)
Towards Source Attribution of Singing Voice Deepfake with Multimodal Foundation Models
by: Phukan, Orchid Chetia, et al.
Published: (2025)
Towards Fusion of Neural Audio Codec-based Representations with Spectral for Heart Murmur Classification via Bandit-based Cross-Attention Mechanism
by: Phukan, Orchid Chetia, et al.
Published: (2025)
Are Mamba-based Audio Foundation Models the Best Fit for Non-Verbal Emotion Recognition?
by: Akhtar, Mohd Mujtaba, et al.
Published: (2025)
Beyond Speech and More: Investigating the Emergent Ability of Speech Foundation Models for Classifying Physiological Time-Series Signals
by: Phukan, Orchid Chetia, et al.
Published: (2024)
Speech Recognition Transformers: Topological-lingualism Perspective
by: Singh, Shruti, et al.
Published: (2024)
Investigating the Reasonable Effectiveness of Speaker Pre-Trained Models and their Synergistic Power for SingMOS Prediction
by: Phukan, Orchid Chetia, et al.
Published: (2025)
Semantic-Emotional Resonance Embedding: A Semi-Supervised Paradigm for Cross-Lingual Speech Emotion Recognition
by: Zhao, Ya, et al.
Published: (2026)
VECL-TTS: Voice identity and Emotional style controllable Cross-Lingual Text-to-Speech
by: Gudmalwar, Ashishkumar, et al.
Published: (2024)
Are Multimodal Foundation Models All That Is Needed for Emofake Detection?
by: Akhtar, Mohd Mujtaba, et al.
Published: (2025)
Fine-Tuning ASR for Stuttered Speech: Personalized vs. Generalized Approaches
by: Mujtaba, Dena, et al.
Published: (2025)
XEmoRAG: Cross-Lingual Emotion Transfer with Controllable Intensity Using Retrieval-Augmented Generation
by: Zuo, Tianlun, et al.
Published: (2025)
A Layer-Anchoring Strategy for Enhancing Cross-Lingual Speech Emotion Recognition
by: Upadhyay, Shreya G., et al.
Published: (2024)
Strong Alone, Stronger Together: Synergizing Modality-Binding Foundation Models with Optimal Transport for Non-Verbal Emotion Recognition
by: Phukan, Orchid Chetia, et al.
Published: (2024)
Quantifying Cross-Lingual Transfer in Paralinguistic Speech Tasks
by: Buitrago, Pol, et al.
Published: (2026)
Towards Improved Speech Recognition through Optimized Synthetic Data Generation
by: Perrin, Yanis, et al.
Published: (2025)
ED-TTS: Multi-Scale Emotion Modeling using Cross-Domain Emotion Diarization for Emotional Speech Synthesis
by: Tang, Haobin, et al.
Published: (2024)
Towards Frame-level Quality Predictions of Synthetic Speech
by: Kuhlmann, Michael, et al.
Published: (2025)
Improved Cross-Lingual Transfer Learning For Automatic Speech Translation
by: Khurana, Sameer, et al.
Published: (2023)
ASR for Affective Speech: Investigating Impact of Emotion and Speech Generative Strategy
by: Wu, Ya-Tse, et al.
Published: (2026)
Towards Explainable Spoofed Speech Attribution and Detection: a Probabilistic Approach for Characterizing Speech Synthesizer Components
by: Mishra, Jagabandhu, et al.
Published: (2025)
XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception
by: Han, HyoJung, et al.
Published: (2024)
One Voice, Many Tongues: Cross-Lingual Voice Cloning for Scientific Speech
by: Abebe, Amanuel Gizachew, et al.
Published: (2026)
Toward using Speech to Sense Student Emotion in Remote Learning Environments
by: Vyas, Sargam, et al.
Published: (2026)
Emotion-Aware Contrastive Adaptation Network for Source-Free Cross-Corpus Speech Emotion Recognition
by: Zhao, Yan, et al.
Published: (2024)
ComFeAT: Combination of Neural and Spectral Features for Improved Depression Detection
by: Phukan, Orchid Chetia, et al.
Published: (2024)