| Main Authors: | Akhtar, Mohd Mujtaba; Girish; Sheth, Farhan; Singh, Muskaan |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2601.07064 |
Similar Items
Towards Attribution of Generators and Emotional Manipulation in Cross-Lingual Synthetic Speech using Geometric Learning
by: Girish, et al.
Published: (2025)
Curved Worlds, Clear Boundaries: Generalizing Speech Deepfake Detection using Hyperbolic and Spherical Geometry Spaces
by: Sheth, Farhan, et al.
Published: (2025)
Prosody as Supervision: Bridging the Non-Verbal--Verbal for Multilingual Speech Emotion Recognition
by: Girish, et al.
Published: (2026)
HCFD: A Benchmark for Audio Deepfake Detection in Healthcare
by: Akhtar, Mohd Mujtaba, et al.
Published: (2026)
DIVINE: Coordinating Multimodal Disentangled Representations for Oro-Facial Neurological Disorder Assessment
by: Akhtar, Mohd Mujtaba, et al.
Published: (2026)
NeuRO: An Application for Code-Switched Autism Detection in Children
by: Akhtar, Mohd Mujtaba, et al.
Published: (2024)
Enhancing In-Domain and Out-Domain EmoFake Detection via Cooperative Multilingual Speech Foundation Models
by: Phukan, Orchid Chetia, et al.
Published: (2025)
Indic-CodecFake meets SATYAM: Towards Detecting Neural Audio Codec Synthesized Speech Deepfakes in Indic Languages
by: Girish, et al.
Published: (2026)
Towards Machine Unlearning for Paralinguistic Speech Processing
by: Phukan, Orchid Chetia, et al.
Published: (2025)
Source Tracing of Synthetic Speech Systems Through Paralinguistic Pre-Trained Representations
by: Girish, et al.
Published: (2025)
HYFuse: Aligning Heterogeneous Speech Pre-Trained Representations in Hyperbolic Space for Speech Emotion Recognition
by: Phukan, Orchid Chetia, et al.
Published: (2025)
Rethinking Cross-Corpus Speech Emotion Recognition Benchmarking: Are Paralinguistic Pre-Trained Representations Sufficient?
by: Phukan, Orchid Chetia, et al.
Published: (2025)
Towards Neural Audio Codec Source Parsing
by: Phukan, Orchid Chetia, et al.
Published: (2025)
Investigating Polyglot Speech Foundation Models for Learning Collective Emotion from Crowds
by: Phukan, Orchid Chetia, et al.
Published: (2025)
Towards Source Attribution of Singing Voice Deepfake with Multimodal Foundation Models
by: Phukan, Orchid Chetia, et al.
Published: (2025)
PARROT: Synergizing Mamba and Attention-based SSL Pre-Trained Models via Parallel Branch Hadamard Optimal Transport for Speech Emotion Recognition
by: Phukan, Orchid Chetia, et al.
Published: (2025)
Beyond Speech and More: Investigating the Emergent Ability of Speech Foundation Models for Classifying Physiological Time-Series Signals
by: Phukan, Orchid Chetia, et al.
Published: (2024)
Are Multimodal Foundation Models All That Is Needed for Emofake Detection?
by: Akhtar, Mohd Mujtaba, et al.
Published: (2025)
Speech Recognition Transformers: Topological-lingualism Perspective
by: Singh, Shruti, et al.
Published: (2024)
Investigating the Reasonable Effectiveness of Speaker Pre-Trained Models and their Synergistic Power for SingMOS Prediction
by: Phukan, Orchid Chetia, et al.
Published: (2025)
Towards Fusion of Neural Audio Codec-based Representations with Spectral for Heart Murmur Classification via Bandit-based Cross-Attention Mechanism
by: Phukan, Orchid Chetia, et al.
Published: (2025)
Are Mamba-based Audio Foundation Models the Best Fit for Non-Verbal Emotion Recognition?
by: Akhtar, Mohd Mujtaba, et al.
Published: (2025)
Fine-Tuning ASR for Stuttered Speech: Personalized vs. Generalized Approaches
by: Mujtaba, Dena, et al.
Published: (2025)
ComFeAT: Combination of Neural and Spectral Features for Improved Depression Detection
by: Phukan, Orchid Chetia, et al.
Published: (2024)
SynHate: Detecting Hate Speech in Synthetic Deepfake Audio
by: Ranjan, Rishabh, et al.
Published: (2025)
Representation Loss Minimization with Randomized Selection Strategy for Efficient Environmental Fake Audio Detection
by: Phukan, Orchid Chetia, et al.
Published: (2024)
Speech-to-See: End-to-End Speech-Driven Open-Set Object Detection
by: Lu, Wenhuan, et al.
Published: (2025)
Neural Encoding Detection is Not All You Need for Synthetic Speech Detection
by: Cuccovillo, Luca, et al.
Published: (2026)
Advancing Zero-Shot Open-Set Speech Deepfake Source Tracing
by: Chhibber, Manasi, et al.
Published: (2025)
Augmenting Polish Automatic Speech Recognition System With Synthetic Data
by: Bondaruk, Łukasz, et al.
Published: (2024)
Towards Explainable Spoofed Speech Attribution and Detection: a Probabilistic Approach for Characterizing Speech Synthesizer Components
by: Mishra, Jagabandhu, et al.
Published: (2025)
Augmenting Open-Vocabulary Dysarthric Speech Assessment with Human Perceptual Supervision
by: Jia, Kaimeng, et al.
Published: (2025)
Speaker Attributed Automatic Speech Recognition Using Speech Aware LLMS
by: Aronowitz, Hagai, et al.
Published: (2026)
Lightweight Model Attribution and Detection of Synthetic Speech via Audio Residual Fingerprints
by: Pizarro, Matías, et al.
Published: (2024)
Strong Alone, Stronger Together: Synergizing Modality-Binding Foundation Models with Optimal Transport for Non-Verbal Emotion Recognition
by: Phukan, Orchid Chetia, et al.
Published: (2024)
SpeechT-RAG: Reliable Depression Detection in LLMs with Retrieval-Augmented Generation Using Speech Timing Information
by: Zhang, Xiangyu, et al.
Published: (2025)
FakeMark: Deepfake Speech Attribution With Watermarked Artifacts
by: Ge, Wanying, et al.
Published: (2025)
Schrödinger Bridge for Generative Speech Enhancement
by: Jukić, Ante, et al.
Published: (2024)
Creating Personalized Synthetic Voices from Articulation Impaired Speech Using Augmented Reconstruction Loss
by: Tian, Yusheng, et al.
Published: (2024)
Dual-Branch Knowledge Distillation for Noise-Robust Synthetic Speech Detection
by: Fan, Cunhang, et al.
Published: (2023)