| Main Authors: | Akhtar, Mohd Mujtaba, Girish, Singh, Muskaan |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2601.07014 |
Similar Items
Prosody as Supervision: Bridging the Non-Verbal--Verbal for Multilingual Speech Emotion Recognition
by: Girish, et al.
Published: (2026)
HCFD: A Benchmark for Audio Deepfake Detection in Healthcare
by: Akhtar, Mohd Mujtaba, et al.
Published: (2026)
Towards Machine Unlearning for Paralinguistic Speech Processing
by: Phukan, Orchid Chetia, et al.
Published: (2025)
Bridging Attribution and Open-Set Detection using Graph-Augmented Instance Learning in Synthetic Speech
by: Akhtar, Mohd Mujtaba, et al.
Published: (2026)
Towards Attribution of Generators and Emotional Manipulation in Cross-Lingual Synthetic Speech using Geometric Learning
by: Girish, et al.
Published: (2025)
Curved Worlds, Clear Boundaries: Generalizing Speech Deepfake Detection using Hyperbolic and Spherical Geometry Spaces
by: Sheth, Farhan, et al.
Published: (2025)
NeuRO: An Application for Code-Switched Autism Detection in Children
by: Akhtar, Mohd Mujtaba, et al.
Published: (2024)
Towards Neural Audio Codec Source Parsing
by: Phukan, Orchid Chetia, et al.
Published: (2025)
Source Tracing of Synthetic Speech Systems Through Paralinguistic Pre-Trained Representations
by: Girish, et al.
Published: (2025)
Towards Fusion of Neural Audio Codec-based Representations with Spectral for Heart Murmur Classification via Bandit-based Cross-Attention Mechanism
by: Phukan, Orchid Chetia, et al.
Published: (2025)
Towards Source Attribution of Singing Voice Deepfake with Multimodal Foundation Models
by: Phukan, Orchid Chetia, et al.
Published: (2025)
Investigating the Reasonable Effectiveness of Speaker Pre-Trained Models and their Synergistic Power for SingMOS Prediction
by: Phukan, Orchid Chetia, et al.
Published: (2025)
PARROT: Synergizing Mamba and Attention-based SSL Pre-Trained Models via Parallel Branch Hadamard Optimal Transport for Speech Emotion Recognition
by: Phukan, Orchid Chetia, et al.
Published: (2025)
Are Mamba-based Audio Foundation Models the Best Fit for Non-Verbal Emotion Recognition?
by: Akhtar, Mohd Mujtaba, et al.
Published: (2025)
Investigating Polyglot Speech Foundation Models for Learning Collective Emotion from Crowds
by: Phukan, Orchid Chetia, et al.
Published: (2025)
Speech Recognition Transformers: Topological-lingualism Perspective
by: Singh, Shruti, et al.
Published: (2024)
Representation Loss Minimization with Randomized Selection Strategy for Efficient Environmental Fake Audio Detection
by: Phukan, Orchid Chetia, et al.
Published: (2024)
Learning Multidimensional Disentangled Representations of Instrumental Sounds for Musical Similarity Assessment
by: Hashizume, Yuka, et al.
Published: (2024)
ComFeAT: Combination of Neural and Spectral Features for Improved Depression Detection
by: Phukan, Orchid Chetia, et al.
Published: (2024)
Fine-Tuning ASR for Stuttered Speech: Personalized vs. Generalized Approaches
by: Mujtaba, Dena, et al.
Published: (2025)
HASRD: Hierarchical Acoustic and Semantic Representation Disentanglement
by: Hussein, Amir, et al.
Published: (2025)
Disentangled Representation Learning for Environment-agnostic Speaker Recognition
by: Nam, KiHyun, et al.
Published: (2024)
Strong Alone, Stronger Together: Synergizing Modality-Binding Foundation Models with Optimal Transport for Non-Verbal Emotion Recognition
by: Phukan, Orchid Chetia, et al.
Published: (2024)
Indic-CodecFake meets SATYAM: Towards Detecting Neural Audio Codec Synthesized Speech Deepfakes in Indic Languages
by: Girish, et al.
Published: (2026)
Enhancing In-Domain and Out-Domain EmoFake Detection via Cooperative Multilingual Speech Foundation Models
by: Phukan, Orchid Chetia, et al.
Published: (2025)
Geometric Analysis of Speech Representation Spaces: Topological Disentanglement and Confound Detection
by: Kashyap, Bipasha, et al.
Published: (2026)
Balancing Information Preservation and Disentanglement in Self-Supervised Music Representation Learning
by: Wilkins, Julia, et al.
Published: (2025)
Learning Disentangled Speech Representations with Contrastive Learning and Time-Invariant Retrieval
by: Deng, Yimin, et al.
Published: (2024)
Self-Supervised Multi-View Learning for Disentangled Music Audio Representations
by: Wilkins, Julia, et al.
Published: (2024)
Self-supervised Multimodal Speech Representations for the Assessment of Schizophrenia Symptoms
by: Premananth, Gowtham, et al.
Published: (2024)
Quantifying Dimensional Independence in Speech: An Information-Theoretic Framework for Disentangled Representation Learning
by: Kashyap, Bipasha, et al.
Published: (2026)
Learning Expressive Disentangled Speech Representations with Soft Speech Units and Adversarial Style Augmentation
by: Deng, Yimin, et al.
Published: (2024)
Audio-text Retrieval with Transformer-based Hierarchical Alignment and Disentangled Cross-modal Representation
by: Xin, Yifei, et al.
Published: (2024)
Disentangled Acoustic Fields For Multimodal Physical Scene Understanding
by: Yin, Jie, et al.
Published: (2024)
Learning Disentangled Speech Representations
by: Brima, Yusuf, et al.
Published: (2023)
A Frequency-aware Augmentation Network for Mental Disorders Assessment from Audio
by: Li, Shuanglin, et al.
Published: (2025)
Deep Speech Synthesis from Multimodal Articulatory Representations
by: Wu, Peter, et al.
Published: (2024)
Bird Vocalization Embedding Extraction Using Self-Supervised Disentangled Representation Learning
by: Shi, Runwu, et al.
Published: (2024)
Evaluating Disentangled Representations for Controllable Music Generation
by: Ibáñez-Martínez, Laura, et al.
Published: (2026)
Distillation and Pruning for Scalable Self-Supervised Representation-Based Speech Quality Assessment
by: Stahl, Benjamin, et al.
Published: (2025)