| Main Authors: | Akhtar, Mohd Mujtaba, Girish, Phukan, Orchid Chetia, Behera, Swarup Ranjan, Reddy, Pailla Balakrishna, Nayak, Ananda Chandra, Nayak, Sanjib Kumar, Buduru, Arun Balaji |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.16193 |
Similar Items
Are Mamba-based Audio Foundation Models the Best Fit for Non-Verbal Emotion Recognition?
by: Akhtar, Mohd Mujtaba, et al.
Published: (2025)
HYFuse: Aligning Heterogeneous Speech Pre-Trained Representations in Hyperbolic Space for Speech Emotion Recognition
by: Phukan, Orchid Chetia, et al.
Published: (2025)
Rethinking Cross-Corpus Speech Emotion Recognition Benchmarking: Are Paralinguistic Pre-Trained Representations Sufficient?
by: Phukan, Orchid Chetia, et al.
Published: (2025)
Investigating Polyglot Speech Foundation Models for Learning Collective Emotion from Crowds
by: Phukan, Orchid Chetia, et al.
Published: (2025)
Investigating the Reasonable Effectiveness of Speaker Pre-Trained Models and their Synergistic Power for SingMOS Prediction
by: Phukan, Orchid Chetia, et al.
Published: (2025)
Towards Source Attribution of Singing Voice Deepfake with Multimodal Foundation Models
by: Phukan, Orchid Chetia, et al.
Published: (2025)
Source Tracing of Synthetic Speech Systems Through Paralinguistic Pre-Trained Representations
by: Girish, et al.
Published: (2025)
Enhancing In-Domain and Out-Domain EmoFake Detection via Cooperative Multilingual Speech Foundation Models
by: Phukan, Orchid Chetia, et al.
Published: (2025)
Beyond Speech and More: Investigating the Emergent Ability of Speech Foundation Models for Classifying Physiological Time-Series Signals
by: Phukan, Orchid Chetia, et al.
Published: (2024)
Indic-CodecFake meets SATYAM: Towards Detecting Neural Audio Codec Synthesized Speech Deepfakes in Indic Languages
by: Girish, et al.
Published: (2026)
Towards Neural Audio Codec Source Parsing
by: Phukan, Orchid Chetia, et al.
Published: (2025)
PARROT: Synergizing Mamba and Attention-based SSL Pre-Trained Models via Parallel Branch Hadamard Optimal Transport for Speech Emotion Recognition
by: Phukan, Orchid Chetia, et al.
Published: (2025)
Towards Fusion of Neural Audio Codec-based Representations with Spectral for Heart Murmur Classification via Bandit-based Cross-Attention Mechanism
by: Phukan, Orchid Chetia, et al.
Published: (2025)
Strong Alone, Stronger Together: Synergizing Modality-Binding Foundation Models with Optimal Transport for Non-Verbal Emotion Recognition
by: Phukan, Orchid Chetia, et al.
Published: (2024)
Towards Machine Unlearning for Paralinguistic Speech Processing
by: Phukan, Orchid Chetia, et al.
Published: (2025)
SNIFR : Boosting Fine-Grained Child Harmful Content Detection Through Audio-Visual Alignment with Cascaded Cross-Transformer
by: Phukan, Orchid Chetia, et al.
Published: (2025)
Representation Loss Minimization with Randomized Selection Strategy for Efficient Environmental Fake Audio Detection
by: Phukan, Orchid Chetia, et al.
Published: (2024)
Multi-View Multi-Task Modeling with Speech Foundation Models for Speech Forensic Tasks
by: Phukan, Orchid Chetia, et al.
Published: (2024)
Investigating Prosodic Signatures via Speech Pre-Trained Models for Audio Deepfake Source Attribution
by: Phukan, Orchid Chetia, et al.
Published: (2024)
SeQuiFi: Mitigating Catastrophic Forgetting in Speech Emotion Recognition with Sequential Class-Finetuning
by: Jain, Sarthak, et al.
Published: (2024)
NeuRO: An Application for Code-Switched Autism Detection in Children
by: Akhtar, Mohd Mujtaba, et al.
Published: (2024)
Are Music Foundation Models Better at Singing Voice Deepfake Detection? Far-Better Fuse them with Speech Foundation Models
by: Phukan, Orchid Chetia, et al.
Published: (2024)
CoLLAB: A Collaborative Approach for Multilingual Abuse Detection
by: Phukan, Orchid Chetia, et al.
Published: (2024)
Towards Multilingual Audio-Visual Question Answering
by: Phukan, Orchid Chetia, et al.
Published: (2024)
AVR: Synergizing Foundation Models for Audio-Visual Humor Detection
by: Sharma, Sarthak, et al.
Published: (2024)
The Reasonable Effectiveness of Speaker Embeddings for Violence Detection
by: Jain, Sarthak, et al.
Published: (2024)
Heterogeneity over Homogeneity: Investigating Multilingual Speech Pre-Trained Models for Detecting Audio Deepfake
by: Phukan, Orchid Chetia, et al.
Published: (2024)
Are Paralinguistic Representations all that is needed for Speech Emotion Recognition?
by: Phukan, Orchid Chetia, et al.
Published: (2024)
PERSONA: An Application for Emotion Recognition, Gender Recognition and Age Estimation
by: Koshal, Devyani, et al.
Published: (2024)
VoxMed: One-Step Respiratory Disease Classifier using Digital Stethoscope Sounds
by: Mundra, Paridhi, et al.
Published: (2024)
ASGIR: Audio Spectrogram Transformer Guided Classification And Information Retrieval For Birds
by: Chaudhuri, Yashwardhan, et al.
Published: (2024)
Avengers Assemble: Amalgamation of Non-Semantic Features for Depression Detection
by: Phukan, Orchid Chetia, et al.
Published: (2024)
ComFeAT: Combination of Neural and Spectral Features for Improved Depression Detection
by: Phukan, Orchid Chetia, et al.
Published: (2024)
Modality-Order Matters! A Novel Hierarchical Feature Fusion Method for CoSAm: A Code-Switched Autism Corpus
by: Akhtar, Mohd Mujtaba, et al.
Published: (2024)
SONIC: Synergizing VisiON Foundation Models for Stress RecogNItion from ECG signals
by: Phukan, Orchid Chetia, et al.
Published: (2024)
Prosody as Supervision: Bridging the Non-Verbal--Verbal for Multilingual Speech Emotion Recognition
by: Girish, et al.
Published: (2026)
HCFD: A Benchmark for Audio Deepfake Detection in Healthcare
by: Akhtar, Mohd Mujtaba, et al.
Published: (2026)
AQUALLM: Audio Question Answering Data Generation Using Large Language Models
by: Behera, Swarup Ranjan, et al.
Published: (2023)
DIVINE: Coordinating Multimodal Disentangled Representations for Oro-Facial Neurological Disorder Assessment
by: Akhtar, Mohd Mujtaba, et al.
Published: (2026)
A Lightweight Feature Fusion Architecture For Resource-Constrained Crowd Counting
by: Chaudhuri, Yashwardhan, et al.
Published: (2024)