| Main Authors: | Kankanala, Sai Samrat, Chandra, Ram, Ganapathy, Sriram |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.17965 |
Similar Items
Uncovering the role of semantic and acoustic cues in normal and dichotic listening
by: Kankanala, Sai Samrat, et al.
Published: (2024)
Leveraging Content and Acoustic Representations for Speech Emotion Recognition
by: Dutta, Soumya, et al.
Published: (2024)
Spoken Language Understanding on Unseen Tasks With In-Context Learning
by: Agrawal, Neeraj, et al.
Published: (2025)
Textless and Non-Parallel Speech-to-Speech Emotion Style Transfer
by: Dutta, Soumya, et al.
Published: (2025)
ULTRAS -- Unified Learning of Transformer Representations for Audio and Speech Signals
by: E, Ameenudeen P, et al.
Published: (2026)
LLM supervised Pre-training for Multimodal Emotion Recognition in Conversations
by: Dutta, Soumya, et al.
Published: (2025)
STAB: Speech Tokenizer Assessment Benchmark
by: Vashishth, Shikhar, et al.
Published: (2024)
Towards the Next Frontier in Speech Representation Learning Using Disentanglement
by: Krishna, Varun, et al.
Published: (2024)
HCAM -- Hierarchical Cross Attention Model for Multi-modal Emotion Recognition
by: Dutta, Soumya, et al.
Published: (2023)
Zero Shot Audio to Audio Emotion Transfer With Speaker Disentanglement
by: Dutta, Soumya, et al.
Published: (2024)
End-to-End Supervised Hierarchical Graph Clustering for Speaker Diarization
by: Singh, Prachi, et al.
Published: (2024)
Benchmarking Speech Systems for Frontline Health Conversations: The DISPLACE-M Challenge
by: E, Dhanya, et al.
Published: (2026)
A Mixture-of-Experts Model for Multimodal Emotion Recognition in Conversations
by: Dutta, Soumya, et al.
Published: (2026)
ABHINAYA -- A System for Speech Emotion Recognition In Naturalistic Conditions Challenge
by: Dutta, Soumya, et al.
Published: (2025)
Benchmarking Large Pretrained Multilingual Models on Québec French Speech Recognition
by: Serrand, Coralie, et al.
Published: (2025)
Benchmarking and Confidence Evaluation of LALMs For Temporal Reasoning
by: Bhattacharya, Debarpan, et al.
Published: (2025)
Comparison of Speech Tasks in Human Expert and Machine Detection of Parkinson's Disease
by: Plantinga, Peter, et al.
Published: (2025)
MINT-Bench: A Comprehensive Multilingual Benchmark for Instruction-Following Text-to-Speech
by: Chen, Huakang, et al.
Published: (2026)
Voices of Civilizations: A Multilingual QA Benchmark for Global Music Understanding
by: Wu, Shangda, et al.
Published: (2026)
Summary on The Multilingual Conversational Speech Language Model Challenge: Datasets, Tasks, Baselines, and Methods
by: Mu, Bingshen, et al.
Published: (2025)
EmoSSLSphere: Multilingual Emotional Speech Synthesis with Spherical Vectors and Discrete Speech Tokens
by: Park, Joonyong, et al.
Published: (2025)
Towards Fine-Grained Multi-Dimensional Speech Understanding: Data Pipeline, Benchmark, and Model
by: Li, Guojian, et al.
Published: (2026)
ML-SUPERB: Multilingual Speech Universal PERformance Benchmark
by: Shi, Jiatong, et al.
Published: (2023)
Attempt Towards Stress Transfer in Speech-to-Speech Machine Translation
by: Akarsh, Sai, et al.
Published: (2024)
Open-Source System for Multilingual Translation and Cloned Speech Synthesis
by: Cámara, Mateo, et al.
Published: (2025)
VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling
by: Zhou, Yixuan, et al.
Published: (2024)
Multilingual Source Tracing of Speech Deepfakes: A First Benchmark
by: Xuan, Xi, et al.
Published: (2025)
Teaching a Multilingual Large Language Model to Understand Multilingual Speech via Multi-Instructional Training
by: Denisov, Pavel, et al.
Published: (2024)
Prosody as Supervision: Bridging the Non-Verbal--Verbal for Multilingual Speech Emotion Recognition
by: Girish, et al.
Published: (2026)
S2ST-Omni: Hierarchical Language-Aware SpeechLLM Adaptation for Multilingual Speech-to-Speech Translation
by: Pan, Yu, et al.
Published: (2025)
Layer-wise Analysis for Quality of Multilingual Synthesized Speech
by: Cooper, Erica, et al.
Published: (2025)
Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis
by: Liao, Shijia, et al.
Published: (2024)
Multilingual Speech Recognition Using Discrete Tokens with a Two-step Training Strategy
by: Li, Zehan, et al.
Published: (2025)
P.808 Multilingual Speech Enhancement Testing: Approach and Results of URGENT 2025 Challenge
by: Sach, Marvin, et al.
Published: (2025)
TASU: Text-Only Alignment for Speech Understanding
by: Peng, Jing, et al.
Published: (2025)
Objective Soups: Multilingual Multi-Task Modeling for Speech Processing
by: Saif, A F M, et al.
Published: (2025)
Benchmarking Neural Speech Codec Intelligibility with SITool
by: Leschanowsky, Anna, et al.
Published: (2025)
A Multilingual Framework for Dysarthria: Detection, Severity Classification, Speech-to-Text, and Clean Speech Generation
by: Raghu, Ananya, et al.
Published: (2025)
Heterogeneity over Homogeneity: Investigating Multilingual Speech Pre-Trained Models for Detecting Audio Deepfake
by: Phukan, Orchid Chetia, et al.
Published: (2024)
A Survey on Speech Large Language Models for Understanding
by: Peng, Jing, et al.
Published: (2024)