:: Library Catalog

Bibliographic Details
Main Authors:	Kankanala, Sai Samrat, Chandra, Ram, Ganapathy, Sriram
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2509.17965

Similar Items

Uncovering the role of semantic and acoustic cues in normal and dichotic listening
by: Kankanala, Sai Samrat, et al.
Published: (2024)

Leveraging Content and Acoustic Representations for Speech Emotion Recognition
by: Dutta, Soumya, et al.
Published: (2024)

Spoken Language Understanding on Unseen Tasks With In-Context Learning
by: Agrawal, Neeraj, et al.
Published: (2025)

Textless and Non-Parallel Speech-to-Speech Emotion Style Transfer
by: Dutta, Soumya, et al.
Published: (2025)

ULTRAS -- Unified Learning of Transformer Representations for Audio and Speech Signals
by: E, Ameenudeen P, et al.
Published: (2026)

LLM supervised Pre-training for Multimodal Emotion Recognition in Conversations
by: Dutta, Soumya, et al.
Published: (2025)

STAB: Speech Tokenizer Assessment Benchmark
by: Vashishth, Shikhar, et al.
Published: (2024)

Towards the Next Frontier in Speech Representation Learning Using Disentanglement
by: Krishna, Varun, et al.
Published: (2024)

HCAM -- Hierarchical Cross Attention Model for Multi-modal Emotion Recognition
by: Dutta, Soumya, et al.
Published: (2023)

Zero Shot Audio to Audio Emotion Transfer With Speaker Disentanglement
by: Dutta, Soumya, et al.
Published: (2024)

End-to-End Supervised Hierarchical Graph Clustering for Speaker Diarization
by: Singh, Prachi, et al.
Published: (2024)

Benchmarking Speech Systems for Frontline Health Conversations: The DISPLACE-M Challenge
by: E, Dhanya, et al.
Published: (2026)

A Mixture-of-Experts Model for Multimodal Emotion Recognition in Conversations
by: Dutta, Soumya, et al.
Published: (2026)

ABHINAYA -- A System for Speech Emotion Recognition In Naturalistic Conditions Challenge
by: Dutta, Soumya, et al.
Published: (2025)

Benchmarking Large Pretrained Multilingual Models on Québec French Speech Recognition
by: Serrand, Coralie, et al.
Published: (2025)

Benchmarking and Confidence Evaluation of LALMs For Temporal Reasoning
by: Bhattacharya, Debarpan, et al.
Published: (2025)

Comparison of Speech Tasks in Human Expert and Machine Detection of Parkinson's Disease
by: Plantinga, Peter, et al.
Published: (2025)

MINT-Bench: A Comprehensive Multilingual Benchmark for Instruction-Following Text-to-Speech
by: Chen, Huakang, et al.
Published: (2026)

Voices of Civilizations: A Multilingual QA Benchmark for Global Music Understanding
by: Wu, Shangda, et al.
Published: (2026)

Summary on The Multilingual Conversational Speech Language Model Challenge: Datasets, Tasks, Baselines, and Methods
by: Mu, Bingshen, et al.
Published: (2025)

EmoSSLSphere: Multilingual Emotional Speech Synthesis with Spherical Vectors and Discrete Speech Tokens
by: Park, Joonyong, et al.
Published: (2025)

Towards Fine-Grained Multi-Dimensional Speech Understanding: Data Pipeline, Benchmark, and Model
by: Li, Guojian, et al.
Published: (2026)

ML-SUPERB: Multilingual Speech Universal PERformance Benchmark
by: Shi, Jiatong, et al.
Published: (2023)

Attempt Towards Stress Transfer in Speech-to-Speech Machine Translation
by: Akarsh, Sai, et al.
Published: (2024)

Open-Source System for Multilingual Translation and Cloned Speech Synthesis
by: Cámara, Mateo, et al.
Published: (2025)

VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling
by: Zhou, Yixuan, et al.
Published: (2024)

Multilingual Source Tracing of Speech Deepfakes: A First Benchmark
by: Xuan, Xi, et al.
Published: (2025)

Teaching a Multilingual Large Language Model to Understand Multilingual Speech via Multi-Instructional Training
by: Denisov, Pavel, et al.
Published: (2024)

Prosody as Supervision: Bridging the Non-Verbal--Verbal for Multilingual Speech Emotion Recognition
by: Girish, et al.
Published: (2026)

S2ST-Omni: Hierarchical Language-Aware SpeechLLM Adaptation for Multilingual Speech-to-Speech Translation
by: Pan, Yu, et al.
Published: (2025)

Layer-wise Analysis for Quality of Multilingual Synthesized Speech
by: Cooper, Erica, et al.
Published: (2025)

Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis
by: Liao, Shijia, et al.
Published: (2024)

Multilingual Speech Recognition Using Discrete Tokens with a Two-step Training Strategy
by: Li, Zehan, et al.
Published: (2025)

P.808 Multilingual Speech Enhancement Testing: Approach and Results of URGENT 2025 Challenge
by: Sach, Marvin, et al.
Published: (2025)

TASU: Text-Only Alignment for Speech Understanding
by: Peng, Jing, et al.
Published: (2025)

Objective Soups: Multilingual Multi-Task Modeling for Speech Processing
by: Saif, A F M, et al.
Published: (2025)

Benchmarking Neural Speech Codec Intelligibility with SITool
by: Leschanowsky, Anna, et al.
Published: (2025)

A Multilingual Framework for Dysarthria: Detection, Severity Classification, Speech-to-Text, and Clean Speech Generation
by: Raghu, Ananya, et al.
Published: (2025)

Heterogeneity over Homogeneity: Investigating Multilingual Speech Pre-Trained Models for Detecting Audio Deepfake
by: Phukan, Orchid Chetia, et al.
Published: (2024)

A Survey on Speech Large Language Models for Understanding
by: Peng, Jing, et al.
Published: (2024)