:: Library Catalog

Bibliographic Details
Main Authors:	Agrawal, Neeraj; Ganapathy, Sriram
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Machine Learning; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2505.07731

Similar Items

Towards the Next Frontier in Speech Representation Learning Using Disentanglement
by: Krishna, Varun, et al.
Published: (2024)

"KAN you hear me?" Exploring Kolmogorov-Arnold Networks for Spoken Language Understanding
by: Koudounas, Alkis, et al.
Published: (2025)

Benchmarking and Confidence Evaluation of LALMs For Temporal Reasoning
by: Bhattacharya, Debarpan, et al.
Published: (2025)

Exploring Fine-Tuning of Large Audio Language Models for Spoken Language Understanding under Limited Speech Data
by: Choi, Youngwon, et al.
Published: (2025)

HCAM -- Hierarchical Cross Attention Model for Multi-modal Emotion Recognition
by: Dutta, Soumya, et al.
Published: (2023)

Improving Self-supervised Pre-training using Accent-Specific Codebooks
by: Prabhu, Darshan, et al.
Published: (2024)

Unseen Speaker and Language Adaptation for Lightweight Text-To-Speech with Adapters
by: Falai, Alessio, et al.
Published: (2025)

Towards End-to-End Spoken Grammatical Error Correction
by: Bannò, Stefano, et al.
Published: (2023)

Zero Shot Audio to Audio Emotion Transfer With Speaker Disentanglement
by: Dutta, Soumya, et al.
Published: (2024)

A Mixture-of-Experts Model for Multimodal Emotion Recognition in Conversations
by: Dutta, Soumya, et al.
Published: (2026)

Towards ASR Robust Spoken Language Understanding Through In-Context Learning With Word Confusion Networks
by: Everson, Kevin, et al.
Published: (2024)

Spoken Language Intelligence of Large Language Models for Language Learning
by: Peng, Linkai, et al.
Published: (2023)

Finding Task-specific Subnetworks in Multi-task Spoken Language Understanding Model
by: Futami, Hayato, et al.
Published: (2024)

DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models
by: Chang, Heng-Jui, et al.
Published: (2024)

SDiaReward: Modeling and Benchmarking Spoken Dialogue Rewards with Modality and Colloquialness
by: Lu, Jingyu, et al.
Published: (2026)

Medical Spoken Named Entity Recognition
by: Le-Duc, Khai, et al.
Published: (2024)

End-to-End Spoken Grammatical Error Correction
by: Qian, Mengjie, et al.
Published: (2025)

Spoken DialogSum: An Emotion-Rich Conversational Dataset for Spoken Dialogue Summarization
by: Lu, Yen-Ju, et al.
Published: (2025)

Aligning Spoken Dialogue Models from User Interactions
by: Wu, Anne, et al.
Published: (2025)

UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions
by: Arora, Siddhant, et al.
Published: (2023)

Spoken Question Answering and Speech Continuation Using Spectrogram-Powered LLM
by: Nachmani, Eliya, et al.
Published: (2023)

Zero-Shot End-To-End Spoken Question Answering In Medical Domain
by: Labrak, Yanis, et al.
Published: (2024)

WavChat: A Survey of Spoken Dialogue Models
by: Ji, Shengpeng, et al.
Published: (2024)

A Variational Framework for Improving Naturalness in Generative Spoken Language Models
by: Chen, Li-Wei, et al.
Published: (2025)

Benchmarking Humans and Machines on Complex Multilingual Speech Understanding Tasks
by: Kankanala, Sai Samrat, et al.
Published: (2025)

On the Evaluation of Speech Foundation Models for Spoken Language Understanding
by: Arora, Siddhant, et al.
Published: (2024)

Analyzing Mitigation Strategies for Catastrophic Forgetting in End-to-End Training of Spoken Language Models
by: Hsiao, Chi-Yuan, et al.
Published: (2025)

Label-Context-Dependent Internal Language Model Estimation for CTC
by: Yang, Zijian, et al.
Published: (2025)

Turbocharge Speech Understanding with Pilot Inference
by: Wang, Rongxiang, et al.
Published: (2023)

Evaluating and Improving Continual Learning in Spoken Language Understanding
by: Yang, Muqiao, et al.
Published: (2024)

Interventional Speech Noise Injection for ASR Generalizable Spoken Language Understanding
by: Jung, Yeonjoon, et al.
Published: (2024)

Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models
by: Kuan, Chun-Yi, et al.
Published: (2024)

FeruzaSpeech: A 60 Hour Uzbek Read Speech Corpus with Punctuation, Casing, and Context
by: Povey, Anna, et al.
Published: (2024)

The Speech-LLM Takes It All: A Truly Fully End-to-End Spoken Dialogue State Tracking Approach
by: Ghazal, Nizar El, et al.
Published: (2025)

Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities
by: Ghosh, Sreyan, et al.
Published: (2025)

MMSU: A Massive Multi-task Spoken Language Understanding and Reasoning Benchmark
by: Wang, Dingdong, et al.
Published: (2025)

PRoDeliberation: Parallel Robust Deliberation for End-to-End Spoken Language Understanding
by: Le, Trang, et al.
Published: (2024)

Towards Hierarchical Spoken Language Dysfluency Modeling
by: Lian, Jiachen, et al.
Published: (2024)

Privacy-Preserving End-to-End Spoken Language Understanding
by: Wang, Yinggui, et al.
Published: (2024)

Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks
by: Huang, Chien-yu, et al.
Published: (2024)