| Main Authors: | Agrawal, Neeraj; Ganapathy, Sriram |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.07731 |
Similar Items
Towards the Next Frontier in Speech Representation Learning Using Disentanglement
by: Krishna, Varun, et al.
Published: (2024)
"KAN you hear me?" Exploring Kolmogorov-Arnold Networks for Spoken Language Understanding
by: Koudounas, Alkis, et al.
Published: (2025)
Benchmarking and Confidence Evaluation of LALMs For Temporal Reasoning
by: Bhattacharya, Debarpan, et al.
Published: (2025)
Exploring Fine-Tuning of Large Audio Language Models for Spoken Language Understanding under Limited Speech Data
by: Choi, Youngwon, et al.
Published: (2025)
HCAM -- Hierarchical Cross Attention Model for Multi-modal Emotion Recognition
by: Dutta, Soumya, et al.
Published: (2023)
Improving Self-supervised Pre-training using Accent-Specific Codebooks
by: Prabhu, Darshan, et al.
Published: (2024)
Unseen Speaker and Language Adaptation for Lightweight Text-To-Speech with Adapters
by: Falai, Alessio, et al.
Published: (2025)
Towards End-to-End Spoken Grammatical Error Correction
by: Bannò, Stefano, et al.
Published: (2023)
Zero Shot Audio to Audio Emotion Transfer With Speaker Disentanglement
by: Dutta, Soumya, et al.
Published: (2024)
A Mixture-of-Experts Model for Multimodal Emotion Recognition in Conversations
by: Dutta, Soumya, et al.
Published: (2026)
Towards ASR Robust Spoken Language Understanding Through In-Context Learning With Word Confusion Networks
by: Everson, Kevin, et al.
Published: (2024)
Spoken Language Intelligence of Large Language Models for Language Learning
by: Peng, Linkai, et al.
Published: (2023)
Finding Task-specific Subnetworks in Multi-task Spoken Language Understanding Model
by: Futami, Hayato, et al.
Published: (2024)
DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models
by: Chang, Heng-Jui, et al.
Published: (2024)
SDiaReward: Modeling and Benchmarking Spoken Dialogue Rewards with Modality and Colloquialness
by: Lu, Jingyu, et al.
Published: (2026)
Medical Spoken Named Entity Recognition
by: Le-Duc, Khai, et al.
Published: (2024)
End-to-End Spoken Grammatical Error Correction
by: Qian, Mengjie, et al.
Published: (2025)
Spoken DialogSum: An Emotion-Rich Conversational Dataset for Spoken Dialogue Summarization
by: Lu, Yen-Ju, et al.
Published: (2025)
Aligning Spoken Dialogue Models from User Interactions
by: Wu, Anne, et al.
Published: (2025)
UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions
by: Arora, Siddhant, et al.
Published: (2023)
Spoken Question Answering and Speech Continuation Using Spectrogram-Powered LLM
by: Nachmani, Eliya, et al.
Published: (2023)
Zero-Shot End-To-End Spoken Question Answering In Medical Domain
by: Labrak, Yanis, et al.
Published: (2024)
WavChat: A Survey of Spoken Dialogue Models
by: Ji, Shengpeng, et al.
Published: (2024)
A Variational Framework for Improving Naturalness in Generative Spoken Language Models
by: Chen, Li-Wei, et al.
Published: (2025)
Benchmarking Humans and Machines on Complex Multilingual Speech Understanding Tasks
by: Kankanala, Sai Samrat, et al.
Published: (2025)
On the Evaluation of Speech Foundation Models for Spoken Language Understanding
by: Arora, Siddhant, et al.
Published: (2024)
Analyzing Mitigation Strategies for Catastrophic Forgetting in End-to-End Training of Spoken Language Models
by: Hsiao, Chi-Yuan, et al.
Published: (2025)
Label-Context-Dependent Internal Language Model Estimation for CTC
by: Yang, Zijian, et al.
Published: (2025)
Turbocharge Speech Understanding with Pilot Inference
by: Wang, Rongxiang, et al.
Published: (2023)
Evaluating and Improving Continual Learning in Spoken Language Understanding
by: Yang, Muqiao, et al.
Published: (2024)
Interventional Speech Noise Injection for ASR Generalizable Spoken Language Understanding
by: Jung, Yeonjoon, et al.
Published: (2024)
Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models
by: Kuan, Chun-Yi, et al.
Published: (2024)
FeruzaSpeech: A 60 Hour Uzbek Read Speech Corpus with Punctuation, Casing, and Context
by: Povey, Anna, et al.
Published: (2024)
The Speech-LLM Takes It All: A Truly Fully End-to-End Spoken Dialogue State Tracking Approach
by: Ghazal, Nizar El, et al.
Published: (2025)
Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities
by: Ghosh, Sreyan, et al.
Published: (2025)
MMSU: A Massive Multi-task Spoken Language Understanding and Reasoning Benchmark
by: Wang, Dingdong, et al.
Published: (2025)
PRoDeliberation: Parallel Robust Deliberation for End-to-End Spoken Language Understanding
by: Le, Trang, et al.
Published: (2024)
Towards Hierarchical Spoken Language Dysfluency Modeling
by: Lian, Jiachen, et al.
Published: (2024)
Privacy-Preserving End-to-End Spoken Language Understanding
by: Wang, Yinggui, et al.
Published: (2024)
Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks
by: Huang, Chien-yu, et al.
Published: (2024)