| Main Authors: | Gopal, Shreyas; Anshul, Ashutosh; Li, Haoyang; Yeo, Yue Heng; Liu, Hexin; Chng, Eng Siong |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.25150 |
Similar Items
Language-Aware Distillation for Multilingual Instruction-Following Speech LLMs with ASR-Only Supervision
by: Gopal, Shreyas, et al.
Published: (2026)
Bi-directional Context-Enhanced Speech Large Language Models for Multilingual Conversational ASR
by: Peng, Yizhou, et al.
Published: (2025)
Next-Frame Feature Prediction for Multimodal Deepfake Detection and Temporal Localization
by: Anshul, Ashutosh, et al.
Published: (2025)
Improving Code-Switching Speech Recognition with TTS Data Augmentation
by: Yeo, Yue Heng, et al.
Published: (2026)
NTU Speechlab LLM-Based Multilingual ASR System for Interspeech MLC-SLM Challenge 2025
by: Peng, Yizhou, et al.
Published: (2025)
Bridging Speech and Text: Enhancing ASR with Pinyin-to-Character Pre-training in LLMs
by: Yuhang, Yang, et al.
Published: (2024)
DepFlow: Disentangled Speech Generation to Mitigate Semantic Bias in Depression Detection
by: Li, Yuxin, et al.
Published: (2026)
Hierarchical Self-Supervised Representation Learning for Depression Detection from Speech
by: Li, Yuxin, et al.
Published: (2025)
Speechless: Speech Instruction Training Without Speech for Low Resource Languages
by: Dao, Alan, et al.
Published: (2025)
Noro: Noise-Robust One-shot Voice Conversion with Hidden Speaker Representation Learning
by: He, Haorui, et al.
Published: (2024)
Continual Learning Optimizations for Auto-regressive Decoder of Multilingual ASR systems
by: Kwok, Chin Yuen, et al.
Published: (2024)
Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR
by: Hu, Yuchen, et al.
Published: (2023)
Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization
by: Hu, Yuchen, et al.
Published: (2024)
Zero-shot Context Biasing with Trie-based Decoding using Synthetic Multi-Pronunciation
by: Liu, Changsong, et al.
Published: (2025)
Impact of Frame Rates on Speech Tokenizer: A Case Study on Mandarin and English
by: Zhang, Haoyang, et al.
Published: (2025)
Punctuation Restoration for Singaporean Spoken Languages: English, Malay, and Mandarin
by: Rao, Abhinav, et al.
Published: (2022)
Mind-Paced Speaking: A Dual-Brain Approach to Real-Time Reasoning in Spoken Language Models
by: Wu, Donghang, et al.
Published: (2025)
Chronological Thinking in Full-Duplex Spoken Dialogue Language Models
by: Wu, Donghang, et al.
Published: (2025)
Improving Synthetic Data Training for Contextual Biasing Models with a Keyword-Aware Cost Function
by: Kwok, Chin Yuen, et al.
Published: (2025)
Continual Learning with Embedding Layer Surgery and Task-wise Beam Search using Whisper
by: Kwok, Chin Yuen, et al.
Published: (2025)
Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback
by: Chen, Chen, et al.
Published: (2024)
Code-switching Speech Recognition Under the Lens: Model- and Data-Centric Perspectives
by: Liu, Hexin, et al.
Published: (2025)
DiaSynth: Synthetic Dialogue Generation Framework for Low Resource Dialogue Applications
by: Suresh, Sathya Krishnan, et al.
Published: (2024)
Large Language Models are Efficient Learners of Noise-Robust Speech Recognition
by: Hu, Yuchen, et al.
Published: (2024)
Listen Again and Choose the Right Answer: A New Paradigm for Automatic Speech Recognition with Large Language Models
by: Hu, Yuchen, et al.
Published: (2024)
CS-Sum: A Benchmark for Code-Switching Dialogue Summarization and the Limits of Large Language Models
by: Suresh, Sathya Krishnan, et al.
Published: (2025)
Large Language Models Meet Contrastive Learning: Zero-Shot Emotion Recognition Across Languages
by: Zou, Heqing, et al.
Published: (2025)
Cross-modal Consistency Guidance for Robust Emotion Control in Auto-Regressive TTS Models
by: Peng, Yizhou, et al.
Published: (2025)
A Comprehensive Study on the Effectiveness of ASR Representations for Noise-Robust Speech Emotion Recognition
by: Shi, Xiaohan, et al.
Published: (2023)
EASY: Emotion-aware Speaker Anonymization via Factorized Distillation
by: Yao, Jixun, et al.
Published: (2025)
GenTSE: Enhancing Target Speaker Extraction via a Coarse-to-Fine Generative Language Model
by: Li, Haoyang, et al.
Published: (2025)
Aligning Speech to Languages to Enhance Code-switching Speech Recognition
by: Liu, Hexin, et al.
Published: (2024)
Training-Free Intelligibility-Guided Observation Addition for Noisy ASR
by: Li, Haoyang, et al.
Published: (2026)
Audio-CoT: Exploring Chain-of-Thought Reasoning in Large Audio Language Model
by: Ma, Ziyang, et al.
Published: (2025)
Speech Enhancement Using Continuous Embeddings of Neural Audio Codec
by: Li, Haoyang, et al.
Published: (2025)
GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators
by: Hu, Yuchen, et al.
Published: (2024)
GenSE: Generative Speech Enhancement via Language Models using Hierarchical Modeling
by: Yao, Jixun, et al.
Published: (2025)
Proactive for Uncertainty: Cause-Aware Error Diagnosis and Interactive Clarification for Spoken Dialogue Systems
by: Peng, Yizhou, et al.
Published: (2026)
Audio Large Language Models Can Be Descriptive Speech Quality Evaluators
by: Chen, Chen, et al.
Published: (2025)
Text-based Talking Video Editing with Cascaded Conditional Diffusion
by: Han, Bo, et al.
Published: (2024)