| Main Authors: | Zhang, Yuchen; Shekhar, Ravi; Mouratidis, Haralambos |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2601.18899 |
Similar Items
Speak in Context: Multilingual ASR with Speech Context Alignment via Contrastive Learning
by: Zhang, Yuchen, et al.
Published: (2026)
Interactive ASR: Towards Human-Like Interaction and Semantic Coherence Evaluation for Agentic Speech Recognition
by: Wang, Peng, et al.
Published: (2026)
CMT-LLM: Contextual Multi-Talker ASR Utilizing Large Language Models
by: He, Jiajun, et al.
Published: (2025)
Revealing the Role of Audio Channels in ASR Performance Degradation
by: Huang, Kuan-Tang, et al.
Published: (2025)
An Embarrassingly Simple Approach for LLM with Strong ASR Capacity
by: Ma, Ziyang, et al.
Published: (2024)
On the Role of Encoder Depth: Pruning Whisper and LoRA Fine-Tuning in SLAM-ASR
by: Kolluri, Ganesh Pavan Kartikeya Bharadwaj, et al.
Published: (2026)
PSRB: A Comprehensive Benchmark for Evaluating Persian ASR Systems
by: Sedghiyeh, Nima, et al.
Published: (2025)
Nwāchā Munā: A Devanagari Speech Corpus and Proximal Transfer Benchmark for Nepal Bhasha ASR
by: Sharma, Rishikesh Kumar, et al.
Published: (2026)
Language-Aware Prompt Tuning for Parameter-Efficient Seamless Language Expansion in Multilingual ASR
by: Yang, Hongli, et al.
Published: (2025)
Fun-ASR Technical Report
by: An, Keyu, et al.
Published: (2025)
SloPal: A 60-Million-Word Slovak Parliamentary Corpus with Aligned Speech and Fine-Tuned ASR Models
by: Božík, Erik, et al.
Published: (2025)
A Comparative Study of LLM-based ASR and Whisper in Low Resource and Code Switching Scenario
by: Song, Zheshu, et al.
Published: (2024)
Revise, Reason, and Recognize: LLM-Based Emotion Recognition via Emotion-Specific Prompts and ASR Error Correction
by: Li, Yuanchao, et al.
Published: (2024)
Open ASR Leaderboard: Towards Reproducible and Transparent Multilingual and Long-Form Speech Recognition Evaluation
by: Srivastav, Vaibhav, et al.
Published: (2025)
SUTA-LM: Bridging Test-Time Adaptation and Language Model Rescoring for Robust ASR
by: Huang, Wei-Ping, et al.
Published: (2025)
Articulation-Informed ASR: Integrating Articulatory Features into ASR via Auxiliary Speech Inversion and Cross-Attention Fusion
by: Attia, Ahmed Adel, et al.
Published: (2025)
VietASR: Achieving Industry-level Vietnamese ASR with 50-hour labeled data and Large-Scale Speech Pretraining
by: Zhuo, Jianheng, et al.
Published: (2025)
Efficient Adaptation of Multilingual Models for Japanese ASR
by: Bajo, Mark, et al.
Published: (2024)
PARCO: Phoneme-Augmented Robust Contextual ASR via Contrastive Entity Disambiguation
by: He, Jiajun, et al.
Published: (2025)
Exploring ASR-Based Wav2Vec2 for Automated Speech Disorder Assessment: Insights and Analysis
by: Nguyen, Tuan, et al.
Published: (2024)
Do LLM Decoders Listen Fairly? Benchmarking How Language Model Priors Shape Bias in Speech Recognition
by: Ginjala, Srishti, et al.
Published: (2026)
Adaptability of ASR Models on Low-Resource Language: A Comparative Study of Whisper and Wav2Vec-BERT on Bangla
by: Ridoy, Md Sazzadul Islam, et al.
Published: (2025)
Weak Supervision Techniques towards Enhanced ASR Models in Industry-level CRM Systems
by: Wang, Zhongsheng, et al.
Published: (2025)
AdaCS: Adaptive Normalization for Enhanced Code-Switching ASR
by: Chu, The Chuong, et al.
Published: (2025)
VoxRole: A Comprehensive Benchmark for Evaluating Speech-Based Role-Playing Agents
by: Wu, Weihao, et al.
Published: (2025)
Audio Jailbreaks in Large Audio-Language Models: Taxonomy, Attack-Defense Analysis, and Cost-Aware Evaluation
by: Feng, Bo-Han, et al.
Published: (2026)
Decoder-only Conformer with Modality-aware Sparse Mixtures of Experts for ASR
by: Lee, Jaeyoung, et al.
Published: (2026)
FlanEC: Exploring Flan-T5 for Post-ASR Error Correction
by: La Quatra, Moreno, et al.
Published: (2025)
Improving endpoint detection in end-to-end streaming ASR for conversational speech
by: C, Anandh, et al.
Published: (2025)
Performance evaluation of SLAM-ASR: The Good, the Bad, the Ugly, and the Way Forward
by: Kumar, Shashi, et al.
Published: (2024)
Do Compact SSL Backbones Matter for Audio Deepfake Detection? A Controlled Study with RAPTOR
by: Kulkarni, Ajinkya, et al.
Published: (2026)
EchoDistill: Alignment Noisy-to-Clean Self-Distillation for Robust Audio LLMs
by: Lin, Liang, et al.
Published: (2026)
A Self-Refining Framework for Enhancing ASR Using TTS-Synthesized Data
by: Chou, Cheng-Kang, et al.
Published: (2025)
Assessing Latency in ASR Systems: A Methodological Perspective for Real-Time Use
by: Arriaga, Carlos, et al.
Published: (2024)
Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model
by: Xie, Jiamin, et al.
Published: (2023)
Samba-ASR: State-Of-The-Art Speech Recognition Leveraging Structured State-Space Models
by: Shakhadri, Syed Abdul Gaffar, et al.
Published: (2025)
Probing the Hidden Talent of ASR Foundation Models for L2 English Oral Assessment
by: Chao, Fu-An, et al.
Published: (2025)
WESR: Scaling and Evaluating Word-level Event-Speech Recognition
by: Yang, Chenchen, et al.
Published: (2026)
TurboBias: Universal ASR Context-Biasing powered by GPU-accelerated Phrase-Boosting Tree
by: Andrusenko, Andrei, et al.
Published: (2025)
Temporal Order Preserved Optimal Transport-based Cross-modal Knowledge Transfer Learning for ASR
by: Lu, Xugang, et al.
Published: (2024)