Library Catalog

Bibliographic Details
Main Authors:	Zhang, Yuchen; Shekhar, Ravi; Mouratidis, Haralambos
Format:	Preprint
Published:	2026
Subjects:	Computation and Language; Artificial Intelligence; Sound
Online Access:	https://arxiv.org/abs/2601.18899

Similar Items

Speak in Context: Multilingual ASR with Speech Context Alignment via Contrastive Learning
by: Zhang, Yuchen, et al.
Published: (2026)

Interactive ASR: Towards Human-Like Interaction and Semantic Coherence Evaluation for Agentic Speech Recognition
by: Wang, Peng, et al.
Published: (2026)

CMT-LLM: Contextual Multi-Talker ASR Utilizing Large Language Models
by: He, Jiajun, et al.
Published: (2025)

Revealing the Role of Audio Channels in ASR Performance Degradation
by: Huang, Kuan-Tang, et al.
Published: (2025)

An Embarrassingly Simple Approach for LLM with Strong ASR Capacity
by: Ma, Ziyang, et al.
Published: (2024)

On the Role of Encoder Depth: Pruning Whisper and LoRA Fine-Tuning in SLAM-ASR
by: Kolluri, Ganesh Pavan Kartikeya Bharadwaj, et al.
Published: (2026)

PSRB: A Comprehensive Benchmark for Evaluating Persian ASR Systems
by: Sedghiyeh, Nima, et al.
Published: (2025)

Nwāchā Munā: A Devanagari Speech Corpus and Proximal Transfer Benchmark for Nepal Bhasha ASR
by: Sharma, Rishikesh Kumar, et al.
Published: (2026)

Language-Aware Prompt Tuning for Parameter-Efficient Seamless Language Expansion in Multilingual ASR
by: Yang, Hongli, et al.
Published: (2025)

Fun-ASR Technical Report
by: An, Keyu, et al.
Published: (2025)

SloPal: A 60-Million-Word Slovak Parliamentary Corpus with Aligned Speech and Fine-Tuned ASR Models
by: Božík, Erik, et al.
Published: (2025)

A Comparative Study of LLM-based ASR and Whisper in Low Resource and Code Switching Scenario
by: Song, Zheshu, et al.
Published: (2024)

Revise, Reason, and Recognize: LLM-Based Emotion Recognition via Emotion-Specific Prompts and ASR Error Correction
by: Li, Yuanchao, et al.
Published: (2024)

Open ASR Leaderboard: Towards Reproducible and Transparent Multilingual and Long-Form Speech Recognition Evaluation
by: Srivastav, Vaibhav, et al.
Published: (2025)

SUTA-LM: Bridging Test-Time Adaptation and Language Model Rescoring for Robust ASR
by: Huang, Wei-Ping, et al.
Published: (2025)

Articulation-Informed ASR: Integrating Articulatory Features into ASR via Auxiliary Speech Inversion and Cross-Attention Fusion
by: Attia, Ahmed Adel, et al.
Published: (2025)

VietASR: Achieving Industry-level Vietnamese ASR with 50-hour labeled data and Large-Scale Speech Pretraining
by: Zhuo, Jianheng, et al.
Published: (2025)

Efficient Adaptation of Multilingual Models for Japanese ASR
by: Bajo, Mark, et al.
Published: (2024)

PARCO: Phoneme-Augmented Robust Contextual ASR via Contrastive Entity Disambiguation
by: He, Jiajun, et al.
Published: (2025)

Exploring ASR-Based Wav2Vec2 for Automated Speech Disorder Assessment: Insights and Analysis
by: Nguyen, Tuan, et al.
Published: (2024)

Do LLM Decoders Listen Fairly? Benchmarking How Language Model Priors Shape Bias in Speech Recognition
by: Ginjala, Srishti, et al.
Published: (2026)

Adaptability of ASR Models on Low-Resource Language: A Comparative Study of Whisper and Wav2Vec-BERT on Bangla
by: Ridoy, Md Sazzadul Islam, et al.
Published: (2025)

Weak Supervision Techniques towards Enhanced ASR Models in Industry-level CRM Systems
by: Wang, Zhongsheng, et al.
Published: (2025)

AdaCS: Adaptive Normalization for Enhanced Code-Switching ASR
by: Chu, The Chuong, et al.
Published: (2025)

VoxRole: A Comprehensive Benchmark for Evaluating Speech-Based Role-Playing Agents
by: Wu, Weihao, et al.
Published: (2025)

Audio Jailbreaks in Large Audio-Language Models: Taxonomy, Attack-Defense Analysis, and Cost-Aware Evaluation
by: Feng, Bo-Han, et al.
Published: (2026)

Decoder-only Conformer with Modality-aware Sparse Mixtures of Experts for ASR
by: Lee, Jaeyoung, et al.
Published: (2026)

FlanEC: Exploring Flan-T5 for Post-ASR Error Correction
by: La Quatra, Moreno, et al.
Published: (2025)

Improving endpoint detection in end-to-end streaming ASR for conversational speech
by: C, Anandh, et al.
Published: (2025)

Performance evaluation of SLAM-ASR: The Good, the Bad, the Ugly, and the Way Forward
by: Kumar, Shashi, et al.
Published: (2024)

Do Compact SSL Backbones Matter for Audio Deepfake Detection? A Controlled Study with RAPTOR
by: Kulkarni, Ajinkya, et al.
Published: (2026)

EchoDistill: Alignment Noisy-to-Clean Self-Distillation for Robust Audio LLMs
by: Lin, Liang, et al.
Published: (2026)

A Self-Refining Framework for Enhancing ASR Using TTS-Synthesized Data
by: Chou, Cheng-Kang, et al.
Published: (2025)

Assessing Latency in ASR Systems: A Methodological Perspective for Real-Time Use
by: Arriaga, Carlos, et al.
Published: (2024)

Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model
by: Xie, Jiamin, et al.
Published: (2023)

Samba-ASR: State-Of-The-Art Speech Recognition Leveraging Structured State-Space Models
by: Shakhadri, Syed Abdul Gaffar, et al.
Published: (2025)

Probing the Hidden Talent of ASR Foundation Models for L2 English Oral Assessment
by: Chao, Fu-An, et al.
Published: (2025)

WESR: Scaling and Evaluating Word-level Event-Speech Recognition
by: Yang, Chenchen, et al.
Published: (2026)

TurboBias: Universal ASR Context-Biasing powered by GPU-accelerated Phrase-Boosting Tree
by: Andrusenko, Andrei, et al.
Published: (2025)

Temporal Order Preserved Optimal Transport-based Cross-modal Knowledge Transfer Learning for ASR
by: Lu, Xugang, et al.
Published: (2024)