:: Library Catalog

Bibliographic Details
Main Authors:	Xu, Jingjing, Yang, Zijian, Zeyer, Albert, Beck, Eugen, Schlueter, Ralf, Ney, Hermann
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2506.13180

Similar Items

Efficient Supernet Training with Orthogonal Softmax for Scalable ASR Model Compression
by: Xu, Jingjing, et al.
Published: (2025)

Dynamic Encoder Size Based on Data-Driven Layer-wise Pruning for Speech Recognition
by: Xu, Jingjing, et al.
Published: (2024)

Text-Utilization for Encoder-dominated Speech Recognition Models
by: Zeyer, Albert, et al.
Published: (2026)

Diffusion Language Models for Speech Recognition
by: Naveriani, Davyd, et al.
Published: (2026)

On the Relation between Internal Language Model and Sequence Discriminative Training for Neural Transducers
by: Yang, Zijian, et al.
Published: (2023)

Reproducing and Dissecting Denoising Language Models for Speech Recognition
by: Koch, Dorian, et al.
Published: (2025)

A Comparative Analysis on ASR System Combination for Attention, CTC, Factored Hybrid, and Transducer Models
by: Bayoumi, Noureldin, et al.
Published: (2025)

Chunked Attention-based Encoder-Decoder Model for Streaming Speech Recognition
by: Zeineldeen, Mohammad, et al.
Published: (2023)

Unified Learnable 2D Convolutional Feature Extraction for ASR
by: Vieting, Peter, et al.
Published: (2025)

Label-Context-Dependent Internal Language Model Estimation for CTC
by: Yang, Zijian, et al.
Published: (2025)

LLMs and Speech: Integration vs. Combination
by: Schmitt, Robin, et al.
Published: (2026)

The Conformer Encoder May Reverse the Time Dimension
by: Schmitt, Robin, et al.
Published: (2024)

Right Label Context in End-to-End Training of Time-Synchronous ASR Models
by: Raissi, Tina, et al.
Published: (2025)

AppTek Call-Center Dialogues: A Multi-Accent Long-Form Benchmark for English ASR
by: Beck, Eugen, et al.
Published: (2026)

Sequence-Level Unsupervised Training in Speech Recognition: A Theoretical Study
by: Yang, Zijian, et al.
Published: (2026)

Listen to the Context: Towards Faithful Large Language Models for Retrieval Augmented Generation on Climate Questions
by: Thulke, David, et al.
Published: (2025)

Regularizing Learnable Feature Extraction for Automatic Speech Recognition
by: Vieting, Peter, et al.
Published: (2025)

Towards Pretraining Robust ASR Foundation Model with Acoustic-Aware Data Augmentation
by: Liu, Dancheng, et al.
Published: (2025)

Revisiting Acoustic Features for Robust ASR
by: Shah, Muhammad A., et al.
Published: (2024)

Boosting Hybrid Autoregressive Transducer-based ASR with Internal Acoustic Model Training and Dual Blank Thresholding
by: Moriya, Takafumi, et al.
Published: (2024)

Investigating the Effect of Label Topology and Training Criterion on ASR Performance and Alignment Quality
by: Raissi, Tina, et al.
Published: (2024)

Prompting and Fine-Tuning of Small LLMs for Length-Controllable Telephone Call Summarization
by: Thulke, David, et al.
Published: (2024)

The Limits of Data Scaling: Sub-token Utilization and Acoustic Saturation in Multilingual ASR
by: Liang, Siyu, et al.
Published: (2025)

CALM: Joint Contextual Acoustic-Linguistic Modeling for Personalization of Multi-Speaker ASR
by: Shakeel, Muhammad, et al.
Published: (2026)

Classification Error Bound for Low Bayes Error Conditions in Machine Learning
by: Yang, Zijian, et al.
Published: (2025)

Refined Statistical Bounds for Classification Error Mismatches with Constrained Bayes Error
by: Yang, Zijian, et al.
Published: (2024)

MLMA: Towards Multilingual ASR With Mamba-based Architectures
by: Ali, Mohamed Nabih, et al.
Published: (2025)

Echoes of Phonetics: Unveiling Relevant Acoustic Cues for ASR via Feature Attribution
by: Fucci, Dennis, et al.
Published: (2025)

New Insights into Optimal Alignment of Acoustic and Linguistic Representations for Knowledge Transfer in ASR
by: Lu, Xugang, et al.
Published: (2025)

Uni-ASR: Unified LLM-Based Architecture for Non-Streaming and Streaming Automatic Speech Recognition
by: Xia, Yinfeng, et al.
Published: (2026)

Functional Abstraction of Knowledge Recall in Large Language Models
by: Wang, Zijian, et al.
Published: (2025)

Eliciting Chain-of-Thought in Base LLMs via Gradient-Based Representation Optimization
by: Wang, Zijian, et al.
Published: (2025)

PromptASR for contextualized ASR with controllable style
by: Yang, Xiaoyu, et al.
Published: (2023)

Optimizing Speech Language Models for Acoustic Consistency
by: Rohanian, Morteza, et al.
Published: (2025)

Towards interfacing large language models with ASR systems using confidence measures and prompting
by: Naderi, Maryam, et al.
Published: (2024)

Vividh-ASR: A Complexity-Tiered Benchmark and Optimization Dynamics for Robust Indic Speech Recognition
by: Juvekar, Kush, et al.
Published: (2026)

OLMoASR: Open Models and Data for Training Robust Speech Recognition Models
by: Ngo, Huong, et al.
Published: (2025)

FTFT: Efficient and Robust Fine-Tuning by Transferring Training Dynamics
by: Du, Yupei, et al.
Published: (2023)

MSA-ASR: Efficient Multilingual Speaker Attribution with frozen ASR Models
by: Nguyen, Thai-Binh, et al.
Published: (2024)

Locating and Extracting Relational Concepts in Large Language Models
by: Wang, Zijian, et al.
Published: (2024)