| Main Authors: | Xu, Jingjing, Yang, Zijian, Zeyer, Albert, Beck, Eugen, Schlueter, Ralf, Ney, Hermann |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2506.13180 |
Similar Items
Efficient Supernet Training with Orthogonal Softmax for Scalable ASR Model Compression
by: Xu, Jingjing, et al.
Published: (2025)
Dynamic Encoder Size Based on Data-Driven Layer-wise Pruning for Speech Recognition
by: Xu, Jingjing, et al.
Published: (2024)
Text-Utilization for Encoder-dominated Speech Recognition Models
by: Zeyer, Albert, et al.
Published: (2026)
Diffusion Language Models for Speech Recognition
by: Naveriani, Davyd, et al.
Published: (2026)
On the Relation between Internal Language Model and Sequence Discriminative Training for Neural Transducers
by: Yang, Zijian, et al.
Published: (2023)
Reproducing and Dissecting Denoising Language Models for Speech Recognition
by: Koch, Dorian, et al.
Published: (2025)
A Comparative Analysis on ASR System Combination for Attention, CTC, Factored Hybrid, and Transducer Models
by: Bayoumi, Noureldin, et al.
Published: (2025)
Chunked Attention-based Encoder-Decoder Model for Streaming Speech Recognition
by: Zeineldeen, Mohammad, et al.
Published: (2023)
Unified Learnable 2D Convolutional Feature Extraction for ASR
by: Vieting, Peter, et al.
Published: (2025)
Label-Context-Dependent Internal Language Model Estimation for CTC
by: Yang, Zijian, et al.
Published: (2025)
LLMs and Speech: Integration vs. Combination
by: Schmitt, Robin, et al.
Published: (2026)
The Conformer Encoder May Reverse the Time Dimension
by: Schmitt, Robin, et al.
Published: (2024)
Right Label Context in End-to-End Training of Time-Synchronous ASR Models
by: Raissi, Tina, et al.
Published: (2025)
AppTek Call-Center Dialogues: A Multi-Accent Long-Form Benchmark for English ASR
by: Beck, Eugen, et al.
Published: (2026)
Sequence-Level Unsupervised Training in Speech Recognition: A Theoretical Study
by: Yang, Zijian, et al.
Published: (2026)
Listen to the Context: Towards Faithful Large Language Models for Retrieval Augmented Generation on Climate Questions
by: Thulke, David, et al.
Published: (2025)
Regularizing Learnable Feature Extraction for Automatic Speech Recognition
by: Vieting, Peter, et al.
Published: (2025)
Towards Pretraining Robust ASR Foundation Model with Acoustic-Aware Data Augmentation
by: Liu, Dancheng, et al.
Published: (2025)
Revisiting Acoustic Features for Robust ASR
by: Shah, Muhammad A., et al.
Published: (2024)
Boosting Hybrid Autoregressive Transducer-based ASR with Internal Acoustic Model Training and Dual Blank Thresholding
by: Moriya, Takafumi, et al.
Published: (2024)
Investigating the Effect of Label Topology and Training Criterion on ASR Performance and Alignment Quality
by: Raissi, Tina, et al.
Published: (2024)
Prompting and Fine-Tuning of Small LLMs for Length-Controllable Telephone Call Summarization
by: Thulke, David, et al.
Published: (2024)
The Limits of Data Scaling: Sub-token Utilization and Acoustic Saturation in Multilingual ASR
by: Liang, Siyu, et al.
Published: (2025)
CALM: Joint Contextual Acoustic-Linguistic Modeling for Personalization of Multi-Speaker ASR
by: Shakeel, Muhammad, et al.
Published: (2026)
Classification Error Bound for Low Bayes Error Conditions in Machine Learning
by: Yang, Zijian, et al.
Published: (2025)
Refined Statistical Bounds for Classification Error Mismatches with Constrained Bayes Error
by: Yang, Zijian, et al.
Published: (2024)
MLMA: Towards Multilingual ASR With Mamba-based Architectures
by: Ali, Mohamed Nabih, et al.
Published: (2025)
Echoes of Phonetics: Unveiling Relevant Acoustic Cues for ASR via Feature Attribution
by: Fucci, Dennis, et al.
Published: (2025)
New Insights into Optimal Alignment of Acoustic and Linguistic Representations for Knowledge Transfer in ASR
by: Lu, Xugang, et al.
Published: (2025)
Uni-ASR: Unified LLM-Based Architecture for Non-Streaming and Streaming Automatic Speech Recognition
by: Xia, Yinfeng, et al.
Published: (2026)
Functional Abstraction of Knowledge Recall in Large Language Models
by: Wang, Zijian, et al.
Published: (2025)
Eliciting Chain-of-Thought in Base LLMs via Gradient-Based Representation Optimization
by: Wang, Zijian, et al.
Published: (2025)
PromptASR for contextualized ASR with controllable style
by: Yang, Xiaoyu, et al.
Published: (2023)
Optimizing Speech Language Models for Acoustic Consistency
by: Rohanian, Morteza, et al.
Published: (2025)
Towards interfacing large language models with ASR systems using confidence measures and prompting
by: Naderi, Maryam, et al.
Published: (2024)
Vividh-ASR: A Complexity-Tiered Benchmark and Optimization Dynamics for Robust Indic Speech Recognition
by: Juvekar, Kush, et al.
Published: (2026)
OLMoASR: Open Models and Data for Training Robust Speech Recognition Models
by: Ngo, Huong, et al.
Published: (2025)
FTFT: Efficient and Robust Fine-Tuning by Transferring Training Dynamics
by: Du, Yupei, et al.
Published: (2023)
MSA-ASR: Efficient Multilingual Speaker Attribution with frozen ASR Models
by: Nguyen, Thai-Binh, et al.
Published: (2024)
Locating and Extracting Relational Concepts in Large Language Models
by: Wang, Zijian, et al.
Published: (2024)