Library Catalog

Bibliographic Details
Main Authors:	Xu, Jingjing; Zhou, Wei; Yang, Zijian; Beck, Eugen; Schlueter, Ralf
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing; Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2407.18930

Similar Items

Dynamic Data Pruning for Automatic Speech Recognition
by: Xiao, Qiao, et al.
Published: (2024)

Spiralformer: Low Latency Encoder for Streaming Speech Recognition with Circular Layer Skipping and Early Exiting
by: Tsunoo, Emiru, et al.
Published: (2025)

Layer-wise Minimal Pair Probing Reveals Contextual Grammatical-Conceptual Hierarchy in Speech Representations
by: He, Linyang, et al.
Published: (2025)

Data-Driven Mispronunciation Pattern Discovery for Robust Speech Recognition
by: Choi, Anna Seo Gyeong, et al.
Published: (2025)

Contextualized Automatic Speech Recognition with Dynamic Vocabulary Prediction and Activation
by: Lin, Zhennan, et al.
Published: (2025)

OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification
by: Peng, Yifan, et al.
Published: (2024)

MultiMed: Multilingual Medical Speech Recognition via Attention Encoder Decoder
by: Le-Duc, Khai, et al.
Published: (2024)

Speech-Based Depression Prediction Using Encoder-Weight-Only Transfer Learning and a Large Corpus
by: Harati, Amir, et al.
Published: (2024)

An Effective Mixture-Of-Experts Approach For Code-Switching Speech Recognition Leveraging Encoder Disentanglement
by: Yang, Tzu-Ting, et al.
Published: (2024)

Robust and Efficient Autoregressive Speech Synthesis with Dynamic Chunk-wise Prediction Policy
by: Li, Bohan, et al.
Published: (2025)

Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting
by: Kashiwagi, Yosuke, et al.
Published: (2024)

Training and Inference Efficiency of Encoder-Decoder Speech Models
by: Żelasko, Piotr, et al.
Published: (2025)

Task-Agnostic Structured Pruning of Speech Representation Models
by: Wang, Haoyu, et al.
Published: (2023)

SpeechColab Leaderboard: An Open-Source Platform for Automatic Speech Recognition Evaluation
by: Du, Jiayu, et al.
Published: (2024)

On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition
by: Rossenbach, Nick, et al.
Published: (2024)

Data Augmentation for End-to-end Code-switching Speech Recognition
by: Du, Chenpeng, et al.
Published: (2020)

Context-Driven Dynamic Pruning for Large Speech Foundation Models
by: Someki, Masao, et al.
Published: (2025)

Streaming Speech-to-Confusion Network Speech Recognition
by: Filimonov, Denis, et al.
Published: (2023)

Contextualized Automatic Speech Recognition with Dynamic Vocabulary
by: Sudo, Yui, et al.
Published: (2024)

Diagnostic-Driven Layer-Wise Compensation for Post-Training Quantization of Encoder-Decoder ASR Models
by: Wang, Xinyu, et al.
Published: (2026)

DQ-Data2vec: Decoupling Quantization for Multilingual Speech Recognition
by: Shao, Qijie, et al.
Published: (2025)

Approaching Dialogue State Tracking via Aligning Speech Encoders and LLMs
by: Sedláček, Šimon, et al.
Published: (2025)

Advancing African-Accented Speech Recognition: Epistemic Uncertainty-Driven Data Selection for Generalizable ASR Models
by: Dossou, Bonaventure F. P.
Published: (2023)

AdaST: Dynamically Adapting Encoder States in the Decoder for End-to-End Speech-to-Text Translation
by: Huang, Wuwei, et al.
Published: (2025)

Rethinking Entropy Allocation in LLM-based ASR: Understanding the Dynamics between Speech Encoders and LLMs
by: Xie, Yuan, et al.
Published: (2026)

Convexity-based Pruning of Speech Representation Models
by: Dorszewski, Teresa, et al.
Published: (2024)

Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition
by: Cornell, Samuele, et al.
Published: (2024)

DeCRED: Decoder-Centric Regularization for Encoder-Decoder Based Speech Recognition
by: Polok, Alexander, et al.
Published: (2025)

On the Relation between Internal Language Model and Sequence Discriminative Training for Neural Transducers
by: Yang, Zijian, et al.
Published: (2023)

Enhancing Dysarthric Speech Recognition for Unseen Speakers via Prototype-Based Adaptation
by: Wang, Shiyao, et al.
Published: (2024)

Sentence-wise Speech Summarization: Task, Datasets, and End-to-End Modeling with LM Knowledge Distillation
by: Matsuura, Kohei, et al.
Published: (2024)

Automatic Speech Recognition for Biomedical Data in Bengali Language
by: Kabir, Shariar, et al.
Published: (2024)

From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition
by: Wang, Tianduo, et al.
Published: (2025)

On the Effect of Purely Synthetic Training Data for Different Automatic Speech Recognition Architectures
by: Hilmes, Benedikt, et al.
Published: (2024)

Chain of Correction for Full-text Speech Recognition with Large Language Models
by: Tang, Zhiyuan, et al.
Published: (2025)

Teaching Audio Models to Reason: A Unified Framework for Source- and Layer-wise Distillation
by: Yang, Runyan, et al.
Published: (2025)

AdaLTM: Adaptive Layer-wise Task Vector Merging for Categorical Speech Emotion Recognition with ASR Knowledge Integration
by: Lee, Chia-Yu, et al.
Published: (2026)

Accent-Invariant Automatic Speech Recognition via Saliency-Driven Spectrogram Masking
by: Sameti, Mohammad Hossein, et al.
Published: (2025)

Improving Speech Emotion Recognition in Under-Resourced Languages via Speech-to-Speech Translation with Bootstrapping Data Selection
by: Lin, Hsi-Che, et al.
Published: (2024)

Frontend Token Enhancement for Token-Based Speech Recognition
by: Ashihara, Takanori, et al.
Published: (2026)