| Main Authors: | Xu, Jingjing, Zhou, Wei, Yang, Zijian, Beck, Eugen, Schlueter, Ralf |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2407.18930 |
Similar Items
Dynamic Data Pruning for Automatic Speech Recognition
by: Xiao, Qiao, et al.
Published: (2024)
Spiralformer: Low Latency Encoder for Streaming Speech Recognition with Circular Layer Skipping and Early Exiting
by: Tsunoo, Emiru, et al.
Published: (2025)
Layer-wise Minimal Pair Probing Reveals Contextual Grammatical-Conceptual Hierarchy in Speech Representations
by: He, Linyang, et al.
Published: (2025)
Data-Driven Mispronunciation Pattern Discovery for Robust Speech Recognition
by: Choi, Anna Seo Gyeong, et al.
Published: (2025)
Contextualized Automatic Speech Recognition with Dynamic Vocabulary Prediction and Activation
by: Lin, Zhennan, et al.
Published: (2025)
OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification
by: Peng, Yifan, et al.
Published: (2024)
MultiMed: Multilingual Medical Speech Recognition via Attention Encoder Decoder
by: Le-Duc, Khai, et al.
Published: (2024)
Speech-Based Depression Prediction Using Encoder-Weight-Only Transfer Learning and a Large Corpus
by: Harati, Amir, et al.
Published: (2024)
An Effective Mixture-Of-Experts Approach For Code-Switching Speech Recognition Leveraging Encoder Disentanglement
by: Yang, Tzu-Ting, et al.
Published: (2024)
Robust and Efficient Autoregressive Speech Synthesis with Dynamic Chunk-wise Prediction Policy
by: Li, Bohan, et al.
Published: (2025)
Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting
by: Kashiwagi, Yosuke, et al.
Published: (2024)
Training and Inference Efficiency of Encoder-Decoder Speech Models
by: Żelasko, Piotr, et al.
Published: (2025)
Task-Agnostic Structured Pruning of Speech Representation Models
by: Wang, Haoyu, et al.
Published: (2023)
SpeechColab Leaderboard: An Open-Source Platform for Automatic Speech Recognition Evaluation
by: Du, Jiayu, et al.
Published: (2024)
On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition
by: Rossenbach, Nick, et al.
Published: (2024)
Data Augmentation for End-to-end Code-switching Speech Recognition
by: Du, Chenpeng, et al.
Published: (2020)
Context-Driven Dynamic Pruning for Large Speech Foundation Models
by: Someki, Masao, et al.
Published: (2025)
Streaming Speech-to-Confusion Network Speech Recognition
by: Filimonov, Denis, et al.
Published: (2023)
Contextualized Automatic Speech Recognition with Dynamic Vocabulary
by: Sudo, Yui, et al.
Published: (2024)
Diagnostic-Driven Layer-Wise Compensation for Post-Training Quantization of Encoder-Decoder ASR Models
by: Wang, Xinyu, et al.
Published: (2026)
DQ-Data2vec: Decoupling Quantization for Multilingual Speech Recognition
by: Shao, Qijie, et al.
Published: (2025)
Approaching Dialogue State Tracking via Aligning Speech Encoders and LLMs
by: Sedláček, Šimon, et al.
Published: (2025)
Advancing African-Accented Speech Recognition: Epistemic Uncertainty-Driven Data Selection for Generalizable ASR Models
by: Dossou, Bonaventure F. P.
Published: (2023)
AdaST: Dynamically Adapting Encoder States in the Decoder for End-to-End Speech-to-Text Translation
by: Huang, Wuwei, et al.
Published: (2025)
Rethinking Entropy Allocation in LLM-based ASR: Understanding the Dynamics between Speech Encoders and LLMs
by: Xie, Yuan, et al.
Published: (2026)
Convexity-based Pruning of Speech Representation Models
by: Dorszewski, Teresa, et al.
Published: (2024)
Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition
by: Cornell, Samuele, et al.
Published: (2024)
DeCRED: Decoder-Centric Regularization for Encoder-Decoder Based Speech Recognition
by: Polok, Alexander, et al.
Published: (2025)
On the Relation between Internal Language Model and Sequence Discriminative Training for Neural Transducers
by: Yang, Zijian, et al.
Published: (2023)
Enhancing Dysarthric Speech Recognition for Unseen Speakers via Prototype-Based Adaptation
by: Wang, Shiyao, et al.
Published: (2024)
Sentence-wise Speech Summarization: Task, Datasets, and End-to-End Modeling with LM Knowledge Distillation
by: Matsuura, Kohei, et al.
Published: (2024)
Automatic Speech Recognition for Biomedical Data in Bengali Language
by: Kabir, Shariar, et al.
Published: (2024)
From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition
by: Wang, Tianduo, et al.
Published: (2025)
On the Effect of Purely Synthetic Training Data for Different Automatic Speech Recognition Architectures
by: Hilmes, Benedikt, et al.
Published: (2024)
Chain of Correction for Full-text Speech Recognition with Large Language Models
by: Tang, Zhiyuan, et al.
Published: (2025)
Teaching Audio Models to Reason: A Unified Framework for Source- and Layer-wise Distillation
by: Yang, Runyan, et al.
Published: (2025)
AdaLTM: Adaptive Layer-wise Task Vector Merging for Categorical Speech Emotion Recognition with ASR Knowledge Integration
by: Lee, Chia-Yu, et al.
Published: (2026)
Accent-Invariant Automatic Speech Recognition via Saliency-Driven Spectrogram Masking
by: Sameti, Mohammad Hossein, et al.
Published: (2025)
Improving Speech Emotion Recognition in Under-Resourced Languages via Speech-to-Speech Translation with Bootstrapping Data Selection
by: Lin, Hsi-Che, et al.
Published: (2024)
Frontend Token Enhancement for Token-Based Speech Recognition
by: Ashihara, Takanori, et al.
Published: (2026)