| Main Authors: | Koluguri, Nithin Rao, Sekoyan, Monica, Zelenfroynd, George, Meister, Sasha, Ding, Shuoyang, Kostandian, Sofia, Huang, He, Karpov, Nikolay, Balam, Jagadeesh, Lavrukhin, Vitaly, Peng, Yifan, Papi, Sara, Gaido, Marco, Brutti, Alessio, Ginsburg, Boris |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.13404 |
Similar Items
Canary-1B-v2 & Parakeet-TDT-0.6B-v3: Efficient and High-Performance Models for Multilingual ASR and AST
by: Sekoyan, Monica, et al.
Published: (2025)
Methods to Increase the Amount of Data for Speech Recognition for Low Resource Languages
by: Ayrapetyan, Alexan, et al.
Published: (2025)
Longer is (Not Necessarily) Stronger: Punctuated Long-Sequence Training for Enhanced Speech Recognition and Translation
by: Koluguri, Nithin Rao, et al.
Published: (2024)
Training and Inference Efficiency of Encoder-Decoder Speech Models
by: Żelasko, Piotr, et al.
Published: (2025)
Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations
by: Dhawan, Kunal, et al.
Published: (2024)
Less is More: Accurate Speech Recognition & Translation without Web-Scale Data
by: Puvvada, Krishna C., et al.
Published: (2024)
EMMeTT: Efficient Multimodal Machine Translation Training
by: Żelasko, Piotr, et al.
Published: (2024)
Anticipating Future with Large Language Model for Simultaneous Machine Translation
by: Ouyang, Siqi, et al.
Published: (2024)
Open Automatic Speech Recognition Models for Classical and Modern Standard Arabic
by: Grigoryan, Lilit, et al.
Published: (2025)
Chain-of-Thought Prompting for Speech Translation
by: Hu, Ke, et al.
Published: (2024)
BESTOW: Efficient and Streamable Speech Language Model with the Best of Two Worlds in GPT and T5
by: Chen, Zhehuai, et al.
Published: (2024)
Speaker Targeting via Self-Speaker Adaptation for Multi-talker ASR
by: Wang, Weiqing, et al.
Published: (2025)
Hierarchical Policy Optimization for Simultaneous Translation of Unbounded Speech
by: Ouyang, Siqi, et al.
Published: (2026)
FlexCTC: GPU-powered CTC Beam Decoding With Advanced Contextual Abilities
by: Grigoryan, Lilit, et al.
Published: (2025)
Sortformer: A Novel Approach for Permutation-Resolved Speaker Supervision in Speech-to-Text Systems
by: Park, Taejin, et al.
Published: (2024)
NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks
by: Huang, He, et al.
Published: (2024)
META-CAT: Speaker-Informed Speech Embeddings via Meta Information Concatenation for Multi-talker ASR
by: Wang, Jinhan, et al.
Published: (2024)
Flexible Multichannel Speech Enhancement for Noise-Robust Frontend
by: Jukić, Ante, et al.
Published: (2024)
Unified Semi-Supervised Pipeline for Automatic Speech Recognition
by: Tadevosyan, Nune, et al.
Published: (2025)
Extending Automatic Machine Translation Evaluation to Book-Length Documents
by: Wang, Kuang-Da, et al.
Published: (2025)
Word Level Timestamp Generation for Automatic Speech Recognition and Translation
by: Hu, Ke, et al.
Published: (2025)
Schrödinger Bridge for Generative Speech Enhancement
by: Jukić, Ante, et al.
Published: (2024)
SimulSeamless: FBK at IWSLT 2024 Simultaneous Speech Translation
by: Papi, Sara, et al.
Published: (2024)
TurboBias: Universal ASR Context-Biasing powered by GPU-accelerated Phrase-Boosting Tree
by: Andrusenko, Andrei, et al.
Published: (2025)
Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter
by: Andrusenko, Andrei, et al.
Published: (2024)
Text-only domain adaptation for end-to-end ASR using integrated text-to-mel-spectrogram generator
by: Bataev, Vladimir, et al.
Published: (2023)
Label-Looping: Highly Efficient Decoding for Transducers
by: Bataev, Vladimir, et al.
Published: (2024)
Stateful Conformer with Cache-based Inference for Streaming Automatic Speech Recognition
by: Noroozi, Vahid, et al.
Published: (2023)
Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing?
by: Gaido, Marco, et al.
Published: (2024)
How do Hyenas deal with Human Speech? Speech Recognition and Translation with ConfHyena
by: Gaido, Marco, et al.
Published: (2024)
StreamAtt: Direct Streaming Speech-to-Text Translation with Attention-based Audio History Selection
by: Papi, Sara, et al.
Published: (2024)
MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages
by: Gaido, Marco, et al.
Published: (2024)
FAMA: The First Large-Scale Open-Science Speech Foundation Model for English and Italian
by: Papi, Sara, et al.
Published: (2025)
The Warmup Dilemma: How Learning Rate Strategies Impact Speech-to-Text Model Convergence
by: Gaido, Marco, et al.
Published: (2025)
Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer
by: Burchi, Maxime, et al.
Published: (2024)
How to Evaluate Speech Translation with Source-Aware Neural MT Metrics
by: Cettolo, Mauro, et al.
Published: (2025)
Simulstream: Open-Source Toolkit for Evaluation and Demonstration of Streaming Speech-to-Text Translation Systems
by: Gaido, Marco, et al.
Published: (2025)
NGPU-LM: GPU-Accelerated N-Gram Language Model for Context-Biasing in Greedy ASR Decoding
by: Bataev, Vladimir, et al.
Published: (2025)
Pushing the Limits of Beam Search Decoding for Transducer-based ASR models
by: Grigoryan, Lilit, et al.
Published: (2025)
Reducing the Offline-Streaming Gap for Unified ASR Transducer with Consistency Regularization
by: Andrusenko, Andrei, et al.
Published: (2026)