:: Library Catalog

Bibliographic Details
Main Authors:	Koluguri, Nithin Rao; Sekoyan, Monica; Zelenfroynd, George; Meister, Sasha; Ding, Shuoyang; Kostandian, Sofia; Huang, He; Karpov, Nikolay; Balam, Jagadeesh; Lavrukhin, Vitaly; Peng, Yifan; Papi, Sara; Gaido, Marco; Brutti, Alessio; Ginsburg, Boris
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2505.13404

Similar Items

Canary-1B-v2 & Parakeet-TDT-0.6B-v3: Efficient and High-Performance Models for Multilingual ASR and AST
by: Sekoyan, Monica, et al.
Published: (2025)

Methods to Increase the Amount of Data for Speech Recognition for Low Resource Languages
by: Ayrapetyan, Alexan, et al.
Published: (2025)

Longer is (Not Necessarily) Stronger: Punctuated Long-Sequence Training for Enhanced Speech Recognition and Translation
by: Koluguri, Nithin Rao, et al.
Published: (2024)

Training and Inference Efficiency of Encoder-Decoder Speech Models
by: Żelasko, Piotr, et al.
Published: (2025)

Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations
by: Dhawan, Kunal, et al.
Published: (2024)

Less is More: Accurate Speech Recognition & Translation without Web-Scale Data
by: Puvvada, Krishna C., et al.
Published: (2024)

EMMeTT: Efficient Multimodal Machine Translation Training
by: Żelasko, Piotr, et al.
Published: (2024)

Anticipating Future with Large Language Model for Simultaneous Machine Translation
by: Ouyang, Siqi, et al.
Published: (2024)

Open Automatic Speech Recognition Models for Classical and Modern Standard Arabic
by: Grigoryan, Lilit, et al.
Published: (2025)

Chain-of-Thought Prompting for Speech Translation
by: Hu, Ke, et al.
Published: (2024)

BESTOW: Efficient and Streamable Speech Language Model with the Best of Two Worlds in GPT and T5
by: Chen, Zhehuai, et al.
Published: (2024)

Speaker Targeting via Self-Speaker Adaptation for Multi-talker ASR
by: Wang, Weiqing, et al.
Published: (2025)

Hierarchical Policy Optimization for Simultaneous Translation of Unbounded Speech
by: Ouyang, Siqi, et al.
Published: (2026)

FlexCTC: GPU-powered CTC Beam Decoding With Advanced Contextual Abilities
by: Grigoryan, Lilit, et al.
Published: (2025)

Sortformer: A Novel Approach for Permutation-Resolved Speaker Supervision in Speech-to-Text Systems
by: Park, Taejin, et al.
Published: (2024)

NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks
by: Huang, He, et al.
Published: (2024)

META-CAT: Speaker-Informed Speech Embeddings via Meta Information Concatenation for Multi-talker ASR
by: Wang, Jinhan, et al.
Published: (2024)

Flexible Multichannel Speech Enhancement for Noise-Robust Frontend
by: Jukić, Ante, et al.
Published: (2024)

Unified Semi-Supervised Pipeline for Automatic Speech Recognition
by: Tadevosyan, Nune, et al.
Published: (2025)

Extending Automatic Machine Translation Evaluation to Book-Length Documents
by: Wang, Kuang-Da, et al.
Published: (2025)

Word Level Timestamp Generation for Automatic Speech Recognition and Translation
by: Hu, Ke, et al.
Published: (2025)

Schrödinger Bridge for Generative Speech Enhancement
by: Jukić, Ante, et al.
Published: (2024)

SimulSeamless: FBK at IWSLT 2024 Simultaneous Speech Translation
by: Papi, Sara, et al.
Published: (2024)

TurboBias: Universal ASR Context-Biasing powered by GPU-accelerated Phrase-Boosting Tree
by: Andrusenko, Andrei, et al.
Published: (2025)

Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter
by: Andrusenko, Andrei, et al.
Published: (2024)

Text-only domain adaptation for end-to-end ASR using integrated text-to-mel-spectrogram generator
by: Bataev, Vladimir, et al.
Published: (2023)

Label-Looping: Highly Efficient Decoding for Transducers
by: Bataev, Vladimir, et al.
Published: (2024)

Stateful Conformer with Cache-based Inference for Streaming Automatic Speech Recognition
by: Noroozi, Vahid, et al.
Published: (2023)

Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing?
by: Gaido, Marco, et al.
Published: (2024)

How do Hyenas deal with Human Speech? Speech Recognition and Translation with ConfHyena
by: Gaido, Marco, et al.
Published: (2024)

StreamAtt: Direct Streaming Speech-to-Text Translation with Attention-based Audio History Selection
by: Papi, Sara, et al.
Published: (2024)

MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages
by: Gaido, Marco, et al.
Published: (2024)

FAMA: The First Large-Scale Open-Science Speech Foundation Model for English and Italian
by: Papi, Sara, et al.
Published: (2025)

The Warmup Dilemma: How Learning Rate Strategies Impact Speech-to-Text Model Convergence
by: Gaido, Marco, et al.
Published: (2025)

Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer
by: Burchi, Maxime, et al.
Published: (2024)

How to Evaluate Speech Translation with Source-Aware Neural MT Metrics
by: Cettolo, Mauro, et al.
Published: (2025)

Simulstream: Open-Source Toolkit for Evaluation and Demonstration of Streaming Speech-to-Text Translation Systems
by: Gaido, Marco, et al.
Published: (2025)

NGPU-LM: GPU-Accelerated N-Gram Language Model for Context-Biasing in Greedy ASR Decoding
by: Bataev, Vladimir, et al.
Published: (2025)

Pushing the Limits of Beam Search Decoding for Transducer-based ASR models
by: Grigoryan, Lilit, et al.
Published: (2025)

Reducing the Offline-Streaming Gap for Unified ASR Transducer with Consistency Regularization
by: Andrusenko, Andrei, et al.
Published: (2026)