Bibliographic Details
Main Authors:	Zhou, Wei; Jia, Junteng; Sari, Leda; Mahadeokar, Jay; Kalinli, Ozlem
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing; Machine Learning; Sound
Online Access:	https://arxiv.org/abs/2411.07607

Similar Items

Faster Speech-LLaMA Inference with Multi-token Prediction
by: Raj, Desh, et al.
Published: (2024)

Frozen Large Language Models Can Perceive Paralinguistic Aspects of Speech
by: Kang, Wonjune, et al.
Published: (2024)

Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model
by: Xie, Jiamin, et al.
Published: (2023)

Efficient Streaming LLM for Speech Recognition
by: Jia, Junteng, et al.
Published: (2024)

M-BEST-RQ: A Multi-Channel Speech Foundation Model for Smart Glasses
by: Yang, Yufeng, et al.
Published: (2024)

Effective Text Adaptation for LLM-based ASR through Soft Prompt Fine-Tuning
by: Ma, Yingyi, et al.
Published: (2024)

kNN-CTC: Enhancing ASR via Retrieval of CTC Pseudo Labels
by: Zhou, Jiaming, et al.
Published: (2023)

Decoder-only Architecture for Speech Recognition with CTC Prompts and Text Data Augmentation
by: Tsunoo, Emiru, et al.
Published: (2023)

VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech
by: Du, Chenpeng, et al.
Published: (2024)

Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech
by: Wang, Hankun, et al.
Published: (2024)

Token-Weighted RNN-T for Learning from Flawed Data
by: Keren, Gil, et al.
Published: (2024)

CTC Blank Triggered Dynamic Layer-Skipping for Efficient CTC-based Speech Recognition
by: Hou, Junfeng, et al.
Published: (2024)

Delayed-KD: Delayed Knowledge Distillation based CTC for Low-Latency Streaming ASR
by: Li, Longhao, et al.
Published: (2025)

SpecASR: Accelerating LLM-based Automatic Speech Recognition via Speculative Decoding
by: Wei, Linye, et al.
Published: (2025)

Mamba-based Decoder-Only Approach with Bidirectional Speech Modeling for Speech Recognition
by: Masuyama, Yoshiki, et al.
Published: (2024)

Joint Beam Search Integrating CTC, Attention, and Transducer Decoders
by: Sudo, Yui, et al.
Published: (2024)

OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification
by: Peng, Yifan, et al.
Published: (2024)

SOA: Reducing Domain Mismatch in SSL Pipeline by Speech Only Adaptation for Low Resource ASR
by: Shankar, Natarajan Balaji, et al.
Published: (2024)

Unimodal Aggregation for CTC-based Speech Recognition
by: Fang, Ying, et al.
Published: (2023)

BrainWhisperer: Leveraging Large-Scale ASR Models for Neural Speech Decoding
by: Boccato, Tommaso, et al.
Published: (2026)

Joint ASR and Speaker Role Tagging with Serialized Output Training
by: Xu, Anfeng, et al.
Published: (2025)

Streaming Decoder-Only Automatic Speech Recognition with Discrete Speech Units: A Pilot Study
by: Chen, Peikun, et al.
Published: (2024)

Discrete Tokens Exhibit Interlanguage Speech Intelligibility Benefit: an Analytical Study Towards Accent-robust ASR Only with Native Speech Data
by: Onda, Kentaro, et al.
Published: (2025)

Streaming Speech Recognition with Decoder-Only Large Language Models and Latency Optimization
by: Wan, Genshun, et al.
Published: (2026)

Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition
by: Bai, Ye, et al.
Published: (2024)

Enhancing Intelligibility for Generative Target Speech Extraction via Joint Optimization with Target Speaker ASR
by: Ma, Hao, et al.
Published: (2025)

CUSIDE-T: Chunking, Simulating Future and Decoding for Transducer based Streaming ASR
by: Zhao, Wenbo, et al.
Published: (2024)

Boosting CTC-Based ASR Using LLM-Based Intermediate Loss Regularization
by: Altinok, Duygu
Published: (2025)

A Language-Agnostic Hierarchical LoRA-MoE Architecture for CTC-based Multilingual ASR
by: Zheng, Yuang, et al.
Published: (2026)

Speech Emotion Recognition with ASR Integration
by: Li, Yuanchao
Published: (2026)

Speaker-Distinguishable CTC: Learning Speaker Distinction Using CTC for Multi-Talker Speech Recognition
by: Sakuma, Asahi, et al.
Published: (2025)

SegAug: CTC-Aligned Segmented Augmentation For Robust RNN-Transducer Based Speech Recognition
by: Le, Khanh, et al.
Published: (2025)

Joint Training And Decoding for Multilingual End-to-End Simultaneous Speech Translation
by: Huang, Wuwei, et al.
Published: (2025)

Improving Zero-Shot Chinese-English Code-Switching ASR with kNN-CTC and Gated Monolingual Datastores
by: Zhou, Jiaming, et al.
Published: (2024)

SNIPER Training: Single-Shot Sparse Training for Text-to-Speech
by: Lam, Perry, et al.
Published: (2022)

Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter
by: Andrusenko, Andrei, et al.
Published: (2024)

FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration
by: Xu, Kai-Tuo, et al.
Published: (2025)

Efficient Scaling for LLM-based ASR
by: Mu, Bingshen, et al.
Published: (2025)

Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment
by: Choi, Jeongsoo, et al.
Published: (2025)

WCTC-Biasing: Retraining-free Contextual Biasing ASR with Wildcard CTC-based Keyword Spotting and Inter-layer Biasing
by: Nakagome, Yu, et al.
Published: (2025)