| Main Authors: | Zhou, Wei, Jia, Junteng, Sari, Leda, Mahadeokar, Jay, Kalinli, Ozlem |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2411.07607 |
Similar Items
Faster Speech-LLaMA Inference with Multi-token Prediction
by: Raj, Desh, et al.
Published: (2024)
Frozen Large Language Models Can Perceive Paralinguistic Aspects of Speech
by: Kang, Wonjune, et al.
Published: (2024)
Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model
by: Xie, Jiamin, et al.
Published: (2023)
Efficient Streaming LLM for Speech Recognition
by: Jia, Junteng, et al.
Published: (2024)
M-BEST-RQ: A Multi-Channel Speech Foundation Model for Smart Glasses
by: Yang, Yufeng, et al.
Published: (2024)
Effective Text Adaptation for LLM-based ASR through Soft Prompt Fine-Tuning
by: Ma, Yingyi, et al.
Published: (2024)
kNN-CTC: Enhancing ASR via Retrieval of CTC Pseudo Labels
by: Zhou, Jiaming, et al.
Published: (2023)
Decoder-only Architecture for Speech Recognition with CTC Prompts and Text Data Augmentation
by: Tsunoo, Emiru, et al.
Published: (2023)
VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech
by: Du, Chenpeng, et al.
Published: (2024)
Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech
by: Wang, Hankun, et al.
Published: (2024)
Token-Weighted RNN-T for Learning from Flawed Data
by: Keren, Gil, et al.
Published: (2024)
CTC Blank Triggered Dynamic Layer-Skipping for Efficient CTC-based Speech Recognition
by: Hou, Junfeng, et al.
Published: (2024)
Delayed-KD: Delayed Knowledge Distillation based CTC for Low-Latency Streaming ASR
by: Li, Longhao, et al.
Published: (2025)
SpecASR: Accelerating LLM-based Automatic Speech Recognition via Speculative Decoding
by: Wei, Linye, et al.
Published: (2025)
Mamba-based Decoder-Only Approach with Bidirectional Speech Modeling for Speech Recognition
by: Masuyama, Yoshiki, et al.
Published: (2024)
Joint Beam Search Integrating CTC, Attention, and Transducer Decoders
by: Sudo, Yui, et al.
Published: (2024)
OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification
by: Peng, Yifan, et al.
Published: (2024)
SOA: Reducing Domain Mismatch in SSL Pipeline by Speech Only Adaptation for Low Resource ASR
by: Shankar, Natarajan Balaji, et al.
Published: (2024)
Unimodal Aggregation for CTC-based Speech Recognition
by: Fang, Ying, et al.
Published: (2023)
BrainWhisperer: Leveraging Large-Scale ASR Models for Neural Speech Decoding
by: Boccato, Tommaso, et al.
Published: (2026)
Joint ASR and Speaker Role Tagging with Serialized Output Training
by: Xu, Anfeng, et al.
Published: (2025)
Streaming Decoder-Only Automatic Speech Recognition with Discrete Speech Units: A Pilot Study
by: Chen, Peikun, et al.
Published: (2024)
Discrete Tokens Exhibit Interlanguage Speech Intelligibility Benefit: an Analytical Study Towards Accent-robust ASR Only with Native Speech Data
by: Onda, Kentaro, et al.
Published: (2025)
Streaming Speech Recognition with Decoder-Only Large Language Models and Latency Optimization
by: Wan, Genshun, et al.
Published: (2026)
Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition
by: Bai, Ye, et al.
Published: (2024)
Enhancing Intelligibility for Generative Target Speech Extraction via Joint Optimization with Target Speaker ASR
by: Ma, Hao, et al.
Published: (2025)
CUSIDE-T: Chunking, Simulating Future and Decoding for Transducer based Streaming ASR
by: Zhao, Wenbo, et al.
Published: (2024)
Boosting CTC-Based ASR Using LLM-Based Intermediate Loss Regularization
by: Altinok, Duygu
Published: (2025)
A Language-Agnostic Hierarchical LoRA-MoE Architecture for CTC-based Multilingual ASR
by: Zheng, Yuang, et al.
Published: (2026)
Speech Emotion Recognition with ASR Integration
by: Li, Yuanchao
Published: (2026)
Speaker-Distinguishable CTC: Learning Speaker Distinction Using CTC for Multi-Talker Speech Recognition
by: Sakuma, Asahi, et al.
Published: (2025)
SegAug: CTC-Aligned Segmented Augmentation For Robust RNN-Transducer Based Speech Recognition
by: Le, Khanh, et al.
Published: (2025)
Joint Training And Decoding for Multilingual End-to-End Simultaneous Speech Translation
by: Huang, Wuwei, et al.
Published: (2025)
Improving Zero-Shot Chinese-English Code-Switching ASR with kNN-CTC and Gated Monolingual Datastores
by: Zhou, Jiaming, et al.
Published: (2024)
SNIPER Training: Single-Shot Sparse Training for Text-to-Speech
by: Lam, Perry, et al.
Published: (2022)
Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter
by: Andrusenko, Andrei, et al.
Published: (2024)
FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration
by: Xu, Kai-Tuo, et al.
Published: (2025)
Efficient Scaling for LLM-based ASR
by: Mu, Bingshen, et al.
Published: (2025)
Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment
by: Choi, Jeongsoo, et al.
Published: (2025)
WCTC-Biasing: Retraining-free Contextual Biasing ASR with Wildcard CTC-based Keyword Spotting and Inter-layer Biasing
by: Nakagome, Yu, et al.
Published: (2025)