| Main Authors: | Wang, Xinyu, Zhao, Ziyu, Luo, Yajie, Wu, Yihong, Ma, Liheng, Tian, Jingrui, Ding, Lei, Chang, Xiao-Wen, Lu, Peng |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.02455 |
Similar Items
Post-Training Quantization for Audio Diffusion Transformers
by: Khandelwal, Tanmay, et al.
Published: (2025)
FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration
by: Xu, Kai-Tuo, et al.
Published: (2025)
Addressing Index Collapse of Large-Codebook Speech Tokenizer with Dual-Decoding Product-Quantized Variational Auto-Encoder
by: Guo, Haohan, et al.
Published: (2024)
Training Large ASR Encoders with Differential Privacy
by: Chauhan, Geeticka, et al.
Published: (2024)
Fx-Encoder++: Extracting Instrument-Wise Audio Effects Representations from Mixtures
by: Yeh, Yen-Tung, et al.
Published: (2025)
Towards Robust Dysarthric Speech Recognition: LLM-Agent Post-ASR Correction Beyond WER
by: Zheng, Xiuwen, et al.
Published: (2026)
CUSIDE-T: Chunking, Simulating Future and Decoding for Transducer based Streaming ASR
by: Zhao, Wenbo, et al.
Published: (2024)
SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR
by: Guo, Pengcheng, et al.
Published: (2024)
Unifying Diarization, Separation, and ASR with Multi-Speaker Encoder
by: Shakeel, Muhammad, et al.
Published: (2025)
BrainWhisperer: Leveraging Large-Scale ASR Models for Neural Speech Decoding
by: Boccato, Tommaso, et al.
Published: (2026)
dLLM-ASR: A Faster Diffusion LLM-based Framework for Speech Recognition
by: Tian, Wenjie, et al.
Published: (2026)
SpecASR: Accelerating LLM-based Automatic Speech Recognition via Speculative Decoding
by: Wei, Linye, et al.
Published: (2025)
Decoding Strategies for Diffusion-Based ASR: A Systematic Evaluation of Confidence-Based Thresholding
by: Yeo, Jeong Hun, et al.
Published: (2026)
CJST: CTC Compressor based Joint Speech and Text Training for Decoder-Only ASR
by: Zhou, Wei, et al.
Published: (2024)
Rethinking Entropy Allocation in LLM-based ASR: Understanding the Dynamics between Speech Encoders and LLMs
by: Xie, Yuan, et al.
Published: (2026)
Asymmetric Encoder-Decoder Based on Time-Frequency Correlation for Speech Separation
by: Shin, Ui-Hyeop, et al.
Published: (2026)
Efficient Scaling for LLM-based ASR
by: Mu, Bingshen, et al.
Published: (2025)
Joint ASR and Speaker Role Tagging with Serialized Output Training
by: Xu, Anfeng, et al.
Published: (2025)
Electrolaryngeal Speech Intelligibility Enhancement Through Robust Linguistic Encoders
by: Violeta, Lester Phillip, et al.
Published: (2023)
Romanization Encoding For Multilingual ASR
by: Ding, Wen, et al.
Published: (2024)
Speech-Aware Neural Diarization with Encoder-Decoder Attractor Guided by Attention Constraints
by: Lee, PeiYing, et al.
Published: (2024)
Towards One-bit ASR: Extremely Low-bit Conformer Quantization Using Co-training and Stochastic Precision
by: Li, Zhaoqing, et al.
Published: (2025)
Causal Structure Discovery for Error Diagnostics of Children's ASR
by: Singh, Vishwanath Pratap, et al.
Published: (2025)
SA-SOT: Speaker-Aware Serialized Output Training for Multi-Talker ASR
by: Fan, Zhiyun, et al.
Published: (2024)
Investigating the Effect of Label Topology and Training Criterion on ASR Performance and Alignment Quality
by: Raissi, Tina, et al.
Published: (2024)
Right Label Context in End-to-End Training of Time-Synchronous ASR Models
by: Raissi, Tina, et al.
Published: (2025)
Loss Masking Is Not Needed in Decoder-only Transformer for Discrete-token-based ASR
by: Chen, Qian, et al.
Published: (2023)
Are Transformers in Pre-trained LM A Good ASR Encoder? An Empirical Study
by: An, Keyu, et al.
Published: (2024)
Performance Analysis of Speech Encoders for Low-Resource SLU and ASR in Tunisian Dialect
by: Mdhaffar, Salima, et al.
Published: (2024)
Frame-Wise Breath Detection with Self-Training: An Exploration of Enhancing Breath Naturalness in Text-to-Speech
by: Yang, Dong, et al.
Published: (2024)
UniEnc-CASSNAT: An Encoder-only Non-autoregressive ASR for Speech SSL Models
by: Fan, Ruchao, et al.
Published: (2024)
Quantizing Whisper-small: How design choices affect ASR performance
by: Söhler, Arthur, et al.
Published: (2025)
Target Speaker ASR with Whisper
by: Polok, Alexander, et al.
Published: (2024)
Index-ASR Technical Report
by: Song, Zheshu, et al.
Published: (2025)
SNR-Progressive Model with Harmonic Compensation for Low-SNR Speech Enhancement
by: Hou, Zhongshu, et al.
Published: (2024)
Continual Learning Optimizations for Auto-regressive Decoder of Multilingual ASR systems
by: Kwok, Chin Yuen, et al.
Published: (2024)
Edge-ASR: Towards Low-Bit Quantization of Automatic Speech Recognition Models
by: Feng, Chen, et al.
Published: (2025)
Selective Invocation for Multilingual ASR: A Cost-effective Approach Adapting to Speech Recognition Difficulty
by: Xue, Hongfei, et al.
Published: (2025)
Enhancing Intelligibility for Generative Target Speech Extraction via Joint Optimization with Target Speaker ASR
by: Ma, Hao, et al.
Published: (2025)
Delayed-KD: Delayed Knowledge Distillation based CTC for Low-Latency Streaming ASR
by: Li, Longhao, et al.
Published: (2025)