:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Xinyu; Zhao, Ziyu; Luo, Yajie; Wu, Yihong; Ma, Liheng; Tian, Jingrui; Ding, Lei; Chang, Xiao-Wen; Lu, Peng
Format:	Preprint
Published:	2026
Subjects:	Sound; Computation and Language; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2601.02455

Similar Items

Post-Training Quantization for Audio Diffusion Transformers
by: Khandelwal, Tanmay, et al.
Published: (2025)

FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration
by: Xu, Kai-Tuo, et al.
Published: (2025)

Addressing Index Collapse of Large-Codebook Speech Tokenizer with Dual-Decoding Product-Quantized Variational Auto-Encoder
by: Guo, Haohan, et al.
Published: (2024)

Training Large ASR Encoders with Differential Privacy
by: Chauhan, Geeticka, et al.
Published: (2024)

Fx-Encoder++: Extracting Instrument-Wise Audio Effects Representations from Mixtures
by: Yeh, Yen-Tung, et al.
Published: (2025)

Towards Robust Dysarthric Speech Recognition: LLM-Agent Post-ASR Correction Beyond WER
by: Zheng, Xiuwen, et al.
Published: (2026)

CUSIDE-T: Chunking, Simulating Future and Decoding for Transducer based Streaming ASR
by: Zhao, Wenbo, et al.
Published: (2024)

SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR
by: Guo, Pengcheng, et al.
Published: (2024)

Unifying Diarization, Separation, and ASR with Multi-Speaker Encoder
by: Shakeel, Muhammad, et al.
Published: (2025)

BrainWhisperer: Leveraging Large-Scale ASR Models for Neural Speech Decoding
by: Boccato, Tommaso, et al.
Published: (2026)

dLLM-ASR: A Faster Diffusion LLM-based Framework for Speech Recognition
by: Tian, Wenjie, et al.
Published: (2026)

SpecASR: Accelerating LLM-based Automatic Speech Recognition via Speculative Decoding
by: Wei, Linye, et al.
Published: (2025)

Decoding Strategies for Diffusion-Based ASR: A Systematic Evaluation of Confidence-Based Thresholding
by: Yeo, Jeong Hun, et al.
Published: (2026)

CJST: CTC Compressor based Joint Speech and Text Training for Decoder-Only ASR
by: Zhou, Wei, et al.
Published: (2024)

Rethinking Entropy Allocation in LLM-based ASR: Understanding the Dynamics between Speech Encoders and LLMs
by: Xie, Yuan, et al.
Published: (2026)

Asymmetric Encoder-Decoder Based on Time-Frequency Correlation for Speech Separation
by: Shin, Ui-Hyeop, et al.
Published: (2026)

Efficient Scaling for LLM-based ASR
by: Mu, Bingshen, et al.
Published: (2025)

Joint ASR and Speaker Role Tagging with Serialized Output Training
by: Xu, Anfeng, et al.
Published: (2025)

Electrolaryngeal Speech Intelligibility Enhancement Through Robust Linguistic Encoders
by: Violeta, Lester Phillip, et al.
Published: (2023)

Romanization Encoding For Multilingual ASR
by: Ding, Wen, et al.
Published: (2024)

Speech-Aware Neural Diarization with Encoder-Decoder Attractor Guided by Attention Constraints
by: Lee, PeiYing, et al.
Published: (2024)

Towards One-bit ASR: Extremely Low-bit Conformer Quantization Using Co-training and Stochastic Precision
by: Li, Zhaoqing, et al.
Published: (2025)

Causal Structure Discovery for Error Diagnostics of Children's ASR
by: Singh, Vishwanath Pratap, et al.
Published: (2025)

SA-SOT: Speaker-Aware Serialized Output Training for Multi-Talker ASR
by: Fan, Zhiyun, et al.
Published: (2024)

Investigating the Effect of Label Topology and Training Criterion on ASR Performance and Alignment Quality
by: Raissi, Tina, et al.
Published: (2024)

Right Label Context in End-to-End Training of Time-Synchronous ASR Models
by: Raissi, Tina, et al.
Published: (2025)

Loss Masking Is Not Needed in Decoder-only Transformer for Discrete-token-based ASR
by: Chen, Qian, et al.
Published: (2023)

Are Transformers in Pre-trained LM A Good ASR Encoder? An Empirical Study
by: An, Keyu, et al.
Published: (2024)

Performance Analysis of Speech Encoders for Low-Resource SLU and ASR in Tunisian Dialect
by: Mdhaffar, Salima, et al.
Published: (2024)

Frame-Wise Breath Detection with Self-Training: An Exploration of Enhancing Breath Naturalness in Text-to-Speech
by: Yang, Dong, et al.
Published: (2024)

UniEnc-CASSNAT: An Encoder-only Non-autoregressive ASR for Speech SSL Models
by: Fan, Ruchao, et al.
Published: (2024)

Quantizing Whisper-small: How design choices affect ASR performance
by: Söhler, Arthur, et al.
Published: (2025)

Target Speaker ASR with Whisper
by: Polok, Alexander, et al.
Published: (2024)

Index-ASR Technical Report
by: Song, Zheshu, et al.
Published: (2025)

SNR-Progressive Model with Harmonic Compensation for Low-SNR Speech Enhancement
by: Hou, Zhongshu, et al.
Published: (2024)

Continual Learning Optimizations for Auto-regressive Decoder of Multilingual ASR systems
by: Kwok, Chin Yuen, et al.
Published: (2024)

Edge-ASR: Towards Low-Bit Quantization of Automatic Speech Recognition Models
by: Feng, Chen, et al.
Published: (2025)

Selective Invocation for Multilingual ASR: A Cost-effective Approach Adapting to Speech Recognition Difficulty
by: Xue, Hongfei, et al.
Published: (2025)

Enhancing Intelligibility for Generative Target Speech Extraction via Joint Optimization with Target Speaker ASR
by: Ma, Hao, et al.
Published: (2025)

Delayed-KD: Delayed Knowledge Distillation based CTC for Low-Latency Streaming ASR
by: Li, Longhao, et al.
Published: (2025)