:: Library Catalog

Bibliographic Details
Main Author:	Yang, Yiru
Format:	Preprint
Published:	2025
Subjects:	Sound; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2507.10313

Similar Items

A Unified Denoising and Adaptation Framework for Self-Supervised Bengali Dialectal ASR
by: Biswas, Swadhin, et al.
Published: (2025)

SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR
by: Zhao, Qiuming, et al.
Published: (2024)

Lightweight Target-Speaker-Based Overlap Transcription for Practical Streaming ASR
by: Pražák, Aleš, et al.
Published: (2025)

VoiceTailor: Lightweight Plug-In Adapter for Diffusion-Based Personalized Text-to-Speech
by: Kim, Heeseung, et al.
Published: (2024)

Efficient Multilingual ASR Finetuning via LoRA Language Experts
by: Li, Jiahong, et al.
Published: (2025)

Synthetic Data Domain Adaptation for ASR via LLM-based Text and Phonetic Respelling Augmentation
by: Yamashita, Natsuo, et al.
Published: (2026)

Delayed-KD: Delayed Knowledge Distillation based CTC for Low-Latency Streaming ASR
by: Li, Longhao, et al.
Published: (2025)

SE/BN Adapter: Parametric Efficient Domain Adaptation for Speaker Recognition
by: Wang, Tianhao, et al.
Published: (2024)

HDMoLE: Mixture of LoRA Experts with Hierarchical Routing and Dynamic Thresholds for Fine-Tuning LLM-based ASR Models
by: Mu, Bingshen, et al.
Published: (2024)

Overlap-Adaptive Hybrid Speaker Diarization and ASR-Aware Observation Addition for MISP 2025 Challenge
by: Huang, Shangkun, et al.
Published: (2025)

Adapter-Based Multi-Agent AVSR Extension for Pre-Trained ASR Models
by: Simic, Christopher, et al.
Published: (2025)

Index-ASR Technical Report
by: Song, Zheshu, et al.
Published: (2025)

Fast Streaming Transducer ASR Prototyping via Knowledge Distillation with Whisper
by: Thorbecke, Iuliia, et al.
Published: (2024)

SA-SOT: Speaker-Aware Serialized Output Training for Multi-Talker ASR
by: Fan, Zhiyun, et al.
Published: (2024)

Speech Recognition on TV Series with Video-guided Post-ASR Correction
by: Yang, Haoyuan, et al.
Published: (2025)

SpecTokenizer: A Lightweight Streaming Codec in the Compressed Spectrum Domain
by: Wan, Zixiang, et al.
Published: (2025)

SOA: Reducing Domain Mismatch in SSL Pipeline by Speech Only Adaptation for Low Resource ASR
by: Shankar, Natarajan Balaji, et al.
Published: (2024)

Distilling Spectrograms into Tokens: Fast and Lightweight Bioacoustic Classification for BirdCLEF+ 2025
by: Miyaguchi, Anthony, et al.
Published: (2025)

Lightweight Resolution-Aware Audio Deepfake Detection via Cross-Scale Attention and Consistency Learning
by: Shahriar, K. A.
Published: (2026)

Towards Rehearsal-Free Multilingual ASR: A LoRA-based Case Study on Whisper
by: Xu, Tianyi, et al.
Published: (2024)

LUPET: Incorporating Hierarchical Information Path into Multilingual ASR
by: Liu, Wei, et al.
Published: (2024)

Audio Prompt Adapter: Unleashing Music Editing Abilities for Text-to-Music with Lightweight Finetuning
by: Tsai, Fang-Duo, et al.
Published: (2024)

Monaural speech enhancement on drone via Adapter based transfer learning
by: Chen, Xingyu, et al.
Published: (2024)

Speaker Targeting via Self-Speaker Adaptation for Multi-talker ASR
by: Wang, Weiqing, et al.
Published: (2025)

Efficient Rehearsal for Continual Learning in ASR via Singular Value Tuning
by: Eeckt, Steven Vander, et al.
Published: (2026)

Target Speaker ASR with Whisper
by: Polok, Alexander, et al.
Published: (2024)

A Language-Agnostic Hierarchical LoRA-MoE Architecture for CTC-based Multilingual ASR
by: Zheng, Yuang, et al.
Published: (2026)

kNN-CTC: Enhancing ASR via Retrieval of CTC Pseudo Labels
by: Zhou, Jiaming, et al.
Published: (2023)

Towards Decoupling Frontend Enhancement and Backend Recognition in Monaural Robust ASR
by: Yang, Yufeng, et al.
Published: (2024)

Efficient Scaling for LLM-based ASR
by: Mu, Bingshen, et al.
Published: (2025)

Speech Emotion Recognition with ASR Integration
by: Li, Yuanchao
Published: (2026)

Speech Denoising with Auditory Models
by: Saddler, Mark R., et al.
Published: (2020)

Multi-Channel Differential ASR for Robust Wearer Speech Recognition on Smart Glasses
by: Yang, Yufeng, et al.
Published: (2025)

PromptASR for contextualized ASR with controllable style
by: Yang, Xiaoyu, et al.
Published: (2023)

Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context
by: Kang, Wei, et al.
Published: (2023)

SpecASR: Accelerating LLM-based Automatic Speech Recognition via Speculative Decoding
by: Wei, Linye, et al.
Published: (2025)

Efficient Adapter Finetuning for Tail Languages in Streaming Multilingual ASR
by: Bai, Junwen, et al.
Published: (2024)

Thinking in cocktail party: Chain-of-Thought and reinforcement learning for target speaker automatic speech recognition
by: Zhang, Yiru, et al.
Published: (2025)

Technical Report: A Practical Guide to Kaldi ASR Optimization
by: Hong, Mengze, et al.
Published: (2025)

Elevating Robust Multi-Talker ASR by Decoupling Speaker Separation and Speech Recognition
by: Yang, Yufeng, et al.
Published: (2025)