| Main Authors: | An, Junjie, Tian, Jingguang, Wang, Tianyi, Gao, Yu, Mou, Xiaofeng, Xu, Yi |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.12287 |
Similar Items
Adaptive Speaker Embedding Self-Augmentation for Personal Voice Activity Detection with Short Enrollment Speech
by: Feng, Fuyuan, et al.
Published: (2026)
Retrieval Augmented Correction of Named Entity Speech Recognition Errors
by: Pusateri, Ernest, et al.
Published: (2024)
End-to-End Direction-Aware Keyword Spotting with Spatial Priors in Noisy Environments
by: Wang, Rui, et al.
Published: (2026)
Learning Emotion-Invariant Speaker Representations for Speaker Verification
by: Tian, Jingguang, et al.
Published: (2025)
RA-CLAP: Relation-Augmented Emotional Speaking Style Contrastive Language-Audio Pretraining For Speech Retrieval
by: Sun, Haoqin, et al.
Published: (2025)
Discrete Audio Representations for Automated Audio Captioning
by: Tian, Jingguang, et al.
Published: (2025)
Semi-supervised Learning for Code-Switching ASR with Large Language Model Filter
by: Xi, Yu, et al.
Published: (2024)
Retrieval Augmented Generation based context discovery for ASR
by: Siskos, Dimitrios, et al.
Published: (2025)
Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation
by: Ghosh, Sreyan, et al.
Published: (2024)
Enhancing Automatic Chord Recognition through LLM Chain-of-Thought Reasoning
by: Chang, Chih-Cheng, et al.
Published: (2025)
MOSA: Mixtures of Simple Adapters Outperform Monolithic Approaches in LLM-based Multilingual ASR
by: Li, Junjie, et al.
Published: (2025)
Contextual Biasing for LLM-Based ASR with Hotword Retrieval and Reinforcement Learning
by: Kong, YuXiang, et al.
Published: (2025)
Interpretable Audio Editing Evaluation via Chain-of-Thought Difference-Commonality Reasoning with Multimodal LLMs
by: Jia, Yuhang, et al.
Published: (2025)
Romanization Encoding For Multilingual ASR
by: Ding, Wen, et al.
Published: (2024)
Performant ASR Models for Medical Entities in Accented Speech
by: Afonja, Tejumade, et al.
Published: (2024)
persoDA: Personalized Data Augmentation for Personalized ASR
by: Parada, Pablo Peso, et al.
Published: (2025)
Large Language Model Should Understand Pinyin for Chinese ASR Error Correction
by: Li, Yuang, et al.
Published: (2024)
The RoyalFlush Automatic Speech Diarization and Recognition System for In-Car Multi-Channel Automatic Speech Recognition Challenge
by: Tian, Jingguang, et al.
Published: (2024)
XLSR-Transducer: Streaming ASR for Self-Supervised Pretrained Models
by: Kumar, Shashi, et al.
Published: (2024)
MaLa-ASR: Multimedia-Assisted LLM-Based ASR
by: Yang, Guanrou, et al.
Published: (2024)
Chain-of-Thought Reasoning in Streaming Full-Duplex End-to-End Spoken Dialogue Systems
by: Arora, Siddhant, et al.
Published: (2025)
Incorporating Class-based Language Model for Named Entity Recognition in Factorized Neural Transducer
by: Wang, Peng, et al.
Published: (2023)
LA-RAG:Enhancing LLM-based ASR Accuracy with Retrieval-Augmented Generation
by: Li, Shaojun, et al.
Published: (2024)
BR-ASR: Efficient and Scalable Bias Retrieval Framework for Contextual Biasing ASR in Speech LLM
by: Gong, Xun, et al.
Published: (2025)
A Semantic Information-based Hierarchical Speech Enhancement Method Using Factorized Codec and Diffusion Model
by: Xiang, Yang, et al.
Published: (2025)
CHSER: A Dataset and Case Study on Generative Speech Error Correction for Child ASR
by: Shankar, Natarajan Balaji, et al.
Published: (2025)
Self-Speculative Decoding for LLM-based ASR with CTC Encoder Drafts
by: Saon, George, et al.
Published: (2026)
SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR
by: Zhao, Qiuming, et al.
Published: (2024)
Multi-Channel Multi-Speaker ASR Using Target Speaker's Solo Segment
by: Shao, Yiwen, et al.
Published: (2024)
Consistency Based Unsupervised Self-training For ASR Personalisation
by: Zhang, Jisi, et al.
Published: (2024)
Large Language Models based ASR Error Correction for Child Conversations
by: Xu, Anfeng, et al.
Published: (2025)
Mind the Gap: Entity-Preserved Context-Aware ASR Structured Transcriptions
by: Altinok, Duygu
Published: (2025)
AdaLTM: Adaptive Layer-wise Task Vector Merging for Categorical Speech Emotion Recognition with ASR Knowledge Integration
by: Lee, Chia-Yu, et al.
Published: (2026)
DM-ASR: Diarization-aware Multi-speaker ASR with Large Language Models
by: Li, Li, et al.
Published: (2026)
Custom Data Augmentation for low resource ASR using Bark and Retrieval-Based Voice Conversion
by: Kamble, Anand, et al.
Published: (2023)
kNN-CTC: Enhancing ASR via Retrieval of CTC Pseudo Labels
by: Zhou, Jiaming, et al.
Published: (2023)
Medical Spoken Named Entity Recognition
by: Le-Duc, Khai, et al.
Published: (2024)
Fewer Hallucinations, More Verification: A Three-Stage LLM-Based Framework for ASR Error Correction
by: Fang, Yangui, et al.
Published: (2025)
Robust ASR Error Correction with Conservative Data Filtering
by: Udagawa, Takuma, et al.
Published: (2024)
ROAR: Reinforcing Original to Augmented Data Ratio Dynamics for Wav2Vec2.0 Based ASR
by: Singh, Vishwanath Pratap, et al.
Published: (2024)