| Main Authors: | Li, Yang, Shangguan, Yuan, Wang, Yuhao, Lai, Liangzhen, Chang, Ernie, Zhao, Changsheng, Shi, Yangyang, Chandra, Vikas |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.13076 |
Similar Items
Folding Attention: Memory and Power Optimization for On-Device Transformer-based Streaming Speech Recognition
by: Li, Yang, et al.
Published: (2023)
High Fidelity Text-Guided Music Editing via Single-Stage Flow Matching
by: Lan, Gael Le, et al.
Published: (2024)
Streaming Bilingual End-to-End ASR model using Attention over Multiple Softmax
by: Patil, Aditya, et al.
Published: (2024)
CUSIDE-T: Chunking, Simulating Future and Decoding for Transducer based Streaming ASR
by: Zhao, Wenbo, et al.
Published: (2024)
Multichannel Long-Term Streaming Neural Speech Enhancement for Static and Moving Speakers
by: Quan, Changsheng, et al.
Published: (2024)
Scaling Multi-Talker ASR with Speaker-Agnostic Activity Streams
by: He, Xiluo, et al.
Published: (2025)
Lightweight Target-Speaker-Based Overlap Transcription for Practical Streaming ASR
by: Pražák, Aleš, et al.
Published: (2025)
Mobile Recording Device Recognition Based Cross-Scale and Multi-Level Representation Learning
by: Zeng, Chunyan, et al.
Published: (2024)
Delayed-KD: Delayed Knowledge Distillation based CTC for Low-Latency Streaming ASR
by: Li, Longhao, et al.
Published: (2025)
Breaking the Barriers of Text-Hungry and Audio-Deficient AI
by: Tembine, Hamidou, et al.
Published: (2025)
Breaking Resource Barriers in Speech Emotion Recognition via Data Distillation
by: Chang, Yi, et al.
Published: (2024)
Semi-Autoregressive Streaming ASR With Label Context
by: Arora, Siddhant, et al.
Published: (2023)
Mamba for Streaming ASR Combined with Unimodal Aggregation
by: Fang, Ying, et al.
Published: (2024)
SLAP: Scalable Language-Audio Pretraining with Variable-Duration Audio and Multi-Objective Training
by: Mei, Xinhao, et al.
Published: (2026)
DITTO: Data-efficient and Fair Targeted Subset Selection for ASR Accent Adaptation
by: Kothawade, Suraj, et al.
Published: (2021)
Unifying Streaming and Non-streaming Zipformer-based ASR
by: Sharma, Bidisha, et al.
Published: (2025)
SyncFlow: Toward Temporally Aligned Joint Audio-Video Generation from Text
by: Liu, Haohe, et al.
Published: (2024)
SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR
by: Guo, Pengcheng, et al.
Published: (2024)
Speaker Adaptation for Quantised End-to-End ASR Models
by: Zhao, Qiuming, et al.
Published: (2024)
Romanization Encoding For Multilingual ASR
by: Ding, Wen, et al.
Published: (2024)
Fast Streaming Transducer ASR Prototyping via Knowledge Distillation with Whisper
by: Thorbecke, Iuliia, et al.
Published: (2024)
Target Speaker ASR with Whisper
by: Polok, Alexander, et al.
Published: (2024)
Index-ASR Technical Report
by: Song, Zheshu, et al.
Published: (2025)
The USTC-NERCSLIP Systems for The ICMC-ASR Challenge
by: Wu, Minghui, et al.
Published: (2024)
Towards Robust Dysarthric Speech Recognition: LLM-Agent Post-ASR Correction Beyond WER
by: Zheng, Xiuwen, et al.
Published: (2026)
Crossmodal ASR Error Correction with Discrete Speech Units
by: Li, Yuanchao, et al.
Published: (2024)
Speech Emotion Recognition with ASR Integration
by: Li, Yuanchao
Published: (2026)
Efficient Scaling for LLM-based ASR
by: Mu, Bingshen, et al.
Published: (2025)
SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR
by: Zhao, Qiuming, et al.
Published: (2024)
kNN-CTC: Enhancing ASR via Retrieval of CTC Pseudo Labels
by: Zhou, Jiaming, et al.
Published: (2023)
EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Low Resource and Multilingual Scenarios
by: Srivastava, Tejes, et al.
Published: (2023)
BEAST: Online Joint Beat and Downbeat Tracking Based on Streaming Transformer
by: Chang, Chih-Cheng, et al.
Published: (2023)
dLLM-ASR: A Faster Diffusion LLM-based Framework for Speech Recognition
by: Tian, Wenjie, et al.
Published: (2026)
Breaking the Transcription Bottleneck: Fine-tuning ASR Models for Extremely Low-Resource Fieldwork Languages
by: Liang, Siyu, et al.
Published: (2025)
NIM4-ASR: Towards Efficient, Robust, and Customizable Real-Time LLM-Based ASR
by: Xie, Yuan, et al.
Published: (2026)
LUPET: Incorporating Hierarchical Information Path into Multilingual ASR
by: Liu, Wei, et al.
Published: (2024)
persoDA: Personalized Data Augmentation for Personalized ASR
by: Parada, Pablo Peso, et al.
Published: (2025)
Comparative Analysis of ASR Methods for Speech Deepfake Detection
by: Salvi, Davide, et al.
Published: (2024)
Consistency Based Unsupervised Self-training For ASR Personalisation
by: Zhang, Jisi, et al.
Published: (2024)
Joint ASR and Speaker Role Tagging with Serialized Output Training
by: Xu, Anfeng, et al.
Published: (2025)