Library Catalog

Bibliographic Details
Main Authors:	Li, Yang; Shangguan, Yuan; Wang, Yuhao; Lai, Liangzhen; Chang, Ernie; Zhao, Changsheng; Shi, Yangyang; Chandra, Vikas
Format:	Preprint
Published:	2024
Subjects:	Sound; Machine Learning; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2402.13076

Similar Items

Folding Attention: Memory and Power Optimization for On-Device Transformer-based Streaming Speech Recognition
by: Li, Yang, et al.
Published: (2023)

High Fidelity Text-Guided Music Editing via Single-Stage Flow Matching
by: Lan, Gael Le, et al.
Published: (2024)

Streaming Bilingual End-to-End ASR model using Attention over Multiple Softmax
by: Patil, Aditya, et al.
Published: (2024)

CUSIDE-T: Chunking, Simulating Future and Decoding for Transducer based Streaming ASR
by: Zhao, Wenbo, et al.
Published: (2024)

Multichannel Long-Term Streaming Neural Speech Enhancement for Static and Moving Speakers
by: Quan, Changsheng, et al.
Published: (2024)

Scaling Multi-Talker ASR with Speaker-Agnostic Activity Streams
by: He, Xiluo, et al.
Published: (2025)

Lightweight Target-Speaker-Based Overlap Transcription for Practical Streaming ASR
by: Pražák, Aleš, et al.
Published: (2025)

Mobile Recording Device Recognition Based Cross-Scale and Multi-Level Representation Learning
by: Zeng, Chunyan, et al.
Published: (2024)

Delayed-KD: Delayed Knowledge Distillation based CTC for Low-Latency Streaming ASR
by: Li, Longhao, et al.
Published: (2025)

Breaking the Barriers of Text-Hungry and Audio-Deficient AI
by: Tembine, Hamidou, et al.
Published: (2025)

Breaking Resource Barriers in Speech Emotion Recognition via Data Distillation
by: Chang, Yi, et al.
Published: (2024)

Semi-Autoregressive Streaming ASR With Label Context
by: Arora, Siddhant, et al.
Published: (2023)

Mamba for Streaming ASR Combined with Unimodal Aggregation
by: Fang, Ying, et al.
Published: (2024)

SLAP: Scalable Language-Audio Pretraining with Variable-Duration Audio and Multi-Objective Training
by: Mei, Xinhao, et al.
Published: (2026)

DITTO: Data-efficient and Fair Targeted Subset Selection for ASR Accent Adaptation
by: Kothawade, Suraj, et al.
Published: (2021)

Unifying Streaming and Non-streaming Zipformer-based ASR
by: Sharma, Bidisha, et al.
Published: (2025)

SyncFlow: Toward Temporally Aligned Joint Audio-Video Generation from Text
by: Liu, Haohe, et al.
Published: (2024)

SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR
by: Guo, Pengcheng, et al.
Published: (2024)

Speaker Adaptation for Quantised End-to-End ASR Models
by: Zhao, Qiuming, et al.
Published: (2024)

Romanization Encoding For Multilingual ASR
by: Ding, Wen, et al.
Published: (2024)

Fast Streaming Transducer ASR Prototyping via Knowledge Distillation with Whisper
by: Thorbecke, Iuliia, et al.
Published: (2024)

Target Speaker ASR with Whisper
by: Polok, Alexander, et al.
Published: (2024)

Index-ASR Technical Report
by: Song, Zheshu, et al.
Published: (2025)

The USTC-NERCSLIP Systems for The ICMC-ASR Challenge
by: Wu, Minghui, et al.
Published: (2024)

Towards Robust Dysarthric Speech Recognition: LLM-Agent Post-ASR Correction Beyond WER
by: Zheng, Xiuwen, et al.
Published: (2026)

Crossmodal ASR Error Correction with Discrete Speech Units
by: Li, Yuanchao, et al.
Published: (2024)

Speech Emotion Recognition with ASR Integration
by: Li, Yuanchao
Published: (2026)

Efficient Scaling for LLM-based ASR
by: Mu, Bingshen, et al.
Published: (2025)

SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR
by: Zhao, Qiuming, et al.
Published: (2024)

kNN-CTC: Enhancing ASR via Retrieval of CTC Pseudo Labels
by: Zhou, Jiaming, et al.
Published: (2023)

EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Low Resource and Multilingual Scenarios
by: Srivastava, Tejes, et al.
Published: (2023)

BEAST: Online Joint Beat and Downbeat Tracking Based on Streaming Transformer
by: Chang, Chih-Cheng, et al.
Published: (2023)

dLLM-ASR: A Faster Diffusion LLM-based Framework for Speech Recognition
by: Tian, Wenjie, et al.
Published: (2026)

Breaking the Transcription Bottleneck: Fine-tuning ASR Models for Extremely Low-Resource Fieldwork Languages
by: Liang, Siyu, et al.
Published: (2025)

NIM4-ASR: Towards Efficient, Robust, and Customizable Real-Time LLM-Based ASR
by: Xie, Yuan, et al.
Published: (2026)

LUPET: Incorporating Hierarchical Information Path into Multilingual ASR
by: Liu, Wei, et al.
Published: (2024)

persoDA: Personalized Data Augmentation for Personalized ASR
by: Parada, Pablo Peso, et al.
Published: (2025)

Comparative Analysis of ASR Methods for Speech Deepfake Detection
by: Salvi, Davide, et al.
Published: (2024)

Consistency Based Unsupervised Self-training For ASR Personalisation
by: Zhang, Jisi, et al.
Published: (2024)

Joint ASR and Speaker Role Tagging with Serialized Output Training
by: Xu, Anfeng, et al.
Published: (2025)