:: Library Catalog

Bibliographic Details
Main Authors:	Xia, Yinfeng; Tang, Jian; Hou, Junfeng; Xu, Gaopeng; Yao, Haitao
Format:	Preprint
Published:	2026
Subjects:	Sound; Computation and Language
Online Access:	https://arxiv.org/abs/2603.11123

Similar Items

MFLA: Monotonic Finite Look-ahead Attention for Streaming Speech Recognition
by: Xia, Yinfeng, et al.
Published: (2025)

StreamUni: Achieving Streaming Speech Translation with a Unified Large Speech-Language Model
by: Guo, Shoutao, et al.
Published: (2025)

WhisperPipe: A Resource-Efficient Streaming Architecture for Real-Time Automatic Speech Recognition
by: Ramezani, Erfan, et al.
Published: (2026)

Joint Optimization of Streaming and Non-Streaming Automatic Speech Recognition with Multi-Decoder and Knowledge Distillation
by: Shakeel, Muhammad, et al.
Published: (2024)

Efficient Streaming LLM for Speech Recognition
by: Jia, Junteng, et al.
Published: (2024)

Moonshine v2: Ergodic Streaming Encoder ASR for Latency-Critical Speech Applications
by: Kudlur, Manjunath, et al.
Published: (2026)

UniEnc-CASSNAT: An Encoder-only Non-autoregressive ASR for Speech SSL Models
by: Fan, Ruchao, et al.
Published: (2024)

Semi-Autoregressive Streaming ASR With Label Context
by: Arora, Siddhant, et al.
Published: (2023)

Mamba for Streaming ASR Combined with Unimodal Aggregation
by: Fang, Ying, et al.
Published: (2024)

Speech ReaLLM -- Real-time Streaming Speech Recognition with Multimodal LLMs by Teaching the Flow of Time
by: Seide, Frank, et al.
Published: (2024)

Automatic Speech Recognition (ASR) for the Diagnosis of pronunciation of Speech Sound Disorders in Korean children
by: Ahn, Taekyung, et al.
Published: (2024)

DiariST: Streaming Speech Translation with Speaker Diarization
by: Yang, Mu, et al.
Published: (2023)

VocalNet-MDM: Accelerating Streaming Speech LLM via Self-Distilled Masked Diffusion Modeling
by: Cheng, Ziyang, et al.
Published: (2026)

SpeakStream: Streaming Text-to-Speech with Interleaved Data
by: Bai, Richard He, et al.
Published: (2025)

Fast Streaming Transducer ASR Prototyping via Knowledge Distillation with Whisper
by: Thorbecke, Iuliia, et al.
Published: (2024)

DARS: Dysarthria-Aware Rhythm-Style Synthesis for ASR Enhancement
by: Wu, Minghui, et al.
Published: (2026)

UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE
by: Liu, Zhenyu, et al.
Published: (2025)

ContextASR-Bench: A Massive Contextual Speech Recognition Benchmark
by: Wang, He, et al.
Published: (2025)

NIM4-ASR: Towards Efficient, Robust, and Customizable Real-Time LLM-Based ASR
by: Xie, Yuan, et al.
Published: (2026)

CLiFT-ASR: A Cross-Lingual Fine-Tuning Framework for Low-Resource Taiwanese Hokkien Speech Recognition
by: Sung, Hung-Yang, et al.
Published: (2025)

Efficient Adapter Finetuning for Tail Languages in Streaming Multilingual ASR
by: Bai, Junwen, et al.
Published: (2024)

CIF-T: A Novel CIF-based Transducer Architecture for Automatic Speech Recognition
by: Zhang, Tian-Hao, et al.
Published: (2023)

Elderly-Contextual Data Augmentation via Speech Synthesis for Elderly ASR
by: Lee, Minsik, et al.
Published: (2026)

Automatic Speech Recognition for Hindi
by: Saha, Anish, et al.
Published: (2024)

SSCFormer: Push the Limit of Chunk-wise Conformer for Streaming ASR Using Sequentially Sampled Chunks and Chunked Causal Convolution
by: Wang, Fangyuan, et al.
Published: (2022)

Spiralformer: Low Latency Encoder for Streaming Speech Recognition with Circular Layer Skipping and Early Exiting
by: Tsunoo, Emiru, et al.
Published: (2025)

Automatic Speech Recognition for Non-Native English: Accuracy and Disfluency Handling
by: McGuire, Michael
Published: (2025)

Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition
by: Wang, Yujin, et al.
Published: (2022)

Streaming Bilingual End-to-End ASR model using Attention over Multiple Softmax
by: Patil, Aditya, et al.
Published: (2024)

Benchmarking Japanese Speech Recognition on ASR-LLM Setups with Multi-Pass Augmented Generative Error Correction
by: Ko, Yuka, et al.
Published: (2024)

Automatic Speech Recognition Biases in Newcastle English: an Error Analysis
by: Serditova, Dana, et al.
Published: (2025)

Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation
by: Yan, Canxiang, et al.
Published: (2025)

StreamAAD: Decoding Spatial Auditory Attention with a Streaming Architecture
by: Qiu, Zelin, et al.
Published: (2024)

ASR-FAIRBENCH: Measuring and Benchmarking Equity Across Speech Recognition Systems
by: Rai, Anand, et al.
Published: (2025)

TASTE-Streaming: Towards Streamable Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling
by: Tseng, Liang-Hsuan, et al.
Published: (2026)

Automatic Speech Recognition of Non-Native Child Speech for Language Learning Applications
by: Wills, Simone, et al.
Published: (2023)

TG-ASR: Translation-Guided Learning with Parallel Gated Cross Attention for Low-Resource Automatic Speech Recognition
by: Yang, Cheng-Yeh, et al.
Published: (2026)

A Unified Speech LLM for Diarization and Speech Recognition in Multilingual Conversations
by: Saengthong, Phurich, et al.
Published: (2025)

Huntington Disease Automatic Speech Recognition with Biomarker Supervision
by: Wang, Charles L., et al.
Published: (2026)

StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning
by: Zhang, Shaolei, et al.
Published: (2024)