| Main Authors: | Wan, Genshun, Wang, Mengzhi, Mao, Tingzhi, Chen, Hang, Ye, Zhongfu |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2409.13698 |
Similar Items
Deep CLAS: Deep Contextual Listen, Attend and Spell
by: Wang, Mengzhi, et al.
Published: (2024)
Streaming Speech Recognition with Decoder-Only Large Language Models and Latency Optimization
by: Wan, Genshun, et al.
Published: (2026)
Promptformer: Prompted Conformer Transducer for ASR
by: Duarte-Torres, Sergio, et al.
Published: (2024)
Incorporating Class-based Language Model for Named Entity Recognition in Factorized Neural Transducer
by: Wang, Peng, et al.
Published: (2023)
Self-Supervised Learning for Multi-Channel Neural Transducer
by: Kojima, Atsushi
Published: (2024)
Joint Beam Search Integrating CTC, Attention, and Transducer Decoders
by: Sudo, Yui, et al.
Published: (2024)
Alignment-Free Training for Transducer-based Multi-Talker ASR
by: Moriya, Takafumi, et al.
Published: (2024)
Enhanced Hybrid Transducer and Attention Encoder Decoder with Text Data
by: Tang, Yun, et al.
Published: (2025)
Fast Streaming Transducer ASR Prototyping via Knowledge Distillation with Whisper
by: Thorbecke, Iuliia, et al.
Published: (2024)
TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer
by: Bataev, Vladimir, et al.
Published: (2025)
CIF-T: A Novel CIF-based Transducer Architecture for Automatic Speech Recognition
by: Zhang, Tian-Hao, et al.
Published: (2023)
TokenVerse: Towards Unifying Speech and NLP Tasks via Transducer-based ASR
by: Kumar, Shashi, et al.
Published: (2024)
Boosting Hybrid Autoregressive Transducer-based ASR with Internal Acoustic Model Training and Dual Blank Thresholding
by: Moriya, Takafumi, et al.
Published: (2024)
Streaming Speaker Change Detection and Gender Classification for Transducer-Based Multi-Talker Speech Translation
by: Wang, Peidong, et al.
Published: (2025)
Transducers with Pronunciation-aware Embeddings for Automatic Speech Recognition
by: Xu, Hainan, et al.
Published: (2024)
Audio-Visual Representation Learning via Knowledge Distillation from Speech Foundation Models
by: Zhang, Jing-Xuan, et al.
Published: (2025)
Multilingual Zero Resource Speech Recognition Base on Self-Supervise Pre-Trained Acoustic Models
by: Wang, Haoyu, et al.
Published: (2022)
Lightweight Audio Segmentation for Long-form Speech Translation
by: Lee, Jaesong, et al.
Published: (2024)
Re-Parameterization of Lightweight Transformer for On-Device Speech Emotion Recognition
by: Zhang, Zixing, et al.
Published: (2024)
Guiding Frame-Level CTC Alignments Using Self-knowledge Distillation
by: Kim, Eungbeom, et al.
Published: (2024)
Extreme Encoder Output Frame Rate Reduction: Improving Computational Latencies of Large End-to-End Models
by: Prabhavalkar, Rohit, et al.
Published: (2024)
On the Relation between Internal Language Model and Sequence Discriminative Training for Neural Transducers
by: Yang, Zijian, et al.
Published: (2023)
BiRQ: Bi-Level Self-Labeling Random Quantization for Self-Supervised Speech Recognition
by: Jiang, Liuyuan, et al.
Published: (2025)
STaR: Distilling Speech Temporal Relation for Lightweight Speech Self-Supervised Learning Models
by: Jang, Kangwook, et al.
Published: (2023)
SP-MCQA: Evaluating Intelligibility of TTS Beyond the Word Level
by: Tee, Hitomi Jin Ling, et al.
Published: (2025)
Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction
by: Kim, Minchan, et al.
Published: (2024)
DQ-Whisper: Joint Distillation and Quantization for Efficient Multilingual Speech Recognition
by: Shao, Hang, et al.
Published: (2023)
Word Level Timestamp Generation for Automatic Speech Recognition and Translation
by: Hu, Ke, et al.
Published: (2025)
Low Frame-rate Speech Codec: a Codec Designed for Fast High-quality Speech LLM Training and Inference
by: Casanova, Edresson, et al.
Published: (2024)
Pitch-Aware RNN-T for Mandarin Chinese Mispronunciation Detection and Diagnosis
by: Wang, Xintong, et al.
Published: (2024)
The USTC-NERCSLIP Systems for the CHiME-8 NOTSOFAR-1 Challenge
by: Niu, Shutong, et al.
Published: (2024)
Task-Agnostic Structured Pruning of Speech Representation Models
by: Wang, Haoyu, et al.
Published: (2023)
Leveraging LLM and Self-Supervised Training Models for Speech Recognition in Chinese Dialects: A Comparative Analysis
by: Xu, Tianyi, et al.
Published: (2025)
Joint Learning of Wording and Formatting for Singable Melody-to-Lyric Generation
by: Ou, Longshen, et al.
Published: (2023)
TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control
by: Zhang, Yu, et al.
Published: (2024)
ArabEmoNet: A Lightweight Hybrid 2D CNN-BiLSTM Model with Attention for Robust Arabic Speech Emotion Recognition
by: Abouzeid, Ali, et al.
Published: (2025)
MSR-86K: An Evolving, Multilingual Corpus with 86,300 Hours of Transcribed Audio for Speech Recognition Research
by: Li, Song, et al.
Published: (2024)
BoSS: Beyond-Semantic Speech
by: Wang, Qing, et al.
Published: (2025)
DiaMoE-TTS: A Unified IPA-Based Dialect TTS Framework with Mixture-of-Experts and Parameter-Efficient Zero-Shot Adaptation
by: Chen, Ziqi, et al.
Published: (2025)
Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation
by: Wei, Kun, et al.
Published: (2023)