:: Library Catalog

Bibliographic Details
Main Authors:	Wan, Genshun; Wang, Mengzhi; Mao, Tingzhi; Chen, Hang; Ye, Zhongfu
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Sound; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2409.13698

Similar Items

Deep CLAS: Deep Contextual Listen, Attend and Spell
by: Wang, Mengzhi, et al.
Published: (2024)

Streaming Speech Recognition with Decoder-Only Large Language Models and Latency Optimization
by: Wan, Genshun, et al.
Published: (2026)

Promptformer: Prompted Conformer Transducer for ASR
by: Duarte-Torres, Sergio, et al.
Published: (2024)

Incorporating Class-based Language Model for Named Entity Recognition in Factorized Neural Transducer
by: Wang, Peng, et al.
Published: (2023)

Self-Supervised Learning for Multi-Channel Neural Transducer
by: Kojima, Atsushi
Published: (2024)

Joint Beam Search Integrating CTC, Attention, and Transducer Decoders
by: Sudo, Yui, et al.
Published: (2024)

Alignment-Free Training for Transducer-based Multi-Talker ASR
by: Moriya, Takafumi, et al.
Published: (2024)

Enhanced Hybrid Transducer and Attention Encoder Decoder with Text Data
by: Tang, Yun, et al.
Published: (2025)

Fast Streaming Transducer ASR Prototyping via Knowledge Distillation with Whisper
by: Thorbecke, Iuliia, et al.
Published: (2024)

TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer
by: Bataev, Vladimir, et al.
Published: (2025)

CIF-T: A Novel CIF-based Transducer Architecture for Automatic Speech Recognition
by: Zhang, Tian-Hao, et al.
Published: (2023)

TokenVerse: Towards Unifying Speech and NLP Tasks via Transducer-based ASR
by: Kumar, Shashi, et al.
Published: (2024)

Boosting Hybrid Autoregressive Transducer-based ASR with Internal Acoustic Model Training and Dual Blank Thresholding
by: Moriya, Takafumi, et al.
Published: (2024)

Streaming Speaker Change Detection and Gender Classification for Transducer-Based Multi-Talker Speech Translation
by: Wang, Peidong, et al.
Published: (2025)

Transducers with Pronunciation-aware Embeddings for Automatic Speech Recognition
by: Xu, Hainan, et al.
Published: (2024)

Audio-Visual Representation Learning via Knowledge Distillation from Speech Foundation Models
by: Zhang, Jing-Xuan, et al.
Published: (2025)

Multilingual Zero Resource Speech Recognition Base on Self-Supervise Pre-Trained Acoustic Models
by: Wang, Haoyu, et al.
Published: (2022)

Lightweight Audio Segmentation for Long-form Speech Translation
by: Lee, Jaesong, et al.
Published: (2024)

Re-Parameterization of Lightweight Transformer for On-Device Speech Emotion Recognition
by: Zhang, Zixing, et al.
Published: (2024)

Guiding Frame-Level CTC Alignments Using Self-knowledge Distillation
by: Kim, Eungbeom, et al.
Published: (2024)

Extreme Encoder Output Frame Rate Reduction: Improving Computational Latencies of Large End-to-End Models
by: Prabhavalkar, Rohit, et al.
Published: (2024)

On the Relation between Internal Language Model and Sequence Discriminative Training for Neural Transducers
by: Yang, Zijian, et al.
Published: (2023)

BiRQ: Bi-Level Self-Labeling Random Quantization for Self-Supervised Speech Recognition
by: Jiang, Liuyuan, et al.
Published: (2025)

STaR: Distilling Speech Temporal Relation for Lightweight Speech Self-Supervised Learning Models
by: Jang, Kangwook, et al.
Published: (2023)

SP-MCQA: Evaluating Intelligibility of TTS Beyond the Word Level
by: Tee, Hitomi Jin Ling, et al.
Published: (2025)

Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction
by: Kim, Minchan, et al.
Published: (2024)

DQ-Whisper: Joint Distillation and Quantization for Efficient Multilingual Speech Recognition
by: Shao, Hang, et al.
Published: (2023)

Word Level Timestamp Generation for Automatic Speech Recognition and Translation
by: Hu, Ke, et al.
Published: (2025)

Low Frame-rate Speech Codec: a Codec Designed for Fast High-quality Speech LLM Training and Inference
by: Casanova, Edresson, et al.
Published: (2024)

Pitch-Aware RNN-T for Mandarin Chinese Mispronunciation Detection and Diagnosis
by: Wang, Xintong, et al.
Published: (2024)

The USTC-NERCSLIP Systems for the CHiME-8 NOTSOFAR-1 Challenge
by: Niu, Shutong, et al.
Published: (2024)

Task-Agnostic Structured Pruning of Speech Representation Models
by: Wang, Haoyu, et al.
Published: (2023)

Leveraging LLM and Self-Supervised Training Models for Speech Recognition in Chinese Dialects: A Comparative Analysis
by: Xu, Tianyi, et al.
Published: (2025)

Joint Learning of Wording and Formatting for Singable Melody-to-Lyric Generation
by: Ou, Longshen, et al.
Published: (2023)

TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control
by: Zhang, Yu, et al.
Published: (2024)

ArabEmoNet: A Lightweight Hybrid 2D CNN-BiLSTM Model with Attention for Robust Arabic Speech Emotion Recognition
by: Abouzeid, Ali, et al.
Published: (2025)

MSR-86K: An Evolving, Multilingual Corpus with 86,300 Hours of Transcribed Audio for Speech Recognition Research
by: Li, Song, et al.
Published: (2024)

BoSS: Beyond-Semantic Speech
by: Wang, Qing, et al.
Published: (2025)

DiaMoE-TTS: A Unified IPA-Based Dialect TTS Framework with Mixture-of-Experts and Parameter-Efficient Zero-Shot Adaptation
by: Chen, Ziqi, et al.
Published: (2025)

Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation
by: Wei, Kun, et al.
Published: (2023)