:: Library Catalog

Cover Image

Saved in:

Bibliographic Details
Main Authors:	Kong, Xiangzhu, Ning, Tianqi, Huang, Hao, Ou, Zhijian
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2407.09807
Tags:	Add Tag No Tags, Be the first to tag this record!

Similar Items

Lightweight and Robust Multi-Channel End-to-End Speech Recognition with Spherical Harmonic Transform
by: Kong, Xiangzhu, et al.
Published: (2025)

CUSIDE-T: Chunking, Simulating Future and Decoding for Transducer based Streaming ASR
by: Zhao, Wenbo, et al.
Published: (2024)

Reference Channel Selection by Multi-Channel Masking for End-to-End Multi-Channel Speech Enhancement
by: Dai, Wang, et al.
Published: (2024)

End-to-End Speech Recognition with Pre-trained Masked Language Model
by: Higuchi, Yosuke, et al.
Published: (2024)

Decoder-only Architecture for Streaming End-to-end Speech Recognition
by: Tsunoo, Emiru, et al.
Published: (2024)

Using Adapters to Overcome Catastrophic Forgetting in End-to-End Automatic Speech Recognition
by: Eeckt, Steven Vander, et al.
Published: (2022)

Qifusion-Net: Layer-adapted Stream/Non-stream Model for End-to-End Multi-Accent Speech Recognition
by: Chen, Jinming, et al.
Published: (2024)

Disentangled-Transformer: An Explainable End-to-End Automatic Speech Recognition Model with Speech Content-Context Separation
by: Wang, Pu, et al.
Published: (2024)

Enhancing Fully Formatted End-to-End Speech Recognition with Knowledge Distillation via Multi-Codebook Vector Quantization
by: You, Jian, et al.
Published: (2025)

SoulX-Transcriber: A Robust End-to-End Framework for Multi-Speaker Speech Transcription
by: Dai, Yuhang, et al.
Published: (2026)

Breaking Walls: Pioneering Automatic Speech Recognition for Central Kurdish: End-to-End Transformer Paradigm
by: Abdullah, Abdulhady Abas, et al.
Published: (2024)

IKFST: IOO and KOO Algorithms for Accelerated and Precise WFST-based End-to-End Automatic Speech Recognition
by: Zhuang, Zhuoran, et al.
Published: (2026)

Continual Test-time Adaptation for End-to-end Speech Recognition on Noisy Speech
by: Lin, Guan-Ting, et al.
Published: (2024)

End-to-End DOA-Guided Speech Extraction in Noisy Multi-Talker Scenarios
by: Jing, Kangqi, et al.
Published: (2025)

Adapting Diarization-Conditioned Whisper for End-to-End Multi-Talker Speech Recognition
by: Kocour, Martin, et al.
Published: (2025)

SynthVC: Leveraging Synthetic Data for End-to-End Low Latency Streaming Voice Conversion
by: Guo, Zhao, et al.
Published: (2025)

Whistle: Data-Efficient Multilingual and Crosslingual Speech Recognition via Weakly Phonetic Supervision
by: Yusuyin, Saierdaer, et al.
Published: (2024)

Data Augmentation for End-to-end Code-switching Speech Recognition
by: Du, Chenpeng, et al.
Published: (2020)

Post-decoder Biasing for End-to-End Speech Recognition of Multi-turn Medical Interview
by: Liu, Heyang, et al.
Published: (2024)

CSSinger: End-to-End Chunkwise Streaming Singing Voice Synthesis System Based on Conditional Variational Autoencoder
by: Cui, Jianwei, et al.
Published: (2024)

An Ultra-Low Latency, End-to-End Streaming Speech Synthesis Architecture via Block-Wise Generation and Depth-Wise Codec Decoding
by: Su, Tianhui, et al.
Published: (2026)

Improving Practical Aspects of End-to-End Multi-Talker Speech Recognition for Online and Offline Scenarios
by: Subramanian, Aswin Shanmugam, et al.
Published: (2025)

End-to-End Transformer-based Automatic Speech Recognition for Northern Kurdish: A Pioneering Approach
by: Abdullah, Abdulhady Abas, et al.
Published: (2024)

End-to-End Integration of Speech Emotion Recognition with Voice Activity Detection using Self-Supervised Learning Features
by: Yamashita, Natsuo, et al.
Published: (2024)

End-to-End Target Speaker Speech Recognition Using Context-Aware Attention Mechanisms for Challenging Enrollment Scenario
by: Ghane, Mohsen, et al.
Published: (2025)

Continual Learning for Monolingual End-to-End Automatic Speech Recognition
by: Eeckt, Steven Vander, et al.
Published: (2021)

Code-Switching in End-to-End Automatic Speech Recognition: A Systematic Literature Review
by: Agro, Maha Tufail, et al.
Published: (2025)

Privacy-Preserving End-to-End Full-Duplex Speech Dialogue Models
by: Kuzmin, Nikita, et al.
Published: (2026)

Survey of End-to-End Multi-Speaker Automatic Speech Recognition for Monaural Audio
by: He, Xinlu, et al.
Published: (2025)

An Efficient End-to-End Approach to Noise Invariant Speech Features via Multi-Task Learning
by: Guimarães, Heitor R., et al.
Published: (2024)

Anatomy of the Modality Gap: Dissecting the Internal States of End-to-End Speech LLMs
by: Hsu, Ming-Hao, et al.
Published: (2026)

Gammatonegram Representation for End-to-End Dysarthric Speech Processing Tasks: Speech Recognition, Speaker Identification, and Intelligibility Assessment
by: Farhadipour, Aref, et al.
Published: (2023)

Energy-Based Models with Applications to Speech and Language Processing
by: Ou, Zhijian
Published: (2024)

DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion
by: Ning, Ziqian, et al.
Published: (2025)

Chunkwise Aligners for Streaming Speech Recognition
by: Teo, Wen Shen, et al.
Published: (2026)

On Improving Error Resilience of Neural End-to-End Speech Coders
by: Gupta, Kishan, et al.
Published: (2024)

An End-to-End Speech Summarization Using Large Language Model
by: Shang, Hengchao, et al.
Published: (2024)

Speech-to-See: End-to-End Speech-Driven Open-Set Object Detection
by: Lu, Wenhuan, et al.
Published: (2025)

The RoyalFlush Automatic Speech Diarization and Recognition System for In-Car Multi-Channel Automatic Speech Recognition Challenge
by: Tian, Jingguang, et al.
Published: (2024)

SC-SOT: Conditioning the Decoder on Diarized Speaker Information for End-to-End Overlapped Speech Recognition
by: Hirano, Yuta, et al.
Published: (2025)