| Main Authors: | Kong, Xiangzhu, Ning, Tianqi, Huang, Hao, Ou, Zhijian |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2407.09807 |
Similar Items
Lightweight and Robust Multi-Channel End-to-End Speech Recognition with Spherical Harmonic Transform
by: Kong, Xiangzhu, et al.
Published: (2025)
CUSIDE-T: Chunking, Simulating Future and Decoding for Transducer based Streaming ASR
by: Zhao, Wenbo, et al.
Published: (2024)
Reference Channel Selection by Multi-Channel Masking for End-to-End Multi-Channel Speech Enhancement
by: Dai, Wang, et al.
Published: (2024)
End-to-End Speech Recognition with Pre-trained Masked Language Model
by: Higuchi, Yosuke, et al.
Published: (2024)
Decoder-only Architecture for Streaming End-to-end Speech Recognition
by: Tsunoo, Emiru, et al.
Published: (2024)
Using Adapters to Overcome Catastrophic Forgetting in End-to-End Automatic Speech Recognition
by: Eeckt, Steven Vander, et al.
Published: (2022)
Qifusion-Net: Layer-adapted Stream/Non-stream Model for End-to-End Multi-Accent Speech Recognition
by: Chen, Jinming, et al.
Published: (2024)
Disentangled-Transformer: An Explainable End-to-End Automatic Speech Recognition Model with Speech Content-Context Separation
by: Wang, Pu, et al.
Published: (2024)
Enhancing Fully Formatted End-to-End Speech Recognition with Knowledge Distillation via Multi-Codebook Vector Quantization
by: You, Jian, et al.
Published: (2025)
SoulX-Transcriber: A Robust End-to-End Framework for Multi-Speaker Speech Transcription
by: Dai, Yuhang, et al.
Published: (2026)
Breaking Walls: Pioneering Automatic Speech Recognition for Central Kurdish: End-to-End Transformer Paradigm
by: Abdullah, Abdulhady Abas, et al.
Published: (2024)
IKFST: IOO and KOO Algorithms for Accelerated and Precise WFST-based End-to-End Automatic Speech Recognition
by: Zhuang, Zhuoran, et al.
Published: (2026)
Continual Test-time Adaptation for End-to-end Speech Recognition on Noisy Speech
by: Lin, Guan-Ting, et al.
Published: (2024)
End-to-End DOA-Guided Speech Extraction in Noisy Multi-Talker Scenarios
by: Jing, Kangqi, et al.
Published: (2025)
Adapting Diarization-Conditioned Whisper for End-to-End Multi-Talker Speech Recognition
by: Kocour, Martin, et al.
Published: (2025)
SynthVC: Leveraging Synthetic Data for End-to-End Low Latency Streaming Voice Conversion
by: Guo, Zhao, et al.
Published: (2025)
Whistle: Data-Efficient Multilingual and Crosslingual Speech Recognition via Weakly Phonetic Supervision
by: Yusuyin, Saierdaer, et al.
Published: (2024)
Data Augmentation for End-to-end Code-switching Speech Recognition
by: Du, Chenpeng, et al.
Published: (2020)
Post-decoder Biasing for End-to-End Speech Recognition of Multi-turn Medical Interview
by: Liu, Heyang, et al.
Published: (2024)
CSSinger: End-to-End Chunkwise Streaming Singing Voice Synthesis System Based on Conditional Variational Autoencoder
by: Cui, Jianwei, et al.
Published: (2024)
An Ultra-Low Latency, End-to-End Streaming Speech Synthesis Architecture via Block-Wise Generation and Depth-Wise Codec Decoding
by: Su, Tianhui, et al.
Published: (2026)
Improving Practical Aspects of End-to-End Multi-Talker Speech Recognition for Online and Offline Scenarios
by: Subramanian, Aswin Shanmugam, et al.
Published: (2025)
End-to-End Transformer-based Automatic Speech Recognition for Northern Kurdish: A Pioneering Approach
by: Abdullah, Abdulhady Abas, et al.
Published: (2024)
End-to-End Integration of Speech Emotion Recognition with Voice Activity Detection using Self-Supervised Learning Features
by: Yamashita, Natsuo, et al.
Published: (2024)
End-to-End Target Speaker Speech Recognition Using Context-Aware Attention Mechanisms for Challenging Enrollment Scenario
by: Ghane, Mohsen, et al.
Published: (2025)
Continual Learning for Monolingual End-to-End Automatic Speech Recognition
by: Eeckt, Steven Vander, et al.
Published: (2021)
Code-Switching in End-to-End Automatic Speech Recognition: A Systematic Literature Review
by: Agro, Maha Tufail, et al.
Published: (2025)
Privacy-Preserving End-to-End Full-Duplex Speech Dialogue Models
by: Kuzmin, Nikita, et al.
Published: (2026)
Survey of End-to-End Multi-Speaker Automatic Speech Recognition for Monaural Audio
by: He, Xinlu, et al.
Published: (2025)
An Efficient End-to-End Approach to Noise Invariant Speech Features via Multi-Task Learning
by: Guimarães, Heitor R., et al.
Published: (2024)
Anatomy of the Modality Gap: Dissecting the Internal States of End-to-End Speech LLMs
by: Hsu, Ming-Hao, et al.
Published: (2026)
Gammatonegram Representation for End-to-End Dysarthric Speech Processing Tasks: Speech Recognition, Speaker Identification, and Intelligibility Assessment
by: Farhadipour, Aref, et al.
Published: (2023)
Energy-Based Models with Applications to Speech and Language Processing
by: Ou, Zhijian
Published: (2024)
DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion
by: Ning, Ziqian, et al.
Published: (2025)
Chunkwise Aligners for Streaming Speech Recognition
by: Teo, Wen Shen, et al.
Published: (2026)
On Improving Error Resilience of Neural End-to-End Speech Coders
by: Gupta, Kishan, et al.
Published: (2024)
An End-to-End Speech Summarization Using Large Language Model
by: Shang, Hengchao, et al.
Published: (2024)
Speech-to-See: End-to-End Speech-Driven Open-Set Object Detection
by: Lu, Wenhuan, et al.
Published: (2025)
The RoyalFlush Automatic Speech Diarization and Recognition System for In-Car Multi-Channel Automatic Speech Recognition Challenge
by: Tian, Jingguang, et al.
Published: (2024)
SC-SOT: Conditioning the Decoder on Diarized Speaker Information for End-to-End Overlapped Speech Recognition
by: Hirano, Yuta, et al.
Published: (2025)