| Main Authors: | Huo, Mingyue, Zhang, Yuheng, Tang, Yan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2509.07195 |
Similar Items
TagSpeech: End-to-End Multi-Speaker ASR and Diarization with Fine-Grained Temporal Grounding
by: Huo, Mingyue, et al.
Published: (2026)
Auden-Voice: General-Purpose Voice Encoder for Speech and Language Understanding
by: Huo, Mingyue, et al.
Published: (2025)
Continual Test-time Adaptation for End-to-end Speech Recognition on Noisy Speech
by: Lin, Guan-Ting, et al.
Published: (2024)
Visual-Aware Speech Recognition for Noisy Scenarios
by: Balaji, Lakshmipathi, et al.
Published: (2025)
Noisy Disentanglement with Tri-stage Training for Noise-Robust Speech Recognition
by: Chen, Shuangyuan, et al.
Published: (2025)
Beyond Speaker Identity: Text Guided Target Speech Extraction
by: Huo, Mingyue, et al.
Published: (2025)
Speech Emotion Recognition Via CNN-Transformer and Multidimensional Attention Mechanism
by: Tang, Xiaoyu, et al.
Published: (2024)
End-to-End DOA-Guided Speech Extraction in Noisy Multi-Talker Scenarios
by: Jing, Kangqi, et al.
Published: (2025)
A Semi-spontaneous Dutch Speech Dataset for Speech Enhancement and Speech Recognition
by: de Groot, Dimme, et al.
Published: (2026)
Aligning Speech to Languages to Enhance Code-switching Speech Recognition
by: Liu, Hexin, et al.
Published: (2024)
AS-ASR: A Lightweight Framework for Aphasia-Specific Automatic Speech Recognition
by: Bao, Chen, et al.
Published: (2025)
Coverage-Guaranteed Speech Emotion Recognition via Calibrated Uncertainty-Adaptive Prediction Sets
by: Jia, Zijun, et al.
Published: (2025)
CEC: A Noisy Label Detection Method for Speaker Recognition
by: Shen, Yao, et al.
Published: (2024)
Multilingual Speech Recognition Using Discrete Tokens with a Two-step Training Strategy
by: Li, Zehan, et al.
Published: (2025)
DiTReducio: A Training-Free Acceleration for DiT-Based TTS via Progressive Calibration
by: Huo, Yanru, et al.
Published: (2025)
Fairness of Automatic Speech Recognition in Cleft Lip and Palate Speech
by: Bhattacharjee, Susmita, et al.
Published: (2025)
Diarization-Aware Multi-Speaker Automatic Speech Recognition via Large Language Models
by: Lin, Yuke, et al.
Published: (2025)
Adapting Speech Foundation Models for Unified Multimodal Speech Recognition with Large Language Models
by: Zhang, Jing-Xuan, et al.
Published: (2025)
Multi-Scale Temporal Transformer For Speech Emotion Recognition
by: Li, Zhipeng, et al.
Published: (2024)
Chunkwise Aligners for Streaming Speech Recognition
by: Teo, Wen Shen, et al.
Published: (2026)
Speech-dependent Data Augmentation for Own Voice Reconstruction with Hearable Microphones in Noisy Environments
by: Ohlenbusch, Mattes, et al.
Published: (2024)
Exploiting Noise Inseparability for Weakly-Supervised Discriminative Speech Denoising Using Noisy Targets
by: Maciejewski, Matthew, et al.
Published: (2026)
Speaker Attributed Automatic Speech Recognition Using Speech Aware LLMS
by: Aronowitz, Hagai, et al.
Published: (2026)
Transcription-Free Fine-Tuning of Speech Separation Models for Noisy and Reverberant Multi-Speaker Automatic Speech Recognition
by: Ravenscroft, William, et al.
Published: (2024)
In-Materia Speech Recognition
by: Zolfagharinejad, Mohamadreza, et al.
Published: (2024)
Transfer Learning-Based Deep Residual Learning for Speech Recognition in Clean and Noisy Environments
by: Djeffal, Noussaiba, et al.
Published: (2025)
Exploring Cross-Utterance Speech Contexts for Conformer-Transducer Speech Recognition Systems
by: Cui, Mingyu, et al.
Published: (2025)
Group Relative Policy Optimization for Speech Recognition
by: Shivakumar, Prashanth Gurunath, et al.
Published: (2025)
Code-switching Speech Recognition Under the Lens: Model- and Data-Centric Perspectives
by: Liu, Hexin, et al.
Published: (2025)
Selective Invocation for Multilingual ASR: A Cost-effective Approach Adapting to Speech Recognition Difficulty
by: Xue, Hongfei, et al.
Published: (2025)
Reasoning Beyond Majority Vote: An Explainable SpeechLM Framework for Speech Emotion Recognition
by: Su, Bo-Hao, et al.
Published: (2025)
Interpreting the Role of Visemes in Audio-Visual Speech Recognition
by: Papadopoulos, Aristeidis, et al.
Published: (2025)
Unsupervised Online Continual Learning for Automatic Speech Recognition
by: Eeckt, Steven Vander, et al.
Published: (2024)
Using Songs to Improve Kazakh Automatic Speech Recognition
by: Yeshpanov, Rustem
Published: (2026)
Leveraging Content and Acoustic Representations for Speech Emotion Recognition
by: Dutta, Soumya, et al.
Published: (2024)
PartialEdit: Identifying Partial Deepfakes in the Era of Neural Speech Editing
by: Zhang, You, et al.
Published: (2025)
Hallucinations in Neural Automatic Speech Recognition: Identifying Errors and Hallucinatory Models
by: Frieske, Rita, et al.
Published: (2024)
HYFuse: Aligning Heterogeneous Speech Pre-Trained Representations in Hyperbolic Space for Speech Emotion Recognition
by: Phukan, Orchid Chetia, et al.
Published: (2025)
Too Good to Be True: A Study on Modern Automatic Speech Recognition for the Evaluation of Speech Enhancement
by: de Oliveira, Danilo, et al.
Published: (2026)
Bridging the Gap: Integrating Pre-trained Speech Enhancement and Recognition Models for Robust Speech Recognition
by: Wang, Kuan-Chen, et al.
Published: (2024)