:: Library Catalog

Bibliographic Details
Main Authors:	Huo, Mingyue; Zhang, Yuheng; Tang, Yan
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2509.07195

Similar Items

TagSpeech: End-to-End Multi-Speaker ASR and Diarization with Fine-Grained Temporal Grounding
by: Huo, Mingyue, et al.
Published: (2026)

Auden-Voice: General-Purpose Voice Encoder for Speech and Language Understanding
by: Huo, Mingyue, et al.
Published: (2025)

Continual Test-time Adaptation for End-to-end Speech Recognition on Noisy Speech
by: Lin, Guan-Ting, et al.
Published: (2024)

Visual-Aware Speech Recognition for Noisy Scenarios
by: Balaji, Lakshmipathi, et al.
Published: (2025)

Noisy Disentanglement with Tri-stage Training for Noise-Robust Speech Recognition
by: Chen, Shuangyuan, et al.
Published: (2025)

Beyond Speaker Identity: Text Guided Target Speech Extraction
by: Huo, Mingyue, et al.
Published: (2025)

Speech Emotion Recognition Via CNN-Transformer and Multidimensional Attention Mechanism
by: Tang, Xiaoyu, et al.
Published: (2024)

End-to-End DOA-Guided Speech Extraction in Noisy Multi-Talker Scenarios
by: Jing, Kangqi, et al.
Published: (2025)

A Semi-spontaneous Dutch Speech Dataset for Speech Enhancement and Speech Recognition
by: de Groot, Dimme, et al.
Published: (2026)

Aligning Speech to Languages to Enhance Code-switching Speech Recognition
by: Liu, Hexin, et al.
Published: (2024)

AS-ASR: A Lightweight Framework for Aphasia-Specific Automatic Speech Recognition
by: Bao, Chen, et al.
Published: (2025)

Coverage-Guaranteed Speech Emotion Recognition via Calibrated Uncertainty-Adaptive Prediction Sets
by: Jia, Zijun, et al.
Published: (2025)

CEC: A Noisy Label Detection Method for Speaker Recognition
by: Shen, Yao, et al.
Published: (2024)

Multilingual Speech Recognition Using Discrete Tokens with a Two-step Training Strategy
by: Li, Zehan, et al.
Published: (2025)

DiTReducio: A Training-Free Acceleration for DiT-Based TTS via Progressive Calibration
by: Huo, Yanru, et al.
Published: (2025)

Fairness of Automatic Speech Recognition in Cleft Lip and Palate Speech
by: Bhattacharjee, Susmita, et al.
Published: (2025)

Diarization-Aware Multi-Speaker Automatic Speech Recognition via Large Language Models
by: Lin, Yuke, et al.
Published: (2025)

Adapting Speech Foundation Models for Unified Multimodal Speech Recognition with Large Language Models
by: Zhang, Jing-Xuan, et al.
Published: (2025)

Multi-Scale Temporal Transformer For Speech Emotion Recognition
by: Li, Zhipeng, et al.
Published: (2024)

Chunkwise Aligners for Streaming Speech Recognition
by: Teo, Wen Shen, et al.
Published: (2026)

Speech-dependent Data Augmentation for Own Voice Reconstruction with Hearable Microphones in Noisy Environments
by: Ohlenbusch, Mattes, et al.
Published: (2024)

Exploiting Noise Inseparability for Weakly-Supervised Discriminative Speech Denoising Using Noisy Targets
by: Maciejewski, Matthew, et al.
Published: (2026)

Speaker Attributed Automatic Speech Recognition Using Speech Aware LLMS
by: Aronowitz, Hagai, et al.
Published: (2026)

Transcription-Free Fine-Tuning of Speech Separation Models for Noisy and Reverberant Multi-Speaker Automatic Speech Recognition
by: Ravenscroft, William, et al.
Published: (2024)

In-Materia Speech Recognition
by: Zolfagharinejad, Mohamadreza, et al.
Published: (2024)

Transfer Learning-Based Deep Residual Learning for Speech Recognition in Clean and Noisy Environments
by: Djeffal, Noussaiba, et al.
Published: (2025)

Exploring Cross-Utterance Speech Contexts for Conformer-Transducer Speech Recognition Systems
by: Cui, Mingyu, et al.
Published: (2025)

Group Relative Policy Optimization for Speech Recognition
by: Shivakumar, Prashanth Gurunath, et al.
Published: (2025)

Code-switching Speech Recognition Under the Lens: Model- and Data-Centric Perspectives
by: Liu, Hexin, et al.
Published: (2025)

Selective Invocation for Multilingual ASR: A Cost-effective Approach Adapting to Speech Recognition Difficulty
by: Xue, Hongfei, et al.
Published: (2025)

Reasoning Beyond Majority Vote: An Explainable SpeechLM Framework for Speech Emotion Recognition
by: Su, Bo-Hao, et al.
Published: (2025)

Interpreting the Role of Visemes in Audio-Visual Speech Recognition
by: Papadopoulos, Aristeidis, et al.
Published: (2025)

Unsupervised Online Continual Learning for Automatic Speech Recognition
by: Eeckt, Steven Vander, et al.
Published: (2024)

Using Songs to Improve Kazakh Automatic Speech Recognition
by: Yeshpanov, Rustem
Published: (2026)

Leveraging Content and Acoustic Representations for Speech Emotion Recognition
by: Dutta, Soumya, et al.
Published: (2024)

PartialEdit: Identifying Partial Deepfakes in the Era of Neural Speech Editing
by: Zhang, You, et al.
Published: (2025)

Hallucinations in Neural Automatic Speech Recognition: Identifying Errors and Hallucinatory Models
by: Frieske, Rita, et al.
Published: (2024)

HYFuse: Aligning Heterogeneous Speech Pre-Trained Representations in Hyperbolic Space for Speech Emotion Recognition
by: Phukan, Orchid Chetia, et al.
Published: (2025)

Too Good to Be True: A Study on Modern Automatic Speech Recognition for the Evaluation of Speech Enhancement
by: de Oliveira, Danilo, et al.
Published: (2026)

Bridging the Gap: Integrating Pre-trained Speech Enhancement and Recognition Models for Robust Speech Recognition
by: Wang, Kuan-Chen, et al.
Published: (2024)