:: Library Catalog

Bibliographic Details
Main Authors:	Yang, Yiming; Wang, Guangyong; Guan, Haixin; Long, Yanhua
Format:	Preprint
Published:	2026
Subjects:	Audio and Speech Processing; Sound
Online Access:	https://arxiv.org/abs/2602.15519

Similar Items

Unified Architecture and Unsupervised Speech Disentanglement for Speaker Embedding-Free Enrollment in Personalized Speech Enhancement
by: Huang, Ziling, et al.
Published: (2025)

SEF-PNet: Speaker Encoder-Free Personalized Speech Enhancement with Local and Global Contexts Aggregation
by: Huang, Ziling, et al.
Published: (2025)

Noisy Disentanglement with Tri-stage Training for Noise-Robust Speech Recognition
by: Chen, Shuangyuan, et al.
Published: (2025)

Study of Lightweight Transformer Architectures for Single-Channel Speech Enhancement
by: Zhao, Haixin, et al.
Published: (2025)

Continual Test-time Adaptation for End-to-end Speech Recognition on Noisy Speech
by: Lin, Guan-Ting, et al.
Published: (2024)

Target Speaker Extraction through Comparing Noisy Positive and Negative Audio Enrollments
by: Xu, Shitong, et al.
Published: (2025)

Continuous Target Speech Extraction: Enhancing Personalized Diarization and Extraction on Complex Recordings
by: Zhao, He, et al.
Published: (2024)

DENSE: Dynamic Embedding Causal Target Speech Extraction
by: Wang, Yiwen, et al.
Published: (2024)

Distance Based Single-Channel Target Speech Extraction
by: Shi, Runwu, et al.
Published: (2024)

Beyond Speaker Identity: Text Guided Target Speech Extraction
by: Huo, Mingyue, et al.
Published: (2025)

Probing Self-supervised Learning Models with Target Speech Extraction
by: Peng, Junyi, et al.
Published: (2024)

DDTSE: Discriminative Diffusion Model for Target Speech Extraction
by: Zhang, Leying, et al.
Published: (2023)

Target Speech Extraction with Pre-trained Self-supervised Learning Models
by: Peng, Junyi, et al.
Published: (2024)

Single-Channel Target Speech Extraction Utilizing Distance and Room Clues
by: Shi, Runwu, et al.
Published: (2025)

Inter-Speaker Relative Cues for Text-Guided Target Speech Extraction
by: Dai, Wang, et al.
Published: (2025)

Enhancing Intelligibility for Generative Target Speech Extraction via Joint Optimization with Target Speaker ASR
by: Ma, Hao, et al.
Published: (2025)

A Comparative Evaluation of Deep Learning Models for Speech Enhancement in Real-World Noisy Environments
by: Khondkar, Md Jahangir Alam, et al.
Published: (2025)

Comparative Evaluation of Acoustic Feature Extraction Tools for Clinical Speech Analysis
by: Choi, Anna Seo Gyeong, et al.
Published: (2025)

HRTF-guided Binaural Target Speaker Extraction with Real-World Validation
by: Ellinson, Yoav, et al.
Published: (2026)

Contextual Speech Extraction: Leveraging Textual History as an Implicit Cue for Target Speech Extraction
by: Kim, Minsu, et al.
Published: (2025)

Look Once to Hear: Target Speech Hearing with Noisy Examples
by: Veluri, Bandhav, et al.
Published: (2024)

Target Speech Extraction with Pre-trained AV-HuBERT and Mask-And-Recover Strategy
by: Wu, Wenxuan, et al.
Published: (2024)

Unmixing the Crowd: Learning Mixture-to-Set Speaker Embeddings for Enrollment-Free Target Speech Extraction
by: Sidharth, FNU, et al.
Published: (2026)

End-to-End Target Speaker Speech Recognition Using Context-Aware Attention Mechanisms for Challenging Enrollment Scenario
by: Ghane, Mohsen, et al.
Published: (2025)

Dub-S2ST: Textless Speech-to-Speech Translation for Seamless Dubbing
by: Choi, Jeongsoo, et al.
Published: (2025)

Bridging the gap: A comparative exploration of Speech-LLM and end-to-end architecture for multilingual conversational ASR
by: Mei, Yuxiang, et al.
Published: (2026)

Neural Speech Extraction with Human Feedback
by: Itani, Malek, et al.
Published: (2025)

Two-stage Framework for Robust Speech Emotion Recognition Using Target Speaker Extraction in Human Speech Noise Conditions
by: Mi, Jinyi, et al.
Published: (2024)

From Human Speech to Ocean Signals: Transferring Speech Large Models for Underwater Acoustic Target Recognition
by: Huang, Mengcheng, et al.
Published: (2026)

ICSD: An Open-source Dataset for Infant Cry and Snoring Detection
by: Liu, Qingyu, et al.
Published: (2024)

Comparison of Speech Tasks in Human Expert and Machine Detection of Parkinson's Disease
by: Plantinga, Peter, et al.
Published: (2025)

Two-stage Audio-Visual Target Speaker Extraction System for Real-Time Processing On Edge Device
by: Li, Zixuan, et al.
Published: (2025)

3S-TSE: Efficient Three-Stage Target Speaker Extraction for Real-Time and Low-Resource Applications
by: He, Shulin, et al.
Published: (2023)

DialoSpeech: Dual-Speaker Dialogue Generation with LLM and Flow Matching
by: Xie, Hanke, et al.
Published: (2025)

Target Speaker Extraction with Curriculum Learning
by: Liu, Yun, et al.
Published: (2024)

Improved Feature Extraction Network for Neuro-Oriented Target Speaker Extraction
by: Fan, Cunhang, et al.
Published: (2025)

DualStream Contextual Fusion Network: Efficient Target Speaker Extraction by Leveraging Mixture and Enrollment Interactions
by: Xue, Ke, et al.
Published: (2025)

SaD: A Scenario-Aware Discriminator for Speech Enhancement
by: Yuan, Xihao, et al.
Published: (2025)

SELM: Enhancing Speech Emotion Recognition for Out-of-Domain Scenarios
by: Bukhari, Hazim, et al.
Published: (2024)

Harmonic Detection from Noisy Speech with Auditory Frame Gain for Intelligibility Enhancement
by: Queiroz, A., et al.
Published: (2024)