:: Library Catalog

Bibliographic Details
Main Authors:	Su, Rongfeng, Du, Mengjie, Liu, Xiaokang, Wang, Lan, Yan, Nan
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2510.15659

Similar Items

An Audio-textual Diffusion Model For Converting Speech Signals Into Ultrasound Tongue Imaging Data
by: Yang, Yudong, et al.
Published: (2024)

Automatic Assessment of Dysarthria Using Audio-visual Vowel Graph Attention Network
by: Liu, Xiaokang, et al.
Published: (2024)

An End-To-End Stuttering Detection Method Based On Conformer And BILSTM
by: Liu, Xiaokang, et al.
Published: (2024)

Speaker Contrastive Learning for Source Speaker Tracing
by: Wang, Qing, et al.
Published: (2024)

Explainable speech emotion recognition through attentive pooling: insights from attention-based temporal localization
by: Leygue, Tahitoa, et al.
Published: (2025)

Graph-based multi-Feature fusion method for speech emotion recognition
by: Liu, Xueyu, et al.
Published: (2024)

Rhythm Features for Speaker Identification
by: Mehlman, Nick, et al.
Published: (2025)

Exploring Frequency-Domain Feature Modeling for HRTF Magnitude Upsampling
by: Chen, Xingyu, et al.
Published: (2026)

Investigating Acoustic-Textual Emotional Inconsistency Information for Automatic Depression Detection
by: Su, Rongfeng, et al.
Published: (2024)

MC-LExt: Multi-Channel Target Speaker Extraction with Onset-Prompted Speaker Conditioning Mechanism
by: Ling, Tongtao, et al.
Published: (2025)

SCDNet: Self-supervised Learning Feature-based Speaker Change Detection
by: Li, Yue, et al.
Published: (2024)

Robust Audio-Visual Target Speaker Extraction with Emotion-Aware Multiple Enrollment Fusion
by: Jin, Zhan, et al.
Published: (2025)

Heterogeneous bimodal attention fusion for speech emotion recognition
by: Luo, Jiachen, et al.
Published: (2025)

Comparison of Frequency-Fusion Mechanisms for Binaural Direction-of-Arrival Estimation for Multiple Speakers
by: Fejgin, Daniel, et al.
Published: (2024)

Revisiting and Improving Scoring Fusion for Spoofing-aware Speaker Verification Using Compositional Data Analysis
by: Wang, Xin, et al.
Published: (2024)

Improving Speaker Representations Using Contrastive Losses on Multi-scale Features
by: Dixit, Satvik, et al.
Published: (2024)

MP-SENet: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra
by: Lu, Ye-Xin, et al.
Published: (2023)

Multi-Channel Multi-Speaker ASR Using Target Speaker's Solo Segment
by: Shao, Yiwen, et al.
Published: (2024)

Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition
by: Jiang, Yicong, et al.
Published: (2024)

Speech-Based Estimation of Schizophrenia Severity Using Feature Fusion
by: Premananth, Gowtham, et al.
Published: (2024)

On the Role of Spatial Features in Foundation-Model-Based Speaker Diarization
by: Deegen, Marc, et al.
Published: (2026)

Emotion Recognition in Multi-Speaker Conversations through Speaker Identification, Knowledge Distillation, and Hierarchical Fusion
by: Li, Xiao, et al.
Published: (2025)

SpeakerRPL v2: Robust Open-set Speaker Identification through Enhanced Few-shot Foundation Tuning and Model Fusion
by: Chen, Zhiyong, et al.
Published: (2026)

Investigating the Potential of Multi-Stage Score Fusion in Spoofing-Aware Speaker Verification
by: Kurnaz, Oguzhan, et al.
Published: (2025)

MGFF-TDNN: A Multi-Granularity Feature Fusion TDNN Model with Depth-Wise Separable Module for Speaker Verification
by: Li, Ya, et al.
Published: (2025)

Multi-Speaker DOA Estimation in Binaural Hearing Aids using Deep Learning and Speaker Count Fusion
by: Jazaeri, Farnaz, et al.
Published: (2025)

UNet-Based Fusion and Exponential Moving Average Adaptation for Noise-Robust Speaker Recognition
by: Gan, Chong-Xin, et al.
Published: (2026)

Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement
by: Lu, Ye-Xin, et al.
Published: (2023)

Effective Modeling of Critical Contextual Information for TDNN-based Speaker Verification
by: Weng, Shilong, et al.
Published: (2025)

Neural Codec-based Adversarial Sample Detection for Speaker Verification
by: Chen, Xuanjun, et al.
Published: (2024)

A Probabilistic Fusion Framework for Spoofing Aware Speaker Verification
by: Zhang, You, et al.
Published: (2022)

Neighborhood Attention Transformer with Progressive Channel Fusion for Speaker Verification
by: Li, Nian, et al.
Published: (2024)

Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition
by: Li, Guinan, et al.
Published: (2024)

Phase Aware Ear-Conditioned Learning for Multi-Channel Binaural Speaker Separation
by: Jeremiah, Ruben Johnson Robert, et al.
Published: (2025)

Vclip: Face-based Speaker Generation by Face-voice Association Learning
by: Shi, Yao, et al.
Published: (2026)

Phase-Retrieval-Based Physics-Informed Neural Networks For Acoustic Magnitude Field Reconstruction
by: Schrader, Karl, et al.
Published: (2026)

Spatially Aware Self-Supervised Models for Multi-Channel Neural Speaker Diarization
by: Han, Jiangyu, et al.
Published: (2025)

Token-based Attractors and Cross-attention in Spoof Diarization
by: Koo, Kyo-Won, et al.
Published: (2025)

IDMap: A Pseudo-Speaker Generator Framework Based on Speaker Identity Index to Vector Mapping
by: Liu, Zeyan, et al.
Published: (2025)

Study on Inter and Intra Speaker Variability in Speaker Recognition
by: Okhotnikov, Anton, et al.
Published: (2024)