| Main Authors: | Su, Rongfeng, Du, Mengjie, Liu, Xiaokang, Wang, Lan, Yan, Nan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.15659 |
Similar Items
An Audio-textual Diffusion Model For Converting Speech Signals Into Ultrasound Tongue Imaging Data
by: Yang, Yudong, et al.
Published: (2024)
Automatic Assessment of Dysarthria Using Audio-visual Vowel Graph Attention Network
by: Liu, Xiaokang, et al.
Published: (2024)
An End-To-End Stuttering Detection Method Based On Conformer And BILSTM
by: Liu, Xiaokang, et al.
Published: (2024)
Speaker Contrastive Learning for Source Speaker Tracing
by: Wang, Qing, et al.
Published: (2024)
Explainable speech emotion recognition through attentive pooling: insights from attention-based temporal localization
by: Leygue, Tahitoa, et al.
Published: (2025)
Graph-based multi-Feature fusion method for speech emotion recognition
by: Liu, Xueyu, et al.
Published: (2024)
Rhythm Features for Speaker Identification
by: Mehlman, Nick, et al.
Published: (2025)
Exploring Frequency-Domain Feature Modeling for HRTF Magnitude Upsampling
by: Chen, Xingyu, et al.
Published: (2026)
Investigating Acoustic-Textual Emotional Inconsistency Information for Automatic Depression Detection
by: Su, Rongfeng, et al.
Published: (2024)
MC-LExt: Multi-Channel Target Speaker Extraction with Onset-Prompted Speaker Conditioning Mechanism
by: Ling, Tongtao, et al.
Published: (2025)
SCDNet: Self-supervised Learning Feature-based Speaker Change Detection
by: Li, Yue, et al.
Published: (2024)
Robust Audio-Visual Target Speaker Extraction with Emotion-Aware Multiple Enrollment Fusion
by: Jin, Zhan, et al.
Published: (2025)
Heterogeneous bimodal attention fusion for speech emotion recognition
by: Luo, Jiachen, et al.
Published: (2025)
Comparison of Frequency-Fusion Mechanisms for Binaural Direction-of-Arrival Estimation for Multiple Speakers
by: Fejgin, Daniel, et al.
Published: (2024)
Revisiting and Improving Scoring Fusion for Spoofing-aware Speaker Verification Using Compositional Data Analysis
by: Wang, Xin, et al.
Published: (2024)
Improving Speaker Representations Using Contrastive Losses on Multi-scale Features
by: Dixit, Satvik, et al.
Published: (2024)
MP-SENet: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra
by: Lu, Ye-Xin, et al.
Published: (2023)
Multi-Channel Multi-Speaker ASR Using Target Speaker's Solo Segment
by: Shao, Yiwen, et al.
Published: (2024)
Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition
by: Jiang, Yicong, et al.
Published: (2024)
Speech-Based Estimation of Schizophrenia Severity Using Feature Fusion
by: Premananth, Gowtham, et al.
Published: (2024)
On the Role of Spatial Features in Foundation-Model-Based Speaker Diarization
by: Deegen, Marc, et al.
Published: (2026)
Emotion Recognition in Multi-Speaker Conversations through Speaker Identification, Knowledge Distillation, and Hierarchical Fusion
by: Li, Xiao, et al.
Published: (2025)
SpeakerRPL v2: Robust Open-set Speaker Identification through Enhanced Few-shot Foundation Tuning and Model Fusion
by: Chen, Zhiyong, et al.
Published: (2026)
Investigating the Potential of Multi-Stage Score Fusion in Spoofing-Aware Speaker Verification
by: Kurnaz, Oguzhan, et al.
Published: (2025)
MGFF-TDNN: A Multi-Granularity Feature Fusion TDNN Model with Depth-Wise Separable Module for Speaker Verification
by: Li, Ya, et al.
Published: (2025)
Multi-Speaker DOA Estimation in Binaural Hearing Aids using Deep Learning and Speaker Count Fusion
by: Jazaeri, Farnaz, et al.
Published: (2025)
UNet-Based Fusion and Exponential Moving Average Adaptation for Noise-Robust Speaker Recognition
by: Gan, Chong-Xin, et al.
Published: (2026)
Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement
by: Lu, Ye-Xin, et al.
Published: (2023)
Effective Modeling of Critical Contextual Information for TDNN-based Speaker Verification
by: Weng, Shilong, et al.
Published: (2025)
Neural Codec-based Adversarial Sample Detection for Speaker Verification
by: Chen, Xuanjun, et al.
Published: (2024)
A Probabilistic Fusion Framework for Spoofing Aware Speaker Verification
by: Zhang, You, et al.
Published: (2022)
Neighborhood Attention Transformer with Progressive Channel Fusion for Speaker Verification
by: Li, Nian, et al.
Published: (2024)
Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition
by: Li, Guinan, et al.
Published: (2024)
Phase Aware Ear-Conditioned Learning for Multi-Channel Binaural Speaker Separation
by: Jeremiah, Ruben Johnson Robert, et al.
Published: (2025)
Vclip: Face-based Speaker Generation by Face-voice Association Learning
by: Shi, Yao, et al.
Published: (2026)
Phase-Retrieval-Based Physics-Informed Neural Networks For Acoustic Magnitude Field Reconstruction
by: Schrader, Karl, et al.
Published: (2026)
Spatially Aware Self-Supervised Models for Multi-Channel Neural Speaker Diarization
by: Han, Jiangyu, et al.
Published: (2025)
Token-based Attractors and Cross-attention in Spoof Diarization
by: Koo, Kyo-Won, et al.
Published: (2025)
IDMap: A Pseudo-Speaker Generator Framework Based on Speaker Identity Index to Vector Mapping
by: Liu, Zeyan, et al.
Published: (2025)
Study on Inter and Intra Speaker Variability in Speaker Recognition
by: Okhotnikov, Anton, et al.
Published: (2024)