| Main Authors: | De Silva, Dashanka; Cai, Siqi; Pahuja, Saurav; Schultz, Tanja; Li, Haizhou |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2409.02489 |
Similar Items
Improved Feature Extraction Network for Neuro-Oriented Target Speaker Extraction
by: Fan, Cunhang, et al.
Published: (2025)
Low-latency auditory spatial attention detection based on spectro-spatial features from EEG
by: Cai, Siqi, et al.
Published: (2021)
ExPO: Explainable Phonetic Trait-Oriented Network for Speaker Verification
by: Ma, Yi, et al.
Published: (2025)
Plug-and-Play Co-Occurring Face Attention for Robust Audio-Visual Speaker Extraction
by: Pan, Zexu, et al.
Published: (2025)
Neuro-MSBG: An End-to-End Neural Model for Hearing Loss Simulation
by: Yuan, Hui-Guan, et al.
Published: (2025)
Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention
by: Tao, Ruijie, et al.
Published: (2024)
NeuroAMP: A Novel End-to-end General Purpose Deep Neural Amplifier for Personalized Hearing Aids
by: Ahmed, Shafique, et al.
Published: (2025)
Multi-Level Speaker Representation for Target Speaker Extraction
by: Zhang, Ke, et al.
Published: (2024)
Target Speaker Extraction through Comparing Noisy Positive and Negative Audio Enrollments
by: Xu, Shitong, et al.
Published: (2025)
NeuroVoz: a Castillian Spanish corpus of parkinsonian speech
by: Mendes-Laureano, Janaína, et al.
Published: (2024)
USED: Universal Speaker Extraction and Diarization
by: Ao, Junyi, et al.
Published: (2023)
Speaker Diarization with Overlapping Community Detection Using Graph Attention Networks and Label Propagation Algorithm
by: Li, Zhaoyang, et al.
Published: (2025)
Speaker Embeddings to Improve Tracking of Intermittent and Moving Speakers
by: Iatariene, Taous, et al.
Published: (2025)
Breaking Resource Barriers in Speech Emotion Recognition via Data Distillation
by: Chang, Yi, et al.
Published: (2024)
On the effectiveness of enrollment speech augmentation for Target Speaker Extraction
by: Li, Junjie, et al.
Published: (2024)
Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC
by: Kang, Jiawen, et al.
Published: (2024)
Memory-Efficient Training for Deep Speaker Embedding Learning in Speaker Verification
by: Liu, Bei, et al.
Published: (2024)
Disentangling Age and Identity with a Mutual Information Minimization Approach for Cross-Age Speaker Verification
by: Zhang, Fengrun, et al.
Published: (2024)
CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation
by: Kim, Ji-Hoon, et al.
Published: (2024)
Leveraging Speaker Embeddings in End-to-End Neural Diarization for Two-Speaker Scenarios
by: Alvarez-Trejos, Juan Ignacio, et al.
Published: (2024)
Sync-TVA: A Graph-Attention Framework for Multimodal Emotion Recognition with Cross-Modal Fusion
by: Deng, Zeyu, et al.
Published: (2025)
Investigating Effective Speaker Property Privacy Protection in Federated Learning for Speech Emotion Recognition
by: Tan, Chao, et al.
Published: (2024)
Explainable Attribute-Based Speaker Verification
by: Wu, Xiaoliang, et al.
Published: (2024)
DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech
by: Cho, Deok-Hyeon, et al.
Published: (2025)
Affect Decoding in Phonated and Silent Speech Production from Surface EMG
by: Pistrosch, Simon, et al.
Published: (2026)
From Modular to End-to-End Speaker Diarization
by: Landini, Federico
Published: (2024)
Certification of Speaker Recognition Models to Additive Perturbations
by: Korzh, Dmitrii, et al.
Published: (2024)
WhisQ: Cross-Modal Representation Learning for Text-to-Music MOS Prediction
by: Emon, Jakaria Islam, et al.
Published: (2025)
MOSA: Music Motion with Semantic Annotation Dataset for Cross-Modal Music Processing
by: Huang, Yu-Fen, et al.
Published: (2024)
ED-sKWS: Early-Decision Spiking Neural Networks for Rapid, and Energy-Efficient Keyword Spotting
by: Song, Zeyang, et al.
Published: (2024)
CrossMuSim: A Cross-Modal Framework for Music Similarity Retrieval with LLM-Powered Text Description Sourcing and Mining
by: Tsoi, Tristan, et al.
Published: (2025)
SDBench: A Comprehensive Benchmark Suite for Speaker Diarization
by: Pacheco, Eduardo, et al.
Published: (2025)
The VoxCeleb Speaker Recognition Challenge: A Retrospective
by: Huh, Jaesung, et al.
Published: (2024)
Listening and Seeing Again: Generative Error Correction for Audio-Visual Speech Recognition
by: Liu, Rui, et al.
Published: (2025)
End-to-End Supervised Hierarchical Graph Clustering for Speaker Diarization
by: Singh, Prachi, et al.
Published: (2024)
LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec
by: Guo, Yiwei, et al.
Published: (2024)
Unispeaker: A Unified Approach for Multimodality-driven Speaker Generation
by: Sheng, Zhengyan, et al.
Published: (2025)
DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech
by: Melechovsky, Jan, et al.
Published: (2024)
Asynchronous Voice Anonymization Using Adversarial Perturbation On Speaker Embedding
by: Wang, Rui, et al.
Published: (2024)
Evaluating Speaker Identity Coding in Self-supervised Models and Humans
by: Elbanna, Gasser
Published: (2024)