:: Library Catalog

Cover Image

Saved in:

Bibliographic Details
Main Authors:	De Silva, Dashanka, Cai, Siqi, Pahuja, Saurav, Schultz, Tanja, Li, Haizhou
Format:	Preprint
Published:	2024
Subjects:	Sound Artificial Intelligence Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2409.02489
Tags:	Add Tag No Tags, Be the first to tag this record!

Similar Items

Improved Feature Extraction Network for Neuro-Oriented Target Speaker Extraction
by: Fan, Cunhang, et al.
Published: (2025)

Low-latency auditory spatial attention detection based on spectro-spatial features from EEG
by: Cai, Siqi, et al.
Published: (2021)

ExPO: Explainable Phonetic Trait-Oriented Network for Speaker Verification
by: Ma, Yi, et al.
Published: (2025)

Plug-and-Play Co-Occurring Face Attention for Robust Audio-Visual Speaker Extraction
by: Pan, Zexu, et al.
Published: (2025)

Neuro-MSBG: An End-to-End Neural Model for Hearing Loss Simulation
by: Yuan, Hui-Guan, et al.
Published: (2025)

Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention
by: Tao, Ruijie, et al.
Published: (2024)

NeuroAMP: A Novel End-to-end General Purpose Deep Neural Amplifier for Personalized Hearing Aids
by: Ahmed, Shafique, et al.
Published: (2025)

Multi-Level Speaker Representation for Target Speaker Extraction
by: Zhang, Ke, et al.
Published: (2024)

Target Speaker Extraction through Comparing Noisy Positive and Negative Audio Enrollments
by: Xu, Shitong, et al.
Published: (2025)

NeuroVoz: a Castillian Spanish corpus of parkinsonian speech
by: Mendes-Laureano, Janaína, et al.
Published: (2024)

USED: Universal Speaker Extraction and Diarization
by: Ao, Junyi, et al.
Published: (2023)

Speaker Diarization with Overlapping Community Detection Using Graph Attention Networks and Label Propagation Algorithm
by: Li, Zhaoyang, et al.
Published: (2025)

Speaker Embeddings to Improve Tracking of Intermittent and Moving Speakers
by: Iatariene, Taous, et al.
Published: (2025)

Breaking Resource Barriers in Speech Emotion Recognition via Data Distillation
by: Chang, Yi, et al.
Published: (2024)

On the effectiveness of enrollment speech augmentation for Target Speaker Extraction
by: Li, Junjie, et al.
Published: (2024)

Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC
by: Kang, Jiawen, et al.
Published: (2024)

Memory-Efficient Training for Deep Speaker Embedding Learning in Speaker Verification
by: Liu, Bei, et al.
Published: (2024)

Disentangling Age and Identity with a Mutual Information Minimization Approach for Cross-Age Speaker Verification
by: Zhang, Fengrun, et al.
Published: (2024)

CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation
by: Kim, Ji-Hoon, et al.
Published: (2024)

Leveraging Speaker Embeddings in End-to-End Neural Diarization for Two-Speaker Scenarios
by: Alvarez-Trejos, Juan Ignacio, et al.
Published: (2024)

Sync-TVA: A Graph-Attention Framework for Multimodal Emotion Recognition with Cross-Modal Fusion
by: Deng, Zeyu, et al.
Published: (2025)

Investigating Effective Speaker Property Privacy Protection in Federated Learning for Speech Emotion Recognition
by: Tan, Chao, et al.
Published: (2024)

Explainable Attribute-Based Speaker Verification
by: Wu, Xiaoliang, et al.
Published: (2024)

DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech
by: Cho, Deok-Hyeon, et al.
Published: (2025)

Affect Decoding in Phonated and Silent Speech Production from Surface EMG
by: Pistrosch, Simon, et al.
Published: (2026)

From Modular to End-to-End Speaker Diarization
by: Landini, Federico
Published: (2024)

Certification of Speaker Recognition Models to Additive Perturbations
by: Korzh, Dmitrii, et al.
Published: (2024)

WhisQ: Cross-Modal Representation Learning for Text-to-Music MOS Prediction
by: Emon, Jakaria Islam, et al.
Published: (2025)

MOSA: Music Motion with Semantic Annotation Dataset for Cross-Modal Music Processing
by: Huang, Yu-Fen, et al.
Published: (2024)

ED-sKWS: Early-Decision Spiking Neural Networks for Rapid,and Energy-Efficient Keyword Spotting
by: Song, Zeyang, et al.
Published: (2024)

CrossMuSim: A Cross-Modal Framework for Music Similarity Retrieval with LLM-Powered Text Description Sourcing and Mining
by: Tsoi, Tristan, et al.
Published: (2025)

SDBench: A Comprehensive Benchmark Suite for Speaker Diarization
by: Pacheco, Eduardo, et al.
Published: (2025)

The VoxCeleb Speaker Recognition Challenge: A Retrospective
by: Huh, Jaesung, et al.
Published: (2024)

Listening and Seeing Again: Generative Error Correction for Audio-Visual Speech Recognition
by: Liu, Rui, et al.
Published: (2025)

End-to-End Supervised Hierarchical Graph Clustering for Speaker Diarization
by: Singh, Prachi, et al.
Published: (2024)

LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec
by: Guo, Yiwei, et al.
Published: (2024)

Unispeaker: A Unified Approach for Multimodality-driven Speaker Generation
by: Sheng, Zhengyan, et al.
Published: (2025)

DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech
by: Melechovsky, Jan, et al.
Published: (2024)

Asynchronous Voice Anonymization Using Adversarial Perturbation On Speaker Embedding
by: Wang, Rui, et al.
Published: (2024)

Evaluating Speaker Identity Coding in Self-supervised Models and Humans
by: Elbanna, Gasser
Published: (2024)