| Main Authors: | Ravizza, Gabriele, Villegas, Julián, Volk, Christer P., Stegenborg-Andersen, Tore, Pei, Yan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.08313 |
Similar Items
Adaptive Deterministic Flow Matching for Target Speaker Extraction
by: Hsieh, Tsun-An, et al.
Published: (2025)
Spatial-Filter-Bank-Based Neural Method for Multichannel Speech Enhancement
by: Zheng, Tianqin, et al.
Published: (2025)
Target Speaker Selection for Neural Network Beamforming in Multi-Speaker Scenarios
by: Fiorio, Luan Vinícius, et al.
Published: (2025)
A Unified Neural Codec Language Model for Selective Editable Text to Speech Generation
by: Pei, Hanchen, et al.
Published: (2026)
Flexible Multi-Channel Target Speaker Extraction Using Geometry-Conditioned Spatially Selective Non-linear Filters
by: Li, Jiatong, et al.
Published: (2026)
Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention
by: Tao, Ruijie, et al.
Published: (2024)
Binaural Selective Attention Model for Target Speaker Extraction
by: Meng, Hanyu, et al.
Published: (2024)
DITTO: Data-efficient and Fair Targeted Subset Selection for ASR Accent Adaptation
by: Kothawade, Suraj, et al.
Published: (2021)
Cross-attention Inspired Selective State Space Models for Target Sound Extraction
by: Wu, Donghang, et al.
Published: (2024)
GAN-Based Multi-Microphone Spatial Target Speaker Extraction
by: Shetu, Shrishti Saha, et al.
Published: (2025)
Adaptive Federated Fine-Tuning of Self-Supervised Speech Representations
by: Guo, Xin, et al.
Published: (2026)
An End-To-End Stuttering Detection Method Based On Conformer And BILSTM
by: Liu, Xiaokang, et al.
Published: (2024)
Room Impulse Response Prediction with Neural Networks: From Energy Decay Curves to Perceptual Validation
by: Muhammad, Imran, et al.
Published: (2025)
Deep Learning-Based Prediction of Energy Decay Curves from Room Geometry and Material Properties
by: Muhammad, Imran, et al.
Published: (2025)
Training Strategies for Modality Dropout Resilient Multi-Modal Target Speaker Extraction
by: Korse, Srikanth, et al.
Published: (2025)
Curved Worlds, Clear Boundaries: Generalizing Speech Deepfake Detection using Hyperbolic and Spherical Geometry Spaces
by: Sheth, Farhan, et al.
Published: (2025)
Adaptive Learning via a Negative Selection Strategy for Few-Shot Bioacoustic Event Detection
by: Chen, Yaxiong, et al.
Published: (2024)
Text-Queried Target Sound Event Localization
by: Zhao, Jinzheng, et al.
Published: (2024)
EvoTSE: Evolving Enrollment for Target Speaker Extraction
by: Liu, Zikai, et al.
Published: (2026)
Plug-and-Steer: Decoupling Separation and Selection in Audio-Visual Target Speaker Extraction
by: Kwak, Doyeop, et al.
Published: (2026)
TGIF: Talker Group-Informed Familiarization of Target Speaker Extraction
by: Hsieh, Tsun-An, et al.
Published: (2025)
Detect, Attend and Extract: Keyword Guided Target Speaker Extraction
by: Li, Haoyu, et al.
Published: (2026)
Multi-View Based Audio Visual Target Speaker Extraction
by: Yang, Peijun, et al.
Published: (2026)
Adaptive high-precision sound source localization at low frequencies based on convolutional neural network
by: Ma, Wenbo, et al.
Published: (2024)
AS-Speech: Adaptive Style For Speech Synthesis
by: Li, Zhipeng, et al.
Published: (2024)
Low-resource keyword spotting using contrastively trained transformer acoustic word embeddings
by: Herreilers, Julian, et al.
Published: (2025)
Analysis and Extension of Noisy-target Training for Unsupervised Target Signal Enhancement
by: Fujimura, Takuya, et al.
Published: (2025)
ARTT: Augmented Reverberant-Target Training for Unsupervised Monaural Speech Dereverberation
by: Song, Siqi, et al.
Published: (2026)
EEND-SAA: Enrollment-Less Main Speaker Voice Activity Detection Using Self-Attention Attractors
by: Wu, Wen-Yung, et al.
Published: (2025)
Two-stage Audio-Visual Target Speaker Extraction System for Real-Time Processing On Edge Device
by: Li, Zixuan, et al.
Published: (2025)
Target Speaker Lipreading by Audio-Visual Self-Distillation Pretraining and Speaker Adaptation
by: Zhang, Jing-Xuan, et al.
Published: (2025)
Multi-Channel Multi-Speaker ASR Using Target Speaker's Solo Segment
by: Shao, Yiwen, et al.
Published: (2024)
Target Speaker Extraction by Directly Exploiting Contextual Information in the Time-Frequency Domain
by: Yang, Xue, et al.
Published: (2024)
Prompt-driven Target Speech Diarization
by: Jiang, Yidi, et al.
Published: (2023)
Trainable Adaptive Score Normalization for Automatic Speaker Verification
by: Choi, Jeong-Hwan, et al.
Published: (2025)
TripleC Learning and Lightweight Speech Enhancement for Multi-Condition Target Speech Extraction
by: Huang, Ziling
Published: (2025)
MeanFlow-TSE: One-Step Generative Target Speaker Extraction with Mean Flow
by: Shimizu, Riki, et al.
Published: (2025)
Inter-Speaker Relative Cues for Two-Stage Text-Guided Target Speech Extraction
by: Dai, Wang, et al.
Published: (2026)
Subspace Track-before-Detect for Passive Multi-Target Tracking with Unknown Emitted Signals
by: Ito, Nobutaka, et al.
Published: (2026)
Exploiting Noise Inseparability for Weakly-Supervised Discriminative Speech Denoising Using Noisy Targets
by: Maciejewski, Matthew, et al.
Published: (2026)