| Main Authors: | Ko, Byeong-Yun, Min, Deokki, Nam, Hyeonuk, Park, Yong-Hwa |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2504.14817 |
Similar Items
Towards Understanding of Frequency Dependence on Sound Event Detection
by: Nam, Hyeonuk, et al.
Published: (2025)
Self Training and Ensembling Frequency Dependent Networks with Coarse Prediction Pooling and Sound Event Bounding Boxes
by: Nam, Hyeonuk, et al.
Published: (2024)
Diversifying and Expanding Frequency-Adaptive Convolution Kernels for Sound Event Detection
by: Nam, Hyeonuk, et al.
Published: (2024)
Temporal Attention Pooling for Frequency Dynamic Convolution in Sound Event Detection
by: Nam, Hyeonuk, et al.
Published: (2025)
JiTTER: Jigsaw Temporal Transformer for Event Reconstruction for Self-Supervised Sound Event Detection
by: Nam, Hyeonuk, et al.
Published: (2025)
Pushing the Limit of Sound Event Detection with Multi-Dilated Frequency Dynamic Convolution
by: Nam, Hyeonuk, et al.
Published: (2024)
Binaural Sound Event Localization and Detection based on HRTF Cues for Humanoid Robots
by: Lee, Gyeong-Tae, et al.
Published: (2025)
Frequency Dynamic Convolutions for Sound Event Detection
by: Nam, Hyeonuk
Published: (2025)
Auditory Intelligence: Understanding the World Through Sound
by: Nam, Hyeonuk
Published: (2025)
SRP-PHAT-NET: A Reliability-Driven DNN for Reverberant Speaker Localization
by: Shaybet, Bar, et al.
Published: (2025)
Boosting Unknown-number Speaker Separation with Transformer Decoder-based Attractor
by: Lee, Younglo, et al.
Published: (2024)
Rhythm Features for Speaker Identification
by: Mehlman, Nick, et al.
Published: (2025)
Cochleagram-based Noise Adapted Speaker Identification System for Distorted Speech
by: Ahmed, Sabbir, et al.
Published: (2025)
Pretraining Multi-Speaker Identification for Neural Speaker Diarization
by: Horiguchi, Shota, et al.
Published: (2025)
Explainable DNN-based Beamformer with Postfilter
by: Cohen, Adi, et al.
Published: (2024)
LG Uplus System with Multi-Speaker IDs and Discriminator-based Sub-Judges for the WildSpoof Challenge
by: Park, Jinyoung, et al.
Published: (2025)
Hybrid Decoding: Rapid Pass and Selective Detailed Correction for Sequence Models
by: Lim, Yunkyu, et al.
Published: (2025)
Adversarial Training of Denoising Diffusion Model Using Dual Discriminators for High-Fidelity Multi-Speaker TTS
by: Ko, Myeongjin, et al.
Published: (2023)
VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark
by: Lin, Yuke, et al.
Published: (2024)
A Toolkit for Joint Speaker Diarization and Identification with Application to Speaker-Attributed ASR
by: Morrone, Giovanni, et al.
Published: (2024)
Neural Ambisonic Encoding For Multi-Speaker Scenarios Using A Circular Microphone Array
by: Qiao, Yue, et al.
Published: (2024)
DiffAttack: Diffusion-based Timbre-reserved Adversarial Attack in Speaker Identification
by: Wang, Qing, et al.
Published: (2025)
Multi-Channel Multi-Speaker ASR Using Target Speaker's Solo Segment
by: Shao, Yiwen, et al.
Published: (2024)
Emotion Recognition in Multi-Speaker Conversations through Speaker Identification, Knowledge Distillation, and Hierarchical Fusion
by: Li, Xiao, et al.
Published: (2025)
SpeakerRPL v2: Robust Open-set Speaker Identification through Enhanced Few-shot Foundation Tuning and Model Fusion
by: Chen, Zhiyong, et al.
Published: (2026)
Array Geometry-Robust Attention-Based Neural Beamformer for Moving Speakers
by: Tammen, Marvin, et al.
Published: (2024)
Multi-Label Training for Text-Independent Speaker Identification
by: Xue, Yuqi
Published: (2022)
Libri2Vox Dataset: Target Speaker Extraction with Diverse Speaker Conditions and Synthetic Data
by: Liu, Yun, et al.
Published: (2024)
Rec-RIR: Monaural Blind Room Impulse Response Identification via DNN-based Reverberant Speech Reconstruction in STFT Domain
by: Wang, Pengyu, et al.
Published: (2025)
SEED: Speaker Embedding Enhancement Diffusion Model
by: Nam, KiHyun, et al.
Published: (2025)
Enhancing Open-Set Speaker Identification through Rapid Tuning with Speaker Reciprocal Points and Negative Sample
by: Chen, Zhiyong, et al.
Published: (2024)
Stack Less, Repeat More: A Block Reusing Approach for Progressive Speech Enhancement
by: Kim, Jangyeon, et al.
Published: (2025)
Query-Based Asymmetric Modeling with Decoupled Input-Output Rates for Speech Restoration
by: Shin, Ui-Hyeop, et al.
Published: (2025)
NanoVoice: Efficient Speaker-Adaptive Text-to-Speech for Multiple Speakers
by: Park, Nohil, et al.
Published: (2024)
Disentangled Representation Learning for Environment-agnostic Speaker Recognition
by: Nam, KiHyun, et al.
Published: (2024)
Target Speaker Extraction with Curriculum Learning
by: Liu, Yun, et al.
Published: (2024)
Uncertainty Quantification in Machine Learning for Joint Speaker Diarization and Identification
by: McKnight, Simon W., et al.
Published: (2023)
Design and Analysis of Binaural Signal Matching with Arbitrary Microphone Arrays and Listener Head Rotations
by: Madmoni, Lior, et al.
Published: (2024)
openFEAT: Improving Speaker Identification by Open-set Few-shot Embedding Adaptation with Transformer
by: C, Kishan K, et al.
Published: (2022)
Speaker Targeting via Self-Speaker Adaptation for Multi-talker ASR
by: Wang, Weiqing, et al.
Published: (2025)