:: Library Catalog

Bibliographic Details
Main Authors:	Ko, Byeong-Yun; Min, Deokki; Nam, Hyeonuk; Park, Yong-Hwa
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2504.14817

Similar Items

Towards Understanding of Frequency Dependence on Sound Event Detection
by: Nam, Hyeonuk, et al.
Published: (2025)

Self Training and Ensembling Frequency Dependent Networks with Coarse Prediction Pooling and Sound Event Bounding Boxes
by: Nam, Hyeonuk, et al.
Published: (2024)

Diversifying and Expanding Frequency-Adaptive Convolution Kernels for Sound Event Detection
by: Nam, Hyeonuk, et al.
Published: (2024)

Temporal Attention Pooling for Frequency Dynamic Convolution in Sound Event Detection
by: Nam, Hyeonuk, et al.
Published: (2025)

JiTTER: Jigsaw Temporal Transformer for Event Reconstruction for Self-Supervised Sound Event Detection
by: Nam, Hyeonuk, et al.
Published: (2025)

Pushing the Limit of Sound Event Detection with Multi-Dilated Frequency Dynamic Convolution
by: Nam, Hyeonuk, et al.
Published: (2024)

Binaural Sound Event Localization and Detection based on HRTF Cues for Humanoid Robots
by: Lee, Gyeong-Tae, et al.
Published: (2025)

Frequency Dynamic Convolutions for Sound Event Detection
by: Nam, Hyeonuk
Published: (2025)

Auditory Intelligence: Understanding the World Through Sound
by: Nam, Hyeonuk
Published: (2025)

SRP-PHAT-NET: A Reliability-Driven DNN for Reverberant Speaker Localization
by: Shaybet, Bar, et al.
Published: (2025)

Boosting Unknown-number Speaker Separation with Transformer Decoder-based Attractor
by: Lee, Younglo, et al.
Published: (2024)

Rhythm Features for Speaker Identification
by: Mehlman, Nick, et al.
Published: (2025)

Cochleagram-based Noise Adapted Speaker Identification System for Distorted Speech
by: Ahmed, Sabbir, et al.
Published: (2025)

Pretraining Multi-Speaker Identification for Neural Speaker Diarization
by: Horiguchi, Shota, et al.
Published: (2025)

Explainable DNN-based Beamformer with Postfilter
by: Cohen, Adi, et al.
Published: (2024)

LG Uplus System with Multi-Speaker IDs and Discriminator-based Sub-Judges for the WildSpoof Challenge
by: Park, Jinyoung, et al.
Published: (2025)

Hybrid Decoding: Rapid Pass and Selective Detailed Correction for Sequence Models
by: Lim, Yunkyu, et al.
Published: (2025)

Adversarial Training of Denoising Diffusion Model Using Dual Discriminators for High-Fidelity Multi-Speaker TTS
by: Ko, Myeongjin, et al.
Published: (2023)

VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark
by: Lin, Yuke, et al.
Published: (2024)

A Toolkit for Joint Speaker Diarization and Identification with Application to Speaker-Attributed ASR
by: Morrone, Giovanni, et al.
Published: (2024)

Neural Ambisonic Encoding For Multi-Speaker Scenarios Using A Circular Microphone Array
by: Qiao, Yue, et al.
Published: (2024)

DiffAttack: Diffusion-based Timbre-reserved Adversarial Attack in Speaker Identification
by: Wang, Qing, et al.
Published: (2025)

Multi-Channel Multi-Speaker ASR Using Target Speaker's Solo Segment
by: Shao, Yiwen, et al.
Published: (2024)

Emotion Recognition in Multi-Speaker Conversations through Speaker Identification, Knowledge Distillation, and Hierarchical Fusion
by: Li, Xiao, et al.
Published: (2025)

SpeakerRPL v2: Robust Open-set Speaker Identification through Enhanced Few-shot Foundation Tuning and Model Fusion
by: Chen, Zhiyong, et al.
Published: (2026)

Array Geometry-Robust Attention-Based Neural Beamformer for Moving Speakers
by: Tammen, Marvin, et al.
Published: (2024)

Multi-Label Training for Text-Independent Speaker Identification
by: Xue, Yuqi
Published: (2022)

Libri2Vox Dataset: Target Speaker Extraction with Diverse Speaker Conditions and Synthetic Data
by: Liu, Yun, et al.
Published: (2024)

Rec-RIR: Monaural Blind Room Impulse Response Identification via DNN-based Reverberant Speech Reconstruction in STFT Domain
by: Wang, Pengyu, et al.
Published: (2025)

SEED: Speaker Embedding Enhancement Diffusion Model
by: Nam, KiHyun, et al.
Published: (2025)

Enhancing Open-Set Speaker Identification through Rapid Tuning with Speaker Reciprocal Points and Negative Sample
by: Chen, Zhiyong, et al.
Published: (2024)

Stack Less, Repeat More: A Block Reusing Approach for Progressive Speech Enhancement
by: Kim, Jangyeon, et al.
Published: (2025)

Query-Based Asymmetric Modeling with Decoupled Input-Output Rates for Speech Restoration
by: Shin, Ui-Hyeop, et al.
Published: (2025)

NanoVoice: Efficient Speaker-Adaptive Text-to-Speech for Multiple Speakers
by: Park, Nohil, et al.
Published: (2024)

Disentangled Representation Learning for Environment-agnostic Speaker Recognition
by: Nam, KiHyun, et al.
Published: (2024)

Target Speaker Extraction with Curriculum Learning
by: Liu, Yun, et al.
Published: (2024)

Uncertainty Quantification in Machine Learning for Joint Speaker Diarization and Identification
by: McKnight, Simon W., et al.
Published: (2023)

Design and Analysis of Binaural Signal Matching with Arbitrary Microphone Arrays and Listener Head Rotations
by: Madmoni, Lior, et al.
Published: (2024)

openFEAT: Improving Speaker Identification by Open-set Few-shot Embedding Adaptation with Transformer
by: C, Kishan K, et al.
Published: (2022)

Speaker Targeting via Self-Speaker Adaptation for Multi-talker ASR
by: Wang, Weiqing, et al.
Published: (2025)