Library Catalog

Bibliographic Details
Main Authors:	Shibata, Yuto; Tanaka, Keitaro; Bando, Yoshiaki; Imoto, Keisuke; Kataoka, Hirokatsu; Aoki, Yoshimitsu
Format:	Preprint
Published:	2025
Subjects:	Sound; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2504.04428

Similar Items

DOA-Aware Audio-Visual Self-Supervised Learning for Sound Event Localization and Detection
by: Fujita, Yoto, et al.
Published: (2024)

LEAD Dataset: How Can Labels for Sound Event Detection Vary Depending on Annotators?
by: Koga, Naoki, et al.
Published: (2024)

Joint Analysis of Acoustic Scenes and Sound Events Based on Semi-Supervised Training of Sound Events With Partial Labels
by: Imoto, Keisuke
Published: (2025)

BGM2Pose: Active 3D Human Pose Estimation with Non-Stationary Sounds
by: Shibata, Yuto, et al.
Published: (2025)

Acoustic-based 3D Human Pose Estimation Robust to Human Position
by: Oumi, Yusuke, et al.
Published: (2024)

Sign-to-Speech Prosody Transfer via Sign Reconstruction-based GAN
by: Manabe, Toranosuke, et al.
Published: (2026)

SHAMaNS: Sound Localization with Hybrid Alpha-Stable Spatial Measure and Neural Steerer
by: Di Carlo, Diego, et al.
Published: (2025)

Sound Scene Synthesis at the DCASE 2024 Challenge
by: Lagrange, Mathieu, et al.
Published: (2025)

Towards Open World Sound Event Detection
by: Hai, P. H., et al.
Published: (2026)

MoireDB: Formula-generated Interference-fringe Image Dataset
by: Matsuo, Yuto, et al.
Published: (2025)

Prototype based Masked Audio Model for Self-Supervised Learning of Sound Event Detection
by: Cai, Pengfei, et al.
Published: (2024)

Enhanced Sound Event Localization and Detection in Real 360-degree audio-visual soundscapes
by: Roman, Adrian S., et al.
Published: (2024)

MAT-SED: A Masked Audio Transformer with Masked-Reconstruction Based Pre-training for Sound Event Detection
by: Cai, Pengfei, et al.
Published: (2024)

Challenge on Sound Scene Synthesis: Evaluating Text-to-Audio Generation
by: Lee, Junwon, et al.
Published: (2024)

Detect Any Sound: Open-Vocabulary Sound Event Detection with Multi-Modal Queries
by: Cai, Pengfei, et al.
Published: (2025)

Improving Anomalous Sound Detection with Attribute-aware Representation from Domain-adaptive Pre-training
by: Fang, Xin, et al.
Published: (2025)

Discrete Speech Unit Extraction via Independent Component Analysis
by: Nakamura, Tomohiko, et al.
Published: (2025)

Leveraging Language Model Capabilities for Sound Event Detection
by: Wang, Hualei, et al.
Published: (2023)

Improving Anomalous Sound Detection via Low-Rank Adaptation Fine-Tuning of Pre-Trained Audio Models
by: Zheng, Xinhu, et al.
Published: (2024)

The Sounds of Home: A Speech-Removed Residential Audio Dataset for Sound Event Detection
by: Bibbó, Gabriel, et al.
Published: (2024)

Unlocking Strong Supervision: A Data-Centric Study of General-Purpose Audio Pre-Training Methods
by: Zhou, Xuanru, et al.
Published: (2026)

MoireMix: A Formula-Based Data Augmentation for Improving Image Classification Robustness
by: Matsuo, Yuto, et al.
Published: (2026)

Pianist Transformer: Towards Expressive Piano Performance Rendering via Scalable Self-Supervised Pre-Training
by: You, Hong-Jie, et al.
Published: (2025)

Studying the Effect of Audio Filters in Pre-Trained Models for Environmental Sound Classification
by: Dawn, Aditya, et al.
Published: (2024)

FlexSED: Towards Open-Vocabulary Sound Event Detection
by: Hai, Jiarui, et al.
Published: (2025)

Run-Time Adaptation of Neural Beamforming for Robust Speech Dereverberation and Denoising
by: Fujita, Yoto, et al.
Published: (2024)

How Much Does Machine Identity Matter in Anomalous Sound Detection at Test Time?
by: Wilkinghoff, Kevin, et al.
Published: (2026)

Semi-Supervised Diseased Detection from Speech Dialogues with Multi-Level Data Modeling
by: Li, Xingyuan, et al.
Published: (2026)

'Studies for': A Human-AI Co-Creative Sound Artwork Using a Real-time Multi-channel Sound Generation Model
by: Nagashima, Chihiro, et al.
Published: (2025)

Rethinking Leveraging Pre-Trained Multi-Layer Representations for Speaker Verification
by: Kim, Jin Sob, et al.
Published: (2025)

Addressing Gradient Misalignment in Data-Augmented Training for Robust Speech Deepfake Detection
by: Truong, Duc-Tuan, et al.
Published: (2025)

Pre-training Vision Transformers with Formula-driven Supervised Learning
by: Kataoka, Hirokatsu, et al.
Published: (2022)

Environmental Sound Deepfake Detection Using Deep-Learning Framework
by: Pham, Lam, et al.
Published: (2026)

Gaussian Process Regression of Steering Vectors With Physics-Aware Deep Composite Kernels for Augmented Listening
by: Di Carlo, Diego, et al.
Published: (2025)

Hybrid Disagreement-Diversity Active Learning for Bioacoustic Sound Event Detection
by: Zhang, Shiqi, et al.
Published: (2025)

SELD-Mamba: Selective State-Space Model for Sound Event Localization and Detection with Source Distance Estimation
by: Mu, Da, et al.
Published: (2024)

MeanVoiceFlow: One-step Nonparallel Voice Conversion with Mean Flows
by: Kaneko, Takuhiro, et al.
Published: (2026)

Vocoder-Projected Feature Discriminator
by: Kaneko, Takuhiro, et al.
Published: (2025)

FastVoiceGrad: One-step Diffusion-Based Voice Conversion with Adversarial Conditional Diffusion Distillation
by: Kaneko, Takuhiro, et al.
Published: (2024)

FasterVoiceGrad: Faster One-step Diffusion-Based Voice Conversion with Adversarial Diffusion Conversion Distillation
by: Kaneko, Takuhiro, et al.
Published: (2025)