| Main Authors: | Shibata, Yuto; Tanaka, Keitaro; Bando, Yoshiaki; Imoto, Keisuke; Kataoka, Hirokatsu; Aoki, Yoshimitsu |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2504.04428 |
Similar Items
DOA-Aware Audio-Visual Self-Supervised Learning for Sound Event Localization and Detection
by: Fujita, Yoto, et al.
Published: (2024)
LEAD Dataset: How Can Labels for Sound Event Detection Vary Depending on Annotators?
by: Koga, Naoki, et al.
Published: (2024)
Joint Analysis of Acoustic Scenes and Sound Events Based on Semi-Supervised Training of Sound Events With Partial Labels
by: Imoto, Keisuke
Published: (2025)
BGM2Pose: Active 3D Human Pose Estimation with Non-Stationary Sounds
by: Shibata, Yuto, et al.
Published: (2025)
Acoustic-based 3D Human Pose Estimation Robust to Human Position
by: Oumi, Yusuke, et al.
Published: (2024)
Sign-to-Speech Prosody Transfer via Sign Reconstruction-based GAN
by: Manabe, Toranosuke, et al.
Published: (2026)
SHAMaNS: Sound Localization with Hybrid Alpha-Stable Spatial Measure and Neural Steerer
by: Di Carlo, Diego, et al.
Published: (2025)
Sound Scene Synthesis at the DCASE 2024 Challenge
by: Lagrange, Mathieu, et al.
Published: (2025)
Towards Open World Sound Event Detection
by: Hai, P. H., et al.
Published: (2026)
MoireDB: Formula-generated Interference-fringe Image Dataset
by: Matsuo, Yuto, et al.
Published: (2025)
Prototype based Masked Audio Model for Self-Supervised Learning of Sound Event Detection
by: Cai, Pengfei, et al.
Published: (2024)
Enhanced Sound Event Localization and Detection in Real 360-degree audio-visual soundscapes
by: Roman, Adrian S., et al.
Published: (2024)
MAT-SED: A Masked Audio Transformer with Masked-Reconstruction Based Pre-training for Sound Event Detection
by: Cai, Pengfei, et al.
Published: (2024)
Challenge on Sound Scene Synthesis: Evaluating Text-to-Audio Generation
by: Lee, Junwon, et al.
Published: (2024)
Detect Any Sound: Open-Vocabulary Sound Event Detection with Multi-Modal Queries
by: Cai, Pengfei, et al.
Published: (2025)
Improving Anomalous Sound Detection with Attribute-aware Representation from Domain-adaptive Pre-training
by: Fang, Xin, et al.
Published: (2025)
Discrete Speech Unit Extraction via Independent Component Analysis
by: Nakamura, Tomohiko, et al.
Published: (2025)
Leveraging Language Model Capabilities for Sound Event Detection
by: Wang, Hualei, et al.
Published: (2023)
Improving Anomalous Sound Detection via Low-Rank Adaptation Fine-Tuning of Pre-Trained Audio Models
by: Zheng, Xinhu, et al.
Published: (2024)
The Sounds of Home: A Speech-Removed Residential Audio Dataset for Sound Event Detection
by: Bibbó, Gabriel, et al.
Published: (2024)
Unlocking Strong Supervision: A Data-Centric Study of General-Purpose Audio Pre-Training Methods
by: Zhou, Xuanru, et al.
Published: (2026)
MoireMix: A Formula-Based Data Augmentation for Improving Image Classification Robustness
by: Matsuo, Yuto, et al.
Published: (2026)
Pianist Transformer: Towards Expressive Piano Performance Rendering via Scalable Self-Supervised Pre-Training
by: You, Hong-Jie, et al.
Published: (2025)
Studying the Effect of Audio Filters in Pre-Trained Models for Environmental Sound Classification
by: Dawn, Aditya, et al.
Published: (2024)
FlexSED: Towards Open-Vocabulary Sound Event Detection
by: Hai, Jiarui, et al.
Published: (2025)
Run-Time Adaptation of Neural Beamforming for Robust Speech Dereverberation and Denoising
by: Fujita, Yoto, et al.
Published: (2024)
How Much Does Machine Identity Matter in Anomalous Sound Detection at Test Time?
by: Wilkinghoff, Kevin, et al.
Published: (2026)
Semi-Supervised Diseased Detection from Speech Dialogues with Multi-Level Data Modeling
by: Li, Xingyuan, et al.
Published: (2026)
'Studies for': A Human-AI Co-Creative Sound Artwork Using a Real-time Multi-channel Sound Generation Model
by: Nagashima, Chihiro, et al.
Published: (2025)
Rethinking Leveraging Pre-Trained Multi-Layer Representations for Speaker Verification
by: Kim, Jin Sob, et al.
Published: (2025)
Addressing Gradient Misalignment in Data-Augmented Training for Robust Speech Deepfake Detection
by: Truong, Duc-Tuan, et al.
Published: (2025)
Pre-training Vision Transformers with Formula-driven Supervised Learning
by: Kataoka, Hirokatsu, et al.
Published: (2022)
Environmental Sound Deepfake Detection Using Deep-Learning Framework
by: Pham, Lam, et al.
Published: (2026)
Gaussian Process Regression of Steering Vectors With Physics-Aware Deep Composite Kernels for Augmented Listening
by: Di Carlo, Diego, et al.
Published: (2025)
Hybrid Disagreement-Diversity Active Learning for Bioacoustic Sound Event Detection
by: Zhang, Shiqi, et al.
Published: (2025)
SELD-Mamba: Selective State-Space Model for Sound Event Localization and Detection with Source Distance Estimation
by: Mu, Da, et al.
Published: (2024)
MeanVoiceFlow: One-step Nonparallel Voice Conversion with Mean Flows
by: Kaneko, Takuhiro, et al.
Published: (2026)
Vocoder-Projected Feature Discriminator
by: Kaneko, Takuhiro, et al.
Published: (2025)
FastVoiceGrad: One-step Diffusion-Based Voice Conversion with Adversarial Conditional Diffusion Distillation
by: Kaneko, Takuhiro, et al.
Published: (2024)
FasterVoiceGrad: Faster One-step Diffusion-Based Voice Conversion with Adversarial Diffusion Conversion Distillation
by: Kaneko, Takuhiro, et al.
Published: (2025)