Bibliographic Details
Main Authors:	Sashida, Kurumi; Tanaka, Gouhei
Format:	Preprint
Published:	2026
Subjects:	Sound; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2602.06271

Similar Items

Robust Bioacoustic Detection via Richly Labelled Synthetic Soundscape Augmentation
by: Soltero, Kaspar, et al.
Published: (2025)

Generating Diverse Audio-Visual 360 Soundscapes for Sound Event Localization and Detection
by: Roman, Adrian S., et al.
Published: (2025)

Soundscape Captioning using Sound Affective Quality Network and Large Language Model
by: Hou, Yuanbo, et al.
Published: (2024)

Effective Pre-Training of Audio Transformers for Sound Event Detection
by: Schmid, Florian, et al.
Published: (2024)

PSELDNets: Pre-trained Neural Networks on a Large-scale Synthetic Dataset for Sound Event Localization and Detection
by: Hu, Jinbo, et al.
Published: (2024)

Sound Tagging in Infant-centric Home Soundscapes
by: Khan, Mohammad Nur Hossain, et al.
Published: (2024)

Spatial Scaper: A Library to Simulate and Augment Soundscapes for Sound Event Localization and Detection in Realistic Rooms
by: Roman, Iran R., et al.
Published: (2024)

Feature Selection via Graph Topology Inference for Soundscape Emotion Recognition
by: Rey, Samuel, et al.
Published: (2025)

Autonomous Soundscape Augmentation with Multimodal Fusion of Visual and Participant-linked Inputs
by: Ooi, Kenneth, et al.
Published: (2023)

Source Tracing of Synthetic Speech Systems Through Paralinguistic Pre-Trained Representations
by: Girish, et al.
Published: (2025)

ARAUS: A Large-Scale Dataset and Baseline Models of Affective Responses to Augmented Urban Soundscapes
by: Ooi, Kenneth, et al.
Published: (2022)

w2v-SELD: A Sound Event Localization and Detection Framework for Self-Supervised Spatial Audio Pre-Training
by: Santos, Orlem Lima dos, et al.
Published: (2023)

Evaluating CNN with Stacked Feature Representations and Audio Spectrogram Transformer Models for Sound Classification
by: Dehaghania, Parinaz Binandeh, et al.
Published: (2026)

Mixture of Experts Fusion for Fake Audio Detection Using Frozen wav2vec 2.0
by: Wang, Zhiyong, et al.
Published: (2024)

Retrieval-Augmented Approach for Unsupervised Anomalous Sound Detection and Captioning without Model Training
by: Ogura, Ryoya, et al.
Published: (2024)

Reducing the Gap Between Pretrained Speech Enhancement and Recognition Models Using a Real Speech-Trained Bridging Module
by: Cui, Zhongjian, et al.
Published: (2025)

CNN-based Robust Sound Source Localization with SRP-PHAT for the Extreme Edge
by: Yin, Jun, et al.
Published: (2025)

Generating Moving 3D Soundscapes with Latent Diffusion Models
by: Templin, Christian, et al.
Published: (2025)

Automating Urban Soundscape Enhancements with AI: In-situ Assessment of Quality and Restorativeness in Traffic-Exposed Residential Areas
by: Lam, Bhan, et al.
Published: (2024)

Improvements of Discriminative Feature Space Training for Anomalous Sound Detection in Unlabeled Conditions
by: Fujimura, Takuya, et al.
Published: (2024)

EmoFormer: A Text-Independent Speech Emotion Recognition using a Hybrid Transformer-CNN model
by: Hasan, Rashedul, et al.
Published: (2025)

Improving Anomalous Sound Detection via Low-Rank Adaptation Fine-Tuning of Pre-Trained Audio Models
by: Zheng, Xinhu, et al.
Published: (2024)

Fine-grained Soundscape Control for Augmented Hearing
by: Oh, Seunghyun, et al.
Published: (2026)

Studying the Effect of Audio Filters in Pre-Trained Models for Environmental Sound Classification
by: Dawn, Aditya, et al.
Published: (2024)

Improving Audio Spectrogram Transformers for Sound Event Detection Through Multi-Stage Training
by: Schmid, Florian, et al.
Published: (2024)

Temporal Pooling Strategies for Training-Free Anomalous Sound Detection with Self-Supervised Audio Embeddings
by: Wilkinghoff, Kevin, et al.
Published: (2026)

ENACT-Heart -- ENsemble-based Assessment Using CNN and Transformer on Heart Sounds
by: Han, Jiho, et al.
Published: (2025)

The TMU System for the XACLE Challenge: Training Large Audio Language Models with CLAP Pseudo-Labels
by: Tsutsumi, Ayuto, et al.
Published: (2026)

Fine-Grained Engine Fault Sound Event Detection Using Multimodal Signals
by: Fedorishin, Dennis, et al.
Published: (2024)

Exploring Prediction Targets in Masked Pre-Training for Speech Foundation Models
by: Chen, Li-Wei, et al.
Published: (2024)

Sound Field Reconstruction Using a Compact Acoustics-informed Neural Network
by: Ma, Fei, et al.
Published: (2024)

MAGENTA: Magnitude and Geometry-ENhanced Training Approach for Robust Long-Tailed Sound Event Localization and Detection
by: Yeow, Jun-Wei, et al.
Published: (2025)

How Much Does Machine Identity Matter in Anomalous Sound Detection at Test Time?
by: Wilkinghoff, Kevin, et al.
Published: (2026)

FakeMusicCaps: a Dataset for Detection and Attribution of Synthetic Music Generated via Text-to-Music Models
by: Comanducci, Luca, et al.
Published: (2024)

Automatic Sound Event Detection and Classification of Great Ape Calls Using Neural Networks
by: Jiang, Zifan, et al.
Published: (2023)

NoiseBandNet: Controllable Time-Varying Neural Synthesis of Sound Effects Using Filterbanks
by: Barahona-Ríos, Adrián, et al.
Published: (2023)

Exploring Self-Supervised Audio Models for Generalized Anomalous Sound Detection
by: Han, Bing, et al.
Published: (2025)

Multichannel Voice Trigger Detection Based on Transform-average-concatenate
by: Higuchi, Takuya, et al.
Published: (2023)

An Enhanced Audio Feature Tailored for Anomalous Sound Detection Based on Pre-trained Models
by: Zhong, Guirui, et al.
Published: (2025)

Joint Analysis of Acoustic Scenes and Sound Events Based on Semi-Supervised Training of Sound Events With Partial Labels
by: Imoto, Keisuke
Published: (2025)