:: Library Catalog

Bibliographic Details
Main Authors:	Choi, Ryuhaerang; Chatterjee, Soumyajit; Spathis, Dimitris; Lee, Sung-Ju; Kawsar, Fahim; Malekzadeh, Mohammad
Format:	Preprint
Published:	2024
Subjects:	Sound; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2410.23008

Similar Items

DeFT-Mamba: Universal Multichannel Sound Separation and Polyphonic Audio Classification
by: Lee, Dongheon, et al.
Published: (2024)

SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer
by: Wang, Helin, et al.
Published: (2024)

ACES: Evaluating Automated Audio Captioning Models on the Semantics of Sounds
by: Wijngaard, Gijs, et al.
Published: (2024)

DnR-nonverbal: Cinematic Audio Source Separation Dataset Containing Non-Verbal Sounds
by: Hasumi, Takuya, et al.
Published: (2025)

Discrete Audio Representations for Automated Audio Captioning
by: Tian, Jingguang, et al.
Published: (2025)

AudioSpa: Spatializing Sound Events with Text
by: Feng, Linfeng, et al.
Published: (2025)

Region-Specific Audio Tagging for Spatial Sound
by: Zhao, Jinzheng, et al.
Published: (2025)

The Sounds of Home: A Speech-Removed Residential Audio Dataset for Sound Event Detection
by: Bibbó, Gabriel, et al.
Published: (2024)

Correlation of Fréchet Audio Distance With Human Perception of Environmental Audio Is Embedding Dependant
by: Tailleur, Modan, et al.
Published: (2024)

AudioCIL: A Python Toolbox for Audio Class-Incremental Learning with Multiple Scenes
by: Xu, Qisheng, et al.
Published: (2024)

Class-Incremental Learning for Multi-Label Audio Classification
by: Mulimani, Manjunath, et al.
Published: (2024)

AudioBERTScore: Objective Evaluation of Environmental Sound Synthesis Based on Similarity of Audio embedding Sequences
by: Kishi, Minoru, et al.
Published: (2025)

SoundBeam meets M2D: Target Sound Extraction with Audio Foundation Model
by: Hernandez-Olivan, Carlos, et al.
Published: (2024)

ASPED: An Audio Dataset for Detecting Pedestrians
by: Seshadri, Pavan, et al.
Published: (2023)

DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset
by: Du, Jiawei, et al.
Published: (2024)

Effective Pre-Training of Audio Transformers for Sound Event Detection
by: Schmid, Florian, et al.
Published: (2024)

Enhance Temporal Relations in Audio Captioning with Sound Event Detection
by: Xie, Zeyu, et al.
Published: (2023)

Sound Check: Auditing Audio Datasets
by: Agnew, William, et al.
Published: (2024)

Online Single-Channel Audio-Based Sound Speed Estimation for Robust Multi-Channel Audio Control
by: Fuglsig, Andreas Jonas, et al.
Published: (2026)

Robust Audio Tagging under Class-wise Supervision Unreliability
by: Hou, Yuanbo, et al.
Published: (2026)

Can Audio Reveal Music Performance Difficulty? Insights from the Piano Syllabus Dataset
by: Ramoneda, Pedro, et al.
Published: (2024)

A Generalist Audio Foundation Model for Comprehensive Body Sound Auscultation
by: Wang, Pingjie, et al.
Published: (2024)

Leveraging Audio-Only Data for Text-Queried Target Sound Extraction
by: Saijo, Kohei, et al.
Published: (2024)

Exploring Self-Supervised Audio Models for Generalized Anomalous Sound Detection
by: Han, Bing, et al.
Published: (2025)

Exploring Text-Queried Sound Event Detection with Audio Source Separation
by: Yin, Han, et al.
Published: (2024)

Listen, Analyze, and Adapt to Learn New Attacks: An Exemplar-Free Class Incremental Learning Method for Audio Deepfake Source Tracing
by: Xiao, Yang, et al.
Published: (2025)

PIAST: A Multimodal Piano Dataset with Audio, Symbolic and Text
by: Bang, Hayeon, et al.
Published: (2024)

Analytic Class Incremental Learning for Sound Source Localization with Privacy Protection
by: Qian, Xinyuan, et al.
Published: (2024)

UCIL: An Unsupervised Class Incremental Learning Approach for Sound Event Detection
by: Xiao, Yang, et al.
Published: (2024)

Exploring Perceptual Audio Quality Measurement on Stereo Processing Using the Open Dataset of Audio Quality
by: Delgado, Pablo M., et al.
Published: (2025)

E-BATS: Efficient Backpropagation-Free Test-Time Adaptation for Speech Foundation Models
by: Dong, Jiaheng, et al.
Published: (2025)

Audio-Language Datasets of Scenes and Events: A Survey
by: Wijngaard, Gijs, et al.
Published: (2024)

MLAAD: The Multi-Language Audio Anti-Spoofing Dataset
by: Müller, Nicolas M., et al.
Published: (2024)

DiveSound: LLM-Assisted Automatic Taxonomy Construction for Diverse Audio Generation
by: Li, Baihan, et al.
Published: (2024)

Generating Diverse Audio-Visual 360 Soundscapes for Sound Event Localization and Detection
by: Roman, Adrian S., et al.
Published: (2025)

EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning
by: Kim, Jaeyeon, et al.
Published: (2024)

Construction and Analysis of Impression Caption Dataset for Environmental Sounds
by: Okamoto, Yuki, et al.
Published: (2024)

A Comparative Analysis of Poetry Reading Audio: Singing, Narrating, or Somewhere In Between?
by: Choi, Kahyun, et al.
Published: (2024)

AFT: An Exemplar-Free Class Incremental Learning Method for Environmental Sound Classification
by: Chen, Xinyi, et al.
Published: (2025)

Evaluating CNN with Stacked Feature Representations and Audio Spectrogram Transformer Models for Sound Classification
by: Dehaghania, Parinaz Binandeh, et al.
Published: (2026)