Bibliographic Details
Main Authors: Sun, Yulin; Xu, Qisheng; Su, Yi; Zhu, Qian; Dou, Yong; Liu, Xinwang; Xu, Kele
Format: Preprint
Published: 2025
Subjects: Sound
Online Access: https://arxiv.org/abs/2508.15429
author Sun, Yulin
Xu, Qisheng
Su, Yi
Zhu, Qian
Dou, Yong
Liu, Xinwang
Xu, Kele
contents AudioSet is a widely used benchmark in the audio research community and has significantly advanced a range of audio-related tasks. However, persistent problems with label accuracy and completeness remain critical bottlenecks that limit performance in downstream applications. To address these challenges, we propose a three-stage reannotation framework that uses general-purpose audio-language foundation models to systematically improve the label quality of AudioSet. The framework employs a cross-modal prompting strategy inspired by prompt chaining, in which prompts are composed sequentially to execute three subtasks: audio comprehension, label synthesis, and semantic alignment. Using this framework, we construct AudioSet-R, a high-quality, structured relabeled version of AudioSet. Extensive experiments on representative audio classification models, including AST, PANNs, SSAST, and AudioMAE, consistently show substantial performance improvements, supporting the generalizability and effectiveness of the proposed approach in enhancing label reliability. The code is publicly available at: https://github.com/colaudiolab/AudioSet-R.
format Preprint
id arxiv_https___arxiv_org_abs_2508_15429
institution arXiv
publishDate 2025
record_format arxiv
title AudioSet-R: A Refined AudioSet with Multi-Stage LLM Label Reannotation
topic Sound
url https://arxiv.org/abs/2508.15429
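
Illustrative note: the abstract in this record describes a prompt chain with three subtasks (audio comprehension, label synthesis, semantic alignment). The Python sketch below shows only the general shape of such a chain and is not the authors' implementation. Everything in it is an assumption made for illustration: the prompt wording, the four-entry `ONTOLOGY` list, the `stub_model` callable that stands in for an audio-language model, the string `audio_ref` that stands in for real audio input, and the token-overlap rule used for alignment.

```python
from typing import Callable, List, Optional

# Hypothetical label vocabulary; the real AudioSet ontology is far larger.
ONTOLOGY = ["Speech", "Dog", "Music", "Vehicle"]


def align(candidate: str, ontology: List[str]) -> Optional[str]:
    """Map a free-form label to the ontology entry with the highest
    token overlap (Jaccard score). Returns None when nothing overlaps.
    This is a simple stand-in for semantic alignment."""
    cand_tokens = set(candidate.lower().split())
    best_label, best_score = None, 0.0
    for label in ontology:
        label_tokens = set(label.lower().split())
        union = cand_tokens | label_tokens
        score = len(cand_tokens & label_tokens) / len(union) if union else 0.0
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def chain_prompts(audio_ref: str, model: Callable[[str], str]) -> List[str]:
    """Run three prompts in sequence, feeding each output into the next."""
    # Stage 1 (audio comprehension): ask for a free-text description.
    description = model(f"Describe the sounds in {audio_ref}.")
    # Stage 2 (label synthesis): turn the description into candidate labels.
    raw = model(f"List comma-separated sound labels for: {description}")
    candidates = [c.strip() for c in raw.split(",") if c.strip()]
    # Stage 3 (semantic alignment): keep only labels that map to the ontology.
    aligned: List[str] = []
    for cand in candidates:
        match = align(cand, ONTOLOGY)
        if match is not None and match not in aligned:
            aligned.append(match)
    return aligned


def stub_model(prompt: str) -> str:
    """Canned responses so the sketch runs without any real model."""
    if prompt.startswith("Describe"):
        return "A dog barks while a person talks."
    return "dog barking, human speech"


if __name__ == "__main__":
    print(chain_prompts("clip.wav", stub_model))
```

Passing the model as a plain callable keeps the chain independent of any particular model API; for the method actually used, see the repository linked in the record.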