| Main Authors: | Sun, Yulin; Xu, Qisheng; Su, Yi; Zhu, Qian; Dou, Yong; Liu, Xinwang; Xu, Kele |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Sound |
| Online Access: | https://arxiv.org/abs/2508.15429 |
| _version_ | 1866915456486473728 |
|---|---|
| author | Sun, Yulin; Xu, Qisheng; Su, Yi; Zhu, Qian; Dou, Yong; Liu, Xinwang; Xu, Kele |
| author_facet | Sun, Yulin; Xu, Qisheng; Su, Yi; Zhu, Qian; Dou, Yong; Liu, Xinwang; Xu, Kele |
| contents | AudioSet is a widely used benchmark in the audio research community and has significantly advanced various audio-related tasks. However, persistent issues with label accuracy and completeness remain critical bottlenecks that limit performance in downstream applications. To address these challenges, we propose a three-stage reannotation framework that harnesses general-purpose audio-language foundation models to systematically improve the label quality of AudioSet. The framework employs a cross-modal prompting strategy, inspired by the concept of prompt chaining, in which prompts are composed sequentially to execute three subtasks: audio comprehension, label synthesis, and semantic alignment. Leveraging this framework, we construct AudioSet-R, a high-quality, structured relabeled version of AudioSet. Extensive experiments on representative audio classification models, including AST, PANNs, SSAST, and AudioMAE, consistently demonstrate substantial performance improvements, validating the generalizability and effectiveness of the proposed approach in enhancing label reliability. The code is publicly available at: https://github.com/colaudiolab/AudioSet-R. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2508_15429 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | AudioSet-R: A Refined AudioSet with Multi-Stage LLM Label Reannotation |
| topic | Sound |
| url | https://arxiv.org/abs/2508.15429 |