Staff View: :: Library Catalog

Bibliographic Details
Main Authors:	Chung, Sanghyeok; Kim, Eujin; Kim, Donggun; Heo, Gaeun; You, Jeongbin; Lee, Nahyun; Choi, Sunmook; Han, Soyul; Oh, Seungsang; Kwak, Il-Youp
Format:	Preprint
Published:	2025
Subjects:	Sound; Machine Learning
Online Access:	https://arxiv.org/abs/2512.15180

_version_	1866914205404233728
author	Chung, Sanghyeok; Kim, Eujin; Kim, Donggun; Heo, Gaeun; You, Jeongbin; Lee, Nahyun; Choi, Sunmook; Han, Soyul; Oh, Seungsang; Kwak, Il-Youp
author_facet	Chung, Sanghyeok; Kim, Eujin; Kim, Donggun; Heo, Gaeun; You, Jeongbin; Lee, Nahyun; Choi, Sunmook; Han, Soyul; Oh, Seungsang; Kwak, Il-Youp
contents	Recent advances in audio generation have increased the risk of realistic manipulation of environmental sounds, motivating the ESDD 2026 Challenge as the first large-scale benchmark for Environmental Sound Deepfake Detection (ESDD). We propose BEAT2AASIST, which extends BEATs-AASIST by splitting BEATs-derived representations along the frequency or channel dimension and processing them with dual AASIST branches. To enrich feature representations, we incorporate top-k transformer layer fusion using concatenation, CNN-gated, and SE-gated strategies. In addition, vocoder-based data augmentation is applied to improve robustness against unseen spoofing methods. Experimental results on the official test sets demonstrate that the proposed approach achieves competitive performance across the challenge tracks.
format	Preprint
id	arxiv_https___arxiv_org_abs_2512_15180
institution	arXiv
publishDate	2025
record_format	arxiv
title	BEAT2AASIST model with layer fusion for ESDD 2026 Challenge
topic	Sound; Machine Learning
url	https://arxiv.org/abs/2512.15180
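The abstract names two mechanisms that are easier to follow with concrete shapes: top-k transformer layer fusion (concatenation and SE-gated variants) and splitting a feature map along the channel dimension so two branches each process one part. The Python sketch below is not the authors' implementation. It assumes that "top-k" means the last k (deepest) BEATs layers, represents each layer output as a frames x channels list of lists, uses random stand-in weights for the SE-style gate, and replaces each AASIST branch with a hypothetical mean-pooling function (`toy_branch`). All function names are illustrative. The frequency-dimension split, the CNN-gated variant, and vocoder-based augmentation are not shown.

```python
import math
import random
from typing import List, Tuple

# Hypothetical layout: one layer output = T frames x D channels.
Matrix = List[List[float]]


def top_k_layers(layers: List[Matrix], k: int) -> List[Matrix]:
    """Keep the last k layers (assumption: "top" means the deepest layers)."""
    if not 1 <= k <= len(layers):
        raise ValueError("k must be between 1 and the number of layers")
    return layers[-k:]


def concat_fusion(layers: List[Matrix]) -> Matrix:
    """Concatenate the selected layers frame by frame along the channel axis."""
    return [[v for layer in layers for v in layer[t]] for t in range(len(layers[0]))]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def se_gated_fusion(layers: List[Matrix], hidden: int = 2,
                    seed: int = 0) -> Tuple[Matrix, List[float]]:
    """Squeeze-and-excitation style gating over the layer axis.

    Squeeze: average each layer over frames and channels (one value per layer).
    Excite: FC -> ReLU -> FC -> sigmoid (one gate per layer).
    The gated layers are summed. The FC weights are random stand-ins; a
    trained model would learn them.
    """
    k = len(layers)
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(hidden)]
    w2 = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(k)]

    frames, dim = len(layers[0]), len(layers[0][0])
    squeezed = [sum(map(sum, layer)) / (frames * dim) for layer in layers]
    h = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [sigmoid(sum(w * x for w, x in zip(row, h))) for row in w2]

    fused = [[sum(g * layer[t][d] for g, layer in zip(gates, layers))
              for d in range(dim)] for t in range(frames)]
    return fused, gates


def split_channels(features: Matrix) -> Tuple[Matrix, Matrix]:
    """Split the channel axis into two halves, one per branch."""
    half = len(features[0]) // 2
    return [row[:half] for row in features], [row[half:] for row in features]


def toy_branch(features: Matrix) -> List[float]:
    """Hypothetical stand-in for an AASIST branch: mean-pool over frames."""
    frames = len(features)
    return [sum(row[d] for row in features) / frames for d in range(len(features[0]))]


def dual_branch_forward(features: Matrix) -> List[float]:
    """Run one toy branch per channel half and concatenate their outputs."""
    low, high = split_channels(features)
    return toy_branch(low) + toy_branch(high)


if __name__ == "__main__":
    rng = random.Random(42)
    # 12 hypothetical layer outputs, each 4 frames x 8 channels.
    layers = [[[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)] for _ in range(12)]
    selected = top_k_layers(layers, k=3)
    concat = concat_fusion(selected)          # 4 frames x 24 channels
    fused, gates = se_gated_fusion(selected)  # 4 frames x 8 channels, 3 gates
    embedding = dual_branch_forward(fused)    # 4 + 4 pooled values
    print(len(concat[0]), len(fused[0]), len(gates), len(embedding))  # 24 8 3 8
```

In this sketch, concatenation multiplies the channel width by k, while the SE-style gate reduces each layer to a single statistic and learns one weight per layer, so the fused output keeps the original channel width.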
