Bibliographic Details
Main Authors: Chung, Sanghyeok, Kim, Eujin, Kim, Donggun, Heo, Gaeun, You, Jeongbin, Lee, Nahyun, Choi, Sunmook, Han, Soyul, Oh, Seungsang, Kwak, Il-Youp
Format: Preprint
Published: 2025
Subjects: Sound, Machine Learning
Online Access: https://arxiv.org/abs/2512.15180
_version_ 1866914205404233728
author Chung, Sanghyeok
Kim, Eujin
Kim, Donggun
Heo, Gaeun
You, Jeongbin
Lee, Nahyun
Choi, Sunmook
Han, Soyul
Oh, Seungsang
Kwak, Il-Youp
contents Recent advances in audio generation have increased the risk of realistic environmental sound manipulation, motivating the ESDD 2026 Challenge, the first large-scale benchmark for Environmental Sound Deepfake Detection (ESDD). We propose BEAT2AASIST, which extends BEATs-AASIST by splitting BEATs-derived representations along the frequency or channel dimension and processing them with dual AASIST branches. To enrich the feature representations, we incorporate top-k transformer layer fusion using concatenation, CNN-gated, and SE-gated strategies. In addition, we apply vocoder-based data augmentation to improve robustness against unseen spoofing methods. Experimental results on the official test sets show that the proposed approach achieves competitive performance across the challenge tracks.
format Preprint
id arxiv_https___arxiv_org_abs_2512_15180
institution arXiv
publishDate 2025
record_format arxiv
title BEAT2AASIST model with layer fusion for ESDD 2026 Challenge
topic Sound
Machine Learning
url https://arxiv.org/abs/2512.15180
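
The abstract in this record names three mechanisms: splitting a representation along the frequency or channel dimension, fusing several transformer layers by concatenation, and fusing them with an SE (squeeze-and-excitation) gate. The sketch below illustrates those ideas only; it is not the authors' implementation. Everything in it is an assumption made for illustration: the (time, frequency, channel) layout, the reading of "top-k" as the last k layers, the function names, and the random weights standing in for learned parameters. The AASIST branches and the CNN-gated variant are not modelled.

```python
import math
import random

def split_halves(x, axis):
    """Split a (time, freq, channel) nested list into two halves along `axis`."""
    if axis == "frequency":
        half = len(x[0]) // 2
        return ([frame[:half] for frame in x],
                [frame[half:] for frame in x])
    if axis == "channel":
        half = len(x[0][0]) // 2
        return ([[b[:half] for b in frame] for frame in x],
                [[b[half:] for b in frame] for frame in x])
    raise ValueError("axis must be 'frequency' or 'channel'")

def concat_fusion(layers):
    """Fuse k layer outputs by concatenating them along the channel axis."""
    n_time, n_freq = len(layers[0]), len(layers[0][0])
    return [[sum((layer[t][f] for layer in layers), [])
             for f in range(n_freq)] for t in range(n_time)]

def layer_mean(layer):
    """Squeeze step: global average over time, frequency and channel."""
    vals = [v for frame in layer for b in frame for v in b]
    return sum(vals) / len(vals)

def se_gated_fusion(layers, w1, w2):
    """Weight each layer by a sigmoid gate, then sum the weighted layers.

    w1 (r x k) and w2 (k x r) are hypothetical stand-ins for the learned
    weights of the two-layer excitation network.
    """
    z = [layer_mean(layer) for layer in layers]
    # Excitation: linear -> ReLU -> linear -> sigmoid.
    h = [max(0.0, sum(w * zi for w, zi in zip(row, z))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * hi for w, hi in zip(row, h))))
             for row in w2]
    n_time, n_freq, n_ch = len(layers[0]), len(layers[0][0]), len(layers[0][0][0])
    fused = [[[sum(g * layer[t][f][c] for g, layer in zip(gates, layers))
               for c in range(n_ch)]
              for f in range(n_freq)]
             for t in range(n_time)]
    return fused, gates

# Toy data: 5 "transformer layers", each of shape (time=4, freq=8, channel=6).
random.seed(0)
all_layers = [[[[random.gauss(0.0, 1.0) for _ in range(6)]
                for _ in range(8)]
               for _ in range(4)]
              for _ in range(5)]
K = 3
top_k = all_layers[-K:]  # assumption: "top-k" means the last k layers

# Random placeholders for learned excitation weights (reduction size r = 2).
w1 = [[random.uniform(-1, 1) for _ in range(K)] for _ in range(2)]
w2 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(K)]

fused, gates = se_gated_fusion(top_k, w1, w2)
low_band, high_band = split_halves(fused, "frequency")  # one half per branch
```

In this toy setup each half keeps the full time axis, so two identical downstream branches could consume `low_band` and `high_band` independently. Concatenation fusion multiplies the channel width by k, whereas the gated sum keeps the original width.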