Staff View: XAttnMark: Learning Robust Audio Watermarking with Cross-Attention

Bibliographic Details
Main Authors:	Liu, Yixin; Lu, Lie; Jin, Jihui; Sun, Lichao; Fanelli, Andrea
Format:	Preprint
Published:	2025
Subjects:	Sound; Artificial Intelligence; Cryptography and Security; Machine Learning; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2502.04230

_version_	1866914588972285952
author	Liu, Yixin; Lu, Lie; Jin, Jihui; Sun, Lichao; Fanelli, Andrea
author_facet	Liu, Yixin; Lu, Lie; Jin, Jihui; Sun, Lichao; Fanelli, Andrea
contents	The rapid proliferation of generative audio synthesis and editing technologies has raised serious concerns about copyright infringement, data provenance, and the spread of misinformation via deepfake audio. Watermarking offers a proactive solution by embedding imperceptible yet identifiable and traceable signals into audio content. While recent neural network-based watermarking methods like WavMark and AudioSeal have improved robustness and quality, they struggle to jointly optimize both robust detection and accurate attribution. This paper introduces Cross-Attention Robust Audio Watermark (XAttnMark), which bridges this gap by leveraging partial parameter sharing between the generator and the detector, a cross-attention mechanism for efficient message retrieval, and a temporal conditioning module for improved message distribution. Additionally, we propose a psychoacoustic-aligned time-frequency (TF) masking loss that captures fine-grained auditory masking effects, improving watermark imperceptibility. XAttnMark achieves state-of-the-art performance in both detection and attribution, demonstrating superior robustness against a wide range of audio transformations, including challenging generative editing at varying strengths. This work advances audio watermarking for protecting intellectual property and ensuring authenticity in the era of generative AI.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_04230
institution	arXiv
publishDate	2025
record_format	arxiv
title	XAttnMark: Learning Robust Audio Watermarking with Cross-Attention
topic	Sound; Artificial Intelligence; Cryptography and Security; Machine Learning; Audio and Speech Processing
url	https://arxiv.org/abs/2502.04230
