Staff View: Forging and Removing Latent-Noise Diffusion Watermarks Using a Single Image :: Library Catalog

Bibliographic Details
Main Authors:	Jain, Anubhav; Kobayashi, Yuya; Murata, Naoki; Takida, Yuhta; Shibuya, Takashi; Mitsufuji, Yuki; Cohen, Niv; Memon, Nasir; Togelius, Julian
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2504.20111

_version_	1866909596637986816
author	Jain, Anubhav; Kobayashi, Yuya; Murata, Naoki; Takida, Yuhta; Shibuya, Takashi; Mitsufuji, Yuki; Cohen, Niv; Memon, Nasir; Togelius, Julian
author_facet	Jain, Anubhav; Kobayashi, Yuya; Murata, Naoki; Takida, Yuhta; Shibuya, Takashi; Mitsufuji, Yuki; Cohen, Niv; Memon, Nasir; Togelius, Julian
contents	Watermarking techniques are vital for protecting intellectual property and preventing fraudulent use of media. Most previous watermarking schemes designed for diffusion models embed a secret key in the initial noise. The resulting pattern is often considered hard to remove and forge into unrelated images. In this paper, we propose a black-box adversarial attack without presuming access to the diffusion model weights. Our attack uses only a single watermarked example and is based on a simple observation: there is a many-to-one mapping between images and initial noises. There are regions in the clean image latent space pertaining to each watermark that get mapped to the same initial noise when inverted. Based on this intuition, we propose an adversarial attack to forge the watermark by introducing perturbations to the images such that we can enter the region of watermarked images. We show that we can also apply a similar approach for watermark removal by learning perturbations to exit this region. We report results on multiple watermarking schemes (Tree-Ring, RingID, WIND, and Gaussian Shading) across two diffusion models (SDv1.4 and SDv2.0). Our results demonstrate the effectiveness of the attack and expose vulnerabilities in the watermarking methods, motivating future research on improving them.
format	Preprint
id	arxiv_https___arxiv_org_abs_2504_20111
institution	arXiv
publishDate	2025
record_format	arxiv
title	Forging and Removing Latent-Noise Diffusion Watermarks Using a Single Image
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2504.20111
