Bibliographic Details
Main Authors: Jain, Anubhav, Kobayashi, Yuya, Murata, Naoki, Takida, Yuhta, Shibuya, Takashi, Mitsufuji, Yuki, Cohen, Niv, Memon, Nasir, Togelius, Julian
Format: Preprint
Published: 2025
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2504.20111
author Jain, Anubhav
Kobayashi, Yuya
Murata, Naoki
Takida, Yuhta
Shibuya, Takashi
Mitsufuji, Yuki
Cohen, Niv
Memon, Nasir
Togelius, Julian
contents Watermarking techniques are vital for protecting intellectual property and preventing fraudulent use of media. Most previous watermarking schemes designed for diffusion models embed a secret key in the initial noise. The resulting pattern is often considered hard to remove and hard to forge onto unrelated images. In this paper, we propose a black-box adversarial attack that does not presume access to the diffusion model weights. Our attack uses only a single watermarked example and is based on a simple observation: there is a many-to-one mapping between images and initial noises. For each watermark, there are regions in the clean image latent space that map to the same initial noise when inverted. Based on this intuition, we propose an adversarial attack that forges the watermark by introducing perturbations to images so that they enter the region of watermarked images. We show that a similar approach also works for watermark removal, by learning perturbations that exit this region. We report results on multiple watermarking schemes (Tree-Ring, RingID, WIND, and Gaussian Shading) across two diffusion models (SDv1.4 and SDv2.0). Our results demonstrate the effectiveness of the attack and expose vulnerabilities in the watermarking methods, motivating future research on improving them.
format Preprint
id arxiv_https___arxiv_org_abs_2504_20111
institution arXiv
publishDate 2025
record_format arxiv
title Forging and Removing Latent-Noise Diffusion Watermarks Using a Single Image
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2504.20111
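
The intuition stated in the abstract (many images invert to the same initial noise, so a perturbation can push an image into or out of a watermark's region) can be illustrated with a toy sketch. This is not the authors' method: the linear map `PROJ`, the functions `toy_invert`, `detect`, and `perturb_toward`, the vector sizes, and the detection threshold are all hypothetical stand-ins chosen for illustration. The toy also hands the attacker the inversion map directly, which is a simplification of the black-box setting the abstract describes.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IMG, D_NOISE = 64, 16  # toy sizes (assumptions, not from the paper)
# A wide matrix is many-to-one: images differing by a null-space vector share a noise.
PROJ = rng.standard_normal((D_NOISE, D_IMG)) / np.sqrt(D_IMG)

def toy_invert(image):
    """Toy stand-in for inversion: map an image vector to an 'initial noise'."""
    return PROJ @ image

def detect(image, key, threshold=0.1):
    """Toy detector: watermark present if the inverted noise is close to the key."""
    return bool(np.linalg.norm(toy_invert(image) - key) < threshold)

def perturb_toward(image, target_noise, steps=500):
    """Gradient descent on 0.5 * ||toy_invert(image + delta) - target_noise||^2."""
    lr = 1.0 / np.linalg.norm(PROJ, 2) ** 2  # 1 / (largest singular value)^2
    delta = np.zeros_like(image)
    for _ in range(steps):
        residual = toy_invert(image + delta) - target_noise
        delta -= lr * (PROJ.T @ residual)
    return image + delta

key = rng.standard_normal(D_NOISE)        # secret key; the attacker never reads it
watermarked = np.linalg.pinv(PROJ) @ key  # one image whose inversion equals the key
clean = rng.standard_normal(D_IMG)        # unrelated, unwatermarked image

# Forgery: steer the clean image toward the noise recovered from the single example.
forged = perturb_toward(clean, toy_invert(watermarked))
# Removal: steer the watermarked image toward an unrelated noise vector.
removed = perturb_toward(watermarked, rng.standard_normal(D_NOISE))

print(detect(clean, key), detect(forged, key))         # before / after forgery
print(detect(watermarked, key), detect(removed, key))  # before / after removal
```

In this toy the attacker only uses the inversion of the single watermarked example as the target, never `key` itself, which mirrors the single-image setting in spirit. Real latent-noise watermarks involve a diffusion model and a nonlinear inversion process, so the linear map here only conveys the geometry of the idea.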