Bibliographic Details
Main Authors: Zhu, Xinjie, Zhao, Zijing, Jin, Hui, Guo, Qingxiao, Ma, Yilong, Wang, Yunhao, Guo, Xiaobing, Zhang, Weifeng
Format: Preprint
Published: 2026
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2603.02882
author Zhu, Xinjie
Zhao, Zijing
Jin, Hui
Guo, Qingxiao
Ma, Yilong
Wang, Yunhao
Guo, Xiaobing
Zhang, Weifeng
contents Artificial Intelligence Generated Content (AIGC), particularly video generation with diffusion models, has advanced rapidly. Invisible watermarking is a key technology for protecting AI-generated videos and tracing harmful content, and thus plays a crucial role in AI safety. As an alternative to post-processing watermarks, which inevitably degrade video quality, recent studies have proposed distortion-free in-generation watermarking for video diffusion models. However, existing in-generation approaches are non-blind: they require maintaining all the message-key pairs and performing template-based matching during extraction, which incurs prohibitive computational costs at scale. Moreover, when applied to modern video diffusion models with causal 3D Variational Autoencoders (VAEs), their robustness against temporal disturbance becomes extremely weak. To overcome these challenges, we propose SIGMark, a Scalable In-Generation watermarking framework with blind extraction for video diffusion. To achieve blind extraction, we generate watermarked initial noise using a Global set of Frame-wise PseudoRandom Coding keys (GF-PRC), reducing the cost of storing large-scale information while preserving noise distribution and diversity for distortion-free watermarking. To enhance robustness, we further design a Segment Group-Ordering module (SGO) tailored to causal 3D VAEs, ensuring robust watermark inversion during extraction under temporal disturbance. Comprehensive experiments on modern diffusion models show that SIGMark achieves very high bit accuracy during extraction under both temporal and spatial disturbances with minimal overhead, demonstrating its scalability and robustness. Our project is available at https://jeremyzhao1998.github.io/SIGMark-release/.
format Preprint
id arxiv_https___arxiv_org_abs_2603_02882
institution arXiv
publishDate 2026
record_format arxiv
title SIGMark: Scalable In-Generation Watermark with Blind Extraction for Video Diffusion
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2603.02882