Staff View: SIGMark: Scalable In-Generation Watermark with Blind Extraction for Video Diffusion

Bibliographic Details
Main Authors:	Zhu, Xinjie; Zhao, Zijing; Jin, Hui; Guo, Qingxiao; Ma, Yilong; Wang, Yunhao; Guo, Xiaobing; Zhang, Weifeng
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2603.02882

_version_	1866910039224090624
author	Zhu, Xinjie; Zhao, Zijing; Jin, Hui; Guo, Qingxiao; Ma, Yilong; Wang, Yunhao; Guo, Xiaobing; Zhang, Weifeng
author_facet	Zhu, Xinjie; Zhao, Zijing; Jin, Hui; Guo, Qingxiao; Ma, Yilong; Wang, Yunhao; Guo, Xiaobing; Zhang, Weifeng
contents	Artificial Intelligence Generated Content (AIGC), particularly video generation with diffusion models, has advanced rapidly. Invisible watermarking is a key technology for protecting AI-generated videos and tracing harmful content, and thus plays a crucial role in AI safety. As an alternative to post-processing watermarks, which inevitably degrade video quality, recent studies have proposed distortion-free in-generation watermarking for video diffusion models. However, existing in-generation approaches are non-blind: they require maintaining all message-key pairs and performing template-based matching during extraction, which incurs prohibitive computational costs at scale. Moreover, when applied to modern video diffusion models with causal 3D Variational Autoencoders (VAEs), their robustness against temporal disturbance becomes extremely weak. To overcome these challenges, we propose SIGMark, a Scalable In-Generation watermarking framework with blind extraction for video diffusion. To achieve blind extraction, we generate watermarked initial noise using a Global set of Frame-wise PseudoRandom Coding keys (GF-PRC), reducing the cost of storing large-scale information while preserving noise distribution and diversity for distortion-free watermarking. To enhance robustness, we further design a Segment Group-Ordering module (SGO) tailored to causal 3D VAEs, ensuring robust watermark inversion during extraction under temporal disturbance. Comprehensive experiments on modern diffusion models show that SIGMark achieves very high bit accuracy during extraction under both temporal and spatial disturbances with minimal overhead, demonstrating its scalability and robustness. Our project is available at https://jeremyzhao1998.github.io/SIGMark-release/.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_02882
institution	arXiv
publishDate	2026
record_format	arxiv
title	SIGMark: Scalable In-Generation Watermark with Blind Extraction for Video Diffusion
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2603.02882
