Staff View: Measuring 3D Spatial Geometric Consistency in Dynamic Generated Videos :: Library Catalog

Bibliographic Details
Main Authors:	Dou, Weijia; Zheng, Wenzhao; Chen, Weiliang; Zheng, Yu; Zhou, Jie; Lu, Jiwen
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2603.19048

_version_	1866914409360654336
author	Dou, Weijia; Zheng, Wenzhao; Chen, Weiliang; Zheng, Yu; Zhou, Jie; Lu, Jiwen
author_facet	Dou, Weijia; Zheng, Wenzhao; Chen, Weiliang; Zheng, Yu; Zhou, Jie; Lu, Jiwen
contents	Recent generative models can produce high-fidelity videos, yet they often exhibit 3D spatial geometric inconsistencies. Existing evaluation methods fail to accurately characterize these inconsistencies: fidelity-centric metrics like FVD are insensitive to geometric distortions, while consistency-focused benchmarks often penalize valid foreground dynamics. To address this gap, we introduce SGC, a metric for evaluating 3D Spatial Geometric Consistency in dynamically generated videos. We quantify geometric consistency by measuring the divergence among multiple camera poses estimated from distinct local regions. Our approach first separates static from dynamic regions, then partitions the static background into spatially coherent sub-regions. We predict depth for each pixel, estimate a local camera pose for each subregion, and compute the divergence among these poses to quantify geometric consistency. Experiments on real and generative videos demonstrate that SGC robustly quantifies geometric inconsistencies, effectively identifying critical failures missed by existing metrics.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_19048
institution	arXiv
publishDate	2026
record_format	arxiv
title	Measuring 3D Spatial Geometric Consistency in Dynamic Generated Videos
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2603.19048

Similar Items