Bibliographic Details
Main Authors: Dou, Weijia; Zheng, Wenzhao; Chen, Weiliang; Zheng, Yu; Zhou, Jie; Lu, Jiwen
Format: Preprint
Published: 2026
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2603.19048
_version_ 1866914409360654336
author Dou, Weijia
Zheng, Wenzhao
Chen, Weiliang
Zheng, Yu
Zhou, Jie
Lu, Jiwen
contents Recent generative models can produce high-fidelity videos, yet they often exhibit 3D spatial geometric inconsistencies. Existing evaluation methods fail to characterize these inconsistencies accurately: fidelity-centric metrics such as FVD are insensitive to geometric distortions, while consistency-focused benchmarks often penalize valid foreground dynamics. To address this gap, we introduce SGC, a metric for evaluating 3D Spatial Geometric Consistency in dynamically generated videos. We quantify geometric consistency by measuring the divergence among multiple camera poses estimated from distinct local regions. Our approach first separates static from dynamic regions, then partitions the static background into spatially coherent sub-regions. We predict depth for each pixel, estimate a local camera pose for each sub-region, and compute the divergence among these poses to quantify geometric consistency. Experiments on real and generated videos demonstrate that SGC robustly quantifies geometric inconsistencies and identifies critical failures missed by existing metrics.
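The final step described in the abstract, comparing camera poses estimated from different static sub-regions, can be illustrated with a minimal sketch. It assumes that segmentation, depth prediction, and per-region pose estimation have already been done upstream and that each sub-region yields a rotation matrix and a translation vector. The divergence used here (mean pairwise geodesic rotation angle and mean pairwise translation distance) is an illustrative assumption, not the definition used by SGC, and the names `rotation_angle`, `pose_divergence`, and `rot_z` are hypothetical.

```python
import itertools

import numpy as np


def rotation_angle(R_a, R_b):
    """Geodesic angle in radians between two 3x3 rotation matrices."""
    R_rel = R_a.T @ R_b
    cos_theta = (np.trace(R_rel) - 1.0) / 2.0
    # Clip to guard against values slightly outside [-1, 1] from rounding.
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))


def pose_divergence(poses):
    """Mean pairwise rotation angle and translation distance.

    poses: list of (R, t) pairs, one per static sub-region, where R is a
    3x3 rotation matrix and t is a length-3 translation vector.
    Assumed measure for illustration only; not the SGC formula.
    """
    pairs = list(itertools.combinations(poses, 2))
    if not pairs:
        # Fewer than two sub-regions: nothing to compare.
        return 0.0, 0.0
    rot = np.mean([rotation_angle(Ra, Rb) for (Ra, _), (Rb, _) in pairs])
    trans = np.mean(
        [np.linalg.norm(np.asarray(ta) - np.asarray(tb))
         for (_, ta), (_, tb) in pairs]
    )
    return float(rot), float(trans)


def rot_z(theta):
    """Rotation by theta radians about the z axis (helper for the example)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


if __name__ == "__main__":
    # Two sub-regions whose local poses disagree by a small rotation and
    # a translation offset along z.
    poses = [
        (rot_z(0.0), np.zeros(3)),
        (rot_z(0.1), np.array([0.0, 0.0, 2.0])),
    ]
    print(pose_divergence(poses))
```

In a geometrically consistent video, all static sub-regions should agree on a single camera motion, so both values stay near zero; larger values indicate that different parts of the background imply different camera motions.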
format Preprint
id arxiv_https___arxiv_org_abs_2603_19048
institution arXiv
publishDate 2026
record_format arxiv
title Measuring 3D Spatial Geometric Consistency in Dynamic Generated Videos
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2603.19048