Bibliographic Details
Main Authors: Zhang, Qin, Jing, Peiyu, Yu, Hong-Xing, Ding, Fangqiang, Nie, Fan, Wang, Weimin, Du, Yilun, Zou, James, Wu, Jiajun, Shuai, Bing
Format: Preprint
Published: 2026
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2603.19607
contents Video generation models are increasingly used as world simulators for storytelling, simulation, and embodied AI. As these models advance, a key question arises: do generated videos obey the physical laws of the real world? Existing evaluations largely rely on automated metrics or coarse human judgments such as preferences or rubric-based checks. While useful for assessing perceptual quality, these methods provide limited insight into when and why generated dynamics violate real-world physical constraints. We introduce Physion-Eval, a large-scale benchmark of expert human reasoning for diagnosing physical realism failures in videos generated by five state-of-the-art models across egocentric and exocentric views, containing 10,990 expert reasoning traces spanning 22 fine-grained physical categories. Each generated video is derived from a corresponding real-world reference video depicting a clear physical process, and annotated with temporally localized glitches, structured failure categories, and natural-language explanations of the violated physical behavior. Using this dataset, we reveal a striking limitation of current video generation models: in physics-critical scenarios, 83.3% of exocentric and 93.5% of egocentric generated videos exhibit at least one human-identifiable physical glitch. We hope Physion-Eval will set a new standard for physical realism evaluation and guide the development of physics-grounded video generation. The benchmark is publicly available at https://huggingface.co/datasets/PhysionLabs/Physion-Eval.
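The headline figures above (83.3% exocentric, 93.5% egocentric) are per-view fractions of generated videos with at least one human-identified physical glitch. The following is a minimal sketch of that computation. It assumes a hypothetical per-video record layout: the class names and the fields `view`, `glitches`, `start_s`, `end_s`, `category` and `explanation` are illustrative and are not the released dataset's actual schema. It also does not model the paper's restriction to physics-critical scenarios.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class GlitchAnnotation:
    # Hypothetical fields mirroring the annotation types named in the abstract:
    # a temporally localized span, a failure category, and a free-text explanation.
    start_s: float
    end_s: float
    category: str
    explanation: str


@dataclass
class VideoRecord:
    video_id: str
    view: str  # hypothetical label, e.g. "egocentric" or "exocentric"
    glitches: list[GlitchAnnotation] = field(default_factory=list)


def glitch_rate_by_view(records: list[VideoRecord]) -> dict[str, float]:
    """Fraction of videos per view with at least one annotated glitch."""
    total: dict[str, int] = defaultdict(int)
    flagged: dict[str, int] = defaultdict(int)
    for rec in records:
        total[rec.view] += 1
        if rec.glitches:  # one or more glitches counts the video once
            flagged[rec.view] += 1
    return {view: flagged[view] / total[view] for view in total}


if __name__ == "__main__":
    demo = [
        VideoRecord("v1", "exocentric",
                    [GlitchAnnotation(1.2, 1.8, "collision", "cup passes through table")]),
        VideoRecord("v2", "exocentric"),
        VideoRecord("v3", "egocentric",
                    [GlitchAnnotation(0.4, 0.9, "fluid", "water stops mid-pour")]),
    ]
    print(glitch_rate_by_view(demo))
```

Counting a video once, regardless of how many glitches it carries, matches the "at least one glitch" phrasing. The toy categories and explanations in the demo are invented for illustration only.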
institution arXiv
title Physion-Eval: Evaluating Physical Realism in Generated Video via Human Reasoning
topic Computer Vision and Pattern Recognition