Staff View: Physion-Eval: Evaluating Physical Realism in Generated Video via Human Reasoning :: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Qin; Jing, Peiyu; Yu, Hong-Xing; Ding, Fangqiang; Nie, Fan; Wang, Weimin; Du, Yilun; Zou, James; Wu, Jiajun; Shuai, Bing
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2603.19607

_version_	1866918398999396352
author	Zhang, Qin; Jing, Peiyu; Yu, Hong-Xing; Ding, Fangqiang; Nie, Fan; Wang, Weimin; Du, Yilun; Zou, James; Wu, Jiajun; Shuai, Bing
author_facet	Zhang, Qin; Jing, Peiyu; Yu, Hong-Xing; Ding, Fangqiang; Nie, Fan; Wang, Weimin; Du, Yilun; Zou, James; Wu, Jiajun; Shuai, Bing
contents	Video generation models are increasingly used as world simulators for storytelling, simulation, and embodied AI. As these models advance, a key question arises: do generated videos obey the physical laws of the real world? Existing evaluations largely rely on automated metrics or coarse human judgments such as preferences or rubric-based checks. While useful for assessing perceptual quality, these methods provide limited insight into when and why generated dynamics violate real-world physical constraints. We introduce Physion-Eval, a large-scale benchmark of expert human reasoning for diagnosing physical realism failures in videos generated by five state-of-the-art models across egocentric and exocentric views, containing 10,990 expert reasoning traces spanning 22 fine-grained physical categories. Each generated video is derived from a corresponding real-world reference video depicting a clear physical process, and annotated with temporally localized glitches, structured failure categories, and natural-language explanations of the violated physical behavior. Using this dataset, we reveal a striking limitation of current video generation models: in physics-critical scenarios, 83.3% of exocentric and 93.5% of egocentric generated videos exhibit at least one human-identifiable physical glitch. We hope Physion-Eval will set a new standard for physical realism evaluation and guide the development of physics-grounded video generation. The benchmark is publicly available at https://huggingface.co/datasets/PhysionLabs/Physion-Eval.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_19607
institution	arXiv
publishDate	2026
record_format	arxiv
title	Physion-Eval: Evaluating Physical Realism in Generated Video via Human Reasoning
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2603.19607
