:: Library Catalog

Bibliographic Details
Main Authors:	Bordes, Florian; Garrido, Quentin; Kao, Justine T; Williams, Adina; Rabbat, Michael; Dupoux, Emmanuel
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2506.09849

Similar Items

Intuitive physics understanding emerges from self-supervised pretraining on natural videos
by: Garrido, Quentin, et al.
Published: (2025)

What's in Common? Multimodal Models Hallucinate When Reasoning Across Scenes
by: Ross, Candace, et al.
Published: (2025)

LikePhys: Evaluating Intuitive Physics Understanding in Video Diffusion Models via Likelihood Preference
by: Yuan, Jianhao, et al.
Published: (2025)

Learning Latent Action World Models In The Wild
by: Garrido, Quentin, et al.
Published: (2026)

Interpreting Physics in Video World Models
by: Joseph, Sonia, et al.
Published: (2026)

PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs
by: Zhang, Zixin, et al.
Published: (2025)

A Shortcut-aware Video-QA Benchmark for Physical Understanding via Minimal Video Pairs
by: Krojer, Benno, et al.
Published: (2025)

Revisiting Feature Prediction for Learning Visual Representations from Video
by: Bardes, Adrien, et al.
Published: (2024)

Eval Factsheets: A Structured Framework for Documenting AI Evaluations
by: Bordes, Florian, et al.
Published: (2025)

PhysDepth: Plug-and-Play Physical Refinement for Monocular Depth Estimation in Challenging Environments
by: Peng, Kebin, et al.
Published: (2024)

PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding
by: Chow, Wei, et al.
Published: (2025)

CausalVQA: A Physically Grounded Causal Reasoning Benchmark for Video Models
by: Foss, Aaron, et al.
Published: (2025)

PhysLab: A Benchmark Dataset for Multi-Granularity Visual Parsing of Physics Experiments
by: Zou, Minghao, et al.
Published: (2025)

PhysVLM: Enabling Visual Language Models to Understand Robotic Physical Reachability
by: Zhou, Weijie, et al.
Published: (2025)

PhysVLM-AVR: Active Visual Reasoning for Multimodal Large Language Models in Physical Environments
by: Zhou, Weijie, et al.
Published: (2025)

Measuring Déjà vu Memorization Efficiently
by: Kokhlikyan, Narine, et al.
Published: (2025)

A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
by: Urbanek, Jack, et al.
Published: (2023)

MetaMorph: Multimodal Understanding and Generation via Instruction Tuning
by: Tong, Shengbang, et al.
Published: (2024)

A Lightweight Library for Energy-Based Joint-Embedding Predictive Architectures
by: Terver, Basile, et al.
Published: (2026)

Object-centric Binding in Contrastive Language-Image Pretraining
by: Assouel, Rim, et al.
Published: (2025)

Diffusion-based 3D Hand Motion Recovery with Intuitive Physics
by: Zhang, Yufei, et al.
Published: (2025)

PhysEditBench: A Protocol-Conditioned Benchmark for Dense Physical-Map Prediction with Image Editors
by: Yang, Jiaxin, et al.
Published: (2026)

Grounding Social Perception in Intuitive Physics
by: Ying, Lance, et al.
Published: (2026)

Understanding Contrastive Representation Learning from Positive Unlabeled (PU) Data
by: Acharya, Anish, et al.
Published: (2024)

Beyond Static Vision: Scene Dynamic Field Unlocks Intuitive Physics Understanding in Multi-modal Large Language Models
by: Li, Nanxi, et al.
Published: (2026)

ForestSim: A Synthetic Benchmark for Intelligent Vehicle Perception in Unstructured Forest Environments
by: Wagle, Pragat, et al.
Published: (2026)

PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations
by: Zheng, Yang, et al.
Published: (2024)

Evaluation of Conversational Agents: Understanding Culture, Context and Environment in Emotion Detection
by: Teye, Martha Teiko, et al.
Published: (2026)

PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation
by: Wang, Chen, et al.
Published: (2025)

Embracing Diversity: Interpretable Zero-shot classification beyond one vector per class
by: Moayeri, Mazda, et al.
Published: (2024)

PhysGame: Uncovering Physical Commonsense Violations in Gameplay Videos
by: Cao, Meng, et al.
Published: (2024)

Learning to Play Video Games with Intuitive Physics Priors
by: Jaiswal, Abhishek, et al.
Published: (2024)

PhysAnimator: Physics-Guided Generative Cartoon Animation
by: Xie, Tianyi, et al.
Published: (2025)

Feedback-guided Data Synthesis for Imbalanced Classification
by: Hemmat, Reyhane Askari, et al.
Published: (2023)

PhysMotion: Physics-Grounded Dynamics From a Single Image
by: Tan, Xiyang, et al.
Published: (2024)

PhysLayer: Language-Guided Layered Animation with Depth-Aware Physics
by: Xie, Tianyidan, et al.
Published: (2026)

ReconPhys: Reconstruct Appearance and Physical Attributes from Single Video
by: Wang, Boyuan, et al.
Published: (2026)

Opinion: Learning Intuitive Physics May Require More than Visual Data
by: Su, Ellen, et al.
Published: (2025)

Benchmarking Vision-Based Object Tracking for USVs in Complex Maritime Environments
by: Din, Muhayy Ud, et al.
Published: (2024)

LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding
by: Shen, Xiaoqian, et al.
Published: (2024)