| Main Authors: | Bordes, Florian; Garrido, Quentin; Kao, Justine T; Williams, Adina; Rabbat, Michael; Dupoux, Emmanuel |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.09849 |
Similar Items
Intuitive physics understanding emerges from self-supervised pretraining on natural videos
by: Garrido, Quentin, et al.
Published: (2025)
What's in Common? Multimodal Models Hallucinate When Reasoning Across Scenes
by: Ross, Candace, et al.
Published: (2025)
LikePhys: Evaluating Intuitive Physics Understanding in Video Diffusion Models via Likelihood Preference
by: Yuan, Jianhao, et al.
Published: (2025)
Learning Latent Action World Models In The Wild
by: Garrido, Quentin, et al.
Published: (2026)
Interpreting Physics in Video World Models
by: Joseph, Sonia, et al.
Published: (2026)
PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs
by: Zhang, Zixin, et al.
Published: (2025)
A Shortcut-aware Video-QA Benchmark for Physical Understanding via Minimal Video Pairs
by: Krojer, Benno, et al.
Published: (2025)
Revisiting Feature Prediction for Learning Visual Representations from Video
by: Bardes, Adrien, et al.
Published: (2024)
Eval Factsheets: A Structured Framework for Documenting AI Evaluations
by: Bordes, Florian, et al.
Published: (2025)
PhysDepth: Plug-and-Play Physical Refinement for Monocular Depth Estimation in Challenging Environments
by: Peng, Kebin, et al.
Published: (2024)
PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding
by: Chow, Wei, et al.
Published: (2025)
CausalVQA: A Physically Grounded Causal Reasoning Benchmark for Video Models
by: Foss, Aaron, et al.
Published: (2025)
PhysLab: A Benchmark Dataset for Multi-Granularity Visual Parsing of Physics Experiments
by: Zou, Minghao, et al.
Published: (2025)
PhysVLM: Enabling Visual Language Models to Understand Robotic Physical Reachability
by: Zhou, Weijie, et al.
Published: (2025)
PhysVLM-AVR: Active Visual Reasoning for Multimodal Large Language Models in Physical Environments
by: Zhou, Weijie, et al.
Published: (2025)
Measuring Déjà vu Memorization Efficiently
by: Kokhlikyan, Narine, et al.
Published: (2025)
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
by: Urbanek, Jack, et al.
Published: (2023)
MetaMorph: Multimodal Understanding and Generation via Instruction Tuning
by: Tong, Shengbang, et al.
Published: (2024)
A Lightweight Library for Energy-Based Joint-Embedding Predictive Architectures
by: Terver, Basile, et al.
Published: (2026)
Object-centric Binding in Contrastive Language-Image Pretraining
by: Assouel, Rim, et al.
Published: (2025)
Diffusion-based 3D Hand Motion Recovery with Intuitive Physics
by: Zhang, Yufei, et al.
Published: (2025)
PhysEditBench: A Protocol-Conditioned Benchmark for Dense Physical-Map Prediction with Image Editors
by: Yang, Jiaxin, et al.
Published: (2026)
Grounding Social Perception in Intuitive Physics
by: Ying, Lance, et al.
Published: (2026)
Understanding Contrastive Representation Learning from Positive Unlabeled (PU) Data
by: Acharya, Anish, et al.
Published: (2024)
Beyond Static Vision: Scene Dynamic Field Unlocks Intuitive Physics Understanding in Multi-modal Large Language Models
by: Li, Nanxi, et al.
Published: (2026)
ForestSim: A Synthetic Benchmark for Intelligent Vehicle Perception in Unstructured Forest Environments
by: Wagle, Pragat, et al.
Published: (2026)
PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations
by: Zheng, Yang, et al.
Published: (2024)
Evaluation of Conversational Agents: Understanding Culture, Context and Environment in Emotion Detection
by: Teye, Martha Teiko, et al.
Published: (2026)
PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation
by: Wang, Chen, et al.
Published: (2025)
Embracing Diversity: Interpretable Zero-shot classification beyond one vector per class
by: Moayeri, Mazda, et al.
Published: (2024)
PhysGame: Uncovering Physical Commonsense Violations in Gameplay Videos
by: Cao, Meng, et al.
Published: (2024)
Learning to Play Video Games with Intuitive Physics Priors
by: Jaiswal, Abhishek, et al.
Published: (2024)
PhysAnimator: Physics-Guided Generative Cartoon Animation
by: Xie, Tianyi, et al.
Published: (2025)
Feedback-guided Data Synthesis for Imbalanced Classification
by: Hemmat, Reyhane Askari, et al.
Published: (2023)
PhysMotion: Physics-Grounded Dynamics From a Single Image
by: Tan, Xiyang, et al.
Published: (2024)
PhysLayer: Language-Guided Layered Animation with Depth-Aware Physics
by: Xie, Tianyidan, et al.
Published: (2026)
ReconPhys: Reconstruct Appearance and Physical Attributes from Single Video
by: Wang, Boyuan, et al.
Published: (2026)
Opinion: Learning Intuitive Physics May Require More than Visual Data
by: Su, Ellen, et al.
Published: (2025)
Benchmarking Vision-Based Object Tracking for USVs in Complex Maritime Environments
by: Din, Muhayy Ud, et al.
Published: (2024)
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding
by: Shen, Xiaoqian, et al.
Published: (2024)