| Main Author: | Deng, Hokin |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.05969 |
Similar Items
Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks
by: Yang, Cheng, et al.
Published: (2025)
Large Vision Models Can Solve Mental Rotation Problems
by: Mason, Sebastian Ray, et al.
Published: (2025)
Egocentric Bias in Vision-Language Models
by: Wang, Maijunxian, et al.
Published: (2026)
Demystifying Video Reasoning
by: Wang, Ruisi, et al.
Published: (2026)
What Makes a Maze Look Like a Maze?
by: Hsu, Joy, et al.
Published: (2024)
Core Knowledge Deficits in Multi-Modal Language Models
by: Li, Yijiang, et al.
Published: (2024)
Identifying and Solving Conditional Image Leakage in Image-to-Video Diffusion Model
by: Zhao, Min, et al.
Published: (2024)
Deep Learning Methods for Abstract Visual Reasoning: A Survey on Raven's Progressive Matrices
by: Małkiński, Mikołaj, et al.
Published: (2022)
Video Models Reason Early: Exploiting Plan Commitment for Maze Solving
by: Newman, Kaleb, et al.
Published: (2026)
Probing Perceptual Constancy in Large Vision-Language Models
by: Sun, Haoran, et al.
Published: (2025)
An Ordinary Differential Equation Sampler with Stochastic Start for Diffusion Bridge Models
by: Wang, Yuang, et al.
Published: (2024)
The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs
by: Li, Hong, et al.
Published: (2024)
Solving Video Inverse Problems Using Image Diffusion Models
by: Kwon, Taesung, et al.
Published: (2024)
VideoPDE: Unified Generative PDE Solving via Video Inpainting Diffusion Models
by: Li, Edward, et al.
Published: (2025)
Any-to-Bokeh: Arbitrary-Subject Video Refocusing with Video Diffusion Model
by: Yang, Yang, et al.
Published: (2025)
SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models
by: Deng, Andong, et al.
Published: (2025)
Warped Diffusion: Solving Video Inverse Problems with Image Diffusion Models
by: Daras, Giannis, et al.
Published: (2024)
Video-CCAM: Enhancing Video-Language Understanding with Causal Cross-Attention Masks for Short and Long Videos
by: Fei, Jiajun, et al.
Published: (2024)
PhyGround: Benchmarking Physical Reasoning in Generative World Models
by: Lin, Juyi, et al.
Published: (2026)
Bias Detection and Rotation-Robustness Mitigation in Vision-Language Models and Generative Image Models
by: Mithila, Tarannum
Published: (2026)
From Narrow to Panoramic Vision: Attention-Guided Cold-Start Reshapes Multimodal Reasoning
by: Luo, Ruilin, et al.
Published: (2026)
Rotation-Aligned Key Channel Pruning for Efficient Vision-Language Model Inference
by: Kang, Beomseok, et al.
Published: (2026)
Semi-Supervised Coupled Thin-Plate Spline Model for Rotation Correction and Beyond
by: Nie, Lang, et al.
Published: (2024)
Towards Robust Probabilistic Modeling on SO(3) via Rotation Laplace Distribution
by: Yin, Yingda, et al.
Published: (2023)
Rethinking Video Generation Model for the Embodied World
by: Deng, Yufan, et al.
Published: (2026)
Beyond Cropping and Rotation: Automated Evolution of Powerful Task-Specific Augmentations with Generative Models
by: Goldfeder, Judah, et al.
Published: (2026)
Video Occupancy Models
by: Tomar, Manan, et al.
Published: (2024)
Learning Image Priors through Patch-based Diffusion Models for Solving Inverse Problems
by: Hu, Jason, et al.
Published: (2024)
GVD: Guiding Video Diffusion Model for Scalable Video Distillation
by: Li, Kunyang, et al.
Published: (2025)
DeVAn: Dense Video Annotation for Video-Language Models
by: Liu, Tingkai, et al.
Published: (2023)
An Organism Starts with a Single Pix-Cell: A Neural Cellular Diffusion for High-Resolution Image Synthesis
by: Elbatel, Marawan, et al.
Published: (2024)
PuzzleBench: A Fully Dynamic Evaluation Framework for Large Multimodal Models on Puzzle Solving
by: Zhang, Zeyu, et al.
Published: (2025)
VG4D: Vision-Language Model Goes 4D Video Recognition
by: Deng, Zhichao, et al.
Published: (2024)
Relaxed Rotational Equivariance via $G$-Biases in Vision
by: Wu, Zhiqiang, et al.
Published: (2024)
MindCube: Spatial Mental Modeling from Limited Views
by: Wang, Qineng, et al.
Published: (2025)
Can Vision-Language Models Solve Visual Math Equations?
by: Choudhury, Monjoy Narayan, et al.
Published: (2025)
Latent Intuitive Physics: Learning to Transfer Hidden Physics from A 3D Video
by: Zhu, Xiangming, et al.
Published: (2024)
Other Vehicle Trajectories Are Also Needed: A Driving World Model Unifies Ego-Other Vehicle Trajectories in Video Latent Space
by: Zhu, Jian, et al.
Published: (2025)
ComRoPE: Scalable and Robust Rotary Position Embedding Parameterized by Trainable Commuting Angle Matrices
by: Yu, Hao, et al.
Published: (2025)
VideoRewardBench: Comprehensive Evaluation of Multimodal Reward Models for Video Understanding
by: Zhang, Zhihong, et al.
Published: (2025)