Library Catalog

Bibliographic Details
Main Authors:	Lillemark, Hansen Jin; Huang, Benhao; Zhan, Fangneng; Du, Yilun; Keller, Thomas Anderson
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2601.01075

Similar Items

AAPMT: AGI Assessment Through Prompt and Metric Transformer
by: Huang, Benhao
Published: (2024)

Flow Equivariant Recurrent Neural Networks
by: Keller, T. Anderson
Published: (2025)

MixLight: Borrowing the Best of both Spherical Harmonics and Gaussian Models
by: Ji, Xinlong, et al.
Published: (2024)

Abstract 3D Perception for Spatial Intelligence in Vision-Language Models
by: Liu, Yifan, et al.
Published: (2025)

GEM-4D: Geometry-Enhanced Video World Models for Robot Manipulation
by: Zhou, Kaichen, et al.
Published: (2026)

AdaWorld: Learning Adaptable World Models with Latent Actions
by: Gao, Shenyuan, et al.
Published: (2025)

AREA3D: Active Reconstruction Agent with Unified Feed-Forward 3D Perception and Vision-Language Guidance
by: Xu, Tianling, et al.
Published: (2025)

Defining and Extracting generalizable interaction primitives from DNNs
by: Chen, Lu, et al.
Published: (2024)

Equilibrium Matching: Generative Modeling with Implicit Energy-Based Models
by: Wang, Runqian, et al.
Published: (2025)

HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments
by: Zhou, Qinhong, et al.
Published: (2024)

Equivariant Reinforcement Learning under Partial Observability
by: Nguyen, Hai, et al.
Published: (2024)

Long-Text-to-Image Generation via Compositional Prompt Decomposition
by: Huang, Jen-Yuan, et al.
Published: (2026)

PPU-Bench: Real World Benchmark for Personalized Partial Unlearning in Vision Language Models
by: Guang, Jiahui, et al.
Published: (2026)

Compositional Generative Modeling: A Single Model is Not All You Need
by: Du, Yilun, et al.
Published: (2024)

Stream3D: Sequential Multi-View 3D Generation via Evidential Memory
by: Zhou, Kaichen, et al.
Published: (2026)

Grounding Video Models to Actions through Goal Conditioned Exploration
by: Luo, Yunhao, et al.
Published: (2024)

Category-level Neural Field for Reconstruction of Partially Observed Objects in Indoor Environment
by: Lee, Taekbeom, et al.
Published: (2024)

Equivariant Flow Matching for Point Cloud Assembly
by: Wang, Ziming, et al.
Published: (2025)

MindJourney: Test-Time Scaling with World Models for Spatial Reasoning
by: Yang, Yuncong, et al.
Published: (2025)

DiffAge3D: Diffusion-based 3D-aware Face Aging
by: Wahid, Junaid, et al.
Published: (2024)

SOGS: Second-Order Anchor for Advanced 3D Gaussian Splatting
by: Zhang, Jiahui, et al.
Published: (2025)

Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models
by: Chen, Kaijin, et al.
Published: (2026)

Variational Partial Group Convolutions for Input-Aware Partial Equivariance of Rotations and Color-Shifts
by: Kim, Hyunsu, et al.
Published: (2024)

SPIE: Semantic and Structural Post-Training of Image Editing Diffusion Models with AI feedback
by: Benarous, Elior, et al.
Published: (2025)

Video as the New Language for Real-World Decision Making
by: Yang, Sherry, et al.
Published: (2024)

Toward Stable World Models: Measuring and Addressing World Instability in Generative Environments
by: Kwon, Soonwoo, et al.
Published: (2025)

General Neural Gauge Fields
by: Zhan, Fangneng, et al.
Published: (2023)

seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models
by: Ghaemi, Hafez, et al.
Published: (2025)

Large-scale Reinforcement Learning for Diffusion Models
by: Zhang, Yinan, et al.
Published: (2024)

Visual Acoustic Fields
by: Li, Yuelei, et al.
Published: (2025)

Dynamic Orchestration of Multi-Agent System for Real-World Multi-Image Agricultural VQA
by: Ke, Yan, et al.
Published: (2025)

Ctrl-VI: Controllable Video Synthesis via Variational Inference
by: Duan, Haoyi, et al.
Published: (2025)

COMBO: Compositional World Models for Embodied Multi-Agent Cooperation
by: Zhang, Hongxin, et al.
Published: (2024)

Observe-R1: Unlocking Reasoning Abilities of MLLMs with Dynamic Progressive Reinforcement Learning
by: Guo, Zirun, et al.
Published: (2025)

Do Vision-Language Models Have Internal World Models? Towards an Atomic Evaluation
by: Gao, Qiyue, et al.
Published: (2025)

DatasetNeRF: Efficient 3D-aware Data Factory with Generative Radiance Fields
by: Chi, Yu, et al.
Published: (2023)

FreGS: 3D Gaussian Splatting with Progressive Frequency Regularization
by: Zhang, Jiahui, et al.
Published: (2024)

MuSASplat: Efficient Sparse-View 3D Gaussian Splats via Lightweight Multi-Scale Adaptation
by: Xu, Muyu, et al.
Published: (2025)

MIND: Benchmarking Memory Consistency and Action Control in World Models
by: Ye, Yixuan, et al.
Published: (2026)

3D-VLA: A 3D Vision-Language-Action Generative World Model
by: Zhen, Haoyu, et al.
Published: (2024)