:: Library Catalog

Bibliographic Details
Main Authors:	Harley, Adam W.; You, Yang; Sun, Xinglong; Zheng, Yang; Raghuraman, Nikhil; Gu, Yunqi; Liang, Sheldon; Chu, Wen-Hsuan; Dave, Achal; Tokmakov, Pavel; You, Suya; Ambrus, Rares; Fragkiadaki, Katerina; Guibas, Leonidas J.
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2506.07310

Similar Items

Zero-Shot Open-Vocabulary Tracking with Large Pre-Trained Models
by: Chu, Wen-Hsuan, et al.
Published: (2023)

GRIN: Zero-Shot Metric Depth with Pixel-Level Diffusion
by: Guizilini, Vitor, et al.
Published: (2024)

Support-Set Context Matters for Bongard Problems
by: Raghuraman, Nikhil, et al.
Published: (2023)

Understanding Video Transformers via Universal Concept Discovery
by: Kowal, Matthew, et al.
Published: (2024)

Understanding Complexity in VideoQA via Visual Program Generation
by: Eyzaguirre, Cristobal, et al.
Published: (2025)

Espresso: High Compression For Rich Extraction From Videos for Your Vision-Language Model
by: Yu, Keunwoo Peter, et al.
Published: (2024)

Animal Pose Labeling Using General-Purpose Point Trackers
by: Pan, Zhuoyang, et al.
Published: (2025)

Refining Pre-Trained Motion Models
by: Sun, Xinglong, et al.
Published: (2024)

Generative 4D Scene Gaussian Splatting with Object View-Synthesis Priors
by: Chu, Wen-Hsuan, et al.
Published: (2025)

Video Generators are Robot Policies
by: Liang, Junbang, et al.
Published: (2025)

OCH3R: Object-Centric Holistic 3D Reconstruction
by: Du, Yi, et al.
Published: (2026)

PhysMem: Scaling Test-Time Memory for Embodied Physical Reasoning
by: Li, Haoyang, et al.
Published: (2026)

Img2CAD: Reverse Engineering 3D CAD Models from Images through VLM-Assisted Conditional Factorization
by: You, Yang, et al.
Published: (2024)

pix2gestalt: Amodal Segmentation by Synthesizing Wholes
by: Ozguroglu, Ege, et al.
Published: (2024)

BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing
by: Gu, Yunqi, et al.
Published: (2025)

Multiview Equivariance Improves 3D Correspondence Understanding with Minimal Feature Finetuning
by: You, Yang, et al.
Published: (2024)

DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos
by: Chu, Wen-Hsuan, et al.
Published: (2024)

TAPIP3D: Tracking Any Point in Persistent 3D Geometry
by: Zhang, Bowei, et al.
Published: (2025)

Dreamitate: Real-World Visuomotor Policy Learning via Video Generation
by: Liang, Junbang, et al.
Published: (2024)

ProvNeRF: Modeling per Point Provenance in NeRFs as a Stochastic Field
by: Nakayama, Kiyohiro, et al.
Published: (2024)

MoSca: Dynamic Gaussian Fusion from Casual Videos via 4D Motion Scaffolds
by: Lei, Jiahui, et al.
Published: (2024)

View-Consistent Hierarchical 3D Segmentation Using Ultrametric Feature Fields
by: He, Haodi, et al.
Published: (2024)

Zero-Shot Image Feature Consensus with Deep Functional Maps
by: Cheng, Xinle, et al.
Published: (2024)

PACE: A Large-Scale Dataset with Pose Annotations in Cluttered Environments
by: You, Yang, et al.
Published: (2023)

BlenderAlchemy: Editing 3D Graphics with Vision-Language Models
by: Huang, Ian, et al.
Published: (2024)

DOT-Sim: Differentiable Optical Tactile Simulation with Precise Real-to-Sim Physical Calibration
by: You, Yang, et al.
Published: (2026)

LookOut: Real-World Humanoid Egocentric Navigation
by: Pan, Boxiao, et al.
Published: (2025)

Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis
by: Van Hoorick, Basile, et al.
Published: (2024)

Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields
by: Zhou, Shijie, et al.
Published: (2025)

Make a Donut: Hierarchical EMD-Space Planning for Zero-Shot Deformable Manipulation with Tools
by: You, Yang, et al.
Published: (2023)

SparseDFF: Sparse-View Feature Distillation for One-Shot Dexterous Manipulation
by: Wang, Qianxu, et al.
Published: (2023)

Rodrigues Network for Learning Robot Actions
by: Zhang, Jialiang, et al.
Published: (2025)

InfoGaussian: Structure-Aware Dynamic Gaussians through Lightweight Information Shaping
by: Zhang, Yunchao, et al.
Published: (2024)

SceneTeract: Agentic Functional Affordances and VLM Grounding in 3D Scenes
by: Maillard, Léopold, et al.
Published: (2026)

ReFiNe: Recursive Field Networks for Cross-modal Multi-scene Representation
by: Zakharov, Sergey, et al.
Published: (2024)

Dynamic Gaussian Marbles for Novel View Synthesis of Casual Monocular Videos
by: Stearns, Colton, et al.
Published: (2024)

PASTA: Controllable Part-Aware Shape Generation with Autoregressive Transformers
by: Li, Songlin, et al.
Published: (2024)

3D Diffuser Actor: Policy Diffusion with 3D Scene Representations
by: Ke, Tsung-Wei, et al.
Published: (2024)

ConDense: Consistent 2D/3D Pre-training for Dense and Sparse Features from Multi-View Images
by: Zhang, Xiaoshuai, et al.
Published: (2024)

BLoad: Enhancing Neural Network Training with Efficient Sequential Data Handling
by: Ruschel, Raphael, et al.
Published: (2023)