| Main Authors: | Harley, Adam W., You, Yang, Sun, Xinglong, Zheng, Yang, Raghuraman, Nikhil, Gu, Yunqi, Liang, Sheldon, Chu, Wen-Hsuan, Dave, Achal, Tokmakov, Pavel, You, Suya, Ambrus, Rares, Fragkiadaki, Katerina, Guibas, Leonidas J. |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.07310 |
Similar Items
Zero-Shot Open-Vocabulary Tracking with Large Pre-Trained Models
by: Chu, Wen-Hsuan, et al.
Published: (2023)
GRIN: Zero-Shot Metric Depth with Pixel-Level Diffusion
by: Guizilini, Vitor, et al.
Published: (2024)
Support-Set Context Matters for Bongard Problems
by: Raghuraman, Nikhil, et al.
Published: (2023)
Understanding Video Transformers via Universal Concept Discovery
by: Kowal, Matthew, et al.
Published: (2024)
Understanding Complexity in VideoQA via Visual Program Generation
by: Eyzaguirre, Cristobal, et al.
Published: (2025)
Espresso: High Compression For Rich Extraction From Videos for Your Vision-Language Model
by: Yu, Keunwoo Peter, et al.
Published: (2024)
Animal Pose Labeling Using General-Purpose Point Trackers
by: Pan, Zhuoyang, et al.
Published: (2025)
Refining Pre-Trained Motion Models
by: Sun, Xinglong, et al.
Published: (2024)
Generative 4D Scene Gaussian Splatting with Object View-Synthesis Priors
by: Chu, Wen-Hsuan, et al.
Published: (2025)
Video Generators are Robot Policies
by: Liang, Junbang, et al.
Published: (2025)
OCH3R: Object-Centric Holistic 3D Reconstruction
by: Du, Yi, et al.
Published: (2026)
PhysMem: Scaling Test-Time Memory for Embodied Physical Reasoning
by: Li, Haoyang, et al.
Published: (2026)
Img2CAD: Reverse Engineering 3D CAD Models from Images through VLM-Assisted Conditional Factorization
by: You, Yang, et al.
Published: (2024)
pix2gestalt: Amodal Segmentation by Synthesizing Wholes
by: Ozguroglu, Ege, et al.
Published: (2024)
BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing
by: Gu, Yunqi, et al.
Published: (2025)
Multiview Equivariance Improves 3D Correspondence Understanding with Minimal Feature Finetuning
by: You, Yang, et al.
Published: (2024)
DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos
by: Chu, Wen-Hsuan, et al.
Published: (2024)
TAPIP3D: Tracking Any Point in Persistent 3D Geometry
by: Zhang, Bowei, et al.
Published: (2025)
Dreamitate: Real-World Visuomotor Policy Learning via Video Generation
by: Liang, Junbang, et al.
Published: (2024)
ProvNeRF: Modeling per Point Provenance in NeRFs as a Stochastic Field
by: Nakayama, Kiyohiro, et al.
Published: (2024)
MoSca: Dynamic Gaussian Fusion from Casual Videos via 4D Motion Scaffolds
by: Lei, Jiahui, et al.
Published: (2024)
View-Consistent Hierarchical 3D Segmentation Using Ultrametric Feature Fields
by: He, Haodi, et al.
Published: (2024)
Zero-Shot Image Feature Consensus with Deep Functional Maps
by: Cheng, Xinle, et al.
Published: (2024)
PACE: A Large-Scale Dataset with Pose Annotations in Cluttered Environments
by: You, Yang, et al.
Published: (2023)
BlenderAlchemy: Editing 3D Graphics with Vision-Language Models
by: Huang, Ian, et al.
Published: (2024)
DOT-Sim: Differentiable Optical Tactile Simulation with Precise Real-to-Sim Physical Calibration
by: You, Yang, et al.
Published: (2026)
LookOut: Real-World Humanoid Egocentric Navigation
by: Pan, Boxiao, et al.
Published: (2025)
Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis
by: Van Hoorick, Basile, et al.
Published: (2024)
Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields
by: Zhou, Shijie, et al.
Published: (2025)
Make a Donut: Hierarchical EMD-Space Planning for Zero-Shot Deformable Manipulation with Tools
by: You, Yang, et al.
Published: (2023)
SparseDFF: Sparse-View Feature Distillation for One-Shot Dexterous Manipulation
by: Wang, Qianxu, et al.
Published: (2023)
Rodrigues Network for Learning Robot Actions
by: Zhang, Jialiang, et al.
Published: (2025)
InfoGaussian: Structure-Aware Dynamic Gaussians through Lightweight Information Shaping
by: Zhang, Yunchao, et al.
Published: (2024)
SceneTeract: Agentic Functional Affordances and VLM Grounding in 3D Scenes
by: Maillard, Léopold, et al.
Published: (2026)
ReFiNe: Recursive Field Networks for Cross-modal Multi-scene Representation
by: Zakharov, Sergey, et al.
Published: (2024)
Dynamic Gaussian Marbles for Novel View Synthesis of Casual Monocular Videos
by: Stearns, Colton, et al.
Published: (2024)
PASTA: Controllable Part-Aware Shape Generation with Autoregressive Transformers
by: Li, Songlin, et al.
Published: (2024)
3D Diffuser Actor: Policy Diffusion with 3D Scene Representations
by: Ke, Tsung-Wei, et al.
Published: (2024)
ConDense: Consistent 2D/3D Pre-training for Dense and Sparse Features from Multi-View Images
by: Zhang, Xiaoshuai, et al.
Published: (2024)
BLoad: Enhancing Neural Network Training with Efficient Sequential Data Handling
by: Ruschel, Raphael, et al.
Published: (2023)