| Main Authors: | Tang, George, Agarwal, Aditya, Han, Weiqiao, Darrell, Trevor, Bai, Yutong |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2503.06469 |
Similar Items
Finding Visual Task Vectors
by: Hojel, Alberto, et al.
Published: (2024)
Lifting Embodied World Models for Planning and Control
by: Wang, Alex N., et al.
Published: (2026)
REOrdering Patches Improves Vision Models
by: Kutscher, Declan, et al.
Published: (2025)
Readout Guidance: Learning Control from Diffusion Features
by: Luo, Grace, et al.
Published: (2023)
Diffusion Hyperfeatures: Searching Through Time and Space for Semantic Correspondence
by: Luo, Grace, et al.
Published: (2023)
Fast Image-based Neural Relighting with Translucency-Reflection Modeling
by: Zhu, Shizhan, et al.
Published: (2023)
Analyzing The Language of Visual Tokens
by: Chan, David M., et al.
Published: (2024)
VideoLifter: Lifting Videos to 3D with Fast Hierarchical Stereo Alignment
by: Cong, Wenyan, et al.
Published: (2025)
ALOcc: Adaptive Lifting-Based 3D Semantic Occupancy and Cost Volume-Based Flow Predictions
by: Chen, Dubing, et al.
Published: (2024)
Tracking-Aware Deformation Field Estimation for Non-rigid 3D Reconstruction in Robotic Surgeries
by: Wang, Zeqing, et al.
Published: (2025)
Whole-Body Conditioned Egocentric Video Prediction
by: Bai, Yutong, et al.
Published: (2025)
LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning
by: Niu, Dantong, et al.
Published: (2024)
CF3: Compact and Fast 3D Feature Fields
by: Lee, Hyunjoon, et al.
Published: (2025)
Lifting by Gaussians: A Simple, Fast and Flexible Method for 3D Instance Segmentation
by: Chacko, Rohan, et al.
Published: (2025)
LiftFeat: 3D Geometry-Aware Local Feature Matching
by: Liu, Yepeng, et al.
Published: (2025)
Lift3D: Zero-Shot Lifting of Any 2D Vision Model to 3D
by: T, Mukund Varma, et al.
Published: (2024)
3D-SSGAN: Lifting 2D Semantics for 3D-Aware Compositional Portrait Synthesis
by: Liu, Ruiqi, et al.
Published: (2024)
LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models
by: Lian, Long, et al.
Published: (2023)
Learning Dense Feature Matching via Lifting Single 2D Image to 3D Space
by: Liang, Yingping, et al.
Published: (2025)
Efficient 3D Instance Mapping and Localization with Neural Fields
by: Tang, George, et al.
Published: (2024)
Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning
by: Huang, Brandon, et al.
Published: (2024)
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations
by: Ng, Evonne, et al.
Published: (2024)
Vision-Language Models Create Cross-Modal Task Representations
by: Luo, Grace, et al.
Published: (2024)
Aerial Lifting: Neural Urban Semantic and Building Instance Lifting from Aerial Imagery
by: Zhang, Yuqi, et al.
Published: (2024)
FisheyeGaussianLift: BEV Feature Lifting for Surround-View Fisheye Camera Perception
by: Sonarghare, Shubham, et al.
Published: (2025)
PoseBench3D: A Cross-Dataset Analysis Framework for 3D Human Pose Estimation via Pose Lifting Networks
by: Manzur, Saad, et al.
Published: (2025)
Visual Lexicon: Rich Image Features in Language Space
by: Wang, XuDong, et al.
Published: (2024)
Improving 3D Gaussian Splatting Compression by Scene-Adaptive Lattice Vector Quantization
by: Xu, Hao, et al.
Published: (2025)
RepVF: A Unified Vector Fields Representation for Multi-task 3D Perception
by: Li, Chunliang, et al.
Published: (2024)
Segment Anything without Supervision
by: Wang, XuDong, et al.
Published: (2024)
Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation
by: Jia, Yueru, et al.
Published: (2024)
LooC: Effective Low-Dimensional Codebook for Compositional Vector Quantization
by: Li, Jie, et al.
Published: (2026)
Hyperbolic Active Learning for Semantic Segmentation under Domain Shift
by: Franco, Luca, et al.
Published: (2023)
LiftImage3D: Lifting Any Single Image to 3D Gaussians with Video Generation Priors
by: Chen, Yabo, et al.
Published: (2024)
Compressing 3D Gaussian Splatting by Noise-Substituted Vector Quantization
by: Wang, Haishan, et al.
Published: (2025)
When Do We Not Need Larger Vision Models?
by: Shi, Baifeng, et al.
Published: (2024)
Lift, Splat, Map: Lifting Foundation Masks for Label-Free Semantic Scene Completion
by: Zhang, Arthur, et al.
Published: (2024)
A Reference-Based 3D Semantic-Aware Framework for Accurate Local Facial Attribute Editing
by: Huang, Yu-Kai, et al.
Published: (2024)
DreamLifting: A Plug-in Module Lifting MV Diffusion Models for 3D Asset Generation
by: Yin, Ze-Xin, et al.
Published: (2025)
Differentiable Vector Quantization for Rate-Distortion Optimization of Generative Image Compression
by: Jiang, Shiyin, et al.
Published: (2026)