| Main Authors: | Li, Yuelei, Kim, Hyunjin, Zhan, Fangneng, Qiu, Ri-Zhao, Ji, Mazeyu, Shan, Xiaojun, Zou, Xueyan, Liang, Paul, Pfister, Hanspeter, Wang, Xiaolong |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2503.24270 |
Similar Items
GraspSplats: Efficient Manipulation with 3D Feature Splatting
by: Ji, Mazeyu, et al.
Published: (2024)
AREA3D: Active Reconstruction Agent with Unified Feed-Forward 3D Perception and Vision-Language Guidance
by: Xu, Tianling, et al.
Published: (2025)
GSWorld: Closed-Loop Photo-Realistic Simulation Suite for Robotic Manipulation
by: Jiang, Guangqi, et al.
Published: (2025)
Abstract 3D Perception for Spatial Intelligence in Vision-Language Models
by: Liu, Yifan, et al.
Published: (2025)
RoboTAG: End-to-end Robot Configuration Estimation via Topological Alignment Graph
by: Liu, Yifan, et al.
Published: (2025)
GeCo: Evaluating Geometric Consistency for Video Generation via Motion and Structure
by: Gu, Leslie, et al.
Published: (2025)
WildLMa: Long Horizon Loco-Manipulation in the Wild
by: Qiu, Ri-Zhao, et al.
Published: (2024)
M3: 3D-Spatial MultiModal Memory
by: Zou, Xueyan, et al.
Published: (2025)
Advances in Feed-Forward 3D Reconstruction and View Synthesis: A Survey
by: Zhang, Jiahui, et al.
Published: (2025)
General Neural Gauge Fields
by: Zhan, Fangneng, et al.
Published: (2023)
Lite2Relight: 3D-aware Single Image Portrait Relighting
by: Rao, Pramod, et al.
Published: (2024)
CTRL-GS: Cascaded Temporal Residue Learning for 4D Gaussian Splatting
by: Hou, Karly, et al.
Published: (2025)
Gaussian-Augmented Physics Simulation and System Identification with Complex Colliders
by: Vasile, Federico, et al.
Published: (2025)
Visual Whole-Body Control for Legged Loco-Manipulation
by: Liu, Minghuan, et al.
Published: (2024)
DatasetNeRF: Efficient 3D-aware Data Factory with Generative Radiance Fields
by: Chi, Yu, et al.
Published: (2023)
Stream3D: Sequential Multi-View 3D Generation via Evidential Memory
by: Zhou, Kaichen, et al.
Published: (2026)
LoRA-TTT: Low-Rank Test-Time Training for Vision-Language Models
by: Kojima, Yuto, et al.
Published: (2025)
When Visuals Aren't the Problem: Evaluating Vision-Language Models on Misleading Data Visualizations
by: Lalai, Harsh Nishant, et al.
Published: (2026)
Joint-Task Regularization for Partially Labeled Multi-Task Learning
by: Nishi, Kento, et al.
Published: (2024)
3DPR: Single Image 3D Portrait Relight using Generative Priors
by: Rao, Pramod, et al.
Published: (2025)
MoRA: LoRA Guided Multi-Modal Disease Diagnosis with Missing Modality
by: Shi, Zhiyi, et al.
Published: (2024)
DiffAge3D: Diffusion-based 3D-aware Face Aging
by: Wahid, Junaid, et al.
Published: (2024)
SOGS: Second-Order Anchor for Advanced 3D Gaussian Splatting
by: Zhang, Jiahui, et al.
Published: (2025)
LangFlash: Feed-forward 3D Language Gaussian Splatting from Sparse Unposed Images
by: Liu, Yilong, et al.
Published: (2026)
RiGS: Rigid-aware 4D Gaussian Splatting from a Single Monocular Video
by: Wu, Chenyu, et al.
Published: (2026)
Affordance-Aware Object Insertion via Mask-Aware Dual Diffusion
by: He, Jixuan, et al.
Published: (2024)
Understanding Graphical Perception in Data Visualization through Zero-shot Prompting of Vision-Language Models
by: Guo, Grace, et al.
Published: (2024)
Visual Instruction-Finetuned Language Model for Versatile Brain MR Image Tasks
by: Kim, Jonghun, et al.
Published: (2026)
Integrating LMM Planners and 3D Skill Policies for Generalizable Manipulation
by: Li, Yuelei, et al.
Published: (2025)
LangSplat: 3D Language Gaussian Splatting
by: Qin, Minghan, et al.
Published: (2023)
Feature Splatting: Language-Driven Physics-Based Scene Synthesis and Editing
by: Qiu, Ri-Zhao, et al.
Published: (2024)
Aligning Sight and Sound: Advanced Sound Source Localization Through Audio-Visual Alignment
by: Senocak, Arda, et al.
Published: (2024)
Towards 1000-fold Electron Microscopy Image Compression for Connectomics via VQ-VAE with Transformer Prior
by: Yang, Fuming, et al.
Published: (2025)
MixLight: Borrowing the Best of both Spherical Harmonics and Gaussian Models
by: Ji, Xinlong, et al.
Published: (2024)
DualEdit: Dual Editing for Knowledge Updating in Vision-Language Models
by: Shi, Zhiyi, et al.
Published: (2025)
FreGS: 3D Gaussian Splatting with Progressive Frequency Regularization
by: Zhang, Jiahui, et al.
Published: (2024)
MuSASplat: Efficient Sparse-View 3D Gaussian Splats via Lightweight Multi-Scale Adaptation
by: Xu, Muyu, et al.
Published: (2025)
Tree of Attributes Prompt Learning for Vision-Language Models
by: Ding, Tong, et al.
Published: (2024)
PAGE-4D: VGGT-4D Perception via Disentangled Pose and Geometry Estimation
by: Zhou, Kaichen, et al.
Published: (2025)
Is What You Ask For What You Get? Investigating Concept Associations in Text-to-Image Models
by: Magid, Salma Abdel, et al.
Published: (2024)