:: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Zhuofan, Zhu, Ziyu, Li, Junhao, Li, Pengxiang, Wang, Tianxu, Liu, Tengyu, Ma, Xiaojian, Chen, Yixin, Jia, Baoxiong, Huang, Siyuan, Li, Qing
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2408.04034

Similar Items

From Objects to Anywhere: A Holistic Benchmark for Multi-level Visual Grounding in 3D Scenes
by: Wang, Tianxu, et al.
Published: (2025)

Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation
by: Zhu, Ziyu, et al.
Published: (2025)

Unifying 3D Vision-Language Understanding via Promptable Queries
by: Zhu, Ziyu, et al.
Published: (2024)

SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding
by: Jia, Baoxiong, et al.
Published: (2024)

SceneCOT: Eliciting Grounded Chain-of-Thought Reasoning in 3D Scenes
by: Linghu, Xiongkun, et al.
Published: (2025)

LEO-VL: Efficient Scene Representation for Scalable 3D Vision-Language Learning
by: Huang, Jiangyong, et al.
Published: (2025)

Masked Point-Entity Contrast for Open-Vocabulary 3D Scene Understanding
by: Wang, Yan, et al.
Published: (2025)

Multi-modal Situated Reasoning in 3D Scenes
by: Linghu, Xiongkun, et al.
Published: (2024)

MetaScenes: Towards Automated Replica Creation for Real-world 3D Scans
by: Yu, Huangyue, et al.
Published: (2025)

Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance
by: Wang, Zan, et al.
Published: (2024)

3D-RFT: Reinforcement Fine-Tuning for Video-based 3D Scene Understanding
by: Linghu, Xiongkun, et al.
Published: (2026)

PhyScene: Physically Interactable 3D Scene Synthesis for Embodied AI
by: Yang, Yandan, et al.
Published: (2024)

SceneWeaver: All-in-One 3D Scene Synthesis with an Extensible and Self-Reflective Agent
by: Yang, Yandan, et al.
Published: (2025)

Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis
by: Huang, Jiangyong, et al.
Published: (2025)

An Embodied Generalist Agent in 3D World
by: Huang, Jiangyong, et al.
Published: (2023)

Scaling Up Dynamic Human-Scene Interaction Modeling
by: Jiang, Nan, et al.
Published: (2024)

SlotLifter: Slot-guided Feature Lifting for Learning Object-centric Radiance Fields
by: Liu, Yu, et al.
Published: (2024)

Lifting Unlabeled Internet-level Data for 3D Scene Understanding
by: Chen, Yixin, et al.
Published: (2026)

MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes
by: Lu, Ruijie, et al.
Published: (2024)

Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting
by: Guo, Jun, et al.
Published: (2024)

Spatial-Temporal Multi-Scale Quantization for Flexible Motion Generation
by: Wang, Zan, et al.
Published: (2025)

GWM: Towards Scalable Gaussian World Models for Robotic Manipulation
by: Lu, Guanxing, et al.
Published: (2025)

Simultaneous Tactile-Visual Perception for Learning Multimodal Robot Manipulation
by: Li, Yuyang, et al.
Published: (2025)

Dynamic Motion Blending for Versatile Motion Editing
by: Jiang, Nan, et al.
Published: (2025)

GROVE: A Generalized Reward for Learning Open-Vocabulary Physical Skill
by: Cui, Jieming, et al.
Published: (2025)

Autonomous Character-Scene Interaction Synthesis from Text Instruction
by: Jiang, Nan, et al.
Published: (2024)

Grasp Multiple Objects with One Hand
by: Li, Yuyang, et al.
Published: (2023)

AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents
by: Cui, Jieming, et al.
Published: (2024)

3D Scene Change Modeling With Consistent Multi-View Aggregation
by: Zhou, Zirui, et al.
Published: (2025)

ManipTrans: Efficient Dexterous Bimanual Manipulation Transfer via Residual Learning
by: Li, Kailin, et al.
Published: (2025)

Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations
by: Li, Puhao, et al.
Published: (2024)

Taccel: Scaling Up Vision-based Tactile Robotics via High-performance GPU Simulation
by: Li, Yuyang, et al.
Published: (2025)

LessMimic: Long-Horizon Humanoid Interaction with Unified Distance Field Representations
by: Lin, Yutang, et al.
Published: (2026)

Afford-X: Generalizable and Slim Affordance Reasoning for Task-oriented Manipulation
by: Zhu, Xiaomeng, et al.
Published: (2025)

PhyRecon: Physically Plausible Neural Scene Reconstruction
by: Ni, Junfeng, et al.
Published: (2024)

Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage
by: Gao, Zhi, et al.
Published: (2024)

ArtGS: Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting
by: Liu, Yu, et al.
Published: (2025)

Iterative Tool Usage Exploration for Multimodal Agents via Step-wise Preference Tuning
by: Li, Pengxiang, et al.
Published: (2025)

ARFlow: Human Action-Reaction Flow Matching with Physical Guidance
by: Jiang, Wentao, et al.
Published: (2025)

DGSG-Mind: Dynamic 3D Gaussian Scene Graphs for Long-Term Scene Understanding and Grounding
by: Ge, Luzhou, et al.
Published: (2026)