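Staff View: When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models :: Library Catalog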

Bibliographic Details
Main Authors:	Ma, Xianzheng; Smart, Brandon; Bhalgat, Yash; Chen, Shuai; Li, Xinghui; Ding, Jian; Gu, Jindong; Chen, Dave Zhenyu; Peng, Songyou; Bian, Jia-Wang; Torr, Philip H; Pollefeys, Marc; Nießner, Matthias; Reid, Ian D; Chang, Angel X.; Laina, Iro; Prisacariu, Victor Adrian
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Robotics
Online Access:	https://arxiv.org/abs/2405.10255

_version_	1866909858962341888
author	Ma, Xianzheng; Smart, Brandon; Bhalgat, Yash; Chen, Shuai; Li, Xinghui; Ding, Jian; Gu, Jindong; Chen, Dave Zhenyu; Peng, Songyou; Bian, Jia-Wang; Torr, Philip H; Pollefeys, Marc; Nießner, Matthias; Reid, Ian D; Chang, Angel X.; Laina, Iro; Prisacariu, Victor Adrian
author_facet	Ma, Xianzheng; Smart, Brandon; Bhalgat, Yash; Chen, Shuai; Li, Xinghui; Ding, Jian; Gu, Jindong; Chen, Dave Zhenyu; Peng, Songyou; Bian, Jia-Wang; Torr, Philip H; Pollefeys, Marc; Nießner, Matthias; Reid, Ian D; Chang, Angel X.; Laina, Iro; Prisacariu, Victor Adrian
contents	As large language models (LLMs) evolve, their integration with 3D spatial data (3D-LLMs) has seen rapid progress, offering unprecedented capabilities for understanding and interacting with physical spaces. This survey provides a comprehensive overview of the methodologies enabling LLMs to process, understand, and generate 3D data. Highlighting the unique advantages of LLMs, such as in-context learning, step-by-step reasoning, open-vocabulary capabilities, and extensive world knowledge, we underscore their potential to significantly advance spatial comprehension and interaction within embodied Artificial Intelligence (AI) systems. Our investigation spans various 3D data representations, from point clouds to Neural Radiance Fields (NeRFs). It examines their integration with LLMs for tasks such as 3D scene understanding, captioning, question-answering, and dialogue, as well as LLM-based agents for spatial reasoning, planning, and navigation. The paper also includes a brief review of other methods that integrate 3D and language. The meta-analysis presented in this paper reveals significant progress yet underscores the necessity for novel approaches to harness the full potential of 3D-LLMs. Hence, with this paper, we aim to chart a course for future research that explores and expands the capabilities of 3D-LLMs in understanding and interacting with the complex 3D world. To support this survey, we have established a project page where papers related to our topic are organized and listed: https://github.com/ActiveVisionLab/Awesome-LLM-3D.
format	Preprint
id	arxiv_https___arxiv_org_abs_2405_10255
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models Ma, Xianzheng Smart, Brandon Bhalgat, Yash Chen, Shuai Li, Xinghui Ding, Jian Gu, Jindong Chen, Dave Zhenyu Peng, Songyou Bian, Jia-Wang Torr, Philip H Pollefeys, Marc Nießner, Matthias Reid, Ian D Chang, Angel X. Laina, Iro Prisacariu, Victor Adrian Computer Vision and Pattern Recognition Robotics As large language models (LLMs) evolve, their integration with 3D spatial data (3D-LLMs) has seen rapid progress, offering unprecedented capabilities for understanding and interacting with physical spaces. This survey provides a comprehensive overview of the methodologies enabling LLMs to process, understand, and generate 3D data. Highlighting the unique advantages of LLMs, such as in-context learning, step-by-step reasoning, open-vocabulary capabilities, and extensive world knowledge, we underscore their potential to significantly advance spatial comprehension and interaction within embodied Artificial Intelligence (AI) systems. Our investigation spans various 3D data representations, from point clouds to Neural Radiance Fields (NeRFs). It examines their integration with LLMs for tasks such as 3D scene understanding, captioning, question-answering, and dialogue, as well as LLM-based agents for spatial reasoning, planning, and navigation. The paper also includes a brief review of other methods that integrate 3D and language. The meta-analysis presented in this paper reveals significant progress yet underscores the necessity for novel approaches to harness the full potential of 3D-LLMs. Hence, with this paper, we aim to chart a course for future research that explores and expands the capabilities of 3D-LLMs in understanding and interacting with the complex 3D world. To support this survey, we have established a project page where papers related to our topic are organized and listed: https://github.com/ActiveVisionLab/Awesome-LLM-3D.
title	When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models
topic	Computer Vision and Pattern Recognition; Robotics
url	https://arxiv.org/abs/2405.10255