Bibliographic Details
Main Authors: Yan, Xu; Zhang, Haiming; Cai, Yingjie; Guo, Jingming; Qiu, Weichao; Gao, Bin; Zhou, Kaiqiang; Zhao, Yue; Jin, Huan; Gao, Jiantao; Li, Zhen; Jiang, Lihui; Zhang, Wei; Zhang, Hongbo; Dai, Dengxin; Liu, Bingbing
Format: Preprint
Published: 2024
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2401.08045
author Yan, Xu
Zhang, Haiming
Cai, Yingjie
Guo, Jingming
Qiu, Weichao
Gao, Bin
Zhou, Kaiqiang
Zhao, Yue
Jin, Huan
Gao, Jiantao
Li, Zhen
Jiang, Lihui
Zhang, Wei
Zhang, Hongbo
Dai, Dengxin
Liu, Bingbing
contents The rise of large foundation models, trained on extensive datasets, is revolutionizing the field of AI. Models such as SAM, DALL-E2, and GPT-4 showcase their adaptability by extracting intricate patterns and performing effectively across diverse tasks, thereby serving as potent building blocks for a wide range of AI applications. Autonomous driving, a vibrant front in AI applications, remains challenged by the lack of dedicated vision foundation models (VFMs). The scarcity of comprehensive training data, the need for multi-sensor integration, and the diverse task-specific architectures pose significant obstacles to the development of VFMs in this field. This paper delves into the critical challenge of forging VFMs tailored specifically for autonomous driving, while also outlining future directions. Through a systematic analysis of over 250 papers, we dissect essential techniques for VFM development, including data preparation, pre-training strategies, and downstream task adaptation. Moreover, we explore key advancements such as NeRF, diffusion models, 3D Gaussian Splatting, and world models, presenting a comprehensive roadmap for future research. To empower researchers, we have built and maintained https://github.com/zhanghm1995/Forge_VFM4AD, an open-access repository constantly updated with the latest advancements in forging VFMs for autonomous driving.
format Preprint
id arxiv_https___arxiv_org_abs_2401_08045
institution arXiv
publishDate 2024
record_format arxiv
title Forging Vision Foundation Models for Autonomous Driving: Challenges, Methodologies, and Opportunities
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2401.08045