MARC21: :: Library Catalog

Bibliographic Details
Main Authors:	Sheng, Hualian; Cai, Sijia; Zhao, Na; Deng, Bing; Liang, Qiao; Zhao, Min-Jian; Ye, Jieping
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2406.08152

_version_	1866916283673477120
author	Sheng, Hualian; Cai, Sijia; Zhao, Na; Deng, Bing; Liang, Qiao; Zhao, Min-Jian; Ye, Jieping
author_facet	Sheng, Hualian; Cai, Sijia; Zhao, Na; Deng, Bing; Liang, Qiao; Zhao, Min-Jian; Ye, Jieping
contents	The field of 3D object detection from point clouds is rapidly advancing in computer vision, aiming to accurately and efficiently detect and localize objects in three-dimensional space. Current 3D detectors commonly fall short in terms of flexibility and scalability, with ample room for advancements in performance. In this paper, our objective is to address these limitations by introducing two frameworks for 3D object detection with minimal hand-crafted design. Firstly, we propose CT3D, which sequentially performs raw-point-based embedding, a standard Transformer encoder, and a channel-wise decoder for point features within each proposal. Secondly, we present an enhanced network called CT3D++, which incorporates geometric and semantic fusion-based embedding to extract more valuable and comprehensive proposal-aware information. Additionally, CT3D++ utilizes a point-to-key bidirectional encoder for more efficient feature encoding with reduced computational cost. By replacing the corresponding components of CT3D with these novel modules, CT3D++ achieves state-of-the-art performance on both the KITTI dataset and the large-scale Waymo Open Dataset. The source code for our frameworks will be made accessible at https://github.com/hlsheng1/CT3D-plusplus.
format	Preprint
id	arxiv_https___arxiv_org_abs_2406_08152
institution	arXiv
publishDate	2024
record_format	arxiv
title	CT3D++: Improving 3D Object Detection with Keypoint-induced Channel-wise Transformer
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2406.08152
