Staff View: :: Library Catalog

Bibliographic Details
Main Authors:	Balakrishnan, Ajith; S, Sreeja; Shine, Linu
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition I.5
Online Access:	https://arxiv.org/abs/2412.00731

_version_	1866913592246272000
author	Balakrishnan, Ajith; S, Sreeja; Shine, Linu
author_facet	Balakrishnan, Ajith; S, Sreeja; Shine, Linu
contents	Generating 3D models from multi-view 2D RGB images has gained significant attention, extending the capabilities of technologies such as virtual reality, robotic vision, and human-machine interaction. In this paper, we introduce a hybrid strategy combining CNNs and transformers, featuring a visual auto-encoder with self-attention mechanisms and a 3D refiner network, trained using a novel Joint Train Separate Optimization (JTSO) algorithm. Encoded features from unordered inputs are transformed into an enhanced feature map by the self-attention layer, decoded into an initial 3D volume, and then refined. Our network generates 3D voxels from single or multiple 2D images taken from arbitrary viewpoints. Performance evaluations on the ShapeNet datasets show that our approach, combined with JTSO, outperforms state-of-the-art techniques in single- and multi-view 3D reconstruction, achieving the highest mean intersection over union (IoU) scores and surpassing other models by 4.2% in single-view reconstruction.
format	Preprint
id	arxiv_https___arxiv_org_abs_2412_00731
institution	arXiv
publishDate	2024
record_format	arxiv
title	Refine3DNet: Scaling Precision in 3D Object Reconstruction from Multi-View RGB Images using Attention
topic	Computer Vision and Pattern Recognition I.5
url	https://arxiv.org/abs/2412.00731
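
The contents field above reports results as mean intersection over union (IoU) on voxel outputs. As a minimal sketch of how that metric is commonly computed, and not the authors' evaluation code, the following assumes each grid is flattened to a sequence of occupancy values; the function names and the 0.5 binarization threshold are illustrative assumptions, not values from the record.

```python
from typing import Iterable, Sequence, Tuple


def voxel_iou(pred: Sequence[float], target: Sequence[float],
              threshold: float = 0.5) -> float:
    """IoU between a predicted occupancy grid and a ground-truth grid.

    Both grids are flattened sequences of equal length. `threshold` is a
    hypothetical cutoff used to binarize predicted occupancies.
    """
    if len(pred) != len(target):
        raise ValueError("grids must have the same number of voxels")
    intersection = 0
    union = 0
    for p, t in zip(pred, target):
        occupied_pred = p > threshold
        occupied_true = t > 0
        if occupied_pred and occupied_true:
            intersection += 1
        if occupied_pred or occupied_true:
            union += 1
    # Assumption: two completely empty grids count as a perfect match.
    return intersection / union if union else 1.0


def mean_iou(pairs: Iterable[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Average the per-object IoU over (prediction, ground truth) pairs."""
    scores = [voxel_iou(p, t) for p, t in pairs]
    if not scores:
        raise ValueError("need at least one (prediction, target) pair")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy 4-voxel grid: two voxels overlap, three are occupied in either grid.
    print(voxel_iou([0.9, 0.8, 0.2, 0.6], [1, 0, 0, 1]))
```

A per-object score of 1.0 means the predicted and ground-truth voxels coincide exactly; the mean over a test set gives the kind of mean IoU figure quoted in the abstract.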
