Staff View: Ctrl123: Consistent Novel View Synthesis via Closed-Loop Transcription

Bibliographic Details
Main Authors:	Zhao, Hongxiang; Dai, Xili; Wang, Jianan; Tong, Shengbang; Zhang, Jingyuan; Wang, Weida; Zhang, Lei; Ma, Yi
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2403.10953

_version_	1866910498177417216
author	Zhao, Hongxiang; Dai, Xili; Wang, Jianan; Tong, Shengbang; Zhang, Jingyuan; Wang, Weida; Zhang, Lei; Ma, Yi
author_facet	Zhao, Hongxiang; Dai, Xili; Wang, Jianan; Tong, Shengbang; Zhang, Jingyuan; Wang, Weida; Zhang, Lei; Ma, Yi
contents	Large image diffusion models have demonstrated zero-shot capability in novel view synthesis (NVS). However, existing diffusion-based NVS methods struggle to generate novel views that are accurately consistent with the corresponding ground truth poses and appearances, even on the training set. This consequently limits the performance of downstream tasks, such as image-to-multiview generation and 3D reconstruction. We realize that such inconsistency is largely due to the fact that it is difficult to enforce accurate pose and appearance alignment directly in the diffusion training, as mostly done by existing methods such as Zero123. To remedy this problem, we propose Ctrl123, a closed-loop transcription-based NVS diffusion method that enforces alignment between the generated view and ground truth in a pose-sensitive feature space. Our extensive experiments demonstrate the effectiveness of Ctrl123 on the tasks of NVS and 3D reconstruction, achieving significant improvements in both multiview-consistency and pose-consistency over existing methods.
format	Preprint
id	arxiv_https___arxiv_org_abs_2403_10953
institution	arXiv
publishDate	2024
record_format	arxiv
title	Ctrl123: Consistent Novel View Synthesis via Closed-Loop Transcription
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2403.10953
