
Bibliographic Details
Main authors:	Dekel, Shay; Keller, Yosi; Cadik, Martin
Format:	Preprint
Published:	2023
Subjects:	Computer Vision and Pattern Recognition
Online access:	https://arxiv.org/abs/2303.02615

_version_	1866913258418470912
author	Dekel, Shay; Keller, Yosi; Cadik, Martin
author_facet	Dekel, Shay; Keller, Yosi; Cadik, Martin
contents	The estimation of large and extreme image rotation plays a key role in multiple computer vision domains, where the rotated images are related by a limited or a non-overlapping field of view. Contemporary approaches apply convolutional neural networks to compute a 4D correlation volume to estimate the relative rotation between image pairs. In this work, we propose a cross-attention-based approach that utilizes CNN feature maps and a Transformer-Encoder, to compute the cross-attention between the activation maps of the image pairs, which is shown to be an improved equivalent of the 4D correlation volume, used in previous works. In the suggested approach, higher attention scores are associated with image regions that encode visual cues of rotation. Our approach is end-to-end trainable and optimizes a simple regression loss. It is experimentally shown to outperform contemporary state-of-the-art schemes when applied to commonly used image rotation datasets and benchmarks, and establishes a new state-of-the-art accuracy on these datasets. We make our code publicly available.
format	Preprint
id	arxiv_https___arxiv_org_abs_2303_02615
institution	arXiv
publishDate	2023
record_format	arxiv
title	Estimating Extreme 3D Image Rotation with Transformer Cross-Attention
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2303.02615
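
The abstract in the contents field relates cross-attention between the two images' activation maps to the 4D correlation volume used in earlier work. The following is a minimal sketch of that relationship, not the authors' implementation: it assumes identity query/key/value projections, a single attention head, and tiny hand-written feature maps stored as nested lists. All function names (flatten_tokens, correlation_scores, cross_attention) are hypothetical and chosen for illustration only.

```python
import math


def flatten_tokens(feature_map):
    """Flatten an H x W x C feature map (nested lists) into H*W tokens of length C."""
    return [list(cell) for row in feature_map for cell in row]


def correlation_scores(tokens_a, tokens_b):
    """All-pairs dot products between tokens of image A and image B.

    Reshaped to (H, W, H, W), this N x N matrix is a 4D correlation volume.
    """
    return [[sum(x * y for x, y in zip(a, b)) for b in tokens_b] for a in tokens_a]


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(tokens_a, tokens_b):
    """Scaled dot-product cross-attention from A (queries) to B (keys and values).

    Assumption: no learned projections, so the attention logits are exactly the
    correlation scores divided by sqrt(C).
    """
    dim = len(tokens_a[0])
    scores = correlation_scores(tokens_a, tokens_b)
    weights = [softmax([s / math.sqrt(dim) for s in row]) for row in scores]
    attended = [
        [sum(w * b[c] for w, b in zip(row, tokens_b)) for c in range(dim)]
        for row in weights
    ]
    return weights, attended


# Two illustrative 1 x 2 x 2 feature maps (made-up values).
feat_a = [[[1.0, 0.0], [0.0, 1.0]]]
feat_b = [[[0.0, 1.0], [1.0, 0.0]]]
weights, attended = cross_attention(flatten_tokens(feat_a), flatten_tokens(feat_b))
```

In this sketch the attention weights are a row-normalized version of the correlation scores, which is one way to read the abstract's claim of equivalence. A trained model would additionally apply learned projections and stack several encoder layers.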
