Bibliographic details
Main authors: Carlsson, Oscar; Gerken, Jan E.; Linander, Hampus; Spieß, Heiner; Ohlsson, Fredrik; Petersson, Christoffer; Persson, Daniel
Format: Preprint
Published: 2023
Subjects: Computer Vision and Pattern Recognition; Machine Learning
Online access: https://arxiv.org/abs/2307.07313
_version_ 1866909194268966912
author Carlsson, Oscar
Gerken, Jan E.
Linander, Hampus
Spieß, Heiner
Ohlsson, Fredrik
Petersson, Christoffer
Persson, Daniel
contents High-resolution wide-angle fisheye images are becoming increasingly important for robotics applications such as autonomous driving. However, applying ordinary convolutional neural networks or vision transformers to this data is problematic because projecting it onto a rectangular grid in the plane introduces projection and distortion losses. We introduce the HEAL-SWIN transformer, which combines the highly uniform Hierarchical Equal Area iso-Latitude Pixelation (HEALPix) grid used in astrophysics and cosmology with the Hierarchical Shifted-Window (SWIN) transformer to yield an efficient and flexible model capable of training on high-resolution, distortion-free spherical data. In HEAL-SWIN, the nested structure of the HEALPix grid is used to perform the patching and windowing operations of the SWIN transformer, enabling the network to process spherical representations with minimal computational overhead. We demonstrate the superior performance of our model on both synthetic and real automotive datasets, as well as a selection of other image datasets, for semantic segmentation, depth regression, and classification tasks. Our code is publicly available at https://github.com/JanEGerken/HEAL-SWIN.
format Preprint
id arxiv_https___arxiv_org_abs_2307_07313
institution arXiv
publishDate 2023
record_format arxiv
title HEAL-SWIN: A Vision Transformer On The Sphere
topic Computer Vision and Pattern Recognition
Machine Learning
url https://arxiv.org/abs/2307.07313
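
The abstract states that the nested structure of the HEALPix grid is used for the patching and windowing operations. The following is a minimal sketch of that idea, not the authors' implementation. It relies on one known property of HEALPix nested ordering: the four children of a pixel occupy four consecutive indices, so any run of 4^k consecutive pixels forms a compact region on the sphere. The function names (`group_nested`, `patch_embed_input`, `windows`) and the default patch and window sizes are illustrative assumptions; the repository linked above is the authoritative source.

```python
import numpy as np


def healpix_npix(nside: int) -> int:
    """Number of pixels of a full-sphere HEALPix grid with resolution nside."""
    return 12 * nside * nside


def is_power_of_four(n: int) -> bool:
    # A power of four has exactly one set bit, at an even bit position.
    return n > 0 and (n & (n - 1)) == 0 and (n.bit_length() - 1) % 2 == 0


def group_nested(pixels: np.ndarray, group_size: int) -> np.ndarray:
    """Split nested-ordered data of shape (npix, channels) into
    (npix // group_size, group_size, channels).

    Hypothetical helper: in nested ordering, each block of 4**k consecutive
    indices covers one compact region, so a plain reshape is enough.
    """
    if not is_power_of_four(group_size):
        raise ValueError("group_size must be a power of four")
    npix, channels = pixels.shape
    if npix % group_size != 0:
        raise ValueError("npix must be divisible by group_size")
    return pixels.reshape(npix // group_size, group_size, channels)


def patch_embed_input(pixels: np.ndarray, patch_size: int = 4) -> np.ndarray:
    """Flatten each patch into one vector, ready for a linear embedding."""
    groups = group_nested(pixels, patch_size)
    return groups.reshape(groups.shape[0], -1)


def windows(tokens: np.ndarray, window_size: int = 16) -> np.ndarray:
    """Partition tokens into attention windows of window_size tokens each."""
    return group_nested(tokens, window_size)
```

For example, an RGB image on an `nside = 8` grid has `healpix_npix(8) = 768` pixels; `patch_embed_input` with the assumed patch size of 4 turns it into 192 patch vectors of length 12, and `windows` with the assumed window size of 16 groups those into 12 windows. Because both steps are reshapes of the nested-ordered array, they add essentially no computational overhead, which is consistent with the abstract's claim.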