Bibliographic Details
Main Authors: Jin, Youngwan; Park, Incheol; Song, Hanbin; Ju, Hyeongjin; Nalcakan, Yagiz; Kim, Shiho
Format: Preprint
Published: 2024
Subjects: Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access: https://arxiv.org/abs/2409.16706
_version_ 1866916703270600704
author Jin, Youngwan
Park, Incheol
Song, Hanbin
Ju, Hyeongjin
Nalcakan, Yagiz
Kim, Shiho
contents This paper proposes Pix2Next, a novel image-to-image translation framework designed to address the challenge of generating high-quality Near-Infrared (NIR) images from RGB inputs. Our approach leverages a state-of-the-art Vision Foundation Model (VFM) within an encoder-decoder architecture, incorporating cross-attention mechanisms to enhance feature integration. This design captures detailed global representations and preserves essential spectral characteristics, treating RGB-to-NIR translation as more than a simple domain transfer problem. A multi-scale PatchGAN discriminator ensures realistic image generation at various detail levels, while carefully designed loss functions couple global context understanding with local feature preservation. We performed experiments on the RANUS dataset to demonstrate Pix2Next's advantages in quantitative metrics and visual quality, improving the FID score by 34.81% compared to existing methods. Furthermore, we demonstrate the practical utility of Pix2Next by showing improved performance on a downstream object detection task using generated NIR data to augment limited real NIR datasets. The proposed approach enables the scaling up of NIR datasets without additional data acquisition or annotation efforts, potentially accelerating advancements in NIR-based computer vision applications.
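The abstract reports a 34.81% improvement in FID but does not say how that percentage is computed. Since FID is a lower-is-better metric, one plausible reading is a relative reduction against the best prior method. The sketch below illustrates that reading only; the function name and the FID values are hypothetical and are not taken from the paper.

```python
def relative_improvement(baseline: float, candidate: float) -> float:
    """Percentage reduction of a lower-is-better metric relative to a baseline.

    Assumption: the improvement is measured as (baseline - candidate) / baseline.
    The abstract does not confirm this definition.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - candidate) / baseline * 100.0


# Hypothetical FID values, chosen only for illustration (not from the paper).
baseline_fid = 50.0
candidate_fid = 40.0
print(f"{relative_improvement(baseline_fid, candidate_fid):.2f}%")
```

Under this definition, a positive result means the candidate's FID is lower (better) than the baseline's, and a negative result means it is worse.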
format Preprint
id arxiv_https___arxiv_org_abs_2409_16706
institution arXiv
publishDate 2024
record_format arxiv
title Pix2Next: Leveraging Vision Foundation Models for RGB to NIR Image Translation
topic Computer Vision and Pattern Recognition
Artificial Intelligence
url https://arxiv.org/abs/2409.16706