Bibliographic Details
Main Authors: Truong, Thanh-Dat; Prabhu, Utsav; Wang, Dongyi; Raj, Bhiksha; Gauch, Susan; Subbiah, Jeyamkondan; Luu, Khoa
Format: Preprint
Published: 2024
Online Access: https://arxiv.org/abs/2406.01429
author Truong, Thanh-Dat
Prabhu, Utsav
Wang, Dongyi
Raj, Bhiksha
Gauch, Susan
Subbiah, Jeyamkondan
Luu, Khoa
contents Unsupervised Domain Adaptation has been an efficient approach to transferring semantic segmentation models across data distributions. Meanwhile, recent Open-vocabulary Semantic Scene Understanding based on large-scale vision-language models is effective in open-set settings because it can learn diverse concepts and categories. However, these prior methods fail to generalize across different camera views because they lack cross-view geometric modeling, and few studies to date have analyzed cross-view learning. To address this problem, we introduce a novel Unsupervised Cross-view Adaptation Learning approach that models the geometric structural change across views in Semantic Scene Understanding. First, we introduce a novel Cross-view Geometric Constraint on Unpaired Data to model structural changes in images and segmentation masks across cameras. Second, we present a new Geodesic Flow-based Correlation Metric to efficiently measure the geometric structural changes across camera views. Third, we introduce a novel view-condition prompting mechanism to enhance the view-information modeling of the open-vocabulary segmentation network in cross-view adaptation learning. Experiments on different cross-view adaptation benchmarks show that our approach is effective in cross-view modeling and achieves State-of-the-Art (SOTA) performance compared to prior unsupervised domain adaptation and open-vocabulary semantic segmentation methods.
format Preprint
id arxiv_https___arxiv_org_abs_2406_01429
institution arXiv
publishDate 2024
record_format arxiv
title EAGLE: Efficient Adaptive Geometry-based Learning in Cross-view Understanding
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2406.01429