| Main Authors: | Truong, Thanh-Dat; Prabhu, Utsav; Wang, Dongyi; Raj, Bhiksha; Gauch, Susan; Subbiah, Jeyamkondan; Luu, Khoa |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | Computer Vision and Pattern Recognition |
| Online Access: | https://arxiv.org/abs/2406.01429 |
| _version_ | 1866910644951842816 |
|---|---|
| author | Truong, Thanh-Dat; Prabhu, Utsav; Wang, Dongyi; Raj, Bhiksha; Gauch, Susan; Subbiah, Jeyamkondan; Luu, Khoa |
| author_facet | Truong, Thanh-Dat; Prabhu, Utsav; Wang, Dongyi; Raj, Bhiksha; Gauch, Susan; Subbiah, Jeyamkondan; Luu, Khoa |
| contents | Unsupervised Domain Adaptation has been an efficient approach to transferring the semantic segmentation model across data distributions. Meanwhile, the recent Open-vocabulary Semantic Scene understanding based on large-scale vision language models is effective in open-set settings because it can learn diverse concepts and categories. However, these prior methods fail to generalize across different camera views due to the lack of cross-view geometric modeling. At present, there are limited studies analyzing cross-view learning. To address this problem, we introduce a novel Unsupervised Cross-view Adaptation Learning approach to modeling the geometric structural change across views in Semantic Scene Understanding. First, we introduce a novel Cross-view Geometric Constraint on Unpaired Data to model structural changes in images and segmentation masks across cameras. Second, we present a new Geodesic Flow-based Correlation Metric to efficiently measure the geometric structural changes across camera views. Third, we introduce a novel view-condition prompting mechanism to enhance the view-information modeling of the open-vocabulary segmentation network in cross-view adaptation learning. The experiments on different cross-view adaptation benchmarks have shown the effectiveness of our approach in cross-view modeling, demonstrating that we achieve State-of-the-Art (SOTA) performance compared to prior unsupervised domain adaptation and open-vocabulary semantic segmentation methods. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2406_01429 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| title | EAGLE: Efficient Adaptive Geometry-based Learning in Cross-view Understanding |
| topic | Computer Vision and Pattern Recognition |
| url | https://arxiv.org/abs/2406.01429 |