| Main Authors: | Reddy, Ruturaj; Barua, Hrishav Bakul; Loo, Junn Yong; Nguyen, Thanh Thi; Krishnasamy, Ganesh |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
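| Subjects: | Computer Vision and Pattern Recognition; Artificial Intelligence; Machine Learning; Robotics; I.2.9; I.2.10; I.4.6; I.4.8 |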
| Online Access: | https://arxiv.org/abs/2602.07343 |
| _version_ | 1866914313669705728 |
|---|---|
| author | Reddy, Ruturaj; Barua, Hrishav Bakul; Loo, Junn Yong; Nguyen, Thanh Thi; Krishnasamy, Ganesh |
| author_facet | Reddy, Ruturaj; Barua, Hrishav Bakul; Loo, Junn Yong; Nguyen, Thanh Thi; Krishnasamy, Ganesh |
| contents | Robust semantic segmentation of road scenes under adverse illumination, lighting, and shadow conditions remains a core challenge for autonomous driving applications. RGB-Thermal fusion is a standard approach, yet existing methods apply static fusion strategies uniformly across all conditions, allowing modality-specific noise to propagate throughout the network. We propose CLARITY, which dynamically adapts its fusion strategy to the detected scene condition. Guided by vision-language model (VLM) priors, the network learns to modulate each modality's contribution based on the illumination state, rather than applying a fixed fusion policy, while leveraging object embeddings for segmentation. We further introduce two mechanisms: one that preserves valid dark-object semantics that prior noise-suppression methods incorrectly discard, and a hierarchical decoder that enforces structural consistency across scales to sharpen boundaries on thin objects. Experiments on the MFNet dataset demonstrate that CLARITY establishes a new state of the art (SOTA), achieving 62.3% mIoU and 77.5% mAcc. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2602_07343 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| title | Seeing Roads Through Words: A Language-Guided Framework for RGB-T Driving Scene Segmentation |
| topic | Computer Vision and Pattern Recognition; Artificial Intelligence; Machine Learning; Robotics; I.2.9; I.2.10; I.4.6; I.4.8 |
| url | https://arxiv.org/abs/2602.07343 |