Bibliographic Details
Main Authors: Meng, Zelin; Fukao, Takanori
Format: Preprint
Published: 2025
Subjects: Image and Video Processing; Artificial Intelligence; Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2503.04821
author Meng, Zelin
Fukao, Takanori
contents Depth estimation in complex real-world scenarios is a challenging task, especially when relying solely on a single modality such as visible light or thermal infrared (THR) imagery. This paper proposes a novel multimodal depth estimation model, RTFusion, which enhances depth estimation accuracy and robustness by integrating the complementary strengths of RGB and THR data. The RGB modality provides rich texture and color information, while the THR modality captures thermal patterns, ensuring stability under adverse lighting conditions such as extreme illumination. The model incorporates a unique fusion mechanism, EGFusion, consisting of the Mutual Complementary Attention (MCA) module for cross-modal feature alignment and the Edge Saliency Enhancement Module (ESEM) to improve edge detail preservation. Comprehensive experiments on the MS2 and ViViD++ datasets demonstrate that the proposed model consistently produces high-quality depth maps across various challenging environments, including nighttime, rainy, and high-glare conditions. The experimental results highlight the potential of the proposed method in applications requiring reliable depth estimation, such as autonomous driving, robotics, and augmented reality.
format Preprint
id arxiv_https___arxiv_org_abs_2503_04821
institution arXiv
publishDate 2025
record_format arxiv
title RGB-Thermal Infrared Fusion for Robust Depth Estimation in Complex Environments
topic Image and Video Processing
Artificial Intelligence
Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2503.04821