Staff View: Bridging the RGB-IR Gap: Consensus and Discrepancy Modeling for Text-Guided Multispectral Detection

Bibliographic Details
Main Authors:	Wu, Jiaqi; Wang, Zhen; Huang, Enhao; Shen, Kangqing; Wang, Yulin; Yue, Yang; Pu, Yifan; Huang, Gao
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2604.11234

_version_	1866908959866093568
author	Wu, Jiaqi; Wang, Zhen; Huang, Enhao; Shen, Kangqing; Wang, Yulin; Yue, Yang; Pu, Yifan; Huang, Gao
author_facet	Wu, Jiaqi; Wang, Zhen; Huang, Enhao; Shen, Kangqing; Wang, Yulin; Yue, Yang; Pu, Yifan; Huang, Gao
contents	Text-guided multispectral object detection uses text semantics to guide semantic-aware cross-modal interaction between RGB and IR for more robust perception. However, notable limitations remain: (1) existing methods often use text only as an auxiliary semantic enhancement signal, without exploiting its guiding role to bridge the inherent granularity asymmetry between RGB and IR; and (2) conventional data-driven attention-based fusion tends to emphasize stable consensus while overlooking potentially valuable cross-modal discrepancies. To address these issues, we propose a semantic bridge fusion framework with bi-support modeling for multispectral object detection. Specifically, text is used as a shared semantic bridge to align RGB and IR responses under a unified category condition, while the recalibrated thermal semantic prior is projected onto the RGB branch for semantic-level mapping fusion. We further formulate RGB-IR interaction evidence into the regular consensus support and the complementary discrepancy support that contains potentially discriminative cues, and introduce them into fusion via dynamic recalibration as a structured inductive bias. In addition, we design a bidirectional semantic alignment module for closed-loop vision-text guidance enhancement. Extensive experiments demonstrate the effectiveness of the proposed fusion framework and its superior detection performance on multispectral benchmarks. Code is available at https://github.com/zhenwang5372/Bridging-RGB-IR-Gap.
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_11234
institution	arXiv
publishDate	2026
record_format	arxiv
title	Bridging the RGB-IR Gap: Consensus and Discrepancy Modeling for Text-Guided Multispectral Detection
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2604.11234
