Bibliographic Details
Main Authors: Wang, Xiangyu, Lv, Zhixin, Sun, Yongjiao, Han, Anrui, Yuan, Ye, Ji, Hangxu
Format: Preprint
Published: 2026
Subjects: Artificial Intelligence, Machine Learning
Online Access: https://arxiv.org/abs/2601.15931
_version_ 1866909997903904768
author Wang, Xiangyu
Lv, Zhixin
Sun, Yongjiao
Han, Anrui
Yuan, Ye
Ji, Hangxu
contents Text-Based Person Search (TBPS) holds unique value in real-world surveillance because it bridges visual perception and language understanding, yet current paradigms built on pre-trained models often fail to transfer effectively to complex open-world scenarios. Their reliance on "Passive Observation" leads to multifaceted spurious correlations and spatial-semantic misalignment, which undermine robustness against distribution shifts. To resolve these defects, this paper proposes ICON (Invariant Counterfactual Optimization with Neuro-symbolic priors), a framework that integrates causal and topological priors. First, Rule-Guided Spatial Intervention strictly penalizes sensitivity to bounding box noise, severing location shortcuts to achieve geometric invariance. Second, Counterfactual Context Disentanglement, implemented via semantic-driven background transplantation, compels the model to ignore background interference and so achieve environmental independence. Third, Saliency-Driven Semantic Regularization with adaptive masking resolves local saliency bias and ensures holistic completeness. Finally, Neuro-Symbolic Topological Alignment uses neuro-symbolic priors to constrain feature matching, ensuring that activated regions are topologically consistent with human structural logic. Experimental results demonstrate that ICON not only maintains leading performance on standard benchmarks but also exhibits exceptional robustness against occlusion, background interference, and localization noise. The approach advances the field by shifting from fitting statistical co-occurrences to learning causal invariance.
format Preprint
id arxiv_https___arxiv_org_abs_2601_15931
institution arXiv
publishDate 2026
record_format arxiv
title ICON: Invariant Counterfactual Optimization with Neuro-Symbolic Priors for Text-Based Person Search
topic Artificial Intelligence
Machine Learning
url https://arxiv.org/abs/2601.15931