Bibliographic Details
Main Authors: Luo, Zengli; Zhang, Canlong; Li, Zhixin; Wang, Zhiwen; Wei, Chunrong
Format: Preprint
Published: 2025
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2505.03567
contents Text-based pedestrian search (TBPS) in full images aims to locate a target pedestrian in untrimmed images using natural language descriptions. However, in complex scenes with multiple pedestrians, existing methods are limited by uncertainties in detection and matching, leading to degraded performance. To address this, we propose UPD-TBPS, a novel framework comprising three modules: Multi-granularity Uncertainty Estimation (MUE), Prototype-based Uncertainty Decoupling (PUD), and Cross-modal Re-identification (ReID). MUE conducts multi-granularity queries to identify potential targets and assigns confidence scores to reduce early-stage uncertainty. PUD leverages visual context decoupling and prototype mining to extract features of the target pedestrian described in the query. It separates and learns pedestrian prototype representations at both the coarse-grained cluster level and the fine-grained individual level, thereby reducing matching uncertainty. ReID evaluates candidates with varying confidence levels, improving detection and retrieval accuracy. Experiments on CUHK-SYSU-TBPS and PRW-TBPS datasets validate the effectiveness of our framework.
institution arXiv
title Uncertainty-Aware Prototype Semantic Decoupling for Text-Based Person Search in Full Images
topic Computer Vision and Pattern Recognition