Staff View: Uncertainty-Aware Prototype Semantic Decoupling for Text-Based Person Search in Full Images :: Library Catalog

Bibliographic Details
Main Authors:	Luo, Zengli; Zhang, Canlong; Li, Zhixin; Wang, Zhiwen; Wei, Chunrong
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2505.03567

_version_	1866915472040001536
author	Luo, Zengli; Zhang, Canlong; Li, Zhixin; Wang, Zhiwen; Wei, Chunrong
author_facet	Luo, Zengli; Zhang, Canlong; Li, Zhixin; Wang, Zhiwen; Wei, Chunrong
contents	Text-based pedestrian search (TBPS) in full images aims to locate a target pedestrian in untrimmed images using natural language descriptions. However, in complex scenes with multiple pedestrians, existing methods are limited by uncertainties in detection and matching, leading to degraded performance. To address this, we propose UPD-TBPS, a novel framework comprising three modules: Multi-granularity Uncertainty Estimation (MUE), Prototype-based Uncertainty Decoupling (PUD), and Cross-modal Re-identification (ReID). MUE conducts multi-granularity queries to identify potential targets and assigns confidence scores to reduce early-stage uncertainty. PUD leverages visual context decoupling and prototype mining to extract features of the target pedestrian described in the query. It separates and learns pedestrian prototype representations at both the coarse-grained cluster level and the fine-grained individual level, thereby reducing matching uncertainty. ReID evaluates candidates with varying confidence levels, improving detection and retrieval accuracy. Experiments on CUHK-SYSU-TBPS and PRW-TBPS datasets validate the effectiveness of our framework.
format	Preprint
id	arxiv_https___arxiv_org_abs_2505_03567
institution	arXiv
publishDate	2025
record_format	arxiv
title	Uncertainty-Aware Prototype Semantic Decoupling for Text-Based Person Search in Full Images
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2505.03567

Similar Items