MARC View :: Library Catalog

Bibliographic Details
Main Authors:	Lin, Junfeng; Xiu, Yanming; Gorlatova, Maria
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2601.23281

_version_	1866918316107366400
author	Lin, Junfeng; Xiu, Yanming; Gorlatova, Maria
author_facet	Lin, Junfeng; Xiu, Yanming; Gorlatova, Maria
contents	Open-set object detection (OSOD) localizes objects while identifying and rejecting unknown classes at inference. Although recent OSOD models perform well on benchmarks, their behavior under realistic user prompting remains underexplored. In interactive XR settings, user-generated prompts are often ambiguous, underspecified, or overly detailed. To study prompt-conditioned robustness, we evaluate two OSOD models, GroundingDINO and YOLO-E, on real-world XR images and use vision-language models to simulate diverse user prompting behaviors. We consider four prompt types (standard, underdetailed, overdetailed, and pragmatically ambiguous) and examine the impact of two enhancement strategies on each. Results show that both models perform stably under standard and underdetailed prompts but degrade under ambiguous prompts. Overdetailed prompts primarily affect GroundingDINO. Prompt enhancement substantially improves robustness under ambiguity, yielding gains of more than 55% in mIoU and 41% in average confidence. Based on these findings, we propose several prompting strategies and prompt enhancement methods for OSOD models in XR environments.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_23281
institution	arXiv
publishDate	2026
record_format	arxiv
title	User Prompting Strategies and Prompt Enhancement Methods for Open-Set Object Detection in XR Environments
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2601.23281

Documents similaires