Bibliographic Details
Main Author: Dubey, Mradul
Format: Digital resource
Language:
Published: Zenodo 2026
Subjects:
Online Access: https://doi.org/10.5281/zenodo.19557723
Summary:
  • Adding structured detection metadata to vision-language model prompts systematically degrades visual reasoning through anchoring bias, and the delivery channel determines the magnitude of the effect. Across seven controlled conditions on a surveillance scene, text-encoded bounding boxes reduced visual reasoning to 53%, visual overlays preserved 69%, and cross-modal ID mapping collapsed to 47% despite having a smaller text-to-image token ratio. Plausibly positioned fabricated detections pass unchallenged, and the cost of metadata to visual perception is monotonic in the scene-description case. This repository contains all raw prompts, model responses, scoring rubrics, and reproducibility artifacts for the study.