| Main Author: | |
|---|---|
| Format: | Digital resource |
| Language: | |
| Published: | Zenodo, 2026 |
| Subjects: | |
| Online Access: | https://doi.org/10.5281/zenodo.19557723 |
Summary:
- Adding structured detection metadata to vision-language model prompts systematically degrades visual reasoning through anchoring bias, and the delivery channel determines the magnitude of the effect. Across seven controlled conditions on a surveillance scene, text-encoded bounding boxes reduced visual reasoning to 53%, visual overlays preserved 69%, and cross-modal ID mapping collapsed to 47% despite having a smaller text-to-image token ratio. Plausibly positioned fabricated detections pass unchallenged, and for the scene description case the metadata cost to visual perception is monotonic. This repository contains all raw prompts, model responses, scoring rubrics, and reproducibility artifacts for the study.