Bibliographic Details
Main Author:	Haraguchi, Daichi
Format:	Preprint
Published:	2026
Subjects:	Human-Computer Interaction; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2602.23746
Abstract:

High text recognition performance does not guarantee that Vision-Language Models (VLMs) share human-like decision patterns when resolving ambiguity. We investigate this behavioral gap by directly comparing humans and VLMs on continuously interpolated Japanese character shapes generated with a $β$-VAE. We estimate decision boundaries in a single-character recognition task (the shape-only task) and evaluate whether VLM responses align with human judgments in a shape-in-context task, in which an ambiguous character near the human decision boundary is embedded in word-level context. We find that human and VLM decision boundaries differ in the shape-only task, and that shape in context can improve alignment with human judgments in some conditions. These results highlight qualitative behavioral differences and offer foundational insights toward human-VLM alignment benchmarking.
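
The idea of estimating a decision boundary along an interpolation between two character shapes can be illustrated with a minimal sketch. This is not the authors' procedure: the record does not say how latents are interpolated or how boundaries are fitted, so the linear latent interpolation, the 0.5 threshold, and the helper names `interpolate_latents` and `decision_boundary` are all assumptions, and the response proportions are made-up illustrative values.

```python
from typing import List, Optional, Sequence


def interpolate_latents(
    z_a: Sequence[float], z_b: Sequence[float], steps: int
) -> List[List[float]]:
    """Linearly interpolate between two latent vectors, endpoints included.

    Assumption: simple linear interpolation in latent space; the paper may
    use a different scheme.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2")
    ratios = [i / (steps - 1) for i in range(steps)]
    return [[(1 - t) * a + t * b for a, b in zip(z_a, z_b)] for t in ratios]


def decision_boundary(
    ratios: Sequence[float], p_b: Sequence[float], threshold: float = 0.5
) -> Optional[float]:
    """Return the interpolation ratio where P(answer is character B) first
    rises from below `threshold` to at or above it, interpolating linearly
    between neighboring measurements. Returns None if no such crossing exists.
    """
    points = list(zip(ratios, p_b))
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        if p0 < threshold <= p1:
            return r0 + (threshold - p0) * (r1 - r0) / (p1 - p0)
    return None


# Hypothetical response proportions (NOT data from the paper): the fraction
# of responses naming character B at each interpolation ratio.
ratios = [0.0, 0.25, 0.5, 0.75, 1.0]
human_p_b = [0.0, 0.1, 0.4, 0.9, 1.0]
vlm_p_b = [0.0, 0.0, 0.2, 0.6, 1.0]

human_boundary = decision_boundary(ratios, human_p_b)
vlm_boundary = decision_boundary(ratios, vlm_p_b)
print("human boundary:", human_boundary)
print("VLM boundary:", vlm_boundary)
```

In this toy setup, a gap between the two estimated ratios would correspond to the kind of human and VLM boundary difference the abstract reports for the shape-only task. A fitted psychometric curve would be a more robust estimator than the piecewise-linear crossing used here.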

Similar Items