Bibliographic Details
Main Authors: Song, Xinhao; Su, Su; Song, Sirui; Wu, Hongliang; Shen, Wen; Wei, Zhihua; Liu, Gongshen; Zhang, Linfeng; Liu, Dongrui
Format: Preprint
Published: 2026
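Subjects: Artificial Intelligence; Computation and Language; Computer Vision and Pattern Recognition; Machine Learning; Multimedia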
Online Access: https://arxiv.org/abs/2606.02449
_version_ 1866918535319519232
author Song, Xinhao
Su, Su
Song, Sirui
Wu, Hongliang
Shen, Wen
Wei, Zhihua
Liu, Gongshen
Zhang, Linfeng
Liu, Dongrui
contents Multimodal agents are increasingly expected to operate interfaces on behalf of users, raising a central deployment question: can they truly substitute for humans in workflows that services deliberately protect against automation? CAPTCHA verification makes this question concrete. It is not merely a visual puzzle, but a human-verification boundary placed before account creation, content access, form submission, and other protected actions. We introduce Humanity's Last Line of Verification (HLL), a controlled benchmark that uses interactive CAPTCHA verification to evaluate whether agents can cross this boundary through grounded, human-like interaction rather than recognition alone. HLL covers diverse CAPTCHA interactions and exposes agents to controlled realism stressors, including cluttered webpages, harder task variants, and trace-conditioned validation of the solving process. We evaluate eight frontier multimodal agents in a closed-loop GUI environment. The results show that current agents remain brittle at this human-substitution boundary: performance varies sharply across verification types, degrades under realistic interface conditions, and drops further when correct answers must be supported by valid action traces. By exposing gaps in localization, action calibration, state tracking, and process consistency, HLL provides a concrete testbed for measuring how close multimodal agents are to acting as human substitutes in protected real-world workflows. Our code is available at https://github.com/XinhaoS0101/HLL
format Preprint
id arxiv_https___arxiv_org_abs_2606_02449
institution arXiv
publishDate 2026
record_format arxiv
title HLL: Can Agents Cross Humanity's Last Line of Verification?
topic Artificial Intelligence
Computation and Language
Computer Vision and Pattern Recognition
Machine Learning
Multimedia
url https://arxiv.org/abs/2606.02449