Staff View: HLL: Can Agents Cross Humanity's Last Line of Verification? :: Library Catalog

Bibliographic Details
Main Authors:	Song, Xinhao; Su, Su; Song, Sirui; Wu, Hongliang; Shen, Wen; Wei, Zhihua; Liu, Gongshen; Zhang, Linfeng; Liu, Dongrui
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence; Computation and Language; Computer Vision and Pattern Recognition; Machine Learning; Multimedia
Online Access:	https://arxiv.org/abs/2606.02449

_version_	1866918535319519232
author	Song, Xinhao; Su, Su; Song, Sirui; Wu, Hongliang; Shen, Wen; Wei, Zhihua; Liu, Gongshen; Zhang, Linfeng; Liu, Dongrui
author_facet	Song, Xinhao; Su, Su; Song, Sirui; Wu, Hongliang; Shen, Wen; Wei, Zhihua; Liu, Gongshen; Zhang, Linfeng; Liu, Dongrui
contents	Multimodal agents are increasingly expected to operate interfaces on behalf of users, raising a central deployment question: can they truly substitute for humans in workflows that services deliberately protect against automation? CAPTCHA verification makes this question concrete. It is not merely a visual puzzle, but a human-verification boundary placed before account creation, content access, form submission, and other protected actions. We introduce Humanity's Last Line of Verification (HLL), a controlled benchmark that uses interactive CAPTCHA verification to evaluate whether agents can cross this boundary through grounded, human-like interaction rather than recognition alone. HLL covers diverse CAPTCHA interactions and exposes agents to controlled realism stressors, including cluttered webpages, harder task variants, and trace-conditioned validation of the solving process. We evaluate eight frontier multimodal agents in a closed-loop GUI environment. The results show that current agents remain brittle at this human-substitution boundary: performance varies sharply across verification types, degrades under realistic interface conditions, and drops further when correct answers must be supported by valid action traces. By exposing gaps in localization, action calibration, state tracking, and process consistency, HLL provides a concrete testbed for measuring how close multimodal agents are to acting as human substitutes in protected real-world workflows. Our code is available at https://github.com/XinhaoS0101/HLL
format	Preprint
id	arxiv_https___arxiv_org_abs_2606_02449
institution	arXiv
publishDate	2026
record_format	arxiv
title	HLL: Can Agents Cross Humanity's Last Line of Verification?
topic	Artificial Intelligence; Computation and Language; Computer Vision and Pattern Recognition; Machine Learning; Multimedia
url	https://arxiv.org/abs/2606.02449

Similar Items