Staff View: VeriTrip: A Verifiable Benchmark for Travel Planning Agents over Unstructured Web Corpora :: Library Catalog

Bibliographic Details
Main Authors:	Xu, Yuting; Tian, Jiayi; Liang, Jian; Xiong, Xin; Zhang, Hang; Xu, Mu; Zhang, Xiao-Yu
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2605.28683

_version_	1866910267010449408
author	Xu, Yuting; Tian, Jiayi; Liang, Jian; Xiong, Xin; Zhang, Hang; Xu, Mu; Zhang, Xiao-Yu
author_facet	Xu, Yuting; Tian, Jiayi; Liang, Jian; Xiong, Xin; Zhang, Hang; Xu, Mu; Zhang, Xiao-Yu
contents	Existing benchmarks have laid the foundation for travel planning agents by establishing API-centric paradigms. However, as the capabilities of Autonomous Agents continue to advance, their evaluation must evolve beyond simple tool execution toward handling the inherent complexities of the open web. Current benchmarks bypass core cognitive hurdles: they fail to account for information noise, ignore multi-source factual contradictions, and overlook the necessity of grounding visual perception into logical planning. We introduce VeriTrip, a verifiable benchmark designed to meet the increasing demands for agent robustness and reliability. VeriTrip shifts the evaluation focus to evidence-grounded reasoning over unstructured multimodal web corpora. It establishes a Multimodal Retrieval Base (MRB) derived from real-world sources, forcing agents to autonomously orchestrate queries across heterogeneous data. A synchronized Verifiable Knowledge Base (VKB) enables a cell-wise verification protocol that precisely quantifies factual reliability, distinguishing systematic reasoning failures from parametric hallucinations. Our evaluations across leading MLLMs reveal a critical retrieval-reasoning trade-off: the cognitive load of autonomous retrieval significantly erodes instruction retention. VeriTrip provides the rigorous foundation necessary for the next generation of planning agents capable of operating in unconstrained, multimodal environments.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_28683
institution	arXiv
publishDate	2026
record_format	arxiv
title	VeriTrip: A Verifiable Benchmark for Travel Planning Agents over Unstructured Web Corpora
topic	Artificial Intelligence
url	https://arxiv.org/abs/2605.28683

Similar Items