Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Zhao, Boxiang, Li, Qince, Wang, Zhonghao, Cao, Zelin, Wang, Yi, Cheng, Peng, Lin, Bo
Format:	Preprint
Published:	2026
Subjects:	Computation and Language Artificial Intelligence
Online Access:	https://arxiv.org/abs/2601.19923
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866910223041560576
author	Zhao, Boxiang Li, Qince Wang, Zhonghao Cao, Zelin Wang, Yi Cheng, Peng Lin, Bo
author_facet	Zhao, Boxiang Li, Qince Wang, Zhonghao Cao, Zelin Wang, Yi Cheng, Peng Lin, Bo
contents	As Large Language Models (LLMs) evolve into the core of Web-based autonomous agents and complex Web Information Systems, their ability to faithfully translate natural language into rigorous structured formats has become paramount, as this capability is critical for Web API invocation and data exchange. However, evaluating this structural fidelity in Web-native payloads remains a challenge: traditional text metrics fail to capture topological consistency in semi-structured Web data, while manual evaluation is prohibitively costly. To address this, we propose Structure-BiEval, a novel self-supervised framework for quantitative, annotation-free assessment tailored for Web data engineering. By leveraging deterministic Intermediate Representations, our framework effectively decouples structure from content, utilizing Content Semantic Accuracy and Normalized Tree Edit Distance as precise metrics. We empirically benchmark 15 state-of-the-art LLMs across dual Web structural topologies, namely Hierarchical Data (Web backend payloads) and Tabular Data (Web frontend presentation). The results reveal substantial variability in structural performance, including cases where mid-sized models unexpectedly outperform larger counterparts in Web data formatting. Furthermore, our findings show that deep recursive nesting poses a consistent challenge for Web agents across varying parameter scales.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_19923
institution	arXiv
publishDate	2026
record_format	arxiv
spellingShingle	Structure-BiEval: A Self-Supervised, Dual-Track Framework for Decoupling Structure and Content in LLM Evaluation for Web Information Systems Zhao, Boxiang Li, Qince Wang, Zhonghao Cao, Zelin Wang, Yi Cheng, Peng Lin, Bo Computation and Language Artificial Intelligence As Large Language Models (LLMs) evolve into the core of Web-based autonomous agents and complex Web Information Systems, their ability to faithfully translate natural language into rigorous structured formats has become paramount, as this capability is critical for Web API invocation and data exchange. However, evaluating this structural fidelity in Web-native payloads remains a challenge: traditional text metrics fail to capture topological consistency in semi-structured Web data, while manual evaluation is prohibitively costly. To address this, we propose Structure-BiEval, a novel self-supervised framework for quantitative, annotation-free assessment tailored for Web data engineering. By leveraging deterministic Intermediate Representations, our framework effectively decouples structure from content, utilizing Content Semantic Accuracy and Normalized Tree Edit Distance as precise metrics. We empirically benchmark 15 state-of-the-art LLMs across dual Web structural topologies, namely Hierarchical Data (Web backend payloads) and Tabular Data (Web frontend presentation). The results reveal substantial variability in structural performance, including cases where mid-sized models unexpectedly outperform larger counterparts in Web data formatting. Furthermore, our findings show that deep recursive nesting poses a consistent challenge for Web agents across varying parameter scales.
title	Structure-BiEval: A Self-Supervised, Dual-Track Framework for Decoupling Structure and Content in LLM Evaluation for Web Information Systems
topic	Computation and Language Artificial Intelligence
url	https://arxiv.org/abs/2601.19923

Similar Items