Bibliographic Details
Main Authors: Zhao, Boxiang; Li, Qince; Wang, Zhonghao; Cao, Zelin; Wang, Yi; Cheng, Peng; Lin, Bo
Format: Preprint
Published: 2026
Online Access: https://arxiv.org/abs/2601.19923
author Zhao, Boxiang
Li, Qince
Wang, Zhonghao
Cao, Zelin
Wang, Yi
Cheng, Peng
Lin, Bo
contents As Large Language Models (LLMs) evolve into the core of Web-based autonomous agents and complex Web Information Systems, their ability to faithfully translate natural language into rigorous structured formats has become paramount, as this capability is critical for Web API invocation and data exchange. However, evaluating this structural fidelity in Web-native payloads remains a challenge: traditional text metrics fail to capture topological consistency in semi-structured Web data, while manual evaluation is prohibitively costly. To address this, we propose Structure-BiEval, a novel self-supervised framework for quantitative, annotation-free assessment tailored for Web data engineering. By leveraging deterministic Intermediate Representations, our framework effectively decouples structure from content, utilizing Content Semantic Accuracy and Normalized Tree Edit Distance as precise metrics. We empirically benchmark 15 state-of-the-art LLMs across dual Web structural topologies, namely Hierarchical Data (Web backend payloads) and Tabular Data (Web frontend presentation). The results reveal substantial variability in structural performance, including cases where mid-sized models unexpectedly outperform larger counterparts in Web data formatting. Furthermore, our findings show that deep recursive nesting poses a consistent challenge for Web agents across varying parameter scales.
format Preprint
id arxiv_https___arxiv_org_abs_2601_19923
institution arXiv
publishDate 2026
record_format arxiv
title Structure-BiEval: A Self-Supervised, Dual-Track Framework for Decoupling Structure and Content in LLM Evaluation for Web Information Systems
topic Computation and Language
Artificial Intelligence
url https://arxiv.org/abs/2601.19923
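
The abstract describes decoupling structure from content and scoring each separately, with a normalized tree edit distance for structure and a content accuracy for values. The sketch below illustrates that general idea for JSON payloads only. It is not the authors' implementation: the function names, the top-down (Selkow-style) edit distance that inserts or deletes whole subtrees, the normalization by the sum of tree sizes, and the exact-match content score are all assumptions made for illustration.

```python
import json


def skeleton(value, label="root"):
    """Structure only: (label, children) with leaf contents dropped.
    Object keys are sorted so key order does not count as a difference."""
    if isinstance(value, dict):
        return (f"{label}:object", [skeleton(v, k) for k, v in sorted(value.items())])
    if isinstance(value, list):
        return (f"{label}:array", [skeleton(v, "item") for v in value])
    return (f"{label}:leaf", [])


def leaves(value, path=""):
    """Content only: map each leaf path to its value."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        return {path: value}
    out = {}
    for key, child in items:
        out.update(leaves(child, f"{path}/{key}"))
    return out


def size(tree):
    """Number of nodes in a (label, children) tree."""
    return 1 + sum(size(child) for child in tree[1])


def edit_distance(a, b):
    """Assumed variant: top-down ordered tree edit distance in which
    whole subtrees are inserted or deleted (cost = subtree size)."""
    relabel = 0 if a[0] == b[0] else 1
    xs, ys = a[1], b[1]
    # dp[i][j]: cost of turning the first i children of a into the first j of b
    dp = [[0] * (len(ys) + 1) for _ in range(len(xs) + 1)]
    for i in range(1, len(xs) + 1):
        dp[i][0] = dp[i - 1][0] + size(xs[i - 1])
    for j in range(1, len(ys) + 1):
        dp[0][j] = dp[0][j - 1] + size(ys[j - 1])
    for i in range(1, len(xs) + 1):
        for j in range(1, len(ys) + 1):
            dp[i][j] = min(
                dp[i - 1][j] + size(xs[i - 1]),                      # delete subtree
                dp[i][j - 1] + size(ys[j - 1]),                      # insert subtree
                dp[i - 1][j - 1] + edit_distance(xs[i - 1], ys[j - 1]),  # match
            )
    return relabel + dp[len(xs)][len(ys)]


def normalized_distance(a, b):
    """Assumed normalization: divide by the combined node count (an upper bound)."""
    return edit_distance(a, b) / (size(a) + size(b))


def content_accuracy(reference, candidate):
    """Assumed content score: share of reference leaves reproduced exactly."""
    ref, out = leaves(reference), leaves(candidate)
    if not ref:
        return 1.0
    return sum(1 for p, v in ref.items() if p in out and out[p] == v) / len(ref)


if __name__ == "__main__":
    reference = json.loads('{"user": {"name": "Ada", "tags": ["a", "b"]}}')
    candidate = json.loads('{"user": {"name": "Ada", "tags": ["a"]}}')
    print(normalized_distance(skeleton(reference), skeleton(candidate)))
    print(content_accuracy(reference, candidate))
```

Because the skeleton drops leaf values, a payload with the right shape but wrong values scores a structural distance of zero and loses only on content accuracy, which is the kind of separation the abstract refers to.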