| Main Authors: | Zhao, Boxiang; Li, Qince; Wang, Zhonghao; Cao, Zelin; Wang, Yi; Cheng, Peng; Lin, Bo |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | Computation and Language; Artificial Intelligence |
| Online Access: | https://arxiv.org/abs/2601.19923 |
| _version_ | 1866910223041560576 |
|---|---|
| author | Zhao, Boxiang; Li, Qince; Wang, Zhonghao; Cao, Zelin; Wang, Yi; Cheng, Peng; Lin, Bo |
| contents | As Large Language Models (LLMs) evolve into the core of Web-based autonomous agents and complex Web Information Systems, their ability to faithfully translate natural language into rigorous structured formats has become paramount, as this capability is critical for Web API invocation and data exchange. However, evaluating this structural fidelity in Web-native payloads remains a challenge: traditional text metrics fail to capture topological consistency in semi-structured Web data, while manual evaluation is prohibitively costly. To address this, we propose Structure-BiEval, a novel self-supervised framework for quantitative, annotation-free assessment tailored for Web data engineering. By leveraging deterministic Intermediate Representations, our framework effectively decouples structure from content, utilizing Content Semantic Accuracy and Normalized Tree Edit Distance as precise metrics. We empirically benchmark 15 state-of-the-art LLMs across dual Web structural topologies, namely Hierarchical Data (Web backend payloads) and Tabular Data (Web frontend presentation). The results reveal substantial variability in structural performance, including cases where mid-sized models unexpectedly outperform larger counterparts in Web data formatting. Furthermore, our findings show that deep recursive nesting poses a consistent challenge for Web agents across varying parameter scales. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2601_19923 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| title | Structure-BiEval: A Self-Supervised, Dual-Track Framework for Decoupling Structure and Content in LLM Evaluation for Web Information Systems |
| topic | Computation and Language; Artificial Intelligence |
| url | https://arxiv.org/abs/2601.19923 |
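
The abstract describes decoupling structure from content and scoring structure with a Normalized Tree Edit Distance. This record does not give the paper's Intermediate Representation or its exact normalization, so the following is only a minimal sketch under stated assumptions: the helper names (`skeleton`, `size`, `top_down_distance`, `normalized_distance`) are hypothetical, the distance is a simpler top-down (Selkow-style) variant rather than a general tree edit distance, and dividing by the combined node count is an assumed normalization.

```python
import json


def skeleton(value, label="$"):
    """Return (label, children): the shape of a parsed JSON value, leaf content dropped."""
    if isinstance(value, dict):
        # Sort by key so that key order does not count as a structural difference.
        items = sorted(value.items(), key=lambda kv: kv[0])
        return (label + ":object", [skeleton(v, k) for k, v in items])
    if isinstance(value, list):
        return (label + ":array", [skeleton(v, "[]") for v in value])
    return (label + ":leaf", [])


def size(tree):
    """Number of nodes in a (label, children) tree."""
    return 1 + sum(size(child) for child in tree[1])


def top_down_distance(a, b):
    """Selkow-style distance: roots are always matched, children are aligned in order.

    Assumption: this is a stand-in for the paper's tree edit distance, not its definition.
    """
    relabel = 0 if a[0] == b[0] else 1
    xs, ys = a[1], b[1]
    # dp[i][j] = cheapest way to turn the first i children of a into the first j of b.
    dp = [[0] * (len(ys) + 1) for _ in range(len(xs) + 1)]
    for i in range(1, len(xs) + 1):
        dp[i][0] = dp[i - 1][0] + size(xs[i - 1])  # delete a whole subtree
    for j in range(1, len(ys) + 1):
        dp[0][j] = dp[0][j - 1] + size(ys[j - 1])  # insert a whole subtree
    for i in range(1, len(xs) + 1):
        for j in range(1, len(ys) + 1):
            dp[i][j] = min(
                dp[i - 1][j] + size(xs[i - 1]),
                dp[i][j - 1] + size(ys[j - 1]),
                dp[i - 1][j - 1] + top_down_distance(xs[i - 1], ys[j - 1]),
            )
    return relabel + dp[-1][-1]


def normalized_distance(a, b):
    """Assumed normalization: divide by the combined node count, giving a value in [0, 1)."""
    return top_down_distance(a, b) / (size(a) + size(b))


if __name__ == "__main__":
    reference = json.loads('{"user": {"id": 1, "tags": ["a", "b"]}}')
    same_shape = json.loads('{"user": {"id": 99, "tags": ["x", "y"]}}')
    flattened = json.loads('{"user": {"id": 1, "tags": "a,b"}}')
    ref_tree = skeleton(reference)
    print(normalized_distance(ref_tree, skeleton(same_shape)))
    print(normalized_distance(ref_tree, skeleton(flattened)))
```

Because leaf values are replaced by a placeholder label, two payloads that differ only in content score a structural distance of zero, while collapsing the nested `tags` array into a string is penalized. Content would be compared separately, for example over the leaf values that the skeleton discards.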