Bibliographic Details
Main Author: Kim, JaeHo
Format: Digital resource
Language: English
Published: Zenodo 2026
Online Access: https://doi.org/10.5281/zenodo.20390729
Table of Contents:
  • Classical readability formulas predict text difficulty from surface counts and bring grammatical structure into the calculation only through sentence length. This paper introduces structural-lexical friction, the cognitive cost a reader pays when lexical and syntactic demands rise together, and measures it with a transparent index set across 16,904 English passages. The corpus draws on four high-density informational registers (anchored by the Korean College Scholastic Ability Test, CSAT, and LogiQA) and five baseline registers. A pre-pooling Mann-Whitney U test shows that CSAT and LogiQA cannot be combined: every parser-derived index distinguishes them. The headline ablation runs in two stages: length-controlled classification, then a length-matched sub-corpus restricted to 150–190 words per passage. Parser-aware features yield a reliable AUC gain over surface-only baselines (p < 10^-4, paired DeLong test), and the gap widens as length is controlled first by feature exclusion and then by sample selection. A word-count-matched adversarial design then separates the construct cleanly: Mean Clausal Depth (MCD) shifts by 1.36 units (p < 10^-4) while Flesch Reading Ease registers no detectable movement. Three independent dependency parsers reproduce the headline signal, ruling out a single-pipeline artefact. The apparent redundancy of parser-aware indices in conventional readability tasks turns out to be mechanical: Flesch's W/S (words per sentence) term absorbs clausal embedding through sentence length, and once passage length is physically matched, the structural signal recovers in full.
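The point about Flesch's W/S term can be illustrated with a minimal sketch. It uses the standard Flesch Reading Ease formula; the passage counts are hypothetical values chosen for illustration and are not taken from the paper's corpus. Because the formula sees only word, sentence, and syllable totals, two passages matched on those totals receive the same score whatever their clausal depth.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease computed from surface counts only."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    words_per_sentence = words / sentences   # the W/S term
    syllables_per_word = syllables / words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Hypothetical passages (illustrative counts, not from the paper):
# identical surface counts, but assume one is built from flat main clauses
# and the other from deeply embedded subordinate clauses.
flat_passage = {"words": 170, "sentences": 10, "syllables": 255}
embedded_passage = {"words": 170, "sentences": 10, "syllables": 255}

flat_score = flesch_reading_ease(**flat_passage)
embedded_score = flesch_reading_ease(**embedded_passage)

# The formula has no input that could encode clausal depth, so the scores match.
print(flat_score, embedded_score)
```

In this sketch, syntactic structure can reach the score only indirectly, by changing how many words land in each sentence. That is consistent with the abstract's account of why length matching is needed before a parser-derived index such as MCD can show an independent signal.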