Bibliographic Details
Main Authors: Santos, Eduardo; Carvalho, Juliana; Rennó-Costa, César
Format: Preprint
Published: 2026
Subjects: Computation and Language; Artificial Intelligence; Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2605.04857
author Santos, Eduardo
Carvalho, Juliana
Rennó-Costa, César
contents This paper presents the development and validation of an eye-tracking dataset designed to investigate how second-language (L2) learners process idiomatic expressions. While native speakers often rely on direct retrieval of figurative meanings, L2 speakers frequently adopt a literal-first approach, which incurs measurable cognitive costs. This resource captures these costs through ocular metrics recorded from Portuguese L1 speakers of English across all CEFR proficiency levels (A1-C2). Although the study uses entry-level 60 Hz hardware (Tobii Pro Spark), we demonstrate that this sampling rate provides sufficient data density to detect macro-cognitive events such as fixations and regressions in reading. Preliminary analysis validates the dataset by revealing a strong inverse correlation between language proficiency and regressive eye movements. Integrated into the MIA (Modeling Idiomaticity in Human and Artificial Language Processing) initiative, this dataset serves as a cognitively grounded benchmark for evaluating both human processing models and the alignment of large language models with human-like figurative understanding.
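The abstract's claim that a 60 Hz sampling rate is dense enough for fixation-level events can be illustrated with simple arithmetic. The sketch below is not taken from the paper: the 200 ms fixation and 30 ms saccade durations are assumed, typical-order values chosen for illustration, and `samples_per_event` is a hypothetical helper.

```python
import math


def samples_per_event(duration_ms: float, sampling_rate_hz: float) -> int:
    """Return how many whole sampling intervals fit inside an event.

    One sampling interval lasts 1000 / sampling_rate_hz milliseconds,
    so the count is floor(duration_ms * sampling_rate_hz / 1000).
    """
    return math.floor(duration_ms * sampling_rate_hz / 1000.0)


# Sampling rate stated in the abstract (Tobii Pro Spark).
SAMPLING_RATE_HZ = 60.0

# Assumed illustrative durations, NOT values from the dataset.
ASSUMED_FIXATION_MS = 200.0
ASSUMED_SACCADE_MS = 30.0

fixation_samples = samples_per_event(ASSUMED_FIXATION_MS, SAMPLING_RATE_HZ)
saccade_samples = samples_per_event(ASSUMED_SACCADE_MS, SAMPLING_RATE_HZ)

print(f"sampling interval: {1000.0 / SAMPLING_RATE_HZ:.2f} ms")
print(f"samples per assumed fixation: {fixation_samples}")
print(f"samples per assumed saccade: {saccade_samples}")
```

Under these assumptions a fixation spans roughly a dozen samples while a saccade spans only about one, which is consistent with the abstract's focus on macro-level events such as fixations and regressions rather than on fine saccade dynamics.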
format Preprint
id arxiv_https___arxiv_org_abs_2605_04857
institution arXiv
publishDate 2026
record_format arxiv
title Assessing Cognitive Effort in L2 Idiomatic Processing: An Eye-Tracking Dataset
topic Computation and Language
Artificial Intelligence
Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2605.04857