Staff View: :: Library Catalog

Bibliographic Details
Main Authors:	Kharlamova, Darya; Proskurina, Irina
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2603.07366

_version_	1866917322106601472
author	Kharlamova, Darya; Proskurina, Irina
author_facet	Kharlamova, Darya; Proskurina, Irina
contents	Many errors in student essays can be explained by influence from the native language (L1). L1 interference refers to errors influenced by a speaker's first language, such as using "stadion" instead of "stadium", which reflects lexical transliteration from Russian. In this work, we address the task of detecting such errors in English essays written by Russian-speaking learners. We introduce RILEC, a large-scale dataset of over 18,000 sentences that combines expert-annotated data from REALEC with synthetic examples generated through rule-based and neural augmentation. We propose a framework for generating L1-motivated errors using generative language models optimized with PPO, prompt-based control, and rule-based patterns. Models fine-tuned on RILEC achieve strong performance, particularly on word-level interference types such as transliteration and tense semantics. We find that the proposed augmentation pipeline leads to a significant performance improvement, making it a potentially valuable tool for helping learners and teachers identify and address such errors more effectively.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_07366
institution	arXiv
publishDate	2026
record_format	arxiv
title	RILEC: Detection and Generation of L1 Russian Interference Errors in English Learner Texts
topic	Computation and Language
url	https://arxiv.org/abs/2603.07366
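The abstract mentions rule-based patterns for generating L1-motivated errors, such as writing "stadion" for "stadium". A minimal Python sketch of that idea follows, assuming a simple word-level transliteration table. Only the stadium/stadion pair comes from the record; the other table entries, the function name inject_translit_errors, its parameters, and the span format are hypothetical illustrations, not RILEC's actual rules or code.

```python
import random
import re

# Hypothetical substitution table. Only "stadium" -> "stadion" comes from the
# record's abstract; the other entries are illustrative assumptions.
TRANSLIT_RULES = {
    "stadium": "stadion",
    "theatre": "teatr",
    "museum": "muzei",
}


def inject_translit_errors(sentence, rules=TRANSLIT_RULES, p=1.0, seed=0):
    """Replace whole-word matches with L1-style transliterations.

    Each match is corrupted with probability p. Returns the corrupted
    sentence and a list of (start, end, original, error) tuples giving the
    error spans in the corrupted sentence, usable as detection labels.
    """
    rng = random.Random(seed)
    # Whole-word, case-insensitive match on any key of the rule table.
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, rules)) + r")\b", re.IGNORECASE
    )
    pieces, spans = [], []
    pos = 0      # position in the input sentence
    cursor = 0   # length of the output built so far
    for m in pattern.finditer(sentence):
        before = sentence[pos:m.start()]
        pieces.append(before)
        cursor += len(before)
        word = m.group(0)
        if rng.random() < p:
            error = rules[word.lower()]
            if word[0].isupper():
                error = error.capitalize()  # keep sentence-initial/proper casing
            spans.append((cursor, cursor + len(error), word, error))
        else:
            error = word  # leave this occurrence unchanged
        pieces.append(error)
        cursor += len(error)
        pos = m.end()
    pieces.append(sentence[pos:])
    return "".join(pieces), spans


if __name__ == "__main__":
    text, labels = inject_translit_errors("We went to the stadium and then the Museum.")
    print(text)
    print(labels)
```

Returning character spans alongside the corrupted sentence keeps each synthetic example paired with a label for a detection model; lowering p leaves some occurrences intact so that not every candidate word becomes an error.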

Similar Items