Bibliographic Details
Main Authors: Alva, Rodrigo Gabriel Salazar, Nuñez, Matías, López, Cristian, Arista, Javier Martín
Format: Preprint
Published: 2025
Subjects: Computation and Language, Artificial Intelligence
Online Access: https://arxiv.org/abs/2507.20111
author Alva, Rodrigo Gabriel Salazar
Nuñez, Matías
López, Cristian
Arista, Javier Martín
contents Preserving ancient languages is essential for understanding humanity's cultural and linguistic heritage, yet Old English remains critically under-resourced, limiting its accessibility to modern natural language processing (NLP) techniques. We present a scalable framework that uses advanced large language models (LLMs) to generate high-quality Old English texts, addressing this gap. Our approach combines parameter-efficient fine-tuning (Low-Rank Adaptation, LoRA), data augmentation via backtranslation, and a dual-agent pipeline that separates the tasks of content generation (in English) and translation (into Old English). Evaluation with automated metrics (BLEU, METEOR, and chrF) shows significant improvements over baseline models, with BLEU scores increasing from 26 to over 65 for English-to-Old English translation. Expert human assessment also confirms high grammatical accuracy and stylistic fidelity in the generated texts. Beyond expanding the Old English corpus, our method offers a practical blueprint for revitalizing other endangered languages, effectively uniting AI innovation with the goals of cultural preservation.
format Preprint
id arxiv_https___arxiv_org_abs_2507_20111
institution arXiv
publishDate 2025
record_format arxiv
title AI-Driven Generation of Old English: A Framework for Low-Resource Languages
topic Computation and Language
Artificial Intelligence
url https://arxiv.org/abs/2507.20111