Bibliographic Details
Main Authors: Kim, Sojung Lucia, Jang, Taehong, Ahn, Joonmo
Format: Preprint
Published: 2024
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2407.11368
author Kim, Sojung Lucia
Jang, Taehong
Ahn, Joonmo
contents This study compares three methods for translating ancient texts with sparse corpora: (1) the traditional statistical translation method of phrase alignment, (2) in-context learning with an LLM, and (3) a proposed inter-methodological approach, a statistical machine translation method that uses sentence-piece tokens derived from a unified source-target corpus. The proposed approach achieves a BLEU score of 36.71, surpassing both SOLAR-10.7B in-context learning and the best existing Seq2Seq model. Further analysis and discussion are presented.
format Preprint
id arxiv_https___arxiv_org_abs_2407_11368
institution arXiv
publishDate 2024
record_format arxiv
title Ancient Korean Archive Translation: Comparison Analysis on Statistical phrase alignment, LLM in-context learning, and inter-methodological approach
topic Computation and Language
url https://arxiv.org/abs/2407.11368