Staff View: DaLA: Danish Linguistic Acceptability Evaluation Guided by Real World Errors :: Library Catalog

Bibliographic Details
Main Authors:	Barmina, Gianluca; Norman, Nathalie Carmen Hau; Schneider-Kamp, Peter; Poech, Lukas Galke
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2512.04799

_version_	1866914187248140288
author	Barmina, Gianluca; Norman, Nathalie Carmen Hau; Schneider-Kamp, Peter; Poech, Lukas Galke
author_facet	Barmina, Gianluca; Norman, Nathalie Carmen Hau; Schneider-Kamp, Peter; Poech, Lukas Galke
contents	We present an enhanced benchmark for evaluating linguistic acceptability in Danish. We first analyze the most common errors found in written Danish. Based on this analysis, we introduce a set of fourteen corruption functions that generate incorrect sentences by systematically introducing errors into existing correct Danish sentences. To ensure the accuracy of these corruptions, we assess their validity using both manual and automatic methods. The results are then used as a benchmark for evaluating Large Language Models on a linguistic acceptability judgement task. Our findings demonstrate that this extension is both broader and more comprehensive than the current state of the art. By incorporating a greater variety of corruption types, our benchmark provides a more rigorous assessment of linguistic acceptability and increases task difficulty, as evidenced by the lower performance of LLMs on our benchmark compared to existing ones. Our results also suggest that our benchmark has higher discriminatory power, which allows it to better distinguish well-performing models from low-performing ones.
format	Preprint
id	arxiv_https___arxiv_org_abs_2512_04799
institution	arXiv
publishDate	2025
record_format	arxiv
title	DaLA: Danish Linguistic Acceptability Evaluation Guided by Real World Errors
topic	Computation and Language
url	https://arxiv.org/abs/2512.04799

Similar Items