Saved in:
Bibliographic Details
Main Author: Li, Wenxi
Format: Preprint
Published: 2025
Subjects:
Online Access:https://arxiv.org/abs/2506.04887
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866916781089619968
author Li, Wenxi
author_facet Li, Wenxi
contents Universal Dependencies (UD), while widely regarded as the most successful linguistic framework for cross-lingual syntactic representation, remains underexplored in terms of its effectiveness. This paper addresses this gap by integrating UD into pretrained language models and assesses if UD can improve their performance on a cross-lingual adversarial paraphrase identification task. Experimental results show that incorporation of UD yields significant improvements in accuracy and $F_1$ scores, with average gains of 3.85\% and 6.08\% respectively. These enhancements reduce the performance gap between pretrained models and large language models in some language pairs, and even outperform the latter in some others. Furthermore, the UD-based similarity score between a given language and English is positively correlated to the performance of models in that language. Both findings highlight the validity and potential of UD in out-of-domain tasks.
format Preprint
id arxiv_https___arxiv_org_abs_2506_04887
institution arXiv
publishDate 2025
record_format arxiv
spellingShingle Evaluating the Effectiveness of Linguistic Knowledge in Pretrained Language Models: A Case Study of Universal Dependencies
Li, Wenxi
Computation and Language
Universal Dependencies (UD), while widely regarded as the most successful linguistic framework for cross-lingual syntactic representation, remains underexplored in terms of its effectiveness. This paper addresses this gap by integrating UD into pretrained language models and assesses if UD can improve their performance on a cross-lingual adversarial paraphrase identification task. Experimental results show that incorporation of UD yields significant improvements in accuracy and $F_1$ scores, with average gains of 3.85\% and 6.08\% respectively. These enhancements reduce the performance gap between pretrained models and large language models in some language pairs, and even outperform the latter in some others. Furthermore, the UD-based similarity score between a given language and English is positively correlated to the performance of models in that language. Both findings highlight the validity and potential of UD in out-of-domain tasks.
title Evaluating the Effectiveness of Linguistic Knowledge in Pretrained Language Models: A Case Study of Universal Dependencies
topic Computation and Language
url https://arxiv.org/abs/2506.04887