Bibliographic Details
Main Authors: Chen, Yiyi, Bjerva, Johannes
Format: Preprint
Published: 2024
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2401.01698
author Chen, Yiyi
Bjerva, Johannes
author_facet Chen, Yiyi
Bjerva, Johannes
contents Language similarities can be caused by genetic relatedness, areal contact, universality, or chance. Colexification, i.e. a type of similarity where a single lexical form is used to convey multiple meanings, is underexplored. In our work, we shed light on the linguistic causes of cross-lingual similarity in colexification and phonology by exploring genealogical stability (persistence) and contact-induced change (diffusibility). We construct large-scale graphs incorporating semantic, genealogical, phonological, and geographical data for 1,966 languages. We then show the potential of this resource by investigating several established hypotheses from previous work in linguistics, while proposing new ones. Our results strongly support a previously established hypothesis in the linguistic literature, while offering contradicting evidence to another. Our large-scale resource opens up further research across disciplines, e.g. in multilingual NLP and comparative linguistics.
format Preprint
id arxiv_https___arxiv_org_abs_2401_01698
institution arXiv
publishDate 2024
record_format arxiv
spellingShingle Patterns of Persistence and Diffusibility across the World's Languages
Chen, Yiyi
Bjerva, Johannes
Computation and Language
title Patterns of Persistence and Diffusibility across the World's Languages
topic Computation and Language
url https://arxiv.org/abs/2401.01698