Bibliographic Details
Main Authors: Moore, Andrew; Rayson, Paul; Archer, Dawn; Czerniak, Tim; Knight, Dawn; Lal, Daisy; Ó Donnchadha, Gearóid; Ó Meachair, Mícheál; Piao, Scott; Uí Dhonnchadha, Elaine; Vuorinen, Johanna; Yabo, Yan; Yang, Xiaobin
Format: Preprint
Published: 2026
Subjects:
Online Access: https://arxiv.org/abs/2601.09648
author Moore, Andrew
Rayson, Paul
Archer, Dawn
Czerniak, Tim
Knight, Dawn
Lal, Daisy
Ó Donnchadha, Gearóid
Ó Meachair, Mícheál
Piao, Scott
Uí Dhonnchadha, Elaine
Vuorinen, Johanna
Yabo, Yan
Yang, Xiaobin
contents Word Sense Disambiguation (WSD) has been widely evaluated using the semantic frameworks of WordNet, BabelNet, and the Oxford Dictionary of English. However, for the UCREL Semantic Analysis System (USAS) framework, no open, extensive evaluation has been performed beyond lexical coverage or single-language evaluation. In this work, we perform the largest semantic tagging evaluation of the rule-based system that uses the lexical resources in the USAS framework, covering five languages using four existing datasets and one novel Chinese dataset. To overcome the lack of manually tagged training data, we create a new silver-labelled English dataset, which we use to train and evaluate various monolingual and multilingual neural models in both monolingual and cross-lingual evaluation setups, with comparisons to their rule-based counterparts. We also show how a rule-based system can be enhanced with a neural network model. The resulting neural network models, the data they were trained on, the Chinese evaluation dataset, and all of the code have been released as open resources.
format Preprint
id arxiv_https___arxiv_org_abs_2601_09648
institution arXiv
publishDate 2026
record_format arxiv
title Creating a Hybrid Rule and Neural Network Based Semantic Tagger using Silver Standard Data: the PyMUSAS framework for Multilingual Semantic Annotation
topic Computation and Language
url https://arxiv.org/abs/2601.09648