| Main Authors: | Wei, Chengwei; Pang, Runqi; Kuo, C.-C. Jay |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | Computation and Language |
| Online Access: | https://arxiv.org/abs/2401.07475 |
| _version_ | 1866911758316208128 |
|---|---|
| author | Wei, Chengwei; Pang, Runqi; Kuo, C.-C. Jay |
| author_facet | Wei, Chengwei; Pang, Runqi; Kuo, C.-C. Jay |
| contents | As a fundamental tool for natural language processing (NLP), a part-of-speech (POS) tagger assigns a POS label to each word in a sentence. This work proposes a novel lightweight POS tagger based on word embeddings, named GWPT (green word-embedding-based POS tagger). Following the green learning (GL) methodology, GWPT contains three modules in cascade: 1) representation learning, 2) feature learning, and 3) decision learning. The main novelty of GWPT lies in representation learning: it uses non-contextual or contextual word embeddings, partitions the embedding dimension indices into low-, medium-, and high-frequency sets, and represents each set with different N-grams. Experimental results show that GWPT offers state-of-the-art accuracy with fewer model parameters and significantly lower computational complexity in both training and inference than deep-learning-based methods. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2401_07475 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| title | GWPT: A Green Word-Embedding-based POS Tagger |
| topic | Computation and Language |
| url | https://arxiv.org/abs/2401.07475 |
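
The representation-learning step summarized in the abstract (partitioning embedding dimension indices into low-, medium-, and high-frequency sets, then representing each set with a different N-gram) can be illustrated with a rough sketch. This is not the authors' implementation. The record does not say how "frequency" is measured or which N-gram goes with which set, so the sketch assumes a simple proxy (the mean absolute change of a dimension between adjacent words) and takes the context window of each set as a free parameter. All function names and the toy values are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dimension_variation(sentence: Sequence[Vector]) -> List[float]:
    """Mean absolute change of each embedding dimension between adjacent words.

    Assumption: this is only a stand-in for the "frequency" of a dimension;
    the catalog record does not define the measure used by GWPT.
    """
    dims = len(sentence[0])
    steps = len(sentence) - 1
    if steps < 1:
        return [0.0] * dims
    return [
        sum(abs(sentence[t + 1][d] - sentence[t][d]) for t in range(steps)) / steps
        for d in range(dims)
    ]


def partition_indices(
    scores: Sequence[float], n_low: int, n_high: int
) -> Tuple[List[int], List[int], List[int]]:
    """Split dimension indices into low, medium, and high sets by score."""
    order = sorted(range(len(scores)), key=lambda d: scores[d])
    low = order[:n_low]
    high = order[len(order) - n_high:]
    medium = order[n_low:len(order) - n_high]
    return low, medium, high


def ngram_features(
    sentence: Sequence[Vector],
    position: int,
    bands: Sequence[Sequence[int]],
    half_windows: Sequence[int],
    pad: float = 0.0,
) -> List[float]:
    """Concatenate, per band, the band's dimensions over a context window.

    A half-window of h covers 2*h + 1 words centred on `position`.
    Which window suits which band is a hypothetical choice left to the caller.
    """
    features: List[float] = []
    for dims, half in zip(bands, half_windows):
        for offset in range(-half, half + 1):
            t = position + offset
            in_range = 0 <= t < len(sentence)
            for d in dims:
                features.append(sentence[t][d] if in_range else pad)
    return features


# Toy 4-word sentence with 6-dimensional embeddings (made-up values).
toy_sentence = [
    [1.0, 1.0, 0.0, 0.0, 1.0, -1.0],
    [1.0, 1.1, 0.5, 0.4, -1.0, 1.0],
    [1.0, 1.0, 0.0, 0.8, 1.0, -1.0],
    [1.0, 1.1, 0.5, 0.0, -1.0, 1.0],
]
scores = dimension_variation(toy_sentence)
low, medium, high = partition_indices(scores, n_low=2, n_high=2)
features = ngram_features(toy_sentence, 1, (low, medium, high), (0, 1, 2))
```

In this sketch the feature vector for one word is the concatenation of three blocks, one per set, and its length depends on the window chosen for each set. The resulting vectors would then feed the feature-learning and decision-learning modules, which the record names but does not describe.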