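| Main authors: | Sabr, Shadan Shukr; Mustafa, Nazira Sabr; Omar, Talar Sabah; Rasool, Salah Hwayyiz; Omer, Nawzad Anwer; Hamad, Darya Sabir; Shams, Hemin Abdulhameed; Kareem, Omer Mahmood; Abdullah, Rozhan Noori; Abdullah, Khabat Atar; Mohammad, Mahabad Azad; Al-Raghefy, Haneen; Asaad, Safar M.; Mohammed, Sara Jamal; Ali, Twana Saeed; Shawrow, Fazil; Maghdid, Halgurd S. |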
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Computation and Language; Artificial Intelligence |
| Online access: | https://arxiv.org/abs/2504.19645 |

| Field | Value |
|---|---|
| _version_ | 1866909595898740736 |
| author | Sabr, Shadan Shukr; Mustafa, Nazira Sabr; Omar, Talar Sabah; Rasool, Salah Hwayyiz; Omer, Nawzad Anwer; Hamad, Darya Sabir; Shams, Hemin Abdulhameed; Kareem, Omer Mahmood; Abdullah, Rozhan Noori; Abdullah, Khabat Atar; Mohammad, Mahabad Azad; Al-Raghefy, Haneen; Asaad, Safar M.; Mohammed, Sara Jamal; Ali, Twana Saeed; Shawrow, Fazil; Maghdid, Halgurd S. |
| contents | The field of natural language processing (NLP) has expanded dramatically over the last decade. Many everyday applications rely on NLP tasks such as machine translation, speech recognition, text generation and recommendation, part-of-speech (POS) tagging, and named-entity recognition (NER). However, low-resource languages such as the Central-Kurdish language (CKL) remain largely unexamined because the resources needed to support their development are scarce. POS tagging underpins other NLP tasks; for example, a POS tagset is used to standardize a language and to describe the relationships between words in a sentence, which tasks such as machine translation and text recommendation then build on. For the CKL specifically, most of the POS tagsets in use are neither standardized nor comprehensive. To this end, this study presents an accurate and comprehensive POS tagset for the CKL to improve the performance of Kurdish NLP tasks. The article also collects most of the POS tags from different studies, as well as from Kurdish linguistic experts, to standardize the part-of-speech tags. The proposed POS tagset is designed to annotate a large CKL corpus and support Kurdish NLP tasks. Initial investigations, which compare the tagset with the Universal Dependencies framework for standard languages, show that the proposed POS tagset can streamline or correct sentences more accurately for Kurdish NLP tasks. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2504_19645 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | A Comprehensive Part-of-Speech Tagging to Standardize Central-Kurdish Language: A Research Guide for Kurdish Natural Language Processing Tasks |
| topic | Computation and Language; Artificial Intelligence; K.5; K.7; J.7 |
| url | https://arxiv.org/abs/2504.19645 |