Bibliographic Details
Main Authors: Luo, Hanjun, Jin, Yingbin, Li, Xinfeng, Liu, Xuecheng, Chen, Ruizhe, Shang, Tong, Wang, Kun, Wen, Qingsong, Liu, Zuozhu
Format: Preprint
Published: 2024
Subjects: Computation and Language, Artificial Intelligence
Online Access: https://arxiv.org/abs/2409.11022
_version_ 1866914568111915008
author Luo, Hanjun
Jin, Yingbin
Li, Xinfeng
Liu, Xuecheng
Chen, Ruizhe
Shang, Tong
Wang, Kun
Wen, Qingsong
Liu, Zuozhu
contents Advances in Large Language Models (LLMs) have spurred growing interest in applying them to Named Entity Recognition (NER). However, existing datasets were designed primarily for traditional machine learning methods and are inadequate for LLM-based methods in both corpus selection and overall dataset design. Moreover, the fixed and relatively coarse-grained entity categorization prevalent in existing datasets cannot adequately assess the generalization and contextual understanding capabilities of LLM-based methods, which hinders a comprehensive demonstration of their broad application prospects. To address these limitations, we propose DynamicNER, the first NER dataset designed for LLM-based methods with dynamic categorization: the same entity can receive different entity types and entity type lists in different contexts, which better leverages the generalization of LLM-based NER. The dataset is also multilingual and multi-granular, covering 8 languages and 155 entity types, with corpora spanning a diverse range of domains. Furthermore, we introduce CascadeNER, a novel NER method based on a two-stage strategy and lightweight LLMs, which achieves higher accuracy on fine-grained tasks while requiring fewer computational resources. Experiments show that DynamicNER serves as a robust and effective benchmark for LLM-based NER methods. We also analyze traditional and LLM-based methods on our dataset. Our code and dataset are openly available at https://github.com/Astarojth/DynamicNER.
format Preprint
id arxiv_https___arxiv_org_abs_2409_11022
institution arXiv
publishDate 2024
record_format arxiv
title DynamicNER: A Dynamic, Multilingual, and Fine-Grained Dataset for LLM-based Named Entity Recognition
topic Computation and Language
Artificial Intelligence
url https://arxiv.org/abs/2409.11022