Staff View: Leveraging Large Language Models for Fuzzy String Matching in Political Science :: Library Catalog

Bibliographic Details
Main Author:	Wang, Yu
Format:	Preprint
Published:	2024
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2403.18218

_version_	1866910386125537280
author	Wang, Yu
author_facet	Wang, Yu
contents	Fuzzy string matching remains a key issue when political scientists combine data from different sources. Existing matching methods invariably rely on string distances, such as Levenshtein distance and cosine similarity. As such, they are inherently incapable of matching strings that refer to the same entity with different names such as "JP Morgan" and "Chase Bank", "DPRK" and "North Korea", "Chuck Fleischmann (R)" and "Charles Fleischmann (R)". In this letter, we propose to use large language models to entirely sidestep this problem in an easy and intuitive manner. Extensive experiments show that our proposed methods can improve the state of the art by as much as 39% in terms of average precision while being substantially easier and more intuitive to use by political scientists. Moreover, our results are robust against various temperatures. We further note that enhanced prompting can lead to additional performance improvements.
format	Preprint
id	arxiv_https___arxiv_org_abs_2403_18218
institution	arXiv
publishDate	2024
record_format	arxiv
title	Leveraging Large Language Models for Fuzzy String Matching in Political Science
topic	Artificial Intelligence
url	https://arxiv.org/abs/2403.18218
