Bibliographic Details
Main Authors: de Jesus, Gabriel; Nunes, Sérgio
Format: Preprint
Published: 2024
Subjects: Information Retrieval
Online Access: https://arxiv.org/abs/2406.07299
author de Jesus, Gabriel
Nunes, Sérgio
contents The Cranfield paradigm has served as a foundational approach for developing test collections, with relevance judgments typically made by human assessors. However, the emergence of large language models (LLMs) has opened new possibilities for automating these tasks. This paper explores the feasibility of using LLMs to automate relevance assessments, particularly for low-resource languages. In our study, LLMs are given a series of query-document pairs in Tetun as input text and are tasked with assigning a relevance score to each pair. These scores are then compared with those from human annotators to measure inter-annotator agreement. Our results align closely with those reported in studies of high-resource languages.
format Preprint
id arxiv_https___arxiv_org_abs_2406_07299
institution arXiv
publishDate 2024
record_format arxiv
title Exploring Large Language Models for Relevance Judgments in Tetun
topic Information Retrieval
url https://arxiv.org/abs/2406.07299
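The abstract describes comparing LLM-assigned relevance scores with human scores to measure inter-annotator agreement, but it does not name the agreement statistic. As an illustration only, here is a minimal Python sketch of Cohen's kappa, one common choice for two raters assigning categorical labels. The function name, the 0 to 2 relevance scale and the example scores are hypothetical and are not taken from the paper.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotators' labels for the same items.

    Hypothetical helper for illustration; not the paper's code.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a.keys() | freq_b.keys()) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label for every item.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical scores on an assumed 0-2 scale (0 = not relevant, 2 = highly relevant).
human = [2, 1, 0, 2, 1, 0, 2, 2]
llm = [2, 1, 0, 1, 1, 0, 2, 2]
print(f"kappa = {cohens_kappa(human, llm):.3f}")  # prints "kappa = 0.810"
```

Unweighted kappa treats every disagreement equally; for ordinal relevance grades, a weighted variant that penalizes a 0-versus-2 disagreement more than a 1-versus-2 one may be a better fit.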