Staff View :: Library Catalog


Bibliographic Details
Main Author:	Jaekeol, Choi
Format:	Digital resource
Language:
Published:	Zenodo 2026
Online Access:	https://doi.org/10.5281/zenodo.18241810

_version_	1866901885078732800
author	Jaekeol, Choi
author_facet	Jaekeol, Choi
contents	Relevance evaluation of a query and a passage is essential in Information Retrieval (IR). Recently, numerous studies have addressed relevance judgment tasks using Large Language Models (LLMs) such as GPT-4, demonstrating significant improvements. However, the efficacy of LLMs is considerably influenced by the design of the prompt. The purpose of this paper is to identify which specific terms in prompts positively or negatively impact relevance evaluation with LLMs. We employed two types of prompts: those used in previous research and those generated automatically by LLMs. By comparing the performance of these prompts in both few-shot and zero-shot settings, we analyze the influence of specific terms in the prompts. Our study yields two main findings. First, prompts using the term ‘answer’ lead to more effective relevance evaluations than those using ‘relevant.’ This indicates that a more direct approach, focusing on answering the query, tends to enhance performance. Second, the scope of ‘relevance’ must be appropriately balanced. The term ‘relevant’ can extend the scope too broadly, resulting in less precise evaluations, whereas an optimal balance in defining relevance is crucial for accurate assessments. The inclusion of few-shot examples helps define this balance more precisely: by providing clearer contexts for the term ‘relevance,’ few-shot examples help refine the relevance criteria. In conclusion, our study highlights the significance of carefully selecting terms in prompts for relevance evaluation with LLMs.
format	Digital resource
id	zenodo_https___doi_org_10_5281_zenodo_18241810
institution	Zenodo
language
publishDate	2026
publisher	Zenodo
record_format	zenodo
title	Identifying Key Terms in Prompts for Relevance Evaluation with GPT Models
url	https://doi.org/10.5281/zenodo.18241810
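The abstract contrasts prompts built around the term ‘answer’ with prompts built around ‘relevant’, each used in zero-shot and few-shot settings. The following is a minimal sketch, assuming a simple Yes/No judgment format, of how such prompt variants might be assembled so that only the key term changes between conditions. The instruction wordings, the `Example` class, the `build_prompt` function, and the "Judgment:" output slot are all hypothetical illustrations; they are not the prompts used in the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Example:
    """One labeled (query, passage) pair used as a few-shot demonstration."""
    query: str
    passage: str
    label: str  # "Yes" or "No"


# Hypothetical instruction wordings, one per key term under comparison.
# These are NOT the paper's prompts; they only isolate the term being tested.
INSTRUCTIONS = {
    "answer": "Does the passage answer the query? Reply with Yes or No.",
    "relevant": "Is the passage relevant to the query? Reply with Yes or No.",
}


def build_prompt(term: str, query: str, passage: str,
                 examples: Sequence[Example] = ()) -> str:
    """Build a zero-shot prompt (no examples) or a few-shot prompt."""
    lines = [INSTRUCTIONS[term], ""]
    # Few-shot demonstrations are placed before the target pair.
    for ex in examples:
        lines += [f"Query: {ex.query}", f"Passage: {ex.passage}",
                  f"Judgment: {ex.label}", ""]
    # The target pair; the model is expected to complete "Judgment:".
    lines += [f"Query: {query}", f"Passage: {passage}", "Judgment:"]
    return "\n".join(lines)


if __name__ == "__main__":
    shots = [Example("capital of france", "Paris is the capital of France.", "Yes")]
    for term in INSTRUCTIONS:
        for setting, demos in (("zero-shot", ()), ("few-shot", shots)):
            print(f"--- {term} / {setting} ---")
            print(build_prompt(term, "who wrote hamlet",
                               "Hamlet is a tragedy by William Shakespeare.", demos))
```

The output slot is labeled "Judgment" rather than "Answer" so that the ‘relevant’ variant does not reintroduce the very term being compared; in this sketch, the four prompts differ only in the instruction line and in whether demonstrations are present.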
