Staff View: Prompting a Weighting Mechanism into LLM-as-a-Judge in Two-Step: A Case Study :: Library Catalog

Bibliographic Details
Main Authors:	Xie, Wenwen; Gwizdz, Gray; Feng, Dongji
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2502.13396

_version_	1866915160671649792
author	Xie, Wenwen Gwizdz, Gray Feng, Dongji
author_facet	Xie, Wenwen Gwizdz, Gray Feng, Dongji
contents	While Large Language Models (LLMs) have emerged as promising tools for evaluating Natural Language Generation (NLG) tasks, their effectiveness is limited by their inability to appropriately weigh the importance of different topics: they often overemphasize minor details while undervaluing critical information, which leads to misleading assessments. Our work proposes an efficient prompt design mechanism to address this specific limitation and provides a case study. Through strategic prompt engineering that incorporates explicit importance weighting mechanisms, we enhance the ability of an LLM-as-a-Judge to prioritize relevant information effectively, as demonstrated by an average improvement of 6% in the Human Alignment Rate (HAR) metric.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_13396
institution	arXiv
publishDate	2025
record_format	arxiv
title	Prompting a Weighting Mechanism into LLM-as-a-Judge in Two-Step: A Case Study
topic	Computation and Language
url	https://arxiv.org/abs/2502.13396
