Staff View: GUM-SAGE: A Novel Dataset and Approach for Graded Entity Salience Prediction :: Library Catalog

Bibliographic Details
Main Authors:	Lin, Jessica; Zeldes, Amir
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2504.10792

_version_	1866918038675128320
author	Lin, Jessica; Zeldes, Amir
author_facet	Lin, Jessica; Zeldes, Amir
contents	Determining and ranking the most salient entities in a text is critical for user-facing systems, especially as users increasingly rely on models to interpret long documents they only partially read. Graded entity salience addresses this need by assigning entities scores that reflect their relative importance in a text. Existing approaches fall into two main categories: subjective judgments of salience, which allow for gradient scoring but lack consistency, and summarization-based methods, which define salience as mention-worthiness in a summary, promoting explainability but limiting outputs to binary labels (entities are either summary-worthy or not). In this paper, we introduce a novel approach for graded entity salience that combines the strengths of both approaches. Using an English dataset spanning 12 spoken and written genres, we collect 5 summaries per document and calculate each entity's salience score based on its presence across these summaries. Our approach shows stronger correlation with scores based on human summaries and alignments, and outperforms existing techniques, including LLMs. We release our data and code at https://github.com/jl908069/gum_sum_salience to support further research on graded salient entity extraction.
format	Preprint
id	arxiv_https___arxiv_org_abs_2504_10792
institution	arXiv
publishDate	2025
record_format	arxiv
title	GUM-SAGE: A Novel Dataset and Approach for Graded Entity Salience Prediction
topic	Computation and Language
url	https://arxiv.org/abs/2504.10792
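
Illustration of the scoring idea in the contents field: each entity's salience score is based on its presence across the 5 summaries collected for a document. The Python sketch below is a minimal reading of that sentence, not the authors' implementation (their code is at the repository named in the record). The function name `graded_salience`, the case-insensitive substring matching, and the toy summaries are all assumptions made for illustration; the record does not say how entity mentions are matched to summaries.

```python
from typing import Dict, Iterable, List


def graded_salience(entities: Iterable[str], summaries: List[str]) -> Dict[str, int]:
    """Return, for each entity, the number of summaries that mention it.

    Assumption: a "mention" is a case-insensitive substring match. A real
    system would need coreference or alignment rather than string matching.
    """
    lowered = [summary.lower() for summary in summaries]
    return {
        entity: sum(entity.lower() in summary for summary in lowered)
        for entity in entities
    }


# Toy data, invented for this example: five summaries of one document.
summaries = [
    "Marie Curie discovered polonium.",
    "Marie Curie won two Nobel Prizes.",
    "The text profiles Marie Curie and her work on polonium.",
    "Marie Curie studied radioactivity in Paris.",
    "A biography of Marie Curie.",
]

scores = graded_salience(["Marie Curie", "polonium", "Paris"], summaries)
for entity, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{entity}\t{score}/{len(summaries)}")
```

With 5 summaries the score ranges from 0 to 5, which gives the graded (rather than binary) ranking the abstract describes: an entity found in every summary ranks above one found in only one.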

Similar Items