Staff View: Why Large Language Models can Secretly Outperform Embedding Similarity in Information Retrieval :: Library Catalog

Bibliographic Details
Main Authors:	Benescu, Matei; de Jong, Ivo Pascal
Format:	Preprint
Published:	2026
Subjects:	Information Retrieval
Online Access:	https://arxiv.org/abs/2603.08077

_version_	1866918379884904448
author	Benescu, Matei; de Jong, Ivo Pascal
author_facet	Benescu, Matei; de Jong, Ivo Pascal
contents	With the emergence of Large Language Models (LLMs), new methods in Information Retrieval are available in which relevance is estimated directly through language understanding and reasoning, instead of embedding similarity. We argue that similarity is a short-sighted interpretation of relevance, and that LLM-Based Relevance Judgment Systems (LLM-RJS) (with reasoning) have the potential to outperform Neural Embedding Retrieval Systems (NERS) by overcoming this limitation. Using the TREC-DL 2019 passage retrieval dataset, we compare various LLM-RJS with NERS, but observe no noticeable improvement. Subsequently, we analyze the impact of reasoning by comparing LLM-RJS with and without reasoning. We find that human annotations also suffer from short-sightedness, and that false positives in the reasoning LLM-RJS are primarily mistakes in annotations due to short-sightedness. We conclude that LLM-RJS do have the ability to address the short-sightedness limitation in NERS, but that this cannot be evaluated with standard annotated relevance datasets.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_08077
institution	arXiv
publishDate	2026
record_format	arxiv
title	Why Large Language Models can Secretly Outperform Embedding Similarity in Information Retrieval
topic	Information Retrieval
url	https://arxiv.org/abs/2603.08077

Similar Items