Bibliographic Details
Main Authors: Benescu, Matei; de Jong, Ivo Pascal
Format: Preprint
Published: 2026
Subjects: Information Retrieval
Online Access: https://arxiv.org/abs/2603.08077
author Benescu, Matei
de Jong, Ivo Pascal
contents With the emergence of Large Language Models (LLMs), new methods in Information Retrieval are available in which relevance is estimated directly through language understanding and reasoning, instead of embedding similarity. We argue that similarity is a short-sighted interpretation of relevance, and that LLM-Based Relevance Judgment Systems (LLM-RJS) with reasoning have the potential to outperform Neural Embedding Retrieval Systems (NERS) by overcoming this limitation. Using the TREC-DL 2019 passage retrieval dataset, we compare various LLM-RJS with NERS, but observe no noticeable improvement. Subsequently, we analyze the impact of reasoning by comparing LLM-RJS with and without reasoning. We find that human annotations also suffer from short-sightedness, and that false positives in the reasoning LLM-RJS are primarily mistakes in annotations due to short-sightedness. We conclude that LLM-RJS do have the ability to address the short-sightedness limitation in NERS, but that this cannot be evaluated with standard annotated relevance datasets.
format Preprint
id arxiv_https___arxiv_org_abs_2603_08077
institution arXiv
publishDate 2026
record_format arxiv
title Why Large Language Models can Secretly Outperform Embedding Similarity in Information Retrieval
topic Information Retrieval
url https://arxiv.org/abs/2603.08077