:: Library Catalog

Bibliographic Details
Main Authors:	Goel, Arnav; Chitale, Pranjal A.; Paliwal, Bhawna; Santra, Bishal; Sharma, Amit
Format:	Preprint
Published:	2026
Subjects:	Information Retrieval; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2604.17259

Similar Items

Evaluating the Effectiveness and Scalability of LLM-Based Data Augmentation for Retrieval
by: Chitale, Pranjal A., et al.
Published: (2025)

Quantifying Positional Biases in Text Embedding Models
by: Lee, Reagan J., et al.
Published: (2024)

HARNESS-LM: A Three-Phase Training Recipe for Harnessing SLMs in Sponsored Search Retrieval
by: Gupta, Vipul, et al.
Published: (2026)

HIRO: Hierarchical Information Retrieval Optimization
by: Goel, Krish, et al.
Published: (2024)

OfficeQA Pro: An Enterprise Benchmark for End-to-End Grounded Reasoning
by: Opsahl-Ong, Krista, et al.
Published: (2026)

Attribution in Scientific Literature: New Benchmark and Methods
by: Saxena, Yash, et al.
Published: (2024)

LangProBe: a Language Programs Benchmark
by: Tan, Shangyin, et al.
Published: (2025)

SymTax: Symbiotic Relationship and Taxonomy Fusion for Effective Citation Recommendation
by: Goyal, Karan, et al.
Published: (2024)

Understanding the Role of User Profile in the Personalization of Large Language Models
by: Wu, Bin, et al.
Published: (2024)

MMTEB: Massive Multilingual Text Embedding Benchmark
by: Enevoldsen, Kenneth, et al.
Published: (2025)

Faux Polyglot: A Study on Information Disparity in Multilingual Large Language Models
by: Sharma, Nikhil, et al.
Published: (2024)

Deep Learning Based Named Entity Recognition Models for Recipes
by: Goel, Mansi, et al.
Published: (2024)

Evaluating Large Language Models as Generative User Simulators for Conversational Recommendation
by: Yoon, Se-eun, et al.
Published: (2024)

Memory-Based vs. Context-Only Conditioning Produces Distinct Behavioral Patterns in Stateful Personalization
by: Park, Junsoo, et al.
Published: (2026)

Domain-Partitioned Hybrid RAG for Legal Reasoning: Toward Modular and Explainable Legal AI for India
by: Goel, Rakshita, et al.
Published: (2025)

Benchmarking Prompt Sensitivity in Large Language Models
by: Razavi, Amirhossein, et al.
Published: (2025)

Using Large Language Models to Generate, Validate, and Apply User Intent Taxonomies
by: Shah, Chirag, et al.
Published: (2023)

URAG: A Benchmark for Uncertainty Quantification in Retrieval-Augmented Large Language Models
by: Nguyen, Vinh, et al.
Published: (2026)

Benchmarking Information Retrieval Models on Complex Retrieval Tasks
by: Killingback, Julian, et al.
Published: (2025)

Retrieval Models Aren't Tool-Savvy: Benchmarking Tool Retrieval for Large Language Models
by: Shi, Zhengliang, et al.
Published: (2025)

OAEI-LLM: A Benchmark Dataset for Understanding Large Language Model Hallucinations in Ontology Matching
by: Qiang, Zhangcheng, et al.
Published: (2024)

TurkColBERT: A Benchmark of Dense and Late-Interaction Models for Turkish Information Retrieval
by: Ezerceli, Özay, et al.
Published: (2025)

Evaluating Chain-of-Thought Reasoning through Reusability and Verifiability
by: Aggarwal, Shashank, et al.
Published: (2026)

Benchmarking Large Language Models on Reference Extraction and Parsing in the Social Sciences and Humanities
by: Zhu, Yurui, et al.
Published: (2026)

MizanQA: Benchmarking Large Language Models on Moroccan Legal Question Answering
by: Bahaj, Adil, et al.
Published: (2025)

GISA: A Benchmark for General Information-Seeking Assistant
by: Zhu, Yutao, et al.
Published: (2026)

RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment
by: Jin, Zhuoran, et al.
Published: (2024)

Table Meets LLM: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study
by: Sui, Yuan, et al.
Published: (2023)

ResearchArena: Benchmarking Large Language Models' Ability to Collect and Organize Information as Research Agents
by: Kang, Hao, et al.
Published: (2024)

BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval
by: Su, Hongjin, et al.
Published: (2024)

FinRetrieval: A Benchmark for Financial Data Retrieval by AI Agents
by: Kim, Eric Y., et al.
Published: (2026)

SurGE: A Benchmark and Evaluation Framework for Scientific Survey Generation
by: Su, Weihang, et al.
Published: (2025)

The Massive Legal Embedding Benchmark (MLEB)
by: Butler, Umar, et al.
Published: (2025)

Benchmarking Retrieval-Augmented Generation for Chemistry
by: Zhong, Xianrui, et al.
Published: (2025)

Exploring the Escalation of Source Bias in User, Data, and Recommender System Feedback Loop
by: Zhou, Yuqi, et al.
Published: (2024)

PaperAsk: A Benchmark for Reliability Evaluation of LLMs in Paper Search and Reading
by: Wu, Yutao, et al.
Published: (2025)

Towards Personalized Deep Research: Benchmarks and Evaluations
by: Liang, Yuan, et al.
Published: (2025)

ALARB: An Arabic Legal Argument Reasoning Benchmark
by: Shairah, Harethah Abu, et al.
Published: (2025)

Reasoning over User Preferences: Knowledge Graph-Augmented LLMs for Explainable Conversational Recommendations
by: Qiu, Zhangchi, et al.
Published: (2024)

Benchmarking Advanced Text Anonymisation Methods: A Comparative Study on Novel and Traditional Approaches
by: Asimopoulos, Dimitris, et al.
Published: (2024)