| Main Authors: | Goel, Arnav; Chitale, Pranjal A.; Paliwal, Bhawna; Santra, Bishal; Sharma, Amit |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2604.17259 |
Similar Items
Evaluating the Effectiveness and Scalability of LLM-Based Data Augmentation for Retrieval
by: Chitale, Pranjal A., et al.
Published: (2025)
Quantifying Positional Biases in Text Embedding Models
by: Lee, Reagan J., et al.
Published: (2024)
HARNESS-LM: A Three-Phase Training Recipe for Harnessing SLMs in Sponsored Search Retrieval
by: Gupta, Vipul, et al.
Published: (2026)
HIRO: Hierarchical Information Retrieval Optimization
by: Goel, Krish, et al.
Published: (2024)
OfficeQA Pro: An Enterprise Benchmark for End-to-End Grounded Reasoning
by: Opsahl-Ong, Krista, et al.
Published: (2026)
Attribution in Scientific Literature: New Benchmark and Methods
by: Saxena, Yash, et al.
Published: (2024)
LangProBe: a Language Programs Benchmark
by: Tan, Shangyin, et al.
Published: (2025)
SymTax: Symbiotic Relationship and Taxonomy Fusion for Effective Citation Recommendation
by: Goyal, Karan, et al.
Published: (2024)
Understanding the Role of User Profile in the Personalization of Large Language Models
by: Wu, Bin, et al.
Published: (2024)
MMTEB: Massive Multilingual Text Embedding Benchmark
by: Enevoldsen, Kenneth, et al.
Published: (2025)
Faux Polyglot: A Study on Information Disparity in Multilingual Large Language Models
by: Sharma, Nikhil, et al.
Published: (2024)
Deep Learning Based Named Entity Recognition Models for Recipes
by: Goel, Mansi, et al.
Published: (2024)
Evaluating Large Language Models as Generative User Simulators for Conversational Recommendation
by: Yoon, Se-eun, et al.
Published: (2024)
Memory-Based vs. Context-Only Conditioning Produces Distinct Behavioral Patterns in Stateful Personalization
by: Park, Junsoo, et al.
Published: (2026)
Domain-Partitioned Hybrid RAG for Legal Reasoning: Toward Modular and Explainable Legal AI for India
by: Goel, Rakshita, et al.
Published: (2025)
Benchmarking Prompt Sensitivity in Large Language Models
by: Razavi, Amirhossein, et al.
Published: (2025)
Using Large Language Models to Generate, Validate, and Apply User Intent Taxonomies
by: Shah, Chirag, et al.
Published: (2023)
URAG: A Benchmark for Uncertainty Quantification in Retrieval-Augmented Large Language Models
by: Nguyen, Vinh, et al.
Published: (2026)
Benchmarking Information Retrieval Models on Complex Retrieval Tasks
by: Killingback, Julian, et al.
Published: (2025)
Retrieval Models Aren't Tool-Savvy: Benchmarking Tool Retrieval for Large Language Models
by: Shi, Zhengliang, et al.
Published: (2025)
OAEI-LLM: A Benchmark Dataset for Understanding Large Language Model Hallucinations in Ontology Matching
by: Qiang, Zhangcheng, et al.
Published: (2024)
TurkColBERT: A Benchmark of Dense and Late-Interaction Models for Turkish Information Retrieval
by: Ezerceli, Özay, et al.
Published: (2025)
Evaluating Chain-of-Thought Reasoning through Reusability and Verifiability
by: Aggarwal, Shashank, et al.
Published: (2026)
Benchmarking Large Language Models on Reference Extraction and Parsing in the Social Sciences and Humanities
by: Zhu, Yurui, et al.
Published: (2026)
MizanQA: Benchmarking Large Language Models on Moroccan Legal Question Answering
by: Bahaj, Adil, et al.
Published: (2025)
GISA: A Benchmark for General Information-Seeking Assistant
by: Zhu, Yutao, et al.
Published: (2026)
RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment
by: Jin, Zhuoran, et al.
Published: (2024)
Table Meets LLM: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study
by: Sui, Yuan, et al.
Published: (2023)
ResearchArena: Benchmarking Large Language Models' Ability to Collect and Organize Information as Research Agents
by: Kang, Hao, et al.
Published: (2024)
BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval
by: Su, Hongjin, et al.
Published: (2024)
FinRetrieval: A Benchmark for Financial Data Retrieval by AI Agents
by: Kim, Eric Y., et al.
Published: (2026)
SurGE: A Benchmark and Evaluation Framework for Scientific Survey Generation
by: Su, Weihang, et al.
Published: (2025)
The Massive Legal Embedding Benchmark (MLEB)
by: Butler, Umar, et al.
Published: (2025)
Benchmarking Retrieval-Augmented Generation for Chemistry
by: Zhong, Xianrui, et al.
Published: (2025)
Exploring the Escalation of Source Bias in User, Data, and Recommender System Feedback Loop
by: Zhou, Yuqi, et al.
Published: (2024)
PaperAsk: A Benchmark for Reliability Evaluation of LLMs in Paper Search and Reading
by: Wu, Yutao, et al.
Published: (2025)
Towards Personalized Deep Research: Benchmarks and Evaluations
by: Liang, Yuan, et al.
Published: (2025)
ALARB: An Arabic Legal Argument Reasoning Benchmark
by: Shairah, Harethah Abu, et al.
Published: (2025)
Reasoning over User Preferences: Knowledge Graph-Augmented LLMs for Explainable Conversational Recommendations
by: Qiu, Zhangchi, et al.
Published: (2024)
Benchmarking Advanced Text Anonymisation Methods: A Comparative Study on Novel and Traditional Approaches
by: Asimopoulos, Dimitris, et al.
Published: (2024)