| Main Authors: | Arabzadeh, Negar, Clarke, Charles L. A. |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2401.17543 |
Similar Items
Generative Information Retrieval Evaluation
by: Alaofi, Marwah, et al.
Published: (2024)
Adapting Standard Retrieval Benchmarks to Evaluate Generated Answers
by: Arabzadeh, Negar, et al.
Published: (2024)
A Comparison of Methods for Evaluating Generative IR
by: Arabzadeh, Negar, et al.
Published: (2024)
Benchmarking LLM-based Relevance Judgment Methods
by: Arabzadeh, Negar, et al.
Published: (2025)
A Human-AI Comparative Analysis of Prompt Sensitivity in LLM-Based Relevance Judgment
by: Arabzadeh, Negar, et al.
Published: (2025)
Offline Evaluation of Set-Based Text-to-Image Generation
by: Arabzadeh, Negar, et al.
Published: (2024)
EMPRA: Embedding Perturbation Rank Attack against Neural Ranking Models
by: Bigdeli, Amin, et al.
Published: (2024)
Adversarial Attacks against Neural Ranking Models via In-Context Learning
by: Bigdeli, Amin, et al.
Published: (2025)
ReFormeR: Learning and Applying Explicit Query Reformulation Patterns
by: Bigdeli, Amin, et al.
Published: (2026)
Optimal Dataset Size for Recommender Systems: Evaluating Algorithms' Performance via Downsampling
by: Arabzadeh, Ardalan
Published: (2025)
QueryGym: A Toolkit for Reproducible LLM-Based Query Reformulation
by: Bigdeli, Amin, et al.
Published: (2025)
Can QPP Choose the Right Query Variant? Evaluating Query Variant Selection for RAG Pipelines
by: Arabzadeh, Negar, et al.
Published: (2026)
A Reproducibility Study of LLM-Based Query Reformulation
by: Bigdeli, Amin, et al.
Published: (2026)
RAG over Thinking Traces Can Improve Reasoning Tasks
by: Arabzadeh, Negar, et al.
Published: (2026)
exHarmony: Authorship and Citations for Benchmarking the Reviewer Assignment Problem
by: Ebrahimi, Sajad, et al.
Published: (2025)
Evaluating the Robustness of Retrieval-Augmented Generation to Adversarial Evidence in the Health Domain
by: Amirshahi, Shakiba, et al.
Published: (2025)
Report on the 1st Workshop on Large Language Model for Evaluation in Information Retrieval (LLM4Eval 2024) at SIGIR 2024
by: Rahmani, Hossein A., et al.
Published: (2024)
Green Recommender Systems: Optimizing Dataset Size for Energy-Efficient Algorithm Performance
by: Arabzadeh, Ardalan, et al.
Published: (2024)
Annotative Indexing
by: Clarke, Charles L. A.
Published: (2024)
Resources for Automated Evaluation of Assistive RAG Systems that Help Readers with News Trustworthiness Assessment
by: Zhang, Dake, et al.
Published: (2026)
Benchmarking Prompt Sensitivity in Large Language Models
by: Razavi, Amirhossein, et al.
Published: (2025)
Peerispect: Claim Verification in Scientific Peer Reviews
by: Ghorbanpour, Ali, et al.
Published: (2026)
Offline Evaluation Measures of Fairness in Recommender Systems
by: Rampisela, Theresia Veronika
Published: (2026)
WildClaims: Information Access Conversations in the Wild(Chat)
by: Joko, Hideaki, et al.
Published: (2025)
Towards Robust Offline Evaluation: A Causal and Information Theoretic Framework for Debiasing Ranking Systems
by: Khatami, Seyedeh Baharan, et al.
Published: (2025)
Human-Computer Interaction as a basis for assessing Geographic Information Retrieval Systems.
by: Manuel Enrique Puebla Martínez
Published: (2018)
COS-Mix: Cosine Similarity and Distance Fusion for Improved Information Retrieval
by: Juvekar, Kush, et al.
Published: (2024)
Beyond Utility: Evaluating LLM as Recommender
by: Jiang, Chumeng, et al.
Published: (2024)
Online and Offline Evaluation in Search Clarification
by: Tavakoli, Leila, et al.
Published: (2024)
Ranked List Truncation for Large Language Model-based Re-Ranking
by: Meng, Chuan, et al.
Published: (2024)
Query Performance Prediction using Relevance Judgments Generated by Large Language Models
by: Meng, Chuan, et al.
Published: (2024)
Computerized Information Storage and Retrieval Systems.
by: Azubuike, Abraham A., et al.
Published: (1988)
On the Reliability of Sampling Strategies in Offline Recommender Evaluation
by: Pereira, Bruno L., et al.
Published: (2025)
LLM-based relevance assessment still can't replace human relevance assessment
by: Clarke, Charles L. A., et al.
Published: (2024)
Simple Domain Adaptation for Sparse Retrievers
by: Vast, Mathias, et al.
Published: (2024)
SPLATE: Sparse Late Interaction Retrieval
by: Formal, Thibault, et al.
Published: (2024)
Replicability Measures for Longitudinal Information Retrieval Evaluation
by: Keller, Jüri, et al.
Published: (2024)
Interactions with Generative Information Retrieval Systems
by: Aliannejadi, Mohammad, et al.
Published: (2024)
A Universal Framework for Offline Serendipity Evaluation in Recommender Systems via Large Language Models
by: Tokutake, Yu, et al.
Published: (2025)
Efficiency Optimizations for Superblock-based Sparse Retrieval
by: Carlson, Parker, et al.
Published: (2026)