| Main Authors: | Kernycky, Andrew, Coleman, David, Spence, Christopher, Das, Udayan |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2403.15503 |
Similar Items
Text2Cypher Across Languages: Evaluating and Finetuning LLMs
by: Ozsoy, Makbule Gulcin, et al.
Published: (2025)
FreshStack: Building Realistic Benchmarks for Evaluating Retrieval on Technical Documents
by: Thakur, Nandan, et al.
Published: (2025)
Evaluating the Efficacy of Open-Source LLMs in Enterprise-Specific RAG Systems: A Comparative Study of Performance and Scalability
by: B, Gautam, et al.
Published: (2024)
Redefining Retrieval Evaluation in the Era of LLMs
by: Trappolini, Giovanni, et al.
Published: (2025)
Evaluating LLMs for Gender Disparities in Notable Persons
by: Rhue, Lauren, et al.
Published: (2024)
UserGPT Technical Report
by: Xuan, Yunyi, et al.
Published: (2026)
RecGPT Technical Report
by: Yi, Chao, et al.
Published: (2025)
Can LLMs Outshine Conventional Recommenders? A Comparative Evaluation
by: Liu, Qijiong, et al.
Published: (2025)
ThinkQE: Query Expansion via an Evolving Thinking Process
by: Lei, Yibin, et al.
Published: (2025)
Rethinking the Evaluation of Dialogue Systems: Effects of User Feedback on Crowdworkers and LLMs
by: Siro, Clemencia, et al.
Published: (2024)
LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods
by: Li, Haitao, et al.
Published: (2024)
RecGPT-V2 Technical Report
by: Yi, Chao, et al.
Published: (2025)
Evaluating Small Open LLMs for Medical Question Answering: A Practical Framework
by: Buskila, Avi-ad Avraam
Published: (2026)
BioPulse-QA: A Dynamic Biomedical Question-Answering Benchmark for Evaluating Factuality, Robustness, and Bias in Large Language Models
by: Bhattarai, Kriti, et al.
Published: (2026)
Evaluation of LLMs for Process Model Analysis and Optimization
by: Kumar, Akhil, et al.
Published: (2025)
UniRetriever: Multi-task Candidates Selection for Various Context-Adaptive Conversational Retrieval
by: Wang, Hongru, et al.
Published: (2024)
Multilingual E5 Text Embeddings: A Technical Report
by: Wang, Liang, et al.
Published: (2024)
Leveraging Large Language Models for Knowledge-free Weak Supervision in Clinical Natural Language Processing
by: Hsu, Enshuo, et al.
Published: (2024)
Large Language Models as Evaluators for Recommendation Explanations
by: Zhang, Xiaoyu, et al.
Published: (2024)
Hallucination Detection and Evaluation of Large Language Model
by: Zhang, Chenggong, et al.
Published: (2025)
Do Large Language Models Rank Fairly? An Empirical Study on the Fairness of LLMs as Rankers
by: Wang, Yuan, et al.
Published: (2024)
Evaluating Large Language Models for Cross-Lingual Retrieval
by: Zuo, Longfei, et al.
Published: (2025)
From Model-centered to Human-Centered: Revision Distance as a Metric for Text Evaluation in LLMs-based Applications
by: Ma, Yongqiang, et al.
Published: (2024)
SLIMER-IT: Zero-Shot NER on Italian Language
by: Zamai, Andrew, et al.
Published: (2024)
Team LA at SCIDOCA shared task 2025: Citation Discovery via relation-based zero-shot retrieval
by: An, Trieu, et al.
Published: (2025)
Author Unknown: Evaluating Performance of Author Extraction Libraries on Global Online News Articles
by: Hatwar, Sriharsha, et al.
Published: (2024)
Contrastive Learning Using Graph Embeddings for Domain Adaptation of Language Models in the Process Industry
by: Zhukova, Anastasia, et al.
Published: (2025)
Making Large Language Models Efficient Dense Retrievers
by: Lei, Yibin, et al.
Published: (2025)
Rethinking the Evaluation for Conversational Recommendation in the Era of Large Language Models
by: Wang, Xiaolei, et al.
Published: (2023)
MDEval: Evaluating and Enhancing Markdown Awareness in Large Language Models
by: Chen, Zhongpu, et al.
Published: (2025)
Music Recommendation with Large Language Models: Challenges, Opportunities, and Evaluation
by: Epure, Elena V., et al.
Published: (2025)
Enhancing Lexicon-Based Text Embeddings with Large Language Models
by: Lei, Yibin, et al.
Published: (2025)
Can QPP Choose the Right Query Variant? Evaluating Query Variant Selection for RAG Pipelines
by: Arabzadeh, Negar, et al.
Published: (2026)
Analyticup E-commerce Product Search Competition Technical Report from Team Tredence_AICOE
by: R, Rakshith, et al.
Published: (2025)
Shaping the Future of Endangered and Low-Resource Languages -- Our Role in the Age of LLMs: A Keynote at ECIR 2024
by: Mothe, Josiane
Published: (2024)
Corpus-Steered Query Expansion with Large Language Models
by: Lei, Yibin, et al.
Published: (2024)
PRE: A Peer Review Based Large Language Model Evaluator
by: Chu, Zhumin, et al.
Published: (2024)
A Comprehensive Evaluation of Large Language Models on Temporal Event Forecasting
by: Chang, He, et al.
Published: (2024)
Evaluating improvements on using Large Language Models (LLMs) for property extraction in the Open Research Knowledge Graph (ORKG)
by: Schaftner, Sandra
Published: (2025)
Navigating Tomorrow: Reliably Assessing Large Language Models Performance on Future Event Prediction
by: Nako, Petraq, et al.
Published: (2025)