Bibliographic Details
Main Authors: Thorne, William, James, Joseph, Wang, Yang, Lin, Chenghua, Maynard, Diana
Format: Preprint
Published: 2026
Subjects: Computation and Language; Artificial Intelligence; Computers and Society
Online Access: https://arxiv.org/abs/2603.08281
author Thorne, William
James, Joseph
Wang, Yang
Lin, Chenghua
Maynard, Diana
contents As AI-assisted grant proposals outpace manual review capacity in a kind of "Malthusian trap" for the research ecosystem, this paper investigates the capabilities and limitations of LLM-based grant reviewing for high-stakes evaluation. Using six EPSRC proposals, we develop a perturbation-based framework probing LLM sensitivity across six quality axes: funding, timeline, competency, alignment, clarity, and impact. We compare three review architectures: single-pass review, section-by-section analysis, and a "Council of Personas" ensemble emulating expert panels. The section-level approach significantly outperforms alternatives in both detection rate and scoring reliability, while the computationally expensive council method performs no better than the baseline. Detection varies substantially by perturbation type, with alignment issues readily identified but clarity flaws largely missed by all systems. Human evaluation shows LLM feedback is largely valid but skewed toward compliance checking over holistic assessment. We conclude that current LLMs may provide supplementary value within EPSRC review but exhibit high variability and misaligned review priorities. We release our code and any non-protected data.
format Preprint
id arxiv_https___arxiv_org_abs_2603_08281
institution arXiv
publishDate 2026
record_format arxiv
title Evaluating LLM-Based Grant Proposal Review via Structured Perturbations
topic Computation and Language
Artificial Intelligence
Computers and Society
I.2.7; J.1
url https://arxiv.org/abs/2603.08281