Affichage MARC: :: Library Catalog

Enregistré dans:

Détails bibliographiques
Auteurs principaux:	Zhang, Wentao, Zhang, Qi, Xu, Mingkun, You, Mu, Shen, Henghua, He, Zhongzhi, Jin, Keyan, Wong, Derek F., Fang, Tao
Format:	Preprint
Publié:	2026
Sujets:	Computation and Language Artificial Intelligence Computer Vision and Pattern Recognition
Accès en ligne:	https://arxiv.org/abs/2604.23701
Tags:	Ajouter un tag Pas de tags, Soyez le premier à ajouter un tag!

_version_	1866911623336165376
author	Zhang, Wentao Zhang, Qi Xu, Mingkun You, Mu Shen, Henghua He, Zhongzhi Jin, Keyan Wong, Derek F. Fang, Tao
author_facet	Zhang, Wentao Zhang, Qi Xu, Mingkun You, Mu Shen, Henghua He, Zhongzhi Jin, Keyan Wong, Derek F. Fang, Tao
contents	Crop disease diagnosis from field photographs faces two recurring problems: models that score well on benchmarks frequently hallucinate species names, and when predictions are correct, the reasoning behind them is typically inaccessible to the practitioner. This paper describes Agri-CPJ (Caption-Prompt-Judge), a training-free few-shot framework in which a large vision-language model first generates a structured morphological caption, iteratively refined through multi-dimensional quality gating, before any diagnostic question is answered. Two candidate responses are then generated from complementary viewpoints, and an LLM judge selects the stronger one based on domain-specific criteria. Caption refinement is the component with the largest individual impact: ablations confirm that skipping it consistently degrades downstream accuracy across both models tested. On CDDMBench, pairing GPT-5-Nano with GPT-5-mini-generated captions yields \textbf{+22.7} pp in disease classification and \textbf{+19.5} points in QA score over no-caption baselines. Evaluated without modification on AgMMU-MCQs, GPT-5-Nano reached 77.84\% and Qwen-VL-Chat reached 64.54\%, placing them at or above most open-source models of comparable scale despite the format shift from open-ended to multiple-choice. The structured caption and judge rationale together constitute a readable audit trail: a practitioner who disagrees with a diagnosis can identify the specific caption observation that was incorrect. Code and data are publicly available https://github.com/CPJ-Agricultural/CPJ-Agricultural-Diagnosis
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_23701
institution	arXiv
publishDate	2026
record_format	arxiv
spellingShingle	Agri-CPJ: A Training-Free Explainable Framework for Agricultural Pest Diagnosis Using Caption-Prompt-Judge and LLM-as-a-Judge Zhang, Wentao Zhang, Qi Xu, Mingkun You, Mu Shen, Henghua He, Zhongzhi Jin, Keyan Wong, Derek F. Fang, Tao Computation and Language Artificial Intelligence Computer Vision and Pattern Recognition Crop disease diagnosis from field photographs faces two recurring problems: models that score well on benchmarks frequently hallucinate species names, and when predictions are correct, the reasoning behind them is typically inaccessible to the practitioner. This paper describes Agri-CPJ (Caption-Prompt-Judge), a training-free few-shot framework in which a large vision-language model first generates a structured morphological caption, iteratively refined through multi-dimensional quality gating, before any diagnostic question is answered. Two candidate responses are then generated from complementary viewpoints, and an LLM judge selects the stronger one based on domain-specific criteria. Caption refinement is the component with the largest individual impact: ablations confirm that skipping it consistently degrades downstream accuracy across both models tested. On CDDMBench, pairing GPT-5-Nano with GPT-5-mini-generated captions yields \textbf{+22.7} pp in disease classification and \textbf{+19.5} points in QA score over no-caption baselines. Evaluated without modification on AgMMU-MCQs, GPT-5-Nano reached 77.84\% and Qwen-VL-Chat reached 64.54\%, placing them at or above most open-source models of comparable scale despite the format shift from open-ended to multiple-choice. The structured caption and judge rationale together constitute a readable audit trail: a practitioner who disagrees with a diagnosis can identify the specific caption observation that was incorrect. Code and data are publicly available https://github.com/CPJ-Agricultural/CPJ-Agricultural-Diagnosis
title	Agri-CPJ: A Training-Free Explainable Framework for Agricultural Pest Diagnosis Using Caption-Prompt-Judge and LLM-as-a-Judge
topic	Computation and Language Artificial Intelligence Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2604.23701

Documents similaires