Staff View :: Library Catalog

Bibliographic Details
Main Authors:	Ye, Rui; Pang, Xianghe; Chai, Jingyi; Chen, Jiaao; Yin, Zhenfei; Xiang, Zhen; Dong, Xiaowen; Shao, Jing; Chen, Siheng
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Artificial Intelligence; Human-Computer Interaction; Machine Learning
Online Access:	https://arxiv.org/abs/2412.01708

_version_	1866912140940541952
author	Ye, Rui; Pang, Xianghe; Chai, Jingyi; Chen, Jiaao; Yin, Zhenfei; Xiang, Zhen; Dong, Xiaowen; Shao, Jing; Chen, Siheng
author_facet	Ye, Rui; Pang, Xianghe; Chai, Jingyi; Chen, Jiaao; Yin, Zhenfei; Xiang, Zhen; Dong, Xiaowen; Shao, Jing; Chen, Siheng
contents	Scholarly peer review is a cornerstone of scientific advancement, but the system is under strain due to increasing manuscript submissions and the labor-intensive nature of the process. Recent advancements in large language models (LLMs) have led to their integration into peer review, with promising results such as substantial overlaps between LLM- and human-generated reviews. However, the unchecked adoption of LLMs poses significant risks to the integrity of the peer review system. In this study, we comprehensively analyze the vulnerabilities of LLM-generated reviews by focusing on manipulation and inherent flaws. Our experiments show that injecting covert deliberate content into manuscripts allows authors to explicitly manipulate LLM reviews, leading to inflated ratings and reduced alignment with human reviews. In a simulation, we find that manipulating 5% of the reviews could potentially cause 12% of the papers to lose their position in the top 30% rankings. Implicit manipulation, where authors strategically highlight minor limitations in their papers, further demonstrates LLMs' susceptibility compared to human reviewers, with a 4.5 times higher consistency with disclosed limitations. Additionally, LLMs exhibit inherent flaws, such as potentially assigning higher ratings to incomplete papers compared to full papers and favoring well-known authors in single-blind review process. These findings highlight the risks of over-reliance on LLMs in peer review, underscoring that we are not yet ready for widespread adoption and emphasizing the need for robust safeguards.
format	Preprint
id	arxiv_https___arxiv_org_abs_2412_01708
institution	arXiv
publishDate	2024
record_format	arxiv
title	Are We There Yet? Revealing the Risks of Utilizing Large Language Models in Scholarly Peer Review
topic	Computation and Language; Artificial Intelligence; Human-Computer Interaction; Machine Learning
url	https://arxiv.org/abs/2412.01708

Similar Items