Bibliographic Details
Main Author: Wen, Yingxuan
Format: Preprint
Published: 2026
Subjects: Computation and Language; Machine Learning
Online Access: https://arxiv.org/abs/2604.14162
author Wen, Yingxuan
contents Authors often struggle to interpret peer review feedback, deriving false hope from polite comments or feeling confused by specific low scores. To investigate this, we construct a dataset of over 30,000 ICLR 2021-2025 submissions and compare acceptance prediction performance using numerical scores versus text reviews. Our experiments reveal a significant performance gap: score-based models achieve 91% accuracy, while text-based models reach only 81% even with large language models, indicating that textual information is considerably less reliable. To explain this phenomenon, we first analyze the 9% of samples that score-based models fail to predict, finding that their score distributions exhibit high kurtosis and negative skewness, which suggests that individual low scores play a decisive role in rejection even when the average score falls near the borderline. We then examine, from a review-sentiment perspective, why text-based accuracy lags so far behind score-based accuracy, revealing the prevalence of the Politeness Principle: reviews of rejected papers still contain more positive than negative sentiment words, masking the true rejection signal and making it difficult for authors to judge outcomes from text alone.
format Preprint
id arxiv_https___arxiv_org_abs_2604_14162
institution arXiv
publishDate 2026
record_format arxiv
title Decoupling Scores and Text: The Politeness Principle in Peer Review
topic Computation and Language
Machine Learning
url https://arxiv.org/abs/2604.14162
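The abstract's observation that misclassified papers have high-kurtosis, negatively skewed score distributions can be made concrete with a small calculation. The following is a minimal sketch, not the author's code: it assumes moment-based (biased) estimators for skewness and excess kurtosis, and the function name `score_moments` and both score lists are hypothetical examples on a 1-10 scale.

```python
from statistics import fmean


def score_moments(scores):
    """Return (mean, skewness, excess kurtosis) for one paper's reviewer scores.

    Hypothetical helper. Uses population (biased) central moments, matching the
    defaults of scipy.stats.skew and scipy.stats.kurtosis (Fisher definition).
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    mu = fmean(scores)
    devs = [s - mu for s in scores]
    m2 = sum(d ** 2 for d in devs) / n  # variance
    if m2 == 0:
        raise ValueError("all scores identical; shape statistics are undefined")
    m3 = sum(d ** 3 for d in devs) / n  # third central moment
    m4 = sum(d ** 4 for d in devs) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return mu, skewness, excess_kurtosis


# Hypothetical borderline paper: four positive reviews and one very low score.
outlier = [6, 6, 6, 6, 1]
# Hypothetical paper with the same mean but a symmetric spread of scores.
symmetric = [3, 5, 5, 5, 7]

print(tuple(round(x, 3) for x in score_moments(outlier)))    # (5.0, -1.5, 0.25)
print(tuple(round(x, 3) for x in score_moments(symmetric)))  # (5.0, 0.0, -0.5)
```

Both hypothetical papers have the same mean of 5.0, so a model that relies mainly on the average sees them as equally borderline. The single low score in the first list pulls the skewness negative and raises the kurtosis, which is the kind of shape signal the abstract associates with prediction failures.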