Bibliographic Details
Main Authors: Rahman, Salman; Issaka, Sheriff; Suvarna, Ashima; Liu, Genglin; Shiffer, James; Lee, Jaeyoung; Parvez, Md Rizwan; Palangi, Hamid; Feng, Shi; Peng, Nanyun; Choi, Yejin; Michael, Julian; Jiang, Liwei; Gabriel, Saadia
Format: Preprint
Published: 2025
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2506.02175
author Rahman, Salman
Issaka, Sheriff
Suvarna, Ashima
Liu, Genglin
Shiffer, James
Lee, Jaeyoung
Parvez, Md Rizwan
Palangi, Hamid
Feng, Shi
Peng, Nanyun
Choi, Yejin
Michael, Julian
Jiang, Liwei
Gabriel, Saadia
contents As AI grows more powerful, it will increasingly shape how we understand the world. But with this influence comes the risk of amplifying misinformation and deepening social divides, especially on consequential topics where factual accuracy directly impacts well-being. Scalable Oversight aims to ensure AI systems remain truthful even when their capabilities exceed those of their evaluators. Yet when humans serve as evaluators, their own beliefs and biases can impair judgment. We study whether AI debate can guide biased judges toward the truth by having two AI systems debate opposing sides of controversial factuality claims on COVID-19 and climate change, topics on which people hold strong prior beliefs. We conduct two studies. Study I recruits human judges with either mainstream or skeptical beliefs who evaluate claims through two protocols: debate (interaction with two AI advisors arguing opposing sides) or consultancy (interaction with a single AI advisor). Study II uses AI judges with and without human-like personas to evaluate the same protocols. In Study I, debate consistently improves human judgment accuracy and confidence calibration, outperforming consultancy by 4-10% across COVID-19 and climate change claims. The improvement is most significant for judges with mainstream beliefs (up to +15.2% accuracy on COVID-19 claims), though debate also helps skeptical judges who initially misjudge claims move toward accurate views (+4.7% accuracy). In Study II, AI judges with human-like personas achieve even higher accuracy (78.5%) than human judges (70.1%) and default AI judges without personas (69.8%), suggesting their potential for supervising frontier AI models. These findings highlight AI debate as a promising path toward scalable, bias-resilient oversight in contested domains.
format Preprint
id arxiv_https___arxiv_org_abs_2506_02175
institution arXiv
publishDate 2025
record_format arxiv
title AI Debate Aids Assessment of Controversial Claims
topic Computation and Language
url https://arxiv.org/abs/2506.02175