Bibliographic Details
Main Authors:	Brown-Cohen, Jonah; Irving, Geoffrey; Piliouras, Georgios
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence; Computational Complexity; Data Structures and Algorithms
Online Access:	https://arxiv.org/abs/2506.13609

Table of Contents:

Training powerful AI systems to exhibit desired behaviors hinges on the ability to provide accurate human supervision on increasingly complex tasks. A promising approach to this problem is to amplify human judgement by leveraging the power of two competing AIs in a debate about the correct solution to a given problem. Prior theoretical work has provided a complexity-theoretic formalization of AI debate, and posed the problem of designing protocols for AI debate that guarantee the correctness of human judgements for as complex a class of problems as possible. Recursive debates, in which debaters decompose a complex problem into simpler subproblems, hold promise for growing the class of problems that can be accurately judged in a debate. However, existing protocols for recursive debate run into the obfuscated arguments problem: a dishonest debater can use a computationally efficient strategy that forces an honest opponent to solve a computationally intractable problem to win. We mitigate this problem with a new recursive debate protocol that, under certain stability assumptions, ensures that an honest debater can win with a strategy requiring computational efficiency comparable to their opponent.

Similar Items