Bibliographic Details
Main Authors: Pan, Yuqi; Zhao, Sadie; Tambe, Milind; Chen, Yiling
Format: Preprint
Published: 2026
Subjects: Computer Science and Game Theory
Online Access: https://arxiv.org/abs/2605.15331
_version_ 1866913130642145280
author Pan, Yuqi
Zhao, Sadie
Tambe, Milind
Chen, Yiling
author_facet Pan, Yuqi
Zhao, Sadie
Tambe, Milind
Chen, Yiling
contents We study a repeated information design setting in which the receiver, who is also the decision-maker, updates beliefs in a systematically biased way. More specifically, a distorted posterior in our model can be written as a convex combination of the prior and the Bayesian posterior, governed by a fixed but unknown parameter. Over repeated interactions, the sender chooses persuasive signaling schemes, observes only the receiver's realized actions, and seeks to minimize regret relative to a full-information oracle that knows the receiver's biased updating rule. We propose a safe exploration algorithm for learning the receiver's bias while maintaining high persuasion value. The algorithm exploits the asymmetric cost of probing: conservative probes incur only local loss, whereas overly aggressive probes may lose the persuasive opportunity entirely. For general finite state and action spaces and arbitrary bounded utilities, our method achieves $O(\log\log T)$ regret. A matching $\Omega(\log\log T)$ lower bound shows that this rate is optimal. We further discuss the influence on receiver welfare, as well as extensions to jointly unknown prior and bias, and contextual settings with time-varying priors and utilities.
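The biased updating rule described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names, the parameter name `lam`, and the choice to put weight `lam` on the Bayesian posterior and `1 - lam` on the prior are all assumptions made for illustration, since the record only states that the distorted posterior is a convex combination of the two.

```python
from typing import List, Sequence


def bayes_posterior(prior: Sequence[float], likelihood: Sequence[float]) -> List[float]:
    """Standard Bayesian posterior over states after observing one signal.

    likelihood[i] is the probability of the observed signal in state i.
    """
    joint = [p * l for p, l in zip(prior, likelihood)]
    total = sum(joint)
    if total <= 0:
        raise ValueError("the observed signal has zero probability under the prior")
    return [j / total for j in joint]


def distorted_posterior(
    prior: Sequence[float], likelihood: Sequence[float], lam: float
) -> List[float]:
    """Convex combination of the prior and the Bayesian posterior.

    Assumption (hypothetical parameterization): lam in [0, 1] is the weight on
    the Bayesian posterior, so lam = 1 is a fully Bayesian receiver and
    lam = 0 is a receiver who ignores the signal.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    bayes = bayes_posterior(prior, likelihood)
    return [lam * b + (1.0 - lam) * p for b, p in zip(bayes, prior)]
```

Under this parameterization, a receiver with `lam` strictly between 0 and 1 moves only part of the way from the prior toward the Bayesian posterior, which is why a signaling scheme tuned for a Bayesian receiver can fail to persuade a biased one.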
format Preprint
id arxiv_https___arxiv_org_abs_2605_15331
institution arXiv
publishDate 2026
record_format arxiv
title Learning to Persuade a Biased Receiver
topic Computer Science and Game Theory
url https://arxiv.org/abs/2605.15331