Bibliographic Details
Main Authors: Ghalebikesabi, Sahra; Bagdasaryan, Eugene; Yi, Ren; Yona, Itay; Shumailov, Ilia; Pappu, Aneesh; Shi, Chongyang; Weidinger, Laura; Stanforth, Robert; Berrada, Leonard; Kohli, Pushmeet; Huang, Po-Sen; Balle, Borja
Format: Preprint
Published: 2024
Online Access: https://arxiv.org/abs/2408.02373
Abstract:
  • Advanced AI assistants combine frontier LLMs and tool access to autonomously perform complex tasks on behalf of users. While the helpfulness of such assistants can increase dramatically with access to user information, including emails and documents, this raises privacy concerns about assistants sharing inappropriate information with third parties without user supervision. To steer information-sharing assistants to behave in accordance with privacy expectations, we propose to operationalize contextual integrity (CI), a framework that equates privacy with the appropriate flow of information in a given context. In particular, we design and evaluate a number of strategies to steer assistants' information-sharing actions to be CI-compliant. Our evaluation is based on a novel form-filling benchmark composed of human annotations of common webform applications, and it reveals that prompting frontier LLMs to perform CI-based reasoning yields strong results.