Staff View: Human-AI Safety: A Descendant of Generative AI and Control Systems Safety :: Library Catalog

Bibliographic Details
Main Authors:	Bajcsy, Andrea; Fisac, Jaime F.
Format:	Preprint
Published:	2024
Subjects:	Artificial Intelligence; Computers and Society; Systems and Control; I.2
Online Access:	https://arxiv.org/abs/2405.09794

_version_	1866913400749031424
author	Bajcsy, Andrea; Fisac, Jaime F.
author_facet	Bajcsy, Andrea; Fisac, Jaime F.
contents	Artificial intelligence (AI) is interacting with people at an unprecedented scale, offering new avenues for immense positive impact, but also raising widespread concerns around the potential for individual and societal harm. Today, the predominant paradigm for human-AI safety focuses on fine-tuning the generative model's outputs to better agree with human-provided examples or feedback. In reality, however, the consequences of an AI model's outputs cannot be determined in isolation: they are tightly entangled with the responses and behavior of human users over time. In this paper, we distill key complementary lessons from AI safety and control systems safety, highlighting open challenges as well as key synergies between both fields. We then argue that meaningful safety assurances for advanced AI technologies require reasoning about how the feedback loop formed by AI outputs and human behavior may drive the interaction towards different outcomes. To this end, we introduce a unifying formalism to capture dynamic, safety-critical human-AI interactions and propose a concrete technical roadmap towards next-generation human-centered AI safety.
format	Preprint
id	arxiv_https___arxiv_org_abs_2405_09794
institution	arXiv
publishDate	2024
record_format	arxiv
title	Human-AI Safety: A Descendant of Generative AI and Control Systems Safety
topic	Artificial Intelligence; Computers and Society; Systems and Control; I.2
url	https://arxiv.org/abs/2405.09794

Similar Items