Bibliographic Details
Main Authors: Longpre, Shayne; Kapoor, Sayash; Klyman, Kevin; Ramaswami, Ashwin; Bommasani, Rishi; Blili-Hamelin, Borhane; Huang, Yangsibo; Skowron, Aviya; Yong, Zheng-Xin; Kotha, Suhas; Zeng, Yi; Shi, Weiyan; Yang, Xianjun; Southen, Reid; Robey, Alexander; Chao, Patrick; Yang, Diyi; Jia, Ruoxi; Kang, Daniel; Pentland, Sandy; Narayanan, Arvind; Liang, Percy; Henderson, Peter
Format: Preprint
Published: 2024
Subjects: Artificial Intelligence
Online Access: https://arxiv.org/abs/2403.04893
author Longpre, Shayne
Kapoor, Sayash
Klyman, Kevin
Ramaswami, Ashwin
Bommasani, Rishi
Blili-Hamelin, Borhane
Huang, Yangsibo
Skowron, Aviya
Yong, Zheng-Xin
Kotha, Suhas
Zeng, Yi
Shi, Weiyan
Yang, Xianjun
Southen, Reid
Robey, Alexander
Chao, Patrick
Yang, Diyi
Jia, Ruoxi
Kang, Daniel
Pentland, Sandy
Narayanan, Arvind
Liang, Percy
Henderson, Peter
contents Independent evaluation and red teaming are critical for identifying the risks posed by generative AI systems. However, the terms of service and enforcement strategies that prominent AI companies use to deter model misuse also create disincentives for good-faith safety evaluations. As a result, some researchers fear that conducting such research or releasing their findings will lead to account suspensions or legal reprisal. Although some companies offer researcher access programs, these are an inadequate substitute for independent research access: they have limited community representation, receive inadequate funding, and lack independence from corporate incentives. We propose that major AI developers commit to providing a legal and technical safe harbor, indemnifying public interest safety research and protecting it from the threat of account suspensions or legal reprisal. These proposals emerged from our collective experience conducting safety, privacy, and trustworthiness research on generative AI systems, where norms and incentives could be better aligned with public interests without exacerbating model misuse. We believe these commitments are a necessary step towards more inclusive and unimpeded community efforts to tackle the risks of generative AI.
format Preprint
id arxiv_https___arxiv_org_abs_2403_04893
institution arXiv
publishDate 2024
record_format arxiv
title A Safe Harbor for AI Evaluation and Red Teaming
topic Artificial Intelligence
url https://arxiv.org/abs/2403.04893