Staff View: Domain-Agnostic Scalable AI Safety Ensuring Framework :: Library Catalog

Bibliographic Details
Main Authors:	Kim, Beomjun; Kim, Kangyeon; Kim, Sunwoo; Shin, Yeonsang; Ahn, Heejin
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2504.20924

_version_	1866911191537811456
author	Kim, Beomjun; Kim, Kangyeon; Kim, Sunwoo; Shin, Yeonsang; Ahn, Heejin
author_facet	Kim, Beomjun; Kim, Kangyeon; Kim, Sunwoo; Shin, Yeonsang; Ahn, Heejin
contents	AI safety has emerged as a critical priority as AI systems are increasingly deployed in real-world applications. We propose the first domain-agnostic AI safety ensuring framework that achieves strong safety guarantees while preserving high performance, grounded in rigorous theoretical foundations. Our framework includes: (1) an optimization component with chance constraints, (2) a safety classification model, (3) internal test data, (4) conservative testing procedures, (5) informative dataset quality measures, and (6) continuous approximate loss functions with gradient computation. Furthermore, to our knowledge, we mathematically establish the first scaling law in AI safety research, relating data quantity to safety-performance trade-offs. Experiments across reinforcement learning, natural language generation, and production planning validate our framework and demonstrate superior performance. Notably, in reinforcement learning, we achieve 3 collisions during 10M actions, compared with 1,000-3,000 for PPO-Lag baselines at equivalent performance levels -- a safety level unattainable by previous AI methods. We believe our framework opens a new foundation for safe AI deployment across safety-critical domains.
format	Preprint
id	arxiv_https___arxiv_org_abs_2504_20924
institution	arXiv
publishDate	2025
record_format	arxiv
title	Domain-Agnostic Scalable AI Safety Ensuring Framework
topic	Artificial Intelligence
url	https://arxiv.org/abs/2504.20924

Similar Items