Staff View: AISafetyLab: A Comprehensive Framework for AI Safety Evaluation and Improvement :: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Zhexin; Lei, Leqi; Yang, Junxiao; Huang, Xijie; Lu, Yida; Cui, Shiyao; Chen, Renmiao; Zhang, Qinglin; Wang, Xinyuan; Wang, Hao; Li, Hao; Lei, Xianqi; Pan, Chengwei; Sha, Lei; Wang, Hongning; Huang, Minlie
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2502.16776

_version_	1866910842123976704
author	Zhang, Zhexin; Lei, Leqi; Yang, Junxiao; Huang, Xijie; Lu, Yida; Cui, Shiyao; Chen, Renmiao; Zhang, Qinglin; Wang, Xinyuan; Wang, Hao; Li, Hao; Lei, Xianqi; Pan, Chengwei; Sha, Lei; Wang, Hongning; Huang, Minlie
contents	As AI models are increasingly deployed across diverse real-world scenarios, ensuring their safety remains a critical yet underexplored challenge. While substantial efforts have been made to evaluate and enhance AI safety, the lack of a standardized framework and comprehensive toolkit poses significant obstacles to systematic research and practical adoption. To bridge this gap, we introduce AISafetyLab, a unified framework and toolkit that integrates representative attack, defense, and evaluation methodologies for AI safety. AISafetyLab features an intuitive interface that enables developers to seamlessly apply various techniques while maintaining a well-structured and extensible codebase for future advancements. Additionally, we conduct empirical studies on Vicuna, analyzing different attack and defense strategies to provide valuable insights into their comparative effectiveness. To facilitate ongoing research and development in AI safety, AISafetyLab is publicly available at https://github.com/thu-coai/AISafetyLab, and we are committed to its continuous maintenance and improvement.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_16776
institution	arXiv
publishDate	2025
record_format	arxiv
title	AISafetyLab: A Comprehensive Framework for AI Safety Evaluation and Improvement
topic	Computation and Language; Artificial Intelligence
url	https://arxiv.org/abs/2502.16776