Staff View: Safety at Scale: A Comprehensive Survey of Large Model and Agent Safety :: Library Catalog

Bibliographic Details
Main Authors:	Ma, Xingjun; Gao, Yifeng; Wang, Yixu; Wang, Ruofan; Wang, Xin; Sun, Ye; Ding, Yifan; Xu, Hengyuan; Chen, Yunhao; Zhao, Yunhan; Huang, Hanxun; Li, Yige; Wu, Yutao; Zhang, Jiaming; Zheng, Xiang; Bai, Yang; Wu, Zuxuan; Qiu, Xipeng; Zhang, Jingfeng; Li, Yiming; Han, Xudong; Li, Haonan; Sun, Jun; Wang, Cong; Gu, Jindong; Wu, Baoyuan; Chen, Siheng; Zhang, Tianwei; Liu, Yang; Gong, Mingming; Liu, Tongliang; Pan, Shirui; Xie, Cihang; Pang, Tianyu; Dong, Yinpeng; Jia, Ruoxi; Zhang, Yang; Ma, Shiqing; Zhang, Xiangyu; Gong, Neil; Xiao, Chaowei; Erfani, Sarah; Baldwin, Tim; Li, Bo; Sugiyama, Masashi; Tao, Dacheng; Bailey, James; Jiang, Yu-Gang
Format:	Preprint
Published:	2025
Subjects:	Cryptography and Security; Artificial Intelligence; Computation and Language; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2502.05206

_version_	1866914471856832512
author	Ma, Xingjun; Gao, Yifeng; Wang, Yixu; Wang, Ruofan; Wang, Xin; Sun, Ye; Ding, Yifan; Xu, Hengyuan; Chen, Yunhao; Zhao, Yunhan; Huang, Hanxun; Li, Yige; Wu, Yutao; Zhang, Jiaming; Zheng, Xiang; Bai, Yang; Wu, Zuxuan; Qiu, Xipeng; Zhang, Jingfeng; Li, Yiming; Han, Xudong; Li, Haonan; Sun, Jun; Wang, Cong; Gu, Jindong; Wu, Baoyuan; Chen, Siheng; Zhang, Tianwei; Liu, Yang; Gong, Mingming; Liu, Tongliang; Pan, Shirui; Xie, Cihang; Pang, Tianyu; Dong, Yinpeng; Jia, Ruoxi; Zhang, Yang; Ma, Shiqing; Zhang, Xiangyu; Gong, Neil; Xiao, Chaowei; Erfani, Sarah; Baldwin, Tim; Li, Bo; Sugiyama, Masashi; Tao, Dacheng; Bailey, James; Jiang, Yu-Gang
author_facet	Ma, Xingjun; Gao, Yifeng; Wang, Yixu; Wang, Ruofan; Wang, Xin; Sun, Ye; Ding, Yifan; Xu, Hengyuan; Chen, Yunhao; Zhao, Yunhan; Huang, Hanxun; Li, Yige; Wu, Yutao; Zhang, Jiaming; Zheng, Xiang; Bai, Yang; Wu, Zuxuan; Qiu, Xipeng; Zhang, Jingfeng; Li, Yiming; Han, Xudong; Li, Haonan; Sun, Jun; Wang, Cong; Gu, Jindong; Wu, Baoyuan; Chen, Siheng; Zhang, Tianwei; Liu, Yang; Gong, Mingming; Liu, Tongliang; Pan, Shirui; Xie, Cihang; Pang, Tianyu; Dong, Yinpeng; Jia, Ruoxi; Zhang, Yang; Ma, Shiqing; Zhang, Xiangyu; Gong, Neil; Xiao, Chaowei; Erfani, Sarah; Baldwin, Tim; Li, Bo; Sugiyama, Masashi; Tao, Dacheng; Bailey, James; Jiang, Yu-Gang
contents	The rapid advancement of large models, driven by their exceptional abilities in learning and generalization through large-scale pre-training, has reshaped the landscape of Artificial Intelligence (AI). These models are now foundational to a wide range of applications, including conversational AI, recommendation systems, autonomous driving, content generation, medical diagnostics, and scientific discovery. However, their widespread deployment also exposes them to significant safety risks, raising concerns about robustness, reliability, and ethical implications. This survey provides a systematic review of current safety research on large models, covering Vision Foundation Models (VFMs), Large Language Models (LLMs), Vision-Language Pre-training (VLP) models, Vision-Language Models (VLMs), Diffusion Models (DMs), and large-model-powered Agents. Our contributions are summarized as follows: (1) We present a comprehensive taxonomy of safety threats to these models, including adversarial attacks, data poisoning, backdoor attacks, jailbreak and prompt injection attacks, energy-latency attacks, data and model extraction attacks, and emerging agent-specific threats. (2) We review the defense strategies proposed for each type of attack, where available, and summarize the commonly used datasets and benchmarks for safety research. (3) Building on this, we identify and discuss the open challenges in large model safety, emphasizing the need for comprehensive safety evaluations, scalable and effective defense mechanisms, and sustainable data practices. More importantly, we highlight the necessity of collective efforts from the research community and international collaboration. Our work can serve as a useful reference for researchers and practitioners, fostering the ongoing development of comprehensive defense systems and platforms to safeguard AI models.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_05206
institution	arXiv
publishDate	2025
record_format	arxiv
title	Safety at Scale: A Comprehensive Survey of Large Model and Agent Safety
topic	Cryptography and Security; Artificial Intelligence; Computation and Language; Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2502.05206