Bibliographic Details
Main Authors:	Liu, Zhihao; Hu, Chenhui
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2410.21695

_version_	1866912092883255296
author	Liu, Zhihao; Hu, Chenhui
author_facet	Liu, Zhihao; Hu, Chenhui
contents	As large language models (LLMs) rapidly evolve, they bring significant convenience to our work and daily lives, but they also introduce considerable safety risks. These models can generate texts with social biases or unethical content and, under specific adversarial instructions, may even incite illegal activities. Therefore, rigorous safety assessments of LLMs are crucial. In this work, we introduce a safety assessment benchmark, CFSafety, which integrates 5 classic safety scenarios and 5 types of instruction attacks, totaling 10 categories of safety questions, to form a test set with 25k prompts. This test set was used to evaluate the natural language generation (NLG) capabilities of LLMs, employing a combination of simple moral judgment and a 1-5 safety rating scale for scoring. Using this benchmark, we tested eight popular LLMs, including the GPT series. The results indicate that while GPT-4 demonstrated superior safety performance, the safety effectiveness of LLMs, including this model, still requires improvement. The data and code associated with this study are available on GitHub.
format	Preprint
id	arxiv_https___arxiv_org_abs_2410_21695
institution	arXiv
publishDate	2024
record_format	arxiv
title	CFSafety: Comprehensive Fine-grained Safety Assessment for LLMs
topic	Computation and Language; Machine Learning
url	https://arxiv.org/abs/2410.21695