Staff View: FFT: Towards Harmlessness Evaluation and Analysis for LLMs with Factuality, Fairness, Toxicity :: Library Catalog

Bibliographic Details
Main Authors:	Cui, Shiyao; Zhang, Zhenyu; Chen, Yilong; Zhang, Wenyuan; Liu, Tianyun; Wang, Siqi; Liu, Tingwen
Format:	Preprint
Published:	2023
Subjects:	Computation and Language; Cryptography and Security
Online Access:	https://arxiv.org/abs/2311.18580

_version_	1866909437120217088
author	Cui, Shiyao; Zhang, Zhenyu; Chen, Yilong; Zhang, Wenyuan; Liu, Tianyun; Wang, Siqi; Liu, Tingwen
author_facet	Cui, Shiyao; Zhang, Zhenyu; Chen, Yilong; Zhang, Wenyuan; Liu, Tianyun; Wang, Siqi; Liu, Tingwen
contents	The widespread use of generative artificial intelligence has heightened concerns about the potential harms posed by AI-generated texts, primarily stemming from factoid, unfair, and toxic content. Previous researchers have invested much effort in assessing the harmlessness of generative language models. However, existing benchmarks struggle in the era of large language models (LLMs), owing to these models' stronger language generation and instruction-following capabilities, as well as their wider applications. In this paper, we propose FFT, a new benchmark with 2116 elaborately designed instances, for LLM harmlessness evaluation with factuality, fairness, and toxicity. To investigate the potential harms of LLMs, we evaluate 9 representative LLMs covering various parameter scales, training stages, and creators. Experiments show that the harmlessness of LLMs is still unsatisfactory, and extensive analysis yields findings that could inspire future research on harmless LLMs.
format	Preprint
id	arxiv_https___arxiv_org_abs_2311_18580
institution	arXiv
publishDate	2023
record_format	arxiv
title	FFT: Towards Harmlessness Evaluation and Analysis for LLMs with Factuality, Fairness, Toxicity
topic	Computation and Language; Cryptography and Security
url	https://arxiv.org/abs/2311.18580

Similar Items