Bibliographic Details
Main Authors: Cui, Shiyao; Zhang, Zhenyu; Chen, Yilong; Zhang, Wenyuan; Liu, Tianyun; Wang, Siqi; Liu, Tingwen
Format: Preprint
Published: 2023
Subjects: Computation and Language; Cryptography and Security
Online Access: https://arxiv.org/abs/2311.18580
_version_ 1866909437120217088
author Cui, Shiyao
Zhang, Zhenyu
Chen, Yilong
Zhang, Wenyuan
Liu, Tianyun
Wang, Siqi
Liu, Tingwen
contents The widespread adoption of generative artificial intelligence has heightened concerns about the potential harms posed by AI-generated texts, primarily stemming from non-factual, unfair, and toxic content. Previous researchers have invested much effort in assessing the harmlessness of generative language models. However, existing benchmarks struggle in the era of large language models (LLMs), owing to these models' stronger language generation and instruction-following capabilities and their wider range of applications. In this paper, we propose FFT, a new benchmark of 2116 elaborately designed instances for evaluating LLM harmlessness in terms of factuality, fairness, and toxicity. To investigate the potential harms of LLMs, we evaluate 9 representative LLMs covering various parameter scales, training stages, and creators. Experiments show that the harmlessness of LLMs is still unsatisfactory, and extensive analysis yields findings that could inspire future research on harmless LLMs.
format Preprint
id arxiv_https___arxiv_org_abs_2311_18580
institution arXiv
publishDate 2023
record_format arxiv
title FFT: Towards Harmlessness Evaluation and Analysis for LLMs with Factuality, Fairness, Toxicity
topic Computation and Language
Cryptography and Security
url https://arxiv.org/abs/2311.18580