Bibliographic Details
Main Authors: Li, Yuan; Huang, Yue; Lin, Yuli; Wu, Siyuan; Wan, Yao; Sun, Lichao
Format: Preprint
Published: 2024
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2401.17882
author Li, Yuan
Huang, Yue
Lin, Yuli
Wu, Siyuan
Wan, Yao
Sun, Lichao
contents Do large language models (LLMs) exhibit any forms of awareness similar to humans? In this paper, we introduce AwareBench, a benchmark designed to evaluate awareness in LLMs. Drawing from theories in psychology and philosophy, we define awareness in LLMs as the ability to understand themselves as AI models and to exhibit social intelligence. Subsequently, we categorize awareness in LLMs into five dimensions, including capability, mission, emotion, culture, and perspective. Based on this taxonomy, we create a dataset called AwareEval, which contains binary, multiple-choice, and open-ended questions to assess LLMs' understandings of specific awareness dimensions. Our experiments, conducted on 13 LLMs, reveal that the majority of them struggle to fully recognize their capabilities and missions while demonstrating decent social intelligence. We conclude by connecting awareness of LLMs with AI alignment and safety, emphasizing its significance to the trustworthy and ethical development of LLMs. Our dataset and code are available at https://github.com/HowieHwong/Awareness-in-LLM.
format Preprint
id arxiv_https___arxiv_org_abs_2401_17882
institution arXiv
publishDate 2024
record_format arxiv
title I Think, Therefore I am: Benchmarking Awareness of Large Language Models Using AwareBench
topic Computation and Language
url https://arxiv.org/abs/2401.17882