Bibliographic Details
Main Authors: Choi, Minje, Pei, Jiaxin, Kumar, Sagar, Shu, Chang, Jurgens, David
Format: Preprint
Published: 2023
Subjects: Computation and Language; Artificial Intelligence
Online Access: https://arxiv.org/abs/2305.14938
contents Large language models (LLMs) have been shown to perform well at a variety of syntactic, discourse, and reasoning tasks. While LLMs are increasingly deployed in many forms, including conversational agents that interact with humans, we lack a grounded benchmark to measure how well LLMs understand social language. Here, we introduce a new theory-driven benchmark, SocKET, that contains 58 NLP tasks testing social knowledge, which we group into five categories: humor & sarcasm, offensiveness, sentiment & emotion, social factors, and trustworthiness. In tests on the benchmark, we demonstrate that current models attain only moderate performance but reveal significant potential for task transfer among different types and categories of tasks, as predicted from theory. Through zero-shot evaluations, we show that pretrained models already possess some innate but limited capabilities of social language understanding, and that training on one category of tasks can improve zero-shot performance on others. Our benchmark provides a systematic way to analyze model performance on an important dimension of language and points to clear room for improvement in building more socially aware LLMs. The associated resources are released at https://github.com/minjechoi/SOCKET.
institution arXiv
title Do LLMs Understand Social Knowledge? Evaluating the Sociability of Large Language Models with SocKET Benchmark
topic Computation and Language
Artificial Intelligence