Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Choi, Minje, Pei, Jiaxin, Kumar, Sagar, Shu, Chang, Jurgens, David
Format:	Preprint
Published:	2023
Subjects:	Computation and Language Artificial Intelligence
Online Access:	https://arxiv.org/abs/2305.14938
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866916187250622464
author	Choi, Minje Pei, Jiaxin Kumar, Sagar Shu, Chang Jurgens, David
author_facet	Choi, Minje Pei, Jiaxin Kumar, Sagar Shu, Chang Jurgens, David
contents	Large language models (LLMs) have been shown to perform well at a variety of syntactic, discourse, and reasoning tasks. While LLMs are increasingly deployed in many forms including conversational agents that interact with humans, we lack a grounded benchmark to measure how well LLMs understand \textit{social} language. Here, we introduce a new theory-driven benchmark, SocKET, that contains 58 NLP tasks testing social knowledge which we group into five categories: humor & sarcasm, offensiveness, sentiment & emotion, and trustworthiness. In tests on the benchmark, we demonstrate that current models attain only moderate performance but reveal significant potential for task transfer among different types and categories of tasks, which were predicted from theory. Through zero-shot evaluations, we show that pretrained models already possess some innate but limited capabilities of social language understanding and training on one category of tasks can improve zero-shot testing on others. Our benchmark provides a systematic way to analyze model performance on an important dimension of language and points to clear room for improvement to build more socially-aware LLMs. The associated resources are released at https://github.com/minjechoi/SOCKET.
format	Preprint
id	arxiv_https___arxiv_org_abs_2305_14938
institution	arXiv
publishDate	2023
record_format	arxiv
spellingShingle	Do LLMs Understand Social Knowledge? Evaluating the Sociability of Large Language Models with SocKET Benchmark Choi, Minje Pei, Jiaxin Kumar, Sagar Shu, Chang Jurgens, David Computation and Language Artificial Intelligence Large language models (LLMs) have been shown to perform well at a variety of syntactic, discourse, and reasoning tasks. While LLMs are increasingly deployed in many forms including conversational agents that interact with humans, we lack a grounded benchmark to measure how well LLMs understand \textit{social} language. Here, we introduce a new theory-driven benchmark, SocKET, that contains 58 NLP tasks testing social knowledge which we group into five categories: humor & sarcasm, offensiveness, sentiment & emotion, and trustworthiness. In tests on the benchmark, we demonstrate that current models attain only moderate performance but reveal significant potential for task transfer among different types and categories of tasks, which were predicted from theory. Through zero-shot evaluations, we show that pretrained models already possess some innate but limited capabilities of social language understanding and training on one category of tasks can improve zero-shot testing on others. Our benchmark provides a systematic way to analyze model performance on an important dimension of language and points to clear room for improvement to build more socially-aware LLMs. The associated resources are released at https://github.com/minjechoi/SOCKET.
title	Do LLMs Understand Social Knowledge? Evaluating the Sociability of Large Language Models with SocKET Benchmark
topic	Computation and Language Artificial Intelligence
url	https://arxiv.org/abs/2305.14938

Similar Items