Bibliographic Details
Main Authors: Shi, Weiyan; Li, Ryan; Zhang, Yutong; Ziems, Caleb; Yu, Chunhua; Horesh, Raya; de Paula, Rogério Abreu; Yang, Diyi
Format: Preprint
Published: 2024
Subjects: Computation and Language; Artificial Intelligence
Online Access: https://arxiv.org/abs/2404.15238
author Shi, Weiyan
Li, Ryan
Zhang, Yutong
Ziems, Caleb
Yu, Chunhua
Horesh, Raya
de Paula, Rogério Abreu
Yang, Diyi
contents To enhance language models' cultural awareness, we design a generalizable pipeline to construct cultural knowledge bases from different online communities on a massive scale. With the pipeline, we construct CultureBank, a knowledge base built upon users' self-narratives with 12K cultural descriptors sourced from TikTok and 11K from Reddit. Unlike previous cultural knowledge resources, CultureBank contains diverse views on cultural descriptors to allow flexible interpretation of cultural knowledge, and contextualized cultural scenarios to help grounded evaluation. With CultureBank, we evaluate different LLMs' cultural awareness and identify areas for improvement. We also fine-tune a language model on CultureBank: experiments show that it achieves better performance on two downstream cultural tasks in a zero-shot setting. Finally, we offer recommendations based on our findings for future culturally aware language technologies. The project page is https://culturebank.github.io . The code and model are at https://github.com/SALT-NLP/CultureBank . The released CultureBank dataset is at https://huggingface.co/datasets/SALT-NLP/CultureBank .
format Preprint
id arxiv_https___arxiv_org_abs_2404_15238
institution arXiv
publishDate 2024
record_format arxiv
title CultureBank: An Online Community-Driven Knowledge Base Towards Culturally Aware Language Technologies
topic Computation and Language
Artificial Intelligence
url https://arxiv.org/abs/2404.15238