Bibliographic Details
Main Authors: Kartiyasa, Adimulya, Cao, Bao Gia, Li, Boyang
Format: Preprint
Published: 2026
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2601.12921
_version_ 1866914263432429568
author Kartiyasa, Adimulya
Cao, Bao Gia
Li, Boyang
author_facet Kartiyasa, Adimulya
Cao, Bao Gia
Li, Boyang
contents Efforts to improve the understanding of Indonesian cultures by large language models (LLMs) have recently intensified. An attractive but largely overlooked source of cultural knowledge is local social science journals, which likely contain substantial cultural studies written from a native perspective. We present IndoSoSci, a novel text dataset of journal article passages created from 151 open-source Indonesian social science journals. We demonstrate an effective recipe for injecting the Indonesian cultural knowledge in this dataset into LLMs: extracting the facts related to Indonesian culture and applying retrieval-augmented generation (RAG), with LLM-generated hypothetical documents as queries during retrieval. The proposed recipe yields strong performance gains over several competitive baselines on the IndoCulture benchmark. Additionally, by combining IndoSoSci with Indonesian Wikipedia, we set a new state-of-the-art accuracy on the IndoCulture benchmark.
format Preprint
id arxiv_https___arxiv_org_abs_2601_12921
institution arXiv
publishDate 2026
record_format arxiv
title Injecting Knowledge from Social Science Journals to Improve Indonesian Cultural Understanding by LLMs
topic Computation and Language
url https://arxiv.org/abs/2601.12921