Staff View: Pretraining and Updates of Domain-Specific LLM: A Case Study in the Japanese Business Domain :: Library Catalog

Bibliographic Details
Main Authors:	Takahashi, Kosuke; Omi, Takahiro; Arima, Kosuke; Ishigaki, Tatsuya
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Artificial Intelligence; 68T50; I.2
Online Access:	https://arxiv.org/abs/2404.08262

_version_	1866910686246862848
author	Takahashi, Kosuke; Omi, Takahiro; Arima, Kosuke; Ishigaki, Tatsuya
author_facet	Takahashi, Kosuke; Omi, Takahiro; Arima, Kosuke; Ishigaki, Tatsuya
contents	The development of Large Language Models (LLMs) in various languages has been advancing, but the combination of non-English languages with domain-specific contexts remains underexplored. This paper presents our findings from training and evaluating a Japanese business domain-specific LLM designed to better understand business-related documents, such as the news on current affairs, technical reports, and patents. Additionally, LLMs in this domain require regular updates to incorporate the most recent knowledge. Therefore, we also report our findings from the first experiments and evaluations involving updates to this LLM using the latest article data, which is an important problem setting that has not been addressed in previous research. From our experiments on a newly created benchmark dataset for question answering in the target domain, we found that (1) our pretrained model improves QA accuracy without losing general knowledge, and (2) a proper mixture of the latest and older texts in the training data for the update is necessary. Our pretrained model and business domain benchmark are publicly available to support further studies.
format	Preprint
id	arxiv_https___arxiv_org_abs_2404_08262
institution	arXiv
publishDate	2024
record_format	arxiv
title	Pretraining and Updates of Domain-Specific LLM: A Case Study in the Japanese Business Domain
topic	Computation and Language; Artificial Intelligence; 68T50; I.2
url	https://arxiv.org/abs/2404.08262

Similar Items