Staff View: Large Malaysian Language Model Based on Mistral for Enhanced Local Language Understanding :: Library Catalog

Bibliographic Details
Main Authors:	Zolkepli, Husein; Razak, Aisyah; Adha, Kamarul; Nazhan, Ariff
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2401.13565

_version_	1866916114305384448
author	Zolkepli, Husein; Razak, Aisyah; Adha, Kamarul; Nazhan, Ariff
author_facet	Zolkepli, Husein; Razak, Aisyah; Adha, Kamarul; Nazhan, Ariff
contents	In this paper, we present significant advancements in the pretraining of Mistral 7B, a large-scale language model, using a dataset of 32.6 GB, equivalent to 1.1 billion tokens. We explore the impact of extending the context length, releasing models with context lengths of 4096 and 32768 tokens, and further refine performance with a specialized instruction-tuned model with a 16384-token context length, which we call Malaysian Mistral. Our experiments demonstrate the efficacy of continued pretraining and the influence of extended context lengths on Mistral 7B's language understanding capabilities. Additionally, we release a model instruction-tuned with a 16384-token context length, showcasing its potential for capturing nuanced language intricacies. Furthermore, our research contributes to the benchmarking of Malaysian Mistral against prominent language models, including ChatGPT3.5 and Claude 2. We present compelling results indicating Malaysian Mistral's superior performance on the Tatabahasa (Malay grammar) test set, particularly when fine-tuned with instructions. All models are released at https://huggingface.co/collections/mesolitica/malaysian-mistral-7b-6528f2ec825f4bba46c1700c
format	Preprint
id	arxiv_https___arxiv_org_abs_2401_13565
institution	arXiv
publishDate	2024
record_format	arxiv
title	Large Malaysian Language Model Based on Mistral for Enhanced Local Language Understanding
topic	Computation and Language
url	https://arxiv.org/abs/2401.13565

Similar Items