Bibliographic Details
Main Author: Vanroy, Bram
Format: Preprint
Published: 2024
Online Access: https://arxiv.org/abs/2412.04092
Table of Contents:
  • Language models have evolved rapidly, but development has focused predominantly on English, and extensive pretraining in other languages is often neglected. As a result, powerful English-centric models have had to be adapted to other linguistic contexts through finetuning. For Dutch, one such recent endeavour is "GEITje", a model derived from the English-based Mistral 7B. Building on this foundational work, the current research extends the capabilities of GEITje through supervised finetuning on newly created, high-quality synthetic conversational datasets, together with an additional preference alignment procedure on a synthetic feedback dataset. Both the developed models and the created datasets are openly available.