Staff View: :: Library Catalog

Bibliographic Details
Main Authors:	El-Sheikh, Abdelrahman; Elmogtaba, Ahmed; Darwish, Kareem; Elmallah, Muhammad; Elneima, Ashraf; Sawaf, Hassan
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2408.05882

_version_	1866909284623712256
author	El-Sheikh, Abdelrahman; Elmogtaba, Ahmed; Darwish, Kareem; Elmallah, Muhammad; Elneima, Ashraf; Sawaf, Hassan
author_facet	El-Sheikh, Abdelrahman; Elmogtaba, Ahmed; Darwish, Kareem; Elmallah, Muhammad; Elneima, Ashraf; Sawaf, Hassan
contents	The debut of ChatGPT and Bard has popularized instruction-following text generation using LLMs, where a user can interrogate an LLM using natural language requests and obtain natural language answers that match their requests. Training LLMs to respond in this manner requires a large number of worked-out examples of user requests (aka prompts) with corresponding gold responses. In this paper, we introduce two methods for creating such prompts for Arabic cheaply and quickly. The first method entails automatically translating existing prompt datasets from English, such as PromptSource and Super-NaturalInstructions, and then using machine translation quality estimation to retain high-quality translations only. The second method involves creating natural language prompts on top of existing Arabic NLP datasets. Using these two methods, we were able to create more than 67.4 million Arabic prompts that cover a variety of tasks including summarization, headline generation, grammar checking, open/closed question answering, creative writing, etc. We show that fine-tuning an open 7-billion-parameter large language model, namely base Qwen2 7B, enables it to outperform a state-of-the-art 70-billion-parameter instruction-tuned model, namely Llama3 70B, in handling Arabic prompts.
format	Preprint
id	arxiv_https___arxiv_org_abs_2408_05882
institution	arXiv
publishDate	2024
record_format	arxiv
title	Creating Arabic LLM Prompts at Scale
topic	Computation and Language
url	https://arxiv.org/abs/2408.05882
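
The two prompt-creation methods summarized in the contents field can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the record does not name a quality estimation model, a score threshold, or any template format, so `qe_score`, the `0.8` threshold, the function names, and the `PromptPair` type are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class PromptPair:
    """One training example: a prompt and its gold response."""
    prompt: str
    response: str


def filter_translations(
    pairs: Iterable[Tuple[str, str]],
    qe_score: Callable[[str, str], float],
    threshold: float = 0.8,  # assumed cutoff; the record does not state one
) -> List[Tuple[str, str]]:
    """Method 1 (sketch): keep only (source, translation) pairs whose
    quality estimation score reaches the threshold.

    `qe_score` is a placeholder for any quality estimation model that
    maps an English source and its Arabic translation to a score.
    """
    kept = []
    for source, translation in pairs:
        if qe_score(source, translation) >= threshold:
            kept.append((source, translation))
    return kept


def prompts_from_dataset(
    records: Iterable[Dict[str, str]],
    template: str,
    response_field: str,
) -> List[PromptPair]:
    """Method 2 (sketch): wrap each record of an existing NLP dataset in a
    natural language template and use one field as the gold response.
    """
    return [
        PromptPair(
            prompt=template.format(**record),
            response=record[response_field],
        )
        for record in records
    ]
```

For example, a summarization dataset with `text` and `summary` fields could be passed to `prompts_from_dataset` with a template such as `"لخص النص التالي: {text}"` ("Summarize the following text") and `response_field="summary"`.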
