
Bibliographic Details
Main Authors:	Misbah, Ahmed; Farouk, Mohamed; AbdulAzim, Mustafa
Format:	Digital resource
Language:	Arabic
Published:	Zenodo 2025
Subjects:	Arabic Conversations Chatbots
Online Access:	https://doi.org/10.5281/zenodo.17855012

Table of Contents:

A synthetic dataset of 43,316 conversations comprising 608,052 utterances in total, where every turn counts as one utterance. The mean conversation length is 14.038 turns (rounded to three decimal places), with a median of 12 turns and a range of 5 to 111 turns. The dataset is partitioned into training and test sets using an 80/20 split (34,653 training conversations and 8,663 test conversations). The synthetic data generation process systematically iterated over 93 topics and 151 countries, yielding 14,043 unique topic-country combinations, and the generation pipeline was configured to produce 5 conversations per combination. After rigorous processing and a train/test split based on techniques that mitigate leakage risks, the final dataset contains 43,316 conversations.
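
The figures in this description can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python that uses only the numbers quoted above; the variable names are illustrative and do not come from the dataset or its generation pipeline. The "configured target" is simply combinations times conversations per combination, and it is an assumption that the pipeline attempted every combination in full.

```python
# Figures quoted in the dataset description (variable names are illustrative).
TOPICS = 93
COUNTRIES = 151
PER_COMBINATION = 5          # conversations the pipeline was configured to produce
CONVERSATIONS = 43_316       # final dataset size
UTTERANCES = 608_052         # every turn counts as one utterance
TRAIN, TEST = 34_653, 8_663  # reported 80/20 split

# Unique topic-country combinations iterated by the generation process.
combinations = TOPICS * COUNTRIES

# Upper bound on generated conversations, ASSUMING every combination
# was attempted the configured number of times.
configured_target = combinations * PER_COMBINATION

# Mean conversation length in turns, rounded as in the description.
mean_turns = round(UTTERANCES / CONVERSATIONS, 3)

# Share of conversations assigned to the training set.
train_share = TRAIN / CONVERSATIONS

# Fraction of the configured target that remains in the final dataset.
retained_share = CONVERSATIONS / configured_target

print(f"combinations:      {combinations}")
print(f"configured target: {configured_target}")
print(f"mean turns:        {mean_turns}")
print(f"train share:       {train_share:.4f}")
print(f"retained share:    {retained_share:.4f}")
```

Under that assumption, the configured target works out to 70,215 conversations, so the final 43,316 reflect whatever the processing and leakage-mitigation steps removed; the description does not break that reduction down further.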
