Bibliographic Details
Main Authors: Misbah, Ahmed, Farouk, Mohamed, AbdulAzim, Mustafa
Format: Digital resource
Language: Arabic
Published: Zenodo 2025
Online Access: https://doi.org/10.5281/zenodo.17855012
Table of Contents:
  • A synthetic dataset of 43,316 conversations comprising 608,052 utterances in total, where every turn counts as one utterance. Conversation length ranges from 5 to 111 turns, with a mean of 14.038 turns (rounded to three decimal places) and a median of 12 turns.
    The dataset is partitioned into training and test sets using an 80/20 split (34,653 training conversations and 8,663 test conversations).
    The synthetic data generation process iterated systematically over 93 topics and 151 countries, yielding 14,043 unique topic-country combinations. The generation pipeline was configured to produce 5 conversations per combination. After rigorous processing and a train/test split based on techniques that mitigate leakage risks, the final dataset contains 43,316 conversations.
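The figures quoted in the description can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the published numbers; it is not part of the dataset or its generation pipeline, and all variable names are illustrative. The configured output of 70,215 conversations is derived here from 14,043 combinations times 5 and is not a figure stated in the record.

```python
# Figures quoted in the dataset description (variable names are illustrative).
TOPICS = 93
COUNTRIES = 151
PER_COMBINATION = 5
TOTAL_CONVERSATIONS = 43_316
TRAIN, TEST = 34_653, 8_663
TOTAL_UTTERANCES = 608_052

# Unique topic-country combinations.
combinations = TOPICS * COUNTRIES

# Conversations the pipeline was configured to produce before processing
# (derived value, not quoted in the record).
configured_output = combinations * PER_COMBINATION

# Mean conversation length in turns, rounded as in the description.
mean_turns = round(TOTAL_UTTERANCES / TOTAL_CONVERSATIONS, 3)

# Share of conversations assigned to the training set.
train_share = TRAIN / TOTAL_CONVERSATIONS

print(combinations)                          # 14043
print(configured_output)                     # 70215
print(mean_turns)                            # 14.038
print(TRAIN + TEST == TOTAL_CONVERSATIONS)   # True
print(round(train_share, 2))                 # 0.8
```

The gap between the configured output and the final 43,316 conversations reflects the processing step mentioned in the description; the record does not say how that reduction was distributed across topics or countries.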