| Main author: | |
|---|---|
| Format: | Digital resource |
| Language: | |
| Published in: | Zenodo, 2025 |
| Online access: | https://doi.org/10.5281/zenodo.17089173 |
Table of contents:
- <h2>Database Creation</h2> <p>In this work, two datasets were created: a <strong>real dataset</strong> and a <strong>synthetic dataset</strong>.</p> <ul> <li> <p>The <strong>real dataset</strong> was obtained by collecting stock information for <strong>7,984 distinct products</strong> across <strong>177 different online stores (suppliers)</strong> in a marketplace. The data was collected between <strong>October 8 and October 11, 2023</strong>.</p> </li> <li> <p>The <strong>synthetic dataset</strong> includes all records from the real dataset, plus additional records ensuring that every supplier has every product of the real dataset in stock. It was created to analyze the performance of solution approaches on a significantly larger search space.</p> </li> </ul> <h3>Real Dataset</h3> <p>The real dataset contains stock records from <strong>177 suppliers</strong> and <strong>7,984 distinct products</strong>, for a total of <strong>635,723 stock records</strong>.</p> <p>To preserve supplier privacy, all store and product identifiers were anonymized.<br>Each record contains:</p> <ul> <li> <p>Supplier name (anonymized)</p> </li> <li> <p>Product name (anonymized)</p> </li> <li> <p>Price</p> </li> <li> <p>Quantity in stock</p> </li> <li> <p>Date of last collection</p> </li> </ul> <p>Suppliers in this dataset vary widely in stock capacity: only <strong>4 suppliers</strong> have more than <strong>7,000 distinct products</strong> in stock, <strong>50 suppliers</strong> have more than <strong>5,000 products</strong>, and <strong>20 suppliers</strong> have fewer than <strong>1,000 products</strong>.</p> <h3>Synthetic Dataset</h3> <p>The synthetic dataset is an expansion of the real dataset in which <strong>every supplier provides all 7,984 products</strong>.<br>To build it, the real dataset was first replicated.
Then, for each supplier–product pair missing from the real dataset, the following process was applied:</p> <ul> <li> <p><strong>Price generation</strong>: a non-negative value was randomly sampled from a normal distribution parameterized by the mean and standard deviation of that product's prices across the suppliers that stock it in the real dataset.</p> </li> <li> <p><strong>Quantity generation</strong>: the same process was applied to stock quantities instead of prices.</p> </li> </ul> <p>As a result, each of the 177 suppliers has all 7,984 products in stock, for a total of <strong>1,413,168 records</strong> (177 × 7,984). This synthetic dataset enables experiments in scenarios where product availability is maximized, producing a larger and more challenging search space.</p>
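The expansion procedure described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code, and it rests on several assumptions the description does not state: the record type `StockRecord` and the functions `nonneg_normal` and `expand` are hypothetical names; "non-negative" is implemented here by redrawing negative samples (rejection sampling), with a fallback to the mean; the population standard deviation is used, so a product stocked by a single supplier simply gets its observed value; generated quantities are rounded to whole units; and the date-of-last-collection field is omitted.

```python
import random
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class StockRecord:
    supplier: str  # anonymized supplier name
    product: str   # anonymized product name
    price: float
    quantity: int
    # The "date of last collection" field is omitted in this sketch.


def nonneg_normal(rng: random.Random, mean: float, std: float, max_tries: int = 1000) -> float:
    """Sample N(mean, std), redrawing negative values.

    Assumption: "non-negative" is implemented by rejection sampling; the
    source does not say whether negatives were redrawn, clipped, or otherwise handled.
    """
    for _ in range(max_tries):
        x = rng.gauss(mean, std)
        if x >= 0:
            return x
    return max(mean, 0.0)  # fallback if almost all probability mass is negative


def expand(real: list[StockRecord], seed: int = 0) -> list[StockRecord]:
    """Hypothetical helper: fill every missing supplier-product pair."""
    rng = random.Random(seed)
    suppliers = sorted({r.supplier for r in real})
    products = sorted({r.product for r in real})

    # Observed prices and quantities of each product across suppliers.
    prices: dict[str, list[float]] = {}
    quantities: dict[str, list[int]] = {}
    for r in real:
        prices.setdefault(r.product, []).append(r.price)
        quantities.setdefault(r.product, []).append(r.quantity)

    present = {(r.supplier, r.product) for r in real}
    synthetic = list(real)  # step 1: replicate the real dataset

    for s in suppliers:
        for p in products:
            if (s, p) in present:
                continue
            # Assumption: population std (pstdev), so a product stocked by a
            # single supplier has std = 0 and its observed value is reused.
            price = nonneg_normal(rng, statistics.mean(prices[p]), statistics.pstdev(prices[p]))
            qty = nonneg_normal(rng, statistics.mean(quantities[p]), statistics.pstdev(quantities[p]))
            # Assumption: generated quantities are rounded to whole units.
            synthetic.append(StockRecord(s, p, price, round(qty)))
    return synthetic


if __name__ == "__main__":
    toy = [
        StockRecord("S1", "P1", 10.0, 5),
        StockRecord("S1", "P2", 20.0, 3),
        StockRecord("S2", "P1", 12.0, 7),
    ]
    print(len(expand(toy)))  # 4: 2 suppliers x 2 products
    print(177 * 7984)        # 1413168
```

On the toy input, only supplier S2 lacks product P2, so exactly one record is generated; because P2 has a single observed price and quantity, the zero standard deviation makes the generated values equal the observed ones. The last line reproduces the synthetic record count reported above. Rejection sampling was chosen because it keeps the sampled values inside the original normal shape on the non-negative side, whereas clipping would pile probability mass at zero; either reading of the description is plausible.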