Bibliographic details
Main author: Velázquez Gutiérrez, Araceli
Format: Digital resource
Language: Spanish
Published: Zenodo 2026
Online access: https://doi.org/10.5281/zenodo.20174009
Table of contents:
  • <h1>Dataset Description</h1>
    <p>This dataset contains an anonymized collection of Mexican electronic invoices compliant with the CFDI 4.0 standard (Comprobante Fiscal Digital por Internet). It was used in a research project on the design and evaluation of an autonomous intelligent agent for invoice generation and validation in ERP environments.</p>
    <p>The dataset was constructed from real operational data obtained from enterprise resource planning (ERP) systems used by micro and small businesses in Mexico. To preserve privacy and comply with ethical and legal considerations, all personally identifiable and fiscally sensitive information was anonymized or replaced through masking and transformation procedures. Fields such as taxpayer names, RFC identifiers, addresses, folio references, UUIDs and other sensitive attributes were modified or removed while preserving the structural, semantic and relational characteristics required for research purposes.</p>
    <p>The collection includes representative CFDI 4.0 XML structures and associated metadata useful for:</p>
    <ul> <li> <p>Natural-language-to-invoice generation research</p> </li> <li> <p>Intelligent agents for fiscal assistance</p> </li> <li> <p>Validation and recommendation systems for CFDI 4.0</p> </li> <li> <p>ERP automation and integration</p> </li> <li> <p>Retrieval-Augmented Generation (RAG) experiments</p> </li> <li> <p>Semantic search and embeddings over fiscal documents</p> </li> <li> <p>Machine learning and LLM-based analysis of electronic invoicing patterns</p> </li> <li> <p>SAT catalog recommendation and inference tasks</p> </li> <li> <p>Research on autonomous administrative agents</p> </li> </ul>
    <p>The dataset preserves key attributes relevant to the invoicing process, including invoice concepts, product/service codes, units, tax structures, payment methods, fiscal regimes, CFDI usage categories, timestamps and operational patterns commonly found in Mexican electronic invoicing workflows.</p>
    <p>This resource was developed within the research project “Agente Autónomo para Facturación Electrónica CFDI 4.0 basado en Inteligencia Artificial y Procesamiento de Lenguaje Natural”, which aims to integrate Large Language Models (LLMs), conversational interfaces and autonomous decision support mechanisms into ERP systems for microenterprise environments.</p>
    <p>The dataset is intended exclusively for academic, scientific and educational purposes. Users are responsible for ensuring compliance with applicable legal and ethical regulations regarding the use of fiscal and administrative data.</p>
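The attributes listed in the description (product/service codes, units, payment methods, fiscal regimes, CFDI usage, timestamps) correspond to standard CFDI 4.0 XML attributes, so a record can be summarized with a few lines of Python. The sketch below is illustrative only: the record does not describe the dataset's file layout, so `SAMPLE_XML` is a hypothetical anonymized invoice written for this example, and `summarize_cfdi` and its output keys are names assumed here, not part of the dataset.

```python
import xml.etree.ElementTree as ET

# Namespace of the CFDI 4.0 schema published by SAT.
CFDI_NS = {"cfdi": "http://www.sat.gob.mx/cfd/4"}

# Hypothetical anonymized invoice (not taken from the dataset); identifying
# attributes such as Rfc and Nombre are assumed to have been removed.
SAMPLE_XML = """<cfdi:Comprobante xmlns:cfdi="http://www.sat.gob.mx/cfd/4"
    Version="4.0" Fecha="2024-01-15T10:30:00" FormaPago="03" MetodoPago="PUE"
    Moneda="MXN" SubTotal="200.00" Total="232.00" TipoDeComprobante="I">
  <cfdi:Emisor RegimenFiscal="612"/>
  <cfdi:Receptor UsoCFDI="G03" RegimenFiscalReceptor="601"/>
  <cfdi:Conceptos>
    <cfdi:Concepto ClaveProdServ="43211500" ClaveUnidad="H87" Cantidad="2"
        Descripcion="Producto de ejemplo" ValorUnitario="100.00"
        Importe="200.00"/>
  </cfdi:Conceptos>
</cfdi:Comprobante>
"""


def summarize_cfdi(xml_text):
    """Extract a few invoice-level and concept-level attributes."""
    root = ET.fromstring(xml_text)
    emisor = root.find("cfdi:Emisor", CFDI_NS)
    receptor = root.find("cfdi:Receptor", CFDI_NS)
    concepts = [
        {
            "product_code": c.get("ClaveProdServ"),
            "unit_code": c.get("ClaveUnidad"),
            "description": c.get("Descripcion"),
            "amount": c.get("Importe"),
        }
        for c in root.findall("cfdi:Conceptos/cfdi:Concepto", CFDI_NS)
    ]
    return {
        "timestamp": root.get("Fecha"),
        "payment_form": root.get("FormaPago"),
        "payment_method": root.get("MetodoPago"),
        # Elements may be missing after anonymization, so guard against None.
        "issuer_regime": emisor.get("RegimenFiscal") if emisor is not None else None,
        "cfdi_use": receptor.get("UsoCFDI") if receptor is not None else None,
        "concepts": concepts,
    }


summary = summarize_cfdi(SAMPLE_XML)
print(summary["payment_method"], summary["cfdi_use"], len(summary["concepts"]))
```

Attribute values are kept as strings so that SAT catalog codes with leading zeros (for example `FormaPago="03"`) are not altered; amounts can be converted with `decimal.Decimal` where arithmetic is needed.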