| Main author: | Velázquez Gutiérrez, Araceli |
|---|---|
| Format: | Digital resource |
| Language: | Spanish |
| Published: | Zenodo, 2026 |
| Subjects: | |
| Online access: | https://doi.org/10.5281/zenodo.20174009 |
| _version_ | 1866901647951659008 |
|---|---|
| author | Velázquez Gutiérrez, Araceli |
| author_facet | Velázquez Gutiérrez, Araceli |
| contents | <h1>Dataset Description</h1> <p>This dataset contains an anonymized collection of Mexican electronic invoices compliant with the CFDI 4.0 standard (Comprobante Fiscal Digital por Internet), used as part of the research project focused on the design and evaluation of an autonomous intelligent agent for invoice generation and validation in ERP environments.</p> <p>The dataset was constructed from real operational data obtained from enterprise resource planning (ERP) systems used by micro and small businesses in Mexico. To preserve privacy and comply with ethical and legal considerations, all personally identifiable and fiscally sensitive information was anonymized or replaced through masking and transformation procedures. Fields such as taxpayer names, RFC identifiers, addresses, folio references, UUIDs and other sensitive attributes were modified or removed while preserving the structural, semantic and relational characteristics required for research purposes.</p> <p>The collection includes representative CFDI 4.0 XML structures and associated metadata useful for:</p> <ul> <li> <p>Natural language to invoice generation research</p> </li> <li> <p>Intelligent agents for fiscal assistance</p> </li> <li> <p>Validation and recommendation systems for CFDI 4.0</p> </li> <li> <p>ERP automation and integration</p> </li> <li> <p>Retrieval-Augmented Generation (RAG) experiments</p> </li> <li> <p>Semantic search and embeddings over fiscal documents</p> </li> <li> <p>Machine learning and LLM-based analysis of electronic invoicing patterns</p> </li> <li> <p>SAT catalog recommendation and inference tasks</p> </li> <li> <p>Research on autonomous administrative agents</p> </li> </ul> <p>The dataset preserves key attributes relevant to the invoicing process, including invoice concepts, product/service codes, units, tax structures, payment methods, fiscal regimes, CFDI usage categories, timestamps and operational patterns commonly found in Mexican electronic invoicing workflows.</p> <p>This resource was developed within the research project “Agente Autónomo para Facturación Electrónica CFDI 4.0 basado en Inteligencia Artificial y Procesamiento de Lenguaje Natural”, oriented toward the integration of Large Language Models (LLMs), conversational interfaces and autonomous decision support mechanisms into ERP systems for microenterprise environments.</p> <p>The dataset is intended exclusively for academic, scientific and educational purposes. Users are responsible for ensuring compliance with applicable legal and ethical regulations regarding the use of fiscal and administrative data.</p> |
| format | Digital resource |
| id | zenodo_https___doi_org_10_5281_zenodo_20174009 |
| institution | Zenodo |
| language | spa |
| publishDate | 2026 |
| publisher | Zenodo |
| record_format | zenodo |
| title | FACTURAS ELECTRONICAS CFDI4.0 ANONIMIZADAS PARA TESIS |
| topic | CFDI 4.0, electronic invoicing, ERP, autonomous agents, intelligent agents, LLM, artificial intelligence, NLP, RAG, embeddings, SAT, fiscal technology, Mexico, invoice automation, conversational AI, machine learning, semantic search, enterprise systems, dataset, XML invoices |
| url | https://doi.org/10.5281/zenodo.20174009 |