Staff View :: Library Catalog

Bibliographic Details
Main Authors:	Giovannini, Simone; Coppini, Fabio; Gemelli, Andrea; Marinai, Simone
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2501.03403

_version_	1866911313361371136
author	Giovannini, Simone; Coppini, Fabio; Gemelli, Andrea; Marinai, Simone
author_facet	Giovannini, Simone; Coppini, Fabio; Gemelli, Andrea; Marinai, Simone
contents	We present a unified dataset for document Question-Answering (QA), obtained by combining several public datasets related to Document AI and visually rich document understanding (VRDU). Our main contribution is twofold: on the one hand, we reformulate existing Document AI tasks, such as Information Extraction (IE), into a Question-Answering task, making the dataset a suitable resource for training and evaluating Large Language Models; on the other hand, we release the OCR of all the documents and include the exact position of the answer to be found in the document image as a bounding box. Using this dataset, we explore the impact of different prompting techniques (that might include bounding box information) on the performance of open-weight models, identifying the most effective approaches for document comprehension.
format	Preprint
id	arxiv_https___arxiv_org_abs_2501_03403
institution	arXiv
publishDate	2025
record_format	arxiv
title	BoundingDocs: a Unified Dataset for Document Question Answering with Spatial Annotations
topic	Computation and Language; Artificial Intelligence
url	https://arxiv.org/abs/2501.03403

Similar Items