Bibliographic Details
Main Authors: Xia, Renqiu, Mao, Song, Yan, Xiangchao, Zhou, Hongbin, Zhang, Bo, Peng, Haoyang, Pi, Jiahao, Fu, Daocheng, Wu, Wenjie, Ye, Hancheng, Feng, Shiyang, Wang, Bin, Xu, Chao, He, Conghui, Cai, Pinlong, Dou, Min, Shi, Botian, Zhou, Sheng, Wang, Yongwei, Yan, Junchi, Wu, Fei, Qiao, Yu
Format: Preprint
Published: 2024
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2406.11633
author Xia, Renqiu
Mao, Song
Yan, Xiangchao
Zhou, Hongbin
Zhang, Bo
Peng, Haoyang
Pi, Jiahao
Fu, Daocheng
Wu, Wenjie
Ye, Hancheng
Feng, Shiyang
Wang, Bin
Xu, Chao
He, Conghui
Cai, Pinlong
Dou, Min
Shi, Botian
Zhou, Sheng
Wang, Yongwei
Wang, Bin
Yan, Junchi
Wu, Fei
Qiao, Yu
contents Scientific documents record research findings and valuable human knowledge, comprising a vast corpus of high-quality data. Leveraging multi-modality data extracted from these documents and assessing large models' abilities to handle scientific document-oriented tasks is therefore meaningful. Despite promising advancements, large models still perform poorly on multi-page scientific document extraction and understanding tasks, and their capacity to process within-document data formats such as charts and equations remains under-explored. To address these issues, we present DocGenome, a structured document benchmark constructed by annotating 500K scientific documents from 153 disciplines in the arXiv open-access community, using our custom auto-labeling pipeline. DocGenome features four key characteristics: 1) Completeness: It is the first dataset to structure data from all modalities, including 13 layout attributes along with their LaTeX source code. 2) Logicality: It provides 6 logical relationships between different entities within each scientific document. 3) Diversity: It covers various document-oriented tasks, including document classification, visual grounding, document layout detection, document transformation, open-ended single-page QA, and multi-page QA. 4) Correctness: It undergoes rigorous quality control checks conducted by a specialized team. We conduct extensive experiments to demonstrate the advantages of DocGenome and objectively evaluate the performance of large models on our benchmark.
format Preprint
id arxiv_https___arxiv_org_abs_2406_11633
institution arXiv
publishDate 2024
record_format arxiv
title DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2406.11633