Staff View: Measuring Vision-Language STEM Skills of Neural Models :: Library Catalog

Bibliographic Details
Main Authors:	Shen, Jianhao; Yuan, Ye; Mirzoyan, Srbuhi; Zhang, Ming; Wang, Chenguang
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2402.17205

_version_	1866914806335799296
author	Shen, Jianhao; Yuan, Ye; Mirzoyan, Srbuhi; Zhang, Ming; Wang, Chenguang
author_facet	Shen, Jianhao; Yuan, Ye; Mirzoyan, Srbuhi; Zhang, Ming; Wang, Chenguang
contents	We introduce a new challenge to test the STEM skills of neural models. Real-world problems often require solutions that combine knowledge from STEM (science, technology, engineering, and math). Unlike existing datasets, our dataset requires understanding multimodal vision-language information in STEM. It is one of the largest and most comprehensive datasets for this challenge, with 448 skills and 1,073,146 questions spanning all STEM subjects. Whereas existing datasets often focus on examining expert-level ability, our dataset includes fundamental skills and questions designed based on the K-12 curriculum. We also add state-of-the-art foundation models such as CLIP and GPT-3.5-Turbo to our benchmark. Results show that recent model advances help master only a very limited number of lower grade-level skills (2.5% in the third grade) in our dataset. In fact, these models are still well below (averaging 54.7%) the performance of elementary students, not to mention near expert-level performance. To understand and increase performance on our dataset, we train the models on a training split of our dataset. Although performance improves, it remains relatively low compared to average elementary students. Solving STEM problems will require novel algorithmic innovations from the community.
format	Preprint
id	arxiv_https___arxiv_org_abs_2402_17205
institution	arXiv
publishDate	2024
record_format	arxiv
title	Measuring Vision-Language STEM Skills of Neural Models
topic	Computation and Language; Artificial Intelligence; Machine Learning
url	https://arxiv.org/abs/2402.17205
