Bibliographic Details
Main Authors: Duan, Haodong; Fang, Xinyu; Yang, Junming; Zhao, Xiangyu; Qiao, Yuxuan; Li, Mo; Agarwal, Amit; Chen, Zhe; Chen, Lin; Liu, Yuan; Ma, Yubo; Sun, Hailong; Zhang, Yifan; Lu, Shiyin; Wong, Tack Hwa; Wang, Weiyun; Zhou, Peiheng; Li, Xiaozhe; Fu, Chaoyou; Cui, Junbo; Chen, Jixuan; Song, Enxin; Mao, Song; Ding, Shengyuan; Liang, Tianhao; Zhang, Zicheng; Dong, Xiaoyi; Zang, Yuhang; Zhang, Pan; Wang, Jiaqi; Lin, Dahua; Chen, Kai
Format: Preprint
Published: 2024
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2407.11691
author Duan, Haodong
Fang, Xinyu
Yang, Junming
Zhao, Xiangyu
Qiao, Yuxuan
Li, Mo
Agarwal, Amit
Chen, Zhe
Chen, Lin
Liu, Yuan
Ma, Yubo
Sun, Hailong
Zhang, Yifan
Lu, Shiyin
Wong, Tack Hwa
Wang, Weiyun
Zhou, Peiheng
Li, Xiaozhe
Fu, Chaoyou
Cui, Junbo
Chen, Jixuan
Song, Enxin
Mao, Song
Ding, Shengyuan
Liang, Tianhao
Zhang, Zicheng
Dong, Xiaoyi
Zang, Yuhang
Zhang, Pan
Wang, Jiaqi
Lin, Dahua
Chen, Kai
contents We present VLMEvalKit, an open-source, PyTorch-based toolkit for evaluating large multi-modality models. The toolkit aims to provide a user-friendly and comprehensive framework for researchers and developers to evaluate existing multi-modality models and publish reproducible evaluation results. VLMEvalKit implements more than 200 large multi-modality models, including both proprietary APIs and open-source models, as well as more than 80 multi-modal benchmarks. A new model can be added by implementing a single interface; the toolkit then automatically handles the remaining workload, including data preparation, distributed inference, prediction post-processing, and metric calculation. Although the toolkit is currently used mainly for evaluating large vision-language models, its design is compatible with future updates that incorporate additional modalities, such as audio and video. Based on the evaluation results obtained with the toolkit, we host the OpenVLM Leaderboard, a comprehensive leaderboard that tracks the progress of multi-modality learning research. The toolkit is released at https://github.com/open-compass/VLMEvalKit and is actively maintained.
format Preprint
id arxiv_https___arxiv_org_abs_2407_11691
institution arXiv
publishDate 2024
record_format arxiv
title VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2407.11691