Bibliographic Details
Main Authors:	Duan, Haodong, Fang, Xinyu, Yang, Junming, Zhao, Xiangyu, Qiao, Yuxuan, Li, Mo, Agarwal, Amit, Chen, Zhe, Chen, Lin, Liu, Yuan, Ma, Yubo, Sun, Hailong, Zhang, Yifan, Lu, Shiyin, Wong, Tack Hwa, Wang, Weiyun, Zhou, Peiheng, Li, Xiaozhe, Fu, Chaoyou, Cui, Junbo, Chen, Jixuan, Song, Enxin, Mao, Song, Ding, Shengyuan, Liang, Tianhao, Zhang, Zicheng, Dong, Xiaoyi, Zang, Yuhang, Zhang, Pan, Wang, Jiaqi, Lin, Dahua, Chen, Kai
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2407.11691

contents	We present VLMEvalKit: an open-source, PyTorch-based toolkit for evaluating large multi-modality models. The toolkit aims to provide a user-friendly and comprehensive framework for researchers and developers to evaluate existing multi-modality models and publish reproducible evaluation results. In VLMEvalKit, we implement over 200 different large multi-modality models, including both proprietary APIs and open-source models, as well as more than 80 different multi-modal benchmarks. New models can be added to the toolkit simply by implementing a single interface, while the toolkit automatically handles the remaining workload, including data preparation, distributed inference, prediction post-processing, and metric calculation. Although the toolkit is currently used mainly for evaluating large vision-language models, its design is compatible with future updates that incorporate additional modalities, such as audio and video. Based on the evaluation results obtained with the toolkit, we host the OpenVLM Leaderboard, a comprehensive leaderboard that tracks the progress of multi-modality learning research. The toolkit is released at https://github.com/open-compass/VLMEvalKit and is actively maintained.
id	arxiv_https___arxiv_org_abs_2407_11691
institution	arXiv
record_format	arxiv
title	VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models

Similar Items