Staff View: Eagle 2: Building Post-Training Data Strategies from Scratch for Frontier Vision-Language Models :: Library Catalog

Bibliographic Details
Main Authors:	Li, Zhiqi; Chen, Guo; Liu, Shilong; Wang, Shihao; VS, Vibashan; Ji, Yishen; Lan, Shiyi; Zhang, Hao; Zhao, Yilin; Radhakrishnan, Subhashree; Chang, Nadine; Sapra, Karan; Deshmukh, Amala Sanjay; Rintamaki, Tuomas; Le, Matthieu; Karmanov, Ilia; Voegtle, Lukas; Fischer, Philipp; Huang, De-An; Roman, Timo; Lu, Tong; Alvarez, Jose M.; Catanzaro, Bryan; Kautz, Jan; Tao, Andrew; Liu, Guilin; Yu, Zhiding
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2501.14818

_version_	1866917902500757504
author	Li, Zhiqi; Chen, Guo; Liu, Shilong; Wang, Shihao; VS, Vibashan; Ji, Yishen; Lan, Shiyi; Zhang, Hao; Zhao, Yilin; Radhakrishnan, Subhashree; Chang, Nadine; Sapra, Karan; Deshmukh, Amala Sanjay; Rintamaki, Tuomas; Le, Matthieu; Karmanov, Ilia; Voegtle, Lukas; Fischer, Philipp; Huang, De-An; Roman, Timo; Lu, Tong; Alvarez, Jose M.; Catanzaro, Bryan; Kautz, Jan; Tao, Andrew; Liu, Guilin; Yu, Zhiding
author_facet	Li, Zhiqi; Chen, Guo; Liu, Shilong; Wang, Shihao; VS, Vibashan; Ji, Yishen; Lan, Shiyi; Zhang, Hao; Zhao, Yilin; Radhakrishnan, Subhashree; Chang, Nadine; Sapra, Karan; Deshmukh, Amala Sanjay; Rintamaki, Tuomas; Le, Matthieu; Karmanov, Ilia; Voegtle, Lukas; Fischer, Philipp; Huang, De-An; Roman, Timo; Lu, Tong; Alvarez, Jose M.; Catanzaro, Bryan; Kautz, Jan; Tao, Andrew; Liu, Guilin; Yu, Zhiding
contents	Recently, promising progress has been made by open-source vision-language models (VLMs) in bringing their capabilities closer to those of proprietary frontier models. However, most open-source models only publish their final model weights, leaving the critical details of data strategies and implementation largely opaque. In this work, we address VLM post-training from a data-centric perspective, showing the key role of data strategy in developing frontier VLMs. By studying and building our post-training data strategy from scratch, we share detailed insights into the development processes, aiming to benefit the development of competitive models for the open-source community. Our introduced data strategy, together with training recipes and model design, leads to a family of performant VLMs named Eagle2. Specifically, Eagle2-9B achieves state-of-the-art results across various multimodal benchmarks, matching certain competitive models with up to 70B parameters.
format	Preprint
id	arxiv_https___arxiv_org_abs_2501_14818
institution	arXiv
publishDate	2025
record_format	arxiv
title	Eagle 2: Building Post-Training Data Strategies from Scratch for Frontier Vision-Language Models
topic	Computer Vision and Pattern Recognition; Artificial Intelligence; Machine Learning
url	https://arxiv.org/abs/2501.14818