author Li, Zhiqi
Chen, Guo
Liu, Shilong
Wang, Shihao
VS, Vibashan
Ji, Yishen
Lan, Shiyi
Zhang, Hao
Zhao, Yilin
Radhakrishnan, Subhashree
Chang, Nadine
Sapra, Karan
Deshmukh, Amala Sanjay
Rintamaki, Tuomas
Le, Matthieu
Karmanov, Ilia
Voegtle, Lukas
Fischer, Philipp
Huang, De-An
Roman, Timo
Lu, Tong
Alvarez, Jose M.
Catanzaro, Bryan
Kautz, Jan
Tao, Andrew
Liu, Guilin
Yu, Zhiding
contents Recently, promising progress has been made by open-source vision-language models (VLMs) in bringing their capabilities closer to those of proprietary frontier models. However, most open-source models only publish their final model weights, leaving the critical details of data strategies and implementation largely opaque. In this work, we address VLM post-training from a data-centric perspective, showing the key role of data strategy in developing frontier VLMs. By studying and building our post-training data strategy from scratch, we share detailed insights into the development processes, aiming to benefit the development of competitive models for the open-source community. Our introduced data strategy, together with training recipes and model design, leads to a family of performant VLMs named Eagle2. Specifically, Eagle2-9B achieves state-of-the-art results across various multimodal benchmarks, matching certain competitive models with up to 70B parameters.
format Preprint
id arxiv_https___arxiv_org_abs_2501_14818
institution arXiv
publishDate 2025
record_format arxiv
title Eagle 2: Building Post-Training Data Strategies from Scratch for Frontier Vision-Language Models
topic Computer Vision and Pattern Recognition
Artificial Intelligence
Machine Learning
url https://arxiv.org/abs/2501.14818