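| Main Authors: | Li, Zhiqi; Chen, Guo; Liu, Shilong; Wang, Shihao; VS, Vibashan; Ji, Yishen; Lan, Shiyi; Zhang, Hao; Zhao, Yilin; Radhakrishnan, Subhashree; Chang, Nadine; Sapra, Karan; Deshmukh, Amala Sanjay; Rintamaki, Tuomas; Le, Matthieu; Karmanov, Ilia; Voegtle, Lukas; Fischer, Philipp; Huang, De-An; Roman, Timo; Lu, Tong; Alvarez, Jose M.; Catanzaro, Bryan; Kautz, Jan; Tao, Andrew; Liu, Guilin; Yu, Zhiding |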
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Computer Vision and Pattern Recognition; Artificial Intelligence; Machine Learning |
| Online Access: | https://arxiv.org/abs/2501.14818 |
| _version_ | 1866917902500757504 |
|---|---|
| author | Li, Zhiqi; Chen, Guo; Liu, Shilong; Wang, Shihao; VS, Vibashan; Ji, Yishen; Lan, Shiyi; Zhang, Hao; Zhao, Yilin; Radhakrishnan, Subhashree; Chang, Nadine; Sapra, Karan; Deshmukh, Amala Sanjay; Rintamaki, Tuomas; Le, Matthieu; Karmanov, Ilia; Voegtle, Lukas; Fischer, Philipp; Huang, De-An; Roman, Timo; Lu, Tong; Alvarez, Jose M.; Catanzaro, Bryan; Kautz, Jan; Tao, Andrew; Liu, Guilin; Yu, Zhiding |
| contents | Recently, promising progress has been made by open-source vision-language models (VLMs) in bringing their capabilities closer to those of proprietary frontier models. However, most open-source models only publish their final model weights, leaving the critical details of data strategies and implementation largely opaque. In this work, we address VLM post-training from a data-centric perspective, showing the key role of data strategy in developing frontier VLMs. By studying and building our post-training data strategy from scratch, we share detailed insights into the development processes, aiming to benefit the development of competitive models for the open-source community. Our introduced data strategy, together with training recipes and model design, leads to a family of performant VLMs named Eagle2. Specifically, Eagle2-9B achieves state-of-the-art results across various multimodal benchmarks, matching certain competitive models with up to 70B parameters. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2501_14818 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | Eagle 2: Building Post-Training Data Strategies from Scratch for Frontier Vision-Language Models |
| topic | Computer Vision and Pattern Recognition; Artificial Intelligence; Machine Learning |
| url | https://arxiv.org/abs/2501.14818 |