Bibliographic details
Main authors: Li, Zhiqi, Chen, Guo, Liu, Shilong, Wang, Shihao, VS, Vibashan, Ji, Yishen, Lan, Shiyi, Zhang, Hao, Zhao, Yilin, Radhakrishnan, Subhashree, Chang, Nadine, Sapra, Karan, Deshmukh, Amala Sanjay, Rintamaki, Tuomas, Le, Matthieu, Karmanov, Ilia, Voegtle, Lukas, Fischer, Philipp, Huang, De-An, Roman, Timo, Lu, Tong, Alvarez, Jose M., Catanzaro, Bryan, Kautz, Jan, Tao, Andrew, Liu, Guilin, Yu, Zhiding
Format: Preprint
Published: 2025
Online access: https://arxiv.org/abs/2501.14818
Abstract:
  • Recently, promising progress has been made by open-source vision-language models (VLMs) in bringing their capabilities closer to those of proprietary frontier models. However, most open-source models only publish their final model weights, leaving the critical details of data strategies and implementation largely opaque. In this work, we address VLM post-training from a data-centric perspective, showing the key role of data strategy in developing frontier VLMs. By studying and building our post-training data strategy from scratch, we share detailed insights into the development processes, aiming to benefit the development of competitive models for the open-source community. Our introduced data strategy, together with training recipes and model design, leads to a family of performant VLMs named Eagle2. Specifically, Eagle2-9B achieves state-of-the-art results across various multimodal benchmarks, matching certain competitive models with up to 70B parameters.