Saved in:
Bibliographic Details
Main Authors: Li, Xiang, Yao, Yiqun, Jiang, Xin, Fang, Xuezhi, Wang, Chao, Liu, Xinzhang, Wang, Zihan, Zhao, Yu, Wang, Xin, Huang, Yuyao, Song, Shuangyong, Li, Yongxiang, Zhang, Zheng, Zhao, Bo, Sun, Aixin, Wang, Yequan, He, Zhongjiang, Wang, Zhongyuan, Li, Xuelong, Huang, Tiejun
Format: Preprint
Published: 2024
Subjects:
Online Access:https://arxiv.org/abs/2404.16645
Tags: Add Tag
No Tags, Be the first to tag this record!
Table of Contents:
  • Large language models (LLMs) have showcased profound capabilities in language understanding and generation, facilitating a wide array of applications. However, there is a notable paucity of detailed, open-sourced methodologies on efficiently scaling LLMs beyond 50 billion parameters with minimum trial-and-error cost and computational resources. In this report, we introduce Tele-FLM (aka FLM-2), a 52B open-sourced multilingual large language model that features a stable, efficient pre-training paradigm and enhanced factual judgment capabilities. Tele-FLM demonstrates superior multilingual language modeling abilities, measured by BPB on textual corpus. Besides, in both English and Chinese foundation model evaluation, it is comparable to strong open-sourced models that involve larger pre-training FLOPs, such as Llama2-70B and DeepSeek-67B. In addition to the model weights, we share the core designs, engineering practices, and training details, which we expect to benefit both the academic and industrial communities.