Bibliographic details
Main authors: Li, Xiang; Yao, Yiqun; Jiang, Xin; Fang, Xuezhi; Wang, Chao; Liu, Xinzhang; Wang, Zihan; Zhao, Yu; Wang, Xin; Huang, Yuyao; Song, Shuangyong; Li, Yongxiang; Zhang, Zheng; Zhao, Bo; Sun, Aixin; Wang, Yequan; He, Zhongjiang; Wang, Zhongyuan; Li, Xuelong; Huang, Tiejun
Format: Preprint
Published: 2024
Online access: https://arxiv.org/abs/2407.02783
Abstract: Large Language Models (LLMs) represent a significant step toward Artificial General Intelligence. As scaling laws underscore the potential of increasing model size, the academic community has intensified its investigation of LLMs with more than 50 billion parameters. This technical report builds on our prior work on Tele-FLM (also known as FLM-2), a publicly available 52-billion-parameter model. We focus on two areas: first, we discuss our observations from Supervised Fine-tuning (SFT) of Tele-FLM-52B, which support the "less is more" approach to SFT data construction; second, we present our experiments and analyses on best practices for progressively growing a model from 52 billion to 102 billion, and subsequently to 1 trillion, parameters. We will open-source a 1T model checkpoint, Tele-FLM-1T, to advance further training and research.
Title: 52B to 1T: Lessons Learned via Tele-FLM Series
Subjects: Computation and Language; Artificial Intelligence