MARC View: :: Library Catalog

Bibliographic Details
Main Authors:	Li, Xiang; Yao, Yiqun; Jiang, Xin; Fang, Xuezhi; Wang, Chao; Liu, Xinzhang; Wang, Zihan; Zhao, Yu; Wang, Xin; Huang, Yuyao; Song, Shuangyong; Li, Yongxiang; Zhang, Zheng; Zhao, Bo; Sun, Aixin; Wang, Yequan; He, Zhongjiang; Wang, Zhongyuan; Li, Xuelong; Huang, Tiejun
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2407.02783

_version_	1866909240652726272
author	Li, Xiang; Yao, Yiqun; Jiang, Xin; Fang, Xuezhi; Wang, Chao; Liu, Xinzhang; Wang, Zihan; Zhao, Yu; Wang, Xin; Huang, Yuyao; Song, Shuangyong; Li, Yongxiang; Zhang, Zheng; Zhao, Bo; Sun, Aixin; Wang, Yequan; He, Zhongjiang; Wang, Zhongyuan; Li, Xuelong; Huang, Tiejun
contents	Large Language Models (LLMs) represent a significant stride toward Artificial General Intelligence. As scaling laws underscore the potential of increasing model sizes, the academic community has intensified its investigations into LLMs with capacities exceeding 50 billion parameters. This technical report builds on our prior work with Tele-FLM (also known as FLM-2), a publicly available 52-billion-parameter model. We delve into two primary areas: we first discuss our observation of Supervised Fine-tuning (SFT) on Tele-FLM-52B, which supports the "less is more" approach for SFT data construction; second, we demonstrate our experiments and analyses on the best practices for progressively growing a model from 52 billion to 102 billion, and subsequently to 1 trillion parameters. We will open-source a 1T model checkpoint, namely Tele-FLM-1T, to advance further training and research.
format	Preprint
id	arxiv_https___arxiv_org_abs_2407_02783
institution	arXiv
publishDate	2024
record_format	arxiv
title	52B to 1T: Lessons Learned via Tele-FLM Series
topic	Computation and Language; Artificial Intelligence
url	https://arxiv.org/abs/2407.02783
