Staff View: Nyonic Technical Report

Bibliographic Details
Main Authors:	Tian, Junfeng; Wang, Rui; Li, Cong; Zhou, Yudong; Liu, Jun; Wang, Jun
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2404.15702

_version_	1866914768493740032
author	Tian, Junfeng; Wang, Rui; Li, Cong; Zhou, Yudong; Liu, Jun; Wang, Jun
author_facet	Tian, Junfeng; Wang, Rui; Li, Cong; Zhou, Yudong; Liu, Jun; Wang, Jun
contents	This report details the development and key achievements of our latest language model designed for custom large language models. The advancements introduced include a novel Online Data Scheduler that supports flexible training data adjustments and curriculum learning. The model's architecture is fortified with state-of-the-art techniques such as Rotary Positional Embeddings, QK-LayerNorm, and a specially crafted multilingual tokenizer to enhance stability and performance. Moreover, our robust training framework incorporates advanced monitoring and rapid recovery features to ensure optimal efficiency. Our Wonton 7B model has demonstrated competitive performance on a range of multilingual and English benchmarks. Future developments will prioritize narrowing the performance gap with more extensively trained models, thereby enhancing the model's real-world efficacy and adaptability. GitHub: https://github.com/nyonicai/nyonic-public
format	Preprint
id	arxiv_https___arxiv_org_abs_2404_15702
institution	arXiv
publishDate	2024
record_format	arxiv
title	Nyonic Technical Report
topic	Computation and Language
url	https://arxiv.org/abs/2404.15702
