Bibliographic Details
Main Authors: Tian, Junfeng; Wang, Rui; Li, Cong; Zhou, Yudong; Liu, Jun; Wang, Jun
Format: Preprint
Published: 2024
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2404.15702
author Tian, Junfeng
Wang, Rui
Li, Cong
Zhou, Yudong
Liu, Jun
Wang, Jun
contents This report details the development and key achievements of our latest language model designed for custom large language models. The advancements introduced include a novel Online Data Scheduler that supports flexible training data adjustments and curriculum learning. The model's architecture is fortified with state-of-the-art techniques such as Rotary Positional Embeddings, QK-LayerNorm, and a specially crafted multilingual tokenizer to enhance stability and performance. Moreover, our robust training framework incorporates advanced monitoring and rapid recovery features to ensure optimal efficiency. Our Wonton 7B model has demonstrated competitive performance on a range of multilingual and English benchmarks. Future developments will prioritize narrowing the performance gap with more extensively trained models, thereby enhancing the model's real-world efficacy and adaptability. GitHub: https://github.com/nyonicai/nyonic-public
format Preprint
id arxiv_https___arxiv_org_abs_2404_15702
institution arXiv
publishDate 2024
record_format arxiv
title Nyonic Technical Report
topic Computation and Language
url https://arxiv.org/abs/2404.15702