Staff View: :: Library Catalog

Bibliographic Details
Main Authors:	Pfeiffer, Pascal; Singer, Philipp; Babakhin, Yauhen; Fodor, Gabor; Dhankhar, Nischay; Ambati, Sri Satish
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2407.09276

_version_	1866916320668286976
author	Pfeiffer, Pascal; Singer, Philipp; Babakhin, Yauhen; Fodor, Gabor; Dhankhar, Nischay; Ambati, Sri Satish
author_facet	Pfeiffer, Pascal; Singer, Philipp; Babakhin, Yauhen; Fodor, Gabor; Dhankhar, Nischay; Ambati, Sri Satish
contents	We present H2O-Danube3, a series of small language models consisting of H2O-Danube3-4B, trained on 6T tokens and H2O-Danube3-500M, trained on 4T tokens. Our models are pre-trained on high quality Web data consisting of primarily English tokens in three stages with different data mixes before final supervised tuning for chat version. The models exhibit highly competitive metrics across a multitude of academic, chat, and fine-tuning benchmarks. Thanks to its compact architecture, H2O-Danube3 can be efficiently run on a modern smartphone, enabling local inference and rapid processing capabilities even on mobile devices. We make all models openly available under Apache 2.0 license further democratizing LLMs to a wider audience economically.
format	Preprint
id	arxiv_https___arxiv_org_abs_2407_09276
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	H2O-Danube3 Technical Report Pfeiffer, Pascal Singer, Philipp Babakhin, Yauhen Fodor, Gabor Dhankhar, Nischay Ambati, Sri Satish Computation and Language Machine Learning We present H2O-Danube3, a series of small language models consisting of H2O-Danube3-4B, trained on 6T tokens and H2O-Danube3-500M, trained on 4T tokens. Our models are pre-trained on high quality Web data consisting of primarily English tokens in three stages with different data mixes before final supervised tuning for chat version. The models exhibit highly competitive metrics across a multitude of academic, chat, and fine-tuning benchmarks. Thanks to its compact architecture, H2O-Danube3 can be efficiently run on a modern smartphone, enabling local inference and rapid processing capabilities even on mobile devices. We make all models openly available under Apache 2.0 license further democratizing LLMs to a wider audience economically.
title	H2O-Danube3 Technical Report
topic	Computation and Language; Machine Learning
url	https://arxiv.org/abs/2407.09276
