Staff View: Nemotron-4 15B Technical Report :: Library Catalog

Bibliographic Details
Main Authors:	Parmar, Jupinder; Prabhumoye, Shrimai; Jennings, Joseph; Patwary, Mostofa; Subramanian, Sandeep; Su, Dan; Zhu, Chen; Narayanan, Deepak; Jhunjhunwala, Aastha; Dattagupta, Ayush; Jawa, Vibhu; Liu, Jiwei; Mahabaleshwarkar, Ameya; Nitski, Osvald; Brundyn, Annika; Maki, James; Martinez, Miguel; You, Jiaxuan; Kamalu, John; LeGresley, Patrick; Fridman, Denys; Casper, Jared; Aithal, Ashwath; Kuchaiev, Oleksii; Shoeybi, Mohammad; Cohen, Jonathan; Catanzaro, Bryan
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2402.16819

author	Parmar, Jupinder; Prabhumoye, Shrimai; Jennings, Joseph; Patwary, Mostofa; Subramanian, Sandeep; Su, Dan; Zhu, Chen; Narayanan, Deepak; Jhunjhunwala, Aastha; Dattagupta, Ayush; Jawa, Vibhu; Liu, Jiwei; Mahabaleshwarkar, Ameya; Nitski, Osvald; Brundyn, Annika; Maki, James; Martinez, Miguel; You, Jiaxuan; Kamalu, John; LeGresley, Patrick; Fridman, Denys; Casper, Jared; Aithal, Ashwath; Kuchaiev, Oleksii; Shoeybi, Mohammad; Cohen, Jonathan; Catanzaro, Bryan
contents	We introduce Nemotron-4 15B, a 15-billion-parameter large multilingual language model trained on 8 trillion text tokens. Nemotron-4 15B demonstrates strong performance when assessed on English, multilingual, and coding tasks: it outperforms all existing similarly-sized open models on 4 out of 7 downstream evaluation areas and achieves performance competitive with the leading open models in the remaining ones. Specifically, Nemotron-4 15B exhibits the best multilingual capabilities of all similarly-sized models, even outperforming models over four times larger and those explicitly specialized for multilingual tasks.
format	Preprint
id	arxiv_https___arxiv_org_abs_2402_16819
institution	arXiv
publishDate	2024
record_format	arxiv
title	Nemotron-4 15B Technical Report
topic	Computation and Language; Artificial Intelligence; Machine Learning
url	https://arxiv.org/abs/2402.16819
