_version_ 1866910345016115200
author Parmar, Jupinder
Prabhumoye, Shrimai
Jennings, Joseph
Patwary, Mostofa
Subramanian, Sandeep
Su, Dan
Zhu, Chen
Narayanan, Deepak
Jhunjhunwala, Aastha
Dattagupta, Ayush
Jawa, Vibhu
Liu, Jiwei
Mahabaleshwarkar, Ameya
Nitski, Osvald
Brundyn, Annika
Maki, James
Martinez, Miguel
You, Jiaxuan
Kamalu, John
LeGresley, Patrick
Fridman, Denys
Casper, Jared
Aithal, Ashwath
Kuchaiev, Oleksii
Shoeybi, Mohammad
Cohen, Jonathan
Catanzaro, Bryan
contents We introduce Nemotron-4 15B, a 15-billion-parameter large multilingual language model trained on 8 trillion text tokens. Nemotron-4 15B performs strongly on English, multilingual, and coding tasks: it outperforms all existing similarly sized open models in 4 of 7 downstream evaluation areas and is competitive with the leading open models in the remaining ones. In particular, Nemotron-4 15B exhibits the best multilingual capabilities of all similarly sized models, even outperforming models over four times larger and those explicitly specialized for multilingual tasks.
format Preprint
id arxiv_https___arxiv_org_abs_2402_16819
institution arXiv
publishDate 2024
record_format arxiv
title Nemotron-4 15B Technical Report
topic Computation and Language
Artificial Intelligence
Machine Learning
url https://arxiv.org/abs/2402.16819