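| Main Authors: | Parmar, Jupinder; Prabhumoye, Shrimai; Jennings, Joseph; Patwary, Mostofa; Subramanian, Sandeep; Su, Dan; Zhu, Chen; Narayanan, Deepak; Jhunjhunwala, Aastha; Dattagupta, Ayush; Jawa, Vibhu; Liu, Jiwei; Mahabaleshwarkar, Ameya; Nitski, Osvald; Brundyn, Annika; Maki, James; Martinez, Miguel; You, Jiaxuan; Kamalu, John; LeGresley, Patrick; Fridman, Denys; Casper, Jared; Aithal, Ashwath; Kuchaiev, Oleksii; Shoeybi, Mohammad; Cohen, Jonathan; Catanzaro, Bryan |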
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | Computation and Language; Artificial Intelligence; Machine Learning |
| Online Access: | https://arxiv.org/abs/2402.16819 |
| _version_ | 1866910345016115200 |
|---|---|
| author | Parmar, Jupinder; Prabhumoye, Shrimai; Jennings, Joseph; Patwary, Mostofa; Subramanian, Sandeep; Su, Dan; Zhu, Chen; Narayanan, Deepak; Jhunjhunwala, Aastha; Dattagupta, Ayush; Jawa, Vibhu; Liu, Jiwei; Mahabaleshwarkar, Ameya; Nitski, Osvald; Brundyn, Annika; Maki, James; Martinez, Miguel; You, Jiaxuan; Kamalu, John; LeGresley, Patrick; Fridman, Denys; Casper, Jared; Aithal, Ashwath; Kuchaiev, Oleksii; Shoeybi, Mohammad; Cohen, Jonathan; Catanzaro, Bryan |
| contents | We introduce Nemotron-4 15B, a 15-billion-parameter large multilingual language model trained on 8 trillion text tokens. Nemotron-4 15B demonstrates strong performance when assessed on English, multilingual, and coding tasks: it outperforms all existing similarly-sized open models on 4 out of 7 downstream evaluation areas and achieves competitive performance to the leading open models in the remaining ones. Specifically, Nemotron-4 15B exhibits the best multilingual capabilities of all similarly-sized models, even outperforming models over four times larger and those explicitly specialized for multilingual tasks. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2402_16819 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| title | Nemotron-4 15B Technical Report |
| topic | Computation and Language; Artificial Intelligence; Machine Learning |
| url | https://arxiv.org/abs/2402.16819 |