| Main Authors: | Nvidia: Adler, Bo; Agarwal, Niket; Aithal, Ashwath; et al. |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | Computation and Language; Artificial Intelligence; Machine Learning |
| Online Access: | https://arxiv.org/abs/2406.11704 |
| _version_ | 1866913460340654080 |
|---|---|
| author | Nvidia: Adler, Bo; Agarwal, Niket; Aithal, Ashwath; Anh, Dong H.; Bhattacharya, Pallab; Brundyn, Annika; Casper, Jared; Catanzaro, Bryan; Clay, Sharon; Cohen, Jonathan; Das, Sirshak; Dattagupta, Ayush; Delalleau, Olivier; Derczynski, Leon; Dong, Yi; Egert, Daniel; Evans, Ellie; Ficek, Aleksander; Fridman, Denys; Ghosh, Shaona; Ginsburg, Boris; Gitman, Igor; Grzegorzek, Tomasz; Hero, Robert; Huang, Jining; Jawa, Vibhu; Jennings, Joseph; Jhunjhunwala, Aastha; Kamalu, John; Khan, Sadaf; Kuchaiev, Oleksii; LeGresley, Patrick; Li, Hui; Liu, Jiwei; Liu, Zihan; Long, Eileen; Mahabaleshwarkar, Ameya Sunil; Majumdar, Somshubra; Maki, James; Martinez, Miguel; de Melo, Maer Rodrigues; Moshkov, Ivan; Narayanan, Deepak; Narenthiran, Sean; Navarro, Jesus; Nguyen, Phong; Nitski, Osvald; Noroozi, Vahid; Nutheti, Guruprasad; Parisien, Christopher; Parmar, Jupinder; Patwary, Mostofa; Pawelec, Krzysztof; Ping, Wei; Prabhumoye, Shrimai; Roy, Rajarshi; Saar, Trisha; Sabavat, Vasanth Rao Naik; Satheesh, Sanjeev; Scowcroft, Jane Polak; Sewall, Jason; Shamis, Pavel; Shen, Gerald; Shoeybi, Mohammad; Sizer, Dave; Smelyanskiy, Misha; Soares, Felipe; Sreedhar, Makesh Narsimhan; Su, Dan; Subramanian, Sandeep; Sun, Shengyang; Toshniwal, Shubham; Wang, Hao; Wang, Zhilin; You, Jiaxuan; Zeng, Jiaqi; Zhang, Jimmy; Zhang, Jing; Zhang, Vivienne; Zhang, Yian; Zhu, Chen |
| contents | We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward. Our models are open access under the NVIDIA Open Model License Agreement, a permissive model license that allows distribution, modification, and use of the models and its outputs. These models perform competitively to open access models on a wide range of evaluation benchmarks, and were sized to fit on a single DGX H100 with 8 GPUs when deployed in FP8 precision. We believe that the community can benefit from these models in various research studies and commercial applications, especially for generating synthetic data to train smaller language models. Notably, over 98% of data used in our model alignment process is synthetically generated, showcasing the effectiveness of these models in generating synthetic data. To further support open research and facilitate model development, we are also open-sourcing the synthetic data generation pipeline used in our model alignment process. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2406_11704 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| title | Nemotron-4 340B Technical Report |
| topic | Computation and Language Artificial Intelligence Machine Learning |
| url | https://arxiv.org/abs/2406.11704 |
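
The abstract's claim that the models were sized to fit on a single DGX H100 with 8 GPUs in FP8 precision can be checked with rough arithmetic. The sketch below is only a back-of-the-envelope estimate, not the authors' sizing method. It assumes a nominal 340e9 parameters, 80 GB of memory per GPU, 1 byte per parameter for FP8 and 2 bytes for BF16, and it counts weights only (no KV cache, activations, or runtime overhead). None of these figures appear in the record itself, and the function names are illustrative.

```python
# Rough weight-memory estimate for a 340B-parameter model on an 8-GPU node.
# All constants are assumptions for illustration, not values from the record.

PARAMS = 340e9          # nominal parameter count (assumed from the model name)
GPUS = 8                # GPUs in one DGX H100 node (stated in the abstract)
GB_PER_GPU = 80         # assumed memory per GPU, in GB
BYTES_PER_PARAM = {"fp8": 1, "bf16": 2}  # storage cost per parameter


def weight_gb(precision: str) -> float:
    """Return the weight footprint in GB (10^9 bytes) for a given precision."""
    return PARAMS * BYTES_PER_PARAM[precision] / 1e9


def fits(precision: str) -> bool:
    """Return True if the weights alone fit in the node's aggregate memory."""
    return weight_gb(precision) <= GPUS * GB_PER_GPU


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        print(precision, weight_gb(precision), "GB, fits:", fits(precision))
```

Under these assumptions the FP8 weights take about 340 GB of the node's 640 GB, while BF16 weights would need about 680 GB and would not fit on one node.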