Staff View: HEBATRON: A Hebrew-Specialized Open-Weight Mixture-of-Experts Language Model :: Library Catalog

Bibliographic Details
Main Authors:	Kayzer, Noam; Revital, Dan; Joseph, Ori Bar; Arvatz, Smadar; Levi, Or; Geva, Tal; Shmidman, Shaltiel; Cohen, Amir DN; Ordan, Noam; Baruch, Omer; Zinkovskaia, Kate; Apini, Zevi; Weinberger, Sarel
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2605.11255

_version_	1866909034226909184
author	Kayzer, Noam; Revital, Dan; Joseph, Ori Bar; Arvatz, Smadar; Levi, Or; Geva, Tal; Shmidman, Shaltiel; Cohen, Amir DN; Ordan, Noam; Baruch, Omer; Zinkovskaia, Kate; Apini, Zevi; Weinberger, Sarel
author_facet	Kayzer, Noam; Revital, Dan; Joseph, Ori Bar; Arvatz, Smadar; Levi, Or; Geva, Tal; Shmidman, Shaltiel; Cohen, Amir DN; Ordan, Noam; Baruch, Omer; Zinkovskaia, Kate; Apini, Zevi; Weinberger, Sarel
contents	We present Hebatron, a Hebrew-specialized open-weight large language model built on the NVIDIA Nemotron-3 sparse Mixture-of-Experts architecture. Training employs a three-phase easy-to-hard curriculum with continuous anti-forgetting anchoring, followed by supervised fine-tuning on 2 million bilingual Hebrew-English samples. The curriculum ordering alone yields a 3-point aggregate benchmark gain over the reversed configuration. Hebatron achieves a Hebrew reasoning average of 73.8%, outperforming DictaLM-3.0-24B-Thinking (68.9%) and remaining competitive with Gemma-3-27B-IT on GSM8K-HE and Israeli Trivia, while activating only 3B parameters per forward pass across a 30B-parameter model, delivering approximately 9 times higher inference throughput at native context lengths up to 65,536 tokens. To our knowledge, this is the first language-specific adaptation of the Nemotron-3 architecture for any target language, and the first open-weight Hebrew-specialized MoE model with native long-context support. Model weights are released openly to support further research in Hebrew and Semitic-language NLP.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_11255
institution	arXiv
publishDate	2026
record_format	arxiv
title	HEBATRON: A Hebrew-Specialized Open-Weight Mixture-of-Experts Language Model
topic	Computation and Language
url	https://arxiv.org/abs/2605.11255

Similar Items