Bibliographic Details
Main Authors: Kayzer, Noam, Revital, Dan, Joseph, Ori Bar, Arvatz, Smadar, Levi, Or, Geva, Tal, Shmidman, Shaltiel, Cohen, Amir DN, Ordan, Noam, Baruch, Omer, Zinkovskaia, Kate, Apini, Zevi, Weinberger, Sarel
Format: Preprint
Published: 2026
Subjects:
Online Access: https://arxiv.org/abs/2605.11255
author Kayzer, Noam
Revital, Dan
Joseph, Ori Bar
Arvatz, Smadar
Levi, Or
Geva, Tal
Shmidman, Shaltiel
Cohen, Amir DN
Ordan, Noam
Baruch, Omer
Zinkovskaia, Kate
Apini, Zevi
Weinberger, Sarel
contents We present Hebatron, a Hebrew-specialized open-weight large language model built on the NVIDIA Nemotron-3 sparse Mixture-of-Experts architecture. Training employs a three-phase easy-to-hard curriculum with continuous anti-forgetting anchoring, followed by supervised fine-tuning on 2 million bilingual Hebrew-English samples. The curriculum ordering alone yields a 3-point aggregate benchmark gain over the reversed configuration. Hebatron achieves a Hebrew reasoning average of 73.8%, outperforming DictaLM-3.0-24B-Thinking (68.9%) and remaining competitive with Gemma-3-27B-IT on GSM8K-HE and Israeli Trivia, while activating only 3B parameters per forward pass across a 30B-parameter model, delivering approximately 9 times higher inference throughput at native context lengths up to 65,536 tokens. To our knowledge, this is the first language-specific adaptation of the Nemotron-3 architecture for any target language, and the first open-weight Hebrew-specialized MoE model with native long-context support. Model weights are released openly to support further research in Hebrew and Semitic-language NLP.
format Preprint
id arxiv_https___arxiv_org_abs_2605_11255
institution arXiv
publishDate 2026
record_format arxiv
title HEBATRON: A Hebrew-Specialized Open-Weight Mixture-of-Experts Language Model
topic Computation and Language
url https://arxiv.org/abs/2605.11255