_version_ 1866908557866172416
author Preferred Networks
Chubachi, Kaizaburo
Fujita, Yasuhiro
Hemmi, Shinichi
Hirokawa, Yuta
Imajo, Kentaro
Kataoka, Toshiki
Kobayashi, Goro
Maehashi, Kenichi
Metzger, Calvin
Mikami, Hiroaki
Murai, Shogo
Nishino, Daisuke
Nozawa, Kento
Ogawa, Toru
Okada, Shintarou
Okanohara, Daisuke
Saito, Shunta
Sano, Shotaro
Suzuki, Shuji
Takahashi, Kuniyuki
Tanaka, Daisuke
Ummadisingu, Avinash
Wang, Hanqin
Wang, Sixue
Xu, Tianqi
author_facet Preferred Networks
Chubachi, Kaizaburo
Fujita, Yasuhiro
Hemmi, Shinichi
Hirokawa, Yuta
Imajo, Kentaro
Kataoka, Toshiki
Kobayashi, Goro
Maehashi, Kenichi
Metzger, Calvin
Mikami, Hiroaki
Murai, Shogo
Nishino, Daisuke
Nozawa, Kento
Ogawa, Toru
Okada, Shintarou
Okanohara, Daisuke
Saito, Shunta
Sano, Shotaro
Suzuki, Shuji
Takahashi, Kuniyuki
Tanaka, Daisuke
Ummadisingu, Avinash
Wang, Hanqin
Wang, Sixue
Xu, Tianqi
contents In this report, we introduce PLaMo 2, a series of Japanese-focused large language models featuring a hybrid Samba-based architecture that transitions to full attention via continual pre-training to support 32K token contexts. Training leverages extensive synthetic corpora to overcome data scarcity, while computational efficiency is achieved through weight reuse and structured pruning. This efficient pruning methodology produces an 8B model that achieves performance comparable to our previous 100B model. Post-training further refines the models using a pipeline of supervised fine-tuning (SFT) and direct preference optimization (DPO), enhanced by synthetic Japanese instruction data and model merging techniques. Optimized for inference using vLLM and quantization with minimal accuracy loss, the PLaMo 2 models achieve state-of-the-art results on Japanese benchmarks, outperforming similarly sized open models in instruction-following, language fluency, and Japanese-specific knowledge.
format Preprint
id arxiv_https___arxiv_org_abs_2509_04897
institution arXiv
publishDate 2025
record_format arxiv
spellingShingle PLaMo 2 Technical Report
Preferred Networks
Chubachi, Kaizaburo
Fujita, Yasuhiro
Hemmi, Shinichi
Hirokawa, Yuta
Imajo, Kentaro
Kataoka, Toshiki
Kobayashi, Goro
Maehashi, Kenichi
Metzger, Calvin
Mikami, Hiroaki
Murai, Shogo
Nishino, Daisuke
Nozawa, Kento
Ogawa, Toru
Okada, Shintarou
Okanohara, Daisuke
Saito, Shunta
Sano, Shotaro
Suzuki, Shuji
Takahashi, Kuniyuki
Tanaka, Daisuke
Ummadisingu, Avinash
Wang, Hanqin
Wang, Sixue
Xu, Tianqi
Computation and Language
Artificial Intelligence
Machine Learning
In this report, we introduce PLaMo 2, a series of Japanese-focused large language models featuring a hybrid Samba-based architecture that transitions to full attention via continual pre-training to support 32K token contexts. Training leverages extensive synthetic corpora to overcome data scarcity, while computational efficiency is achieved through weight reuse and structured pruning. This efficient pruning methodology produces an 8B model that achieves performance comparable to our previous 100B model. Post-training further refines the models using a pipeline of supervised fine-tuning (SFT) and direct preference optimization (DPO), enhanced by synthetic Japanese instruction data and model merging techniques. Optimized for inference using vLLM and quantization with minimal accuracy loss, the PLaMo 2 models achieve state-of-the-art results on Japanese benchmarks, outperforming similarly sized open models in instruction-following, language fluency, and Japanese-specific knowledge.
title PLaMo 2 Technical Report
topic Computation and Language
Artificial Intelligence
Machine Learning
url https://arxiv.org/abs/2509.04897