Bibliographic Details
Main Author:	Savdhariya, Maharshi
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Signal Processing; E.4; B.4.1; I.2.6
Online Access:	https://arxiv.org/abs/2604.03336

Table of Contents:

BitNet b1.58 (Ma et al., 2024) demonstrates that large language models can operate entirely on ternary weights {-1, 0, +1}, yet no native binary wire format exists for such models. NativeTernary closes this gap. Benchmarked against GGUF on the real BitNet b1.58 2B4T architecture (24 layers, ~170 tensors, 2B parameters), NativeTernary encodes ternary weights at exactly 2.000 bits per weight, which is 1.31x smaller than GGUF Q2_K and 4.0x smaller than GGUF int8, while reducing boundary and framing overhead by 460x (91 bytes vs ~42 KB of GGUF tensor headers). Encode throughput is 47 to 69 MB/s and decode throughput is 35 to 45 MB/s on commodity hardware. The decoder is a 10-line stateless state machine that is resilient to bitstream corruption.
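
To make the 2.000 bits-per-weight figure concrete, the following is a minimal sketch of generic 2-bit ternary packing, four weights per byte. It is not the NativeTernary wire format: the code assignment in `CODE`, the MSB-first bit order, and the function names `pack` and `unpack` are assumptions chosen for illustration, and the sketch does not attempt the framing, boundary markers, or corruption resilience described above.

```python
# Hypothetical 2-bit code assignment (an assumption, not the published format).
CODE = {-1: 0b00, 0: 0b01, 1: 0b10}
DECODE = {code: weight for weight, code in CODE.items()}


def pack(weights):
    """Pack ternary weights into bytes, four 2-bit codes per byte, MSB first."""
    out = bytearray((len(weights) + 3) // 4)
    for i, w in enumerate(weights):
        out[i // 4] |= CODE[w] << (6 - 2 * (i % 4))
    return bytes(out)


def unpack(data, count):
    """Recover `count` ternary weights from packed bytes."""
    weights = []
    for i in range(count):
        code = (data[i // 4] >> (6 - 2 * (i % 4))) & 0b11
        weights.append(DECODE[code])  # raises KeyError on the unused code 0b11
    return weights


weights = [-1, 0, 1, 1, 0, 0, -1, 1]
packed = pack(weights)
assert unpack(packed, len(weights)) == weights

# Payload density: 8 weights occupy 2 bytes, i.e. 2.0 bits per weight,
# which is 8 / 2 = 4.0x smaller than one byte (int8) per weight.
bits_per_weight = 8 * len(packed) / len(weights)
int8_ratio = 8 / bits_per_weight
```

Because padding slots in the last byte hold an ordinary code, a decoder of this kind needs the weight count supplied separately; how NativeTernary marks tensor boundaries is not described in this record.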

Similar Items