Staff View: MIXAR: Scaling Autoregressive Pixel-based Language Models to Multiple Languages and Scripts :: Library Catalog

Bibliographic Details
Main Authors:	Hu, Chen; Tai, Yintao; Vergari, Antonio; Keller, Frank; Suglia, Alessandro
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2604.11575

_version_	1866910125477855232
author	Hu, Chen; Tai, Yintao; Vergari, Antonio; Keller, Frank; Suglia, Alessandro
author_facet	Hu, Chen; Tai, Yintao; Vergari, Antonio; Keller, Frank; Suglia, Alessandro
contents	Pixel-based language models are gaining momentum as alternatives to traditional token-based approaches, promising to circumvent tokenization challenges. However, the inherent perceptual diversity across languages poses a significant hurdle for multilingual generalization in pixel space. This paper introduces MIXAR, the first generative pixel-based language model trained on eight languages that span a range of scripts. We empirically evaluate MIXAR against previous pixel-based models as well as comparable tokenizer-based models, demonstrating substantial performance improvements on discriminative and generative multilingual tasks. Additionally, we show that MIXAR is robust to languages never seen during training. These results are further strengthened when scaling the model to 0.5B parameters, which improves not only its capabilities in generative tasks like LAMBADA but also its robustness when challenged with input perturbations such as orthographic attacks.
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_11575
institution	arXiv
publishDate	2026
record_format	arxiv
title	MIXAR: Scaling Autoregressive Pixel-based Language Models to Multiple Languages and Scripts
topic	Computation and Language
url	https://arxiv.org/abs/2604.11575