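Staff View: A Benchmark for Symbolic Reasoning from Pixel Sequences: Grid-Level Visual Completion and Correction :: Library Catalog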

Bibliographic Details
Main Authors:	Kang, Lei; Fu, Xuanshuo; Souibgui, Mohamed Ali; Barsky, Andrey; Gomez, Lluis; Vazquez-Corral, Javier; Fornés, Alicia; Valveny, Ernest; Karatzas, Dimosthenis
Format:	Digital resource
Language:	English
Published:	Zenodo 2025
Subjects:	Pixel Language Models; visual symbolic reasoning; GridMNIST-Sudoku benchmark; structured spatial prior; Explainable AI
Online Access:	https://doi.org/10.3390/math13172851

_version_	1866901480246607872
author	Kang, Lei; Fu, Xuanshuo; Souibgui, Mohamed Ali; Barsky, Andrey; Gomez, Lluis; Vazquez-Corral, Javier; Fornés, Alicia; Valveny, Ernest; Karatzas, Dimosthenis
author_facet	Kang, Lei; Fu, Xuanshuo; Souibgui, Mohamed Ali; Barsky, Andrey; Gomez, Lluis; Vazquez-Corral, Javier; Fornés, Alicia; Valveny, Ernest; Karatzas, Dimosthenis
contents	<p>Grid-structured visual data such as forms, tables, and game boards require models that pair pixel-level perception with symbolic consistency under global constraints. Recent Pixel Language Models (PLMs) map images to token sequences with promising flexibility, yet we find they generalize poorly when observable evidence becomes sparse or corrupted. We present GridMNIST-Sudoku, a benchmark that renders large numbers of Sudoku instances with style-diverse handwritten digits and provides parameterized stress tracks for two tasks: Completion (predict missing cells) and Correction (detect and repair incorrect cells) across difficulty levels ranging from 1 to 90 altered positions in a 9 × 9 grid. Attention diagnostics on PLMs trained with conventional one-dimensional positional encodings reveal weak structure awareness outside the natural Sudoku sparsity band. Motivated by these findings, we propose a lightweight Row-Column-Box (RCB) positional prior that injects grid-aligned coordinates, and we combine it with simple sparsity and corruption augmentations. Trained only on the natural distribution, the resulting model substantially improves out-of-distribution accuracy across wide sparsity and corruption ranges while maintaining strong in-distribution performance.</p> <p>More details can be found at https://www.mdpi.com/2227-7390/13/17/2851.</p>
format	Digital resource
id	zenodo_https___doi_org_10_3390_math13172851
institution	Zenodo
language	eng
publishDate	2025
publisher	Zenodo
record_format	zenodo
title	A Benchmark for Symbolic Reasoning from Pixel Sequences: Grid-Level Visual Completion and Correction
topic	Pixel Language Models; visual symbolic reasoning; GridMNIST-Sudoku benchmark; structured spatial prior; Explainable AI
url	https://doi.org/10.3390/math13172851
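
Note on the abstract: the Row-Column-Box (RCB) positional prior is described only as injecting grid-aligned coordinates. As a rough illustration of what such coordinates could look like for a 9 × 9 Sudoku grid, the sketch below maps a flat cell index to a (row, column, box) triple. The function name `rcb_coordinates`, the zero-based row-major cell ordering, and the box numbering are assumptions made for this example; the record does not say how the authors index cells or how the coordinates are embedded in the model.

```python
from typing import List, Tuple


def rcb_coordinates(cell_index: int, size: int = 9, box: int = 3) -> Tuple[int, int, int]:
    """Map a flat cell index to zero-based (row, column, box) coordinates.

    Assumes row-major ordering of cells (an assumption, not taken from the paper).
    Boxes are numbered left to right, top to bottom.
    """
    if not 0 <= cell_index < size * size:
        raise ValueError("cell_index is outside the grid")
    row, col = divmod(cell_index, size)
    boxes_per_row = size // box
    box_id = (row // box) * boxes_per_row + (col // box)
    return row, col, box_id


# Coordinates for all 81 cells of a standard Sudoku grid.
all_coords: List[Tuple[int, int, int]] = [rcb_coordinates(i) for i in range(81)]
```

In a model, each of the three indices would typically be looked up in its own embedding table and added to the token representation, but that step is likewise an assumption here and not something this record confirms.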
