Bibliographic Details
Main Authors: Kang, Lei, Fu, Xuanshuo, Souibgui, Mohamed Ali, Barsky, Andrey, Gomez, Lluis, Vazquez-Corral, Javier, Fornés, Alicia, Valveny, Ernest, Karatzas, Dimosthenis
Format: Digital resource
Language: English
Published: Zenodo 2025
Subjects:
Online Access: https://doi.org/10.3390/math13172851
_version_ 1866901480246607872
author Kang, Lei
Fu, Xuanshuo
Souibgui, Mohamed Ali
Barsky, Andrey
Gomez, Lluis
Vazquez-Corral, Javier
Fornés, Alicia
Valveny, Ernest
Karatzas, Dimosthenis
contents <p>Grid-structured visual data such as forms, tables, and game boards require models that pair pixel-level perception with symbolic consistency under global constraints. Recent Pixel Language Models (PLMs) map images to token sequences with promising flexibility, yet we find they generalize poorly when observable evidence becomes sparse or corrupted. We present GridMNIST-Sudoku, a benchmark that renders large numbers of Sudoku instances with style-diverse handwritten digits and provides parameterized stress tracks for two tasks: Completion (predict missing cells) and Correction (detect and repair incorrect cells), across difficulty levels ranging from 1 to 90 altered positions in a 9 × 9 grid. Attention diagnostics on PLMs trained with conventional one-dimensional positional encodings reveal weak structure awareness outside the natural Sudoku sparsity band. Motivated by these findings, we propose a lightweight Row-Column-Box (RCB) positional prior that injects grid-aligned coordinates, and we combine it with simple sparsity and corruption augmentations. Trained only on the natural distribution, the resulting model substantially improves out-of-distribution accuracy across wide sparsity and corruption ranges while maintaining strong in-distribution performance.</p> <p>More details can be found at https://www.mdpi.com/2227-7390/13/17/2851.</p>
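The abstract names a Row-Column-Box (RCB) positional prior but this record does not say how the coordinates are computed or injected. As a rough illustration only, the sketch below derives row, column, and box indices for each cell of a 9 × 9 grid, assuming one token per cell in row-major order. The function name `rcb_coordinates` and its parameters are hypothetical and are not taken from the paper.

```python
def rcb_coordinates(index: int, size: int = 9, box: int = 3) -> tuple[int, int, int]:
    """Map a flat, row-major cell index to (row, column, box) ids.

    Assumption: cells are serialized left to right, top to bottom,
    and boxes are numbered the same way (0 is top-left).
    """
    if not 0 <= index < size * size:
        raise ValueError(f"index {index} is outside a {size}x{size} grid")
    row, col = divmod(index, size)
    boxes_per_row = size // box
    box_id = (row // box) * boxes_per_row + (col // box)
    return row, col, box_id


# One (row, column, box) triple per cell; a model could look up a learned
# embedding for each of the three ids in place of a single 1-D position.
coords = [rcb_coordinates(i) for i in range(81)]
```

Under these assumptions, two cells that share any one of the three ids fall under a common Sudoku constraint, which is the structure a one-dimensional position index does not expose directly.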
format Digital resource
id zenodo_https___doi_org_10_3390_math13172851
institution Zenodo
language eng
publishDate 2025
publisher Zenodo
record_format zenodo
title A Benchmark for Symbolic Reasoning from Pixel Sequences: Grid-Level Visual Completion and Correction
topic Pixel Language Models
visual symbolic reasoning
GridMNIST-Sudoku benchmark
structured spatial prior
Explainable AI
url https://doi.org/10.3390/math13172851