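| Main Authors: | Kang, Lei; Fu, Xuanshuo; Souibgui, Mohamed Ali; Barsky, Andrey; Gomez, Lluis; Vazquez-Corral, Javier; Fornés, Alicia; Valveny, Ernest; Karatzas, Dimosthenis |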
|---|---|
| Format: | Digital resource |
| Language: | English |
| Published: | Zenodo, 2025 |
| Subjects: | |
| Online Access: | https://doi.org/10.3390/math13172851 |
| _version_ | 1866901480246607872 |
|---|---|
| author | Kang, Lei; Fu, Xuanshuo; Souibgui, Mohamed Ali; Barsky, Andrey; Gomez, Lluis; Vazquez-Corral, Javier; Fornés, Alicia; Valveny, Ernest; Karatzas, Dimosthenis |
| contents | Grid-structured visual data such as forms, tables, and game boards require models that pair pixel-level perception with symbolic consistency under global constraints. Recent Pixel Language Models (PLMs) map images to token sequences with promising flexibility, yet we find that they generalize poorly when observable evidence becomes sparse or corrupted. We present GridMNIST-Sudoku, a benchmark that renders large numbers of Sudoku instances with style-diverse handwritten digits and provides parameterized stress tracks for two tasks: Completion (predict missing cells) and Correction (detect and repair incorrect cells), across difficulty levels ranging from 1 to 90 altered positions in a 9 × 9 grid. Attention diagnostics on PLMs trained with conventional one-dimensional positional encodings reveal weak structure awareness outside the natural Sudoku sparsity band. Motivated by these findings, we propose a lightweight Row-Column-Box (RCB) positional prior that injects grid-aligned coordinates, and we combine it with simple sparsity and corruption augmentations. Trained only on the natural distribution, the resulting model substantially improves out-of-distribution accuracy across wide sparsity and corruption ranges while maintaining strong in-distribution performance. More details can be found at https://www.mdpi.com/2227-7390/13/17/2851. |
| format | Digital resource |
| id | zenodo_https___doi_org_10_3390_math13172851 |
| institution | Zenodo |
| language | eng |
| publishDate | 2025 |
| publisher | Zenodo |
| record_format | zenodo |
| title | A Benchmark for Symbolic Reasoning from Pixel Sequences: Grid-Level Visual Completion and Correction |
| topic | Pixel Language Models visual symbolic reasoning GridMNIST-Sudoku benchmark structured spatial prior Explainable AI |
| url | https://doi.org/10.3390/math13172851 |
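
The abstract mentions a Row-Column-Box (RCB) positional prior that injects grid-aligned coordinates, but this record does not say how those coordinates are computed. The sketch below is only an illustration of what such coordinates could look like for a 9 × 9 Sudoku grid. It assumes one token per cell in row-major order; the function names `rcb_coordinates` and `rcb_table` are hypothetical and are not taken from the paper.

```python
def rcb_coordinates(index, size=9, box=3):
    """Map a flat, row-major cell index to (row, column, box) ids.

    Assumption: the grid is size x size and is tiled by box x box blocks,
    as in standard Sudoku. This is an illustrative sketch, not the
    authors' implementation.
    """
    if not 0 <= index < size * size:
        raise ValueError(f"index {index} is outside a {size}x{size} grid")
    row, col = divmod(index, size)
    boxes_per_row = size // box
    box_id = (row // box) * boxes_per_row + (col // box)
    return row, col, box_id


def rcb_table(size=9, box=3):
    """Return the (row, column, box) triple for every cell of the grid."""
    return [rcb_coordinates(i, size, box) for i in range(size * size)]


if __name__ == "__main__":
    # Print the triples for the second row of the grid (cells 9..17).
    for i in range(9, 18):
        print(i, rcb_coordinates(i))
```

In a model, each of the three ids could index its own learned embedding table, with the results added to the token embedding. That combination step is likewise an assumption here and is not described in this record.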