Службовий вигляд:

Збережено в:

Бібліографічні деталі
Автори:	Erata, Ferhat, Kouteili, Sam, Typaldos, Thanos, Antonopoulos, Timos, Jones, Robert B., Cook, Byron, Piskac, Ruzica
Формат:	Preprint
Опубліковано:	2026
Предмети:	Machine Learning Artificial Intelligence Logic in Computer Science I.2.6; I.2.8; F.4.1
Онлайн доступ:	https://arxiv.org/abs/2605.16632
Теги:	Додати тег Немає тегів, Будьте першим, хто поставить тег для цього запису!

_version_	1866914571420172288
author	Erata, Ferhat Kouteili, Sam Typaldos, Thanos Antonopoulos, Timos Jones, Robert B. Cook, Byron Piskac, Ruzica
author_facet	Erata, Ferhat Kouteili, Sam Typaldos, Thanos Antonopoulos, Timos Jones, Robert B. Cook, Byron Piskac, Ruzica
contents	Despite the effectiveness of Cube-and-Conquer (C&C) for solving challenging Boolean Satisfiability (SAT) problems, no prior work has shown that transformer-based models can learn effective cubing heuristics. We introduce a neuro-symbolic post-training framework for this task. We design an MCTS-based data curation pipeline that uses symbolic heuristics to explore splitting decisions over SAT competition formulas, producing preference data grounded in solver statistics and augmented with reasoning traces from a teacher model. Our two-stage post-training, supervised fine-tuning (SFT) followed by direct preference optimization (DPO), enables a 4B-parameter model to achieve a pass@5 score of 53 on 100 SAT competition benchmarks, surpassing frontier LLMs such as Claude-Sonnet-4 (50) and matching the best symbolic heuristic (53). Ablations show that SFT alone improves pass@5 from 46 to 51, with DPO adding 2 additional benchmarks; an entropy/agreement ablation on realized first-cube decisions further shows that SFT, not DPO, accounts for the root-level decision diversity that produces complementary per-run coverage over deterministic symbolic methods. This demonstrates that transformers can be trained to make effective cubing decisions in a domain traditionally dominated by symbolic methods.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_16632
institution	arXiv
publishDate	2026
record_format	arxiv
spellingShingle	Learning How to Cube Erata, Ferhat Kouteili, Sam Typaldos, Thanos Antonopoulos, Timos Jones, Robert B. Cook, Byron Piskac, Ruzica Machine Learning Artificial Intelligence Logic in Computer Science I.2.6; I.2.8; F.4.1 Despite the effectiveness of Cube-and-Conquer (C&C) for solving challenging Boolean Satisfiability (SAT) problems, no prior work has shown that transformer-based models can learn effective cubing heuristics. We introduce a neuro-symbolic post-training framework for this task. We design an MCTS-based data curation pipeline that uses symbolic heuristics to explore splitting decisions over SAT competition formulas, producing preference data grounded in solver statistics and augmented with reasoning traces from a teacher model. Our two-stage post-training, supervised fine-tuning (SFT) followed by direct preference optimization (DPO), enables a 4B-parameter model to achieve a pass@5 score of 53 on 100 SAT competition benchmarks, surpassing frontier LLMs such as Claude-Sonnet-4 (50) and matching the best symbolic heuristic (53). Ablations show that SFT alone improves pass@5 from 46 to 51, with DPO adding 2 additional benchmarks; an entropy/agreement ablation on realized first-cube decisions further shows that SFT, not DPO, accounts for the root-level decision diversity that produces complementary per-run coverage over deterministic symbolic methods. This demonstrates that transformers can be trained to make effective cubing decisions in a domain traditionally dominated by symbolic methods.
title	Learning How to Cube
topic	Machine Learning Artificial Intelligence Logic in Computer Science I.2.6; I.2.8; F.4.1
url	https://arxiv.org/abs/2605.16632

Схожі ресурси