Збережено в:
Бібліографічні деталі
Автори: Erata, Ferhat, Kouteili, Sam, Typaldos, Thanos, Antonopoulos, Timos, Jones, Robert B., Cook, Byron, Piskac, Ruzica
Формат: Preprint
Опубліковано: 2026
Предмети:
Онлайн доступ:https://arxiv.org/abs/2605.16632
Теги: Додати тег
Немає тегів, Будьте першим, хто поставить тег для цього запису!
_version_ 1866914571420172288
author Erata, Ferhat
Kouteili, Sam
Typaldos, Thanos
Antonopoulos, Timos
Jones, Robert B.
Cook, Byron
Piskac, Ruzica
author_facet Erata, Ferhat
Kouteili, Sam
Typaldos, Thanos
Antonopoulos, Timos
Jones, Robert B.
Cook, Byron
Piskac, Ruzica
contents Despite the effectiveness of Cube-and-Conquer (C&C) for solving challenging Boolean Satisfiability (SAT) problems, no prior work has shown that transformer-based models can learn effective cubing heuristics. We introduce a neuro-symbolic post-training framework for this task. We design an MCTS-based data curation pipeline that uses symbolic heuristics to explore splitting decisions over SAT competition formulas, producing preference data grounded in solver statistics and augmented with reasoning traces from a teacher model. Our two-stage post-training, supervised fine-tuning (SFT) followed by direct preference optimization (DPO), enables a 4B-parameter model to achieve a pass@5 score of 53 on 100 SAT competition benchmarks, surpassing frontier LLMs such as Claude-Sonnet-4 (50) and matching the best symbolic heuristic (53). Ablations show that SFT alone improves pass@5 from 46 to 51, with DPO adding 2 additional benchmarks; an entropy/agreement ablation on realized first-cube decisions further shows that SFT, not DPO, accounts for the root-level decision diversity that produces complementary per-run coverage over deterministic symbolic methods. This demonstrates that transformers can be trained to make effective cubing decisions in a domain traditionally dominated by symbolic methods.
format Preprint
id arxiv_https___arxiv_org_abs_2605_16632
institution arXiv
publishDate 2026
record_format arxiv
spellingShingle Learning How to Cube
Erata, Ferhat
Kouteili, Sam
Typaldos, Thanos
Antonopoulos, Timos
Jones, Robert B.
Cook, Byron
Piskac, Ruzica
Machine Learning
Artificial Intelligence
Logic in Computer Science
I.2.6; I.2.8; F.4.1
Despite the effectiveness of Cube-and-Conquer (C&C) for solving challenging Boolean Satisfiability (SAT) problems, no prior work has shown that transformer-based models can learn effective cubing heuristics. We introduce a neuro-symbolic post-training framework for this task. We design an MCTS-based data curation pipeline that uses symbolic heuristics to explore splitting decisions over SAT competition formulas, producing preference data grounded in solver statistics and augmented with reasoning traces from a teacher model. Our two-stage post-training, supervised fine-tuning (SFT) followed by direct preference optimization (DPO), enables a 4B-parameter model to achieve a pass@5 score of 53 on 100 SAT competition benchmarks, surpassing frontier LLMs such as Claude-Sonnet-4 (50) and matching the best symbolic heuristic (53). Ablations show that SFT alone improves pass@5 from 46 to 51, with DPO adding 2 additional benchmarks; an entropy/agreement ablation on realized first-cube decisions further shows that SFT, not DPO, accounts for the root-level decision diversity that produces complementary per-run coverage over deterministic symbolic methods. This demonstrates that transformers can be trained to make effective cubing decisions in a domain traditionally dominated by symbolic methods.
title Learning How to Cube
topic Machine Learning
Artificial Intelligence
Logic in Computer Science
I.2.6; I.2.8; F.4.1
url https://arxiv.org/abs/2605.16632