Збережено в:
| Автори: | , , , , , , |
|---|---|
| Формат: | Preprint |
| Опубліковано: |
2026
|
| Предмети: | |
| Онлайн доступ: | https://arxiv.org/abs/2605.16632 |
| Теги: |
Додати тег
Немає тегів, Будьте першим, хто поставить тег для цього запису!
|
| _version_ | 1866914571420172288 |
|---|---|
| author | Erata, Ferhat Kouteili, Sam Typaldos, Thanos Antonopoulos, Timos Jones, Robert B. Cook, Byron Piskac, Ruzica |
| author_facet | Erata, Ferhat Kouteili, Sam Typaldos, Thanos Antonopoulos, Timos Jones, Robert B. Cook, Byron Piskac, Ruzica |
| contents | Despite the effectiveness of Cube-and-Conquer (C&C) for solving challenging Boolean Satisfiability (SAT) problems, no prior work has shown that transformer-based models can learn effective cubing heuristics. We introduce a neuro-symbolic post-training framework for this task. We design an MCTS-based data curation pipeline that uses symbolic heuristics to explore splitting decisions over SAT competition formulas, producing preference data grounded in solver statistics and augmented with reasoning traces from a teacher model. Our two-stage post-training, supervised fine-tuning (SFT) followed by direct preference optimization (DPO), enables a 4B-parameter model to achieve a pass@5 score of 53 on 100 SAT competition benchmarks, surpassing frontier LLMs such as Claude-Sonnet-4 (50) and matching the best symbolic heuristic (53). Ablations show that SFT alone improves pass@5 from 46 to 51, with DPO adding 2 additional benchmarks; an entropy/agreement ablation on realized first-cube decisions further shows that SFT, not DPO, accounts for the root-level decision diversity that produces complementary per-run coverage over deterministic symbolic methods. This demonstrates that transformers can be trained to make effective cubing decisions in a domain traditionally dominated by symbolic methods. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2605_16632 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| spellingShingle | Learning How to Cube Erata, Ferhat Kouteili, Sam Typaldos, Thanos Antonopoulos, Timos Jones, Robert B. Cook, Byron Piskac, Ruzica Machine Learning Artificial Intelligence Logic in Computer Science I.2.6; I.2.8; F.4.1 Despite the effectiveness of Cube-and-Conquer (C&C) for solving challenging Boolean Satisfiability (SAT) problems, no prior work has shown that transformer-based models can learn effective cubing heuristics. We introduce a neuro-symbolic post-training framework for this task. We design an MCTS-based data curation pipeline that uses symbolic heuristics to explore splitting decisions over SAT competition formulas, producing preference data grounded in solver statistics and augmented with reasoning traces from a teacher model. Our two-stage post-training, supervised fine-tuning (SFT) followed by direct preference optimization (DPO), enables a 4B-parameter model to achieve a pass@5 score of 53 on 100 SAT competition benchmarks, surpassing frontier LLMs such as Claude-Sonnet-4 (50) and matching the best symbolic heuristic (53). Ablations show that SFT alone improves pass@5 from 46 to 51, with DPO adding 2 additional benchmarks; an entropy/agreement ablation on realized first-cube decisions further shows that SFT, not DPO, accounts for the root-level decision diversity that produces complementary per-run coverage over deterministic symbolic methods. This demonstrates that transformers can be trained to make effective cubing decisions in a domain traditionally dominated by symbolic methods. |
| title | Learning How to Cube |
| topic | Machine Learning Artificial Intelligence Logic in Computer Science I.2.6; I.2.8; F.4.1 |
| url | https://arxiv.org/abs/2605.16632 |