Bibliographic Details
Main Authors: Mishra, Kshitij, Aubakirov, Mirat, Takac, Martin, Lukas, Nils, Lahlou, Salem
Format: Preprint
Published: 2026
Subjects: Artificial Intelligence
Online Access: https://arxiv.org/abs/2601.21600
author Mishra, Kshitij
Aubakirov, Mirat
Takac, Martin
Lukas, Nils
Lahlou, Salem
contents Large language models exhibit complementary reasoning errors: on the same instance, one model may succeed with a particular decomposition while another fails. We propose Collaborative Reasoning (CORE), a training-time collaboration framework that converts peer success into a learning signal via a cross-teaching protocol. Each problem is solved in two stages: a cold round of independent sampling, followed by a contexted rescue round in which models that failed receive a hint extracted from a successful peer. CORE optimizes a combined reward that balances (i) correctness, (ii) a lightweight DPP-inspired diversity term to reduce error overlap, and (iii) an explicit rescue bonus for successful recovery. We evaluate CORE on four standard reasoning datasets: GSM8K, MATH, AIME, and GPQA. With only 1,000 training examples, a pair of small open-source models (3B+4B) reaches Pass@2 of 99.54% on GSM8K and 92.08% on MATH, compared to 82.50% and 74.82% for single-model training. On harder datasets, the 3B+4B pair reaches Pass@2 of 77.34% on GPQA (trained on 348 examples) and 79.65% on AIME (trained on 792 examples), using a training-time budget of at most 1,536 context tokens and 3,072 generated tokens. Overall, these results show that training-time collaboration can reliably convert model complementarity into large gains without scaling model size.
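The abstract names three reward components but this record does not give their formulas. The following is a minimal Python sketch of how such a combined reward might be assembled for a pair of models, not the authors' implementation. Everything in it is an assumption made for illustration: the `Attempt` structure, the trace embeddings, the weights `w_div` and `w_rescue`, and the choice of a 2x2 similarity-kernel determinant as the "DPP-inspired" diversity term.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Attempt:
    """Hypothetical record of one model's outcome on one problem."""
    correct_cold: bool           # solved in the independent cold round
    correct_rescue: bool         # solved in the rescue round (after a peer hint)
    embedding: Sequence[float]   # assumed feature vector of the reasoning trace


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pair_diversity(u: Sequence[float], v: Sequence[float]) -> float:
    """Determinant of the 2x2 kernel [[1, s], [s, 1]], i.e. 1 - s^2.

    Assumed stand-in for a DPP-style term: it is largest when the two
    traces are dissimilar and zero when they are collinear.
    """
    s = cosine(u, v)
    return 1.0 - s * s


def combined_reward(own: Attempt, peer: Attempt,
                    w_div: float = 0.1, w_rescue: float = 0.5) -> float:
    """Correctness + weighted diversity + rescue bonus (weights are assumed)."""
    solved = own.correct_cold or own.correct_rescue
    correctness = 1.0 if solved else 0.0
    diversity = pair_diversity(own.embedding, peer.embedding)
    # A rescue counts only if this model failed cold, the peer succeeded
    # cold (so a hint was available), and this model then recovered.
    rescued = (not own.correct_cold) and peer.correct_cold and own.correct_rescue
    return correctness + w_div * diversity + (w_rescue if rescued else 0.0)
```

For example, calling `combined_reward(Attempt(False, True, [1.0, 0.0]), Attempt(True, True, [0.0, 1.0]))` scores a model that failed the cold round and then recovered with a peer's hint, so all three terms contribute.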
format Preprint
id arxiv_https___arxiv_org_abs_2601_21600
institution arXiv
publishDate 2026
record_format arxiv
title CORE: Collaborative Reasoning via Cross Teaching
topic Artificial Intelligence
url https://arxiv.org/abs/2601.21600