Staff View: CORE: Collaborative Reasoning via Cross Teaching :: Library Catalog

Bibliographic Details
Main Authors:	Mishra, Kshitij, Aubakirov, Mirat, Takac, Martin, Lukas, Nils, Lahlou, Salem
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2601.21600

_version_	1866917231446720512
author	Mishra, Kshitij Aubakirov, Mirat Takac, Martin Lukas, Nils Lahlou, Salem
author_facet	Mishra, Kshitij Aubakirov, Mirat Takac, Martin Lukas, Nils Lahlou, Salem
contents	Large language models exhibit complementary reasoning errors: on the same instance, one model may succeed with a particular decomposition while another fails. We propose Collaborative Reasoning (CORE), a training-time collaboration framework that converts peer success into a learning signal via a cross-teaching protocol. Each problem is solved in two stages: a cold round of independent sampling, followed by a contexted rescue round in which models that failed receive a hint extracted from a successful peer. CORE optimizes a combined reward that balances (i) correctness, (ii) a lightweight DPP-inspired diversity term to reduce error overlap, and (iii) an explicit rescue bonus for successful recovery. We evaluate CORE across four standard reasoning datasets: GSM8K, MATH, AIME, and GPQA. With only 1,000 training examples, a pair of small open-source models (3B+4B) reaches Pass@2 of 99.54% on GSM8K and 92.08% on MATH, compared to 82.50% and 74.82% for single-model training. On harder datasets, the 3B+4B pair reaches Pass@2 of 77.34% on GPQA (trained on 348 examples) and 79.65% on AIME (trained on 792 examples), using a training-time budget of at most 1,536 context tokens and 3,072 generated tokens. Overall, these results show that training-time collaboration can reliably convert model complementarity into large gains without scaling model size.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_21600
institution	arXiv
publishDate	2026
record_format	arxiv
title	CORE: Collaborative Reasoning via Cross Teaching
topic	Artificial Intelligence
url	https://arxiv.org/abs/2601.21600
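
The combined reward described in the contents field (correctness, a DPP-inspired diversity term, and a rescue bonus) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `RoundResult`, `diversity_term`, and `combined_reward`, the weights `w_div` and `w_rescue`, the additive form of the reward, and the use of a two-model similarity score are all assumptions made for illustration. The diversity term here is simply the determinant of a 2x2 similarity kernel, one common way to express DPP-style diversity for a pair.

```python
from dataclasses import dataclass


@dataclass
class RoundResult:
    """Outcome for one model on one problem (hypothetical structure)."""
    cold_correct: bool    # solved independently in the cold round
    rescue_correct: bool  # solved in the rescue round after a peer hint


def diversity_term(similarity: float) -> float:
    """DPP-style diversity for two models.

    Assumption: the kernel is [[1, s], [s, 1]], whose determinant is
    1 - s^2, so identical outputs (s = 1) score 0 and unrelated
    outputs (s = 0) score 1.
    """
    s = max(-1.0, min(1.0, similarity))
    return 1.0 - s * s


def combined_reward(result: RoundResult, similarity: float,
                    w_div: float = 0.1, w_rescue: float = 0.5) -> float:
    """Additive mix of correctness, diversity, and rescue bonus.

    The weights are illustrative placeholders, not values from the paper.
    """
    correct = 1.0 if result.cold_correct else 0.0
    # The rescue bonus applies only when the cold round failed
    # and the hinted retry succeeded.
    rescued = 1.0 if (not result.cold_correct and result.rescue_correct) else 0.0
    return correct + w_div * diversity_term(similarity) + w_rescue * rescued
```

Under these assumptions, a model that fails the cold round but recovers with a hint still earns partial reward, which is the behavior the rescue bonus in the abstract is meant to encourage.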

Similar Items