| Main Authors: | Anantha, Raviteja; Hor, Soheil; Antoniu, Teodor Nicola; Price, Layne C. |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Machine Learning |
| Online Access: | https://arxiv.org/abs/2509.23252 |
| _version_ | 1866918391573381120 |
|---|---|
| author | Anantha, Raviteja; Hor, Soheil; Antoniu, Teodor Nicola; Price, Layne C. |
| author_facet | Anantha, Raviteja; Hor, Soheil; Antoniu, Teodor Nicola; Price, Layne C. |
| contents | We present NanoFlux, a novel adversarial framework for generating targeted training data to improve LLM reasoning, where adversarially-generated datasets containing fewer than 200 examples outperform conventional fine-tuning approaches. The framework employs a competitive dynamic between models alternating as Attacker and Defender, supervised by a tool-augmented Judge, synthesizing multi-step questions with explanatory annotations that target specific reasoning capabilities. Fine-tuning a 4B-parameter model on NanoFlux-generated data yields performance gains across diverse domains compared to full-benchmark fine-tuning: +5.9% on mathematical reasoning (GSMHard), +3.6% on scientific reasoning (GenomeBench), and +16.6% on medical reasoning (MultiMedQA), while reducing computational requirements by 3-14x. Ablation studies reveal a non-monotonic relationship between dataset characteristics and model performance, uncovering domain-specific optimal points for question complexity and reasoning quality. NanoFlux automates training data generation through embedding-based novelty filtering, tool-augmented evaluation, and multi-hop reasoning, suggesting that future model improvements may lie in the intelligent synthesis of small, precisely targeted training datasets. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2509_23252 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | NanoFlux: Adversarial Dual-LLM Evaluation and Distillation For Multi-Domain Reasoning |
| topic | Machine Learning |
| url | https://arxiv.org/abs/2509.23252 |