Staff View: PaperGuide: Making Small Language-Model Paper-Reading Agents More Efficient :: Library Catalog

Bibliographic Details
Main Authors:	Wang, Zijian; Huang, Tiancheng; Li, Hanqi; Ma, Da; Chen, Lu; Yu, Kai
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2601.12988

_version_	1866911384694947840
author	Wang, Zijian; Huang, Tiancheng; Li, Hanqi; Ma, Da; Chen, Lu; Yu, Kai
author_facet	Wang, Zijian; Huang, Tiancheng; Li, Hanqi; Ma, Da; Chen, Lu; Yu, Kai
contents	The accelerating growth of the scientific literature makes it increasingly difficult for researchers to track new advances through manual reading alone. Recent progress in large language models (LLMs) has therefore spurred interest in autonomous agents that can read scientific papers and extract task-relevant information. However, most existing approaches rely either on heavily engineered prompting or on a conventional SFT-RL training pipeline, both of which often lead to excessive and low-yield exploration. Drawing inspiration from cognitive science, we propose PaperCompass, a framework that mitigates these issues by separating high-level planning from fine-grained execution. PaperCompass first drafts an explicit plan that outlines the intended sequence of actions, and then performs detailed reasoning to instantiate each step by selecting the parameters for the corresponding function calls. To train such behavior, we introduce Draft-and-Follow Policy Optimization (DFPO), a tailored RL method that jointly optimizes both the draft plan and the final solution. DFPO can be viewed as a lightweight form of hierarchical reinforcement learning, aimed at narrowing the "knowing-doing" gap in LLMs. We provide a theoretical analysis that establishes DFPO's favorable optimization properties, supporting a stable and reliable training process. Experiments on paper-based question answering (Paper-QA) benchmarks show that PaperCompass improves efficiency over strong baselines without sacrificing performance, achieving results comparable to much larger models.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_12988
institution	arXiv
publishDate	2026
record_format	arxiv
title	PaperGuide: Making Small Language-Model Paper-Reading Agents More Efficient
topic	Machine Learning
url	https://arxiv.org/abs/2601.12988

Similar Items