| Main Authors: | Yin, Ming, Qu, Yuanhao, Yang, Ling, Cong, Le, Wang, Mengdi |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.19501 |
Similar Items
Toward Better EHR Reasoning in LLMs: Reinforcement Learning with Expert Attention Guidance
by: Fang, Yue, et al.
Published: (2025)
On the Statistical Complexity for Offline and Low-Adaptive Reinforcement Learning with Structures
by: Yin, Ming, et al.
Published: (2025)
DRAFT-RL: Multi-Agent Chain-of-Draft Reasoning for Reinforcement Learning-Enhanced LLMs
by: Li, Yuanhao, et al.
Published: (2025)
ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates
by: Yang, Ling, et al.
Published: (2025)
QuantumQA: Enhancing Scientific Reasoning via Physics-Consistent Dataset and Verification-Aware Reinforcement Learning
by: Qu, Songxin, et al.
Published: (2026)
MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Learning
by: Yuan, Qianhao, et al.
Published: (2025)
Selective Forgetting for Large Reasoning Models
by: Le, Tuan, et al.
Published: (2026)
CLARity: Reasoning Consistency Alone Can Teach Reinforced Experts
by: Lin, Jiuheng, et al.
Published: (2025)
Is Inverse Reinforcement Learning Harder than Standard Reinforcement Learning? A Theoretical Perspective
by: Zhao, Lei, et al.
Published: (2023)
TimeMaster: Training Time-Series Multimodal LLMs to Reason via Reinforcement Learning
by: Zhang, Junru, et al.
Published: (2025)
DiFFPO: Training Diffusion LLMs to Reason Fast and Furious via Reinforcement Learning
by: Zhao, Hanyang, et al.
Published: (2025)
Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning
by: Wang, Jiayu, et al.
Published: (2025)
KDRL: Post-Training Reasoning LLMs via Unified Knowledge Distillation and Reinforcement Learning
by: Xu, Hongling, et al.
Published: (2025)
R3-RAG: Learning Step-by-Step Reasoning and Retrieval for LLMs via Reinforcement Learning
by: Li, Yuan, et al.
Published: (2025)
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
by: Chen, Mingyang, et al.
Published: (2025)
Grounding LLMs in Scientific Discovery via Embodied Actions
by: Zhang, Bo, et al.
Published: (2026)
Learning Reasoning Rewards from Expert Demonstrations with Inverse Reinforcement Learning
by: Fanconi, Claudio, et al.
Published: (2025)
STELLA: Self-Evolving LLM Agent for Biomedical Research
by: Jin, Ruofan, et al.
Published: (2025)
Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning
by: Singh, Joykirat, et al.
Published: (2025)
FLEx: Personalized Federated Learning for Mixture-of-Experts LLMs via Expert Grafting
by: Liu, Fan, et al.
Published: (2025)
Can LLMs Reason About Program Semantics? A Comprehensive Evaluation of LLMs on Formal Specification Inference
by: Le-Cong, Thanh, et al.
Published: (2025)
Towards Open-Ended Emotional Support Conversations in LLMs via Reinforcement Learning with Future-Oriented Rewards
by: Yang, Ting, et al.
Published: (2025)
CRISPR-GPT for Agentic Automation of Gene-editing Experiments
by: Qu, Yuanhao, et al.
Published: (2024)
AI Knowledge and Reasoning: Emulating Expert Creativity in Scientific Research
by: Mukherjee, Anirban, et al.
Published: (2024)
ReCrit: Transition-Aware Reinforcement Learning for Scientific Critic Reasoning
by: Xu, Wanghan, et al.
Published: (2026)
NumCoKE: Ordinal-Aware Numerical Reasoning over Knowledge Graphs with Mixture-of-Experts and Contrastive Learning
by: Yin, Ming, et al.
Published: (2024)
Towards User-level Private Reinforcement Learning with Human Feedback
by: Zhang, Jiaming, et al.
Published: (2025)
Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs
by: Ling Team, et al.
Published: (2025)
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
by: Jin, Bowen, et al.
Published: (2025)
Strat-Reasoner: Reinforcing Strategic Reasoning of LLMs in Multi-Agent Games
by: He, Yidong, et al.
Published: (2026)
Selective Expert Guidance for Effective and Diverse Exploration in Reinforcement Learning of LLMs
by: Jiang, Zishang, et al.
Published: (2025)
OpenClaw-RL: Train Any Agent Simply by Talking
by: Wang, Yinjie, et al.
Published: (2026)
An Expert is Worth One Token: Synergizing Multiple Expert LLMs as Generalist via Expert Token Routing
by: Chai, Ziwei, et al.
Published: (2024)
PanelTR: Zero-Shot Table Reasoning Framework Through Multi-Agent Scientific Discussion
by: Ma, Yiran Rex
Published: (2025)
CaptchaMind: Training CAPTCHA Solvers via Reinforcement Learning with Explicit Reasoning Supervision
by: Wang, Pengcheng, et al.
Published: (2026)
AQuA -- Combining Experts' and Non-Experts' Views To Assess Deliberation Quality in Online Discussions Using LLMs
by: Behrendt, Maike, et al.
Published: (2024)
Lost in Tokenization: Context as the Key to Unlocking Biomolecular Understanding in Scientific LLMs
by: Zhuang, Kai, et al.
Published: (2025)
Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning
by: Wang, Haozhe, et al.
Published: (2025)
G1: Teaching LLMs to Reason on Graphs with Reinforcement Learning
by: Guo, Xiaojun, et al.
Published: (2025)
Thinking-Based Non-Thinking: Solving the Reward Hacking Problem in Training Hybrid Reasoning Models via Reinforcement Learning
by: Gan, Siyuan, et al.
Published: (2026)