| Main Authors: | Lu, Rui, Hou, Zhenyu, Wang, Zihan, Zhang, Hanchen, Liu, Xiao, Li, Yujiang, Feng, Shi, Tang, Jie, Dong, Yuxiao |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2509.10446 |
Similar Items
TreeRL: LLM Reinforcement Learning with On-Policy Tree Search
by: Hou, Zhenyu, et al.
Published: (2025)
AgentRL: Scaling Agentic Reinforcement Learning with a Multi-Turn, Multi-Task Framework
by: Zhang, Hanchen, et al.
Published: (2025)
DeepDiver: Adaptive Search Intensity Scaling via Open-Web Reinforcement Learning
by: Shi, Wenxuan, et al.
Published: (2025)
T1: Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling
by: Hou, Zhenyu, et al.
Published: (2025)
DeepDive: A deep dive into the physics of the first massive quiescent galaxies in the Universe
by: Ito, K., et al.
Published: (2025)
DeepDive: Simultaneous Formation of Massive Quiescent Galaxies in High-Redshift Galaxy Proto-clusters
by: Kakimoto, Takumi, et al.
Published: (2026)
GraphAlign: Pretraining One Graph Neural Network on Multiple Graphs via Feature Alignment
by: Hou, Zhenyu, et al.
Published: (2024)
SWE-Dev: Building Software Engineering Agents with Training and Inference Scaling
by: Wang, Haoran, et al.
Published: (2025)
ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents
by: Lai, Hanyu, et al.
Published: (2025)
GraphTracer: Graph-Guided Failure Tracing in LLM Agents for Robust Multi-Turn Deep Search
by: Zhang, Heng, et al.
Published: (2025)
DeepDive: Tracing the early quenching pathways of massive quiescent galaxies at $z>3$ from their star-formation histories and chemical abundances
by: Hamadouche, Massissilia L., et al.
Published: (2026)
DEEPMED: Building a Medical DeepResearch Agent via Multi-hop Med-Search Data and Turn-Controlled Agentic Training & Inference
by: Wang, Zihan, et al.
Published: (2026)
MobileRL: Online Agentic Reinforcement Learning for Mobile GUI Agents
by: Xu, Yifan, et al.
Published: (2025)
TSR: Trajectory-Search Rollouts for Multi-Turn RL of LLM Agents
by: Djuhera, Aladin, et al.
Published: (2026)
ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents
by: Zhang, Hao, et al.
Published: (2026)
Beyond Turn Limits: Training Deep Search Agents with Dynamic Context Window
by: Tang, Qiaoyu, et al.
Published: (2025)
Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers
by: Xin, Ran, et al.
Published: (2025)
SceneGenAgent: Precise Industrial Scene Generation with Coding Agent
by: Xia, Xiao, et al.
Published: (2024)
Thinker: Training LLMs in Hierarchical Thinking for Deep Search via Multi-Turn Interaction
by: Xu, Jun, et al.
Published: (2025)
MMedAgent-RL: Optimizing Multi-Agent Collaboration for Multimodal Medical Reasoning
by: Xia, Peng, et al.
Published: (2025)
Unsupervised Discovery of Steerable Factors When Graph Deep Generative Models Are Entangled
by: Liu, Shengchao, et al.
Published: (2024)
Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs
by: Li, Junbo, et al.
Published: (2025)
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning
by: Liu, Zihe, et al.
Published: (2025)
A Deep Dive into Scaling RL for Code Generation with Synthetic Data and Curricula
by: Sancaktar, Cansu, et al.
Published: (2026)
AutoWebGLM: A Large Language Model-based Web Navigating Agent
by: Lai, Hanyu, et al.
Published: (2024)
EICAP: Deep Dive in Assessment and Enhancement of Large Language Models in Emotional Intelligence through Multi-Turn Conversations
by: Nazar, Nizi, et al.
Published: (2025)
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline
by: Xu, Yifan, et al.
Published: (2024)
Extensive Self-Contrast Enables Feedback-Free Language Model Alignment
by: Liu, Xiao, et al.
Published: (2024)
Turning Search into Knowledge Management.
by: Kaufman, David
Published: (2002)
Direct Multi-Turn Preference Optimization for Language Agents
by: Shi, Wentao, et al.
Published: (2024)
Dynamic Deep Factor Graph for Multi-Agent Reinforcement Learning
by: Shi, Yuchen, et al.
Published: (2024)
Towards On-Policy Data Evolution for Visual-Native Multimodal Deep Search Agents
by: Huang, Shijue, et al.
Published: (2026)
Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards
by: Zhang, Jiajie, et al.
Published: (2026)
SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks
by: Zhou, Yifei, et al.
Published: (2025)
Learning on Graphs with Large Language Models(LLMs): A Deep Dive into Model Robustness
by: Guo, Kai, et al.
Published: (2024)
Expectation Confirmation Preference Optimization for Multi-Turn Conversational Recommendation Agent
by: Feng, Xueyang, et al.
Published: (2025)
KARMA: Leveraging Multi-Agent LLMs for Automated Knowledge Graph Enrichment
by: Lu, Yuxing, et al.
Published: (2025)
AndroidGen: Building an Android Language Agent under Data Scarcity
by: Lai, Hanyu, et al.
Published: (2025)
Consolidation via Policy Information Regularization in Deep RL for Multi-Agent Games
by: Malloy, Tailia, et al.
Published: (2020)
Does RLHF Scale? Exploring the Impacts From Data, Model, and Method
by: Hou, Zhenyu, et al.
Published: (2024)