| Main Authors: | Chai, Jiajun, Yin, Guojun, Xu, Zekun, Yue, Chuhuai, Jia, Yi, Xia, Siyu, Wang, Xiaohan, Jiang, Jiwen, Li, Xiaoguang, Dong, Chengqi, He, Hang, Lin, Wei |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2509.06980 |
Similar Items
MTIR-SQL: Multi-turn Tool-Integrated Reasoning Reinforcement Learning for Text-to-SQL
by: Xu, Zekun, et al.
Published: (2025)
Training Multi-Image Vision Agents via End2End Reinforcement Learning
by: Dong, Chengqi, et al.
Published: (2025)
Promoting Efficient Reasoning with Verifiable Stepwise Reward
by: Yue, Chuhuai, et al.
Published: (2025)
From Experience to Strategy: Empowering LLM Agents with Trainable Graph Memory
by: Xia, Siyu, et al.
Published: (2025)
Joint Training of Multi-Token Prediction in Reinforcement Learning via Optimal Coefficient Calibration
by: Wang, Zili, et al.
Published: (2026)
ResT: Reshaping Token-Level Policy Gradients for Tool-Use Large Language Models
by: Lin, Zihan, et al.
Published: (2025)
LocalSearchBench: Benchmarking Agentic Search in Real-World Local Life Services
by: He, Hang, et al.
Published: (2025)
ZipRL: Adaptive Multi-Turn Context Compression with Hindsight Response Replay
by: Hu, Zhexin, et al.
Published: (2026)
AWPO: Enhancing Tool-Use of Large Language Models through Adaptive Integration of Reasoning Rewards
by: Lin, Zihan, et al.
Published: (2025)
Implicit Hierarchical GRPO: Decoupling Tool Invocation from Execution for Tool-Integrated Mathematical Reasoning
by: Wang, Li, et al.
Published: (2026)
$π$-Play: Multi-Agent Self-Play via Privileged Self-Distillation without External Data
by: Zhang, Yaocheng, et al.
Published: (2026)
ResRL: Boosting LLM Reasoning via Negative Sample Projection Residual Reinforcement Learning
by: Lin, Zihan, et al.
Published: (2026)
SAE as a Crystal Ball: Interpretable Features Predict Cross-domain Transferability of LLMs without Training
by: Zhang, Qi, et al.
Published: (2026)
ToolForge: A Data Synthesis Pipeline for Multi-Hop Search without Real-World APIs
by: Chen, Hao, et al.
Published: (2025)
When Self-Belief Misleads: Active Label Acquisition for Reinforcement Learning with Verifiable Rewards
by: Wang, Li, et al.
Published: (2026)
360-LLaMA-Factory: Plug & Play Sequence Parallelism for Long Post-Training
by: Zou, Haosheng, et al.
Published: (2025)
Contextual Rollout Bandits for Reinforcement Learning with Verifiable Rewards
by: Lu, Xiaodong, et al.
Published: (2026)
AutoSearch: Adaptive Search Depth for Efficient Agentic RAG via Reinforcement Learning
by: Sun, Jingbo, et al.
Published: (2026)
RLAE: Reinforcement Learning-Assisted Ensemble for LLMs
by: Fu, Yuqian, et al.
Published: (2025)
Plug-and-Play Training Framework for Preference Optimization
by: Ma, Jingyuan, et al.
Published: (2024)
AMR-SD: Asymmetric Meta-Reflective Self-Distillation for Token-Level Credit Assignment
by: Wei, Zhenlin, et al.
Published: (2026)
A Plug-and-Play Framework for Volumetric Light-Sheet Image Reconstruction
by: Gong, Yi, et al.
Published: (2025)
MoTo: A Zero-shot Plug-in Interaction-aware Navigation for General Mobile Manipulation
by: Wu, Zhenyu, et al.
Published: (2025)
PlugSI: Plug-and-Play Test-Time Graph Adaptation for Spatial Interpolation
by: Wu, Xuhang, et al.
Published: (2026)
SDFP: Speculative Decoding with FIT-Pruned Models for Training-Free and Plug-and-Play LLM Acceleration
by: Wei, Hanyu, et al.
Published: (2026)
Prompt2Fingerprint: Plug-and-Play LLM Fingerprinting via Text-to-Weight Generation
by: Chen, Sixu, et al.
Published: (2026)
Rethinking Personalization in Large Language Models at the Token Level
by: Zhang, Chenheng, et al.
Published: (2026)
CDRRM: Contrast-Driven Rubric Generation for Reliable and Interpretable Reward Modeling
by: Liu, Dengcan, et al.
Published: (2026)
Are Full Rollouts Necessary for On-Policy Distillation?
by: Zhang, Yaocheng, et al.
Published: (2026)
UN-DETR: Promoting Objectness Learning via Joint Supervision for Unknown Object Detection
by: Liu, Haomiao, et al.
Published: (2024)
RoboEngine: Plug-and-Play Robot Data Augmentation with Semantic Robot Segmentation and Background Generation
by: Yuan, Chengbo, et al.
Published: (2025)
Teaching Embodied Reinforcement Learning Agents: Informativeness and Diversity of Language Use
by: Xi, Jiajun, et al.
Published: (2024)
MemOrb: A Plug-and-Play Verbal-Reinforcement Memory Layer for E-Commerce Customer Service
by: Huang, Yizhe, et al.
Published: (2025)
Plug-and-Play Diffusion Distillation
by: Hsiao, Yi-Ting, et al.
Published: (2024)
NGM: A Plug-and-Play Training-Free Memory Module for LLMs
by: Qu, Yuwen, et al.
Published: (2026)
Overcoming Distribution Shifts in Plug-and-Play Methods with Test-Time Training
by: Chandler, Edward P., et al.
Published: (2024)
Training Plug-n-Play Knowledge Modules with Deep Context Distillation
by: Caccia, Lucas, et al.
Published: (2025)
SG-Nav: Online 3D Scene Graph Prompting for LLM-based Zero-shot Object Navigation
by: Yin, Hang, et al.
Published: (2024)
Imprompter: Tricking LLM Agents into Improper Tool Use
by: Fu, Xiaohan, et al.
Published: (2024)
SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning
by: Fu, Yuqian, et al.
Published: (2025)