Library Catalog

Bibliographic Details
Main Authors:	Chai, Jiajun; Yin, Guojun; Xu, Zekun; Yue, Chuhuai; Jia, Yi; Xia, Siyu; Wang, Xiaohan; Jiang, Jiwen; Li, Xiaoguang; Dong, Chengqi; He, Hang; Lin, Wei
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2509.06980

Similar Items

MTIR-SQL: Multi-turn Tool-Integrated Reasoning Reinforcement Learning for Text-to-SQL
by: Xu, Zekun, et al.
Published: (2025)

Training Multi-Image Vision Agents via End2End Reinforcement Learning
by: Dong, Chengqi, et al.
Published: (2025)

Promoting Efficient Reasoning with Verifiable Stepwise Reward
by: Yue, Chuhuai, et al.
Published: (2025)

From Experience to Strategy: Empowering LLM Agents with Trainable Graph Memory
by: Xia, Siyu, et al.
Published: (2025)

Joint Training of Multi-Token Prediction in Reinforcement Learning via Optimal Coefficient Calibration
by: Wang, Zili, et al.
Published: (2026)

ResT: Reshaping Token-Level Policy Gradients for Tool-Use Large Language Models
by: Lin, Zihan, et al.
Published: (2025)

LocalSearchBench: Benchmarking Agentic Search in Real-World Local Life Services
by: He, Hang, et al.
Published: (2025)

ZipRL: Adaptive Multi-Turn Context Compression with Hindsight Response Replay
by: Hu, Zhexin, et al.
Published: (2026)

AWPO: Enhancing Tool-Use of Large Language Models through Adaptive Integration of Reasoning Rewards
by: Lin, Zihan, et al.
Published: (2025)

Implicit Hierarchical GRPO: Decoupling Tool Invocation from Execution for Tool-Integrated Mathematical Reasoning
by: Wang, Li, et al.
Published: (2026)

$π$-Play: Multi-Agent Self-Play via Privileged Self-Distillation without External Data
by: Zhang, Yaocheng, et al.
Published: (2026)

ResRL: Boosting LLM Reasoning via Negative Sample Projection Residual Reinforcement Learning
by: Lin, Zihan, et al.
Published: (2026)

SAE as a Crystal Ball: Interpretable Features Predict Cross-domain Transferability of LLMs without Training
by: Zhang, Qi, et al.
Published: (2026)

ToolForge: A Data Synthesis Pipeline for Multi-Hop Search without Real-World APIs
by: Chen, Hao, et al.
Published: (2025)

When Self-Belief Misleads: Active Label Acquisition for Reinforcement Learning with Verifiable Rewards
by: Wang, Li, et al.
Published: (2026)

360-LLaMA-Factory: Plug & Play Sequence Parallelism for Long Post-Training
by: Zou, Haosheng, et al.
Published: (2025)

Contextual Rollout Bandits for Reinforcement Learning with Verifiable Rewards
by: Lu, Xiaodong, et al.
Published: (2026)

AutoSearch: Adaptive Search Depth for Efficient Agentic RAG via Reinforcement Learning
by: Sun, Jingbo, et al.
Published: (2026)

RLAE: Reinforcement Learning-Assisted Ensemble for LLMs
by: Fu, Yuqian, et al.
Published: (2025)

Plug-and-Play Training Framework for Preference Optimization
by: Ma, Jingyuan, et al.
Published: (2024)

AMR-SD: Asymmetric Meta-Reflective Self-Distillation for Token-Level Credit Assignment
by: Wei, Zhenlin, et al.
Published: (2026)

A Plug-and-Play Framework for Volumetric Light-Sheet Image Reconstruction
by: Gong, Yi, et al.
Published: (2025)

MoTo: A Zero-shot Plug-in Interaction-aware Navigation for General Mobile Manipulation
by: Wu, Zhenyu, et al.
Published: (2025)

PlugSI: Plug-and-Play Test-Time Graph Adaptation for Spatial Interpolation
by: Wu, Xuhang, et al.
Published: (2026)

SDFP: Speculative Decoding with FIT-Pruned Models for Training-Free and Plug-and-Play LLM Acceleration
by: Wei, Hanyu, et al.
Published: (2026)

Prompt2Fingerprint: Plug-and-Play LLM Fingerprinting via Text-to-Weight Generation
by: Chen, Sixu, et al.
Published: (2026)

Rethinking Personalization in Large Language Models at the Token Level
by: Zhang, Chenheng, et al.
Published: (2026)

CDRRM: Contrast-Driven Rubric Generation for Reliable and Interpretable Reward Modeling
by: Liu, Dengcan, et al.
Published: (2026)

Are Full Rollouts Necessary for On-Policy Distillation?
by: Zhang, Yaocheng, et al.
Published: (2026)

UN-DETR: Promoting Objectness Learning via Joint Supervision for Unknown Object Detection
by: Liu, Haomiao, et al.
Published: (2024)

RoboEngine: Plug-and-Play Robot Data Augmentation with Semantic Robot Segmentation and Background Generation
by: Yuan, Chengbo, et al.
Published: (2025)

Teaching Embodied Reinforcement Learning Agents: Informativeness and Diversity of Language Use
by: Xi, Jiajun, et al.
Published: (2024)

MemOrb: A Plug-and-Play Verbal-Reinforcement Memory Layer for E-Commerce Customer Service
by: Huang, Yizhe, et al.
Published: (2025)

Plug-and-Play Diffusion Distillation
by: Hsiao, Yi-Ting, et al.
Published: (2024)

NGM: A Plug-and-Play Training-Free Memory Module for LLMs
by: Qu, Yuwen, et al.
Published: (2026)

Overcoming Distribution Shifts in Plug-and-Play Methods with Test-Time Training
by: Chandler, Edward P., et al.
Published: (2024)

Training Plug-n-Play Knowledge Modules with Deep Context Distillation
by: Caccia, Lucas, et al.
Published: (2025)

SG-Nav: Online 3D Scene Graph Prompting for LLM-based Zero-shot Object Navigation
by: Yin, Hang, et al.
Published: (2024)

Imprompter: Tricking LLM Agents into Improper Tool Use
by: Fu, Xiaohan, et al.
Published: (2024)

SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning
by: Fu, Yuqian, et al.
Published: (2025)