| Main Authors: | Dong, Chengqi, Yue, Chuhuai, He, Hang, Mao, Rongge, Tang, Fenghe, Zhou, S Kevin, Xu, Zekun, Wang, Xiaohan, Chai, Jiajun, Yin, Guojun |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.08980 |
Similar Items
RLFactory: A Plug-and-Play Reinforcement Learning Post-Training Framework for LLM Multi-Turn Tool-Use
by: Chai, Jiajun, et al.
Published: (2025)
LGMSNet: Thinning a medical image segmentation model via dual-level multiscale fusion
by: Dong, Chengqi, et al.
Published: (2025)
Promoting Efficient Reasoning with Verifiable Stepwise Reward
by: Yue, Chuhuai, et al.
Published: (2025)
MTIR-SQL: Multi-turn Tool-Integrated Reasoning Reinforcement Learning for Text-to-SQL
by: Xu, Zekun, et al.
Published: (2025)
Joint Training of Multi-Token Prediction in Reinforcement Learning via Optimal Coefficient Calibration
by: Wang, Zili, et al.
Published: (2026)
LocalSearchBench: Benchmarking Agentic Search in Real-World Local Life Services
by: He, Hang, et al.
Published: (2025)
From Experience to Strategy: Empowering LLM Agents with Trainable Graph Memory
by: Xia, Siyu, et al.
Published: (2025)
Hi-End-MAE: Hierarchical encoder-driven masked autoencoders are stronger vision learners for medical image segmentation
by: Tang, Fenghe, et al.
Published: (2025)
WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning
by: Wei, Zhepei, et al.
Published: (2025)
SAE as a Crystal Ball: Interpretable Features Predict Cross-domain Transferability of LLMs without Training
by: Zhang, Qi, et al.
Published: (2026)
ZipRL: Adaptive Multi-Turn Context Compression with Hindsight Response Replay
by: Hu, Zhexin, et al.
Published: (2026)
ToolForge: A Data Synthesis Pipeline for Multi-Hop Search without Real-World APIs
by: Chen, Hao, et al.
Published: (2025)
MASteer: Multi-Agent Adaptive Steer Strategy for End-to-End LLM Trustworthiness Repair
by: Li, Changqing, et al.
Published: (2025)
ResT: Reshaping Token-Level Policy Gradients for Tool-Use Large Language Models
by: Lin, Zihan, et al.
Published: (2025)
ResRL: Boosting LLM Reasoning via Negative Sample Projection Residual Reinforcement Learning
by: Lin, Zihan, et al.
Published: (2026)
When Self-Belief Misleads: Active Label Acquisition for Reinforcement Learning with Verifiable Rewards
by: Wang, Li, et al.
Published: (2026)
EVA: Efficient Reinforcement Learning for End-to-End Video Agent
by: Zhang, Yaolun, et al.
Published: (2026)
$π$-Play: Multi-Agent Self-Play via Privileged Self-Distillation without External Data
by: Zhang, Yaocheng, et al.
Published: (2026)
MedReason-R1: Learning to Reason for CT Diagnosis with Reinforcement Learning and Local Zoom
by: Li, Yifan, et al.
Published: (2025)
A Novel End-To-End Event Geolocation Method Leveraging Hyperbolic Space and Toponym Hierarchies
by: Qiao, Yaqiong, et al.
Published: (2024)
Pre-Trained LLM is a Semantic-Aware and Generalizable Segmentation Booster
by: Tang, Fenghe, et al.
Published: (2025)
HabitatAgent: An End-to-End Multi-Agent System for Housing Consultation
by: Yang, Hongyang, et al.
Published: (2026)
ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents
by: Lai, Hanyu, et al.
Published: (2025)
Concept-to-Pixel: Prompt-Free Universal Medical Image Segmentation
by: Chen, Haoyun, et al.
Published: (2026)
VoxelPrompt: A Vision Agent for End-to-End Medical Image Analysis
by: Hoopes, Andrew, et al.
Published: (2024)
AWPO: Enhancing Tool-Use of Large Language Models through Adaptive Integration of Reasoning Rewards
by: Lin, Zihan, et al.
Published: (2025)
Catching Spinning Table Tennis Balls in Simulation with End-to-End Curriculum Reinforcement Learning
by: Hu, Xiaoyi, et al.
Published: (2025)
Mobile U-ViT: Revisiting large kernel and U-shaped ViT for efficient medical image segmentation
by: Tang, Fenghe, et al.
Published: (2025)
Contextual Rollout Bandits for Reinforcement Learning with Verifiable Rewards
by: Lu, Xiaodong, et al.
Published: (2026)
Vision-Proprioception Fusion with Mamba2 in End-to-End Reinforcement Learning for Motion Control
by: Tao, Xiaowen, et al.
Published: (2025)
UCAD: Uncertainty-guided Contour-aware Displacement for semi-supervised medical image segmentation
by: Ding, Chengbo, et al.
Published: (2026)
OneVision: An End-to-End Generative Framework for Multi-view E-commerce Vision Search
by: Zheng, Zexin, et al.
Published: (2025)
SMTrack: End-to-End Trained Spiking Neural Networks for Multi-Object Tracking in RGB Videos
by: Zhong, Pengzhi, et al.
Published: (2025)
AutoSearch: Adaptive Search Depth for Efficient Agentic RAG via Reinforcement Learning
by: Sun, Jingbo, et al.
Published: (2026)
RLAE: Reinforcement Learning-Assisted Ensemble for LLMs
by: Fu, Yuqian, et al.
Published: (2025)
Multi-Agent End-to-End Vulnerability Management for Mitigating Recurring Vulnerabilities
by: Zheng, Zelong, et al.
Published: (2026)
Poutine: Vision-Language-Trajectory Pre-Training and Reinforcement Learning Post-Training Enable Robust End-to-End Autonomous Driving
by: Rowe, Luke, et al.
Published: (2025)
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL
by: Li, Weizhen, et al.
Published: (2025)
PanopticSplatting: End-to-End Panoptic Gaussian Splatting
by: Xie, Yuxuan, et al.
Published: (2025)
EAR-Net: Pursuing End-to-End Absolute Rotations from Multi-View Images
by: Liu, Yuzhen, et al.
Published: (2023)