Library Catalog

Bibliographic Details
Main Authors:	Dong, Chengqi; Yue, Chuhuai; He, Hang; Mao, Rongge; Tang, Fenghe; Zhou, S Kevin; Xu, Zekun; Wang, Xiaohan; Chai, Jiajun; Yin, Guojun
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2512.08980

Similar Items

RLFactory: A Plug-and-Play Reinforcement Learning Post-Training Framework for LLM Multi-Turn Tool-Use
by: Chai, Jiajun, et al.
Published: (2025)

LGMSNet: Thinning a medical image segmentation model via dual-level multiscale fusion
by: Dong, Chengqi, et al.
Published: (2025)

Promoting Efficient Reasoning with Verifiable Stepwise Reward
by: Yue, Chuhuai, et al.
Published: (2025)

MTIR-SQL: Multi-turn Tool-Integrated Reasoning Reinforcement Learning for Text-to-SQL
by: Xu, Zekun, et al.
Published: (2025)

Joint Training of Multi-Token Prediction in Reinforcement Learning via Optimal Coefficient Calibration
by: Wang, Zili, et al.
Published: (2026)

LocalSearchBench: Benchmarking Agentic Search in Real-World Local Life Services
by: He, Hang, et al.
Published: (2025)

From Experience to Strategy: Empowering LLM Agents with Trainable Graph Memory
by: Xia, Siyu, et al.
Published: (2025)

Hi-End-MAE: Hierarchical encoder-driven masked autoencoders are stronger vision learners for medical image segmentation
by: Tang, Fenghe, et al.
Published: (2025)

WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning
by: Wei, Zhepei, et al.
Published: (2025)

SAE as a Crystal Ball: Interpretable Features Predict Cross-domain Transferability of LLMs without Training
by: Zhang, Qi, et al.
Published: (2026)

ZipRL: Adaptive Multi-Turn Context Compression with Hindsight Response Replay
by: Hu, Zhexin, et al.
Published: (2026)

ToolForge: A Data Synthesis Pipeline for Multi-Hop Search without Real-World APIs
by: Chen, Hao, et al.
Published: (2025)

MASteer: Multi-Agent Adaptive Steer Strategy for End-to-End LLM Trustworthiness Repair
by: Li, Changqing, et al.
Published: (2025)

ResT: Reshaping Token-Level Policy Gradients for Tool-Use Large Language Models
by: Lin, Zihan, et al.
Published: (2025)

ResRL: Boosting LLM Reasoning via Negative Sample Projection Residual Reinforcement Learning
by: Lin, Zihan, et al.
Published: (2026)

When Self-Belief Misleads: Active Label Acquisition for Reinforcement Learning with Verifiable Rewards
by: Wang, Li, et al.
Published: (2026)

EVA: Efficient Reinforcement Learning for End-to-End Video Agent
by: Zhang, Yaolun, et al.
Published: (2026)

$π$-Play: Multi-Agent Self-Play via Privileged Self-Distillation without External Data
by: Zhang, Yaocheng, et al.
Published: (2026)

MedReason-R1: Learning to Reason for CT Diagnosis with Reinforcement Learning and Local Zoom
by: Li, Yifan, et al.
Published: (2025)

A Novel End-To-End Event Geolocation Method Leveraging Hyperbolic Space and Toponym Hierarchies
by: Qiao, Yaqiong, et al.
Published: (2024)

Pre-Trained LLM is a Semantic-Aware and Generalizable Segmentation Booster
by: Tang, Fenghe, et al.
Published: (2025)

HabitatAgent: An End-to-End Multi-Agent System for Housing Consultation
by: Yang, Hongyang, et al.
Published: (2026)

ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents
by: Lai, Hanyu, et al.
Published: (2025)

Concept-to-Pixel: Prompt-Free Universal Medical Image Segmentation
by: Chen, Haoyun, et al.
Published: (2026)

VoxelPrompt: A Vision Agent for End-to-End Medical Image Analysis
by: Hoopes, Andrew, et al.
Published: (2024)

AWPO: Enhancing Tool-Use of Large Language Models through Adaptive Integration of Reasoning Rewards
by: Lin, Zihan, et al.
Published: (2025)

Catching Spinning Table Tennis Balls in Simulation with End-to-End Curriculum Reinforcement Learning
by: Hu, Xiaoyi, et al.
Published: (2025)

Mobile U-ViT: Revisiting large kernel and U-shaped ViT for efficient medical image segmentation
by: Tang, Fenghe, et al.
Published: (2025)

Contextual Rollout Bandits for Reinforcement Learning with Verifiable Rewards
by: Lu, Xiaodong, et al.
Published: (2026)

Vision-Proprioception Fusion with Mamba2 in End-to-End Reinforcement Learning for Motion Control
by: Tao, Xiaowen, et al.
Published: (2025)

UCAD: Uncertainty-guided Contour-aware Displacement for semi-supervised medical image segmentation
by: Ding, Chengbo, et al.
Published: (2026)

OneVision: An End-to-End Generative Framework for Multi-view E-commerce Vision Search
by: Zheng, Zexin, et al.
Published: (2025)

SMTrack: End-to-End Trained Spiking Neural Networks for Multi-Object Tracking in RGB Videos
by: Zhong, Pengzhi, et al.
Published: (2025)

AutoSearch: Adaptive Search Depth for Efficient Agentic RAG via Reinforcement Learning
by: Sun, Jingbo, et al.
Published: (2026)

RLAE: Reinforcement Learning-Assisted Ensemble for LLMs
by: Fu, Yuqian, et al.
Published: (2025)

Multi-Agent End-to-End Vulnerability Management for Mitigating Recurring Vulnerabilities
by: Zheng, Zelong, et al.
Published: (2026)

Poutine: Vision-Language-Trajectory Pre-Training and Reinforcement Learning Post-Training Enable Robust End-to-End Autonomous Driving
by: Rowe, Luke, et al.
Published: (2025)

Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL
by: Li, Weizhen, et al.
Published: (2025)

PanopticSplatting: End-to-End Panoptic Gaussian Splatting
by: Xie, Yuxuan, et al.
Published: (2025)

EAR-Net: Pursuing End-to-End Absolute Rotations from Multi-View Images
by: Liu, Yuzhen, et al.
Published: (2023)