| Main Authors: | Chen, Yuhan, Liu, Yuxuan, Zhang, Long, Gao, Pengzhi, Luan, Jian, Liu, Wei |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.13091 |
Similar Items
HoPE: A Novel Positional Encoding Without Long-Term Decay for Enhanced Context Awareness and Extrapolation
by: Chen, Yuhan, et al.
Published: (2024)
Revisiting Entropy in Reinforcement Learning for Large Reasoning Models
by: Jin, Renren, et al.
Published: (2025)
BacktrackAgent: Enhancing GUI Agent with Error Detection and Backtracking Mechanism
by: Wu, Qinzhuo, et al.
Published: (2025)
COPO: Consistency-Aware Policy Optimization
by: Han, Jinghang, et al.
Published: (2025)
More is not always better? Enhancing Many-Shot In-Context Learning with Differentiated and Reweighting Objectives
by: Zhang, Xiaoqing, et al.
Published: (2025)
Mixture of Diverse Size Experts
by: Sun, Manxi, et al.
Published: (2024)
Group Sequence Policy Optimization
by: Zheng, Chujie, et al.
Published: (2025)
Soft Adaptive Policy Optimization
by: Gao, Chang, et al.
Published: (2025)
MobileIPL: Enhancing Mobile Agents Thinking Process via Iterative Preference Learning
by: Huang, Kun, et al.
Published: (2025)
One STEP at a time: Language Agents are Stepwise Planners
by: Nguyen, Minh, et al.
Published: (2024)
PORTool: Importance-Aware Policy Optimization with Rewarded Tree for Multi-Tool-Integrated Reasoning
by: Wu, Feijie, et al.
Published: (2025)
MobileBench-OL: A Comprehensive Chinese Benchmark for Evaluating Mobile GUI Agents in Real-World Environment
by: Wu, Qinzhuo, et al.
Published: (2026)
Evaluating the Effectiveness of Large Language Models in Representing and Understanding Movement Trajectories
by: Ji, Yuhan, et al.
Published: (2024)
Mobile-Bench-v2: A More Realistic and Comprehensive Benchmark for VLM-based Mobile Agents
by: Xu, Weikai, et al.
Published: (2025)
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
by: Gao, Zhaolin, et al.
Published: (2024)
Replay Failures as Successes: Sample-Efficient Reinforcement Learning for Instruction Following
by: Zhang, Kongcheng, et al.
Published: (2025)
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
by: Chen, Mengzhao, et al.
Published: (2024)
Training-Trajectory-Aware Token Selection
by: Shen, Zhanming, et al.
Published: (2026)
MHPO: Modulated Hazard-aware Policy Optimization for Stable Reinforcement Learning
by: Wang, Hongjun, et al.
Published: (2026)
HAPO: Training Language Models to Reason Concisely via History-Aware Policy Optimization
by: Huang, Chengyu, et al.
Published: (2025)
Alloc-MoE: Budget-Aware Expert Activation Allocation for Efficient Mixture-of-Experts Inference
by: Liu, Baihui, et al.
Published: (2026)
BinaryPPO: Efficient Policy Optimization for Binary Classification
by: Pandey, Punya Syon, et al.
Published: (2026)
DeepImagine: Learning Biomedical Reasoning via Successive Counterfactual Imagining
by: Zheng, Youze, et al.
Published: (2026)
Optimizing Anytime Reasoning via Budget Relative Policy Optimization
by: Qi, Penghui, et al.
Published: (2025)
Agentic Reinforced Policy Optimization
by: Dong, Guanting, et al.
Published: (2025)
DLP-LoRA: Efficient Task-Specific LoRA Fusion with a Dynamic, Lightweight Plugin for Large Language Models
by: Zhang, Yuxuan, et al.
Published: (2024)
R-Stitch: Dynamic Trajectory Stitching for Efficient Reasoning
by: Chen, Zhuokun, et al.
Published: (2025)
Fibration Policy Optimization
by: Li, Chang, et al.
Published: (2026)
BAGEN: Are LLM Agents Budget-Aware?
by: Lin, Yuxiang, et al.
Published: (2026)
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
by: Liu, Shih-Yang, et al.
Published: (2026)
Interpreting Key Mechanisms of Factual Recall in Transformer-Based Language Models
by: Lv, Ang, et al.
Published: (2024)
Research on Optimization of Natural Language Processing Model Based on Multimodal Deep Learning
by: Sun, Dan, et al.
Published: (2024)
Hindsight-Anchored Policy Optimization: Turning Failure into Feedback in Sparse Reward Settings
by: Wu, Yuning, et al.
Published: (2026)
BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping
by: Xi, Zhiheng, et al.
Published: (2025)
EVPO: Explained Variance Policy Optimization for Adaptive Critic Utilization in LLM Post-Training
by: Pan, Chengjun, et al.
Published: (2026)
Position-Aware Parameter Efficient Fine-Tuning Approach for Reducing Positional Bias in LLMs
by: Zhang, Zheng, et al.
Published: (2024)
Low-rank Optimization Trajectories Modeling for LLM RLVR Acceleration
by: Chen, Zhipeng, et al.
Published: (2026)
Bridging the Knowledge Void: Inference-time Acquisition of Unfamiliar Programming Languages for Coding Tasks
by: Shen, Chen, et al.
Published: (2026)
EBPO: Empirical Bayes Shrinkage for Stabilizing Group-Relative Policy Optimization
by: Han, Kevin, et al.
Published: (2026)
Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization
by: Du, Yihan, et al.
Published: (2024)