| Main Authors: | Li, Chenyi, Zhang, Yuan, Wang, Bo, Ma, Guoqing, Tang, Wei, Huang, Haoyang, Duan, Nan |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.01062 |
Similar Items
STORYANCHORS: Generating Consistent Multi-Scene Story Frames for Long-Form Narratives
by: Wang, Bo, et al.
Published: (2025)
Generative Pre-trained Autoregressive Diffusion Transformer
by: Zhang, Yuan, et al.
Published: (2025)
PrAg-PO: Prompt Augmented Policy Optimization for Robust and Diverse Mathematical Reasoning
by: Lu, Wenquan, et al.
Published: (2026)
MemPO: Self-Memory Policy Optimization for Long-Horizon Agents
by: Li, Ruoran, et al.
Published: (2026)
Reference-guided Policy Optimization for Molecular Optimization via LLM Reasoning
by: Li, Xuan, et al.
Published: (2026)
Mars-PO: Multi-Agent Reasoning System Preference Optimization
by: Lou, Xiaoxuan, et al.
Published: (2024)
Breaking $\textit{Winner-Takes-All}$: Cooperative Policy Optimization Improves Diverse LLM Reasoning
by: Chen, Haoxuan, et al.
Published: (2026)
AT$^2$PO: Agentic Turn-based Policy Optimization via Tree Search
by: Zong, Zefang, et al.
Published: (2026)
DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization
by: Chen, Wentse, et al.
Published: (2022)
Constructing Domain-Specific Evaluation Sets for LLM-as-a-judge
by: Raju, Ravi, et al.
Published: (2024)
RE-PO: Robust Enhanced Policy Optimization as a General Framework for LLM Alignment
by: Cao, Xiaoyang, et al.
Published: (2025)
RiskPO: Risk-based Policy Optimization via Verifiable Reward for LLM Post-Training
by: Ren, Tao, et al.
Published: (2025)
CLPO: Curriculum Learning meets Policy Optimization for LLM Reasoning
by: Zhang, Shijie, et al.
Published: (2025)
Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization
by: Su, Zhenpeng, et al.
Published: (2025)
Hindsight-Anchored Policy Optimization: Turning Failure into Feedback in Sparse Reward Settings
by: Wu, Yuning, et al.
Published: (2026)
OptiSet: Unified Optimizing Set Selection and Ranking for Retrieval-Augmented Generation
by: Jiang, Yi, et al.
Published: (2026)
GHOST: Solving the Traveling Salesman Problem on Graphs of Convex Sets
by: Tang, Jingtao, et al.
Published: (2025)
RePO: Replay-Enhanced Policy Optimization
by: Li, Siheng, et al.
Published: (2025)
Goal-Driven Reasoning in DatalogMTL with Magic Sets
by: Wang, Shaoyu, et al.
Published: (2024)
Multi-Preference Optimization: Generalizing DPO via Set-Level Contrasts
by: Gupta, Taneesh, et al.
Published: (2024)
Frame-Level Captions for Long Video Generation with Complex Multi Scenes
by: Zheng, Guangcong, et al.
Published: (2025)
Graph-Enhanced Policy Optimization in LLM Agent Training
by: Yuan, Jiazhen, et al.
Published: (2025)
InfoPO: Information-Driven Policy Optimization for User-Centric Agents
by: Kong, Fanqi, et al.
Published: (2026)
Random Graph Set and Evidence Pattern Reasoning Model
by: Zhan, Tianxiang, et al.
Published: (2024)
Beyond KL Divergence: Policy Optimization with Flexible Bregman Divergences for LLM Reasoning
by: Yuan, Rui, et al.
Published: (2026)
Style-Preserving Policy Optimization for Game Agents
by: Li, Lingfeng, et al.
Published: (2025)
VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training
by: Shen, Guobin, et al.
Published: (2026)
Slow-Fast Policy Optimization: Reposition-Before-Update for LLM Reasoning
by: Wang, Ziyan, et al.
Published: (2025)
Teacher-Feature Drifting: One-Step Diffusion Distillation with Pretrained Diffusion Representations
by: Zhang, Yuan, et al.
Published: (2026)
PolicySim: An LLM-Based Agent Social Simulation Sandbox for Proactive Policy Optimization
by: Huang, Renhong, et al.
Published: (2026)
ERPO: Token-Level Entropy-Regulated Policy Optimization for Large Reasoning Models
by: Yu, Song, et al.
Published: (2026)
Rethinking Importance Sampling in LLM Policy Optimization: A Cumulative Token Perspective
by: Zhang, Yuheng, et al.
Published: (2026)
JoyAI-Image: Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation
by: Song, Lin, et al.
Published: (2026)
Policy of Thoughts: Scaling LLM Reasoning via Test-time Policy Evolution
by: Jiao, Zhengbo, et al.
Published: (2026)
Beyond State-Wise Mirror Descent: Offline Policy Optimization with Parametric Policies
by: Li, Xiang, et al.
Published: (2026)
GRACE: A Dynamic Coreset Selection Framework for Large Language Model Optimization
by: Tang, Tianhao, et al.
Published: (2026)
SD-Search: On-Policy Hindsight Self-Distillation for Search-Augmented Reasoning
by: Ma, Yufei, et al.
Published: (2026)
RAE-AR: Taming Autoregressive Models with Representation Autoencoders
by: Yu, Hu, et al.
Published: (2026)
IG-Search: Step-Level Information Gain Rewards for Search-Augmented Reasoning
by: Liang, Zihan, et al.
Published: (2026)
Predicate-Conditional Conformalized Answer Sets for Knowledge Graph Embeddings
by: Zhu, Yuqicheng, et al.
Published: (2025)