| Main Authors: | Lou, Xiaoxuan, Wang, Chaojie, An, Bo |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2411.19039 |
Similar Items
HiPO: Hierarchical Preference Optimization for Adaptive Reasoning in LLMs
by: Kachroo, Darsh, et al.
Published: (2026)
DynamicPO: Dynamic Preference Optimization for Recommendation
by: Hu, Xingyu, et al.
Published: (2026)
Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning
by: Wang, Chaojie, et al.
Published: (2024)
FairPO: Robust Preference Optimization for Fair Multi-Label Learning
by: Mondal, Soumen Kumar, et al.
Published: (2025)
ViPO: Visual Preference Optimization at Scale
by: Li, Ming, et al.
Published: (2026)
SetPO: Set-Level Policy Optimization for Diversity-Preserving LLM Reasoning
by: Li, Chenyi, et al.
Published: (2026)
FocalPO: Enhancing Preference Optimizing by Focusing on Correct Preference Rankings
by: Liu, Tong, et al.
Published: (2025)
PGPO: Enhancing Agent Reasoning via Pseudocode-style Planning Guided Preference Optimization
by: Cao, Zouying, et al.
Published: (2025)
RankPO: Preference Optimization for Job-Talent Matching
by: Zhang, Yafei, et al.
Published: (2025)
AMoPO: Adaptive Multi-objective Preference Optimization without Reward Models and Reference Models
by: Liu, Qi, et al.
Published: (2025)
Improving Multi-Step Reasoning Abilities of Large Language Models with Direct Advantage Policy Optimization
by: Liu, Jiacai, et al.
Published: (2024)
RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization
by: Zhao, Hanyang, et al.
Published: (2024)
MarsRL: Advancing Multi-Agent Reasoning System via Reinforcement Learning with Agentic Pipeline Parallelism
by: Liu, Shulin, et al.
Published: (2025)
RePO: Understanding Preference Learning Through ReLU-Based Optimization
by: Wu, Junkang, et al.
Published: (2025)
TO-Agents: A Multi-Agent AI Pipeline for Preference-Guided Topology Optimization
by: Stewart, Isabella A., et al.
Published: (2026)
SynPO: Synergizing Descriptiveness and Preference Optimization for Video Detailed Captioning
by: Dang, Jisheng, et al.
Published: (2025)
PerPO: Perceptual Preference Optimization via Discriminative Rewarding
by: Zhu, Zining, et al.
Published: (2025)
Beyond Local Code Optimization: Multi-Agent Reasoning for Software System Optimization
by: Peng, Huiyun, et al.
Published: (2026)
MT-Mol: Multi Agent System with Tool-based Reasoning for Molecular Optimization
by: Kim, Hyomin, et al.
Published: (2025)
MDPO: Multi-Granularity Direct Preference Optimization for Mathematical Reasoning
by: Lin, Yunze
Published: (2025)
Iterative Reasoning Preference Optimization
by: Pang, Richard Yuanzhe, et al.
Published: (2024)
MARS: Multi-Agent Adaptive Reasoning with Socratic Guidance for Automated Prompt Optimization
by: Zhang, Jian, et al.
Published: (2025)
IMAGINE: Integrating Multi-Agent System into One Model for Complex Reasoning and Planning
by: Zhang, Xikai, et al.
Published: (2025)
Conjunctive Prompt Attacks in Multi-Agent LLM Systems
by: Arif, Nokimul Hasan, et al.
Published: (2026)
MemPO: Self-Memory Policy Optimization for Long-Horizon Agents
by: Li, Ruoran, et al.
Published: (2026)
InfoPO: Information-Driven Policy Optimization for User-Centric Agents
by: Kong, Fanqi, et al.
Published: (2026)
Urban-MAS: Human-Centered Urban Prediction with LLM-Based Multi-Agent System
by: Lou, Shangyu
Published: (2025)
Building Coding Agents via Entropy-Enhanced Multi-Turn Preference Optimization
by: Yu, Jiahao, et al.
Published: (2025)
MACPO: Weak-to-Strong Alignment via Multi-Agent Contrastive Preference Optimization
by: Lyu, Yougang, et al.
Published: (2024)
KnowPO: Knowledge-aware Preference Optimization for Controllable Knowledge Selection in Retrieval-Augmented Language Models
by: Zhang, Ruizhe, et al.
Published: (2024)
SE-Agent: Self-Evolution Trajectory Optimization in Multi-Step Reasoning with LLM-Based Agents
by: Lin, Jiaye, et al.
Published: (2025)
Agent Mars: Multi-Agent Simulation for Multi-Planetary Life Exploration and Settlement
by: Wang, Ziyang
Published: (2026)
Understanding Individual Agent Importance in Multi-Agent System via Counterfactual Reasoning
by: Chen, Jianming, et al.
Published: (2024)
JailPO: A Novel Black-box Jailbreak Framework via Preference Optimization against Aligned LLMs
by: Li, Hongyi, et al.
Published: (2024)
BAMAS: Structuring Budget-Aware Multi-Agent Systems
by: Yang, Liming, et al.
Published: (2025)
DIANOIA: Diagnostic Decomposition and Joint Optimization for Multi-Agent Reasoning
by: Yang, Yiming, et al.
Published: (2026)
Learning Latency-Aware Orchestration for Parallel Multi-Agent Systems
by: Shi, Xi, et al.
Published: (2026)
AdPO: Enhancing the Adversarial Robustness of Large Vision-Language Models with Preference Optimization
by: Liu, Chaohu, et al.
Published: (2025)
ConfPO: Exploiting Policy Model Confidence for Critical Token Selection in Preference Optimization
by: Yoon, Hee Suk, et al.
Published: (2025)
Calibration-Aware Policy Optimization for Reasoning LLMs
by: Wang, Ziqi, et al.
Published: (2026)