| Main Authors: | Wang, Jiaxuan, Hu, Yulan, Yang, Wenjin, Pan, Zheng, Li, Xin, Guo, Lan-Zhe |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.08178 |
Similar Items
TRACE: Distilling Where It Matters via Token-Routed Self On-Policy Alignment
by: Wang, Jiaxuan, et al.
Published: (2026)
Multi-Stakeholder LLM Alignment: Decomposing Estimation from Aggregation
by: Zheng, Lulu, et al.
Published: (2026)
Beyond Itinerary Planning-A Real-World Benchmark for Multi-Turn and Tool-Using Travel Tasks
by: Cheng, Xiang, et al.
Published: (2025)
Aligning Progress and Feasibility: A Neuro-Symbolic Dual Memory Framework for Long-Horizon LLM Agents
by: Wen, Bin, et al.
Published: (2026)
Coarse-to-Fine Process Reward Modeling for Mathematical Reasoning
by: Hu, Yulan, et al.
Published: (2025)
Aligning Crowd Feedback via Distributional Preference Reward Modeling
by: Li, Dexun, et al.
Published: (2024)
VoiceAgentEval: A Dual-Dimensional Benchmark for Expert-Level Intelligent Voice-Agent Evaluation of Xbench's Professional-Aligned Series
by: Xu, Pengyu, et al.
Published: (2025)
Real-Time Aligned Reward Model beyond Semantics
by: Huang, Zixuan, et al.
Published: (2026)
NeSy-Route: A Neuro-Symbolic Benchmark for Constrained Route Planning in Remote Sensing
by: Yang, Ming, et al.
Published: (2026)
Scaling Autonomous Agents via Automatic Reward Modeling And Planning
by: Chen, Zhenfang, et al.
Published: (2025)
Redundant or Necessary? A Benchmark for Detecting Redundant Steps in Agent Trajectories
by: Hu, Minyang, et al.
Published: (2026)
GUNDAM: Aligning Large Language Models with Graph Understanding
by: Ouyang, Sheng, et al.
Published: (2024)
No More Stale Feedback: Co-Evolving Critics for Open-World Agent Learning
by: Li, Zhicong, et al.
Published: (2026)
AMAP Agentic Planning Technical Report
by: AMAP AI Agent Team, et al.
Published: (2025)
RRO: LLM Agent Optimization Through Rising Reward Trajectories
by: Wang, Zilong, et al.
Published: (2025)
A Framework for Benchmarking and Aligning Task-Planning Safety in LLM-Based Embodied Agents
by: Huang, Yuting, et al.
Published: (2025)
Agent-RewardBench: Towards a Unified Benchmark for Reward Modeling across Perception, Planning, and Safety in Real-World Multimodal Agents
by: Men, Tianyi, et al.
Published: (2025)
ChinaTravel: An Open-Ended Travel Planning Benchmark with Compositional Constraint Validation for Language Agents
by: Shao, Jie-Jing, et al.
Published: (2024)
Towards Comprehensive Preference Data Collection for Reward Modeling
by: Hu, Yulan, et al.
Published: (2024)
TROJail: Trajectory-Level Optimization for Multi-Turn Large Language Model Jailbreaks with Process Rewards
by: Xiong, Xiqiao, et al.
Published: (2025)
TreeEval: Benchmark-Free Evaluation of Large Language Models through Tree Planning
by: Li, Xiang, et al.
Published: (2024)
Reward Bound for Behavioral Guarantee of Model-based Planning Agents
by: An, Zhiyu, et al.
Published: (2024)
AlignCoder: Aligning Retrieval with Target Intent for Repository-Level Code Completion
by: Jiang, Tianyue, et al.
Published: (2026)
StructVRM: Aligning Multimodal Reasoning with Structured and Verifiable Reward Models
by: Zhang, Xiangxiang, et al.
Published: (2025)
Intra-Trajectory Consistency for Reward Modeling
by: Zhou, Chaoyang, et al.
Published: (2025)
GROW: Aligning GRPO with State-Action Modeling for Open-World VLM Agents
by: Wu, Xiongbin, et al.
Published: (2026)
ATBench: A Diverse and Realistic Agent Trajectory Benchmark for Safety Evaluation and Diagnosis
by: Li, Yu, et al.
Published: (2026)
Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms
by: Zhang, Miaosen, et al.
Published: (2024)
MTRec: Learning to Align with User Preferences via Mental Reward Models
by: Zhao, Mengchen, et al.
Published: (2025)
Towards Reward Fairness in RLHF: From a Resource Allocation Perspective
by: Ouyang, Sheng, et al.
Published: (2025)
ALaRM: Align Language Models via Hierarchical Rewards Modeling
by: Lai, Yuhang, et al.
Published: (2024)
Reinforced Imitative Trajectory Planning for Urban Automated Driving
by: Zeng, Di, et al.
Published: (2024)
GraphDC: A Divide-and-Conquer Multi-Agent System for Scalable Graph Algorithm Reasoning
by: Li, Wenjin, et al.
Published: (2026)
GTPO and GRPO-S: Token and Sequence-Level Reward Shaping with Policy Entropy
by: Tan, Hongze, et al.
Published: (2025)
RED: Unleashing Token-Level Rewards from Holistic Feedback via Reward Redistribution
by: Li, Jiahui, et al.
Published: (2024)
AlignDiff: Aligning Diverse Human Preferences via Behavior-Customisable Diffusion Model
by: Dong, Zibin, et al.
Published: (2023)
Human-like Cognitive Generalization for Large Models via Brain-in-the-loop Supervision
by: Chen, Jiaxuan, et al.
Published: (2025)
StoryAlign: Evaluating and Training Reward Models for Story Generation
by: Xia, Haotian, et al.
Published: (2026)
WebUncertainty: Dual-Level Uncertainty Driven Planning and Reasoning For Autonomous Web Agent
by: Zhang, Lingfeng, et al.
Published: (2026)
Aligning Individual and Collective Objectives in Multi-Agent Cooperation
by: Li, Yang, et al.
Published: (2024)