:: Library Catalog

Bibliographic Details
Main Authors:	Shu, Jiangming; Zhang, Yuxiang; Ma, Ye; Lin, Xueyuan; Sang, Jitao
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2603.09203

Similar Items

Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks
by: Zhang, Yuxiang, et al.
Published: (2025)

Agent models: Internalizing Chain-of-Action Generation into Reasoning models
by: Zhang, Yuxiang, et al.
Published: (2025)

OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning
by: Zhang, Yuxiang, et al.
Published: (2024)

o1-Coder: an o1 Replication for Coding
by: Zhang, Yuxiang, et al.
Published: (2024)

Named Entity Recognition in COVID-19 tweets with Entity Knowledge Augmentation
by: Zhang, Xuankang, et al.
Published: (2025)

CSPO: Alleviating Reward Ambiguity for Structured Table-to-LaTeX Generation
by: Yang, Yunfan, et al.
Published: (2026)

Retrieval-Augmented Process Reward Model for Generalizable Mathematical Reasoning
by: Zhu, Jiachen, et al.
Published: (2025)

KG-FPQ: Evaluating Factuality Hallucination in LLMs with Knowledge Graph-based False Premise Questions
by: Zhu, Yanxu, et al.
Published: (2024)

A Disguised Wolf Is More Harmful Than a Toothless Tiger: Adaptive Malicious Code Injection Backdoor Attack Leveraging User Behavior as Triggers
by: Wu, Shangxi, et al.
Published: (2024)

GUITestScape: Towards Open-set Evaluation on Exploratory GUI Testing
by: Chen, Xiaoyi, et al.
Published: (2026)

WebSynthesis: World-Model-Guided MCTS for Efficient WebUI-Trajectory Synthesis
by: Gao, Yifei, et al.
Published: (2025)

NAP-Tuning: Neural Augmented Prompt Tuning for Adversarially Robust Vision-Language Models
by: Zhang, Jiaming, et al.
Published: (2025)

Self-Guided Defense: Adaptive Safety Alignment for Reasoning Models via Synthesized Guidelines
by: Wang, Yuhang, et al.
Published: (2025)

AnyAttack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models
by: Zhang, Jiaming, et al.
Published: (2024)

GUITester: Enabling GUI Agents for Exploratory Defect Discovery
by: Gao, Yifei, et al.
Published: (2026)

ToolPRMBench: Evaluating and Advancing Process Reward Models for Tool-using Agents
by: Li, Dawei, et al.
Published: (2026)

Exploring the Privacy Protection Capabilities of Chinese Large Language Models
by: Yang, Yuqi, et al.
Published: (2024)

Reasoning Shapes Alignment: Investigating Cultural Alignment in Large Reasoning Models with Cultural Norms
by: Wang, Yuhang, et al.
Published: (2025)

How Reliable is Your Simulator? Analysis on the Limitations of Current LLM-based User Simulators for Conversational Recommendation
by: Zhu, Lixi, et al.
Published: (2024)

ReInAgent: A Context-Aware GUI Agent Enabling Human-in-the-Loop Mobile Task Navigation
by: Jia, Haitao, et al.
Published: (2025)

HiPRAG: Hierarchical Process Rewards for Efficient Agentic Retrieval Augmented Generation
by: Wu, Peilin, et al.
Published: (2025)

Evaluation of Retrieval-Augmented Generation: A Survey
by: Yu, Hao, et al.
Published: (2024)

Deepchecks: Evaluating Retrieval-Augmented Generation (RAG)
by: Gerner, Assaf, et al.
Published: (2026)

Inference-Time Rule Eraser: Fair Recognition via Distilling and Removing Biased Rules
by: Zhang, Yi, et al.
Published: (2024)

AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories
by: Lù, Xing Han, et al.
Published: (2025)

Hybrid Differential Reward: Combining Temporal Difference and Action Gradients for Efficient Multi-Agent Reinforcement Learning in Cooperative Driving
by: Han, Ye, et al.
Published: (2025)

ITDR: An Instruction Tuning Dataset for Enhancing Large Language Models in Recommendations
by: Liu, Zekun, et al.
Published: (2025)

Unifying Perplexing Behaviors in Modified BP Attributions through Alignment Perspective
by: Zheng, Guanhua, et al.
Published: (2025)

DICE: Discrete Interpretable Comparative Evaluation with Probabilistic Scoring for Retrieval-Augmented Generation
by: Liu, Shiyan, et al.
Published: (2025)

RewardHackingAgents: Benchmarking Evaluation Integrity for LLM ML-Engineering Agents
by: Atinafu, Yonas, et al.
Published: (2026)

A LLM-based Controllable, Scalable, Human-Involved User Simulator Framework for Conversational Recommender Systems
by: Zhu, Lixi, et al.
Published: (2024)

RPM-MCTS: Knowledge-Retrieval as Process Reward Model with Monte Carlo Tree Search for Code Generation
by: Lin, Yuanyuan, et al.
Published: (2025)

Evaluating Retrieval-Augmented Generation Agents for Autonomous Scientific Discovery in Astrophysics
by: Xu, Xueqing, et al.
Published: (2025)

Privacy in Action: Towards Realistic Privacy Mitigation and Evaluation for LLM-Powered Agents
by: Wang, Shouju, et al.
Published: (2025)

ReasonSTL: Bridging Natural Language and Signal Temporal Logic via Tool-Augmented Process-Rewarded Learning
by: Ye, Bowen, et al.
Published: (2026)

StepMathAgent: A Step-Wise Agent for Evaluating Mathematical Processes through Tree-of-Error
by: Yang, Shu-Xun, et al.
Published: (2025)

Retrieval Augmented Generation (RAG) for Fintech: Agentic Design and Evaluation
by: Cook, Thomas, et al.
Published: (2025)

Self-Consistency of the Internal Reward Models Improves Self-Rewarding Language Models
by: Zhou, Xin, et al.
Published: (2025)

Adaptive Federated Distillation for Multi-Domain Non-IID Textual Data
by: Xiao, Jiahao, et al.
Published: (2025)

AJ-Bench: Benchmarking Agent-as-a-Judge for Environment-Aware Evaluation
by: Shi, Wentao, et al.
Published: (2026)