| Main Authors: | Shu, Jiangming, Zhang, Yuxiang, Ma, Ye, Lin, Xueyuan, Sang, Jitao |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.09203 |
Similar Items
Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks
by: Zhang, Yuxiang, et al.
Published: (2025)
Agent models: Internalizing Chain-of-Action Generation into Reasoning models
by: Zhang, Yuxiang, et al.
Published: (2025)
OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning
by: Zhang, Yuxiang, et al.
Published: (2024)
o1-Coder: an o1 Replication for Coding
by: Zhang, Yuxiang, et al.
Published: (2024)
Named Entity Recognition in COVID-19 tweets with Entity Knowledge Augmentation
by: Zhang, Xuankang, et al.
Published: (2025)
CSPO: Alleviating Reward Ambiguity for Structured Table-to-LaTeX Generation
by: Yang, Yunfan, et al.
Published: (2026)
Retrieval-Augmented Process Reward Model for Generalizable Mathematical Reasoning
by: Zhu, Jiachen, et al.
Published: (2025)
KG-FPQ: Evaluating Factuality Hallucination in LLMs with Knowledge Graph-based False Premise Questions
by: Zhu, Yanxu, et al.
Published: (2024)
A Disguised Wolf Is More Harmful Than a Toothless Tiger: Adaptive Malicious Code Injection Backdoor Attack Leveraging User Behavior as Triggers
by: Wu, Shangxi, et al.
Published: (2024)
GUITestScape: Towards Open-set Evaluation on Exploratory GUI Testing
by: Chen, Xiaoyi, et al.
Published: (2026)
WebSynthesis: World-Model-Guided MCTS for Efficient WebUI-Trajectory Synthesis
by: Gao, Yifei, et al.
Published: (2025)
NAP-Tuning: Neural Augmented Prompt Tuning for Adversarially Robust Vision-Language Models
by: Zhang, Jiaming, et al.
Published: (2025)
Self-Guided Defense: Adaptive Safety Alignment for Reasoning Models via Synthesized Guidelines
by: Wang, Yuhang, et al.
Published: (2025)
AnyAttack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models
by: Zhang, Jiaming, et al.
Published: (2024)
GUITester: Enabling GUI Agents for Exploratory Defect Discovery
by: Gao, Yifei, et al.
Published: (2026)
ToolPRMBench: Evaluating and Advancing Process Reward Models for Tool-using Agents
by: Li, Dawei, et al.
Published: (2026)
Exploring the Privacy Protection Capabilities of Chinese Large Language Models
by: Yang, Yuqi, et al.
Published: (2024)
Reasoning Shapes Alignment: Investigating Cultural Alignment in Large Reasoning Models with Cultural Norms
by: Wang, Yuhang, et al.
Published: (2025)
How Reliable is Your Simulator? Analysis on the Limitations of Current LLM-based User Simulators for Conversational Recommendation
by: Zhu, Lixi, et al.
Published: (2024)
ReInAgent: A Context-Aware GUI Agent Enabling Human-in-the-Loop Mobile Task Navigation
by: Jia, Haitao, et al.
Published: (2025)
HiPRAG: Hierarchical Process Rewards for Efficient Agentic Retrieval Augmented Generation
by: Wu, Peilin, et al.
Published: (2025)
Evaluation of Retrieval-Augmented Generation: A Survey
by: Yu, Hao, et al.
Published: (2024)
Deepchecks: Evaluating Retrieval-Augmented Generation (RAG)
by: Gerner, Assaf, et al.
Published: (2026)
Inference-Time Rule Eraser: Fair Recognition via Distilling and Removing Biased Rules
by: Zhang, Yi, et al.
Published: (2024)
AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories
by: Lù, Xing Han, et al.
Published: (2025)
Hybrid Differential Reward: Combining Temporal Difference and Action Gradients for Efficient Multi-Agent Reinforcement Learning in Cooperative Driving
by: Han, Ye, et al.
Published: (2025)
ITDR: An Instruction Tuning Dataset for Enhancing Large Language Models in Recommendations
by: Liu, Zekun, et al.
Published: (2025)
Unifying Perplexing Behaviors in Modified BP Attributions through Alignment Perspective
by: Zheng, Guanhua, et al.
Published: (2025)
DICE: Discrete Interpretable Comparative Evaluation with Probabilistic Scoring for Retrieval-Augmented Generation
by: Liu, Shiyan, et al.
Published: (2025)
RewardHackingAgents: Benchmarking Evaluation Integrity for LLM ML-Engineering Agents
by: Atinafu, Yonas, et al.
Published: (2026)
A LLM-based Controllable, Scalable, Human-Involved User Simulator Framework for Conversational Recommender Systems
by: Zhu, Lixi, et al.
Published: (2024)
RPM-MCTS: Knowledge-Retrieval as Process Reward Model with Monte Carlo Tree Search for Code Generation
by: Lin, Yuanyuan, et al.
Published: (2025)
Evaluating Retrieval-Augmented Generation Agents for Autonomous Scientific Discovery in Astrophysics
by: Xu, Xueqing, et al.
Published: (2025)
Privacy in Action: Towards Realistic Privacy Mitigation and Evaluation for LLM-Powered Agents
by: Wang, Shouju, et al.
Published: (2025)
ReasonSTL: Bridging Natural Language and Signal Temporal Logic via Tool-Augmented Process-Rewarded Learning
by: Ye, Bowen, et al.
Published: (2026)
StepMathAgent: A Step-Wise Agent for Evaluating Mathematical Processes through Tree-of-Error
by: Yang, Shu-Xun, et al.
Published: (2025)
Retrieval Augmented Generation (RAG) for Fintech: Agentic Design and Evaluation
by: Cook, Thomas, et al.
Published: (2025)
Self-Consistency of the Internal Reward Models Improves Self-Rewarding Language Models
by: Zhou, Xin, et al.
Published: (2025)
Adaptive Federated Distillation for Multi-Domain Non-IID Textual Data
by: Xiao, Jiahao, et al.
Published: (2025)
AJ-Bench: Benchmarking Agent-as-a-Judge for Environment-Aware Evaluation
by: Shi, Wentao, et al.
Published: (2026)