| Main Authors: | Gao, Yicheng, Zhou, Xiaolin, Li, Yahan, Zhao, Yue, Liu, Ruishan |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.07058 |
Similar Items
Fairness or Fluency? An Investigation into Language Bias of Pairwise LLM-as-a-Judge
by: Zhou, Xiaolin, et al.
Published: (2026)
MedQA-CS: Objective Structured Clinical Examination (OSCE)-Style Benchmark for Evaluating LLM Clinical Skills
by: Yao, Zonghai, et al.
Published: (2024)
From Biased Chatbots to Biased Agents: Examining Role Assignment Effects on LLM Agent Robustness
by: Cao, Linbo, et al.
Published: (2026)
Learning to Ask: When LLM Agents Meet Unclear Instruction
by: Wang, Wenxuan, et al.
Published: (2024)
MedAgentGym: A Scalable Agentic Training Environment for Code-Centric Reasoning in Biomedical Data Science
by: Xu, Ran, et al.
Published: (2025)
MedKGent: A Large Language Model Agent Framework for Constructing Temporally Evolving Medical Knowledge Graph
by: Zhang, Duzhen, et al.
Published: (2025)
KLong: Training LLM Agent for Extremely Long-horizon Tasks
by: Liu, Yue, et al.
Published: (2026)
From Helpfulness to Toxic Proactivity: Diagnosing Behavioral Misalignment in LLM Agents
by: Wang, Xinyue, et al.
Published: (2026)
Ask-before-Plan: Proactive Language Agents for Real-World Planning
by: Zhang, Xuan, et al.
Published: (2024)
Training Proactive and Personalized LLM Agents
by: Sun, Weiwei, et al.
Published: (2025)
MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning
by: Tang, Xiangru, et al.
Published: (2025)
Agent-RLVR: Training Software Engineering Agents via Guidance and Environment Rewards
by: Da, Jeff, et al.
Published: (2025)
AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents
by: Su, Zhe, et al.
Published: (2024)
AgentGym: Evolving Large Language Model-based Agents across Diverse Environments
by: Xi, Zhiheng, et al.
Published: (2024)
MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning
by: Tang, Xiangru, et al.
Published: (2023)
TREX: Automating LLM Fine-tuning via Agent-Driven Tree-based Exploration
by: Ma, Zerun, et al.
Published: (2026)
Training Versatile Coding Agents in Synthetic Environments
by: Zhu, Yiqi, et al.
Published: (2025)
COMAP: Co-Evolving World Models and Agent Policies for LLM Agents
by: Liu, Youwei, et al.
Published: (2026)
ExpeL: LLM Agents Are Experiential Learners
by: Zhao, Andrew, et al.
Published: (2023)
Beyond Numeric Rewards: In-Context Dueling Bandits with LLM Agents
by: Xia, Fanzeng, et al.
Published: (2024)
Medchain: Bridging the Gap Between LLM Agents and Clinical Practice with Interactive Sequence
by: Liu, Jie, et al.
Published: (2024)
Examining Reasoning LLMs-as-Judges in Non-Verifiable LLM Post-Training
by: Liu, Yixin, et al.
Published: (2026)
Shoot First, Ask Questions Later? Building Rational Agents that Explore and Act Like People
by: Grand, Gabriel, et al.
Published: (2025)
DSGBench: A Diverse Strategic Game Benchmark for Evaluating LLM-based Agents in Complex Decision-Making Environments
by: Tang, Wenjie, et al.
Published: (2025)
AgentAsk: Multi-Agent Systems Need to Ask
by: Lin, Bohan, et al.
Published: (2025)
Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence
by: Dong, Guanting, et al.
Published: (2026)
ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World
by: Yan, Weixiang, et al.
Published: (2024)
Cutscene Agent: An LLM Agent Framework for Automated 3D Cutscene Generation
by: He, Lanshan, et al.
Published: (2026)
AgentCollabBench: Diagnosing When Good Agents Make Bad Collaborators
by: Mazumder, Aritra, et al.
Published: (2026)
Stance Detection with Collaborative Role-Infused LLM-Based Agents
by: Lan, Xiaochong, et al.
Published: (2023)
ClawEnvKit: Automatic Environment Generation for Claw-Like Agents
by: Li, Xirui, et al.
Published: (2026)
R-Judge: Benchmarking Safety Risk Awareness for LLM Agents
by: Yuan, Tongxin, et al.
Published: (2024)
Diagnosing Training Inference Mismatch in LLM Reinforcement Learning
by: Zhong, Tianle, et al.
Published: (2026)
Terminal-World: Scaling Terminal-Agent Environments via Agent Skills
by: Cheng, Zihao, et al.
Published: (2026)
Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents
by: Song, Yueqi, et al.
Published: (2025)
MedAgentBoard: Benchmarking Multi-Agent Collaboration with Conventional Methods for Diverse Medical Tasks
by: Zhu, Yinghao, et al.
Published: (2025)
Kimi-Dev: Agentless Training as Skill Prior for SWE-Agents
by: Yang, Zonghan, et al.
Published: (2025)
Long-term Task-oriented Agent: Proactive Long-term Intent Maintenance in Dynamic Environments
by: Shi, Qinglong, et al.
Published: (2026)
On the Structural Memory of LLM Agents
by: Zeng, Ruihong, et al.
Published: (2024)
Scalable Environments Drive Generalizable Agents
by: Zhang, Jiayi, et al.
Published: (2026)