| Main Authors: | Hopman, Mia; Elstner, Jannes; Avramidou, Maria; Prasad, Amritanshu; Lindner, David |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2603.01608 |
Similar Items
Consistency Training while Mitigating Obfuscation via Rate Matching
by: Imran, Sohaib, et al.
Published: (2026)
The Geometry of Refusal in Large Language Models: Concept Cones and Representational Independence
by: Wollschläger, Tom, et al.
Published: (2025)
Combining Cost-Constrained Runtime Monitors for AI Safety
by: Hua, Tim Tian, et al.
Published: (2025)
Instrumental Choices: Measuring the Propensity of LLM Agents to Pursue Instrumental Behaviors
by: Wiedermann-Möller, Jonas, et al.
Published: (2026)
Differential Harm Propensity in Personalized LLM Agents: The Curious Case of Mental Health Disclosure
by: Yildirim, Caglar
Published: (2026)
Towards Understanding Specification Gaming in Reasoning Models
by: Nishimura-Gasparian, Kei, et al.
Published: (2026)
Propensity Inference: Environmental Contributors to LLM Behaviour
by: Järviniemi, Olli, et al.
Published: (2026)
Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction Arena
by: Chen, Jiangjie, et al.
Published: (2023)
PowerChain: A Verifiable Agentic AI System for Automating Distribution Grid Analyses
by: Badmus, Emmanuel O., et al.
Published: (2025)
Do LLM Agents Know How to Ground, Recover, and Assess? A Benchmark for Epistemic Competence in Information-Seeking Agents
by: Shao, Jiaqi, et al.
Published: (2025)
AgentMisalignment: Measuring the Propensity for Misaligned Behaviour in LLM-Based Agents
by: Naik, Akshat, et al.
Published: (2025)
LifelongAgentBench: Evaluating LLM Agents as Lifelong Learners
by: Zheng, Junhao, et al.
Published: (2025)
PolicyBank: Evolving Policy Understanding for LLM Agents
by: Choi, Jihye, et al.
Published: (2026)
AgentAuditor: Human-Level Safety and Security Evaluation for LLM Agents
by: Luo, Hanjun, et al.
Published: (2025)
Evaluating the Propensity of Generative AI for Producing Harmful Disinformation During the 2024 US Election Cycle
by: Schlicht, Erik J
Published: (2024)
Understanding Agent Scaling in LLM-Based Multi-Agent Systems via Diversity
by: Yang, Yingxuan, et al.
Published: (2026)
Discovery of False Data Injection Schemes on Frequency Controllers with Reinforcement Learning
by: Prasad, Romesh, et al.
Published: (2024)
RADAR: Mechanistic Pathways for Detecting Data Contamination in LLM Evaluation
by: Kattamuri, Ashish, et al.
Published: (2025)
RewardHackingAgents: Benchmarking Evaluation Integrity for LLM ML-Engineering Agents
by: Atinafu, Yonas, et al.
Published: (2026)
OmniPatch: A Universal Adversarial Patch for ViT-CNN Cross-Architecture Transfer in Semantic Segmentation
by: Aggarwal, Aarush, et al.
Published: (2026)
Quantifying the Necessity of Chain of Thought through Opaque Serial Depth
by: Brown-Cohen, Jonah, et al.
Published: (2026)
AgentEscapeBench: Evaluating Out-of-Domain Tool-Grounded Reasoning in LLM Agents
by: Guo, Zhengkang, et al.
Published: (2026)
Scheming Ability in LLM-to-LLM Strategic Interactions
by: Pham, Thao
Published: (2025)
The Necessity of a Unified Framework for LLM-Based Agent Evaluation
by: Zhu, Pengyu, et al.
Published: (2026)
Memory for Autonomous LLM Agents: Mechanisms, Evaluation, and Emerging Frontiers
by: Du, Pengfei
Published: (2026)
The Propensity for Density in Feed-forward Models
by: Schoots, Nandi, et al.
Published: (2024)
SOP-Bench: Complex Industrial SOPs for Evaluating LLM Agents
by: Nandi, Subhrangshu, et al.
Published: (2025)
Evaluation and Benchmarking of LLM Agents: A Survey
by: Mohammadi, Mahmoud, et al.
Published: (2025)
MIRAI: Evaluating LLM Agents for Event Forecasting
by: Ye, Chenchen, et al.
Published: (2024)
Beyond Scalars: Evaluating and Understanding LLM Reasoning via Geometric Progress and Stability
by: Jiang, Xinyan, et al.
Published: (2026)
Agent-in-the-Loop: A Data Flywheel for Continuous Improvement in LLM-based Customer Support
by: Zhao, Cen Mia, et al.
Published: (2025)
Quotient DAGs for Off-Policy Evaluation: Forward-Flow Importance Sampling and Exact Slate Propensities
by: Xie, Ziwen, et al.
Published: (2026)
Beyond Demand Estimation: Consumer Surplus Evaluation via Cumulative Propensity Weights
by: Bian, Zeyu, et al.
Published: (2026)
OLAF: Towards Robust LLM-Based Annotation Framework in Empirical Software Engineering
by: Imran, Mia Mohammad, et al.
Published: (2025)
Got a Secret? LLM Agents Can't Keep It: Evaluating Privacy in Multi-Agent Systems
by: Priyanshu, Aman, et al.
Published: (2026)
Gram: Assessing sabotage propensities via automated alignment auditing
by: Lindner, David, et al.
Published: (2026)
Towards Understanding the Robustness of LLM-based Evaluations under Perturbations
by: Chaudhary, Manav, et al.
Published: (2024)
LLM-BABYBENCH: Understanding and Evaluating Grounded Planning and Reasoning in LLMs
by: Choukrani, Omar, et al.
Published: (2025)
Survey on Evaluation of LLM-based Agents
by: Yehudai, Asaf, et al.
Published: (2025)
ArgMed-Agents: Explainable Clinical Decision Reasoning with LLM Disscusion via Argumentation Schemes
by: Hong, Shengxin, et al.
Published: (2024)