Library Catalog

Bibliographic Details
Main Authors:	Ward, Francis Rhys; MacDermott, Matt; Belardinelli, Francesco; Toni, Francesca; Everitt, Tom
Format:	Preprint
Published:	2024
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2402.07221

Similar Items

Measuring Goal-Directedness
by: MacDermott, Matt, et al.
Published: (2024)

Reasoning Under Pressure: How do Training Incentives Influence Chain-of-Thought Monitorability?
by: MacDermott, Matt, et al.
Published: (2025)

Towards a Theory of AI Personhood
by: Ward, Francis Rhys
Published: (2025)

Can a Bayesian Oracle Prevent Harm from an Agent?
by: Bengio, Yoshua, et al.
Published: (2024)

Password-Activated Shutdown Protocols for Misaligned Frontier Agents
by: Williams, Kai, et al.
Published: (2025)

The Limits of Predicting Agents from Behaviour
by: Bellot, Alexis, et al.
Published: (2025)

Whatever Happened to Frank and Fearless?
by: MacDermott, Kathy
Published: (2013)

Higher-Order Belief in Incomplete Information MAIDs
by: Foxabbott, Jack, et al.
Published: (2025)

Robust agents learn causal world models
by: Richens, Jonathan, et al.
Published: (2024)

Incentives for Responsiveness, Instrumental Control and Impact
by: Carey, Ryan, et al.
Published: (2020)

Evaluating the Goal-Directedness of Large Language Models
by: Everitt, Tom, et al.
Published: (2025)

Neuro-Argumentative Learning with Case-Based Reasoning
by: Gould, Adam, et al.
Published: (2025)

Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?
by: Bengio, Yoshua, et al.
Published: (2025)

Argumentative Human-AI Decision-Making: Toward AI Agents That Reason With Us, Not For Us
by: Vasileiou, Stylianos Loukas, et al.
Published: (2026)

Transparent Visual Reasoning via Object-Centric Agent Collaboration
by: Teoh, Benjamin, et al.
Published: (2025)

Leveraging Approximate Model-based Shielding for Probabilistic Safety Guarantees in Continuous Environments
by: Goodall, Alexander W., et al.
Published: (2024)

Safe Reinforcement Learning via Recovery-based Shielding with Gaussian Process Dynamics Models
by: Goodall, Alexander W., et al.
Published: (2026)

Expressive Temporal Specifications for Reward Monitoring
by: Adalat, Omar, et al.
Published: (2025)

Retrieval- and Argumentation-Enhanced Multi-Agent LLMs for Judgmental Forecasting (Extended Version with Supplementary Material)
by: Gorur, Deniz, et al.
Published: (2025)

Supported Abstract Argumentation for Case-Based Reasoning
by: Gould, Adam, et al.
Published: (2025)

How does information access affect LLM monitors' ability to detect sabotage?
by: Arike, Rauno, et al.
Published: (2026)

Approximate Model-Based Shielding for Safe Reinforcement Learning
by: Goodall, Alexander W., et al.
Published: (2023)

ArgLLM-App: An Interactive System for Argumentative Reasoning with Large Language Models
by: Dejl, Adam, et al.
Published: (2026)

SafeAdapt: Provably Safe Policy Updates in Deep Reinforcement Learning
by: Anisimov, Maksim, et al.
Published: (2026)

General agents contain world models
by: Richens, Jonathan, et al.
Published: (2025)

Deep Arguing
by: Gould, Adam, et al.
Published: (2026)

Act2Goal: From World Model To General Goal-conditioned Policy
by: Zhou, Pengfei, et al.
Published: (2025)

Object-Centric Case-Based Reasoning via Argumentation
by: Gaul, Gabriel de Olim, et al.
Published: (2025)

Robust Counterfactual Explanations in Machine Learning: A Survey
by: Jiang, Junqi, et al.
Published: (2024)

Interval Abstractions for Robust Counterfactual Explanations
by: Jiang, Junqi, et al.
Published: (2024)

Recourse under Model Multiplicity via Argumentative Ensembling (Technical Report)
by: Jiang, Junqi, et al.
Published: (2023)

Gaze-based intention estimation: principles, methodologies, and applications in HRI
by: Belardinelli, Anna
Published: (2023)

Preference-Based Abstract Argumentation for Case-Based Reasoning (with Appendix)
by: Gould, Adam, et al.
Published: (2024)

The Elicitation Game: Evaluating Capability Elicitation Techniques
by: Hofstätter, Felix, et al.
Published: (2025)

Expressive Reward Synthesis with the Runtime Monitoring Language
by: Donnelly, Daniel, et al.
Published: (2025)

Exploring the Potential for Large Language Models to Demonstrate Rational Probabilistic Beliefs
by: Freedman, Gabriel, et al.
Published: (2025)

Argumentative Ensembling for Robust Recourse under Model Multiplicity
by: Jiang, Junqi, et al.
Published: (2025)

Quantum Mechanics from Symmetry
by: Hegstrom, Roger A., et al.
Published: (2022)

Shapley-PC: Constraint-based Causal Structure Learning with a Shapley Inspired Framework
by: Russo, Fabrizio, et al.
Published: (2023)

MAP: A Map-then-Act Paradigm for Long-Horizon Interactive Agent Reasoning
by: Liu, Yuxin, et al.
Published: (2026)