| Main Authors: | Ward, Francis Rhys, MacDermott, Matt, Belardinelli, Francesco, Toni, Francesca, Everitt, Tom |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.07221 |
Similar Items
Measuring Goal-Directedness
by: MacDermott, Matt, et al.
Published: (2024)
Reasoning Under Pressure: How do Training Incentives Influence Chain-of-Thought Monitorability?
by: MacDermott, Matt, et al.
Published: (2025)
Towards a Theory of AI Personhood
by: Ward, Francis Rhys
Published: (2025)
Can a Bayesian Oracle Prevent Harm from an Agent?
by: Bengio, Yoshua, et al.
Published: (2024)
Password-Activated Shutdown Protocols for Misaligned Frontier Agents
by: Williams, Kai, et al.
Published: (2025)
The Limits of Predicting Agents from Behaviour
by: Bellot, Alexis, et al.
Published: (2025)
Whatever Happened to Frank and Fearless?
by: MacDermott, Kathy
Published: (2013)
Higher-Order Belief in Incomplete Information MAIDs
by: Foxabbott, Jack, et al.
Published: (2025)
Robust agents learn causal world models
by: Richens, Jonathan, et al.
Published: (2024)
Incentives for Responsiveness, Instrumental Control and Impact
by: Carey, Ryan, et al.
Published: (2020)
Evaluating the Goal-Directedness of Large Language Models
by: Everitt, Tom, et al.
Published: (2025)
Neuro-Argumentative Learning with Case-Based Reasoning
by: Gould, Adam, et al.
Published: (2025)
Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?
by: Bengio, Yoshua, et al.
Published: (2025)
Argumentative Human-AI Decision-Making: Toward AI Agents That Reason With Us, Not For Us
by: Vasileiou, Stylianos Loukas, et al.
Published: (2026)
Transparent Visual Reasoning via Object-Centric Agent Collaboration
by: Teoh, Benjamin, et al.
Published: (2025)
Leveraging Approximate Model-based Shielding for Probabilistic Safety Guarantees in Continuous Environments
by: Goodall, Alexander W., et al.
Published: (2024)
Safe Reinforcement Learning via Recovery-based Shielding with Gaussian Process Dynamics Models
by: Goodall, Alexander W., et al.
Published: (2026)
Expressive Temporal Specifications for Reward Monitoring
by: Adalat, Omar, et al.
Published: (2025)
Retrieval- and Argumentation-Enhanced Multi-Agent LLMs for Judgmental Forecasting (Extended Version with Supplementary Material)
by: Gorur, Deniz, et al.
Published: (2025)
Supported Abstract Argumentation for Case-Based Reasoning
by: Gould, Adam, et al.
Published: (2025)
How does information access affect LLM monitors' ability to detect sabotage?
by: Arike, Rauno, et al.
Published: (2026)
Approximate Model-Based Shielding for Safe Reinforcement Learning
by: Goodall, Alexander W., et al.
Published: (2023)
ArgLLM-App: An Interactive System for Argumentative Reasoning with Large Language Models
by: Dejl, Adam, et al.
Published: (2026)
SafeAdapt: Provably Safe Policy Updates in Deep Reinforcement Learning
by: Anisimov, Maksim, et al.
Published: (2026)
General agents contain world models
by: Richens, Jonathan, et al.
Published: (2025)
Deep Arguing
by: Gould, Adam, et al.
Published: (2026)
Act2Goal: From World Model To General Goal-conditioned Policy
by: Zhou, Pengfei, et al.
Published: (2025)
Object-Centric Case-Based Reasoning via Argumentation
by: Gaul, Gabriel de Olim, et al.
Published: (2025)
Robust Counterfactual Explanations in Machine Learning: A Survey
by: Jiang, Junqi, et al.
Published: (2024)
Interval Abstractions for Robust Counterfactual Explanations
by: Jiang, Junqi, et al.
Published: (2024)
Recourse under Model Multiplicity via Argumentative Ensembling (Technical Report)
by: Jiang, Junqi, et al.
Published: (2023)
Gaze-based intention estimation: principles, methodologies, and applications in HRI
by: Belardinelli, Anna
Published: (2023)
Preference-Based Abstract Argumentation for Case-Based Reasoning (with Appendix)
by: Gould, Adam, et al.
Published: (2024)
The Elicitation Game: Evaluating Capability Elicitation Techniques
by: Hofstätter, Felix, et al.
Published: (2025)
Expressive Reward Synthesis with the Runtime Monitoring Language
by: Donnelly, Daniel, et al.
Published: (2025)
Exploring the Potential for Large Language Models to Demonstrate Rational Probabilistic Beliefs
by: Freedman, Gabriel, et al.
Published: (2025)
Argumentative Ensembling for Robust Recourse under Model Multiplicity
by: Jiang, Junqi, et al.
Published: (2025)
Quantum Mechanics from Symmetry
by: Hegstrom, Roger A., et al.
Published: (2022)
Shapley-PC: Constraint-based Causal Structure Learning with a Shapley Inspired Framework
by: Russo, Fabrizio, et al.
Published: (2023)
MAP: A Map-then-Act Paradigm for Long-Horizon Interactive Agent Reasoning
by: Liu, Yuxin, et al.
Published: (2026)