:: Library Catalog

Bibliographic Details
Main Authors:	Grey, Markov, Segerie, Charbel-Raphaël
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2505.05541

Similar Items

The AI Risk Spectrum: From Dangerous Capabilities to Existential Threats
by: Grey, Markov, et al.
Published: (2025)

BELLS: A Framework Towards Future Proof Benchmarks for the Evaluation of LLM Safeguards
by: Dorn, Diego, et al.
Published: (2024)

The bitter lesson of misuse detection
by: Mariaccia, Hadrien, et al.
Published: (2025)

Continuous Time Continuous Space Homeostatic Reinforcement Learning (CTCS-HRRL) : Towards Biological Self-Autonomous Agent
by: Laurencon, Hugo, et al.
Published: (2024)

Unpacking Human-AI Interaction in Safety-Critical Industries: A Systematic Literature Review
by: Bach, Tita A., et al.
Published: (2023)

SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety
by: Röttger, Paul, et al.
Published: (2024)

Mechanistic Interpretability for AI Safety -- A Review
by: Bereska, Leonard, et al.
Published: (2024)

Responsible AI in Construction Safety: Systematic Evaluation of Large Language Models and Prompt Engineering
by: Sammour, Farouq, et al.
Published: (2024)

SteeringSafety: A Systematic Safety Evaluation Framework of Representation Steering in LLMs
by: Siu, Vincent, et al.
Published: (2025)

International Agreements on AI Safety: Review and Recommendations for a Conditional AI Safety Treaty
by: Scholefield, Rebecca, et al.
Published: (2025)

OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety
by: Vijayvargiya, Sanidhya, et al.
Published: (2025)

SAGE-Eval: Evaluating LLMs for Systematic Generalizations of Safety Facts
by: Yueh-Han, Chen, et al.
Published: (2025)

Evaluating Human-AI Safety: A Framework for Measuring Harmful Capability Uplift
by: Vaccaro, Michelle, et al.
Published: (2026)

Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
by: Ren, Richard, et al.
Published: (2024)

AI Safety is Stuck in Technical Terms -- A System Safety Response to the International AI Safety Report
by: Dobbe, Roel
Published: (2025)

SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal
by: Xie, Tinghao, et al.
Published: (2024)

Auto-Evaluation: A Critical Measure in Driving Improvements in Quality and Safety of AI-Generated Lesson Resources
by: Clark, Hannah-Beth, et al.
Published: (2025)

Generative AI for Requirements Engineering: A Systematic Literature Review
by: Cheng, Haowei, et al.
Published: (2024)

Safety Cases: A Scalable Approach to Frontier AI Safety
by: Hilton, Benjamin, et al.
Published: (2025)

SafePro: Evaluating the Safety of Professional-Level AI Agents
by: Zhou, Kaiwen, et al.
Published: (2026)

AI Safety: A Climb To Armageddon?
by: Cappelen, Herman, et al.
Published: (2024)

Evaluation Faking: Unveiling Observer Effects in Safety Evaluation of Frontier AI Systems
by: Fan, Yihe, et al.
Published: (2025)

The Ghost in the Grammar: Methodological Anthropomorphism in AI Safety Evaluations
by: Costa, Mariana Lins
Published: (2026)

Holistic Safety and Responsibility Evaluations of Advanced AI Models
by: Weidinger, Laura, et al.
Published: (2024)

Safety Under Scaffolding: How Evaluation Conditions Shape Measured Safety
by: Gringras, David
Published: (2026)

AI Adoption in NGOs: A Systematic Literature Review
by: Rotter, Janne, et al.
Published: (2025)

NeuroAI for AI Safety
by: Mineault, Patrick, et al.
Published: (2024)

Games for AI Control: Models of Safety Evaluations of AI Deployment Protocols
by: Griffin, Charlie, et al.
Published: (2024)

A Different Approach to AI Safety: Proceedings from the Columbia Convening on Openness in Artificial Intelligence and AI Safety
by: François, Camille, et al.
Published: (2025)

Systematic Literature Review: Explainable AI Definitions and Challenges in Education
by: Altukhi, Zaid M., et al.
Published: (2025)

Data-Driven Methods and AI in Engineering Design: A Systematic Literature Review Focusing on Challenges and Opportunities
by: Afifi, Nehal, et al.
Published: (2025)

Safety Cases: How to Justify the Safety of Advanced AI Systems
by: Clymer, Joshua, et al.
Published: (2024)

AISafetyLab: A Comprehensive Framework for AI Safety Evaluation and Improvement
by: Zhang, Zhexin, et al.
Published: (2025)

A Real-World Evaluation of LLM Medication Safety Reviews in NHS Primary Care
by: Normand, Oliver, et al.
Published: (2025)

Generative AI and Creativity: A Systematic Literature Review and Meta-Analysis
by: Holzner, Niklas, et al.
Published: (2025)

MedSafetyBench: Evaluating and Improving the Medical Safety of Large Language Models
by: Han, Tessa, et al.
Published: (2024)

AI in Computational Thinking Education in Higher Education: A Systematic Literature Review
by: Rahimi, Ebrahim, et al.
Published: (2025)

The Literature Review Network: An Explainable Artificial Intelligence for Systematic Literature Reviews, Meta-analyses, and Method Development
by: Morriss, Joshua, et al.
Published: (2024)

Improving the Safety and Trustworthiness of Medical AI via Multi-Agent Evaluation Loops
by: Ghafoor, Zainab, et al.
Published: (2026)

A Systematic Review of Open Datasets Used in Text-to-Image (T2I) Gen AI Model Safety
by: Rouf, Rakeen, et al.
Published: (2025)