| Main Authors: | Grey, Markov; Segerie, Charbel-Raphaël |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2505.05541 |
Similar Items
The AI Risk Spectrum: From Dangerous Capabilities to Existential Threats
by: Grey, Markov, et al.
Published: (2025)
BELLS: A Framework Towards Future Proof Benchmarks for the Evaluation of LLM Safeguards
by: Dorn, Diego, et al.
Published: (2024)
The bitter lesson of misuse detection
by: Mariaccia, Hadrien, et al.
Published: (2025)
Continuous Time Continuous Space Homeostatic Reinforcement Learning (CTCS-HRRL) : Towards Biological Self-Autonomous Agent
by: Laurencon, Hugo, et al.
Published: (2024)
Unpacking Human-AI Interaction in Safety-Critical Industries: A Systematic Literature Review
by: Bach, Tita A., et al.
Published: (2023)
SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety
by: Röttger, Paul, et al.
Published: (2024)
Mechanistic Interpretability for AI Safety -- A Review
by: Bereska, Leonard, et al.
Published: (2024)
Responsible AI in Construction Safety: Systematic Evaluation of Large Language Models and Prompt Engineering
by: Sammour, Farouq, et al.
Published: (2024)
SteeringSafety: A Systematic Safety Evaluation Framework of Representation Steering in LLMs
by: Siu, Vincent, et al.
Published: (2025)
International Agreements on AI Safety: Review and Recommendations for a Conditional AI Safety Treaty
by: Scholefield, Rebecca, et al.
Published: (2025)
OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety
by: Vijayvargiya, Sanidhya, et al.
Published: (2025)
SAGE-Eval: Evaluating LLMs for Systematic Generalizations of Safety Facts
by: Yueh-Han, Chen, et al.
Published: (2025)
Evaluating Human-AI Safety: A Framework for Measuring Harmful Capability Uplift
by: Vaccaro, Michelle, et al.
Published: (2026)
Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
by: Ren, Richard, et al.
Published: (2024)
AI Safety is Stuck in Technical Terms -- A System Safety Response to the International AI Safety Report
by: Dobbe, Roel
Published: (2025)
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal
by: Xie, Tinghao, et al.
Published: (2024)
Auto-Evaluation: A Critical Measure in Driving Improvements in Quality and Safety of AI-Generated Lesson Resources
by: Clark, Hannah-Beth, et al.
Published: (2025)
Generative AI for Requirements Engineering: A Systematic Literature Review
by: Cheng, Haowei, et al.
Published: (2024)
Safety Cases: A Scalable Approach to Frontier AI Safety
by: Hilton, Benjamin, et al.
Published: (2025)
SafePro: Evaluating the Safety of Professional-Level AI Agents
by: Zhou, Kaiwen, et al.
Published: (2026)
AI Safety: A Climb To Armageddon?
by: Cappelen, Herman, et al.
Published: (2024)
Evaluation Faking: Unveiling Observer Effects in Safety Evaluation of Frontier AI Systems
by: Fan, Yihe, et al.
Published: (2025)
The Ghost in the Grammar: Methodological Anthropomorphism in AI Safety Evaluations
by: Costa, Mariana Lins
Published: (2026)
Holistic Safety and Responsibility Evaluations of Advanced AI Models
by: Weidinger, Laura, et al.
Published: (2024)
Safety Under Scaffolding: How Evaluation Conditions Shape Measured Safety
by: Gringras, David
Published: (2026)
AI Adoption in NGOs: A Systematic Literature Review
by: Rotter, Janne, et al.
Published: (2025)
NeuroAI for AI Safety
by: Mineault, Patrick, et al.
Published: (2024)
Games for AI Control: Models of Safety Evaluations of AI Deployment Protocols
by: Griffin, Charlie, et al.
Published: (2024)
A Different Approach to AI Safety: Proceedings from the Columbia Convening on Openness in Artificial Intelligence and AI Safety
by: François, Camille, et al.
Published: (2025)
Systematic Literature Review: Explainable AI Definitions and Challenges in Education
by: Altukhi, Zaid M., et al.
Published: (2025)
Data-Driven Methods and AI in Engineering Design: A Systematic Literature Review Focusing on Challenges and Opportunities
by: Afifi, Nehal, et al.
Published: (2025)
Safety Cases: How to Justify the Safety of Advanced AI Systems
by: Clymer, Joshua, et al.
Published: (2024)
AISafetyLab: A Comprehensive Framework for AI Safety Evaluation and Improvement
by: Zhang, Zhexin, et al.
Published: (2025)
A Real-World Evaluation of LLM Medication Safety Reviews in NHS Primary Care
by: Normand, Oliver, et al.
Published: (2025)
Generative AI and Creativity: A Systematic Literature Review and Meta-Analysis
by: Holzner, Niklas, et al.
Published: (2025)
MedSafetyBench: Evaluating and Improving the Medical Safety of Large Language Models
by: Han, Tessa, et al.
Published: (2024)
AI in Computational Thinking Education in Higher Education: A Systematic Literature Review
by: Rahimi, Ebrahim, et al.
Published: (2025)
The Literature Review Network: An Explainable Artificial Intelligence for Systematic Literature Reviews, Meta-analyses, and Method Development
by: Morriss, Joshua, et al.
Published: (2024)
Improving the Safety and Trustworthiness of Medical AI via Multi-Agent Evaluation Loops
by: Ghafoor, Zainab, et al.
Published: (2026)
A Systematic Review of Open Datasets Used in Text-to-Image (T2I) Gen AI Model Safety
by: Rouf, Rakeen, et al.
Published: (2025)