:: Library Catalog

Bibliographic Details
Main Authors:	François, Camille; Péran, Ludovic; Bdeir, Ayah; Dziri, Nouha; Hawkins, Will; Jernite, Yacine; Kapoor, Sayash; Shen, Juliet; Khlaaf, Heidy; Klyman, Kevin; Marda, Nik; Pellat, Marie; Raji, Deb; Siddarth, Divya; Skowron, Aviya; Spisak, Joseph; Srikumar, Madhulika; Storchan, Victor; Tang, Audrey; Weedon, Jen
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2506.22183

Similar Items

Safety Co-Option and Compromised National Security: The Self-Fulfilling Prophecy of Weakened AI Risk Thresholds
by: Khlaaf, Heidy, et al.
Published: (2025)

Towards a Framework for Openness in Foundation Models: Proceedings from the Columbia Convening on Openness in Artificial Intelligence
by: Basdevant, Adrien, et al.
Published: (2024)

Beyond Release: Access Considerations for Generative AI Systems
by: Solaiman, Irene, et al.
Published: (2025)

LeftoverLocals: Listening to LLM Responses Through Leaked GPU Local Memory
by: Sorensen, Tyler, et al.
Published: (2024)

OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety
by: Vijayvargiya, Sanidhya, et al.
Published: (2025)

Surfacing Semantic Orthogonality Across Model Safety Benchmarks: A Multi-Dimensional Analysis
by: Bennion, Jonathan, et al.
Published: (2025)

INTIMA: A Benchmark for Human-AI Companionship Behavior
by: Kaffee, Lucie-Aimée, et al.
Published: (2025)

Power Hungry Processing: Watts Driving the Cost of AI Deployment?
by: Luccioni, Alexandra Sasha, et al.
Published: (2023)

SafetyAnalyst: Interpretable, Transparent, and Steerable Safety Moderation for AI Behavior
by: Li, Jing-Jing, et al.
Published: (2024)

Mind the Gap: Foundation Models and the Covert Proliferation of Military Intelligence, Surveillance, and Targeting
by: Khlaaf, Heidy, et al.
Published: (2024)

On the Societal Impact of Open Foundation Models
by: Kapoor, Sayash, et al.
Published: (2024)

Concrete Problems in AI Safety, Revisited
by: Raji, Inioluwa Deborah, et al.
Published: (2023)

In-House Evaluation Is Not Enough: Towards Robust Third-Party Flaw Disclosure for General-Purpose AI
by: Longpre, Shayne, et al.
Published: (2025)

The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources
by: Longpre, Shayne, et al.
Published: (2024)

Rel-A.I.: An Interaction-Centered Approach To Measuring Human-LM Reliance
by: Zhou, Kaitlyn, et al.
Published: (2024)

WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs
by: Han, Seungju, et al.
Published: (2024)

A Safe Harbor for AI Evaluation and Red Teaming
by: Longpre, Shayne, et al.
Published: (2024)

Computational Leadership: Remaining Innovative and People-Centered in the Age of AI
by: Spisak, Brian R.
Published: (2024)

TurnWise: The Gap between Single- and Multi-turn Language Model Capabilities
by: Graf, Victoria, et al.
Published: (2026)

International AI Safety Report
by: Bengio, Yoshua, et al.
Published: (2025)

From Symptoms to Systems: An Expert-Guided Approach to Understanding Risks of Generative AI for Eating Disorders
by: Winecoff, Amy, et al.
Published: (2025)

The Reality of AI and Biorisk
by: Peppin, Aidan, et al.
Published: (2024)

Rationale and Schedule for a Classification System for Education and Education-Related Materials.
by: Woodbury, Marda
Published: (1972)

Selecting Instructional Materials. Fastback 110.
by: Woodbury, Marda
Published: (1978)

A Guide to Educational Resources.
by: Woodbury, Marda
Published: (1974)

The 2024 Foundation Model Transparency Index
by: Bommasani, Rishi, et al.
Published: (2024)

Funders Network Spring Convening
Published: (2024)

International Scientific Report on the Safety of Advanced AI (Interim Report)
by: Bengio, Yoshua, et al.
Published: (2024)

International AI Safety Report 2026
by: Bengio, Yoshua, et al.
Published: (2026)

NeuroAI for AI Safety
by: Mineault, Patrick, et al.
Published: (2024)

AI Agents That Matter
by: Kapoor, Sayash, et al.
Published: (2024)

AI Safety is Stuck in Technical Terms -- A System Safety Response to the International AI Safety Report
by: Dobbe, Roel
Published: (2025)

The Solar Dynamics Observatory in the Living With a Star Era: From Solar Observations to Predictive Heliophysics
by: Guhathakurta, Madhulika
Published: (2026)

AI Safety for Everyone
by: Gyevnar, Balint, et al.
Published: (2025)

CIVICS: Building a Dataset for Examining Culturally-Informed Values in Large Language Models
by: Pistilli, Giada, et al.
Published: (2024)

Beyond Perplexity: Multi-dimensional Safety Evaluation of LLM Compression
by: Xu, Zhichao, et al.
Published: (2024)

Human-AI Safety: A Descendant of Generative AI and Control Systems Safety
by: Bajcsy, Andrea, et al.
Published: (2024)

The Role of AI Safety Institutes in Contributing to International Standards for Frontier AI Safety
by: Fort, Kristina
Published: (2024)

AI as Humanity's Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text
by: Lu, Ximing, et al.
Published: (2024)

Do AI Companies Make Good on Voluntary Commitments to the White House?
by: Wang, Jennifer, et al.
Published: (2025)