| Main Authors: | François, Camille, Péran, Ludovic, Bdeir, Ayah, Dziri, Nouha, Hawkins, Will, Jernite, Yacine, Kapoor, Sayash, Shen, Juliet, Khlaaf, Heidy, Klyman, Kevin, Marda, Nik, Pellat, Marie, Raji, Deb, Siddarth, Divya, Skowron, Aviya, Spisak, Joseph, Srikumar, Madhulika, Storchan, Victor, Tang, Audrey, Weedon, Jen |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.22183 |
Similar Items
Safety Co-Option and Compromised National Security: The Self-Fulfilling Prophecy of Weakened AI Risk Thresholds
by: Khlaaf, Heidy, et al.
Published: (2025)
Towards a Framework for Openness in Foundation Models: Proceedings from the Columbia Convening on Openness in Artificial Intelligence
by: Basdevant, Adrien, et al.
Published: (2024)
Beyond Release: Access Considerations for Generative AI Systems
by: Solaiman, Irene, et al.
Published: (2025)
LeftoverLocals: Listening to LLM Responses Through Leaked GPU Local Memory
by: Sorensen, Tyler, et al.
Published: (2024)
OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety
by: Vijayvargiya, Sanidhya, et al.
Published: (2025)
Surfacing Semantic Orthogonality Across Model Safety Benchmarks: A Multi-Dimensional Analysis
by: Bennion, Jonathan, et al.
Published: (2025)
INTIMA: A Benchmark for Human-AI Companionship Behavior
by: Kaffee, Lucie-Aimée, et al.
Published: (2025)
Power Hungry Processing: Watts Driving the Cost of AI Deployment?
by: Luccioni, Alexandra Sasha, et al.
Published: (2023)
SafetyAnalyst: Interpretable, Transparent, and Steerable Safety Moderation for AI Behavior
by: Li, Jing-Jing, et al.
Published: (2024)
Mind the Gap: Foundation Models and the Covert Proliferation of Military Intelligence, Surveillance, and Targeting
by: Khlaaf, Heidy, et al.
Published: (2024)
On the Societal Impact of Open Foundation Models
by: Kapoor, Sayash, et al.
Published: (2024)
Concrete Problems in AI Safety, Revisited
by: Raji, Inioluwa Deborah, et al.
Published: (2023)
In-House Evaluation Is Not Enough: Towards Robust Third-Party Flaw Disclosure for General-Purpose AI
by: Longpre, Shayne, et al.
Published: (2025)
The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources
by: Longpre, Shayne, et al.
Published: (2024)
Rel-A.I.: An Interaction-Centered Approach To Measuring Human-LM Reliance
by: Zhou, Kaitlyn, et al.
Published: (2024)
WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs
by: Han, Seungju, et al.
Published: (2024)
A Safe Harbor for AI Evaluation and Red Teaming
by: Longpre, Shayne, et al.
Published: (2024)
COMPUTATIONAL LEADERSHIP: REMAINING INNOVATIVE AND PEOPLE‐CENTERED IN THE AGE OF AI
by: Brian R. Spisak
Published: (2024)
TurnWise: The Gap between Single- and Multi-turn Language Model Capabilities
by: Graf, Victoria, et al.
Published: (2026)
International AI Safety Report
by: Bengio, Yoshua, et al.
Published: (2025)
From Symptoms to Systems: An Expert-Guided Approach to Understanding Risks of Generative AI for Eating Disorders
by: Winecoff, Amy, et al.
Published: (2025)
The Reality of AI and Biorisk
by: Peppin, Aidan, et al.
Published: (2024)
Rationale and Schedule for a Classification System for Education and Education-Related Materials.
by: Woodbury, Marda
Published: (1972)
Selecting Instructional Materials. Fastback 110.
by: Woodbury, Marda
Published: (1978)
A Guide to Educational Resources.
by: Woodbury, Marda
Published: (1974)
The 2024 Foundation Model Transparency Index
by: Bommasani, Rishi, et al.
Published: (2024)
Funders Network Spring Convening
Published: (2024)
International Scientific Report on the Safety of Advanced AI (Interim Report)
by: Bengio, Yoshua, et al.
Published: (2024)
International AI Safety Report 2026
by: Bengio, Yoshua, et al.
Published: (2026)
NeuroAI for AI Safety
by: Mineault, Patrick, et al.
Published: (2024)
AI Agents That Matter
by: Kapoor, Sayash, et al.
Published: (2024)
AI Safety is Stuck in Technical Terms -- A System Safety Response to the International AI Safety Report
by: Dobbe, Roel
Published: (2025)
The Solar Dynamics Observatory in the Living With a Star Era: From Solar Observations to Predictive Heliophysics
by: Guhathakurta, Madhulika
Published: (2026)
AI Safety for Everyone
by: Gyevnar, Balint, et al.
Published: (2025)
CIVICS: Building a Dataset for Examining Culturally-Informed Values in Large Language Models
by: Pistilli, Giada, et al.
Published: (2024)
Beyond Perplexity: Multi-dimensional Safety Evaluation of LLM Compression
by: Xu, Zhichao, et al.
Published: (2024)
Human-AI Safety: A Descendant of Generative AI and Control Systems Safety
by: Bajcsy, Andrea, et al.
Published: (2024)
The Role of AI Safety Institutes in Contributing to International Standards for Frontier AI Safety
by: Fort, Kristina
Published: (2024)
AI as Humanity's Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text
by: Lu, Ximing, et al.
Published: (2024)
Do AI Companies Make Good on Voluntary Commitments to the White House?
by: Wang, Jennifer, et al.
Published: (2025)