Library Catalog

Bibliographic Details
Main Authors:	Stix, Charlotte; Pistillo, Matteo; Sastry, Girish; Hobbhahn, Marius; Ortega, Alejandro; Balesni, Mikita; Hallensleben, Annika; Goldowsky-Dill, Nix; Sharkey, Lee
Format:	Preprint
Published:	2025
Subjects:	Computers and Society
Online Access:	https://arxiv.org/abs/2504.12170

Similar Items

The Loss of Control Playbook: Degrees, Dynamics, and Preparedness
by: Stix, Charlotte, et al.
Published: (2025)

Large Language Models can Strategically Deceive their Users when Put Under Pressure
by: Scheurer, Jérémy, et al.
Published: (2023)

Pre-Deployment Information Sharing: A Zoning Taxonomy for Precursory Capabilities
by: Pistillo, Matteo, et al.
Published: (2024)

Assurance of Frontier AI Built for National Security
by: Pistillo, Matteo, et al.
Published: (2025)

Detecting Strategic Deception Using Linear Probes
by: Goldowsky-Dill, Nicholas, et al.
Published: (2025)

Towards evaluations-based safety cases for AI scheming
by: Balesni, Mikita, et al.
Published: (2024)

Internal Deployment in the AI Act
by: Pistillo, Matteo
Published: (2025)

Frontier Models are Capable of In-context Scheming
by: Meinke, Alexander, et al.
Published: (2024)

Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning
by: Braun, Dan, et al.
Published: (2024)

Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs
by: Laine, Rudolf, et al.
Published: (2024)

Using Degeneracy in the Loss Landscape for Mechanistic Interpretability
by: Bushnaq, Lucius, et al.
Published: (2024)

Lessons from Studying Two-Hop Latent Reasoning
by: Balesni, Mikita, et al.
Published: (2024)

Stress Testing Deliberative Alignment for Anti-Scheming Training
by: Schoen, Bronson, et al.
Published: (2025)

How to evaluate control measures for LLM agents? A trajectory from today to superintelligence
by: Korbak, Tomek, et al.
Published: (2025)

The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks
by: Bushnaq, Lucius, et al.
Published: (2024)

Towards Frontier Safety Policies Plus
by: Pistillo, Matteo
Published: (2025)

Children in Police Custody: Adversity and Adversariality Behind Closed Doors
by: Sheahan, Frances
Published: (2025)

Behind Office Doors
Published: (2026)

Honesty to Subterfuge: In-Context Reinforcement Learning Can Make Honest Models Reward Hack
by: McKee-Reid, Leo, et al.
Published: (2024)

Peeking Behind Closed Doors: Risks of LLM Evaluation by Private Data Curators
by: Bansal, Hritik, et al.
Published: (2025)

Defending Compute Thresholds Against Legal Loopholes
by: Pistillo, Matteo, et al.
Published: (2025)

The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"
by: Berglund, Lukas, et al.
Published: (2023)

Chapter 6: The Role of Corporate Governance in Macro-Prudential Regulation of Systemic Risk
by: Dill, Alexander
Published: (2020)

Owning the Stuff of Life
by: Stix, Gary

Backchaining Loss of Control Mitigations from Mission-Specific Benchmarks in National Security
by: Pistillo, Matteo, et al.
Published: (2026)

Technical Report: Evaluating Goal Drift in Language Model Agents
by: Arike, Rauno, et al.
Published: (2025)

Forecasting Frontier Language Model Agent Capabilities
by: Pimpale, Govind, et al.
Published: (2025)

Ground states for the Hartree energy functional in the critical case
by: Pistillo, Tommaso
Published: (2025)

Analogical Reasoning Within a Conceptual Hyperspace
by: Goldowsky, Howard, et al.
Published: (2024)

Behind Closed Doors: An Exploratory Study of the Perceptions of Librarians and the Hidden Intellectual Work of Collection Development in Canadian Public Libraries.
by: Nilsen, Kirsti, et al.
Published: (2002)

The Day the Library Closed Its Doors
by: Yates, Elizabeth
Published: (1970)

A Study of the Bookmobile Service of the Madison Public Library.
by: Nix, Larry T.
Published: (1981)

Bibliophilately Revisited.
by: Nix, Larry T.
Published: (2000)

Large Language Models Often Know When They Are Being Evaluated
by: Needham, Joe, et al.
Published: (2025)

Analyzing Probabilistic Methods for Evaluating Agent Capabilities
by: Højmark, Axel, et al.
Published: (2024)

The étale topos reconstructs varieties over sub-p-adic fields
by: Carlson, Magnus, et al.
Published: (2024)

Monographs in Microform: Issues in Cataloging and Bibliographic Control.
by: Mikita, Elizabeth G.
Published: (1981)

Hunter Midtown Library: The Closing of an Open Door
by: Foster, Barbara
Published: (1976)

DoorBot: Closed-Loop Task Planning and Manipulation for Door Opening in the Wild with Haptic Feedback
by: Wang, Zhi, et al.
Published: (2025)