Library Catalog

Bibliographic Details
Main Authors:	Che, Zora, Casper, Stephen, Kirk, Robert, Satheesh, Anirudh, Slocum, Stewart, McKinney, Lev E, Gandikota, Rohit, Ewart, Aidan, Rosati, Domenic, Wu, Zichu, Cai, Zikui, Chughtai, Bilal, Gal, Yarin, Huang, Furong, Hadfield-Menell, Dylan
Format:	Preprint
Published:	2025
Subjects:	Cryptography and Security; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2502.05209

Similar Items

Eight Methods to Evaluate Robust Unlearning in LLMs
by: Lynch, Aengus, et al.
Published: (2024)

Diverse Preference Learning for Capabilities and Alignment
by: Slocum, Stewart, et al.
Published: (2025)

Alignment Tampering: How Reinforcement Learning from Human Feedback Is Exploited to Optimize Misaligned Biases
by: Hahm, Dongyoon, et al.
Published: (2026)

Randomness, Not Representation: The Unreliability of Evaluating Cultural Alignment in LLMs
by: Khan, Ariba, et al.
Published: (2025)

Pitfalls of Evidence-Based AI Policy
by: Casper, Stephen, et al.
Published: (2025)

Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs
by: O'Brien, Kyle, et al.
Published: (2025)

Defending Against Unforeseen Failure Modes with Latent Adversarial Training
by: Casper, Stephen, et al.
Published: (2024)

GULPS: Two-Qubit Gate Synthesis via Linear Programming for Heterogeneous Instruction Sets
by: McKinney, Evan, et al.
Published: (2025)

Compositional Adversarial Training for Robust Visual Watermarking
by: Satheesh, Anirudh, et al.
Published: (2026)

Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs
by: Sheshadri, Abhay, et al.
Published: (2024)

Provably Efficient Algorithms for S- and Non-Rectangular Robust MDPs with General Parameterization
by: Satheesh, Anirudh, et al.
Published: (2026)

EnsemW2S: Enhancing Weak-to-Strong Generalization with Large Language Model Ensembles
by: Agrawal, Aakriti, et al.
Published: (2024)

EnsemW2S: Enhancing Weak-to-Strong Generalization with Large Language Model Ensembles
by: Agrawal, Aakriti, et al.
Published: (2025)

SAFLEX: Self-Adaptive Augmentation via Feature Label Extrapolation
by: Ding, Mucong, et al.
Published: (2024)

AegisLLM: Scaling Agentic Systems for Self-Reflective Defense in LLM Security
by: Cai, Zikui, et al.
Published: (2025)

Distributional Process Reward Models: Calibrated Prediction of Future Rewards via Conditional Optimal Transport
by: Ma, Rachel, et al.
Published: (2026)

CALMA: A Process for Deriving Context-aligned Axes for Language Model Alignment
by: Soni, Prajna, et al.
Published: (2025)

Disjoint Processing Mechanisms of Hierarchical and Linear Grammars in Large Language Models
by: Sankaranarayanan, Aruna, et al.
Published: (2025)

Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF
by: Siththaranjan, Anand, et al.
Published: (2023)

Prompt Injection as Role Confusion
by: Ye, Charles, et al.
Published: (2026)

Postcolonialism and Migration in French Comics
by: McKinney, Mark
Published: (2025)

Grandmothering While Black: A Twenty‐First‐Century Story of Love, Coercion, and Survival. By Lashawnda L. Pittman. University of California Press, Oakland, California, 2023. 336 pp. $92.04 (hardcover). ISBN: 978‐0‐52‐038995‐3; $29.95 (paperback). ISBN: 978‐0‐52‐038996‐0; $29.95 (ebook). ISBN: 978‐0‐52‐038997‐7
by: Elliana McKinney
Published: (2025)

Schools Inquiring About Seven-Day School Rerecording of Public and Instructional Television Programs.
by: McKinney, Eleanor
Published: (1975)

Distilling Diversity and Control in Diffusion Models
by: Gandikota, Rohit, et al.
Published: (2025)

Water conservation surveys of New South Wales
by: McKinney, Hugh Giffen
Published: (1896)

Evolution of erect marine bryozoan faunas : repeated success of unilaminate species
by: McKinney, F.K
Published: (1986)

Created from NAFTA : the structure, function, and significance of the treaty's related institutions / Joseph A. McKinney
by: McKinney, Joseph A

Media Utilization in the Classroom.
by: Bowie, Melvin McKinney
Published: (1985)

Conceptual and Practical Matters: The Challenges and Benefits of Conducting Educational Research Using Historical Data. Sage Research Methods Cases Part 2
by: Stephen J. McKinney
Published: (2017)

The Contribution of Iona and Peter Opie to Children's Literature.
by: McKinney, Barbara J.
Published: (1996)

Another Degree? What For?
by: McKinney, Eleanor R.
Published: (1969)

A Constrained Multi-Agent Reinforcement Learning Approach to Autonomous Traffic Signal Control
by: Satheesh, Anirudh, et al.
Published: (2025)

Regret Analysis of Unichain Average Reward Constrained MDPs with General Parameterization
by: Satheesh, Anirudh, et al.
Published: (2026)

Cooperative Inverse Reinforcement Learning
by: Hadfield-Menell, Dylan, et al.
Published: (2016)

Flexible Agent Alignment with Goal Inference from Open-Ended Dialog
by: Ma, Rachel, et al.
Published: (2025)

Layered Unlearning for Adversarial Relearning
by: Qian, Timothy, et al.
Published: (2025)

Goal Inference from Open-Ended Dialog
by: Ma, Rachel, et al.
Published: (2024)

Activation Steering via Generative Causal Mediation
by: Sankaranarayanan, Aruna, et al.
Published: (2026)

MORSE-500: A Programmatically Controllable Video Benchmark to Stress-Test Multimodal Reasoning
by: Cai, Zikui, et al.
Published: (2025)

Uncertainty-Aware Answer Selection for Improved Reasoning in Multi-LLM Systems
by: Agrawal, Aakriti, et al.
Published: (2025)