:: Library Catalog

Bibliographic Details
Main Authors:	Kowal, Matthew; Paulo, Goncalo; Jaburi, Louis; Tseng, Tom; McKinney, Lev E; Heimersheim, Stefan; Tucker, Aaron David; Gleave, Adam; Pelrine, Kellin
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2602.14869

Similar Items

Exposing the Systematic Vulnerability of Open-Weight Models to Prefill Attacks
by: Struppek, Lukas, et al.
Published: (2026)

Can Go AIs be adversarially robust?
by: Tseng, Tom, et al.
Published: (2024)

Jailbreak-Tuning: Models Efficiently Learn Jailbreak Susceptibility
by: Murphy, Brendan, et al.
Published: (2025)

Scaling Trends for Data Poisoning in LLMs
by: Bowen, Dillon, et al.
Published: (2024)

Exploiting Novel GPT-4 APIs
by: Pelrine, Kellin, et al.
Published: (2023)

Large language models can effectively convince people to believe conspiracies
by: Costello, Thomas H., et al.
Published: (2026)

It's the Thought that Counts: Evaluating the Attempts of Frontier LLMs to Persuade on Harmful Topics
by: Kowal, Matthew, et al.
Published: (2025)

The Obfuscation Atlas: Mapping Where Honesty Emerges in RLVR with Deception Probes
by: Taufeeque, Mohammad, et al.
Published: (2026)

GULPS: Two-Qubit Gate Synthesis via Linear Programming for Heterogeneous Instruction Sets
by: McKinney, Evan, et al.
Published: (2025)

A Mathematical Philosophy of Explanations in Mechanistic Interpretability -- The Strange Science Part I.i
by: Ayonrinde, Kola, et al.
Published: (2025)

Evaluating Explanations: An Explanatory Virtues Framework for Mechanistic Interpretability -- The Strange Science Part I.ii
by: Ayonrinde, Kola, et al.
Published: (2025)

Emergent Persuasion: Will LLMs Persuade Without Being Prompted?
by: Chang, Vincent, et al.
Published: (2025)

TamperBench: Systematically Stress-Testing LLM Safety Under Fine-Tuning and Tampering
by: Hossain, Saad, et al.
Published: (2026)

Leakage Safe Graph Features for Interpretable Fraud Detection in Temporal Transaction Networks
by: Khaleghpour, Hamideh, et al.
Published: (2026)

Student Conceptions of Group Work: Visual Research into LIS Student Group Work Using the Draw-and-Write Technique
by: McKinney, Pamela, et al.
Published: (2018)

Online Influence Campaigns: Strategies and Vulnerabilities
by: Musulan, Andreea, et al.
Published: (2024)

Accidental Vulnerability: Factors in Fine-Tuning that Shift Model Safeguards
by: Pandey, Punya Syon, et al.
Published: (2025)

Combining Confidence Elicitation and Sample-based Methods for Uncertainty Quantification in Misinformation Mitigation
by: Rivera, Mauricio, et al.
Published: (2024)

Comparing GPT-4 and Open-Source Language Models in Misinformation Mitigation
by: Vergho, Tyler, et al.
Published: (2024)

Scaling Trends in Language Model Robustness
by: Howe, Nikolaus, et al.
Published: (2024)

Human Frailties: Springboard to Increased Systems Engineering Influence
by: Eileen Patrice Arnold, et al.
Published: (2024)

Postcolonialism and Migration in French Comics
by: McKinney, Mark
Published: (2025)

Grandmothering While Black: A Twenty‐First‐Century Story of Love, Coercion, and Survival. By Lashawnda L. Pittman. University of California Press, Oakland, California, 2023. 336 pp. $92.04 (hardcover). ISBN: 978‐0‐52‐038995‐3; $29.95 (paperback). ISBN: 978‐0‐52‐038996‐0; $29.95 (ebook). ISBN: 978‐0‐52‐038997‐7
by: Elliana McKinney
Published: (2025)

Schools Inquiring About Seven-Day School Rerecording of Public and Instructional Television Programs.
by: McKinney, Eleanor
Published: (1975)

Path Channels and Plan Extension Kernels: a Mechanistic Description of Planning in a Sokoban RNN
by: Taufeeque, Mohammad, et al.
Published: (2025)

STACK: Adversarial Attacks on LLM Safeguard Pipelines
by: McKenzie, Ian R., et al.
Published: (2025)

Negation Neglect: When models fail to learn negations in training
by: Mayne, Harry, et al.
Published: (2026)

Water conservation surveys of New South Wales
by: McKinney, Hugh Giffen
Published: (1896)

Evolution of erect marine bryozoan faunas: repeated success of unilaminate species
by: McKinney, F.K
Published: (1986)

Created from NAFTA: the structure, function, and significance of the treaty's related institutions / Joseph A. McKinney
by: McKinney, Joseph A

Media Utilization in the Classroom.
by: Bowie, Melvin McKinney
Published: (1985)

Conceptual and Practical Matters: The Challenges and Benefits of Conducting Educational Research Using Historical Data. Sage Research Methods Cases Part 2
by: Stephen J. McKinney
Published: (2017)

The Contribution of Iona and Peter Opie to Children's Literature.
by: McKinney, Barbara J.
Published: (1996)

Another Degree? What For?
by: McKinney, Eleanor R.
Published: (1969)

Interpretability in Parameter Space: Minimizing Mechanistic Description Length with Attribution-based Parameter Decomposition
by: Braun, Dan, et al.
Published: (2025)

Optimizing Neuro-Fuzzy and Colonial Competition Algorithms for Skin Cancer Diagnosis in Dermatoscopic Images
by: Khaleghpour, Hamideh, et al.
Published: (2025)

Unified AI for Accurate Audio Anomaly Detection
by: Khaleghpour, Hamideh, et al.
Published: (2025)

Uncertainty Resolution in Misinformation Detection
by: Orlovskiy, Yury, et al.
Published: (2024)

Preference Learning with Lie Detectors can Induce Honesty or Evasion
by: Cundy, Chris, et al.
Published: (2025)

You can remove GPT2's LayerNorm by fine-tuning
by: Heimersheim, Stefan
Published: (2024)