| Main Authors: | Kowal, Matthew, Paulo, Goncalo, Jaburi, Louis, Tseng, Tom, McKinney, Lev E, Heimersheim, Stefan, Tucker, Aaron David, Gleave, Adam, Pelrine, Kellin |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.14869 |
Similar Items
Exposing the Systematic Vulnerability of Open-Weight Models to Prefill Attacks
by: Struppek, Lukas, et al.
Published: (2026)
Can Go AIs be adversarially robust?
by: Tseng, Tom, et al.
Published: (2024)
Jailbreak-Tuning: Models Efficiently Learn Jailbreak Susceptibility
by: Murphy, Brendan, et al.
Published: (2025)
Scaling Trends for Data Poisoning in LLMs
by: Bowen, Dillon, et al.
Published: (2024)
Exploiting Novel GPT-4 APIs
by: Pelrine, Kellin, et al.
Published: (2023)
Large language models can effectively convince people to believe conspiracies
by: Costello, Thomas H., et al.
Published: (2026)
It's the Thought that Counts: Evaluating the Attempts of Frontier LLMs to Persuade on Harmful Topics
by: Kowal, Matthew, et al.
Published: (2025)
The Obfuscation Atlas: Mapping Where Honesty Emerges in RLVR with Deception Probes
by: Taufeeque, Mohammad, et al.
Published: (2026)
GULPS: Two-Qubit Gate Synthesis via Linear Programming for Heterogeneous Instruction Sets
by: McKinney, Evan, et al.
Published: (2025)
A Mathematical Philosophy of Explanations in Mechanistic Interpretability -- The Strange Science Part I.i
by: Ayonrinde, Kola, et al.
Published: (2025)
Evaluating Explanations: An Explanatory Virtues Framework for Mechanistic Interpretability -- The Strange Science Part I.ii
by: Ayonrinde, Kola, et al.
Published: (2025)
Emergent Persuasion: Will LLMs Persuade Without Being Prompted?
by: Chang, Vincent, et al.
Published: (2025)
TamperBench: Systematically Stress-Testing LLM Safety Under Fine-Tuning and Tampering
by: Hossain, Saad, et al.
Published: (2026)
Leakage Safe Graph Features for Interpretable Fraud Detection in Temporal Transaction Networks
by: Khaleghpour, Hamideh, et al.
Published: (2026)
Student Conceptions of Group Work: Visual Research into LIS Student Group Work Using the Draw-and-Write Technique
by: McKinney, Pamela, et al.
Published: (2018)
Online Influence Campaigns: Strategies and Vulnerabilities
by: Musulan, Andreea, et al.
Published: (2024)
Accidental Vulnerability: Factors in Fine-Tuning that Shift Model Safeguards
by: Pandey, Punya Syon, et al.
Published: (2025)
Combining Confidence Elicitation and Sample-based Methods for Uncertainty Quantification in Misinformation Mitigation
by: Rivera, Mauricio, et al.
Published: (2024)
Comparing GPT-4 and Open-Source Language Models in Misinformation Mitigation
by: Vergho, Tyler, et al.
Published: (2024)
Scaling Trends in Language Model Robustness
by: Howe, Nikolaus, et al.
Published: (2024)
Human Frailties: Springboard to Increased Systems Engineering Influence
by: Eileen Patrice Arnold, et al.
Published: (2024)
Postcolonialism and Migration in French Comics
by: McKinney, Mark
Published: (2025)
Grandmothering While Black: A Twenty‐First‐Century Story of Love, Coercion, and Survival. By Lashawnda L. Pittman. University of California Press, Oakland, California, 2023. 336 pp. $92.04 (hardcover). ISBN: 978‐0‐52‐038995‐3; $29.95 (paperback). ISBN: 978‐0‐52‐038996‐0; $29.95 (ebook). ISBN: 978‐0‐52‐038997‐7
by: Elliana McKinney
Published: (2025)
Schools Inquiring About Seven-Day School Rerecording of Public and Instructional Television Programs.
by: McKinney, Eleanor
Published: (1975)
Path Channels and Plan Extension Kernels: a Mechanistic Description of Planning in a Sokoban RNN
by: Taufeeque, Mohammad, et al.
Published: (2025)
STACK: Adversarial Attacks on LLM Safeguard Pipelines
by: McKenzie, Ian R., et al.
Published: (2025)
Negation Neglect: When models fail to learn negations in training
by: Mayne, Harry, et al.
Published: (2026)
Water conservation surveys of New South Wales
by: McKinney, Hugh Giffen
Published: (1896)
Evolution of erect marine bryozoan faunas: repeated success of unilaminate species
by: McKinney, F.K
Published: (1986)
Created from NAFTA: the structure, function, and significance of the treaty's related institutions / Joseph A. McKinney
by: McKinney, Joseph A
Media Utilization in the Classroom.
by: Bowie, Melvin McKinney
Published: (1985)
Conceptual and Practical Matters: The Challenges and Benefits of Conducting Educational Research Using Historical Data. Sage Research Methods Cases Part 2
by: Stephen J. McKinney
Published: (2017)
The Contribution of Iona and Peter Opie to Children's Literature.
by: McKinney, Barbara J.
Published: (1996)
Another Degree? What For?
by: McKinney, Eleanor R.
Published: (1969)
Interpretability in Parameter Space: Minimizing Mechanistic Description Length with Attribution-based Parameter Decomposition
by: Braun, Dan, et al.
Published: (2025)
Optimizing Neuro-Fuzzy and Colonial Competition Algorithms for Skin Cancer Diagnosis in Dermatoscopic Images
by: Khaleghpour, Hamideh, et al.
Published: (2025)
Unified AI for Accurate Audio Anomaly Detection
by: Khaleghpour, Hamideh, et al.
Published: (2025)
Uncertainty Resolution in Misinformation Detection
by: Orlovskiy, Yury, et al.
Published: (2024)
Preference Learning with Lie Detectors can Induce Honesty or Evasion
by: Cundy, Chris, et al.
Published: (2025)
You can remove GPT2's LayerNorm by fine-tuning
by: Heimersheim, Stefan
Published: (2024)