Bibliographic Details
Main Authors:	Sandoval, Aaron; Rushing, Cody
Format:	Preprint
Published:	2025
Subjects:	Cryptography and Security; Computation and Language
Online Access:	https://arxiv.org/abs/2512.02157

Similar Items

Factor(U,T): Controlling Untrusted AI by Monitoring their Plans
by: Lip, Edward Lue Chee, et al.
Published: (2025)

Trust No Tool: Evaluating and Defending LLM Agents under Untrusted Tool Feedback
by: Yan, Lecheng, et al.
Published: (2026)

Basic Legibility Protocols Improve Trusted Monitoring
by: Sreevatsa, Ashwin, et al.
Published: (2026)

Enhancing Security and Strengthening Defenses in Automated Short-Answer Grading Systems
by: Yarmohammadtoosky, Sahar, et al.
Published: (2025)

BashArena: A Control Setting for Highly Privileged AI Agents
by: Kaufman, Adam, et al.
Published: (2025)

Subversion via Focal Points: Investigating Collusion in LLM Monitoring
by: Järviniemi, Olli
Published: (2025)

Monitoring the Internal Monologue: Probe Trajectories Reveal Reasoning Dynamics
by: Chrabąszcz, Maciej, et al.
Published: (2026)

ShadowCoT: Cognitive Hijacking for Stealthy Reasoning Backdoors in LLMs
by: Zhao, Gejian, et al.
Published: (2025)

Towards Proactive Defense Against Cyber Cognitive Attacks
by: Rushing, Bonnie, et al.
Published: (2025)

Cognitive Control Architecture (CCA): A Lifecycle Supervision Framework for Robustly Aligned AI Agents
by: Liang, Zhibo, et al.
Published: (2025)

Private Memorization Editing: Turning Memorization into a Defense to Strengthen Data Privacy in Large Language Models
by: Ruzzetti, Elena Sofia, et al.
Published: (2025)

MCPShield: A Security Cognition Layer for Adaptive Trust Calibration in Model Context Protocol Agents
by: Zhou, Zhenhong, et al.
Published: (2026)

Proof-of-Guardrail in AI Agents and What (Not) to Trust from It
by: Jin, Xisen, et al.
Published: (2026)

Triad: Trusted Timestamps in Untrusted Environments
by: Fernandez, Gabriel P., et al.
Published: (2023)

ExtremeAIGC: Benchmarking LMM Vulnerability to AI-Generated Extremist Content
by: Chandna, Bhavik, et al.
Published: (2025)

Enforcing Attestable Workflows across Untrusted Networks
by: Dang, Hung, et al.
Published: (2026)

GradEscape: A Gradient-Based Evader Against AI-Generated Text Detectors
by: Meng, Wenlong, et al.
Published: (2025)

Towards Safe AI Clinicians: A Comprehensive Study on Large Language Model Jailbreaking in Healthcare
by: Zhang, Hang, et al.
Published: (2025)

Watermarking Conditional Text Generation for AI Detection: Unveiling Challenges and a Semantic-Aware Watermark Remedy
by: Fu, Yu, et al.
Published: (2023)

Towards Understanding the Cognitive Habits of Large Reasoning Models
by: Dong, Jianshuo, et al.
Published: (2025)

"Give a Positive Review Only": An Early Investigation Into In-Paper Prompt Injection Attacks and Defenses for AI Reviewers
by: Zhou, Qin, et al.
Published: (2025)

Private Aggregate Queries to Untrusted Databases
by: Hafiz, Syed Mahbub, et al.
Published: (2024)

LibVulnWatch: A Deep Assessment Agent System and Leaderboard for Uncovering Hidden Vulnerabilities in Open-Source AI Libraries
by: Wu, Zekun, et al.
Published: (2025)

AI Agents May Always Fall for Prompt Injections
by: Abdelnabi, Sahar, et al.
Published: (2026)

Institutional Platform for Secure Self-Service Large Language Model Exploration
by: Bumgardner, V. K. Cody, et al.
Published: (2024)

SMTFL: Secure Model Training to Untrusted Participants in Federated Learning
by: Zhao, Zhihui, et al.
Published: (2025)

Are We in the AI-Generated Text World Already? Quantifying and Monitoring AIGT on Social Media
by: Sun, Zhen, et al.
Published: (2024)

Cabin: Confining Untrusted Programs within Confidential VMs
by: Mei, Benshan, et al.
Published: (2024)

Covert Communication for Untrusted UAV-Assisted Wireless Systems
by: Gao, Chan, et al.
Published: (2024)

Towards Trustworthy Federated Learning with Untrusted Participants
by: Allouah, Youssef, et al.
Published: (2025)

In AI Sweet Harmony: Sociopragmatic Guardrail Bypasses and Evaluation-Awareness in OpenAI gpt-oss-20b
by: Durner, Nils
Published: (2025)

AVISE: Framework for Evaluating the Security of AI Systems
by: Lempinen, Mikko, et al.
Published: (2026)

LATTICE: Evaluating Decision Support Utility of Crypto Agents
by: Chan, Aaron, et al.
Published: (2026)

RedacBench: Can AI Erase Your Secrets?
by: Jeon, Hyunjun, et al.
Published: (2026)

Analysis and prevention of AI-based phishing email attacks
by: Eze, Chibuike Samuel, et al.
Published: (2024)

Leveraging ASIC AI Chips for Homomorphic Encryption
by: Tong, Jianming, et al.
Published: (2025)

Enabling Low-Cost Secure Computing on Untrusted In-Memory Architectures
by: Ghinani, Sahar Ghoflsaz, et al.
Published: (2025)

Pirates: Anonymous Group Calls Over Fully Untrusted Infrastructure
by: Coijanovic, Christoph, et al.
Published: (2024)

VelLMes: A high-interaction AI-based deception framework
by: Sladić, Muris, et al.
Published: (2025)

Modeling the Attack: Detecting AI-Generated Text by Quantifying Adversarial Perturbations
by: Teja, Lekkala Sai, et al.
Published: (2025)