:: Library Catalog

Bibliographic Details
Main Author:	Pan, Jonathan
Format:	Preprint
Published:	2026
Subjects:	Computation and Language; Artificial Intelligence; Cryptography and Security
Online Access:	https://arxiv.org/abs/2601.12286

Similar Items

In-Context Representation Hijacking
by: Yona, Itay, et al.
Published: (2025)

Context Misleads LLMs: The Role of Context Filtering in Maintaining Safe Alignment of LLMs
by: Kim, Jinhwa, et al.
Published: (2025)

NeuroFilter: Privacy Guardrails for Conversational LLM Agents
by: Das, Saswat, et al.
Published: (2026)

Protecting Users From Themselves: Safeguarding Contextual Privacy in Interactions with Conversational Agents
by: Ngong, Ivoline, et al.
Published: (2025)

Reasoning-Augmented Conversation for Multi-Turn Jailbreak Attacks on Large Language Models
by: Ying, Zonghao, et al.
Published: (2025)

Guarding Your Conversations: Privacy Gatekeepers for Secure Interactions with Cloud-Based AI Models
by: Uzor, GodsGift, et al.
Published: (2025)

Universal and Context-Independent Triggers for Precise Control of LLM Outputs
by: Liang, Jiashuo, et al.
Published: (2024)

Reverse-Engineering Model Editing on Language Models
by: Sun, Zhiyu, et al.
Published: (2026)

CATMark: A Context-Aware Thresholding Framework for Robust Cross-Task Watermarking in Large Language Models
by: Zhang, Yu, et al.
Published: (2025)

CCJA: Context-Coherent Jailbreak Attack for Aligned Large Language Models
by: Zhou, Guanghao, et al.
Published: (2025)

REEF: Representation Encoding Fingerprints for Large Language Models
by: Zhang, Jie, et al.
Published: (2024)

Swiss-Bench 003: Evaluating LLM Reliability and Adversarial Security for Swiss Regulatory Contexts
by: Uenal, Fatih
Published: (2026)

CREBench: Evaluating Large Language Models in Cryptographic Binary Reverse Engineering
by: Chen, Baicheng, et al.
Published: (2026)

Mitigating the Safety-utility Trade-off in LLM Alignment via Adaptive Safe Context Learning
by: Wang, Yanbo, et al.
Published: (2026)

RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent
by: Xu, Huiyu, et al.
Published: (2024)

MIRAGE: Context-Aware Prompt Injection against Mobile GUI Agents via User-Generated Content
by: Guo, Ruoqi, et al.
Published: (2026)

Latent Fusion Jailbreak: Blending Harmful and Harmless Representations to Elicit Unsafe LLM Outputs
by: Xing, Wenpeng, et al.
Published: (2025)

Shadow Unlearning: A Neuro-Semantic Approach to Fidelity-Preserving Faceless Forgetting in LLMs
by: P, Dinesh Srivasthav, et al.
Published: (2026)

Direct Token Optimization: A Self-contained Approach to Large Language Model Unlearning
by: Lee, Hong kyu, et al.
Published: (2025)

Medical Malice: A Dataset for Context-Aware Safety in Healthcare LLMs
by: D'addario, Andrew Maranhão Ventura
Published: (2025)

Reconstruct Your Previous Conversations! Comprehensively Investigating Privacy Leakage Risks in Conversations with GPT Models
by: Chu, Junjie, et al.
Published: (2024)

AdaPPA: Adaptive Position Pre-Fill Jailbreak Attack Approach Targeting LLMs
by: Lv, Lijia, et al.
Published: (2024)

Universal Vulnerabilities in Large Language Models: Backdoor Attacks for In-context Learning
by: Zhao, Shuai, et al.
Published: (2024)

A Survey of Recent Backdoor Attacks and Defenses in Large Language Models
by: Zhao, Shuai, et al.
Published: (2024)

DataShield: Safety-degrading Data Filtering for LLM Benign Instruction Fine-Tuning
by: Zhang, Junbo, et al.
Published: (2026)

PRISON: Unmasking the Criminal Potential of Large Language Models
by: Wu, Xinyi, et al.
Published: (2025)

Can LLMs Infer Conversational Agent Users' Personality Traits from Chat History?
by: Cögendez, Derya, et al.
Published: (2026)

The Earth is Flat because...: Investigating LLMs' Belief towards Misinformation via Persuasive Conversation
by: Xu, Rongwu, et al.
Published: (2023)

SWAN: Semantic Watermarking with Abstract Meaning Representation
by: Ye, Ziping, et al.
Published: (2026)

LLM-Virus: Evolutionary Jailbreak Attack on Large Language Models
by: Yu, Miao, et al.
Published: (2024)

Beyond Context: Large Language Models' Failure to Grasp Users' Intent
by: Hussain, Ahmed M., et al.
Published: (2025)

Auditing Black-Box LLM APIs with a Rank-Based Uniformity Test
by: Zhu, Xiaoyuan, et al.
Published: (2025)

Knowledge-to-Jailbreak: Investigating Knowledge-driven Jailbreaking Attacks for Large Language Models
by: Tu, Shangqing, et al.
Published: (2024)

Adversarial Representation Engineering: A General Model Editing Framework for Large Language Models
by: Zhang, Yihao, et al.
Published: (2024)

One Turn Too Late: Response-Aware Defense Against Hidden Malicious Intent in Multi-Turn Dialogue
by: Shen, Xinjie, et al.
Published: (2026)

Federated In-Context LLM Agent Learning
by: Wu, Panlong, et al.
Published: (2024)

Securing Multi-turn Conversational Language Models From Distributed Backdoor Triggers
by: Tong, Terry, et al.
Published: (2024)

How Private is Your Attention? Bridging Privacy with In-Context Learning
by: Bonnerjee, Soham, et al.
Published: (2025)

Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations
by: Wei, Zeming, et al.
Published: (2023)

Scalable Defense against In-the-wild Jailbreaking Attacks with Safety Context Retrieval
by: Chen, Taiye, et al.
Published: (2025)