:: Library Catalog

Bibliographic Details
Main Authors:	Dang, Jacob; Xie, Brian Y.; Younis, Omar G.
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2604.15559

Similar Items

Subliminal Learning Is Steering Vector Distillation
by: Blank, Camila, et al.
Published: (2026)

Towards Understanding Subliminal Learning: When and How Hidden Biases Transfer
by: Schrodi, Simon, et al.
Published: (2025)

Emergent and Subliminal Misalignment Through the Lens of Data-Mediated Transfer
by: Askin, Baris, et al.
Published: (2026)

ClawSafety: "Safe" LLMs, Unsafe Agents
by: Wei, Bowen, et al.
Published: (2026)

Enforcing Benign Trajectories: A Behavioral Firewall for Structured-Workflow AI Agents
by: Dang, Hung
Published: (2026)

Thought Virus: Viral Misalignment via Subliminal Prompting in Multi-Agent Systems
by: Weckbecker, Moritz, et al.
Published: (2026)

Subliminal Learning is a LoRA Artifact
by: Nief, Todd, et al.
Published: (2026)

Subliminal Learning: Language models transmit behavioral traits via hidden signals in data
by: Cloud, Alex, et al.
Published: (2025)

History Anchors: How Prior Behavior Steers LLM Decisions Toward Unsafe Actions
by: Salgado, Alberto G. Rodríguez
Published: (2026)

When Benign Inputs Lead to Severe Harms: Eliciting Unsafe Unintended Behaviors of Computer-Use Agents
by: Jones, Jaylen, et al.
Published: (2026)

Sustained Gradient Alignment Mediates Subliminal Learning in a Multi-Step Setting: Evidence from MNIST Auxiliary Logit Distillation Experiment
by: Kitkana, Chayanon, et al.
Published: (2026)

Improving Pre-Trained Vision-Language-Action Policies with Model-Based Search
by: Neary, Cyrus, et al.
Published: (2025)

Behavioral Determinants of Deployed AI Agents in Social Networks: A Multi-Factor Study of Personality, Model, and Guardrail Specification
by: Wilson, Sarah, et al.
Published: (2026)

Subliminal Effects in Your Data: A General Mechanism via Log-Linearity
by: Aden-Ali, Ishaq, et al.
Published: (2026)

Behavioral Transfer in AI Agents: Evidence and Privacy Implications
by: Luo, Shilei, et al.
Published: (2026)

Breach By A Thousand Leaks: Unsafe Information Leakage in 'Safe' AI Responses
by: Glukhov, David, et al.
Published: (2024)

Advancing Model Refinement: Muon-Optimized Distillation and Quantization for LLM Deployment
by: Sander, Jacob, et al.
Published: (2026)

Learning Through Noise: Why Subliminal Learning Works and When It Fails
by: Brockers, Vincent C., et al.
Published: (2026)

Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills
by: Ni, Jingwei, et al.
Published: (2026)

ELISA: An Interpretable Hybrid Generative AI Agent for Expression-Grounded Discovery in Single-Cell Genomics
by: Coser, Omar
Published: (2026)

PANDO: Efficient Multimodal AI Agents via Online Skill Distillation
by: Li, Yubo, et al.
Published: (2026)

Unsafe LLM-Based Search: Quantitative Analysis and Mitigation of Safety Risks in AI Web Search
by: Luo, Zeren, et al.
Published: (2025)

Promoting Online Safety by Simulating Unsafe Conversations with LLMs
by: Hoffman, Owen, et al.
Published: (2025)

AgentDistill: Training-Free Agent Distillation with Generalizable MCP Boxes
by: Qiu, Jiahao, et al.
Published: (2025)

Offline Behavior Distillation
by: Lei, Shiye, et al.
Published: (2024)

Safe-Support Q-Learning: Learning without Unsafe Exploration
by: Lim, Yeeun, et al.
Published: (2026)

Bidirectional Distillation: A Mixed-Play Framework for Multi-Agent Generalizable Behaviors
by: Feng, Lang, et al.
Published: (2025)

Transferable XAI: Relating Understanding Across Domains with Explanation Transfer
by: Wang, Fei, et al.
Published: (2026)

Data-Free Generative Replay for Class-Incremental Learning on Imbalanced Data
by: Younis, Sohaib, et al.
Published: (2024)

Open, Reliable, and Collective: A Community-Driven Framework for Tool-Using AI Agents
by: Dang, Hy, et al.
Published: (2026)

The Agent Behavior: Model, Governance and Challenges in the AI Digital Age
by: Zhang, Qiang, et al.
Published: (2025)

Towards Understanding Unsafe Video Generation
by: Pang, Yan, et al.
Published: (2024)

Forage V2: Knowledge Evolution and Transfer in Autonomous Agent Organizations
by: Xie, Huaqing
Published: (2026)

AIBuildAI: An AI Agent for Automatically Building AI Models
by: Zhang, Ruiyi, et al.
Published: (2026)

Differentiable and Stable Long-Range Tracking of Multiple Posterior Modes
by: Younis, Ali, et al.
Published: (2024)

Predicting AI Agent Behavior through Approximation of the Perron-Frobenius Operator
by: Zhang, Shiqi, et al.
Published: (2024)

Safe Semantics, Unsafe Interpretations: Tackling Implicit Reasoning Safety in Large Vision-Language Models
by: Cai, Wei, et al.
Published: (2025)

Safe Vision-Language Models via Unsafe Weights Manipulation
by: D'Incà, Moreno, et al.
Published: (2025)

AI-Powered Database Management: Predictive Analytics for Performance Tuning
by: Shahwan, Younis Ali, et al.
Published: (2025)

Self-ReSET: Learning to Self-Recover from Unsafe Reasoning Trajectories
by: Zhang, Dongcheng, et al.
Published: (2026)