:: Library Catalog

Bibliographic Details
Main Authors:	Li, Yige; Feng, Yunhao; Sun, Jun
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2605.27117

Similar Items

Q-MLLM: Vector Quantization for Robust Multimodal Large Language Model Security
by: Zhao, Wei, et al.
Published: (2025)

Do Influence Functions Work on Large Language Models?
by: Li, Zhe, et al.
Published: (2024)

Defending Large Language Models Against Jailbreak Attacks via Layer-specific Editing
by: Zhao, Wei, et al.
Published: (2024)

Where Did It Go Wrong? Attributing Undesirable LLM Behaviors via Representation Gradient Tracing
by: Li, Zhe, et al.
Published: (2025)

Unleashing the Unseen: Harnessing Benign Datasets for Jailbreaking Large Language Models
by: Zhao, Wei, et al.
Published: (2024)

BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks and Defenses on Large Language Models
by: Li, Yige, et al.
Published: (2024)

AgentHazard: A Benchmark for Evaluating Harmful Behavior in Computer-Use Agents
by: Feng, Yunhao, et al.
Published: (2026)

Position: Intelligent Science Laboratory Requires the Integration of Cognitive and Embodied AI
by: Zhang, Sha, et al.
Published: (2025)

Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements
by: Zhang, Jingyu, et al.
Published: (2024)

Position: Require Frontier AI Labs To Release Small "Analog" Models
by: Upadhyay, Shriyash, et al.
Published: (2025)

Position: Human-Centric AI Requires a Minimum Viable Level of Human Understanding
by: Lin, Fangzhou, et al.
Published: (2026)

Position: Embodied AI Requires a Privacy-Utility Trade-off
by: Fan, Xiaoliang, et al.
Published: (2026)

Zero-Shot Defense Against Toxic Images via Inherent Multimodal Alignment in LVLMs
by: Zhao, Wei, et al.
Published: (2025)

Position: AI Safety Must Embrace an Antifragile Perspective
by: Jin, Ming, et al.
Published: (2025)

BackdoorAgent: A Unified Framework for Backdoor Attacks on LLM-based Agents
by: Feng, Yunhao, et al.
Published: (2026)

CROW: Eliminating Backdoors from Large Language Models via Internal Consistency Regularization
by: Min, Nay Myat, et al.
Published: (2024)

Adaptive Content Restriction for Large Language Models via Suffix Optimization
by: Li, Yige, et al.
Published: (2025)

AnyAttack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models
by: Zhang, Jiaming, et al.
Published: (2024)

Dataset Safety in Autonomous Driving: Requirements, Risks, and Assurance
by: Abbaspour, Alireza, et al.
Published: (2025)

Building Effective Safety Guardrails in AI Education Tools
by: Clark, Hannah-Beth, et al.
Published: (2025)

Negative as Positive: Enhancing Out-of-distribution Generalization for Graph Contrastive Learning
by: Wang, Zixu, et al.
Published: (2024)

Why AI Safety Requires Uncertainty, Incomplete Preferences, and Non-Archimedean Utilities
by: Benavoli, Alessio, et al.
Published: (2025)

Foundational Analysis of Safety Engineering Requirements (SAFER)
by: Chemo, Noga, et al.
Published: (2026)

Engineering Safety Requirements for Autonomous Driving with Large Language Models
by: Nouri, Ali, et al.
Published: (2024)

Position: Agentic AI System Is a Foreseeable Pathway to AGI
by: Liao, Junwei, et al.
Published: (2026)

AutoBackdoor: Automating Backdoor Attacks via LLM Agents
by: Li, Yige, et al.
Published: (2025)

Position: Retire the "Positive Backdoor" Label -- Secret Alignment Requires Strict and Systematic Evaluation
by: Li, Jianwei, et al.
Published: (2026)

Position: Safety and Fairness in Agentic AI Depend on Interaction Topology, Not on Model Scale or Alignment
by: Bajaj, Tanav Singh, et al.
Published: (2026)

Defining Explainable AI for Requirements Analysis
by: Sheh, Raymond, et al.
Published: (2026)

Engaging with AI: How Interface Design Shapes Human-AI Collaboration in High-Stakes Decision-Making
by: Chen, Zichen, et al.
Published: (2025)

NeurIPS Should Require Reproducibility Standards for Frontier AI Safety Claims
by: Vishwarupe, Varad, et al.
Published: (2026)

Responsible Agentic AI Requires Explicit Provenance
by: Hu, Jinwei, et al.
Published: (2026)

LinSATNet: The Positive Linear Satisfiability Neural Networks
by: Wang, Runzhong, et al.
Published: (2024)

Position: State-of-the-Art Claims Require State-of-the-Art Evidence
by: Oh, YongKyung
Published: (2026)

AIR: Improving Agent Safety through Incident Response
by: Xiao, Zibo, et al.
Published: (2026)

Games for AI Control: Models of Safety Evaluations of AI Deployment Protocols
by: Griffin, Charlie, et al.
Published: (2024)

AGrail: A Lifelong Agent Guardrail with Effective and Adaptive Safety Detection
by: Luo, Weidi, et al.
Published: (2025)

A New Perspective On AI Safety Through Control Theory Methodologies
by: Ullrich, Lars, et al.
Published: (2025)

Internal Safety Collapse in Frontier Large Language Models
by: Wu, Yutao, et al.
Published: (2026)

AI2-Active Safety: AI-enabled Interaction-aware Active Safety Analysis with Vehicle Dynamics
by: Wu, Keshu, et al.
Published: (2025)