| Main Authors: | Wang, Han; Wang, Gang; Zhang, Huan |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2411.16721 |
Similar Items
Activation Steering Meets Preference Optimization: Defense Against Jailbreaks in Vision Language Models
by: Wu, Sihao, et al.
Published: (2025)
White-box Multimodal Jailbreaks Against Large Vision-Language Models
by: Wang, Ruofan, et al.
Published: (2024)
Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks
by: Zhou, Andy, et al.
Published: (2024)
IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves
by: Wang, Ruofan, et al.
Published: (2024)
BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks
by: Zhao, Yunhan, et al.
Published: (2024)
Robustness of Vision Language Models Against Split-Image Harmful Input Attacks
by: Rashid, Md Rafi Ur, et al.
Published: (2026)
Understanding and Defending VLM Jailbreaks via Jailbreak-Related Representation Shift
by: Wei, Zhihua, et al.
Published: (2026)
Securing Vision-Language Models with a Robust Encoder Against Jailbreak and Adversarial Attacks
by: Hossain, Md Zarif, et al.
Published: (2024)
Mitigating Entangled Steering in Large Vision-Language Models for Hallucination Reduction
by: Zhang, Yuanhong, et al.
Published: (2026)
Beyond Visual Safety: Jailbreaking Multimodal Large Language Models for Harmful Image Generation via Semantic-Agnostic Inputs
by: Yu, Mingyu, et al.
Published: (2026)
Defending LVLMs Against Vision Attacks through Partial-Perception Supervision
by: Zhou, Qi, et al.
Published: (2024)
Learning to Detect Unknown Jailbreak Attacks in Large Vision-Language Models
by: Liang, Shuang, et al.
Published: (2025)
Jailbreaks on Vision Language Model via Multimodal Reasoning
by: Noheria, Aarush, et al.
Published: (2026)
Adaptive Residual-Update Steering for Low-Overhead Hallucination Mitigation in Large Vision Language Models
by: Zou, Zhengtao, et al.
Published: (2025)
Mitigating Hallucination in Vision-Language Models through Barrier-Regulated Adaptive Closed-form Steering
by: Jana, Soumyadeep, et al.
Published: (2026)
Benign-to-Toxic Jailbreaking: Inducing Harmful Responses from Harmless Prompts
by: Kim, Hee-Seon, et al.
Published: (2025)
ImgTrojan: Jailbreaking Vision-Language Models with ONE Image
by: Tao, Xijia, et al.
Published: (2024)
TAIJI: Textual Anchoring for Immunizing Jailbreak Images in Vision Language Models
by: Yin, Xiangyu, et al.
Published: (2025)
Adversarial Prompt Tuning for Vision-Language Models
by: Zhang, Jiaming, et al.
Published: (2023)
Learning to Detect Unseen Jailbreak Attacks in Large Vision-Language Models
by: Liang, Shuang, et al.
Published: (2025)
Steering Away from Memorization: Reachability-Constrained Reinforcement Learning for Text-to-Image Diffusion
by: Karnik, Sathwik, et al.
Published: (2026)
When the Prompt Becomes Visual: Vision-Centric Jailbreak Attacks for Large Image Editing Models
by: Hou, Jiacheng, et al.
Published: (2026)
Jailbreaking Vision-Language Models Through the Visual Modality
by: Azulay, Aharon, et al.
Published: (2026)
Steering to Say No: Configurable Refusal via Activation Steering in Vision Language Models
by: Yang, Jiaxi, et al.
Published: (2026)
Defense-to-Attack: Bypassing Weak Defenses Enables Stronger Jailbreaks in Vision-Language Models
by: Zhao, Yunhan, et al.
Published: (2025)
OSPC: Detecting Harmful Memes with Large Language Model as a Catalyst
by: Cao, Jingtao, et al.
Published: (2024)
VISOR++: Universal Visual Inputs based Steering for Large Vision Language Models
by: Balakrishnan, Ravikumar, et al.
Published: (2025)
VISOR: Visual Input-based Steering for Output Redirection in Vision-Language Models
by: Phute, Mansi, et al.
Published: (2025)
Dynamic Multimodal Activation Steering for Hallucination Mitigation in Large Vision-Language Models
by: Yin, Jianghao, et al.
Published: (2026)
Adversarial Prompt Distillation for Vision-Language Models
by: Luo, Lin, et al.
Published: (2024)
Dynamic Token Reduction during Generation for Vision Language Models
by: Liang, Xiaoyu, et al.
Published: (2025)
Toward Universal and Transferable Jailbreak Attacks on Vision-Language Models
by: Cui, Kaiyuan, et al.
Published: (2026)
Text is All You Need for Vision-Language Model Jailbreaking
by: Chen, Yihang, et al.
Published: (2026)
NAP-Tuning: Neural Augmented Prompt Tuning for Adversarially Robust Vision-Language Models
by: Zhang, Jiaming, et al.
Published: (2025)
Dropout Prompt Learning: Towards Robust and Adaptive Vision-Language Models
by: Chen, Biao, et al.
Published: (2025)
Steering Visual Generation in Unified Multimodal Models with Understanding Supervision
by: Liu, Zeyu, et al.
Published: (2026)
A-VL: Adaptive Attention for Large Vision-Language Models
by: Zhang, Junyang, et al.
Published: (2024)
Adaptive Camera Sensor for Vision Models
by: Baek, Eunsu, et al.
Published: (2025)
Hard to Read, Easy to Jailbreak: How Visual Degradation Bypasses MLLM Safety Alignment
by: Song, Zhixue, et al.
Published: (2026)
VEQ: Modality-Adaptive Quantization for MoE Vision-Language Models
by: Qin, Guangshuo, et al.
Published: (2026)