| Main Authors: | Hu, Xiangdong, Jiang, Yangyang, Hu, Qin, Jia, Xiaojun |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2601.03416 |
Similar Items
PhysPatch: A Physically Realizable and Transferable Adversarial Patch Attack for Multimodal Large Language Models-based Autonomous Driving Systems
by: Guo, Qi, et al.
Published: (2025)
White-box Multimodal Jailbreaks Against Large Vision-Language Models
by: Wang, Ruofan, et al.
Published: (2024)
Distraction is All You Need for Multimodal Large Language Model Jailbreaking
by: Yang, Zuopeng, et al.
Published: (2025)
Jailbreaking Safeguarded Text-to-Image Models via Large Language Models
by: Jiang, Zhengyuan, et al.
Published: (2025)
SafeEraser: Enhancing Safety in Multimodal Large Language Models through Multimodal Machine Unlearning
by: Chen, Junkai, et al.
Published: (2025)
MME-SCI: A Comprehensive and Challenging Science Benchmark for Multimodal Large Language Models
by: Ruan, Jiacheng, et al.
Published: (2025)
JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models
by: Jin, Haibo, et al.
Published: (2024)
VisCRA: A Visual Chain Reasoning Attack for Jailbreaking Multimodal Large Language Models
by: Sima, Bingrui, et al.
Published: (2025)
VRSA: Jailbreaking Multimodal Large Language Models through Visual Reasoning Sequential Attack
by: Zhao, Shiji, et al.
Published: (2025)
Proactive Reasoning-with-Retrieval Framework for Medical Multimodal Large Language Models
by: Wang, Lehan, et al.
Published: (2025)
Perception-guided Jailbreak against Text-to-Image Models
by: Huang, Yihao, et al.
Published: (2024)
CrashChat: A Multimodal Large Language Model for Multitask Traffic Crash Video Analysis
by: Liang, Kaidi, et al.
Published: (2025)
Jailbreaking Attack against Multimodal Large Language Model
by: Niu, Zhenxing, et al.
Published: (2024)
PPE: Positional Preservation Embedding for Token Compression in Multimodal Large Language Models
by: Huang, Mouxiao, et al.
Published: (2025)
Jailbreaking Multimodal Large Language Models using Multi-Clip Video
by: Kang, Choongwon, et al.
Published: (2026)
Beyond Visual Safety: Jailbreaking Multimodal Large Language Models for Harmful Image Generation via Semantic-Agnostic Inputs
by: Yu, Mingyu, et al.
Published: (2026)
Evaluating Large Language Models on Multimodal Chemistry Olympiad Exams
by: Cui, Yiming, et al.
Published: (2025)
Towards Unified Facial Action Unit Recognition Framework by Large Language Models
by: Hu, Guohong, et al.
Published: (2024)
Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models
by: Li, Yifan, et al.
Published: (2024)
Cross-Modal Obfuscation for Jailbreak Attacks on Large Vision-Language Models
by: Jiang, Lei, et al.
Published: (2025)
Mitigating Visual Context Degradation in Large Multimodal Models: A Training-Free Decoupled Agentic Framework
by: Jia, Hongrui, et al.
Published: (2025)
Unleashing the Intrinsic Visual Representation Capability of Multimodal Large Language Models
by: Li, Hengzhuang, et al.
Published: (2025)
Visual Attention Drifts, but Anchors Hold: Mitigating Hallucination in Multimodal Large Language Models via Cross-Layer Visual Anchors
by: Yang, Chengxu, et al.
Published: (2026)
OmniSafeBench-MM: A Unified Benchmark and Toolbox for Multimodal Jailbreak Attack-Defense Evaluation
by: Jia, Xiaojun, et al.
Published: (2025)
LoC-Path: Learning to Compress for Pathology Multimodal Large Language Models
by: Hu, Qingqiao, et al.
Published: (2025)
Jailbreaks on Vision Language Model via Multimodal Reasoning
by: Noheria, Aarush, et al.
Published: (2026)
ODE: Open-Set Evaluation of Hallucinations in Multimodal Large Language Models
by: Tu, Yahan, et al.
Published: (2024)
E$^2$AT: Multimodal Jailbreak Defense via Dynamic Joint Optimization for Multimodal Large Language Models
by: Lu, Liming, et al.
Published: (2025)
Large Language Model Aided Birt-Hogg-Dube Syndrome Diagnosis with Multimodal Retrieval-Augmented Generation
by: Li, Haoqing, et al.
Published: (2025)
ST$^3$: Accelerating Multimodal Large Language Model by Spatial-Temporal Visual Token Trimming
by: Zhuang, Jiedong, et al.
Published: (2024)
MIBench: Evaluating Multimodal Large Language Models over Multiple Images
by: Liu, Haowei, et al.
Published: (2024)
ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation
by: Wu, Mengyang, et al.
Published: (2024)
TalkFashion: Intelligent Virtual Try-On Assistant Based on Multimodal Large Language Model
by: Hu, Yujie, et al.
Published: (2025)
Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models
by: Zou, Xin, et al.
Published: (2024)
SPR-128K: A New Benchmark for Spatial Plausibility Reasoning with Multimodal Large Language Models
by: Hu, Zhiyuan, et al.
Published: (2025)
Temporal Gains, Spatial Costs: Revisiting Video Fine-Tuning in Multimodal Large Language Models
by: Zhang, Linghao, et al.
Published: (2026)
Jailbreak Large Vision-Language Models Through Multi-Modal Linkage
by: Wang, Yu, et al.
Published: (2024)
HC-LLM: Historical-Constrained Large Language Models for Radiology Report Generation
by: Liu, Tengfei, et al.
Published: (2024)
Seeing Is Believing? A Benchmark for Multimodal Large Language Models on Visual Illusions and Anomalies
by: Hou, Wenjin, et al.
Published: (2026)
IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves
by: Wang, Ruofan, et al.
Published: (2024)