Staff View: The Forgotten Shield: Safety Grafting in Parameter-Space for Medical MLLMs :: Library Catalog


Bibliographic Details
Main Authors:	Zhao, Jiale; Mou, Xing; Wu, Jinlin; Yu, Hongyuan; Sun, Mingrui; Shi, Yang; Yin, Xuanwu; Chen, Zhen; Lei, Zhen; Wang, Yaohua
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2601.04199

_version_	1866918277585829888
author	Zhao, Jiale; Mou, Xing; Wu, Jinlin; Yu, Hongyuan; Sun, Mingrui; Shi, Yang; Yin, Xuanwu; Chen, Zhen; Lei, Zhen; Wang, Yaohua
author_facet	Zhao, Jiale; Mou, Xing; Wu, Jinlin; Yu, Hongyuan; Sun, Mingrui; Shi, Yang; Yin, Xuanwu; Chen, Zhen; Lei, Zhen; Wang, Yaohua
contents	Medical Multimodal Large Language Models (Medical MLLMs) have achieved remarkable progress in specialized medical tasks; however, research into their safety has lagged, posing potential risks for real-world deployment. In this paper, we first establish a multidimensional evaluation framework to systematically benchmark the safety of current state-of-the-art (SOTA) Medical MLLMs. Our empirical analysis reveals pervasive vulnerabilities across both general and medical-specific safety dimensions in existing models, particularly their fragility to cross-modality jailbreak attacks. Furthermore, we find that medical fine-tuning frequently induces catastrophic forgetting of the model's original safety alignment. To address this challenge, we propose a novel "Parameter-Space Intervention" approach for efficient safety re-alignment. This method extracts intrinsic safety knowledge representations from the original base models and injects them into the target model concurrently with the acquisition of its medical capabilities. Additionally, we design a fine-grained parameter search algorithm to achieve an optimal trade-off between safety and medical performance. Experimental results demonstrate that our approach significantly strengthens the safety guardrails of Medical MLLMs without relying on additional domain-specific safety data, while minimizing degradation of core medical performance.
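The abstract does not specify the algorithm. As a generic illustration only, and not the authors' method, the sketch below uses task arithmetic, a common parameter-space technique: a hypothetical safety vector is taken as the difference between a safety-aligned base checkpoint and its unaligned counterpart, then added to medically fine-tuned weights with per-layer coefficients picked by a greedy grid search. The abstract says injection happens during medical training; this post-hoc version is a simplification. The names `task_vector`, `graft`, `search_alphas`, `toy_score`, and all numbers are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in for model weights: parameter-group name -> flat list of floats.
Params = Dict[str, List[float]]


def task_vector(finetuned: Params, base: Params) -> Params:
    """Element-wise difference (finetuned - base) for every parameter group."""
    return {name: [f - b for f, b in zip(finetuned[name], base[name])] for name in finetuned}


def graft(target: Params, vector: Params, alphas: Dict[str, float]) -> Params:
    """Return target + alphas[name] * vector[name]; groups absent from alphas get 0."""
    return {
        name: [t + alphas.get(name, 0.0) * v for t, v in zip(target[name], vector[name])]
        for name in target
    }


def search_alphas(
    target: Params,
    vector: Params,
    grid: Sequence[float],
    score: Callable[[Params], float],
) -> Dict[str, float]:
    """Greedy coordinate search: tune one group's coefficient at a time and keep
    the grid value that maximizes the (hypothetical) safety/utility score."""
    alphas = {name: 0.0 for name in target}
    for name in target:
        best_alpha = alphas[name]
        best_score = score(graft(target, vector, alphas))
        for a in grid:
            s = score(graft(target, vector, {**alphas, name: a}))
            if s > best_score:
                best_alpha, best_score = a, s
        alphas[name] = best_alpha
    return alphas


# Toy usage with made-up numbers.
aligned_base = {"layer0": [1.0, 2.0], "layer1": [0.5]}    # hypothetical safety-aligned base
unaligned_base = {"layer0": [0.0, 2.0], "layer1": [0.0]}  # hypothetical pre-alignment checkpoint
medical = {"layer0": [0.2, 2.1], "layer1": [0.1]}         # hypothetical medical fine-tune

safety_vec = task_vector(aligned_base, unaligned_base)


def toy_score(p: Params) -> float:
    # Made-up objective: reward moving layer0[0] (a stand-in "safety" signal) and
    # penalize drift of layer1 away from the medical weights (a stand-in "utility" cost).
    return p["layer0"][0] - 2.0 * abs(p["layer1"][0] - medical["layer1"][0])


alphas = search_alphas(medical, safety_vec, [0.0, 0.5, 1.0], toy_score)
grafted = graft(medical, safety_vec, alphas)
print(alphas)  # {'layer0': 1.0, 'layer1': 0.0}
```

A real pipeline would operate on framework tensors and score candidates on held-out safety and medical benchmarks; the per-group coefficients here only illustrate what a fine-grained safety/performance search could look like.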
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_04199
institution	arXiv
publishDate	2025
record_format	arxiv
title	The Forgotten Shield: Safety Grafting in Parameter-Space for Medical MLLMs
topic	Machine Learning; Artificial Intelligence; Computation and Language
url	https://arxiv.org/abs/2601.04199

Similar Items