MARC21: :: Library Catalog

Salvato in:

Dettagli Bibliografici
Autori principali:	Balakrishnan, Ravikumar, Phute, Mansi
Natura:	Preprint
Pubblicazione:	2025
Soggetti:	Computer Vision and Pattern Recognition Artificial Intelligence
Accesso online:	https://arxiv.org/abs/2509.25533
Tags:	Aggiungi Tag Nessun Tag, puoi essere il primo ad aggiungerne!!

_version_	1866909815446437888
author	Balakrishnan, Ravikumar Phute, Mansi
author_facet	Balakrishnan, Ravikumar Phute, Mansi
contents	As Vision Language Models (VLMs) are deployed across safety-critical applications, understanding and controlling their behavioral patterns has become increasingly important. Existing behavioral control methods face significant limitations: system prompting approaches could easily be overridden by user instructions, while applying activation-based steering vectors requires invasive runtime access to model internals, precluding deployment with API-based services and closed-source models. Finding steering methods that transfer across multiple VLMs is still an open area of research. To this end, we introduce universal visual input based steering for output redirection (VISOR++), to achieve behavioral control through optimized visual inputs alone. We demonstrate that a single VISOR++ image can be generated for an ensemble of VLMs to emulate each of their steering vectors. By crafting universal visual inputs that induce target activation patterns, VISOR++ eliminates the need for runtime model access while remaining deployment-agnostic. This means that when an underlying model supports multimodal capability, model behaviors can be steered by inserting an image input replacing runtime steering vector based interventions. We first demonstrate the effectiveness of the VISOR++ images on open-access models such as LLaVA-1.5-7B and IDEFICS2-8B along three alignment directions: refusal, sycophancy and survival instinct. Both the model-specific steering images and the jointly optimized images achieve performance parity closely following that of steering vectors for both positive and negative steering tasks. We also show the promise of VISOR++ images in achieving directional behavioral shifts for unseen models including both open-access and closed-access ones. Furthermore, VISOR++ images are able to preserve 99.9% performance on 14,000 unrelated MMLU evaluation tasks.
format	Preprint
id	arxiv_https___arxiv_org_abs_2509_25533
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	VISOR++: Universal Visual Inputs based Steering for Large Vision Language Models Balakrishnan, Ravikumar Phute, Mansi Computer Vision and Pattern Recognition Artificial Intelligence As Vision Language Models (VLMs) are deployed across safety-critical applications, understanding and controlling their behavioral patterns has become increasingly important. Existing behavioral control methods face significant limitations: system prompting approaches could easily be overridden by user instructions, while applying activation-based steering vectors requires invasive runtime access to model internals, precluding deployment with API-based services and closed-source models. Finding steering methods that transfer across multiple VLMs is still an open area of research. To this end, we introduce universal visual input based steering for output redirection (VISOR++), to achieve behavioral control through optimized visual inputs alone. We demonstrate that a single VISOR++ image can be generated for an ensemble of VLMs to emulate each of their steering vectors. By crafting universal visual inputs that induce target activation patterns, VISOR++ eliminates the need for runtime model access while remaining deployment-agnostic. This means that when an underlying model supports multimodal capability, model behaviors can be steered by inserting an image input replacing runtime steering vector based interventions. We first demonstrate the effectiveness of the VISOR++ images on open-access models such as LLaVA-1.5-7B and IDEFICS2-8B along three alignment directions: refusal, sycophancy and survival instinct. Both the model-specific steering images and the jointly optimized images achieve performance parity closely following that of steering vectors for both positive and negative steering tasks. We also show the promise of VISOR++ images in achieving directional behavioral shifts for unseen models including both open-access and closed-access ones. Furthermore, VISOR++ images are able to preserve 99.9% performance on 14,000 unrelated MMLU evaluation tasks.
title	VISOR++: Universal Visual Inputs based Steering for Large Vision Language Models
topic	Computer Vision and Pattern Recognition Artificial Intelligence
url	https://arxiv.org/abs/2509.25533

Documenti analoghi