Staff View: Differentially Private Multimodal In-Context Learning :: Library Catalog

Bibliographic Details
Main Authors:	Ngong, Ivoline C.; Reza, Zarreen; Near, Joseph P.
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2603.04894

_version_	1866910041780518912
author	Ngong, Ivoline C.; Reza, Zarreen; Near, Joseph P.
author_facet	Ngong, Ivoline C.; Reza, Zarreen; Near, Joseph P.
contents	Vision-language models are increasingly applied to sensitive domains such as medical imaging and personal photographs, yet existing differentially private methods for in-context learning are limited to few-shot, text-only settings because privacy cost scales with the number of tokens processed. We present Differentially Private Multimodal Task Vectors (DP-MTV), the first framework enabling many-shot multimodal in-context learning with formal $(\varepsilon, \delta)$-differential privacy by aggregating hundreds of demonstrations into compact task vectors in activation space. DP-MTV partitions private data into disjoint chunks, applies per-layer clipping to bound sensitivity, and adds calibrated noise to the aggregate, requiring only a single noise addition that enables unlimited inference queries. We evaluate on eight benchmarks across three VLM architectures, supporting deployment with or without auxiliary data. At $\varepsilon=1.0$, DP-MTV achieves 50% on VizWiz compared to 55% non-private and 35% zero-shot, preserving most of the gain from in-context learning under meaningful privacy constraints.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_04894
institution	arXiv
publishDate	2026
record_format	arxiv
title	Differentially Private Multimodal In-Context Learning
topic	Artificial Intelligence
url	https://arxiv.org/abs/2603.04894
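
Illustrative note on the contents field: the abstract names three steps (disjoint chunks, per-layer clipping, one calibrated noise addition to the aggregate). The sketch below shows one way such a pipeline could look. It is not the authors' implementation. Everything not stated in the abstract is an assumption made here for illustration: the function names, the use of Gaussian noise, the replace-one-example sensitivity bound of 2 * clip_norm * sqrt(n_layers), the classical Gaussian-mechanism calibration (which is stated for 0 < epsilon < 1), and averaging over chunks as the aggregate.

```python
import math
import random


def clip_to_norm(vec, clip_norm):
    """Rescale vec so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in vec]


def gaussian_sigma(sensitivity, epsilon, delta):
    """Classical Gaussian-mechanism noise scale (assumed; valid for 0 < epsilon < 1)."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def private_task_vector(chunk_vectors, clip_norm, epsilon, delta, rng):
    """Aggregate per-chunk activation vectors with a single noise addition.

    chunk_vectors[c][l] is the (hypothetical) activation vector that chunk c
    produces at layer l. Chunks are assumed disjoint, so one private example
    influences exactly one chunk.
    """
    n_chunks = len(chunk_vectors)
    n_layers = len(chunk_vectors[0])
    dim = len(chunk_vectors[0][0])

    # Assumption: replacing one example can move one clipped chunk vector by
    # at most 2 * clip_norm per layer, hence 2 * clip_norm * sqrt(n_layers)
    # over all layers taken together.
    sensitivity = 2.0 * clip_norm * math.sqrt(n_layers)
    sigma = gaussian_sigma(sensitivity, epsilon, delta)

    result = []
    for layer in range(n_layers):
        total = [0.0] * dim
        for chunk in chunk_vectors:
            clipped = clip_to_norm(chunk[layer], clip_norm)
            total = [t + c for t, c in zip(total, clipped)]
        # Noise is added once to the clipped sum; dividing by the (public)
        # chunk count afterwards is post-processing.
        result.append([(t + rng.gauss(0.0, sigma)) / n_chunks for t in total])
    return result


# Toy usage with synthetic data: 50 chunks, 3 layers, 8 dimensions.
rng = random.Random(0)
chunks = [[[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(3)]
          for _ in range(50)]
task_vector = private_task_vector(chunks, clip_norm=1.0, epsilon=0.5,
                                  delta=1e-5, rng=rng)
```

Because the noise is added once to the stored vector, reusing task_vector for any number of later queries is post-processing and incurs no further privacy cost, which matches the "single noise addition" property the abstract describes. The toy usage picks epsilon = 0.5 to stay inside the range of the classical calibration; the abstract's epsilon = 1.0 setting would need a calibration that covers that value.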

Similar Items