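Staff View: PATFinger: Prompt-Adapted Transferable Fingerprinting against Unauthorized Multimodal Dataset Usage :: Library Catalog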

Bibliographic Details
Main Authors:	Zhang, Wenyi; Jia, Ju; Jia, Xiaojun; Huang, Yihao; Li, Xinfeng; Wu, Cong; Wang, Lina
Format:	Preprint
Published:	2025
Subjects:	Information Retrieval; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2504.11509

_version_	1866915247070117888
author	Zhang, Wenyi; Jia, Ju; Jia, Xiaojun; Huang, Yihao; Li, Xinfeng; Wu, Cong; Wang, Lina
author_facet	Zhang, Wenyi; Jia, Ju; Jia, Xiaojun; Huang, Yihao; Li, Xinfeng; Wu, Cong; Wang, Lina
contents	Multimodal datasets can be leveraged to pre-train large-scale vision-language models by providing cross-modal semantics. Current efforts to determine dataset usage mainly focus on single-modal dataset ownership verification through intrusive and non-intrusive methods, while cross-modal approaches remain under-explored. Intrusive methods can adapt to multimodal datasets but degrade model accuracy, while non-intrusive methods rely on label-driven decision boundaries that fail to guarantee stable behaviors for verification. To address these issues, we propose a novel prompt-adapted transferable fingerprinting scheme from a training-free perspective, called PATFinger, which incorporates the global optimal perturbation (GOP) and adaptive prompts to capture dataset-specific distribution characteristics. Our scheme uses inherent dataset attributes as fingerprints instead of compelling the model to learn triggers. The GOP is derived from the sample distribution to maximize embedding drift between different modalities. Subsequently, PATFinger re-aligns the adaptive prompt with GOP samples to capture cross-modal interactions on a carefully crafted surrogate model. This allows the dataset owner to check dataset usage by observing specific prediction behaviors linked to PATFinger during retrieval queries. Extensive experiments on various cross-modal retrieval architectures demonstrate that our scheme is effective against unauthorized multimodal dataset usage, outperforming state-of-the-art baselines by 30%.
format	Preprint
id	arxiv_https___arxiv_org_abs_2504_11509
institution	arXiv
publishDate	2025
record_format	arxiv
title	PATFinger: Prompt-Adapted Transferable Fingerprinting against Unauthorized Multimodal Dataset Usage
topic	Information Retrieval; Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2504.11509

Similar Items