Staff View: Which private attributes do VLMs agree on and predict well?

Bibliographic Details
Main Authors:	Hrynenko, Olena; Baranouskaya, Darya; Baia, Alina Elena; Cavallaro, Andrea
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2602.07931

_version_	1866915784673984512
author	Hrynenko, Olena; Baranouskaya, Darya; Baia, Alina Elena; Cavallaro, Andrea
author_facet	Hrynenko, Olena; Baranouskaya, Darya; Baia, Alina Elena; Cavallaro, Andrea
contents	Visual Language Models (VLMs) are often used for zero-shot detection of visual attributes in the image. We present a zero-shot evaluation of open-source VLMs for privacy-related attribute recognition. We identify the attributes for which VLMs exhibit strong inter-annotator agreement, and discuss the disagreement cases of human and VLM annotations. Our results show that when evaluated against human annotations, VLMs tend to predict the presence of privacy attributes more often than human annotators. In addition to this, we find that in cases of high inter-annotator agreement between VLMs, they can complement human annotation by identifying attributes overlooked by human annotators. This highlights the potential of VLMs to support privacy annotations in large-scale image datasets.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_07931
institution	arXiv
publishDate	2026
record_format	arxiv
title	Which private attributes do VLMs agree on and predict well?
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2602.07931

Similar Items