Staff View: Vision-Language In-Context Learning Driven Few-Shot Visual Inspection Model :: Library Catalog

Bibliographic Details
Main Authors:	Ueno, Shiryu; Hayashi, Yoshikazu; Nakatsuka, Shunsuke; Yamada, Yusei; Aizawa, Hiroaki; Kato, Kunihito
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2502.09057

_version_	1866915149623853056
author	Ueno, Shiryu; Hayashi, Yoshikazu; Nakatsuka, Shunsuke; Yamada, Yusei; Aizawa, Hiroaki; Kato, Kunihito
author_facet	Ueno, Shiryu; Hayashi, Yoshikazu; Nakatsuka, Shunsuke; Yamada, Yusei; Aizawa, Hiroaki; Kato, Kunihito
contents	We propose a general visual inspection model that uses a Vision-Language Model (VLM) with few-shot images of non-defective or defective products, along with explanatory texts that serve as inspection criteria. Although existing VLMs exhibit high performance across various tasks, they are not trained on specific tasks such as visual inspection. Thus, we construct a dataset consisting of diverse images of non-defective and defective products collected from the web, along with output text in a unified format, and fine-tune the VLM. For new products, our method employs In-Context Learning, which allows the model to perform inspections given an example non-defective or defective image and the corresponding explanatory texts with visual prompts. This approach eliminates the need to collect a large number of training samples and re-train the model for each product. The experimental results show that our method achieves high performance, with an MCC of 0.804 and an F1-score of 0.950 on MVTec AD in a one-shot manner. Our code is available at https://github.com/ia-gu/Vision-Language-In-Context-Learning-Driven-Few-Shot-Visual-Inspection-Model.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_09057
institution	arXiv
publishDate	2025
record_format	arxiv
title	Vision-Language In-Context Learning Driven Few-Shot Visual Inspection Model
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2502.09057

Similar Items