Staff View: ZERO: Industry-ready Vision Foundation Model with Multi-modal Prompts :: Library Catalog


Bibliographic Details
Main Authors:	Choi, Sangbum; Go, Kyeongryeol; Jang, Taewoong
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2507.04270

_version_	1866908634850525184
author	Choi, Sangbum; Go, Kyeongryeol; Jang, Taewoong
author_facet	Choi, Sangbum; Go, Kyeongryeol; Jang, Taewoong
contents	Foundation models have revolutionized AI, yet they struggle with zero-shot deployment in real-world industrial settings due to a lack of high-quality, domain-specific datasets. To bridge this gap, Superb AI introduces ZERO, an industry-ready vision foundation model that leverages multi-modal prompting (textual and visual) for generalization without retraining. Trained on a compact yet representative 0.9 million annotated samples from a proprietary billion-scale industrial dataset, ZERO demonstrates competitive performance on academic benchmarks like LVIS-Val and significantly outperforms existing models across 37 diverse industrial datasets. Furthermore, ZERO achieved 2nd place in the CVPR 2025 Object Instance Detection Challenge and 4th place in the Foundational Few-shot Object Detection Challenge, highlighting its practical deployability and generalizability with minimal adaptation and limited data. To the best of our knowledge, ZERO is the first vision foundation model explicitly built for domain-specific, zero-shot industrial applications.
format	Preprint
id	arxiv_https___arxiv_org_abs_2507_04270
institution	arXiv
publishDate	2025
record_format	arxiv
title	ZERO: Industry-ready Vision Foundation Model with Multi-modal Prompts
topic	Computer Vision and Pattern Recognition; Artificial Intelligence
url	https://arxiv.org/abs/2507.04270

Similar Items