Bibliographic Details
Main Authors: Choi, Sangbum, Go, Kyeongryeol, Jang, Taewoong
Format: Preprint
Published: 2025
Subjects:
Online Access: https://arxiv.org/abs/2507.04270
_version_ 1866908634850525184
author Choi, Sangbum
Go, Kyeongryeol
Jang, Taewoong
contents Foundation models have revolutionized AI, yet they struggle with zero-shot deployment in real-world industrial settings due to a lack of high-quality, domain-specific datasets. To bridge this gap, Superb AI introduces ZERO, an industry-ready vision foundation model that leverages multi-modal prompting (textual and visual) for generalization without retraining. Trained on a compact yet representative 0.9 million annotated samples from a proprietary billion-scale industrial dataset, ZERO demonstrates competitive performance on academic benchmarks like LVIS-Val and significantly outperforms existing models across 37 diverse industrial datasets. Furthermore, ZERO achieved 2nd place in the CVPR 2025 Object Instance Detection Challenge and 4th place in the Foundational Few-shot Object Detection Challenge, highlighting its practical deployability and generalizability with minimal adaptation and limited data. To the best of our knowledge, ZERO is the first vision foundation model explicitly built for domain-specific, zero-shot industrial applications.
format Preprint
id arxiv_https___arxiv_org_abs_2507_04270
institution arXiv
publishDate 2025
record_format arxiv
title ZERO: Industry-ready Vision Foundation Model with Multi-modal Prompts
topic Computer Vision and Pattern Recognition
Artificial Intelligence
url https://arxiv.org/abs/2507.04270