Staff View: :: Library Catalog

Bibliographic Details
Main Authors:	Ren, Tianhe; Liu, Shilong; Zeng, Ailing; Lin, Jing; Li, Kunchang; Cao, He; Chen, Jiayu; Huang, Xinyu; Chen, Yukang; Yan, Feng; Zeng, Zhaoyang; Zhang, Hao; Li, Feng; Yang, Jie; Li, Hongyang; Jiang, Qing; Zhang, Lei
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2401.14159

_version_	1866929224122630144
author	Ren, Tianhe; Liu, Shilong; Zeng, Ailing; Lin, Jing; Li, Kunchang; Cao, He; Chen, Jiayu; Huang, Xinyu; Chen, Yukang; Yan, Feng; Zeng, Zhaoyang; Zhang, Hao; Li, Feng; Yang, Jie; Li, Hongyang; Jiang, Qing; Zhang, Lei
author_facet	Ren, Tianhe; Liu, Shilong; Zeng, Ailing; Lin, Jing; Li, Kunchang; Cao, He; Chen, Jiayu; Huang, Xinyu; Chen, Yukang; Yan, Feng; Zeng, Zhaoyang; Zhang, Hao; Li, Feng; Yang, Jie; Li, Hongyang; Jiang, Qing; Zhang, Lei
contents	We introduce Grounded SAM, which uses Grounding DINO as an open-set object detector to combine with the segment anything model (SAM). This integration enables the detection and segmentation of any regions based on arbitrary text inputs and opens a door to connecting various vision models. As shown in Fig.1, a wide range of vision tasks can be achieved by using the versatile Grounded SAM pipeline. For example, an automatic annotation pipeline based solely on input images can be realized by incorporating models such as BLIP and Recognize Anything. Additionally, incorporating Stable-Diffusion allows for controllable image editing, while the integration of OSX facilitates promptable 3D human motion analysis. Grounded SAM also shows superior performance on open-vocabulary benchmarks, achieving 48.7 mean AP on SegInW (Segmentation in the wild) zero-shot benchmark with the combination of Grounding DINO-Base and SAM-Huge models.
format	Preprint
id	arxiv_https___arxiv_org_abs_2401_14159
institution	arXiv
publishDate	2024
record_format	arxiv
title	Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2401.14159

Similar Items