:: Library Catalog

Bibliographic Details
Main Authors:	Ren, Tianhe; Jiang, Qing; Liu, Shilong; Zeng, Zhaoyang; Liu, Wenlong; Gao, Han; Huang, Hongjie; Ma, Zhengyu; Jiang, Xiaoke; Chen, Yihao; Xiong, Yuda; Zhang, Hao; Li, Feng; Tang, Peijun; Yu, Kent; Zhang, Lei
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2405.10300

Similar Items

DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding
by: Ren, Tianhe, et al.
Published: (2024)

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
by: Liu, Shilong, et al.
Published: (2023)

Referring to Any Person
by: Jiang, Qing, et al.
Published: (2025)

T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
by: Jiang, Qing, et al.
Published: (2024)

ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
by: Jiang, Qing, et al.
Published: (2024)

Detect Anything via Next Point Prediction
by: Jiang, Qing, et al.
Published: (2025)

Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks
by: Ren, Tianhe, et al.
Published: (2024)

SegDINO3D: 3D Instance Segmentation Empowered by Both Image-Level and Object-Level 2D Features
by: Qu, Jinyuan, et al.
Published: (2025)

TAPTRv3: Spatial and Temporal Context Foster Robust Tracking of Any Point in Long Video
by: Qu, Jinyuan, et al.
Published: (2024)

Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning
by: Jiang, Qing, et al.
Published: (2025)

TAPTR: Tracking Any Point with Transformers as Detection
by: Li, Hongyang, et al.
Published: (2024)

TAPTRv2: Attention-based Position Update Improves Tracking Any Point
by: Li, Hongyang, et al.
Published: (2024)

Chain-of-Ground: Improving GUI Grounding via Iterative Reasoning and Reference Feedback
by: Li, Aiden Yiliu, et al.
Published: (2025)

OVS-DINO: Open-Vocabulary Segmentation via Structure-Aligned SAM-DINO with Language Guidance
by: Zeng, Haoxi, et al.
Published: (2026)

ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations
by: Liang, Tianming, et al.
Published: (2025)

Cross-DINO: Cross the Deep MLP and Transformer for Small Object Detection
by: Cao, Guiping, et al.
Published: (2025)

PET-DINO: Unifying Visual Cues into Grounding DINO with Prompt-Enriched Training
by: Fu, Weifu, et al.
Published: (2026)

From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models
by: Jiang, Dongsheng, et al.
Published: (2023)

Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding
by: Wasim, Syed Talal, et al.
Published: (2023)

Evaluating Stenosis Detection with Grounding DINO, YOLO, and DINO-DETR
by: Ansari, Muhammad Musab
Published: (2025)

Dynamic-DINO: Fine-Grained Mixture of Experts Tuning for Real-time Open-Vocabulary Object Detection
by: Lu, Yehao, et al.
Published: (2025)

Guide, Think, Act: Interactive Embodied Reasoning in Vision-Language-Action Models
by: Ling, Yiran, et al.
Published: (2026)

A Reconstruction of the Neutrino Nature and a Unified Explanation of Related Puzzles Based on the Great Tao Model
by: Zeng, Jiqing, et al.
Published: (2026)

The Existence Field Theory of the Great Tao Model: Establishment of the Vacuum-Medium Unified Field Equations
by: Zeng, Jiqing, et al.
Published: (2026)

DIVE: Taming DINO for Subject-Driven Video Editing
by: Huang, Yi, et al.
Published: (2024)

Weak Dual Drazin Inverse and its Characterizations and Properties
by: Wang, Hongxing, et al.
Published: (2024)

Deploy DINO with Many-to-Many Association
by: Jiang, Haodong, et al.
Published: (2026)

AD-DINO: Attention-Dynamic DINO for Distance-Aware Embodied Reference Understanding
by: Guo, Hao, et al.
Published: (2024)

Eating Smart: Advancing Health Informatics with the Grounding DINO based Dietary Assistant App
by: Nossair, Abdelilah, et al.
Published: (2024)

Discrimination-free Insurance Pricing with Privatized Sensitive Attributes
by: Zhang, Tianhe, et al.
Published: (2025)

Economic Inequality Brings About More Inaction Over Climate Change: The Role of Perception, Discussion, and Responsibility
by: Wang, Changcheng, et al.
Published: (2025)

DINO-Tok: Adapting DINO for Visual Tokenizers
by: Jia, Mingkai, et al.
Published: (2025)

Unlocking the Potential of Grounding DINO in Videos: Parameter-Efficient Adaptation for Limited-Data Spatial-Temporal Localization
by: Wang, Zanyi, et al.
Published: (2026)

OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion
by: Wang, Hao, et al.
Published: (2024)

DINO-MVR: Multi-View Readout of Frozen DINOv3 for Annotation-Efficient Medical Segmentation
by: Jiang, Wei, et al.
Published: (2026)

DINO-SD: Champion Solution for ICRA 2024 RoboDepth Challenge
by: Mao, Yifan, et al.
Published: (2024)

Few-Shot Adaptation of Grounding DINO for Agricultural Domain
by: Singh, Rajhans, et al.
Published: (2025)

DINO Eats CLIP: Adapting Beyond Knowns for Open-set 3D Object Retrieval
by: He, Xinwei, et al.
Published: (2026)

Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector
by: Fu, Yuqian, et al.
Published: (2024)

OpenFMNav: Towards Open-Set Zero-Shot Object Navigation via Vision-Language Foundation Models
by: Kuang, Yuxuan, et al.
Published: (2024)