| Main Authors: | Ren, Tianhe, Jiang, Qing, Liu, Shilong, Zeng, Zhaoyang, Liu, Wenlong, Gao, Han, Huang, Hongjie, Ma, Zhengyu, Jiang, Xiaoke, Chen, Yihao, Xiong, Yuda, Zhang, Hao, Li, Feng, Tang, Peijun, Yu, Kent, Zhang, Lei |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2405.10300 |
Similar Items
DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding
by: Ren, Tianhe, et al.
Published: (2024)
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
by: Liu, Shilong, et al.
Published: (2023)
Referring to Any Person
by: Jiang, Qing, et al.
Published: (2025)
T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
by: Jiang, Qing, et al.
Published: (2024)
ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
by: Jiang, Qing, et al.
Published: (2024)
Detect Anything via Next Point Prediction
by: Jiang, Qing, et al.
Published: (2025)
Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks
by: Ren, Tianhe, et al.
Published: (2024)
SegDINO3D: 3D Instance Segmentation Empowered by Both Image-Level and Object-Level 2D Features
by: Qu, Jinyuan, et al.
Published: (2025)
TAPTRv3: Spatial and Temporal Context Foster Robust Tracking of Any Point in Long Video
by: Qu, Jinyuan, et al.
Published: (2024)
Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning
by: Jiang, Qing, et al.
Published: (2025)
TAPTR: Tracking Any Point with Transformers as Detection
by: Li, Hongyang, et al.
Published: (2024)
TAPTRv2: Attention-based Position Update Improves Tracking Any Point
by: Li, Hongyang, et al.
Published: (2024)
Chain-of-Ground: Improving GUI Grounding via Iterative Reasoning and Reference Feedback
by: Li, Aiden Yiliu, et al.
Published: (2025)
OVS-DINO: Open-Vocabulary Segmentation via Structure-Aligned SAM-DINO with Language Guidance
by: Zeng, Haoxi, et al.
Published: (2026)
ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations
by: Liang, Tianming, et al.
Published: (2025)
Cross-DINO: Cross the Deep MLP and Transformer for Small Object Detection
by: Cao, Guiping, et al.
Published: (2025)
PET-DINO: Unifying Visual Cues into Grounding DINO with Prompt-Enriched Training
by: Fu, Weifu, et al.
Published: (2026)
From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models
by: Jiang, Dongsheng, et al.
Published: (2023)
Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding
by: Wasim, Syed Talal, et al.
Published: (2023)
Evaluating Stenosis Detection with Grounding DINO, YOLO, and DINO-DETR
by: Ansari, Muhammad Musab
Published: (2025)
Dynamic-DINO: Fine-Grained Mixture of Experts Tuning for Real-time Open-Vocabulary Object Detection
by: Lu, Yehao, et al.
Published: (2025)
Guide, Think, Act: Interactive Embodied Reasoning in Vision-Language-Action Models
by: Ling, Yiran, et al.
Published: (2026)
A Reconstruction of the Neutrino Nature and a Unified Explanation of Related Puzzles Based on the Great Tao Model
by: Zeng, Jiqing, et al.
Published: (2026)
The Existence Field Theory of the Great Tao Model: Establishment of the Vacuum-Medium Unified Field Equations
by: Zeng, Jiqing, et al.
Published: (2026)
DIVE: Taming DINO for Subject-Driven Video Editing
by: Huang, Yi, et al.
Published: (2024)
Weak Dual Drazin Inverse and its Characterizations and Properties
by: Wang, Hongxing, et al.
Published: (2024)
Deploy DINO with Many-to-Many Association
by: Jiang, Haodong, et al.
Published: (2026)
AD-DINO: Attention-Dynamic DINO for Distance-Aware Embodied Reference Understanding
by: Guo, Hao, et al.
Published: (2024)
Eating Smart: Advancing Health Informatics with the Grounding DINO based Dietary Assistant App
by: Nossair, Abdelilah, et al.
Published: (2024)
Discrimination-free Insurance Pricing with Privatized Sensitive Attributes
by: Zhang, Tianhe, et al.
Published: (2025)
Economic Inequality Brings About More Inaction Over Climate Change: The Role of Perception, Discussion, and Responsibility
by: Wang, Changcheng, et al.
Published: (2025)
DINO-Tok: Adapting DINO for Visual Tokenizers
by: Jia, Mingkai, et al.
Published: (2025)
Unlocking the Potential of Grounding DINO in Videos: Parameter-Efficient Adaptation for Limited-Data Spatial-Temporal Localization
by: Wang, Zanyi, et al.
Published: (2026)
OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion
by: Wang, Hao, et al.
Published: (2024)
DINO-MVR: Multi-View Readout of Frozen DINOv3 for Annotation-Efficient Medical Segmentation
by: Jiang, Wei, et al.
Published: (2026)
DINO-SD: Champion Solution for ICRA 2024 RoboDepth Challenge
by: Mao, Yifan, et al.
Published: (2024)
Few-Shot Adaptation of Grounding DINO for Agricultural Domain
by: Singh, Rajhans, et al.
Published: (2025)
DINO Eats CLIP: Adapting Beyond Knowns for Open-set 3D Object Retrieval
by: He, Xinwei, et al.
Published: (2026)
Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector
by: Fu, Yuqian, et al.
Published: (2024)
OpenFMNav: Towards Open-Set Zero-Shot Object Navigation via Vision-Language Foundation Models
by: Kuang, Yuxuan, et al.
Published: (2024)