| Main Authors: | Ren, Tianhe, Liu, Shilong, Zeng, Ailing, Lin, Jing, Li, Kunchang, Cao, He, Chen, Jiayu, Huang, Xinyu, Chen, Yukang, Yan, Feng, Zeng, Zhaoyang, Zhang, Hao, Li, Feng, Yang, Jie, Li, Hongyang, Jiang, Qing, Zhang, Lei |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2401.14159 |
Similar Items
T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
by: Jiang, Qing, et al.
Published: (2024)
TAPTR: Tracking Any Point with Transformers as Detection
by: Li, Hongyang, et al.
Published: (2024)
TAPTRv2: Attention-based Position Update Improves Tracking Any Point
by: Li, Hongyang, et al.
Published: (2024)
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
by: Liu, Shilong, et al.
Published: (2023)
TAPTRv3: Spatial and Temporal Context Foster Robust Tracking of Any Point in Long Video
by: Qu, Jinyuan, et al.
Published: (2024)
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
by: Ren, Tianhe, et al.
Published: (2024)
Open-World Human-Object Interaction Detection via Multi-modal Prompts
by: Yang, Jie, et al.
Published: (2024)
DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding
by: Ren, Tianhe, et al.
Published: (2024)
Chain-of-Ground: Improving GUI Grounding via Iterative Reasoning and Reference Feedback
by: Li, Aiden Yiliu, et al.
Published: (2025)
SegDINO3D: 3D Instance Segmentation Empowered by Both Image-Level and Object-Level 2D Features
by: Qu, Jinyuan, et al.
Published: (2025)
VRP-SAM: SAM with Visual Reference Prompt
by: Sun, Yanpeng, et al.
Published: (2024)
ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
by: Jiang, Qing, et al.
Published: (2024)
Referring to Any Person
by: Jiang, Qing, et al.
Published: (2025)
Detect Anything via Next Point Prediction
by: Jiang, Qing, et al.
Published: (2025)
Diversity-Preserved Distribution Matching Distillation for Fast Visual Synthesis
by: Wu, Tianhe, et al.
Published: (2026)
Wanderland: Geometrically Grounded Simulation for Open-World Embodied AI
by: Liu, Xinhao, et al.
Published: (2025)
Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning
by: Jiang, Qing, et al.
Published: (2025)
Images in Sentences: Scaling Interleaved Instructions for Unified Visual Generation
by: Zhang, Yabo, et al.
Published: (2026)
A Reconstruction of the Neutrino Nature and a Unified Explanation of Related Puzzles Based on the Great Tao Model
by: Zeng, Jiqing, et al.
Published: (2026)
The Existence Field Theory of the Great Tao Model: Establishment of the Vacuum-Medium Unified Field Equations
by: Zeng, Jiqing, et al.
Published: (2026)
Percept, Chat, and then Adapt: Multimodal Knowledge Transfer of Foundation Models for Open-World Video Recognition
by: Chen, Boyu, et al.
Published: (2024)
GSAlign: Geometric and Semantic Alignment Network for Aerial-Ground Person Re-Identification
by: Li, Qiao, et al.
Published: (2025)
X-Pose: Detecting Any Keypoints
by: Yang, Jie, et al.
Published: (2023)
Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding
by: Yuan, Zhihao, et al.
Published: (2023)
Enhanced Fast Switching of Viologen‐Based Electrochromics Using Hybrid Electrolytes
by: Ma, Ying, et al.
Published: (2026)
DSM: Constructing a Diverse Semantic Map for 3D Visual Grounding
by: Xie, Qinghongbing, et al.
Published: (2025)
Imaging and Polarimetric Signatures of Konoplya-Zhidenko Black Holes with Various Thick Disk
by: Wang, Xinyu, et al.
Published: (2025)
VQ-VA World: Towards High-Quality Visual Question-Visual Answering
by: Gou, Chenhui, et al.
Published: (2025)
Moving Toward Best Practice When Using Propensity Score Weighting in Survey Observational Studies
by: Zeng, Yukang, et al.
Published: (2026)
Moving toward best practice when using propensity score weighting in survey observational studies
by: Zeng, Yukang, et al.
Published: (2025)
LMM-Track4D: Eliciting 4D Dynamic Reasoning in LMMs via Trajectory-Grounded Dialogue
by: Li, Chaoyue, et al.
Published: (2026)
AlignSAM: Aligning Segment Anything Model to Open Context via Reinforcement Learning
by: Huang, Duojun, et al.
Published: (2024)
Adversarial Exploitation of Data Diversity Improves Visual Localization
by: Li, Sihang, et al.
Published: (2024)
AssemPlanner: A Multi-Agent Based Task Planning Framework for Flexible Assembly System
by: Zhang, Chenhao, et al.
Published: (2026)
MDReID: Modality-Decoupled Learning for Any-to-Any Multi-Modal Object Re-Identification
by: Feng, Yingying, et al.
Published: (2025)
Smaller Models are Natural Explorers for Policy-Level Diversity in GRPO
by: Ren, Yiming, et al.
Published: (2026)
Grounding-MD: Grounded Video-language Pre-training for Open-World Moment Detection
by: Zhuang, Weijun, et al.
Published: (2025)
Multi-LVI-SAM: A Robust LiDAR-Visual-Inertial Odometry for Multiple Fisheye Cameras
by: Zhang, Xinyu, et al.
Published: (2025)
Less Signals, More Understanding: Channel-Capacity Codebook Design for Digital Task-Oriented Semantic Communication
by: Zhang, Anbang, et al.
Published: (2025)
VideoSAM: Open-World Video Segmentation
by: Guo, Pinxue, et al.
Published: (2024)