:: Library Catalog

Bibliographic Details
Main Authors:	Ren, Tianhe, Liu, Shilong, Zeng, Ailing, Lin, Jing, Li, Kunchang, Cao, He, Chen, Jiayu, Huang, Xinyu, Chen, Yukang, Yan, Feng, Zeng, Zhaoyang, Zhang, Hao, Li, Feng, Yang, Jie, Li, Hongyang, Jiang, Qing, Zhang, Lei
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2401.14159

Similar Items

T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
by: Jiang, Qing, et al.
Published: (2024)

TAPTR: Tracking Any Point with Transformers as Detection
by: Li, Hongyang, et al.
Published: (2024)

TAPTRv2: Attention-based Position Update Improves Tracking Any Point
by: Li, Hongyang, et al.
Published: (2024)

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
by: Liu, Shilong, et al.
Published: (2023)

TAPTRv3: Spatial and Temporal Context Foster Robust Tracking of Any Point in Long Video
by: Qu, Jinyuan, et al.
Published: (2024)

Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
by: Ren, Tianhe, et al.
Published: (2024)

Open-World Human-Object Interaction Detection via Multi-modal Prompts
by: Yang, Jie, et al.
Published: (2024)

DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding
by: Ren, Tianhe, et al.
Published: (2024)

Chain-of-Ground: Improving GUI Grounding via Iterative Reasoning and Reference Feedback
by: Li, Aiden Yiliu, et al.
Published: (2025)

SegDINO3D: 3D Instance Segmentation Empowered by Both Image-Level and Object-Level 2D Features
by: Qu, Jinyuan, et al.
Published: (2025)

VRP-SAM: SAM with Visual Reference Prompt
by: Sun, Yanpeng, et al.
Published: (2024)

ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
by: Jiang, Qing, et al.
Published: (2024)

Referring to Any Person
by: Jiang, Qing, et al.
Published: (2025)

Detect Anything via Next Point Prediction
by: Jiang, Qing, et al.
Published: (2025)

Diversity-Preserved Distribution Matching Distillation for Fast Visual Synthesis
by: Wu, Tianhe, et al.
Published: (2026)

Wanderland: Geometrically Grounded Simulation for Open-World Embodied AI
by: Liu, Xinhao, et al.
Published: (2025)

Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning
by: Jiang, Qing, et al.
Published: (2025)

Images in Sentences: Scaling Interleaved Instructions for Unified Visual Generation
by: Zhang, Yabo, et al.
Published: (2026)

A Reconstruction of the Neutrino Nature and a Unified Explanation of Related Puzzles Based on the Great Tao Model
by: Zeng, Jiqing, et al.
Published: (2026)

The Existence Field Theory of the Great Tao Model: Establishment of the Vacuum-Medium Unified Field Equations
by: Zeng, Jiqing, et al.
Published: (2026)

Percept, Chat, and then Adapt: Multimodal Knowledge Transfer of Foundation Models for Open-World Video Recognition
by: Chen, Boyu, et al.
Published: (2024)

GSAlign: Geometric and Semantic Alignment Network for Aerial-Ground Person Re-Identification
by: Li, Qiao, et al.
Published: (2025)

X-Pose: Detecting Any Keypoints
by: Yang, Jie, et al.
Published: (2023)

Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding
by: Yuan, Zhihao, et al.
Published: (2023)

Enhanced Fast Switching of Viologen‐Based Electrochromics Using Hybrid Electrolytes
by: Ma, Ying, et al.
Published: (2026)

DSM: Constructing a Diverse Semantic Map for 3D Visual Grounding
by: Xie, Qinghongbing, et al.
Published: (2025)

Imaging and Polarimetric Signatures of Konoplya-Zhidenko Black Holes with Various Thick Disk
by: Wang, Xinyu, et al.
Published: (2025)

VQ-VA World: Towards High-Quality Visual Question-Visual Answering
by: Gou, Chenhui, et al.
Published: (2025)

Moving Toward Best Practice When Using Propensity Score Weighting in Survey Observational Studies
by: Zeng, Yukang, et al.
Published: (2026)

Moving toward best practice when using propensity score weighting in survey observational studies
by: Zeng, Yukang, et al.
Published: (2025)

LMM-Track4D: Eliciting 4D Dynamic Reasoning in LMMs via Trajectory-Grounded Dialogue
by: Li, Chaoyue, et al.
Published: (2026)

AlignSAM: Aligning Segment Anything Model to Open Context via Reinforcement Learning
by: Huang, Duojun, et al.
Published: (2024)

Adversarial Exploitation of Data Diversity Improves Visual Localization
by: Li, Sihang, et al.
Published: (2024)

AssemPlanner: A Multi-Agent Based Task Planning Framework for Flexible Assembly System
by: Zhang, Chenhao, et al.
Published: (2026)

MDReID: Modality-Decoupled Learning for Any-to-Any Multi-Modal Object Re-Identification
by: Feng, Yingying, et al.
Published: (2025)

Smaller Models are Natural Explorers for Policy-Level Diversity in GRPO
by: Ren, Yiming, et al.
Published: (2026)

Grounding-MD: Grounded Video-language Pre-training for Open-World Moment Detection
by: Zhuang, Weijun, et al.
Published: (2025)

Multi-LVI-SAM: A Robust LiDAR-Visual-Inertial Odometry for Multiple Fisheye Cameras
by: Zhang, Xinyu, et al.
Published: (2025)

Less Signals, More Understanding: Channel-Capacity Codebook Design for Digital Task-Oriented Semantic Communication
by: Zhang, Anbang, et al.
Published: (2025)

VideoSAM: Open-World Video Segmentation
by: Guo, Pinxue, et al.
Published: (2024)