:: Library Catalog

Bibliographic Details
Main Authors:	Zhong, Chunlin; Hao, Shuang; Wu, Junhua; Chang, Xiaona; Jiang, Jiwei; Nie, Xiu; Tang, He; Bai, Xiang
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2502.20869

Similar Items

CoLA: Conditional Dropout and Language-driven Robust Dual-modal Salient Object Detection
by: Hao, Shuang, et al.
Published: (2024)

UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning
by: Bai, Sule, et al.
Published: (2025)

AerialVG: A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations
by: Liu, Junli, et al.
Published: (2025)

AgroVG: A Large-Scale Multi-Source Benchmark for Agricultural Visual Grounding
by: Li, Haocheng, et al.
Published: (2026)

VG3T: Visual Geometry Grounded Gaussian Transformer
by: Kim, Junho, et al.
Published: (2025)

OwlCap: Harmonizing Motion-Detail for Video Captioning via HMD-270K and Caption Set Equivalence Reward
by: Zhong, Chunlin, et al.
Published: (2025)

A Giant Thoracic ALK‐Rearranged Mesenchymal Neoplasm in a Child
by: Sheng Gao, et al.
Published: (2026)

PropVG: End-to-End Proposal-Driven Visual Grounding with Multi-Granularity Discrimination
by: Dai, Ming, et al.
Published: (2025)

SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding
by: Kang, Weitai, et al.
Published: (2024)

SwimVG: Step-wise Multimodal Fusion and Adaption for Visual Grounding
by: Shi, Liangtao, et al.
Published: (2025)

CLIP-VG: Self-paced Curriculum Adapting of CLIP for Visual Grounding
by: Xiao, Linhui, et al.
Published: (2023)

HiVG: Hierarchical Multimodal Fine-grained Modulation for Visual Grounding
by: Xiao, Linhui, et al.
Published: (2024)

Synthetic Data in AI: Challenges, Applications, and Ethical Implications
by: Hao, Shuang, et al.
Published: (2024)

GeM-VG: Towards Generalized Multi-image Visual Grounding with Multimodal Large Language Models
by: Zheng, Shurong, et al.
Published: (2026)

VG-CoT: Towards Trustworthy Visual Reasoning via Grounded Chain-of-Thought
by: Lim, Byeonggeuk, et al.
Published: (2026)

VG3S: Visual Geometry Grounded Gaussian Splatting for Semantic Occupancy Prediction
by: Yan, Xiaoyang, et al.
Published: (2026)

ResVG: Enhancing Relation and Semantic Understanding in Multiple Instances for Visual Grounding
by: Zheng, Minghang, et al.
Published: (2024)

ExpVG: Investigating the Design Space of Visual Grounding in Multimodal Large Language Model
by: Kang, Weitai, et al.
Published: (2025)

VG-TVP: Multimodal Procedural Planning via Visually Grounded Text-Video Prompting
by: Ilaslan, Muhammet Furkan, et al.
Published: (2024)

ProVG: Progressive Visual Grounding via Language Decoupling for Remote Sensing Imagery
by: Li, Ke, et al.
Published: (2026)

GuirlVG: Incentivize GUI Visual Grounding via Empirical Exploration on Reinforcement Learning
by: Kang, Weitai, et al.
Published: (2025)

$\text{VG}^2$GT: Voxel-Gaussian Splatting Visual Geometry Grounded Transformer
by: Zhao, Yibin, et al.
Published: (2026)

SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion
by: Dai, Ming, et al.
Published: (2024)

LLM4VG: Large Language Models Evaluation for Video Grounding
by: Feng, Wei, et al.
Published: (2023)

WaterVG: Waterway Visual Grounding based on Text-Guided Vision and mmWave Radar
by: Guan, Runwei, et al.
Published: (2024)

VG-SSL: Benchmarking Self-supervised Representation Learning Approaches for Visual Geo-localization
by: Xiao, Jiuhong, et al.
Published: (2023)

VG-Refiner: Towards Tool-Refined Referring Grounded Reasoning via Agentic Reinforcement Learning
by: Wang, Yuji, et al.
Published: (2025)

ReinPath: A Multimodal Reinforcement Learning Approach for Pathology
by: Zhou, Kangcheng, et al.
Published: (2026)

Benchmarking PathCLIP for Pathology Image Analysis
by: Zheng, Sunyi, et al.
Published: (2024)

A New Dataset and Benchmark for Grounding Multimodal Misinformation
by: Yang, Bingjian, et al.
Published: (2025)

PathVLM-R1: A Reinforcement Learning-Driven Reasoning Model for Pathology Visual-Language Tasks
by: Wu, Jianyu, et al.
Published: (2025)

Griffin: Aerial-Ground Cooperative Detection and Tracking Dataset and Benchmark
by: Wang, Jiahao, et al.
Published: (2025)

TrajVG: 3D Trajectory-Coupled Visual Geometry Learning
by: Miao, Xingyu, et al.
Published: (2026)

MCFEND: A Multi-source Benchmark Dataset for Chinese Fake News Detection
by: Li, Yupeng, et al.
Published: (2024)

PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology
by: Sun, Yuxuan, et al.
Published: (2024)

A Digital Pathology Resource for Liver Cancer Quantification with Datasets, Benchmarks, and Tools
by: Xiao, Ying, et al.
Published: (2026)

DocSeeker: Structured Visual Reasoning with Evidence Grounding for Long Document Understanding
by: Yan, Hao, et al.
Published: (2026)

Comment on: The Capabilities of Large Language Models in Extracting Unstructured Data From Histopathology Reports
by: Junhua Qi, et al.
Published: (2026)

Comment on ‘adjuvant anti‐PD‐1 antibody versus observation for resected nail apparatus melanoma’
by: Junhua Qi, et al.
Published: (2026)

MCTBench: Multimodal Cognition towards Text-Rich Visual Scenes Benchmark
by: Shan, Bin, et al.
Published: (2024)