| Main Authors: | Zhong, Chunlin, Hao, Shuang, Wu, Junhua, Chang, Xiaona, Jiang, Jiwei, Nie, Xiu, Tang, He, Bai, Xiang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2502.20869 |
Similar Items
CoLA: Conditional Dropout and Language-driven Robust Dual-modal Salient Object Detection
by: Hao, Shuang, et al.
Published: (2024)
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning
by: Bai, Sule, et al.
Published: (2025)
AerialVG: A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations
by: Liu, Junli, et al.
Published: (2025)
AgroVG: A Large-Scale Multi-Source Benchmark for Agricultural Visual Grounding
by: Li, Haocheng, et al.
Published: (2026)
VG3T: Visual Geometry Grounded Gaussian Transformer
by: Kim, Junho, et al.
Published: (2025)
OwlCap: Harmonizing Motion-Detail for Video Captioning via HMD-270K and Caption Set Equivalence Reward
by: Zhong, Chunlin, et al.
Published: (2025)
A Giant Thoracic ALK‐Rearranged Mesenchymal Neoplasm in a Child
by: Sheng Gao, et al.
Published: (2026)
PropVG: End-to-End Proposal-Driven Visual Grounding with Multi-Granularity Discrimination
by: Dai, Ming, et al.
Published: (2025)
SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding
by: Kang, Weitai, et al.
Published: (2024)
SwimVG: Step-wise Multimodal Fusion and Adaption for Visual Grounding
by: Shi, Liangtao, et al.
Published: (2025)
CLIP-VG: Self-paced Curriculum Adapting of CLIP for Visual Grounding
by: Xiao, Linhui, et al.
Published: (2023)
HiVG: Hierarchical Multimodal Fine-grained Modulation for Visual Grounding
by: Xiao, Linhui, et al.
Published: (2024)
Synthetic Data in AI: Challenges, Applications, and Ethical Implications
by: Hao, Shuang, et al.
Published: (2024)
GeM-VG: Towards Generalized Multi-image Visual Grounding with Multimodal Large Language Models
by: Zheng, Shurong, et al.
Published: (2026)
VG-CoT: Towards Trustworthy Visual Reasoning via Grounded Chain-of-Thought
by: Lim, Byeonggeuk, et al.
Published: (2026)
VG3S: Visual Geometry Grounded Gaussian Splatting for Semantic Occupancy Prediction
by: Yan, Xiaoyang, et al.
Published: (2026)
ResVG: Enhancing Relation and Semantic Understanding in Multiple Instances for Visual Grounding
by: Zheng, Minghang, et al.
Published: (2024)
ExpVG: Investigating the Design Space of Visual Grounding in Multimodal Large Language Model
by: Kang, Weitai, et al.
Published: (2025)
VG-TVP: Multimodal Procedural Planning via Visually Grounded Text-Video Prompting
by: Ilaslan, Muhammet Furkan, et al.
Published: (2024)
ProVG: Progressive Visual Grounding via Language Decoupling for Remote Sensing Imagery
by: Li, Ke, et al.
Published: (2026)
GuirlVG: Incentivize GUI Visual Grounding via Empirical Exploration on Reinforcement Learning
by: Kang, Weitai, et al.
Published: (2025)
$\text{VG}^2$GT: Voxel-Gaussian Splatting Visual Geometry Grounded Transformer
by: Zhao, Yibin, et al.
Published: (2026)
SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion
by: Dai, Ming, et al.
Published: (2024)
LLM4VG: Large Language Models Evaluation for Video Grounding
by: Feng, Wei, et al.
Published: (2023)
WaterVG: Waterway Visual Grounding based on Text-Guided Vision and mmWave Radar
by: Guan, Runwei, et al.
Published: (2024)
VG-SSL: Benchmarking Self-supervised Representation Learning Approaches for Visual Geo-localization
by: Xiao, Jiuhong, et al.
Published: (2023)
VG-Refiner: Towards Tool-Refined Referring Grounded Reasoning via Agentic Reinforcement Learning
by: Wang, Yuji, et al.
Published: (2025)
ReinPath: A Multimodal Reinforcement Learning Approach for Pathology
by: Zhou, Kangcheng, et al.
Published: (2026)
Benchmarking PathCLIP for Pathology Image Analysis
by: Zheng, Sunyi, et al.
Published: (2024)
A New Dataset and Benchmark for Grounding Multimodal Misinformation
by: Yang, Bingjian, et al.
Published: (2025)
PathVLM-R1: A Reinforcement Learning-Driven Reasoning Model for Pathology Visual-Language Tasks
by: Wu, Jianyu, et al.
Published: (2025)
Griffin: Aerial-Ground Cooperative Detection and Tracking Dataset and Benchmark
by: Wang, Jiahao, et al.
Published: (2025)
TrajVG: 3D Trajectory-Coupled Visual Geometry Learning
by: Miao, Xingyu, et al.
Published: (2026)
MCFEND: A Multi-source Benchmark Dataset for Chinese Fake News Detection
by: Li, Yupeng, et al.
Published: (2024)
PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology
by: Sun, Yuxuan, et al.
Published: (2024)
A Digital Pathology Resource for Liver Cancer Quantification with Datasets, Benchmarks, and Tools
by: Xiao, Ying, et al.
Published: (2026)
DocSeeker: Structured Visual Reasoning with Evidence Grounding for Long Document Understanding
by: Yan, Hao, et al.
Published: (2026)
Comment on: The Capabilities of Large Language Models in Extracting Unstructured Data From Histopathology Reports
by: Junhua Qi, et al.
Published: (2026)
Comment on ‘adjuvant anti‐PD‐1 antibody versus observation for resected nail apparatus melanoma’
by: Junhua Qi, et al.
Published: (2026)
MCTBench: Multimodal Cognition towards Text-Rich Visual Scenes Benchmark
by: Shan, Bin, et al.
Published: (2024)