Bibliographic Details
Main Authors:	Zhang, Xiaowen; Gao, Zhi; Jiao, Licheng; Li, Lingling; Li, Qing
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2602.11730

Similar Items

Ground-R1: Incentivizing Grounded Visual Reasoning via Reinforcement Learning
by: Cao, Meng, et al.
Published: (2025)

OmniSTVG: Toward Spatio-Temporal Omni-Object Video Grounding
by: Yao, Jiali, et al.
Published: (2025)

STSeg-Complex Video Object Segmentation: The 1st Solution for 4th PVUW MOSE Challenge
by: Song, Kehuan, et al.
Published: (2025)

Edit-Your-Interest: Efficient Video Editing via Feature Most-Similar Propagation
by: Zuo, Yi, et al.
Published: (2025)

DEVIS-GRPO: Unleashing GRPO on Dynamic Extreme View Synthesis
by: Zuo, Yi, et al.
Published: (2026)

Saliency-R1: Incentivizing Unified Saliency Reasoning Capability in MLLM with Confidence-Guided Reinforcement Learning
by: Li, Long, et al.
Published: (2025)

VideoSeg-R1: Reasoning Video Object Segmentation via Reinforcement Learning
by: Xu, Zishan, et al.
Published: (2025)

Edit-Your-Motion: Space-Time Diffusion Decoupling Learning for Video Motion Editing
by: Zuo, Yi, et al.
Published: (2024)

VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning
by: Wang, Qi, et al.
Published: (2025)

Video-R1: Reinforcing Video Reasoning in MLLMs
by: Feng, Kaituo, et al.
Published: (2025)

ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement Learning
by: Li, Yifan, et al.
Published: (2025)

UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning
by: Bai, Sule, et al.
Published: (2025)

Improving the Reasoning of Multi-Image Grounding in MLLMs via Reinforcement Learning
by: Zhang, Bob, et al.
Published: (2025)

Wan-R1: Verifiable-Reinforcement Learning for Video Reasoning
by: Liu, Ming, et al.
Published: (2026)

Multiplane Prior Guided Few-Shot Aerial Scene Rendering
by: Gao, Zihan, et al.
Published: (2024)

InstanceV: Instance-Level Video Generation
by: Chen, Yuheng, et al.
Published: (2025)

Learning Evolution via Optimization Knowledge Adaptation
by: Wang, Chao, et al.
Published: (2025)

MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning
by: Pan, Jiazhen, et al.
Published: (2025)

VideoTG-R1: Boosting Video Temporal Grounding via Curriculum Reinforcement Learning on Reflected Boundary Annotations
by: Dong, Lu, et al.
Published: (2025)

DI3CL: Contrastive Learning With Dynamic Instances and Contour Consistency for SAR Land-Cover Classification Foundation Model
by: Ren, Zhongle, et al.
Published: (2025)

Adaptive Chain-of-Focus Reasoning via Dynamic Visual Search and Zooming for Efficient VLMs
by: Zhang, Xintong, et al.
Published: (2025)

Clinically-Grounded Counterfactual Reasoning for Medical Video Diagnosis
by: Gao, Jianzhe, et al.
Published: (2026)

Renormalized Connection for Scale-preferred Object Detection in Satellite Imagery
by: Zhang, Fan, et al.
Published: (2024)

VideoSeeker: Incentivizing Instance-level Video Understanding via Native Agentic Tool Invocation
by: Zhao, Yiming, et al.
Published: (2026)

AdaptMMBench: Benchmarking Adaptive Multimodal Reasoning for Mode Selection and Reasoning Process
by: Zhang, Xintong, et al.
Published: (2026)

Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning
by: Xia, Jiaer, et al.
Published: (2025)

Tempo-R0: A Video-MLLM for Temporal Video Grounding through Efficient Temporal Sensing Reinforcement Learning
by: Yue, Feng, et al.
Published: (2025)

Video-R2: Reinforcing Consistent and Grounded Reasoning in Multimodal Language Models
by: Maaz, Muhammad, et al.
Published: (2025)

Fast and Efficient: Mask Neural Fields for 3D Scene Segmentation
by: Gao, Zihan, et al.
Published: (2024)

Domain-aware Category-level Geometry Learning Segmentation for 3D Point Clouds
by: He, Pei, et al.
Published: (2025)

Think with Grounding: Curriculum Reinforced Reasoning with Video Grounding for Long Video Understanding
by: Chen, Houlun, et al.
Published: (2026)

Foresee-to-Ground: From Predictive Temporal Perception to Evidence-Driven Reasoning for Video Temporal Grounding
by: Zheng, Zelin, et al.
Published: (2026)

GraphThinker: Reinforcing Temporally Grounded Video Reasoning with Event Graph Thinking
by: Cheng, Zixu, et al.
Published: (2026)

MedReason-R1: Learning to Reason for CT Diagnosis with Reinforcement Learning and Local Zoom
by: Li, Yifan, et al.
Published: (2025)

Exploring Beyond Logits: Hierarchical Dynamic Labeling Based on Embeddings for Semi-Supervised Classification
by: Ma, Yanbiao, et al.
Published: (2024)

MOSS-ChatV: Reinforcement Learning with Process Reasoning Reward for Video Temporal Reasoning
by: Tao, Sicheng, et al.
Published: (2025)

Video Evidence to Reasoning: Efficient Video Understanding via Explicit Evidence Grounding
by: Huang, Yanxiang, et al.
Published: (2026)

Visual Reasoning Tracer: Object-Level Grounded Reasoning Benchmark
by: Yuan, Haobo, et al.
Published: (2025)

InstanceGaussian: Appearance-Semantic Joint Gaussian Representation for 3D Instance-Level Perception
by: Li, Haijie, et al.
Published: (2024)

SDI-Paste: Synthetic Dynamic Instance Copy-Paste for Video Instance Segmentation
by: Shrestha, Sahir, et al.
Published: (2024)