| Main Authors: | Zhang, Xiaowen, Gao, Zhi, Jiao, Licheng, Li, Lingling, Li, Qing |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.11730 |
Similar Items
Ground-R1: Incentivizing Grounded Visual Reasoning via Reinforcement Learning
by: Cao, Meng, et al.
Published: (2025)
OmniSTVG: Toward Spatio-Temporal Omni-Object Video Grounding
by: Yao, Jiali, et al.
Published: (2025)
STSeg-Complex Video Object Segmentation: The 1st Solution for 4th PVUW MOSE Challenge
by: Song, Kehuan, et al.
Published: (2025)
Edit-Your-Interest: Efficient Video Editing via Feature Most-Similar Propagation
by: Zuo, Yi, et al.
Published: (2025)
DEVIS-GRPO: Unleashing GRPO on Dynamic Extreme View Synthesis
by: Zuo, Yi, et al.
Published: (2026)
Saliency-R1: Incentivizing Unified Saliency Reasoning Capability in MLLM with Confidence-Guided Reinforcement Learning
by: Li, Long, et al.
Published: (2025)
VideoSeg-R1: Reasoning Video Object Segmentation via Reinforcement Learning
by: Xu, Zishan, et al.
Published: (2025)
Edit-Your-Motion: Space-Time Diffusion Decoupling Learning for Video Motion Editing
by: Zuo, Yi, et al.
Published: (2024)
VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning
by: Wang, Qi, et al.
Published: (2025)
Video-R1: Reinforcing Video Reasoning in MLLMs
by: Feng, Kaituo, et al.
Published: (2025)
ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement Learning
by: Li, Yifan, et al.
Published: (2025)
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning
by: Bai, Sule, et al.
Published: (2025)
Improving the Reasoning of Multi-Image Grounding in MLLMs via Reinforcement Learning
by: Zhang, Bob, et al.
Published: (2025)
Wan-R1: Verifiable-Reinforcement Learning for Video Reasoning
by: Liu, Ming, et al.
Published: (2026)
Multiplane Prior Guided Few-Shot Aerial Scene Rendering
by: Gao, Zihan, et al.
Published: (2024)
InstanceV: Instance-Level Video Generation
by: Chen, Yuheng, et al.
Published: (2025)
Learning Evolution via Optimization Knowledge Adaptation
by: Wang, Chao, et al.
Published: (2025)
MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning
by: Pan, Jiazhen, et al.
Published: (2025)
VideoTG-R1: Boosting Video Temporal Grounding via Curriculum Reinforcement Learning on Reflected Boundary Annotations
by: Dong, Lu, et al.
Published: (2025)
DI3CL: Contrastive Learning With Dynamic Instances and Contour Consistency for SAR Land-Cover Classification Foundation Model
by: Ren, Zhongle, et al.
Published: (2025)
Adaptive Chain-of-Focus Reasoning via Dynamic Visual Search and Zooming for Efficient VLMs
by: Zhang, Xintong, et al.
Published: (2025)
Clinically-Grounded Counterfactual Reasoning for Medical Video Diagnosis
by: Gao, Jianzhe, et al.
Published: (2026)
Renormalized Connection for Scale-preferred Object Detection in Satellite Imagery
by: Zhang, Fan, et al.
Published: (2024)
VideoSeeker: Incentivizing Instance-level Video Understanding via Native Agentic Tool Invocation
by: Zhao, Yiming, et al.
Published: (2026)
AdaptMMBench: Benchmarking Adaptive Multimodal Reasoning for Mode Selection and Reasoning Process
by: Zhang, Xintong, et al.
Published: (2026)
Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning
by: Xia, Jiaer, et al.
Published: (2025)
Tempo-R0: A Video-MLLM for Temporal Video Grounding through Efficient Temporal Sensing Reinforcement Learning
by: Yue, Feng, et al.
Published: (2025)
Video-R2: Reinforcing Consistent and Grounded Reasoning in Multimodal Language Models
by: Maaz, Muhammad, et al.
Published: (2025)
Fast and Efficient: Mask Neural Fields for 3D Scene Segmentation
by: Gao, Zihan, et al.
Published: (2024)
Domain-aware Category-level Geometry Learning Segmentation for 3D Point Clouds
by: He, Pei, et al.
Published: (2025)
Think with Grounding: Curriculum Reinforced Reasoning with Video Grounding for Long Video Understanding
by: Chen, Houlun, et al.
Published: (2026)
Foresee-to-Ground: From Predictive Temporal Perception to Evidence-Driven Reasoning for Video Temporal Grounding
by: Zheng, Zelin, et al.
Published: (2026)
GraphThinker: Reinforcing Temporally Grounded Video Reasoning with Event Graph Thinking
by: Cheng, Zixu, et al.
Published: (2026)
MedReason-R1: Learning to Reason for CT Diagnosis with Reinforcement Learning and Local Zoom
by: Li, Yifan, et al.
Published: (2025)
Exploring Beyond Logits: Hierarchical Dynamic Labeling Based on Embeddings for Semi-Supervised Classification
by: Ma, Yanbiao, et al.
Published: (2024)
MOSS-ChatV: Reinforcement Learning with Process Reasoning Reward for Video Temporal Reasoning
by: Tao, Sicheng, et al.
Published: (2025)
Video Evidence to Reasoning Efficient Video Understanding via Explicit Evidence Grounding
by: Huang, Yanxiang, et al.
Published: (2026)
Visual Reasoning Tracer: Object-Level Grounded Reasoning Benchmark
by: Yuan, Haobo, et al.
Published: (2025)
InstanceGaussian: Appearance-Semantic Joint Gaussian Representation for 3D Instance-Level Perception
by: Li, Haijie, et al.
Published: (2024)
SDI-Paste: Synthetic Dynamic Instance Copy-Paste for Video Instance Segmentation
by: Shrestha, Sahir, et al.
Published: (2024)