| Main Authors: | Feng, Sicheng, Wang, Song, Ouyang, Shuyi, Kong, Lingdong, Song, Zikai, Zhu, Jianke, Wang, Huan, Wang, Xinchao |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.18675 |
Similar Items
RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning
by: Feng, Sicheng, et al.
Published: (2025)
PixelThink: Towards Efficient Chain-of-Pixel Reasoning
by: Wang, Song, et al.
Published: (2025)
PointLoRA: Low-Rank Adaptation with Token Selection for Point Cloud Learning
by: Wang, Song, et al.
Published: (2025)
Hypergraph-State Collaborative Reasoning for Multi-Object Tracking
by: Song, Zikai, et al.
Published: (2026)
VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning
by: Wu, Xueqing, et al.
Published: (2024)
CLiViS: Unleashing Cognitive Map through Linguistic-Visual Synergy for Embodied Visual Reasoning
by: Li, Kailing, et al.
Published: (2025)
Fine-Grained Preference Optimization Improves Spatial Reasoning in VLMs
by: Shen, Yifan, et al.
Published: (2025)
ConciseHint: Boosting Efficient Reasoning via Continuous Concise Hints during Generation
by: Tang, Siao, et al.
Published: (2025)
TaxonRL: Reinforcement Learning with Intermediate Rewards for Interpretable Fine-Grained Visual Reasoning
by: von Klinski, Maximilian, et al.
Published: (2026)
ASPO: Adaptive Sentence-Level Preference Optimization for Fine-Grained Multimodal Reasoning
by: Wang, Yeyuan, et al.
Published: (2025)
Integrating Fine-Grained Audio-Visual Evidence for Robust Multimodal Emotion Reasoning
by: Zhao, Zhixian, et al.
Published: (2026)
What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning
by: Du, Yifan, et al.
Published: (2023)
Uncertainty-Instructed Structure Injection for Generalizable HD Map Construction
by: Liu, Xiaolu, et al.
Published: (2025)
VGR: Visual Grounded Reasoning
by: Wang, Jiacong, et al.
Published: (2025)
MGMap: Mask-Guided Learning for Online Vectorized HD Map Construction
by: Liu, Xiaolu, et al.
Published: (2024)
ScanReason: Empowering 3D Visual Grounding with Reasoning Capabilities
by: Zhu, Chenming, et al.
Published: (2024)
Learning Adaptive Reasoning Paths for Efficient Visual Reasoning
by: Huang, Yixu, et al.
Published: (2026)
DualToken: Towards Unifying Visual Understanding and Generation with Dual Visual Vocabularies
by: Song, Wei, et al.
Published: (2025)
Latent Visual Reasoning
by: Li, Bangzheng, et al.
Published: (2025)
Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems
by: Wang, Song, et al.
Published: (2025)
SafeMap: Robust HD Map Construction from Incomplete Observations
by: Hao, Xiaoshuai, et al.
Published: (2025)
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning
by: Wang, Weiyun, et al.
Published: (2025)
A Coarse-to-Fine Approach to Multi-Modality 3D Occupancy Grounding
by: Shi, Zhan, et al.
Published: (2025)
$A^2R^2$: Advancing Img2LaTeX Conversion via Visual Reasoning with Attention-Guided Refinement
by: Li, Zhecheng, et al.
Published: (2025)
Efficient Reasoning Models: A Survey
by: Feng, Sicheng, et al.
Published: (2025)
Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology
by: Wang, Haochen, et al.
Published: (2025)
Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning
by: Bai, Tianyi, et al.
Published: (2025)
ViewFusion: Structured Spatial Thinking Chains for Multi-View Reasoning
by: Tao, Xingjian, et al.
Published: (2026)
Daily-Omni: Towards Audio-Visual Reasoning with Temporal Alignment across Modalities
by: Zhou, Ziwei, et al.
Published: (2025)
On Data Synthesis and Post-training for Visual Abstract Reasoning
by: Zhu, Ke, et al.
Published: (2025)
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks
by: Zhang, Wenqi, et al.
Published: (2025)
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning
by: Wei, Yana, et al.
Published: (2025)
AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning
by: Song, Mingyang, et al.
Published: (2026)
VAEER: Visual Attention-Inspired Emotion Elicitation Reasoning
by: Man, Fanhang, et al.
Published: (2025)
Interleaved Latent Visual Reasoning with Selective Perceptual Modeling
by: Dong, Shuai, et al.
Published: (2025)
VisionPangu: A Compact and Fine-Grained Multimodal Assistant with 1.7B Parameters
by: Fan, Jiaxin, et al.
Published: (2026)
MambaMap: Online Vectorized HD Map Construction using State Space Model
by: Yang, Ruizi, et al.
Published: (2025)
Forest Before Trees: Latent Superposition for Efficient Visual Reasoning
by: Wang, Yubo, et al.
Published: (2026)
PROGRESSLM: Towards Progress Reasoning in Vision-Language Models
by: Zhang, Jianshu, et al.
Published: (2026)
VLM-FO1: Bridging the Gap Between High-Level Reasoning and Fine-Grained Perception in VLMs
by: Liu, Peng, et al.
Published: (2025)