| Main Authors: | Wang, Yifan, Fu, Yun |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.03950 |
Similar Items
MARVEL: Multidimensional Abstraction and Reasoning through Visual Evaluation and Learning
by: Jiang, Yifan, et al.
Published: (2024)
Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program
by: Gao, Minghe, et al.
Published: (2025)
Breaking Dual Bottlenecks: Evolving Unified Multimodal Models into Self-Adaptive Interleaved Visual Reasoners
by: Liu, Qingyang, et al.
Published: (2026)
A Stepwise Distillation Learning Strategy for Non-differentiable Visual Programming Frameworks on Visual Reasoning Tasks
by: Wan, Wentao, et al.
Published: (2023)
Visual Superordinate Abstraction for Robust Concept Learning
by: Zheng, Qi, et al.
Published: (2022)
MV-CoRe: Multimodal Visual-Conceptual Reasoning for Complex Visual Question Answering
by: Peng, Jingwei, et al.
Published: (2025)
Visual Prompt Based Reasoning for Offroad Mapping using Multimodal LLMs
by: Nasser, Abdelmoamen, et al.
Published: (2026)
Enhancing Medical Visual Grounding via Knowledge-guided Spatial Prompts
by: Gao, Yifan, et al.
Published: (2026)
FlexAC: Towards Flexible Control of Associative Reasoning in Multimodal Large Language Models
by: Yuan, Shengming, et al.
Published: (2025)
What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning
by: Du, Yifan, et al.
Published: (2023)
MM-IQ: Benchmarking Human-Like Abstraction and Reasoning in Multimodal Models
by: Cai, Huanqia, et al.
Published: (2025)
Temporal Adaptive RGBT Tracking with Modality Prompt
by: Wang, Hongyu, et al.
Published: (2024)
InterSketch: An Interleaved Reasoning Model with Self-correcting Visual Sketch and Stepwise Reward
by: Ning, Zhiwei, et al.
Published: (2026)
Is a Picture Worth a Thousand Words? Adaptive Multimodal Fact-Checking with Visual Evidence Necessity
by: Jung, Jaeyoon, et al.
Published: (2026)
TableVista: Benchmarking Multimodal Table Reasoning under Visual and Structural Complexity
by: Yang, Zheyuan, et al.
Published: (2026)
Mind's Eye: A Benchmark of Visual Abstraction, Transformation and Composition for Multimodal LLMs
by: Sinha, Rohit, et al.
Published: (2026)
Defending Multimodal Backdoored Models by Repulsive Visual Prompt Tuning
by: Zhang, Zhifang, et al.
Published: (2024)
Latent Sketchpad: Sketching Visual Thoughts to Elicit Multimodal Reasoning in MLLMs
by: Zhang, Huanyu, et al.
Published: (2025)
OmniSch: A Multimodal PCB Schematic Benchmark For Structured Diagram Visual Reasoning
by: Lu, Taiting, et al.
Published: (2026)
CrossCheck-Bench: Diagnosing Compositional Failures in Multimodal Conflict Resolution
by: Tian, Baoliang, et al.
Published: (2025)
Spatially Prompted Visual Trajectory Prediction for Egocentric Manipulation
by: Li, Yifan, et al.
Published: (2026)
UniVLR: Unifying Text and Vision in Visual Latent Reasoning for Multimodal LLMs
by: Jiang, Houcheng, et al.
Published: (2026)
Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency
by: Wang, Zhikai, et al.
Published: (2025)
The Abstraction Gap in Vision-Language Causal Reasoning
by: Hoang, Chinh, et al.
Published: (2026)
WIR3D: Visually-Informed and Geometry-Aware 3D Shape Abstraction
by: Liu, Richard, et al.
Published: (2025)
Multilingual OCR-Aware Fine-Tuning and Prompt-Guided Chain-of-Thought Reasoning for Multimodal Large Language Models
by: Xu, Qinwu, et al.
Published: (2026)
Image-of-Thought Prompting for Visual Reasoning Refinement in Multimodal Large Language Models
by: Zhou, Qiji, et al.
Published: (2024)
SceneAlign: Aligning Multimodal Reasoning to Scene Graphs in Complex Visual Scenes
by: Wang, Chuhan, et al.
Published: (2026)
Teach Me Sign: Stepwise Prompting LLM for Sign Language Production
by: An, Zhaoyi, et al.
Published: (2025)
VP-MEL: Visual Prompts Guided Multimodal Entity Linking
by: Mi, Hongze, et al.
Published: (2024)
TRACE: Task-Adaptive Reasoning and Representation Learning for Universal Multimodal Retrieval
by: Hao, Xiangzhao, et al.
Published: (2026)
CVBench: Benchmarking Cross-Video Synergies for Complex Multimodal Reasoning
by: Zhu, Nannan, et al.
Published: (2025)
Abstraction in Style
by: Lu, Min, et al.
Published: (2026)
MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning
by: Shi, Weikang, et al.
Published: (2025)
MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning
by: Wang, Jianghui, et al.
Published: (2023)
Audio-Guided Visual Editing with Complex Multi-Modal Prompts
by: Kim, Hyeonyu, et al.
Published: (2025)
Towards Global Optimal Visual In-Context Learning Prompt Selection
by: Xu, Chengming, et al.
Published: (2024)
Graph-of-Mark: Promote Spatial Reasoning in Multimodal Language Models with Graph-Based Visual Prompting
by: Frisoni, Giacomo, et al.
Published: (2026)
Learning Adaptive Reasoning Paths for Efficient Visual Reasoning
by: Huang, Yixu, et al.
Published: (2026)
Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models
by: Zhang, Yue, et al.
Published: (2024)