:: Library Catalog

Bibliographic Details
Main Authors:	Chen, Shuhang, Yuan, Hangjie, Xu, Yunqiu, Liu, Pengwei, Feng, Tao, Cen, Jun, Huang, Zeying, Yang, Yi
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2503.16549

Similar Items

CogFlow: Bridging Perception and Reasoning through Knowledge Internalization for Visual Mathematical Problem Solving
by: Chen, Shuhang, et al.
Published: (2026)

SAMora: Enhancing SAM through Hierarchical Self-Supervised Pre-Training for Medical Images
by: Chen, Shuhang, et al.
Published: (2025)

MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs
by: Xu, Yunqiu, et al.
Published: (2024)

Echoes of ownership: Adversarial-guided dual injection for copyright protection in MLLMs
by: Xia, Chengwei, et al.
Published: (2026)

LumosFlow: Motion-Guided Long Video Generation
by: Chen, Jiahao, et al.
Published: (2025)

Knowledge is Power: Advancing Few-shot Action Recognition with Multimodal Semantics from MLLMs
by: Xing, Jiazheng, et al.
Published: (2026)

ZeroFlow: Overcoming Catastrophic Forgetting is Easier than You Think
by: Feng, Tao, et al.
Published: (2025)

Open Eyes, Then Reason: Fine-grained Visual Mathematical Understanding in MLLMs
by: Zhang, Shan, et al.
Published: (2025)

Lumos-1: On Autoregressive Video Generation with Discrete Diffusion from a Unified Model Perspective
by: Yuan, Hangjie, et al.
Published: (2025)

DFVEdit: Conditional Delta Flow Vector for Zero-shot Video Editing
by: Cai, Lingling, et al.
Published: (2025)

Aesthetic Image Captioning with Saliency Enhanced MLLMs
by: Tao, Yilin, et al.
Published: (2025)

GridPrune: From "Where to Look" to "What to Select" in Visual Token Pruning for MLLMs
by: Duan, Yuxiang, et al.
Published: (2025)

Curriculum Sampling: A Two-Phase Curriculum for Efficient Training of Flow Matching
by: Sun, Pengwei
Published: (2026)

Perceptual Flow Network for Visually Grounded Reasoning
by: Li, Yangfu, et al.
Published: (2026)

The Perceptual Observatory: Characterizing Robustness and Grounding in MLLMs
by: Anvekar, Tejas, et al.
Published: (2025)

Math Blind: Failures in Diagram Understanding Undermine Reasoning in MLLMs
by: Sun, Yanpeng, et al.
Published: (2025)

Rethinking the Stability-Plasticity Trade-off in Continual Learning from an Architectural Perspective
by: Lu, Aojun, et al.
Published: (2025)

IF-Bench: Benchmarking and Enhancing MLLMs for Infrared Images with Generative Visual Prompting
by: Zhang, Tao, et al.
Published: (2025)

Do MLLMs Exhibit Human-like Perceptual Behaviors? HVSBench: A Benchmark for MLLM Alignment with Human Perceptual Behavior
by: Lin, Jiaying, et al.
Published: (2024)

Lifting the Veil on Visual Information Flow in MLLMs: Unlocking Pathways to Faster Inference
by: Yin, Hao, et al.
Published: (2025)

MathSticks: A Benchmark for Visual Symbolic Compositional Reasoning with Matchstick Puzzles
by: Ji, Yuheng, et al.
Published: (2025)

We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning
by: Qiao, Runqi, et al.
Published: (2025)

VOILA: Evaluation of MLLMs For Perceptual Understanding and Analogical Reasoning
by: Yilmaz, Nilay, et al.
Published: (2025)

When Policy Entropy Constraint Fails: Preserving Diversity in Flow-based RLHF via Perceptual Entropy
by: Tan, Xiaofeng, et al.
Published: (2026)

MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning
by: Shi, Weikang, et al.
Published: (2025)

Adapt before Continual Learning
by: Lu, Aojun, et al.
Published: (2025)

Why Does RL Generalize Better Than SFT? A Data-Centric Perspective on VLM Post-Training
by: Lu, Aojun, et al.
Published: (2026)

Revisiting Neural Networks for Continual Learning: An Architectural Perspective
by: Lu, Aojun, et al.
Published: (2024)

Context Tokens are Anchors: Understanding the Repetition Curse in dMLLMs from an Information Flow Perspective
by: Zhao, Qiyan, et al.
Published: (2026)

VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning
by: Ma, Jingkun, et al.
Published: (2024)

GSM8K-V: Can Vision Language Models Solve Grade School Math Word Problems in Visual Contexts
by: Yuan, Fan, et al.
Published: (2025)

VideoCap-R1: Enhancing MLLMs for Video Captioning via Structured Thinking
by: Meng, Desen, et al.
Published: (2025)

PriOr-Flow: Enhancing Primitive Panoramic Optical Flow with Orthogonal View
by: Liu, Longliang, et al.
Published: (2025)

Branch, or Layer? Zeroth-Order Optimization for Continual Learning of Vision-Language Models
by: Liu, Ziwei, et al.
Published: (2025)

Law of Vision Representation in MLLMs
by: Yang, Shijia, et al.
Published: (2024)

ControlGUI: Guiding Generative GUI Exploration through Perceptual Visual Flow
by: Garg, Aryan, et al.
Published: (2025)

InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning
by: Han, Xiaotian, et al.
Published: (2024)

Human Cognitive Benchmarks Reveal Foundational Visual Gaps in MLLMs
by: Huang, Jen-Tse, et al.
Published: (2025)

UAVBench and UAVIT-1M: Benchmarking and Enhancing MLLMs for Low-Altitude UAV Vision-Language Understanding
by: Zhan, Yang, et al.
Published: (2026)

A Faster Path to Continual Learning
by: Li, Wei, et al.
Published: (2026)