| Main Authors: | Chen, Shuhang, Yuan, Hangjie, Xu, Yunqiu, Liu, Pengwei, Feng, Tao, Cen, Jun, Huang, Zeying, Yang, Yi |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2503.16549 |
Similar Items
CogFlow: Bridging Perception and Reasoning through Knowledge Internalization for Visual Mathematical Problem Solving
by: Chen, Shuhang, et al.
Published: (2026)
SAMora: Enhancing SAM through Hierarchical Self-Supervised Pre-Training for Medical Images
by: Chen, Shuhang, et al.
Published: (2025)
MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs
by: Xu, Yunqiu, et al.
Published: (2024)
Echoes of ownership: Adversarial-guided dual injection for copyright protection in MLLMs
by: Xia, Chengwei, et al.
Published: (2026)
LumosFlow: Motion-Guided Long Video Generation
by: Chen, Jiahao, et al.
Published: (2025)
Knowledge is Power: Advancing Few-shot Action Recognition with Multimodal Semantics from MLLMs
by: Xing, Jiazheng, et al.
Published: (2026)
ZeroFlow: Overcoming Catastrophic Forgetting is Easier than You Think
by: Feng, Tao, et al.
Published: (2025)
Open Eyes, Then Reason: Fine-grained Visual Mathematical Understanding in MLLMs
by: Zhang, Shan, et al.
Published: (2025)
Lumos-1: On Autoregressive Video Generation with Discrete Diffusion from a Unified Model Perspective
by: Yuan, Hangjie, et al.
Published: (2025)
DFVEdit: Conditional Delta Flow Vector for Zero-shot Video Editing
by: Cai, Lingling, et al.
Published: (2025)
Aesthetic Image Captioning with Saliency Enhanced MLLMs
by: Tao, Yilin, et al.
Published: (2025)
GridPrune: From "Where to Look" to "What to Select" in Visual Token Pruning for MLLMs
by: Duan, Yuxiang, et al.
Published: (2025)
Curriculum Sampling: A Two-Phase Curriculum for Efficient Training of Flow Matching
by: Sun, Pengwei
Published: (2026)
Perceptual Flow Network for Visually Grounded Reasoning
by: Li, Yangfu, et al.
Published: (2026)
The Perceptual Observatory Characterizing Robustness and Grounding in MLLMs
by: Anvekar, Tejas, et al.
Published: (2025)
Math Blind: Failures in Diagram Understanding Undermine Reasoning in MLLMs
by: Sun, Yanpeng, et al.
Published: (2025)
Rethinking the Stability-Plasticity Trade-off in Continual Learning from an Architectural Perspective
by: Lu, Aojun, et al.
Published: (2025)
IF-Bench: Benchmarking and Enhancing MLLMs for Infrared Images with Generative Visual Prompting
by: Zhang, Tao, et al.
Published: (2025)
Do MLLMs Exhibit Human-like Perceptual Behaviors? HVSBench: A Benchmark for MLLM Alignment with Human Perceptual Behavior
by: Lin, Jiaying, et al.
Published: (2024)
Lifting the Veil on Visual Information Flow in MLLMs: Unlocking Pathways to Faster Inference
by: Yin, Hao, et al.
Published: (2025)
MathSticks: A Benchmark for Visual Symbolic Compositional Reasoning with Matchstick Puzzles
by: Ji, Yuheng, et al.
Published: (2025)
We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning
by: Qiao, Runqi, et al.
Published: (2025)
VOILA: Evaluation of MLLMs For Perceptual Understanding and Analogical Reasoning
by: Yilmaz, Nilay, et al.
Published: (2025)
When Policy Entropy Constraint Fails: Preserving Diversity in Flow-based RLHF via Perceptual Entropy
by: Tan, Xiaofeng, et al.
Published: (2026)
MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning
by: Shi, Weikang, et al.
Published: (2025)
Adapt before Continual Learning
by: Lu, Aojun, et al.
Published: (2025)
Why Does RL Generalize Better Than SFT? A Data-Centric Perspective on VLM Post-Training
by: Lu, Aojun, et al.
Published: (2026)
Revisiting Neural Networks for Continual Learning: An Architectural Perspective
by: Lu, Aojun, et al.
Published: (2024)
Context Tokens are Anchors: Understanding the Repetition Curse in dMLLMs from an Information Flow Perspective
by: Zhao, Qiyan, et al.
Published: (2026)
VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning
by: Ma, Jingkun, et al.
Published: (2024)
GSM8K-V: Can Vision Language Models Solve Grade School Math Word Problems in Visual Contexts
by: Yuan, Fan, et al.
Published: (2025)
VideoCap-R1: Enhancing MLLMs for Video Captioning via Structured Thinking
by: Meng, Desen, et al.
Published: (2025)
PriOr-Flow: Enhancing Primitive Panoramic Optical Flow with Orthogonal View
by: Liu, Longliang, et al.
Published: (2025)
Branch, or Layer? Zeroth-Order Optimization for Continual Learning of Vision-Language Models
by: Liu, Ziwei, et al.
Published: (2025)
Law of Vision Representation in MLLMs
by: Yang, Shijia, et al.
Published: (2024)
ControlGUI: Guiding Generative GUI Exploration through Perceptual Visual Flow
by: Garg, Aryan, et al.
Published: (2025)
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning
by: Han, Xiaotian, et al.
Published: (2024)
Human Cognitive Benchmarks Reveal Foundational Visual Gaps in MLLMs
by: Huang, Jen-Tse, et al.
Published: (2025)
UAVBench and UAVIT-1M: Benchmarking and Enhancing MLLMs for Low-Altitude UAV Vision-Language Understanding
by: Zhan, Yang, et al.
Published: (2026)
A Faster Path to Continual Learning
by: Li, Wei, et al.
Published: (2026)