| Main Authors: | Yi, Zhihang, Zhao, Jian, Lv, Jiancheng, Wang, Tao |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.10138 |
Similar Items
ChartNet: A Million-Scale, High-Quality Multimodal Dataset for Robust Chart Understanding
by: Kondic, Jovana, et al.
Published: (2026)
ChartMoE: Mixture of Diversely Aligned Expert Connector for Chart Understanding
by: Xu, Zhengzhuo, et al.
Published: (2024)
On Pre-training of Multimodal Language Models Customized for Chart Understanding
by: Fan, Wan-Cyuan, et al.
Published: (2024)
From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models
by: Huang, Kung-Hsiang, et al.
Published: (2024)
In-Depth and In-Breadth: Pre-training Multimodal Language Models Customized for Comprehensive Chart Understanding
by: Fan, Wan-Cyuan, et al.
Published: (2025)
VOILA: Evaluation of MLLMs For Perceptual Understanding and Analogical Reasoning
by: Yilmaz, Nilay, et al.
Published: (2025)
ChartInsights: Evaluating Multimodal Large Language Models for Low-Level Chart Question Answering
by: Wu, Yifan, et al.
Published: (2024)
PP-DocBee2: Improved Baselines with Efficient Data for Multimodal Document Understanding
by: Huang, Kui, et al.
Published: (2025)
Mitigating Object Hallucinations in MLLMs via Multi-Frequency Perturbations
by: Li, Shuo, et al.
Published: (2025)
ChartHal: A Fine-grained Framework Evaluating Hallucination of Large Vision Language Models in Chart Understanding
by: Wang, Xingqi, et al.
Published: (2025)
PP-DocBee: Improving Multimodal Document Understanding Through a Bag of Tricks
by: Ni, Feng, et al.
Published: (2025)
Can MLLMs Understand the Deep Implication Behind Chinese Images?
by: Zhang, Chenhao, et al.
Published: (2024)
AskChart: Universal Chart Understanding through Textual Enhancement
by: Yang, Xudong, et al.
Published: (2024)
MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs
by: Fu, Chaoyou, et al.
Published: (2024)
VideoScaffold: Elastic-Scale Visual Hierarchies for Streaming Video Understanding in MLLMs
by: Zheng, Naishan, et al.
Published: (2025)
ChartCap: Mitigating Hallucination of Dense Chart Captioning
by: Lim, Junyoung, et al.
Published: (2025)
MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs
by: Zhang, Jiarui, et al.
Published: (2025)
ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild
by: Masry, Ahmed, et al.
Published: (2024)
Why Vision Language Models Struggle with Visual Arithmetic? Towards Enhanced Chart and Geometry Understanding
by: Huang, Kung-Hsiang, et al.
Published: (2025)
Test-Time Matching: Unlocking Compositional Reasoning in Multimodal Models
by: Zhu, Yinglun, et al.
Published: (2025)
Advancing Object Detection in Transportation with Multimodal Large Language Models (MLLMs): A Comprehensive Review and Empirical Testing
by: Ashqar, Huthaifa I., et al.
Published: (2024)
A Survey on Agentic Multimodal Large Language Models
by: Yao, Huanjin, et al.
Published: (2025)
VLSU: Mapping the Limits of Joint Multimodal Understanding for AI Safety
by: Palaskar, Shruti, et al.
Published: (2025)
A Survey on Benchmarks of Multimodal Large Language Models
by: Li, Jian, et al.
Published: (2024)
GRIT: Teaching MLLMs to Think with Images
by: Fan, Yue, et al.
Published: (2025)
MULTI: Multimodal Understanding Leaderboard with Text and Images
by: Zhu, Zichen, et al.
Published: (2024)
VIR-Bench: Evaluating Geospatial and Temporal Understanding of MLLMs via Travel Video Itinerary Reconstruction
by: Wang, Hao, et al.
Published: (2025)
Large Multimodal Agents: A Survey
by: Xie, Junlin, et al.
Published: (2024)
MMReason: An Open-Ended Multi-Modal Multi-Step Reasoning Benchmark for MLLMs Toward AGI
by: Yao, Huanjin, et al.
Published: (2025)
MANTA: Cross-Modal Semantic Alignment and Information-Theoretic Optimization for Long-form Multimodal Understanding
by: Zhong, Ziqi, et al.
Published: (2025)
PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain
by: Chen, Liang, et al.
Published: (2024)
SUDER: Self-Improving Unified Large Multimodal Models for Understanding and Generation with Dual Self-Rewards
by: Hong, Jixiang, et al.
Published: (2025)
On the Limits of Token Reduction for Efficient Unified Vision Language Training
by: Chen, Siyi, et al.
Published: (2026)
Real-Time Multimodal Cognitive Assistant for Emergency Medical Services
by: Weerasinghe, Keshara, et al.
Published: (2024)
Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs
by: Wang, Haochen, et al.
Published: (2025)
Cube Bench: A Benchmark for Spatial Visual Reasoning in MLLMs
by: Anand, Dhruv, et al.
Published: (2025)
AdaCodec: A Predictive Visual Code for Video MLLMs
by: Hou, Haowen, et al.
Published: (2026)
LEMON: How Well Do MLLMs Perform Temporal Multimodal Understanding on Instructional Videos?
by: Yu, Zhuang, et al.
Published: (2026)
SIMPLOT: Enhancing Chart Question Answering by Distilling Essentials
by: Kim, Wonjoong, et al.
Published: (2024)
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation
by: Ma, Yiyang, et al.
Published: (2024)