| Main Authors: | Dreyer, Florian, Kolos, Ekaterina, Matiash, Daria |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2503.01064 |
Similar Items
SciMDR: Advancing Scientific Multimodal Document Reasoning
by: Chen, Ziyu, et al.
Published: (2026)
MLLM-CompBench: A Comparative Reasoning Benchmark for Multimodal LLMs
by: Kil, Jihyung, et al.
Published: (2024)
Generative Universal Verifier as Multimodal Meta-Reasoner
by: Zhang, Xinchen, et al.
Published: (2025)
Multimodal LLMs Can Reason about Aesthetics in Zero-Shot
by: Jiang, Ruixiang, et al.
Published: (2025)
DaMO: A Data-Efficient Multimodal Orchestrator for Temporal Reasoning with Video LLMs
by: Chiu, Bo-Cheng, et al.
Published: (2025)
Cross-modal Information Flow in Multimodal Large Language Models
by: Zhang, Zhi, et al.
Published: (2024)
ScImage: How Good Are Multimodal Large Language Models at Scientific Text-to-Image Generation?
by: Zhang, Leixin, et al.
Published: (2024)
Multimodal LLMs as Customized Reward Models for Text-to-Image Generation
by: Zhou, Shijie, et al.
Published: (2025)
EasyGen: Easing Multimodal Generation with BiDiffuser and LLMs
by: Zhao, Xiangyu, et al.
Published: (2023)
PosterSum: A Multimodal Benchmark for Scientific Poster Summarization
by: Saxena, Rohit, et al.
Published: (2025)
SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers
by: Pramanick, Shraman, et al.
Published: (2024)
VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos
by: Song, Tingyu, et al.
Published: (2025)
MEXA: Towards General Multimodal Reasoning with Dynamic Multi-Expert Aggregation
by: Yu, Shoubin, et al.
Published: (2025)
Learning Human-Perceived Fakeness in AI-Generated Videos via Multimodal LLMs
by: Fu, Xingyu, et al.
Published: (2025)
How Multimodal LLMs Solve Image Tasks: A Lens on Visual Grounding, Task Reasoning, and Answer Decoding
by: Yu, Zhuoran, et al.
Published: (2025)
MMSci: A Dataset for Graduate-Level Multi-Discipline Multimodal Scientific Understanding
by: Li, Zekun, et al.
Published: (2024)
Do Large Multimodal Models Solve Caption Generation for Scientific Figures? Lessons Learned from SciCap Challenge 2023
by: Hsu, Ting-Yao E., et al.
Published: (2025)
Vision Enhancing LLMs: Empowering Multimodal Knowledge Storage and Sharing in LLMs
by: Li, Yunxin, et al.
Published: (2023)
CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation
by: Leng, Jixuan, et al.
Published: (2025)
Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs
by: Caffagni, Davide, et al.
Published: (2024)
Multimodal Chain-of-Thought Reasoning in Language Models
by: Zhang, Zhuosheng, et al.
Published: (2023)
Multimodal Fact-Level Attribution for Verifiable Reasoning
by: Wan, David, et al.
Published: (2026)
Analyzing Finetuning Representation Shift for Multimodal LLMs Steering
by: Khayatan, Pegah, et al.
Published: (2025)
SciVerse: Unveiling the Knowledge Comprehension and Visual Reasoning of LMMs on Multi-modal Scientific Problems
by: Guo, Ziyu, et al.
Published: (2025)
MCIF: Multimodal Crosslingual Instruction-Following Benchmark from Scientific Talks
by: Papi, Sara, et al.
Published: (2025)
LLMs Meet Multimodal Generation and Editing: A Survey
by: He, Yingqing, et al.
Published: (2024)
Exploring Compositional Generalization of Multimodal LLMs for Medical Imaging
by: Cai, Zhenyang, et al.
Published: (2024)
DentalGPT: Incentivizing Multimodal Complex Reasoning in Dentistry
by: Cai, Zhenyang, et al.
Published: (2025)
ResAdapt: Adaptive Resolution for Efficient Multimodal Reasoning
by: Liao, Huanxuan, et al.
Published: (2026)
Lost in Time: Clock and Calendar Understanding Challenges in Multimodal LLMs
by: Saxena, Rohit, et al.
Published: (2025)
MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs
by: Fu, Chaoyou, et al.
Published: (2024)
SpatialThinker: Reinforcing 3D Reasoning in Multimodal LLMs via Spatial Rewards
by: Batra, Hunar, et al.
Published: (2025)
MARBLE: A Hard Benchmark for Multimodal Spatial Reasoning and Planning
by: Jiang, Yulun, et al.
Published: (2025)
MATRIX: Multimodal Agent Tuning for Robust Tool-Use Reasoning
by: Ashraf, Tajamul, et al.
Published: (2025)
Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models
by: Wang, Yuqing, et al.
Published: (2023)
MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research
by: Burgess, James, et al.
Published: (2025)
Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives
by: Sarto, Sara, et al.
Published: (2025)
Leveraging Multimodal-LLMs Assisted by Instance Segmentation for Intelligent Traffic Monitoring
by: Onsu, Murat Arda, et al.
Published: (2025)
Unlearning Sensitive Information in Multimodal LLMs: Benchmark and Attack-Defense Evaluation
by: Patil, Vaidehi, et al.
Published: (2025)
Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions
by: Zhang, Jiarui, et al.
Published: (2024)