| Main Authors: | Gutteridge, Benjamin, Jackson, Matthew Thomas, Kukurin, Toni, Dong, Xiaowen |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.20295 |
Similar Items
SimpleDoc: Multi-Modal Document Understanding with Dual-Cue Page Retrieval and Iterative Refinement
by: Jain, Chelsi, et al.
Published: (2025)
ProJudge: A Multi-Modal Multi-Discipline Benchmark and Instruction-Tuning Dataset for MLLM-based Process Judges
by: Ai, Jiaxin, et al.
Published: (2025)
InkFM: A Foundational Model for Full-Page Online Handwritten Note Understanding
by: Fadeeva, Anastasiia, et al.
Published: (2025)
Robustness Evaluation of OCR-based Visual Document Understanding under Multi-Modal Adversarial Attacks
by: Tien, Dong Nguyen, et al.
Published: (2025)
Seeing the Big Picture: Evaluating Multimodal LLMs' Ability to Interpret and Grade Handwritten Student Work
by: Henkel, Owen, et al.
Published: (2025)
Calibrating Uncertainty Quantification of Multi-Modal LLMs using Grounding
by: Padhi, Trilok, et al.
Published: (2025)
Handwritten Text Recognition: A Survey
by: Garrido-Munoz, Carlos, et al.
Published: (2025)
End-to-End Multi-Modal Diffusion Mamba
by: Lu, Chunhao, et al.
Published: (2025)
DohaScript: A Large-Scale Multi-Writer Dataset for Continuous Handwritten Hindi Text
by: Singh, Kunwar Arpit, et al.
Published: (2026)
DocR1: Evidence Page-Guided GRPO for Multi-Page Document Understanding
by: Xiong, Junyu, et al.
Published: (2025)
Use of Metric Learning for the Recognition of Handwritten Digits, and its Application to Increase the Outreach of Voice-based Communication Platforms
by: Pant, Devesh, et al.
Published: (2025)
Hydra-Bench: A Benchmark for Multi-Modal Leaf Wetness Sensing
by: Liu, Yimeng, et al.
Published: (2025)
Beyond CNNs: Efficient Fine-Tuning of Multi-Modal LLMs for Object Detection on Low-Data Regimes
by: Elamon, Nirmal, et al.
Published: (2025)
PLATTER: A Page-Level Handwritten Text Recognition System for Indic Scripts
by: Kasuba, Badri Vishal, et al.
Published: (2025)
Multi-Modal Foundation Models for Computational Pathology: A Survey
by: Li, Dong, et al.
Published: (2025)
MMPB: It's Time for Multi-Modal Personalization
by: Kim, Jaeik, et al.
Published: (2025)
ProReason: Multi-Modal Proactive Reasoning with Decoupled Eyesight and Wisdom
by: Zhou, Jingqi, et al.
Published: (2024)
AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning
by: Zhong, Yiwu, et al.
Published: (2024)
PAD: Phase-Amplitude Decoupling Fusion for Multi-Modal Land Cover Classification
by: Zheng, Huiling, et al.
Published: (2025)
M3D-BFS: a Multi-stage Dynamic Fusion Strategy for Sample-Adaptive Multi-Modal Brain Network Analysis
by: Dong, Rui, et al.
Published: (2026)
MM-R5: MultiModal Reasoning-Enhanced ReRanker via Reinforcement Learning for Document Retrieval
by: Xu, Mingjun, et al.
Published: (2025)
Hydra: Accurate Multi-Modal Leaf Wetness Sensing with mm-Wave and Camera Fusion
by: Liu, Yimeng, et al.
Published: (2025)
Towards Scalable Training for Handwritten Mathematical Expression Recognition
by: Li, Haoyang, et al.
Published: (2025)
MultiOOD: Scaling Out-of-Distribution Detection for Multiple Modalities
by: Dong, Hao, et al.
Published: (2024)
MMHMER:Multi-viewer and Multi-task for Handwritten Mathematical Expression Recognition
by: Chen, Kehua, et al.
Published: (2025)
μgat: Improving Single-Page Document Parsing by Providing Multi-Page Context
by: Quattrini, Fabio, et al.
Published: (2024)
Multi-Modal interpretable automatic video captioning
by: Hanna-Asaad, Antoine, et al.
Published: (2024)
Learning Progressive Adaptation for Multi-Modal Tracking
by: Wang, He, et al.
Published: (2026)
A Unified Model for Longitudinal Multi-Modal Multi-View Prediction with Missingness
by: Chen, Boqi, et al.
Published: (2024)
DPO Learning with LLMs-Judge Signal for Computer Use Agents
by: Luo, Man, et al.
Published: (2025)
VATr++: Choose Your Words Wisely for Handwritten Text Generation
by: Vanherle, Bram, et al.
Published: (2024)
Can AI Assistance Aid in the Grading of Handwritten Answer Sheets?
by: Sil, Pritam, et al.
Published: (2024)
MMXU: A Multi-Modal and Multi-X-ray Understanding Dataset for Disease Progression
by: Mu, Linjie, et al.
Published: (2025)
Enhanced OoD Detection through Cross-Modal Alignment of Multi-Modal Representations
by: Kim, Jeonghyeon, et al.
Published: (2025)
MammothModa: Multi-Modal Large Language Model
by: She, Qi, et al.
Published: (2024)
Multi-Prompt with Depth Partitioned Cross-Modal Learning
by: Tian, Yingjie, et al.
Published: (2023)
Unveiling Ontological Commitment in Multi-Modal Foundation Models
by: Keser, Mert, et al.
Published: (2024)
A Generalized Multi-Modal Fusion Detection Framework
by: Cui, Leichao, et al.
Published: (2023)
AVIR: Adaptive Visual In-Document Retrieval for Efficient Multi-Page Document Question Answering
by: Li, Zongmin, et al.
Published: (2026)
Multi-modal Generative AI: Multi-modal LLMs, Diffusions, and the Unification
by: Wang, Xin, et al.
Published: (2024)