:: Library Catalog

Bibliographic Details
Main Authors:	Henkel, Owen; Roberts, Bill; Jaffe, Doug; Holt, Laurence
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2510.05538

Similar Items

Can AI Assistance Aid in the Grading of Handwritten Answer Sheets?
by: Sil, Pritam, et al.
Published: (2024)

Can Large Language Models Make the Grade? An Empirical Study Evaluating LLMs Ability to Mark Short Answer Questions in K-12 Education
by: Henkel, Owen, et al.
Published: (2024)

Beyond Seeing: Evaluating Multimodal LLMs on Tool-Enabled Image Perception, Transformation, and Reasoning
by: Guo, Xingang, et al.
Published: (2025)

EDU-CIRCUIT-HW: Evaluating Multimodal Large Language Models on Real-World University-Level STEM Student Handwritten Solutions
by: Sun, Weiyu, et al.
Published: (2026)

Do MLLMs Really See It: Reinforcing Visual Attention in Multimodal LLMs
by: Ou, Siqu, et al.
Published: (2026)

Seeing is Understanding: Unlocking Causal Attention into Modality-Mutual Attention for Multimodal LLMs
by: Wang, Wei-Yao, et al.
Published: (2025)

Handwritten Text Recognition: A Survey
by: Garrido-Munoz, Carlos, et al.
Published: (2025)

Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs
by: Roberts, Jonathan, et al.
Published: (2023)

Can't See the Forest for the Trees: Benchmarking Multimodal Safety Awareness for Multimodal LLMs
by: Wang, Wenxuan, et al.
Published: (2025)

Seeing Justice Clearly: Handwritten Legal Document Translation with OCR and Vision-Language Models
by: Nigam, Shubham Kumar, et al.
Published: (2025)

See Further, Think Deeper: Advancing VLM's Reasoning Ability with Low-level Visual Cues and Reflection
by: Wu, Zhiheng, et al.
Published: (2026)

Hallucination Behavior in Multimodal LLMs Across Agricultural Image Interpretation and Generation Tasks
by: Ghose, Partho, et al.
Published: (2026)

Can Multimodal LLMs See Materials Clearly? A Multimodal Benchmark on Materials Characterization
by: Lai, Zhengzhao, et al.
Published: (2025)

Seeing It or Not? Interpretable Vision-aware Latent Steering to Mitigate Object Hallucinations
by: Chen, Boxu, et al.
Published: (2025)

User-in-the-loop Evaluation of Multimodal LLMs for Activity Assistance
by: Verghese, Mrinal, et al.
Published: (2024)

Towards Scalable Training for Handwritten Mathematical Expression Recognition
by: Li, Haoyang, et al.
Published: (2025)

Blocks as Probes: Dissecting Categorization Ability of Large Multimodal Models
by: Fu, Bin, et al.
Published: (2024)

Multimodal Language Models See Better When They Look Shallower
by: Chen, Haoran, et al.
Published: (2025)

Enhancing Interpretability of Vertebrae Fracture Grading using Human-interpretable Prototypes
by: Sinhamahapatra, Poulami, et al.
Published: (2024)

VATr++: Choose Your Words Wisely for Handwritten Text Generation
by: Vanherle, Bram, et al.
Published: (2024)

LLaVA-Read: Enhancing Reading Ability of Multimodal Language Models
by: Zhang, Ruiyi, et al.
Published: (2024)

When VLMs 'Fix' Students: Identifying and Penalizing Over-Correction in the Evaluation of Multi-line Handwritten Math OCR
by: Seong, Jin, et al.
Published: (2026)

The Expense of Seeing: Attaining Trustworthy Multimodal Reasoning Within the Monolithic Paradigm
by: Goyal, Karan
Published: (2026)

Can Multimodal LLMs See Science Instruction? Benchmarking Pedagogical Reasoning in K-12 Classroom Videos
by: Shen, Yixuan, et al.
Published: (2026)

Beyond Diagnosis: Evaluating Multimodal LLMs for Pathology Localization in Chest Radiographs
by: Gosai, Advait, et al.
Published: (2025)

Evaluating the Impact of Post-Training Quantization on Reliable VQA with Multimodal LLMs
by: Kurz, Paul Jonas, et al.
Published: (2026)

MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs
by: Peng, Tianhao, et al.
Published: (2025)

WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs
by: Hong, Jack, et al.
Published: (2025)

Mask & Match: Learning to Recognize Handwritten Math with Self-Supervised Attention
by: Mitra, Shree, et al.
Published: (2025)

An End-to-End, Segmentation-Free, Arabic Handwritten Recognition Model on KHATT
by: Aabed, Sondos, et al.
Published: (2024)

Bidirectional Trained Tree-Structured Decoder for Handwritten Mathematical Expression Recognition
by: Cheng, Hanbo, et al.
Published: (2023)

LocateBench: Evaluating the Locating Ability of Vision Language Models
by: Chiang, Ting-Rui, et al.
Published: (2024)

See What You Are Told: Visual Attention Sink in Large Multimodal Models
by: Kang, Seil, et al.
Published: (2025)

See&Trek: Training-Free Spatial Prompting for Multimodal Large Language Model
by: Li, Pengteng, et al.
Published: (2025)

Can MLLMs Read Students' Minds? Unpacking Multimodal Error Analysis in Handwritten Math
by: Song, Dingjie, et al.
Published: (2026)

Robust Multimodal Learning for Ophthalmic Disease Grading via Disentangled Representation
by: Wang, Xinkun, et al.
Published: (2025)

Can LLMs Grade Short-Answer Reading Comprehension Questions : An Empirical Study with a Novel Dataset
by: Henkel, Owen, et al.
Published: (2023)

Learning to Align: Addressing Character Frequency Distribution Shifts in Handwritten Text Recognition
by: Kaliosis, Panagiotis, et al.
Published: (2025)

Online Handwritten Signature Verification Based on Temporal-Spatial Graph Attention Transformer
by: Yuan, Hai-jie, et al.
Published: (2025)

Judge a Book by its Cover: Investigating Multi-Modal LLMs for Multi-Page Handwritten Document Transcription
by: Gutteridge, Benjamin, et al.
Published: (2025)