| Main Authors: | Henkel, Owen, Roberts, Bill, Jaffe, Doug, Holt, Laurence |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.05538 |
Similar Items
Can AI Assistance Aid in the Grading of Handwritten Answer Sheets?
by: Sil, Pritam, et al.
Published: (2024)
Can Large Language Models Make the Grade? An Empirical Study Evaluating LLMs Ability to Mark Short Answer Questions in K-12 Education
by: Henkel, Owen, et al.
Published: (2024)
Beyond Seeing: Evaluating Multimodal LLMs on Tool-Enabled Image Perception, Transformation, and Reasoning
by: Guo, Xingang, et al.
Published: (2025)
EDU-CIRCUIT-HW: Evaluating Multimodal Large Language Models on Real-World University-Level STEM Student Handwritten Solutions
by: Sun, Weiyu, et al.
Published: (2026)
Do MLLMs Really See It: Reinforcing Visual Attention in Multimodal LLMs
by: Ou, Siqu, et al.
Published: (2026)
Seeing is Understanding: Unlocking Causal Attention into Modality-Mutual Attention for Multimodal LLMs
by: Wang, Wei-Yao, et al.
Published: (2025)
Handwritten Text Recognition: A Survey
by: Garrido-Munoz, Carlos, et al.
Published: (2025)
Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs
by: Roberts, Jonathan, et al.
Published: (2023)
Can't See the Forest for the Trees: Benchmarking Multimodal Safety Awareness for Multimodal LLMs
by: Wang, Wenxuan, et al.
Published: (2025)
Seeing Justice Clearly: Handwritten Legal Document Translation with OCR and Vision-Language Models
by: Nigam, Shubham Kumar, et al.
Published: (2025)
See Further, Think Deeper: Advancing VLM's Reasoning Ability with Low-level Visual Cues and Reflection
by: Wu, Zhiheng, et al.
Published: (2026)
Hallucination Behavior in Multimodal LLMs Across Agricultural Image Interpretation and Generation Tasks
by: Ghose, Partho, et al.
Published: (2026)
Can Multimodal LLMs See Materials Clearly? A Multimodal Benchmark on Materials Characterization
by: Lai, Zhengzhao, et al.
Published: (2025)
Seeing It or Not? Interpretable Vision-aware Latent Steering to Mitigate Object Hallucinations
by: Chen, Boxu, et al.
Published: (2025)
User-in-the-loop Evaluation of Multimodal LLMs for Activity Assistance
by: Verghese, Mrinal, et al.
Published: (2024)
Towards Scalable Training for Handwritten Mathematical Expression Recognition
by: Li, Haoyang, et al.
Published: (2025)
Blocks as Probes: Dissecting Categorization Ability of Large Multimodal Models
by: Fu, Bin, et al.
Published: (2024)
Multimodal Language Models See Better When They Look Shallower
by: Chen, Haoran, et al.
Published: (2025)
Enhancing Interpretability of Vertebrae Fracture Grading using Human-interpretable Prototypes
by: Sinhamahapatra, Poulami, et al.
Published: (2024)
VATr++: Choose Your Words Wisely for Handwritten Text Generation
by: Vanherle, Bram, et al.
Published: (2024)
LLaVA-Read: Enhancing Reading Ability of Multimodal Language Models
by: Zhang, Ruiyi, et al.
Published: (2024)
When VLMs 'Fix' Students: Identifying and Penalizing Over-Correction in the Evaluation of Multi-line Handwritten Math OCR
by: Seong, Jin, et al.
Published: (2026)
The Expense of Seeing: Attaining Trustworthy Multimodal Reasoning Within the Monolithic Paradigm
by: Goyal, Karan
Published: (2026)
Can Multimodal LLMs See Science Instruction? Benchmarking Pedagogical Reasoning in K-12 Classroom Videos
by: Shen, Yixuan, et al.
Published: (2026)
Beyond Diagnosis: Evaluating Multimodal LLMs for Pathology Localization in Chest Radiographs
by: Gosai, Advait, et al.
Published: (2025)
Evaluating the Impact of Post-Training Quantization on Reliable VQA with Multimodal LLMs
by: Kurz, Paul Jonas, et al.
Published: (2026)
MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs
by: Peng, Tianhao, et al.
Published: (2025)
WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs
by: Hong, Jack, et al.
Published: (2025)
Mask & Match: Learning to Recognize Handwritten Math with Self-Supervised Attention
by: Mitra, Shree, et al.
Published: (2025)
An End-to-End, Segmentation-Free, Arabic Handwritten Recognition Model on KHATT
by: Aabed, Sondos, et al.
Published: (2024)
Bidirectional Trained Tree-Structured Decoder for Handwritten Mathematical Expression Recognition
by: Cheng, Hanbo, et al.
Published: (2023)
LocateBench: Evaluating the Locating Ability of Vision Language Models
by: Chiang, Ting-Rui, et al.
Published: (2024)
See What You Are Told: Visual Attention Sink in Large Multimodal Models
by: Kang, Seil, et al.
Published: (2025)
See&Trek: Training-Free Spatial Prompting for Multimodal Large Language Model
by: Li, Pengteng, et al.
Published: (2025)
Can MLLMs Read Students' Minds? Unpacking Multimodal Error Analysis in Handwritten Math
by: Song, Dingjie, et al.
Published: (2026)
Robust Multimodal Learning for Ophthalmic Disease Grading via Disentangled Representation
by: Wang, Xinkun, et al.
Published: (2025)
Can LLMs Grade Short-Answer Reading Comprehension Questions: An Empirical Study with a Novel Dataset
by: Henkel, Owen, et al.
Published: (2023)
Learning to Align: Addressing Character Frequency Distribution Shifts in Handwritten Text Recognition
by: Kaliosis, Panagiotis, et al.
Published: (2025)
Online Handwritten Signature Verification Based on Temporal-Spatial Graph Attention Transformer
by: Yuan, Hai-jie, et al.
Published: (2025)
Judge a Book by its Cover: Investigating Multi-Modal LLMs for Multi-Page Handwritten Document Transcription
by: Gutteridge, Benjamin, et al.
Published: (2025)