| Main Authors: | Steinberg, Jonathan, Gal, Oren |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2602.22918 |
Similar Items
Typhoon OCR: Open Vision-Language Model For Thai Document Extraction
by: Nonesung, Surapon, et al.
Published: (2026)
olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models
by: Poznanski, Jake, et al.
Published: (2025)
CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding
by: Shi, Yuling, et al.
Published: (2026)
Baseer: A Vision-Language Model for Arabic Document-to-Markdown OCR
by: Hennara, Khalil, et al.
Published: (2025)
Efficient Inference for Large Vision-Language Models: Bottlenecks, Techniques, and Prospects
by: Zhang, Jun, et al.
Published: (2026)
Benchmarking Vision-Language Models on Chinese Ancient Documents: From OCR to Knowledge Reasoning
by: Yu, Haiyang, et al.
Published: (2025)
Diagnosing Bottlenecks in Data Visualization Understanding by Vision-Language Models
by: Tartaglini, Alexa R., et al.
Published: (2025)
Rethinking Bottlenecks in Safety Fine-Tuning of Vision Language Models
by: Ding, Yi, et al.
Published: (2025)
Where Do Self-Supervised Speech Models Become Unfair?
by: Herron, Felix, et al.
Published: (2026)
MOSAIC-Bench: Measuring Compositional Vulnerability Induction in Coding Agents
by: Steinberg, Jonathan, et al.
Published: (2026)
Semantic Denial of Service in LLM-controlled robots
by: Steinberg, Jonathan, et al.
Published: (2026)
Adaptive Vision-Language Model Routing for Computer Use Agents
by: Liu, Xunzhuo, et al.
Published: (2026)
The Illusion-Illusion: Vision Language Models See Illusions Where There are None
by: Ullman, Tomer
Published: (2024)
Where do Large Vision-Language Models Look at when Answering Questions?
by: Xing, Xiaoying, et al.
Published: (2025)
Seeing Justice Clearly: Handwritten Legal Document Translation with OCR and Vision-Language Models
by: Nigam, Shubham Kumar, et al.
Published: (2025)
GutenOCR: A Grounded Vision-Language Front-End for Documents
by: Heidenreich, Hunter, et al.
Published: (2026)
Text Prompt Injection of Vision Language Models
by: Zhu, Ruizhe
Published: (2025)
Probing the Prompt KV Cache: Where It Becomes Dispensable
by: Kumar, Vinayshekhar Bannihatti, et al.
Published: (2026)
Improving OCR for Historical Texts of Multiple Languages
by: Westerdijk, Hylke, et al.
Published: (2025)
Scrambled text: training Language Models to correct OCR errors using synthetic data
by: Bourne, Jonathan
Published: (2024)
Reading or Guessing? Visual Grounding Failures of Vision-Language Models for OCR in Ancient Greek Editions
by: Karamolegkou, Antonia, et al.
Published: (2026)
CLOCR-C: Context Leveraging OCR Correction with Pre-trained Language Models
by: Bourne, Jonathan
Published: (2024)
VTCBench: Can Vision-Language Models Understand Long Context with Vision-Text Compression?
by: Zhao, Hongbo, et al.
Published: (2025)
Can Vision Replace Text in Working Memory? Evidence from Spatial n-Back in Vision-Language Models
by: Liang, Sichu, et al.
Published: (2026)
VL-RouterBench: A Benchmark for Vision-Language Model Routing
by: Huang, Zhehao, et al.
Published: (2025)
Superscopes: Amplifying Internal Feature Representations for Language Model Interpretation
by: Jacobi, Jonathan, et al.
Published: (2025)
Words or Vision: Do Vision-Language Models Have Blind Faith in Text?
by: Deng, Ailin, et al.
Published: (2025)
Arabic-Nougat: Fine-Tuning Vision Transformers for Arabic OCR and Markdown Extraction
by: Rashad, Mohamed
Published: (2024)
From Language To Vision: A Case Study of Text Animation
by: Chen, Ping, et al.
Published: (2025)
Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks
by: Jia, Mengzhao, et al.
Published: (2024)
Mostly Text, Smart Visuals: Asymmetric Text-Visual Pruning for Large Vision-Language Models
by: Li, Sijie, et al.
Published: (2026)
How Does Vision-Language Adaptation Impact the Safety of Vision Language Models?
by: Lee, Seongyun, et al.
Published: (2024)
Can Vision Language Models Understand Mimed Actions?
by: Cho, Hyundong, et al.
Published: (2025)
Learning to Read Where to Look: Disease-Aware Vision-Language Pretraining for 3D CT
by: Ging, Simon, et al.
Published: (2026)
Vision Language Models Map Logos to Text via Semantic Entanglement in the Visual Projector
by: Li, Sifan, et al.
Published: (2025)
Prometheus-Vision: Vision-Language Model as a Judge for Fine-Grained Evaluation
by: Lee, Seongyun, et al.
Published: (2024)
VEGA: Learning Interleaved Image-Text Comprehension in Vision-Language Large Models
by: Zhou, Chenyu, et al.
Published: (2024)
Empty Shelves or Lost Keys? Recall Is the Bottleneck for Parametric Factuality
by: Calderon, Nitay, et al.
Published: (2026)
Taiyi-Diffusion-XL: Advancing Bilingual Text-to-Image Generation with Large Vision-Language Model Support
by: Wu, Xiaojun, et al.
Published: (2024)
Text-Printed Image: Bridging the Image-Text Modality Gap for Text-centric Training of Large Vision-Language Models
by: Yamabe, Shojiro, et al.
Published: (2025)