| Main Authors: | Wei, Haoran, Sun, Yaofeng, Li, Yukun |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.20552 |
Similar Items
DeepSeek-OCR: Contexts Optical Compression
by: Wei, Haoran, et al.
Published: (2025)
Visual Merit or Linguistic Crutch? A Close Look at DeepSeek-OCR
by: Liang, Yunhao, et al.
Published: (2026)
Fine-tuning DeepSeek-OCR-2 for Molecular Structure Recognition
by: Tang, Haocheng, et al.
Published: (2026)
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
by: Wu, Zhiyu, et al.
Published: (2024)
DeepSeek on a Trip: Inducing Targeted Visual Hallucinations via Representation Vulnerabilities
by: Islam, Chashi Mahiul, et al.
Published: (2025)
Can LLMs Assist Computer Education? an Empirical Case Study of DeepSeek
by: Xiao, Dongfu, et al.
Published: (2025)
Seek-CAD: A Self-refined Generative Modeling for 3D Parametric CAD Using Local Inference via DeepSeek
by: Li, Xueyang, et al.
Published: (2025)
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
by: Wei, Haoran, et al.
Published: (2024)
From ChatGPT to DeepSeek AI: A Comprehensive Analysis of Evolution, Deviation, and Future Implications in AI-Language Models
by: Singh, Simrandeep, et al.
Published: (2025)
MHA2MLA-VLM: Enabling DeepSeek's Economical Multi-Head Latent Attention across Vision-Language Models
by: Fan, Xiaoran, et al.
Published: (2026)
Can DeepSeek Reason Like a Surgeon? An Empirical Evaluation for Vision-Language Understanding in Robotic-Assisted Surgery
by: Ma, Boyi, et al.
Published: (2025)
DeepSeek-Inspired Exploration of RL-based LLMs and Synergy with Wireless Networks: A Survey
by: Qiao, Yu, et al.
Published: (2025)
Flow4Agent: Long-form Video Understanding via Motion Prior from Optical Flow
by: Liu, Ruyang, et al.
Published: (2025)
Comparative Analysis of OpenAI GPT-4o and DeepSeek R1 for Scientific Text Categorization Using Prompt Engineering
by: Maiti, Aniruddha, et al.
Published: (2025)
OmniForcing: Unleashing Real-time Joint Audio-Visual Generation
by: Su, Yaofeng, et al.
Published: (2026)
When Good OCR Is Not Enough: Benchmarking OCR Robustness for Retrieval-Augmented Generation
by: Sun, Lin, et al.
Published: (2026)
olmOCR 2: Unit Test Rewards for Document OCR
by: Poznanski, Jake, et al.
Published: (2025)
Hierarchical Visual Feature Aggregation for OCR-Free Document Understanding
by: Park, Jaeyoo, et al.
Published: (2024)
OCR-Agent: Agentic OCR with Capability and Memory Reflection
by: Wen, Shimin, et al.
Published: (2026)
OmniOCR: Generalist OCR for Ethnic Minority Languages
by: Liu, Bonan, et al.
Published: (2026)
Q-Mask: Query-driven Causal Masks for Text Anchoring in OCR-Oriented Vision-Language Models
by: Xu, Longwei, et al.
Published: (2026)
KG-ViP: Bridging Knowledge Grounding and Visual Perception in Multi-modal LLMs for Visual Question Answering
by: Li, Zhiyang, et al.
Published: (2026)
FinCriticalED: A Visual Benchmark for Financial Fact-Level OCR
by: He, Yueru, et al.
Published: (2025)
OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation
by: Zhang, Junyuan, et al.
Published: (2024)
ABot-OCR Technical Report
by: Jiang, Kaitao, et al.
Published: (2026)
EFF-Grasp: Energy-Field Flow Matching for Physics-Aware Dexterous Grasp Generation
by: Zhao, Yukun, et al.
Published: (2026)
MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios
by: Shi, Yang, et al.
Published: (2025)
Seeking and Updating with Live Visual Knowledge
by: Fu, Mingyang, et al.
Published: (2025)
Ocean-OCR: Towards General OCR Application via a Vision-Language Model
by: Chen, Song, et al.
Published: (2025)
Context-Independent OCR with Multimodal LLMs: Effects of Image Resolution and Visual Complexity
by: Inoue, Kotaro
Published: (2025)
DEVICE: Depth and Visual Concepts Aware Transformer for OCR-based Image Captioning
by: Xu, Dongsheng, et al.
Published: (2023)
FireRed-OCR Technical Report
by: Wu, Hao, et al.
Published: (2026)
Adversarial Training with OCR Modality Perturbation for Scene-Text Visual Question Answering
by: Shen, Zhixuan, et al.
Published: (2024)
Agentar-Fin-OCR
by: Qian, Siyi, et al.
Published: (2026)
CC-OCR: A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy
by: Yang, Zhibo, et al.
Published: (2024)
Causal Transformer for Fusion and Pose Estimation in Deep Visual Inertial Odometry
by: Kurt, Yunus Bilge, et al.
Published: (2024)
An Automated Deep Segmentation and Spatial-Statistics Approach for Post-Blast Rock Fragmentation Assessment
by: Yang, Yukun
Published: (2025)
The Devil is in the Details -- From OCR for Old Church Slavonic to Purely Visual Stemma Reconstruction
by: Hoenen, Armin
Published: (2026)
Counterfeit Answers: Adversarial Forgery against OCR-Free Document Visual Question Answering
by: Pintore, Marco, et al.
Published: (2025)
Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization
by: Sargent, Kyle, et al.
Published: (2025)