| Main Authors: | Lamott, Marcel; Weweler, Yves-Noel; Ulges, Adrian; Shafait, Faisal; Krechel, Dirk; Obradovic, Darko |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.09841 |
Similar Items
Infinity Parser: Layout Aware Reinforcement Learning for Scanned Document Parsing
by: Wang, Baode, et al.
Published: (2025)
Benchmarking Graph Neural Networks for Document Layout Analysis in Public Affairs
by: Lopez-Duran, Miguel, et al.
Published: (2025)
DocDjinn: Controllable Synthetic Document Generation with VLMs and Handwriting Diffusion
by: Lamott, Marcel, et al.
Published: (2026)
LayoutLLM: Large Language Model Instruction Tuning for Visually Rich Document Understanding
by: Fujitake, Masato
Published: (2024)
Leveraging Distillation Techniques for Document Understanding: A Case Study with FLAN-T5
by: Lamott, Marcel, et al.
Published: (2024)
Multilingual OCR-Aware Fine-Tuning and Prompt-Guided Chain-of-Thought Reasoning for Multimodal Large Language Models
by: Xu, Qinwu, et al.
Published: (2026)
SharpZO: Hybrid Sharpness-Aware Vision Language Model Prompt Tuning via Forward-Only Passes
by: Yang, Yifan, et al.
Published: (2025)
Patch-Prompt Aligned Bayesian Prompt Tuning for Vision-Language Models
by: Liu, Xinyang, et al.
Published: (2023)
PromptTA: Prompt-driven Text Adapter for Source-free Domain Generalization
by: Zhang, Haoran, et al.
Published: (2024)
Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation
by: Cho, Jaemin, et al.
Published: (2023)
Seeing Justice Clearly: Handwritten Legal Document Translation with OCR and Vision-Language Models
by: Nigam, Shubham Kumar, et al.
Published: (2025)
PQPP: A Joint Benchmark for Text-to-Image Prompt and Query Performance Prediction
by: Poesina, Eduard, et al.
Published: (2024)
Prompt as Free Lunch: Enhancing Diversity in Source-Free Cross-domain Few-shot Learning through Semantic-Guided Prompting
by: Zhuo, Linhai, et al.
Published: (2024)
Semantic Prompt Learning for Weakly-Supervised Semantic Segmentation
by: Lin, Ci-Siang, et al.
Published: (2024)
Answering Questions in Stages: Prompt Chaining for Contract QA
by: Roegiest, Adam, et al.
Published: (2024)
IPO: Interpretable Prompt Optimization for Vision-Language Models
by: Du, Yingjun, et al.
Published: (2024)
Sensitivity of Generative VLMs to Semantically and Lexically Altered Prompts
by: Dumpala, Sri Harsha, et al.
Published: (2024)
MoPD: Mixture-of-Prompts Distillation for Vision-Language Models
by: Chen, Yang, et al.
Published: (2024)
VPO: Aligning Text-to-Video Generation Models with Prompt Optimization
by: Cheng, Jiale, et al.
Published: (2025)
Optimizing GPT for Video Understanding: Zero-Shot Performance and Prompt Engineering
by: Beliaev, Mark, et al.
Published: (2025)
Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning
by: Jie, Shibo, et al.
Published: (2024)
One Category One Prompt: Dataset Distillation using Diffusion Models
by: Abbasi, Ali, et al.
Published: (2024)
Context-Aware Multimodal Pretraining
by: Roth, Karsten, et al.
Published: (2024)
Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models
by: Luo, Jun, et al.
Published: (2024)
LLM as a Complementary Optimizer to Gradient Descent: A Case Study in Prompt Tuning
by: Guo, Zixian, et al.
Published: (2024)
Training-Free Generation of Diverse and High-Fidelity Images via Prompt Semantic Space Optimization
by: Meng, Debin, et al.
Published: (2025)
VLMGuard-R1: Proactive Safety Alignment for VLMs via Reasoning-Driven Prompt Optimization
by: Chen, Menglan, et al.
Published: (2025)
Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering
by: Yu, Zhou, et al.
Published: (2023)
When Prompts Override Vision: Prompt-Induced Hallucinations in LVLMs
by: Khayatan, Pegah, et al.
Published: (2026)
Which Client is Reliable?: A Reliable and Personalized Prompt-based Federated Learning for Medical Image Question Answering
by: Zhu, He, et al.
Published: (2024)
Mitigating the Modality Gap: Few-Shot Out-of-Distribution Detection with Multi-modal Prototypes and Image Bias Estimation
by: Wang, Yimu, et al.
Published: (2025)
PALP: Prompt Aligned Personalization of Text-to-Image Models
by: Arar, Moab, et al.
Published: (2024)
Enhancing Post-Training Quantization via Future Activation Awareness
by: Lv, Zheqi, et al.
Published: (2026)
HATFormer: Historic Handwritten Arabic Text Recognition with Transformers
by: Chan, Adrian, et al.
Published: (2024)
DocAtlas: Multilingual Document Understanding Across 80+ Languages
by: Heakl, Ahmed, et al.
Published: (2026)
Robustness of Structured Data Extraction from Perspectively Distorted Documents
by: Nakada, Hyakka, et al.
Published: (2025)
PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs
by: Nasiriany, Soroush, et al.
Published: (2024)
Multimodal Adaptive Inference for Document Image Classification with Anytime Early Exiting
by: Hamed, Omar, et al.
Published: (2024)
Uncertainty-Aware Exploratory Direct Preference Optimization for Multimodal Large Language Models
by: Zhang, Huatian, et al.
Published: (2026)
Injecting Distributional Awareness into MLLMs via Reinforcement Learning for Deep Imbalanced Regression
by: Du, Yao, et al.
Published: (2026)