:: Library Catalog

Bibliographic Details
Main Authors:	Lamott, Marcel; Weweler, Yves-Noel; Ulges, Adrian; Shafait, Faisal; Krechel, Dirk; Obradovic, Darko
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Computer Vision and Pattern Recognition; Machine Learning
Online Access:	https://arxiv.org/abs/2402.09841

Similar Items

Infinity Parser: Layout Aware Reinforcement Learning for Scanned Document Parsing
by: Wang, Baode, et al.
Published: (2025)

Benchmarking Graph Neural Networks for Document Layout Analysis in Public Affairs
by: Lopez-Duran, Miguel, et al.
Published: (2025)

DocDjinn: Controllable Synthetic Document Generation with VLMs and Handwriting Diffusion
by: Lamott, Marcel, et al.
Published: (2026)

LayoutLLM: Large Language Model Instruction Tuning for Visually Rich Document Understanding
by: Fujitake, Masato
Published: (2024)

Leveraging Distillation Techniques for Document Understanding: A Case Study with FLAN-T5
by: Lamott, Marcel, et al.
Published: (2024)

Multilingual OCR-Aware Fine-Tuning and Prompt-Guided Chain-of-Thought Reasoning for Multimodal Large Language Models
by: Xu, Qinwu, et al.
Published: (2026)

SharpZO: Hybrid Sharpness-Aware Vision Language Model Prompt Tuning via Forward-Only Passes
by: Yang, Yifan, et al.
Published: (2025)

Patch-Prompt Aligned Bayesian Prompt Tuning for Vision-Language Models
by: Liu, Xinyang, et al.
Published: (2023)

PromptTA: Prompt-driven Text Adapter for Source-free Domain Generalization
by: Zhang, Haoran, et al.
Published: (2024)

Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation
by: Cho, Jaemin, et al.
Published: (2023)

Seeing Justice Clearly: Handwritten Legal Document Translation with OCR and Vision-Language Models
by: Nigam, Shubham Kumar, et al.
Published: (2025)

PQPP: A Joint Benchmark for Text-to-Image Prompt and Query Performance Prediction
by: Poesina, Eduard, et al.
Published: (2024)

Prompt as Free Lunch: Enhancing Diversity in Source-Free Cross-domain Few-shot Learning through Semantic-Guided Prompting
by: Zhuo, Linhai, et al.
Published: (2024)

Semantic Prompt Learning for Weakly-Supervised Semantic Segmentation
by: Lin, Ci-Siang, et al.
Published: (2024)

Answering Questions in Stages: Prompt Chaining for Contract QA
by: Roegiest, Adam, et al.
Published: (2024)

IPO: Interpretable Prompt Optimization for Vision-Language Models
by: Du, Yingjun, et al.
Published: (2024)

Sensitivity of Generative VLMs to Semantically and Lexically Altered Prompts
by: Dumpala, Sri Harsha, et al.
Published: (2024)

MoPD: Mixture-of-Prompts Distillation for Vision-Language Models
by: Chen, Yang, et al.
Published: (2024)

VPO: Aligning Text-to-Video Generation Models with Prompt Optimization
by: Cheng, Jiale, et al.
Published: (2025)

Optimizing GPT for Video Understanding: Zero-Shot Performance and Prompt Engineering
by: Beliaev, Mark, et al.
Published: (2025)

Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning
by: Jie, Shibo, et al.
Published: (2024)

One Category One Prompt: Dataset Distillation using Diffusion Models
by: Abbasi, Ali, et al.
Published: (2024)

Context-Aware Multimodal Pretraining
by: Roth, Karsten, et al.
Published: (2024)

Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models
by: Luo, Jun, et al.
Published: (2024)

LLM as a Complementary Optimizer to Gradient Descent: A Case Study in Prompt Tuning
by: Guo, Zixian, et al.
Published: (2024)

Training-Free Generation of Diverse and High-Fidelity Images via Prompt Semantic Space Optimization
by: Meng, Debin, et al.
Published: (2025)

VLMGuard-R1: Proactive Safety Alignment for VLMs via Reasoning-Driven Prompt Optimization
by: Chen, Menglan, et al.
Published: (2025)

Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering
by: Yu, Zhou, et al.
Published: (2023)

When Prompts Override Vision: Prompt-Induced Hallucinations in LVLMs
by: Khayatan, Pegah, et al.
Published: (2026)

Which Client is Reliable?: A Reliable and Personalized Prompt-based Federated Learning for Medical Image Question Answering
by: Zhu, He, et al.
Published: (2024)

Mitigating the Modality Gap: Few-Shot Out-of-Distribution Detection with Multi-modal Prototypes and Image Bias Estimation
by: Wang, Yimu, et al.
Published: (2025)

PALP: Prompt Aligned Personalization of Text-to-Image Models
by: Arar, Moab, et al.
Published: (2024)

Enhancing Post-Training Quantization via Future Activation Awareness
by: Lv, Zheqi, et al.
Published: (2026)

HATFormer: Historic Handwritten Arabic Text Recognition with Transformers
by: Chan, Adrian, et al.
Published: (2024)

DocAtlas: Multilingual Document Understanding Across 80+ Languages
by: Heakl, Ahmed, et al.
Published: (2026)

Robustness of Structured Data Extraction from Perspectively Distorted Documents
by: Nakada, Hyakka, et al.
Published: (2025)

PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs
by: Nasiriany, Soroush, et al.
Published: (2024)

Multimodal Adaptive Inference for Document Image Classification with Anytime Early Exiting
by: Hamed, Omar, et al.
Published: (2024)

Uncertainty-Aware Exploratory Direct Preference Optimization for Multimodal Large Language Models
by: Zhang, Huatian, et al.
Published: (2026)

Injecting Distributional Awareness into MLLMs via Reinforcement Learning for Deep Imbalanced Regression
by: Du, Yao, et al.
Published: (2026)