| Main Authors: | Horn, Pius; Keuper, Janis |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2603.18652 |
Similar Items
Benchmarking Document Parsers on Mathematical Formula Extraction from PDFs
by: Horn, Pius, et al.
Published: (2025)
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations
by: Ouyang, Linke, et al.
Published: (2024)
TalentMine: LLM-Based Extraction and Question-Answering from Multimodal Talent Tables
by: Mannam, Varun, et al.
Published: (2025)
Beyond Unimodal Boundaries: Generative Recommendation with Multimodal Semantics
by: Zhu, Jing, et al.
Published: (2025)
PhotoBench: Beyond Visual Matching Towards Personalized Intent-Driven Photo Retrieval
by: Xu, Tianyi, et al.
Published: (2026)
Beyond Final Answers: CRYSTAL Benchmark for Transparent Multimodal Reasoning Evaluation
by: Barrios, Wayner, et al.
Published: (2026)
NovaLAD: A Fast, CPU-Optimized Document Extraction Pipeline for Generative AI and Data Intelligence
by: Ulla, Aman
Published: (2026)
HotelMatch-LLM: Joint Multi-Task Training of Small and Large Language Models for Efficient Multimodal Hotel Retrieval
by: Askari, Arian, et al.
Published: (2025)
RE-TRIANGLE: Does TRIANGLE Enable Multimodal Alignment Beyond Cosine Similarity in Retrieval?
by: Ghosh, Arijit, et al.
Published: (2026)
Leveraging Customer Feedback for Multi-modal Insight Extraction
by: Mukku, Sandeep Sricharan, et al.
Published: (2024)
EfficientPosterGen: Semantic-aware Efficient Poster Generation via Token Compression and Accurate Violation Detection
by: Tang, Wenxin, et al.
Published: (2026)
RAPTOR: Refined Approach for Product Table Object Recognition
by: Thomas, Eliott, et al.
Published: (2025)
INQUIRE: A Natural World Text-to-Image Retrieval Benchmark
by: Vendrow, Edward, et al.
Published: (2024)
Learning Visual Composition through Improved Semantic Guidance
by: Stone, Austin, et al.
Published: (2024)
ViBERTgrid BiLSTM-CRF: Multimodal Key Information Extraction from Unstructured Financial Documents
by: Pala, Furkan, et al.
Published: (2024)
Character-based Outfit Generation with Vision-augmented Style Extraction via LLMs
by: Forouzandehmehr, Najmeh, et al.
Published: (2024)
Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation
by: Wen, Tiansheng, et al.
Published: (2025)
Generative Cross-Modal Retrieval: Memorizing Images in Multimodal Language Models for Retrieval and Beyond
by: Li, Yongqi, et al.
Published: (2024)
EIVEN: Efficient Implicit Attribute Value Extraction using Multimodal LLM
by: Zou, Henry Peng, et al.
Published: (2024)
A Picture is Worth a Thousand Words? An Empirical Study of Aggregation Strategies for Visual Financial Document Retrieval
by: Lim, Ho Hung, et al.
Published: (2026)
Boosting Document Parsing Efficiency and Performance with Coarse-to-Fine Visual Processing
by: Cui, Cheng, et al.
Published: (2026)
From Historical Tabular Image to Knowledge Graphs: A Provenance-Aware Modular Pipeline
by: Shoilee, Sarah Binta Alam, et al.
Published: (2026)
Pailitao-VL: Unified Embedding and Reranker for Real-Time Multi-Modal Industrial Search
by: Chen, Lei, et al.
Published: (2026)
Your Embedding Model is SMARTer Than You Think
by: Zhang, Jianrui, et al.
Published: (2026)
Efficient Logic Gate Networks for Video Copy Detection
by: Fojcik, Katarzyna
Published: (2026)
Hierarchical Long Video Understanding with Audiovisual Entity Cohesion and Agentic Search
by: Yin, Xinlei, et al.
Published: (2026)
Good Scores, Bad Data: A Metric for Multimodal Coherence
by: Srinivasan, Vasundra
Published: (2026)
ITSELF: Attention Guided Fine-Grained Alignment for Vision-Language Retrieval
by: Nguyen, Tien-Huy, et al.
Published: (2026)
Multi-task Cross-modal Learning for Chest X-ray Image Retrieval
by: Liang, Zhaohui, et al.
Published: (2026)
PEARL: Personalized Streaming Video Understanding Model
by: Zheng, Yuanhong, et al.
Published: (2026)
A comprehensive multimodal dataset and benchmark for ulcerative colitis scoring in endoscopy
by: Ghatwary, Noha, et al.
Published: (2026)
CRISP -- Clustering-Based Redundancy-Reduced Instance Sampling for Pathology Case Representation and Retrieval
by: Afzal, Zahra Rahimi, et al.
Published: (2026)
Retrieval-Guided Generation for Safer Histopathology Image Captioning
by: Hoq, Md. Enamul, et al.
Published: (2026)
Open-SAT: LLM-Guided Query Embedding Refinement for Open-Vocabulary Object Retrieval in Satellite Imagery
by: Arefeen, Md Adnan, et al.
Published: (2026)
Benchmark Granularity and Model Robustness for Image-Text Retrieval
by: Hendriksen, Mariya, et al.
Published: (2024)
Smart Routing for Multimodal Video Retrieval: When to Search What
by: Rosa, Kevin Dela
Published: (2025)
Attribute-Aware Implicit Modality Alignment for Text Attribute Person Search
by: Wang, Xin, et al.
Published: (2024)
Weakly Supervised Deep Hyperspherical Quantization for Image Retrieval
by: Wang, Jinpeng, et al.
Published: (2024)
Low-Data Classification of Historical Music Manuscripts: A Few-Shot Learning Approach
by: Shatri, Elona, et al.
Published: (2024)
Video Enriched Retrieval Augmented Generation Using Aligned Video Captions
by: Rosa, Kevin Dela
Published: (2024)