| Main Authors: | Tilli, Pascal, Mesgar, Mohsen |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2605.08421 |
Similar Items
Bag of Bags: Adaptive Visual Vocabularies for Genizah Join Image Retrieval
by: Gogawale, Sharva, et al.
Published: (2026)
Explaining Caption-Image Interactions in CLIP Models with Second-Order Attributions
by: Möller, Lucas, et al.
Published: (2024)
Patch-wise Retrieval: A Bag of Practical Techniques for Instance-level Matching
by: Choi, Wonseok, et al.
Published: (2025)
HNC: Leveraging Hard Negative Captions towards Models with Fine-Grained Visual-Linguistic Comprehension Capabilities
by: Dönmez, Esra, et al.
Published: (2026)
RealCamo: Boosting Real Camouflage Synthesis with Layout Controls and Textual-Visual Guidance
by: Chen, Chunyuan, et al.
Published: (2025)
Unveil: Unified Visual-Textual Integration and Distillation for Multi-modal Document Retrieval
by: Sun, Hao, et al.
Published: (2026)
DoPTA: Improving Document Layout Analysis using Patch-Text Alignment
by: SR, Nikitha, et al.
Published: (2024)
Fine-grained Motion Retrieval via Joint-Angle Motion Images and Token-Patch Late Interaction
by: Zhang, Yao, et al.
Published: (2026)
OmniDocLayout: Towards Diverse Document Layout Generation via Coarse-to-Fine LLM Learning
by: Kang, Hengrui, et al.
Published: (2025)
Beyond the Textual: Generating Coherent Visual Options for MCQs
by: Wang, Wanqiang, et al.
Published: (2025)
Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation
by: Horita, Daichi, et al.
Published: (2023)
Patch-Wise Self-Supervised Visual Representation Learning: A Fine-Grained Approach
by: Javidani, Ali, et al.
Published: (2023)
SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs
by: Yin, Yuanyang, et al.
Published: (2024)
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
by: Zhao, Zhiyuan, et al.
Published: (2024)
ReLayout: Towards Real-World Document Understanding via Layout-enhanced Pre-training
by: Jiang, Zhouqiang, et al.
Published: (2024)
Visual Late Chunking: An Empirical Study of Contextual Chunking for Efficient Visual Document Retrieval
by: Yan, Yibo, et al.
Published: (2026)
LayoutRAG: Retrieval-Augmented Model for Content-agnostic Conditional Layout Generation
by: Wu, Yuxuan, et al.
Published: (2025)
Visually Guided Generative Text-Layout Pre-training for Document Intelligence
by: Mao, Zhiming, et al.
Published: (2024)
Weakly Supervised Contrastive Learning for Histopathology Patch Embeddings
by: Zhang, Bodong, et al.
Published: (2026)
Semantic Graph Consistency: Going Beyond Patches for Regularizing Self-Supervised Vision Transformers
by: Devaguptapu, Chaitanya, et al.
Published: (2024)
LED Benchmark: Diagnosing Structural Layout Errors for Document Layout Analysis
by: Heo, Inbum, et al.
Published: (2025)
Beyond Cropped Regions: New Benchmark and Corresponding Baseline for Chinese Scene Text Retrieval in Diverse Layouts
by: Li, Gengluo, et al.
Published: (2025)
Beyond Global Alignment: Fine-Grained Motion-Language Retrieval via Pyramidal Shapley-Taylor Learning
by: Chen, Hanmo, et al.
Published: (2026)
HPFF: Hierarchical Locally Supervised Learning with Patch Feature Fusion
by: Su, Junhao, et al.
Published: (2024)
Patch-Level Kernel Alignment for Dense Self-Supervised Learning
by: Yeo, Juan, et al.
Published: (2025)
TableSeq: Unified Generation of Structure, Content, and Layout
by: Hamdi, Laziz, et al.
Published: (2026)
Chat2Layout: Interactive 3D Furniture Layout with a Multimodal LLM
by: Wang, Can, et al.
Published: (2024)
PARL: Position-Aware Relation Learning Network for Document Layout Analysis
by: Liu, Fuyuan, et al.
Published: (2026)
Semi-Supervised 360 Layout Estimation with Panoramic Collaborative Perturbations
by: Zhang, Junsong, et al.
Published: (2025)
PILOT: A Promptable Interleaved Layout-aware OCR Transformer
by: Hamdi, Laziz, et al.
Published: (2025)
LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding
by: Luo, Chuwei, et al.
Published: (2024)
A Hybrid Approach for Document Layout Analysis in Document images
by: Shehzadi, Tahira, et al.
Published: (2024)
Towards Khmer Scene Document Layout Detection
by: Kong, Marry, et al.
Published: (2026)
SFDLA: Source-Free Document Layout Analysis
by: Tewes, Sebastian, et al.
Published: (2025)
Diachronic Document Dataset for Semantic Layout Analysis
by: Clérice, Thibault, et al.
Published: (2024)
Learning to Generate Human-Human-Object Interactions from Textual Descriptions
by: Na, Jeonghyeon, et al.
Published: (2025)
VASCAR: Content-Aware Layout Generation via Visual-Aware Self-Correction
by: Zhang, Jiahao, et al.
Published: (2024)
Improved Bag-of-Words Image Retrieval with Geometric Constraints for Ground Texture Localization
by: Wilhelm, Aaron, et al.
Published: (2025)
LITTA: Late-Interaction and Test-Time Alignment for Visually-Grounded Multimodal Retrieval
by: Kim, Seonok
Published: (2026)
Automatic Layout Planning for Visually-Rich Documents with Instruction-Following Models
by: Zhu, Wanrong, et al.
Published: (2024)