Library Catalog

Bibliographic Details
Main Authors:	Tilli, Pascal; Mesgar, Mohsen
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2605.08421

Similar Items

Bag of Bags: Adaptive Visual Vocabularies for Genizah Join Image Retrieval
by: Gogawale, Sharva, et al.
Published: (2026)

Explaining Caption-Image Interactions in CLIP Models with Second-Order Attributions
by: Möller, Lucas, et al.
Published: (2024)

Patch-wise Retrieval: A Bag of Practical Techniques for Instance-level Matching
by: Choi, Wonseok, et al.
Published: (2025)

HNC: Leveraging Hard Negative Captions towards Models with Fine-Grained Visual-Linguistic Comprehension Capabilities
by: Dönmez, Esra, et al.
Published: (2026)

RealCamo: Boosting Real Camouflage Synthesis with Layout Controls and Textual-Visual Guidance
by: Chen, Chunyuan, et al.
Published: (2025)

Unveil: Unified Visual-Textual Integration and Distillation for Multi-modal Document Retrieval
by: Sun, Hao, et al.
Published: (2026)

DoPTA: Improving Document Layout Analysis using Patch-Text Alignment
by: SR, Nikitha, et al.
Published: (2024)

Fine-grained Motion Retrieval via Joint-Angle Motion Images and Token-Patch Late Interaction
by: Zhang, Yao, et al.
Published: (2026)

OmniDocLayout: Towards Diverse Document Layout Generation via Coarse-to-Fine LLM Learning
by: Kang, Hengrui, et al.
Published: (2025)

Beyond the Textual: Generating Coherent Visual Options for MCQs
by: Wang, Wanqiang, et al.
Published: (2025)

Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation
by: Horita, Daichi, et al.
Published: (2023)

Patch-Wise Self-Supervised Visual Representation Learning: A Fine-Grained Approach
by: Javidani, Ali, et al.
Published: (2023)

SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs
by: Yin, Yuanyang, et al.
Published: (2024)

DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
by: Zhao, Zhiyuan, et al.
Published: (2024)

ReLayout: Towards Real-World Document Understanding via Layout-enhanced Pre-training
by: Jiang, Zhouqiang, et al.
Published: (2024)

Visual Late Chunking: An Empirical Study of Contextual Chunking for Efficient Visual Document Retrieval
by: Yan, Yibo, et al.
Published: (2026)

LayoutRAG: Retrieval-Augmented Model for Content-agnostic Conditional Layout Generation
by: Wu, Yuxuan, et al.
Published: (2025)

Visually Guided Generative Text-Layout Pre-training for Document Intelligence
by: Mao, Zhiming, et al.
Published: (2024)

Weakly Supervised Contrastive Learning for Histopathology Patch Embeddings
by: Zhang, Bodong, et al.
Published: (2026)

Semantic Graph Consistency: Going Beyond Patches for Regularizing Self-Supervised Vision Transformers
by: Devaguptapu, Chaitanya, et al.
Published: (2024)

LED Benchmark: Diagnosing Structural Layout Errors for Document Layout Analysis
by: Heo, Inbum, et al.
Published: (2025)

Beyond Cropped Regions: New Benchmark and Corresponding Baseline for Chinese Scene Text Retrieval in Diverse Layouts
by: Li, Gengluo, et al.
Published: (2025)

Beyond Global Alignment: Fine-Grained Motion-Language Retrieval via Pyramidal Shapley-Taylor Learning
by: Chen, Hanmo, et al.
Published: (2026)

HPFF: Hierarchical Locally Supervised Learning with Patch Feature Fusion
by: Su, Junhao, et al.
Published: (2024)

Patch-Level Kernel Alignment for Dense Self-Supervised Learning
by: Yeo, Juan, et al.
Published: (2025)

TableSeq: Unified Generation of Structure, Content, and Layout
by: Hamdi, Laziz, et al.
Published: (2026)

Chat2Layout: Interactive 3D Furniture Layout with a Multimodal LLM
by: Wang, Can, et al.
Published: (2024)

PARL: Position-Aware Relation Learning Network for Document Layout Analysis
by: Liu, Fuyuan, et al.
Published: (2026)

Semi-Supervised 360 Layout Estimation with Panoramic Collaborative Perturbations
by: Zhang, Junsong, et al.
Published: (2025)

PILOT: A Promptable Interleaved Layout-aware OCR Transformer
by: Hamdi, Laziz, et al.
Published: (2025)

LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding
by: Luo, Chuwei, et al.
Published: (2024)

A Hybrid Approach for Document Layout Analysis in Document images
by: Shehzadi, Tahira, et al.
Published: (2024)

Towards Khmer Scene Document Layout Detection
by: Kong, Marry, et al.
Published: (2026)

SFDLA: Source-Free Document Layout Analysis
by: Tewes, Sebastian, et al.
Published: (2025)

Diachronic Document Dataset for Semantic Layout Analysis
by: Clérice, Thibault, et al.
Published: (2024)

Learning to Generate Human-Human-Object Interactions from Textual Descriptions
by: Na, Jeonghyeon, et al.
Published: (2025)

VASCAR: Content-Aware Layout Generation via Visual-Aware Self-Correction
by: Zhang, Jiahao, et al.
Published: (2024)

Improved Bag-of-Words Image Retrieval with Geometric Constraints for Ground Texture Localization
by: Wilhelm, Aaron, et al.
Published: (2025)

LITTA: Late-Interaction and Test-Time Alignment for Visually-Grounded Multimodal Retrieval
by: Kim, Seonok
Published: (2026)

Automatic Layout Planning for Visually-Rich Documents with Instruction-Following Models
by: Zhu, Wanrong, et al.
Published: (2024)