Library Catalog

Bibliographic Details
Main Authors:	Strohmeyer, Tim, Morin, Lucas, Meijer, Gerhard Ingmar, Weber, Valéry, Nassar, Ahmed, Staar, Peter
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2603.28550

Similar Items

MarkushGrapher: Joint Visual and Textual Recognition of Markush Structures
by: Morin, Lucas, et al.
Published: (2025)

MolGrapher: Graph-based Visual Recognition of Chemical Structures
by: Morin, Lucas, et al.
Published: (2023)

SubGrapher: Visual Fingerprinting of Chemical Structures
by: Morin, Lucas, et al.
Published: (2025)

Advanced Layout Analysis Models for Docling
by: Livathinos, Nikolaos, et al.
Published: (2025)

SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion
by: Nassar, Ahmed, et al.
Published: (2025)

ScreenParse: Moving Beyond Sparse Grounding with Complete Screen Parsing Supervision
by: Gurbuz, A. Said, et al.
Published: (2026)

Docling: An Efficient Open-Source Toolkit for AI-driven Document Conversion
by: Livathinos, Nikolaos, et al.
Published: (2025)

Docling Technical Report
by: Auer, Christoph, et al.
Published: (2024)

MolParser: End-to-end Visual Recognition of Molecule Structures in the Wild
by: Fang, Xi, et al.
Published: (2024)

ESG Accountability Made Easy: DocQA at Your Service
by: Mishra, Lokesh, et al.
Published: (2023)

An Effective End-to-End Solution for Multimodal Action Recognition
by: Wang, Songping, et al.
Published: (2025)

DocAtlas: Multilingual Document Understanding Across 80+ Languages
by: Heakl, Ahmed, et al.
Published: (2026)

Generalized Trajectory Scoring for End-to-end Multimodal Planning
by: Li, Zhenxin, et al.
Published: (2025)

Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition
by: Zhou, Jiaming, et al.
Published: (2023)

Uncovering the Handwritten Text in the Margins: End-to-end Handwritten Text Detection and Recognition
by: Cheng, Liang, et al.
Published: (2023)

End-to-end Training for Text-to-Image Synthesis using Dual-Text Embeddings
by: Ahmed, Yeruru Asrar, et al.
Published: (2025)

Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation
by: Li, Zhenxin, et al.
Published: (2024)

End2end-ALARA: Approaching the ALARA Law in CT Imaging with End-to-end Learning
by: Tao, Xi, et al.
Published: (2025)

Multi-Scale High-Resolution Logarithmic Grapher Module for Efficient Vision GNNs
by: Munir, Mustafa, et al.
Published: (2025)

End-to-End Chess Recognition
by: Masouris, Athanasios, et al.
Published: (2023)

Latent Diffusion for Medical Image Segmentation: End to end learning for fast sampling and accuracy
by: Zaman, Fahim Ahmed, et al.
Published: (2024)

End-to-end information extraction in handwritten documents: Understanding Paris marriage records from 1880 to 1940
by: Constum, Thomas, et al.
Published: (2024)

End4: End-to-end Denoising Diffusion for Diffusion-Based Inpainting Detection
by: Wang, Fei, et al.
Published: (2025)

E2E-GMNER: End-to-End Generative Grounded Multimodal Named Entity Recognition
by: Zhang, Meng, et al.
Published: (2026)

End-to-end Surface Optimization for Light Control
by: Sun, Yuou, et al.
Published: (2024)

Exploring Automated Recognition of Instructional Activity and Discourse from Multimodal Classroom Data
by: Bueno, Ivo, et al.
Published: (2025)

End-to-end Semantic-centric Video-based Multimodal Affective Computing
by: Lin, Ronghao, et al.
Published: (2024)

DREAM: Document Reconstruction via End-to-end Autoregressive Model
by: Li, Xin, et al.
Published: (2025)

Closing the Navigation Compliance Gap in End-to-end Autonomous Driving
by: Wu, Hanfeng, et al.
Published: (2025)

Prototyping an End-to-End Multi-Modal Tiny-CNN for Cardiovascular Sensor Patches
by: Ibrahim, Mustafa Fuad Rifet, et al.
Published: (2025)

CFVNet: An End-to-End Cancelable Finger Vein Network for Recognition
by: Wang, Yifan, et al.
Published: (2024)

VAPO: End-to-end Slide-Enhanced Speech Recognition with Omni-modal Large Language Models
by: Hu, Rui, et al.
Published: (2025)

MMIS: Multimodal Dataset for Interior Scene Visual Generation and Recognition
by: Kassab, Hozaifa, et al.
Published: (2024)

RFL: Simplifying Chemical Structure Recognition with Ring-Free Language
by: Chang, Qikai, et al.
Published: (2024)

Atom-Level Optical Chemical Structure Recognition with Limited Supervision
by: Oldenhof, Martijn, et al.
Published: (2024)

HALO: Human-Aligned End-to-end Image Retargeting with Layered Transformations
by: Xu, Yiran, et al.
Published: (2025)

Align-DETR: Enhancing End-to-end Object Detection with Aligned Loss
by: Cai, Zhi, et al.
Published: (2023)

DSGG: Dense Relation Transformer for an End-to-end Scene Graph Generation
by: Hayder, Zeeshan, et al.
Published: (2024)

Better Sampling, towards Better End-to-end Small Object Detection
by: Huang, Zile, et al.
Published: (2024)

End-to-end multi-modal product matching in fashion e-commerce
by: Tóth, Sándor, et al.
Published: (2024)