| Main Authors: | Strohmeyer, Tim; Morin, Lucas; Meijer, Gerhard Ingmar; Weber, Valéry; Nassar, Ahmed; Staar, Peter |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2603.28550 |
Similar Items
MarkushGrapher: Joint Visual and Textual Recognition of Markush Structures
by: Morin, Lucas, et al.
Published: (2025)
MolGrapher: Graph-based Visual Recognition of Chemical Structures
by: Morin, Lucas, et al.
Published: (2023)
SubGrapher: Visual Fingerprinting of Chemical Structures
by: Morin, Lucas, et al.
Published: (2025)
Advanced Layout Analysis Models for Docling
by: Livathinos, Nikolaos, et al.
Published: (2025)
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion
by: Nassar, Ahmed, et al.
Published: (2025)
ScreenParse: Moving Beyond Sparse Grounding with Complete Screen Parsing Supervision
by: Gurbuz, A. Said, et al.
Published: (2026)
Docling: An Efficient Open-Source Toolkit for AI-driven Document Conversion
by: Livathinos, Nikolaos, et al.
Published: (2025)
Docling Technical Report
by: Auer, Christoph, et al.
Published: (2024)
MolParser: End-to-end Visual Recognition of Molecule Structures in the Wild
by: Fang, Xi, et al.
Published: (2024)
ESG Accountability Made Easy: DocQA at Your Service
by: Mishra, Lokesh, et al.
Published: (2023)
An Effective End-to-End Solution for Multimodal Action Recognition
by: Wang, Songping, et al.
Published: (2025)
DocAtlas: Multilingual Document Understanding Across 80+ Languages
by: Heakl, Ahmed, et al.
Published: (2026)
Generalized Trajectory Scoring for End-to-end Multimodal Planning
by: Li, Zhenxin, et al.
Published: (2025)
Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition
by: Zhou, Jiaming, et al.
Published: (2023)
Uncovering the Handwritten Text in the Margins: End-to-end Handwritten Text Detection and Recognition
by: Cheng, Liang, et al.
Published: (2023)
End-to-end Training for Text-to-Image Synthesis using Dual-Text Embeddings
by: Ahmed, Yeruru Asrar, et al.
Published: (2025)
Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation
by: Li, Zhenxin, et al.
Published: (2024)
End2end-ALARA: Approaching the ALARA Law in CT Imaging with End-to-end Learning
by: Tao, Xi, et al.
Published: (2025)
Multi-Scale High-Resolution Logarithmic Grapher Module for Efficient Vision GNNs
by: Munir, Mustafa, et al.
Published: (2025)
End-to-End Chess Recognition
by: Masouris, Athanasios, et al.
Published: (2023)
Latent Diffusion for Medical Image Segmentation: End to end learning for fast sampling and accuracy
by: Zaman, Fahim Ahmed, et al.
Published: (2024)
End-to-end information extraction in handwritten documents: Understanding Paris marriage records from 1880 to 1940
by: Constum, Thomas, et al.
Published: (2024)
End4: End-to-end Denoising Diffusion for Diffusion-Based Inpainting Detection
by: Wang, Fei, et al.
Published: (2025)
E2E-GMNER: End-to-End Generative Grounded Multimodal Named Entity Recognition
by: Zhang, Meng, et al.
Published: (2026)
End-to-end Surface Optimization for Light Control
by: Sun, Yuou, et al.
Published: (2024)
Exploring Automated Recognition of Instructional Activity and Discourse from Multimodal Classroom Data
by: Bueno, Ivo, et al.
Published: (2025)
End-to-end Semantic-centric Video-based Multimodal Affective Computing
by: Lin, Ronghao, et al.
Published: (2024)
DREAM: Document Reconstruction via End-to-end Autoregressive Model
by: Li, Xin, et al.
Published: (2025)
Closing the Navigation Compliance Gap in End-to-end Autonomous Driving
by: Wu, Hanfeng, et al.
Published: (2025)
Prototyping an End-to-End Multi-Modal Tiny-CNN for Cardiovascular Sensor Patches
by: Ibrahim, Mustafa Fuad Rifet, et al.
Published: (2025)
CFVNet: An End-to-End Cancelable Finger Vein Network for Recognition
by: Wang, Yifan, et al.
Published: (2024)
VAPO: End-to-end Slide-Enhanced Speech Recognition with Omni-modal Large Language Models
by: Hu, Rui, et al.
Published: (2025)
MMIS: Multimodal Dataset for Interior Scene Visual Generation and Recognition
by: Kassab, Hozaifa, et al.
Published: (2024)
RFL: Simplifying Chemical Structure Recognition with Ring-Free Language
by: Chang, Qikai, et al.
Published: (2024)
Atom-Level Optical Chemical Structure Recognition with Limited Supervision
by: Oldenhof, Martijn, et al.
Published: (2024)
HALO: Human-Aligned End-to-end Image Retargeting with Layered Transformations
by: Xu, Yiran, et al.
Published: (2025)
Align-DETR: Enhancing End-to-end Object Detection with Aligned Loss
by: Cai, Zhi, et al.
Published: (2023)
DSGG: Dense Relation Transformer for an End-to-end Scene Graph Generation
by: Hayder, Zeeshan, et al.
Published: (2024)
Better Sampling, towards Better End-to-end Small Object Detection
by: Huang, Zile, et al.
Published: (2024)
End-to-end multi-modal product matching in fashion e-commerce
by: Tóth, Sándor, et al.
Published: (2024)