Library Catalog

Bibliographic Details
Main Authors:	Morin, Lucas; Weber, Valéry; Nassar, Ahmed; Meijer, Gerhard Ingmar; Van Gool, Luc; Li, Yawei; Staar, Peter
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2503.16096

Similar Items

MarkushGrapher-2: End-to-end Multimodal Recognition of Chemical Structures
by: Strohmeyer, Tim, et al.
Published: (2026)

SubGrapher: Visual Fingerprinting of Chemical Structures
by: Morin, Lucas, et al.
Published: (2025)

MolGrapher: Graph-based Visual Recognition of Chemical Structures
by: Morin, Lucas, et al.
Published: (2023)

Маркуш Олександр Іванович [Markush Oleksandr Ivanovych]
by: В. В. Ґабор
Published: (2018)

ScreenParse: Moving Beyond Sparse Grounding with Complete Screen Parsing Supervision
by: Gurbuz, A. Said, et al.
Published: (2026)

Advanced Layout Analysis Models for Docling
by: Livathinos, Nikolaos, et al.
Published: (2025)

Shapley Pruning for Neural Network Compression
by: Adamczewski, Kamil, et al.
Published: (2024)

Docling: An Efficient Open-Source Toolkit for AI-driven Document Conversion
by: Livathinos, Nikolaos, et al.
Published: (2025)

Docling Technical Report
by: Auer, Christoph, et al.
Published: (2024)

Taming CLIP for Fine-grained and Structured Visual Understanding of Museum Exhibits
by: Balauca, Ada-Astrid, et al.
Published: (2024)

Fine-Grained Spatial and Verbal Losses for 3D Visual Grounding
by: Dey, Sombit, et al.
Published: (2024)

LocalViT: Analyzing Locality in Vision Transformers
by: Li, Yawei, et al.
Published: (2021)

Four Ways to Improve Verbo-visual Fusion for Dense 3D Visual Grounding
by: Unal, Ozan, et al.
Published: (2023)

Test-time Training for Hyperspectral Image Super-resolution
by: Li, Ke, et al.
Published: (2024)

Optimizing against Infeasible Inclusions from Data for Semantic Segmentation through Morphology
by: Basu, Shamik, et al.
Published: (2024)

Bayesian Self-Training for Semi-Supervised 3D Segmentation
by: Unal, Ozan, et al.
Published: (2024)

TrafficBots V1.5: Traffic Simulation via Conditional VAEs and Transformers with Relative Pose Encoding
by: Zhang, Zhejun, et al.
Published: (2024)

Investigating the Effectiveness of Cross-Attention to Unlock Zero-Shot Editing of Text-to-Video Diffusion Models
by: Motamed, Saman, et al.
Published: (2024)

SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion
by: Nassar, Ahmed, et al.
Published: (2025)

EvenNICER-SLAM: Event-based Neural Implicit Encoding SLAM
by: Chen, Shi, et al.
Published: (2024)

ESG Accountability Made Easy: DocQA at Your Service
by: Mishra, Lokesh, et al.
Published: (2023)

Visual and Textual Prompts in VLLMs for Enhancing Emotion Recognition
by: Wang, Zhifeng, et al.
Published: (2025)

Empowering Image Recovery: A Multi-Attention Approach
by: Wen, Juan, et al.
Published: (2024)

ReVLA: Reverting Visual Domain Limitation of Robotic Foundation Models
by: Dey, Sombit, et al.
Published: (2024)

Enhanced Multi-Scale Cross-Attention for Person Image Generation
by: Tang, Hao, et al.
Published: (2025)

Towards Online Real-Time Memory-based Video Inpainting Transformers
by: Thiry, Guillaume, et al.
Published: (2024)

Condition-Invariant Semantic Segmentation
by: Sakaridis, Christos, et al.
Published: (2023)

Graph Transformer GANs with Graph Masked Modeling for Architectural Layout Generation
by: Tang, Hao, et al.
Published: (2024)

MatIR: A Hybrid Mamba-Transformer Image Restoration Model
by: Wen, Juan, et al.
Published: (2025)

CAFuser: Condition-Aware Multimodal Fusion for Robust Semantic Perception of Driving Scenes
by: Broedermann, Tim, et al.
Published: (2024)

Sun Off, Lights On: Photorealistic Monocular Nighttime Simulation for Robust Semantic Perception
by: Tzevelekakis, Konstantinos, et al.
Published: (2024)

A Simple and Generalist Approach for Panoptic Segmentation
by: Prisadnikov, Nedyalko, et al.
Published: (2024)

DocAtlas: Multilingual Document Understanding Across 80+ Languages
by: Heakl, Ahmed, et al.
Published: (2026)

Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning
by: Ren, Bin, et al.
Published: (2024)

Sharing Key Semantics in Transformer Makes Efficient Image Restoration
by: Ren, Bin, et al.
Published: (2024)

Implicit-Zoo: A Large-Scale Dataset of Neural Implicit Functions for 2D Images and 3D Scenes
by: Ma, Qi, et al.
Published: (2024)

Continuous Pose for Monocular Cameras in Neural Implicit Representation
by: Ma, Qi, et al.
Published: (2023)

From Synchrony to Sequence: Exo-to-Ego Generation via Interpolation
by: Mahdi, Mohammad, et al.
Published: (2026)

Vision encoders should be image size agnostic and task driven
by: Prisadnikov, Nedyalko, et al.
Published: (2025)

Self-supervised pretraining for an iterative image size agnostic vision transformer
by: Prisadnikov, Nedyalko, et al.
Published: (2026)