| Main Authors: | Morin, Lucas, Meijer, Gerhard Ingmar, Weber, Valéry, Van Gool, Luc, Staar, Peter W. J. |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2504.19695 |
Similar Items
MarkushGrapher: Joint Visual and Textual Recognition of Markush Structures
by: Morin, Lucas, et al.
Published: (2025)
MarkushGrapher-2: End-to-end Multimodal Recognition of Chemical Structures
by: Strohmeyer, Tim, et al.
Published: (2026)
MolGrapher: Graph-based Visual Recognition of Chemical Structures
by: Morin, Lucas, et al.
Published: (2023)
Taming CLIP for Fine-grained and Structured Visual Understanding of Museum Exhibits
by: Balauca, Ada-Astrid, et al.
Published: (2024)
Fine-Grained Spatial and Verbal Losses for 3D Visual Grounding
by: Dey, Sombit, et al.
Published: (2024)
Advanced Layout Analysis Models for Docling
by: Livathinos, Nikolaos, et al.
Published: (2025)
Four Ways to Improve Verbo-visual Fusion for Dense 3D Visual Grounding
by: Unal, Ozan, et al.
Published: (2023)
Docling: An Efficient Open-Source Toolkit for AI-driven Document Conversion
by: Livathinos, Nikolaos, et al.
Published: (2025)
Test-time Training for Hyperspectral Image Super-resolution
by: Li, Ke, et al.
Published: (2024)
Optimizing against Infeasible Inclusions from Data for Semantic Segmentation through Morphology
by: Basu, Shamik, et al.
Published: (2024)
Bayesian Self-Training for Semi-Supervised 3D Segmentation
by: Unal, Ozan, et al.
Published: (2024)
TrafficBots V1.5: Traffic Simulation via Conditional VAEs and Transformers with Relative Pose Encoding
by: Zhang, Zhejun, et al.
Published: (2024)
EvenNICER-SLAM: Event-based Neural Implicit Encoding SLAM
by: Chen, Shi, et al.
Published: (2024)
Docling Technical Report
by: Auer, Christoph, et al.
Published: (2024)
Investigating the Effectiveness of Cross-Attention to Unlock Zero-Shot Editing of Text-to-Video Diffusion Models
by: Motamed, Saman, et al.
Published: (2024)
Enhanced Multi-Scale Cross-Attention for Person Image Generation
by: Tang, Hao, et al.
Published: (2025)
Towards Online Real-Time Memory-based Video Inpainting Transformers
by: Thiry, Guillaume, et al.
Published: (2024)
Condition-Invariant Semantic Segmentation
by: Sakaridis, Christos, et al.
Published: (2023)
Graph Transformer GANs with Graph Masked Modeling for Architectural Layout Generation
by: Tang, Hao, et al.
Published: (2024)
MatIR: A Hybrid Mamba-Transformer Image Restoration Model
by: Wen, Juan, et al.
Published: (2025)
CAFuser: Condition-Aware Multimodal Fusion for Robust Semantic Perception of Driving Scenes
by: Broedermann, Tim, et al.
Published: (2024)
Sun Off, Lights On: Photorealistic Monocular Nighttime Simulation for Robust Semantic Perception
by: Tzevelekakis, Konstantinos, et al.
Published: (2024)
ReVLA: Reverting Visual Domain Limitation of Robotic Foundation Models
by: Dey, Sombit, et al.
Published: (2024)
A Simple and Generalist Approach for Panoptic Segmentation
by: Prisadnikov, Nedyalko, et al.
Published: (2024)
Implicit-Zoo: A Large-Scale Dataset of Neural Implicit Functions for 2D Images and 3D Scenes
by: Ma, Qi, et al.
Published: (2024)
Continuous Pose for Monocular Cameras in Neural Implicit Representation
by: Ma, Qi, et al.
Published: (2023)
From Synchrony to Sequence: Exo-to-Ego Generation via Interpolation
by: Mahdi, Mohammad, et al.
Published: (2026)
Vision encoders should be image size agnostic and task driven
by: Prisadnikov, Nedyalko, et al.
Published: (2025)
Self-supervised pretraining for an iterative image size agnostic vision transformer
by: Prisadnikov, Nedyalko, et al.
Published: (2026)
HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud
by: Cheng, Wencan, et al.
Published: (2024)
Lego: Learning to Disentangle and Invert Personalized Concepts Beyond Object Appearance in Text-to-Image Diffusion Models
by: Motamed, Saman, et al.
Published: (2023)
Spatial-Temporal Graph Mamba for Music-Guided Dance Video Synthesis
by: Tang, Hao, et al.
Published: (2025)
PBR-NeRF: Inverse Rendering with Physics-Based Neural Fields
by: Wu, Sean, et al.
Published: (2024)
Inferring Compositional 4D Scenes without Ever Seeing One
by: Gokmen, Ahmet Berke, et al.
Published: (2025)
Self-supervised Shape Completion via Involution and Implicit Correspondences
by: Liu, Mengya, et al.
Published: (2024)
Video Depth Propagation
by: Piccinelli, Luigi, et al.
Published: (2025)
Camera-Only 3D Panoptic Scene Completion for Autonomous Driving through Differentiable Object Shapes
by: Marinello, Nicola, et al.
Published: (2025)
RICO: Two Realistic Benchmarks and an In-Depth Analysis for Incremental Learning in Object Detection
by: Neuwirth-Trapp, Matthias, et al.
Published: (2025)
Incremental Object Detection with Prompt-based Methods
by: Neuwirth-Trapp, Matthias, et al.
Published: (2025)
Occam's LGS: An Efficient Approach for Language Gaussian Splatting
by: Cheng, Jiahuan, et al.
Published: (2024)