:: Library Catalog

Bibliographic Details
Main Authors:	Tóth, Sándor; Wilson, Stephen; Tsoukara, Alexia; Moreu, Enric; Masalovich, Anton; Roemheld, Lars
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2403.11593

Similar Items

METDrive: Multi-modal End-to-end Autonomous Driving with Temporal Guidance
by: Guo, Ziang, et al.
Published: (2024)

SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion
by: Nassar, Ahmed, et al.
Published: (2025)

End-to-end Open-vocabulary Video Visual Relationship Detection using Multi-modal Prompting
by: Wang, Yongqi, et al.
Published: (2024)

Methodology to Deploy CNN-Based Computer Vision Models on Immersive Wearable Devices
by: Malek, Kaveh, et al.
Published: (2024)

OneVision: An End-to-End Generative Framework for Multi-view E-commerce Vision Search
by: Zheng, Zexin, et al.
Published: (2025)

DV-3DLane: End-to-end Multi-modal 3D Lane Detection with Dual-view Representation
by: Luo, Yueru, et al.
Published: (2024)

End2end-ALARA: Approaching the ALARA Law in CT Imaging with End-to-end Learning
by: Tao, Xi, et al.
Published: (2025)

End4: End-to-end Denoising Diffusion for Diffusion-Based Inpainting Detection
by: Wang, Fei, et al.
Published: (2025)

Fashion130K: An E-commerce Fashion Dataset for Outfit Generation with Unified Multi-modal Condition
by: He, Yu, et al.
Published: (2026)

Prompt2Fashion: An automatically generated fashion dataset
by: Argyrou, Georgia, et al.
Published: (2024)

TransDiffuser: Diverse Trajectory Generation with Decorrelated Multi-modal Representation for End-to-end Autonomous Driving
by: Jiang, Xuefeng, et al.
Published: (2025)

VAPO: End-to-end Slide-Enhanced Speech Recognition with Omni-modal Large Language Models
by: Hu, Rui, et al.
Published: (2025)

End-to-end Surface Optimization for Light Control
by: Sun, Yuou, et al.
Published: (2024)

DREAM: Document Reconstruction via End-to-end Autoregressive Model
by: Li, Xin, et al.
Published: (2025)

Closing the Navigation Compliance Gap in End-to-end Autonomous Driving
by: Wu, Hanfeng, et al.
Published: (2025)

Enhancing Weakly Supervised Semantic Segmentation with Multi-modal Foundation Models: An End-to-End Approach
by: Ravanbakhsh, Elham, et al.
Published: (2024)

DETRPose: Real-time end-to-end transformer model for multi-person pose estimation
by: Janampa, Sebastian, et al.
Published: (2025)

Generalized Trajectory Scoring for End-to-end Multimodal Planning
by: Li, Zhenxin, et al.
Published: (2025)

HALO: Human-Aligned End-to-end Image Retargeting with Layered Transformations
by: Xu, Yiran, et al.
Published: (2025)

Align-DETR: Enhancing End-to-end Object Detection with Aligned Loss
by: Cai, Zhi, et al.
Published: (2023)

DSGG: Dense Relation Transformer for an End-to-end Scene Graph Generation
by: Hayder, Zeeshan, et al.
Published: (2024)

Better Sampling, towards Better End-to-end Small Object Detection
by: Huang, Zile, et al.
Published: (2024)

VIFNet: An End-to-end Visible-Infrared Fusion Network for Image Dehazing
by: Yu, Meng, et al.
Published: (2024)

MolParser: End-to-end Visual Recognition of Molecule Structures in the Wild
by: Fang, Xi, et al.
Published: (2024)

MarkushGrapher-2: End-to-end Multimodal Recognition of Chemical Structures
by: Strohmeyer, Tim, et al.
Published: (2026)

GraphAD: Interaction Scene Graph for End-to-end Autonomous Driving
by: Zhang, Yunpeng, et al.
Published: (2024)

Serial fusion of multi-modal biometric systems
by: Marcialis, Gian Luca, et al.
Published: (2024)

EREBUS: End-to-end Robust Event Based Underwater Simulation
by: Kyatham, Hitesh, et al.
Published: (2025)

Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition
by: Zhou, Jiaming, et al.
Published: (2023)

Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation
by: Li, Zhenxin, et al.
Published: (2024)

End-to-end Training for Text-to-Image Synthesis using Dual-Text Embeddings
by: Ahmed, Yeruru Asrar, et al.
Published: (2025)

End-to-end Feature Alignment: A Simple CNN with Intrinsic Class Attribution
by: Farvardin, Parniyan, et al.
Published: (2026)

Uncovering the Handwritten Text in the Margins: End-to-end Handwritten Text Detection and Recognition
by: Cheng, Liang, et al.
Published: (2023)

SEMPose: A Single End-to-end Network for Multi-object Pose Estimation
by: Liu, Xin, et al.
Published: (2024)

UniUGP: Unifying Understanding, Generation, and Planing For End-to-end Autonomous Driving
by: Lu, Hao, et al.
Published: (2025)

GazeHTA: End-to-end Gaze Target Detection with Head-Target Association
by: Lin, Zhi-Yi, et al.
Published: (2024)

Large Spatial Model: End-to-end Unposed Images to Semantic 3D
by: Fan, Zhiwen, et al.
Published: (2024)

End-to-end differentiable design of geometric waveguide displays
by: Yang, Xinge, et al.
Published: (2026)

SGTR+: End-to-end Scene Graph Generation with Transformer
by: Li, Rongjie, et al.
Published: (2024)

PPAD: Iterative Interactions of Prediction and Planning for End-to-end Autonomous Driving
by: Chen, Zhili, et al.
Published: (2023)