| Main Authors: | Tóth, Sándor; Wilson, Stephen; Tsoukara, Alexia; Moreu, Enric; Masalovich, Anton; Roemheld, Lars |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2403.11593 |
Similar Items
METDrive: Multi-modal End-to-end Autonomous Driving with Temporal Guidance
by: Guo, Ziang, et al.
Published: (2024)
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion
by: Nassar, Ahmed, et al.
Published: (2025)
End-to-end Open-vocabulary Video Visual Relationship Detection using Multi-modal Prompting
by: Wang, Yongqi, et al.
Published: (2024)
Methodology to Deploy CNN-Based Computer Vision Models on Immersive Wearable Devices
by: Malek, Kaveh, et al.
Published: (2024)
OneVision: An End-to-End Generative Framework for Multi-view E-commerce Vision Search
by: Zheng, Zexin, et al.
Published: (2025)
DV-3DLane: End-to-end Multi-modal 3D Lane Detection with Dual-view Representation
by: Luo, Yueru, et al.
Published: (2024)
End2end-ALARA: Approaching the ALARA Law in CT Imaging with End-to-end Learning
by: Tao, Xi, et al.
Published: (2025)
End4: End-to-end Denoising Diffusion for Diffusion-Based Inpainting Detection
by: Wang, Fei, et al.
Published: (2025)
Fashion130K: An E-commerce Fashion Dataset for Outfit Generation with Unified Multi-modal Condition
by: He, Yu, et al.
Published: (2026)
Prompt2Fashion: An automatically generated fashion dataset
by: Argyrou, Georgia, et al.
Published: (2024)
TransDiffuser: Diverse Trajectory Generation with Decorrelated Multi-modal Representation for End-to-end Autonomous Driving
by: Jiang, Xuefeng, et al.
Published: (2025)
VAPO: End-to-end Slide-Enhanced Speech Recognition with Omni-modal Large Language Models
by: Hu, Rui, et al.
Published: (2025)
End-to-end Surface Optimization for Light Control
by: Sun, Yuou, et al.
Published: (2024)
DREAM: Document Reconstruction via End-to-end Autoregressive Model
by: Li, Xin, et al.
Published: (2025)
Closing the Navigation Compliance Gap in End-to-end Autonomous Driving
by: Wu, Hanfeng, et al.
Published: (2025)
Enhancing Weakly Supervised Semantic Segmentation with Multi-modal Foundation Models: An End-to-End Approach
by: Ravanbakhsh, Elham, et al.
Published: (2024)
DETRPose: Real-time end-to-end transformer model for multi-person pose estimation
by: Janampa, Sebastian, et al.
Published: (2025)
Generalized Trajectory Scoring for End-to-end Multimodal Planning
by: Li, Zhenxin, et al.
Published: (2025)
HALO: Human-Aligned End-to-end Image Retargeting with Layered Transformations
by: Xu, Yiran, et al.
Published: (2025)
Align-DETR: Enhancing End-to-end Object Detection with Aligned Loss
by: Cai, Zhi, et al.
Published: (2023)
DSGG: Dense Relation Transformer for an End-to-end Scene Graph Generation
by: Hayder, Zeeshan, et al.
Published: (2024)
Better Sampling, towards Better End-to-end Small Object Detection
by: Huang, Zile, et al.
Published: (2024)
VIFNet: An End-to-end Visible-Infrared Fusion Network for Image Dehazing
by: Yu, Meng, et al.
Published: (2024)
MolParser: End-to-end Visual Recognition of Molecule Structures in the Wild
by: Fang, Xi, et al.
Published: (2024)
MarkushGrapher-2: End-to-end Multimodal Recognition of Chemical Structures
by: Strohmeyer, Tim, et al.
Published: (2026)
GraphAD: Interaction Scene Graph for End-to-end Autonomous Driving
by: Zhang, Yunpeng, et al.
Published: (2024)
Serial fusion of multi-modal biometric systems
by: Marcialis, Gian Luca, et al.
Published: (2024)
EREBUS: End-to-end Robust Event Based Underwater Simulation
by: Kyatham, Hitesh, et al.
Published: (2025)
Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition
by: Zhou, Jiaming, et al.
Published: (2023)
Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation
by: Li, Zhenxin, et al.
Published: (2024)
End-to-end Training for Text-to-Image Synthesis using Dual-Text Embeddings
by: Ahmed, Yeruru Asrar, et al.
Published: (2025)
End-to-end Feature Alignment: A Simple CNN with Intrinsic Class Attribution
by: Farvardin, Parniyan, et al.
Published: (2026)
Uncovering the Handwritten Text in the Margins: End-to-end Handwritten Text Detection and Recognition
by: Cheng, Liang, et al.
Published: (2023)
SEMPose: A Single End-to-end Network for Multi-object Pose Estimation
by: Liu, Xin, et al.
Published: (2024)
UniUGP: Unifying Understanding, Generation, and Planing For End-to-end Autonomous Driving
by: Lu, Hao, et al.
Published: (2025)
GazeHTA: End-to-end Gaze Target Detection with Head-Target Association
by: Lin, Zhi-Yi, et al.
Published: (2024)
Large Spatial Model: End-to-end Unposed Images to Semantic 3D
by: Fan, Zhiwen, et al.
Published: (2024)
End-to-end differentiable design of geometric waveguide displays
by: Yang, Xinge, et al.
Published: (2026)
SGTR+: End-to-end Scene Graph Generation with Transformer
by: Li, Rongjie, et al.
Published: (2024)
PPAD: Iterative Interactions of Prediction and Planning for End-to-end Autonomous Driving
by: Chen, Zhili, et al.
Published: (2023)