| Main Authors: | Trein, Tobias; Garcia, Luan Fonseca |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2501.02112 |
Similar Items
Next-Generation License Plate Detection and Recognition System using YOLOv8
by: Amin, Arslan, et al.
Published: (2025)
3D Adaptive Structural Convolution Network for Domain-Invariant Point Cloud Recognition
by: Kim, Younggun, et al.
Published: (2024)
LightFFDNets: Lightweight Convolutional Neural Networks for Rapid Facial Forgery Detection
by: Jabbarlı, Günel, et al.
Published: (2024)
Enhancing Long-Term Re-Identification Robustness Using Synthetic Data: A Comparative Analysis
by: Pionzewski, Christian, et al.
Published: (2025)
treeX: Unsupervised Tree Instance Segmentation in Dense Forest Point Clouds
by: Burmeister, Josafat-Mattias, et al.
Published: (2025)
Sora as a World Model? A Complete Survey on Text-to-Video Generation
by: Puspitasari, Fachrina Dewi, et al.
Published: (2024)
From Prompt to Production: Automating Brand-Safe Marketing Imagery with Text-to-Image Models
by: Atighehchian, Parmida, et al.
Published: (2026)
MFTF: Mask-free Training-free Object Level Layout Control Diffusion Model
by: Yang, Shan
Published: (2024)
ProtoFlow: Interpretable and Robust Surgical Workflow Modeling with Learned Dynamic Scene Graph Prototypes
by: Holm, Felix, et al.
Published: (2025)
Automated User Identification from Facial Thermograms with Siamese Networks
by: Prozorova, Elizaveta, et al.
Published: (2025)
3DCity-LLM: Empowering Multi-modality Large Language Models for 3D City-scale Perception and Understanding
by: Chen, Yiping, et al.
Published: (2026)
U-Net-Like Spiking Neural Networks for Single Image Dehazing
by: Li, Huibin, et al.
Published: (2025)
Efficient Image Pre-Training with Siamese Cropped Masked Autoencoders
by: Eymaël, Alexandre, et al.
Published: (2024)
Deep Learning methodology for the identification of wood species using high-resolution macroscopic images
by: Herrera-Poyatos, David, et al.
Published: (2024)
Explainable AI for Analyzing Person-Specific Patterns in Facial Recognition Tasks
by: Borsukiewicz, Paweł Jakub, et al.
Published: (2025)
Towards Global Localization using Multi-Modal Object-Instance Re-Identification
by: Chavan, Aneesh, et al.
Published: (2024)
Unsupervised Decomposition and Recombination with Discriminator-Driven Diffusion Models
by: Wang, Archer, et al.
Published: (2026)
Multi-modal Loop Closure Detection with Foundation Models in Severely Unstructured Environments
by: Gonzalez, Laura Alejandra Encinar, et al.
Published: (2025)
Conscious Gaze: Adaptive Attention Mechanisms for Hallucination Mitigation in Vision-Language Models
by: Bu, Weijue, et al.
Published: (2025)
Beyond Visual Understanding: Introducing PARROT-360V for Vision Language Model Benchmarking
by: Khurdula, Harsha Vardhan, et al.
Published: (2024)
Demo-Pose: Depth-Monocular Modality Fusion For Object Pose Estimation
by: Agarwal, Rachit, et al.
Published: (2026)
SITUATE -- Synthetic Object Counting Dataset for VLM training
by: Peinl, René, et al.
Published: (2026)
Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views
by: Chen, Zhangquan, et al.
Published: (2025)
Image Segmentation and Classification of E-waste for Training Robots for Waste Segregation
by: Tripathi, Prakriti
Published: (2025)
Evaluation of Environmental Conditions on Object Detection using Oriented Bounding Boxes for AR Applications
by: Li, Vladislav, et al.
Published: (2023)
Appearance-based gaze estimation enhanced with synthetic images using deep neural networks
by: Herashchenko, Dmytro, et al.
Published: (2023)
Attentive VQ-VAE
by: Hoyos, Angello, et al.
Published: (2023)
TexTailor: Customized Text-aligned Texturing via Effective Resampling
by: Lee, Suin, et al.
Published: (2025)
SIFThinker: Spatially-Aware Image Focus for Visual Reasoning
by: Chen, Zhangquan, et al.
Published: (2025)
OmniVideo-R1: Reinforcing Audio-visual Reasoning with Query Intention and Modality Attention
by: Chen, Zhangquan, et al.
Published: (2026)
CLIP Embeddings for AI-Generated Image Detection: A Few-Shot Study with Lightweight Classifier
by: Ou, Ziyang
Published: (2025)
Rethinking Multimodal Point Cloud Completion: A Completion-by-Correction Perspective
by: Luo, Wang, et al.
Published: (2025)
CoMViT: An Efficient Vision Backbone for Supervised Classification in Medical Imaging
by: Safdar, Aon, et al.
Published: (2025)
VSI: Visual Subtitle Integration for Keyframe Selection to enhance Long Video Understanding
by: He, Jianxiang, et al.
Published: (2025)
Instruction-based Image Editing with Planning, Reasoning, and Generation
by: Ji, Liya, et al.
Published: (2026)
Fre-Res: Frequency-Residual Video Token Compression for Efficient Video MLLMs
by: Feng, Yigui, et al.
Published: (2026)
Robust Visual Question Answering: Datasets, Methods, and Future Challenges
by: Ma, Jie, et al.
Published: (2023)
Disrupting Diffusion: Token-Level Attention Erasure Attack against Diffusion-based Customization
by: Liu, Yisu, et al.
Published: (2024)
Unified Auto-Encoding with Masked Diffusion
by: Hansen-Estruch, Philippe, et al.
Published: (2024)
A Lightweight Multi-Module Fusion Approach for Korean Character Recognition
by: Park, Inho Jake, et al.
Published: (2025)