Library Catalog

Bibliographic Details
Main Authors:	Trein, Tobias; Garcia, Luan Fonseca
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; I.2.10
Online Access:	https://arxiv.org/abs/2501.02112

Similar Items

Next-Generation License Plate Detection and Recognition System using YOLOv8
by: Amin, Arslan, et al.
Published: (2025)

3D Adaptive Structural Convolution Network for Domain-Invariant Point Cloud Recognition
by: Kim, Younggun, et al.
Published: (2024)

LightFFDNets: Lightweight Convolutional Neural Networks for Rapid Facial Forgery Detection
by: Jabbarlı, Günel, et al.
Published: (2024)

Enhancing Long-Term Re-Identification Robustness Using Synthetic Data: A Comparative Analysis
by: Pionzewski, Christian, et al.
Published: (2025)

treeX: Unsupervised Tree Instance Segmentation in Dense Forest Point Clouds
by: Burmeister, Josafat-Mattias, et al.
Published: (2025)

Sora as a World Model? A Complete Survey on Text-to-Video Generation
by: Puspitasari, Fachrina Dewi, et al.
Published: (2024)

From Prompt to Production: Automating Brand-Safe Marketing Imagery with Text-to-Image Models
by: Atighehchian, Parmida, et al.
Published: (2026)

MFTF: Mask-free Training-free Object Level Layout Control Diffusion Model
by: Yang, Shan
Published: (2024)

ProtoFlow: Interpretable and Robust Surgical Workflow Modeling with Learned Dynamic Scene Graph Prototypes
by: Holm, Felix, et al.
Published: (2025)

Automated User Identification from Facial Thermograms with Siamese Networks
by: Prozorova, Elizaveta, et al.
Published: (2025)

3DCity-LLM: Empowering Multi-modality Large Language Models for 3D City-scale Perception and Understanding
by: Chen, Yiping, et al.
Published: (2026)

U-Net-Like Spiking Neural Networks for Single Image Dehazing
by: Li, Huibin, et al.
Published: (2025)

Efficient Image Pre-Training with Siamese Cropped Masked Autoencoders
by: Eymaël, Alexandre, et al.
Published: (2024)

Deep Learning methodology for the identification of wood species using high-resolution macroscopic images
by: Herrera-Poyatos, David, et al.
Published: (2024)

Explainable AI for Analyzing Person-Specific Patterns in Facial Recognition Tasks
by: Borsukiewicz, Paweł Jakub, et al.
Published: (2025)

Towards Global Localization using Multi-Modal Object-Instance Re-Identification
by: Chavan, Aneesh, et al.
Published: (2024)

Unsupervised Decomposition and Recombination with Discriminator-Driven Diffusion Models
by: Wang, Archer, et al.
Published: (2026)

Multi-modal Loop Closure Detection with Foundation Models in Severely Unstructured Environments
by: Gonzalez, Laura Alejandra Encinar, et al.
Published: (2025)

Conscious Gaze: Adaptive Attention Mechanisms for Hallucination Mitigation in Vision-Language Models
by: Bu, Weijue, et al.
Published: (2025)

Beyond Visual Understanding: Introducing PARROT-360V for Vision Language Model Benchmarking
by: Khurdula, Harsha Vardhan, et al.
Published: (2024)

Demo-Pose: Depth-Monocular Modality Fusion For Object Pose Estimation
by: Agarwal, Rachit, et al.
Published: (2026)

SITUATE -- Synthetic Object Counting Dataset for VLM training
by: Peinl, René, et al.
Published: (2026)

Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views
by: Chen, Zhangquan, et al.
Published: (2025)

Image Segmentation and Classification of E-waste for Training Robots for Waste Segregation
by: Tripathi, Prakriti
Published: (2025)

Evaluation of Environmental Conditions on Object Detection using Oriented Bounding Boxes for AR Applications
by: Li, Vladislav, et al.
Published: (2023)

Appearance-based gaze estimation enhanced with synthetic images using deep neural networks
by: Herashchenko, Dmytro, et al.
Published: (2023)

Attentive VQ-VAE
by: Hoyos, Angello, et al.
Published: (2023)

TexTailor: Customized Text-aligned Texturing via Effective Resampling
by: Lee, Suin, et al.
Published: (2025)

SIFThinker: Spatially-Aware Image Focus for Visual Reasoning
by: Chen, Zhangquan, et al.
Published: (2025)

OmniVideo-R1: Reinforcing Audio-visual Reasoning with Query Intention and Modality Attention
by: Chen, Zhangquan, et al.
Published: (2026)

CLIP Embeddings for AI-Generated Image Detection: A Few-Shot Study with Lightweight Classifier
by: Ou, Ziyang
Published: (2025)

Rethinking Multimodal Point Cloud Completion: A Completion-by-Correction Perspective
by: Luo, Wang, et al.
Published: (2025)

CoMViT: An Efficient Vision Backbone for Supervised Classification in Medical Imaging
by: Safdar, Aon, et al.
Published: (2025)

VSI: Visual Subtitle Integration for Keyframe Selection to enhance Long Video Understanding
by: He, Jianxiang, et al.
Published: (2025)

Instruction-based Image Editing with Planning, Reasoning, and Generation
by: Ji, Liya, et al.
Published: (2026)

Fre-Res: Frequency-Residual Video Token Compression for Efficient Video MLLMs
by: Feng, Yigui, et al.
Published: (2026)

Robust Visual Question Answering: Datasets, Methods, and Future Challenges
by: Ma, Jie, et al.
Published: (2023)

Disrupting Diffusion: Token-Level Attention Erasure Attack against Diffusion-based Customization
by: Liu, Yisu, et al.
Published: (2024)

Unified Auto-Encoding with Masked Diffusion
by: Hansen-Estruch, Philippe, et al.
Published: (2024)

A Lightweight Multi-Module Fusion Approach for Korean Character Recognition
by: Park, Inho Jake, et al.
Published: (2025)