:: Library Catalog

Bibliographic Details
Main Authors:	Manghotay, Reyhaneh Ahani, Liang, Jie
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Machine Learning; I.2.10; I.4.8; I.2.6
Online Access:	https://arxiv.org/abs/2604.01118

Similar Items

Neuromorphic Monocular Depth Estimation with Uncertainty Modeling
by: Bergkvist, Viktor, et al.
Published: (2026)

Selection, Not Fusion: Radar-Modulated State Space Models for Radar-Camera Depth Estimation
by: Hou, Zhangcheng, et al.
Published: (2026)

CLIP-Joint-Detect: End-to-End Joint Training of Object Detectors with Contrastive Vision-Language Supervision
by: Raoufi, Behnam, et al.
Published: (2025)

Smooth regularization for efficient video recognition
by: Goldman, Gil, et al.
Published: (2025)

Optimal Transport-Guided Source-Free Adaptation for Face Anti-Spoofing
by: Li, Zhuowei, et al.
Published: (2025)

Decoupling Vision and Language: Codebook Anchored Visual Adaptation
by: Wu, Jason, et al.
Published: (2026)

Caption-Driven Explainability: Probing CNNs for Bias via CLIP
by: Koller, Patrick, et al.
Published: (2025)

FeedbackSTS-Det: Sparse Frames-Based Spatio-Temporal Semantic Feedback Network for Moving Infrared Small Target Detection
by: Huang, Yian, et al.
Published: (2026)

SERA-H: Beyond Native Sentinel Spatial Limits for High-Resolution Canopy Height Mapping
by: Boudras, Thomas, et al.
Published: (2025)

MaSC: A Masked Similarity Metric for Evaluating Concept-Driven Generation
by: Bartkowiak, Patryk, et al.
Published: (2026)

Implementing Adaptations for Vision AutoRegressive Model
by: Shaikh, Kaif, et al.
Published: (2025)

OnlyFlow: Optical Flow based Motion Conditioning for Video Diffusion Models
by: Koroglu, Mathis, et al.
Published: (2024)

Perceptual Flow Network for Visually Grounded Reasoning
by: Li, Yangfu, et al.
Published: (2026)

Butter: Frequency Consistency and Hierarchical Fusion for Autonomous Driving Object Detection
by: Lin, Xiaojian, et al.
Published: (2025)

WildRoadBench: A Wild Aerial Road-Damage Grounding Benchmark for Vision-Language Models and Autonomous Agents
by: Liu, Bingnan, et al.
Published: (2026)

UnCageNet: Tracking and Pose Estimation of Caged Animal
by: Dutta, Sayak, et al.
Published: (2025)

Efficient Attention: Attention with Linear Complexities
by: Shen, Zhuoran, et al.
Published: (2018)

Learning 3D object-centric representation through prediction
by: Day, John, et al.
Published: (2024)

Lifelong Learning in Vision-Language Models: Enhanced EWC with Cross-Modal Knowledge Retention
by: Durrani, Hamza Ahmed, et al.
Published: (2026)

Hierarchical Image-Guided 3D Point Cloud Segmentation in Industrial Scenes via Multi-View Bayesian Fusion
by: Zhu, Yu, et al.
Published: (2025)

Object detection in adverse weather conditions for autonomous vehicles using Instruct Pix2Pix
by: Gurbindo, Unai, et al.
Published: (2025)

Deep Learning-based Depth Estimation Methods from Monocular Image and Videos: A Comprehensive Survey
by: Rajapaksha, Uchitha, et al.
Published: (2024)

An Analysis of Layer-Freezing Strategies for Enhanced Transfer Learning in YOLO Architectures
by: Dobrzycki, Andrzej D., et al.
Published: (2025)

In Context Learning with Vision Transformers: Case Study
by: Zhao, Antony, et al.
Published: (2025)

Car Object Counting and Position Estimation via Extension of the CLIP-EBC Framework
by: Jung, Seoik, et al.
Published: (2025)

GLoT: A Novel Gated-Logarithmic Transformer for Efficient Sign Language Translation
by: Shahin, Nada, et al.
Published: (2025)

SpectralCA: Bi-Directional Cross-Attention for Next-Generation UAV Hyperspectral Vision
by: Brovko, D. V.
Published: (2025)

How Can One Choose the Best CAM-Based Explainability Method for a CNN Model?
by: Costa, Daniel da Silva, et al.
Published: (2026)

SPMamba-YOLO: An Underwater Object Detection Network Based on Multi-Scale Feature Enhancement and Global Context Modeling
by: Liao, Guanghao, et al.
Published: (2026)

THIRDEYE: Cue-Aware Monocular Depth Estimation via Brain-Inspired Multi-Stage Fusion
by: Ioan, Calin Teodor
Published: (2025)

Do Generative Metrics Predict YOLO Performance? An Evaluation Across Models, Augmentation Ratios, and Dataset Complexity
by: Marian, Vasile, et al.
Published: (2026)

Joint Learning of Depth, Pose, and Local Radiance Field for Large Scale Monocular 3D Reconstruction
by: Syed, Shahram Najam, et al.
Published: (2025)

Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D
by: Arnaud, Sergio, et al.
Published: (2025)

Motion-Guided Semantic Alignment with Negative Prompts for Zero-Shot Video Action Recognition
by: Wang, Yiming, et al.
Published: (2026)

Spiking Neural Networks for event-based action recognition: A new task to understand their advantage
by: Vicente-Sola, Alex, et al.
Published: (2022)

ADAT: Time-Series-Aware Adaptive Transformer Architecture for Sign Language Translation
by: Shahin, Nada, et al.
Published: (2025)

Predictive Modeling of Maritime Radar Data Using Transformer Architecture
by: Qesaraku, Bjorna, et al.
Published: (2025)

Enhancing Spatial Reasoning in Vision-Language Models via Chain-of-Thought Prompting and Reinforcement Learning
by: Ji, Binbin, et al.
Published: (2025)

Short-Window Sliding Learning for Real-Time Violence Detection via LLM-based Auto-Labeling
by: Jung, Seoik, et al.
Published: (2025)

Single-Shot Metric Depth from Focused Plenoptic Cameras
by: Lasheras-Hernandez, Blanca, et al.
Published: (2024)