| Main Authors: | Manghotay, Reyhaneh Ahani; Liang, Jie |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2604.01118 |
Similar Items
Neuromorphic Monocular Depth Estimation with Uncertainty Modeling
by: Bergkvist, Viktor, et al.
Published: (2026)
Selection, Not Fusion: Radar-Modulated State Space Models for Radar-Camera Depth Estimation
by: Hou, Zhangcheng, et al.
Published: (2026)
CLIP-Joint-Detect: End-to-End Joint Training of Object Detectors with Contrastive Vision-Language Supervision
by: Raoufi, Behnam, et al.
Published: (2025)
Smooth regularization for efficient video recognition
by: Goldman, Gil, et al.
Published: (2025)
Optimal Transport-Guided Source-Free Adaptation for Face Anti-Spoofing
by: Li, Zhuowei, et al.
Published: (2025)
Decoupling Vision and Language: Codebook Anchored Visual Adaptation
by: Wu, Jason, et al.
Published: (2026)
Caption-Driven Explainability: Probing CNNs for Bias via CLIP
by: Koller, Patrick, et al.
Published: (2025)
FeedbackSTS-Det: Sparse Frames-Based Spatio-Temporal Semantic Feedback Network for Moving Infrared Small Target Detection
by: Huang, Yian, et al.
Published: (2026)
SERA-H: Beyond Native Sentinel Spatial Limits for High-Resolution Canopy Height Mapping
by: Boudras, Thomas, et al.
Published: (2025)
MaSC: A Masked Similarity Metric for Evaluating Concept-Driven Generation
by: Bartkowiak, Patryk, et al.
Published: (2026)
Implementing Adaptations for Vision AutoRegressive Model
by: Shaikh, Kaif, et al.
Published: (2025)
OnlyFlow: Optical Flow based Motion Conditioning for Video Diffusion Models
by: Koroglu, Mathis, et al.
Published: (2024)
Perceptual Flow Network for Visually Grounded Reasoning
by: Li, Yangfu, et al.
Published: (2026)
Butter: Frequency Consistency and Hierarchical Fusion for Autonomous Driving Object Detection
by: Lin, Xiaojian, et al.
Published: (2025)
WildRoadBench: A Wild Aerial Road-Damage Grounding Benchmark for Vision-Language Models and Autonomous Agents
by: Liu, Bingnan, et al.
Published: (2026)
UnCageNet: Tracking and Pose Estimation of Caged Animal
by: Dutta, Sayak, et al.
Published: (2025)
Efficient Attention: Attention with Linear Complexities
by: Shen, Zhuoran, et al.
Published: (2018)
Learning 3D object-centric representation through prediction
by: Day, John, et al.
Published: (2024)
Lifelong Learning in Vision-Language Models: Enhanced EWC with Cross-Modal Knowledge Retention
by: Durrani, Hamza Ahmed, et al.
Published: (2026)
Hierarchical Image-Guided 3D Point Cloud Segmentation in Industrial Scenes via Multi-View Bayesian Fusion
by: Zhu, Yu, et al.
Published: (2025)
Object detection in adverse weather conditions for autonomous vehicles using Instruct Pix2Pix
by: Gurbindo, Unai, et al.
Published: (2025)
Deep Learning-based Depth Estimation Methods from Monocular Image and Videos: A Comprehensive Survey
by: Rajapaksha, Uchitha, et al.
Published: (2024)
An Analysis of Layer-Freezing Strategies for Enhanced Transfer Learning in YOLO Architectures
by: Dobrzycki, Andrzej D., et al.
Published: (2025)
In Context Learning with Vision Transformers: Case Study
by: Zhao, Antony, et al.
Published: (2025)
Car Object Counting and Position Estimation via Extension of the CLIP-EBC Framework
by: Jung, Seoik, et al.
Published: (2025)
GLoT: A Novel Gated-Logarithmic Transformer for Efficient Sign Language Translation
by: Shahin, Nada, et al.
Published: (2025)
SpectralCA: Bi-Directional Cross-Attention for Next-Generation UAV Hyperspectral Vision
by: Brovko, D. V.
Published: (2025)
How Can One Choose the Best CAM-Based Explainability Method for a CNN Model?
by: Costa, Daniel da Silva, et al.
Published: (2026)
SPMamba-YOLO: An Underwater Object Detection Network Based on Multi-Scale Feature Enhancement and Global Context Modeling
by: Liao, Guanghao, et al.
Published: (2026)
THIRDEYE: Cue-Aware Monocular Depth Estimation via Brain-Inspired Multi-Stage Fusion
by: Ioan, Calin Teodor
Published: (2025)
Do Generative Metrics Predict YOLO Performance? An Evaluation Across Models, Augmentation Ratios, and Dataset Complexity
by: Marian, Vasile, et al.
Published: (2026)
Joint Learning of Depth, Pose, and Local Radiance Field for Large Scale Monocular 3D Reconstruction
by: Syed, Shahram Najam, et al.
Published: (2025)
Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D
by: Arnaud, Sergio, et al.
Published: (2025)
Motion-Guided Semantic Alignment with Negative Prompts for Zero-Shot Video Action Recognition
by: Wang, Yiming, et al.
Published: (2026)
Spiking Neural Networks for event-based action recognition: A new task to understand their advantage
by: Vicente-Sola, Alex, et al.
Published: (2022)
ADAT: Time-Series-Aware Adaptive Transformer Architecture for Sign Language Translation
by: Shahin, Nada, et al.
Published: (2025)
Predictive Modeling of Maritime Radar Data Using Transformer Architecture
by: Qesaraku, Bjorna, et al.
Published: (2025)
Enhancing Spatial Reasoning in Vision-Language Models via Chain-of-Thought Prompting and Reinforcement Learning
by: Ji, Binbin, et al.
Published: (2025)
Short-Window Sliding Learning for Real-Time Violence Detection via LLM-based Auto-Labeling
by: Jung, Seoik, et al.
Published: (2025)
Single-Shot Metric Depth from Focused Plenoptic Cameras
by: Lasheras-Hernandez, Blanca, et al.
Published: (2024)