Library Catalog

Bibliographic Details
Main Authors:	Li, Yiyue, Zhang, Shaoting, Li, Kang, Lao, Qicheng
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; 68T45; I.2.10
Online Access:	https://arxiv.org/abs/2502.01201

Similar Items

DiffYOLO: Object Detection for Anti-Noise via YOLO and Diffusion Models
by: Liu, Yichen, et al.
Published: (2024)

Unsupervised Anomaly Detection Using Diffusion Trend Analysis for Display Inspection
by: Kim, Eunwoo, et al.
Published: (2024)

Synthetic Industrial Object Detection: GenAI vs. Feature-Based Methods
by: Araya-Martinez, Jose Moises, et al.
Published: (2025)

UrbanAlign: Post-hoc Semantic Calibration for VLM-Human Preference Alignment
by: Zhang, Yecheng, et al.
Published: (2026)

Revisiting Energy-Based Model for Out-of-Distribution Detection
by: Wu, Yifan, et al.
Published: (2024)

SurgVLM: A Large Vision-Language Model and Systematic Evaluation Benchmark for Surgical Intelligence
by: Zeng, Zhitao, et al.
Published: (2025)

Semantic Prioritization in Visual Counterfactual Explanations with Weighted Segmentation and Auto-Adaptive Region Selection
by: Zhang, Lintong, et al.
Published: (2025)

Learning Association via Track-Detection Matching for Multi-Object Tracking
by: Adžemović, Momir
Published: (2025)

ClustViT: Clustering-based Token Merging for Semantic Segmentation
by: Montello, Fabio, et al.
Published: (2025)

Zero-Shot Multi-Criteria Visual Quality Inspection for Semi-Controlled Industrial Environments via Real-Time 3D Digital Twin Simulation
by: Araya-Martinez, Jose Moises, et al.
Published: (2025)

Multi-scale Temporal Prediction via Incremental Generation and Multi-agent Collaboration
by: Zeng, Zhitao, et al.
Published: (2025)

Extrapolating and Decoupling Image-to-Video Generation Models: Motion Modeling is Easier Than You Think
by: Tian, Jie, et al.
Published: (2025)

Deep Learning Approaches for Human Action Recognition in Video Data
by: Xie, Yufei
Published: (2024)

LatentForensics: Towards frugal deepfake detection in the StyleGAN latent space
by: Delmas, Matthieu, et al.
Published: (2023)

PlaneSAM: Multimodal Plane Instance Segmentation Using the Segment Anything Model
by: Deng, Zhongchen, et al.
Published: (2024)

SynthRender and IRIS: Open-Source Framework and Dataset for Bidirectional Sim-Real Transfer in Industrial Object Perception
by: Araya-Martinez, Jose Moises, et al.
Published: (2026)

Video-CoE: Reinforcing Video Event Prediction via Chain of Events
by: Su, Qile, et al.
Published: (2026)

VA-π: Variational Policy Alignment for Pixel-Aware Autoregressive Generation
by: Liao, Xinyao, et al.
Published: (2025)

Parking Space Detection in the City of Granada
by: Luis, Crespo-Orti, et al.
Published: (2025)

DOD-SA: Infrared-Visible Decoupled Object Detection with Single-Modality Annotations
by: Jin, Hang, et al.
Published: (2025)

Learning Discriminative Spatio-temporal Representations for Semi-supervised Action Recognition
by: Wang, Yu, et al.
Published: (2024)

Smelly, dense, and spreaded: The Object Detection for Olfactory References (ODOR) dataset
by: Zinnen, Mathias, et al.
Published: (2025)

When Less is Enough: Adaptive Token Reduction for Efficient Image Representation
by: Allakhverdov, Eduard, et al.
Published: (2025)

Image Reconstruction as a Tool for Feature Analysis
by: Allakhverdov, Eduard, et al.
Published: (2025)

Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models
by: Li, Jinhao, et al.
Published: (2024)

ARTPS: Depth-Enhanced Hybrid Anomaly Detection and Learnable Curiosity Score for Autonomous Rover Target Prioritization
by: Baydemir, Poyraz
Published: (2025)

TRACES: Temporal Recall with Contextual Embeddings for Real-Time Video Anomaly Detection
by: Siddiqui, Yousuf Ahmed, et al.
Published: (2025)

Hierarchical Point-Patch Fusion with Adaptive Patch Codebook for 3D Shape Anomaly Detection
by: Kang, Xueyang, et al.
Published: (2026)

Sequence Matters: Harnessing Video Models in 3D Super-Resolution
by: Ko, Hyun-kyu, et al.
Published: (2024)

Attend, Distill, Detect: Attention-aware Entropy Distillation for Anomaly Detection
by: Jena, Sushovan, et al.
Published: (2024)

A Hierarchically Feature Reconstructed Autoencoder for Unsupervised Anomaly Detection
by: Chen, Honghui, et al.
Published: (2024)

From Gaze to Insight: Bridging Human Visual Attention and Vision Language Model Explanation for Weakly-Supervised Medical Image Segmentation
by: Chen, Jingkun, et al.
Published: (2025)

On the Inherent Robustness of One-Stage Object Detection against Out-of-Distribution Data
by: Martinez-Seras, Aitor, et al.
Published: (2024)

DSER: Spectral Epipolar Representation for Efficient Light Field Depth Estimation
by: Mohammad, Noor Islam S., et al.
Published: (2025)

Hierarchical Spatial Algorithms for High-Resolution Image Quantization and Feature Extraction
by: Mohammad, Noor Islam S.
Published: (2025)

VLM-NCD: Novel Class Discovery with Vision-Based Large Language Models
by: Su, Yuetong, et al.
Published: (2025)

A Survey on Dynamic Neural Networks: from Computer Vision to Multi-modal Sensor Fusion
by: Montello, Fabio, et al.
Published: (2025)

VDPP: Video Depth Post-Processing for Speed and Scalability
by: Yoon, Daewon, et al.
Published: (2026)

Do Generative Metrics Predict YOLO Performance? An Evaluation Across Models, Augmentation Ratios, and Dataset Complexity
by: Marian, Vasile, et al.
Published: (2026)

Quantifying and Narrowing the Unknown: Interactive Text-to-Video Retrieval via Uncertainty Minimization
by: Zhang, Bingqing, et al.
Published: (2025)