:: Library Catalog

Bibliographic Details
Main Authors:	Imran, Muhammad; Lee, Chi; Lee, Yugyung
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; 68T45, 68U10, 92C55; I.2.10; I.4.8; H.2.8; J.3
Online Access:	https://arxiv.org/abs/2601.11666

Similar Items

Prompt to Polyp: Medical Text-Conditioned Image Synthesis with Diffusion Models
by: Chaichuk, Mikhail, et al.
Published: (2025)

Learning Continuous Receive Apodization Weights via Implicit Neural Representation for Ultrafast ICE Ultrasound Imaging
by: Delaunay, Rémi, et al.
Published: (2025)

Modulated INR with Prior Embeddings for Ultrasound Imaging Reconstruction
by: Delaunay, Rémi, et al.
Published: (2025)

DSER: Spectral Epipolar Representation for Efficient Light Field Depth Estimation
by: Mohammad, Noor Islam S., et al.
Published: (2025)

Hierarchical Spatial Algorithms for High-Resolution Image Quantization and Feature Extraction
by: Mohammad, Noor Islam S.
Published: (2025)

Technical Report: Automated Optical Inspection of Surgical Instruments
by: Shafqat, Zunaira, et al.
Published: (2026)

A large-scale, physically-based synthetic dataset for satellite pose estimation
by: Velkei, Szabolcs, et al.
Published: (2025)

ShapBPT: Image Feature Attributions Using Data-Aware Binary Partition Trees
by: Rashid, Muhammad, et al.
Published: (2026)

OpenFusion++: An Open-vocabulary Real-time Scene Understanding System
by: Jin, Xiaofeng, et al.
Published: (2025)

Real Time Human Detection by Unmanned Aerial Vehicles
by: Guettala, Walid, et al.
Published: (2024)

From Gaze to Insight: Bridging Human Visual Attention and Vision Language Model Explanation for Weakly-Supervised Medical Image Segmentation
by: Chen, Jingkun, et al.
Published: (2025)

Leonardo vindicated: Pythagorean trees for minimal reconstruction of the natural branching structures
by: Ruta, Dymitr, et al.
Published: (2024)

UVLM: A Universal Vision-Language Model Loader for Reproducible Multimodal Benchmarking
by: Perez, Joan, et al.
Published: (2026)

Sat-JEPA-Diff: Bridging Self-Supervised Learning and Generative Diffusion for Remote Sensing
by: Komurcu, Kursat, et al.
Published: (2026)

WildfireVLM: AI-powered Analysis for Early Wildfire Detection and Risk Assessment Using Satellite Imagery
by: Ayanzadeh, Aydin, et al.
Published: (2026)

Splat and Distill: Augmenting Teachers with Feed-Forward 3D Reconstruction For 3D-Aware Distillation
by: Shavin, David, et al.
Published: (2026)

Deep Learning in Automated Power Line Inspection: A Review
by: Faisal, Md. Ahasan Atick, et al.
Published: (2025)

Can Local Vision-Language Models improve Activity Recognition over Vision Transformers? -- Case Study on Newborn Resuscitation
by: Guerriero, Enrico, et al.
Published: (2026)

Dual-sensing driving detection model
by: K, Leon C. C., et al.
Published: (2025)

ARTPS: Depth-Enhanced Hybrid Anomaly Detection and Learnable Curiosity Score for Autonomous Rover Target Prioritization
by: Baydemir, Poyraz
Published: (2025)

Beyond RGB: Leveraging Vision Transformers for Thermal Weapon Segmentation
by: Kambhatla, Akhila, et al.
Published: (2025)

Corn Ear Detection and Orientation Estimation Using Deep Learning
by: Sprague, Nathan, et al.
Published: (2024)

Gr-IoU: Ground-Intersection over Union for Robust Multi-Object Tracking with 3D Geometric Constraints
by: Toida, Keisuke, et al.
Published: (2024)

TRACES: Temporal Recall with Contextual Embeddings for Real-Time Video Anomaly Detection
by: Siddiqui, Yousuf Ahmed, et al.
Published: (2025)

DeepShade: Enable Shade Simulation by Text-conditioned Image Generation
by: Da, Longchao, et al.
Published: (2025)

Image-based Facial Rig Inversion
by: Yang, Tianxiang, et al.
Published: (2025)

Point, Detect, Count: Multi-Task Medical Image Understanding with Instruction-Tuned Vision-Language Models
by: Gautam, Sushant, et al.
Published: (2025)

TACIT Benchmark: A Programmatic Visual Reasoning Benchmark for Generative and Discriminative Models
by: Medeiros, Daniel Nobrega
Published: (2026)

CLIP-Joint-Detect: End-to-End Joint Training of Object Detectors with Contrastive Vision-Language Supervision
by: Raoufi, Behnam, et al.
Published: (2025)

Smelly, dense, and spreaded: The Object Detection for Olfactory References (ODOR) dataset
by: Zinnen, Mathias, et al.
Published: (2025)

YOLO Ensemble for UAV-based Multispectral Defect Detection in Wind Turbine Components
by: Svystun, Serhii, et al.
Published: (2025)

Transforming faces into video stories -- VideoFace2.0
by: Brkljač, Branko, et al.
Published: (2025)

VLM-NCD:Novel Class Discovery with Vision-Based Large Language Models
by: Su, Yuetong, et al.
Published: (2025)

Short-Window Sliding Learning for Real-Time Violence Detection via LLM-based Auto-Labeling
by: Jung, Seoik, et al.
Published: (2025)

VDPP: Video Depth Post-Processing for Speed and Scalability
by: Yoon, Daewon, et al.
Published: (2026)

Do Generative Metrics Predict YOLO Performance? An Evaluation Across Models, Augmentation Ratios, and Dataset Complexity
by: Marian, Vasile, et al.
Published: (2026)

Shaded Route Planning Using Active Segmentation and Identification of Satellite Images
by: Da, Longchao, et al.
Published: (2024)

ShadeBench: A Benchmark Dataset for Building Shade Simulation in Sustainable Society
by: Da, Longchao, et al.
Published: (2026)

Exploring the Capabilities of Large Language Model Encoders for Image-Text Retrieval in Chest X-rays
by: Ko, Hanbin, et al.
Published: (2025)

LRCP: Low-Rank Compressibility Guided Visual Token Pruning for Efficient LVLMs
by: Lu, Hongyu, et al.
Published: (2026)