Library Catalog

Bibliographic Details
Main Authors:	Jamali, Mahtab; Davidsson, Paul; Khoshkangini, Reza; Ljungqvist, Martin Georg; Mihailescu, Radu-Casian
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; I.2.10; I.4.8
Online Access:	https://arxiv.org/abs/2503.23249

Similar Items

CLIP-Joint-Detect: End-to-End Joint Training of Object Detectors with Contrastive Vision-Language Supervision
by: Raoufi, Behnam, et al.
Published: (2025)

SelvaBox: A high-resolution dataset for tropical tree crown detection
by: Baudchon, Hugo, et al.
Published: (2025)

From eye to AI: studying rodent social behavior in the era of machine Learning
by: Chindemi, Giuseppe, et al.
Published: (2025)

SPMamba-YOLO: An Underwater Object Detection Network Based on Multi-Scale Feature Enhancement and Global Context Modeling
by: Liao, Guanghao, et al.
Published: (2026)

Optimizing the image correction pipeline for pedestrian detection in the thermal-infrared domain
by: Karam, Christophe, et al.
Published: (2024)

Mask-Conditioned Voxel Diffusion for Joint Geometry and Color Inpainting
by: Sumuk, Aarya
Published: (2026)

Pedestrian Detection in Low-Light Conditions: A Comprehensive Survey
by: Ghari, Bahareh, et al.
Published: (2024)

PhysVideoGenerator: Towards Physically Aware Video Generation via Latent Physics Guidance
by: Satish, Siddarth Nilol Kundur, et al.
Published: (2026)

FlowIBR: Leveraging Pre-Training for Efficient Neural Image-Based Rendering of Dynamic Scenes
by: Büsching, Marcel, et al.
Published: (2023)

IMASHRIMP: Automatic White Shrimp (Penaeus vannamei) Biometrical Analysis from Laboratory Images Using Computer Vision and Deep Learning
by: González, Abiam Remache, et al.
Published: (2025)

OCC-MLLM-CoT-Alpha: Towards Multi-stage Occlusion Recognition Based on Large Language Models via 3D-Aware Supervision and Chain-of-Thoughts Guidance
by: Wang, Chaoyi, et al.
Published: (2025)

NOAH: Benchmarking Narrative Prior driven Hallucination and Omission in Video Large Language Models
by: Lee, Kyuho, et al.
Published: (2025)

Reducing Object Hallucination in LVLMs via Emphasizing Image-negative Tokens
by: Shen, Meng, et al.
Published: (2026)

Towards a Generalizable Fusion Architecture for Multimodal Object Detection
by: Berjawi, Jad, et al.
Published: (2025)

A Simple Baseline for Streaming Video Understanding
by: Shen, Yujiao, et al.
Published: (2026)

EmoVerse: A MLLMs-Driven Emotion Representation Dataset for Interpretable Visual Emotion Analysis
by: Guo, Yijie, et al.
Published: (2025)

SimWorld: A Unified Benchmark for Simulator-Conditioned Scene Generation via World Model
by: Li, Xinqing, et al.
Published: (2025)

SCA-Net: Spatial-Contextual Aggregation Network for Enhanced Small Building and Road Change Detection
by: Gholibeigi, Emad, et al.
Published: (2026)

Synthetic-Child: An AIGC-Based Synthetic Data Pipeline for Privacy-Preserving Child Posture Estimation
by: Zeng, Taowen
Published: (2026)

Exploring Surround-View Fisheye Camera 3D Object Detection
by: Li, Changcai, et al.
Published: (2025)

A Vision-Language Model for Focal Liver Lesion Classification
by: Jian, Song, et al.
Published: (2025)

Car Object Counting and Position Estimation via Extension of the CLIP-EBC Framework
by: Jung, Seoik, et al.
Published: (2025)

Pixel-Level Pavement Distress Assessment Using Instance Segmentation
by: Dewick, Logan, et al.
Published: (2026)

Vi-SAFE: A Spatial-Temporal Framework for Efficient Violence Detection in Public Surveillance
by: Chang, Ligang, et al.
Published: (2025)

A Reverse Causal Framework to Mitigate Spurious Correlations for Debiasing Scene Graph Generation
by: Sun, Shuzhou, et al.
Published: (2025)

Action Anticipation from SoccerNet Football Video Broadcasts
by: Dalal, Mohamad, et al.
Published: (2025)

A Recipe for Geometry-Aware 3D Mesh Transformers
by: Farazi, Mohammad, et al.
Published: (2024)

Application of YOLOv8 in monocular downward multiple Car Target detection
by: Lyu, Shijie
Published: (2025)

Motion-Guided Semantic Alignment with Negative Prompts for Zero-Shot Video Action Recognition
by: Wang, Yiming, et al.
Published: (2026)

UnCageNet: Tracking and Pose Estimation of Caged Animal
by: Dutta, Sayak, et al.
Published: (2025)

A Light Perspective for 3D Object Detection
by: Pederiva, Marcelo Eduardo, et al.
Published: (2025)

Mistake Attribution: Fine-Grained Mistake Understanding in Egocentric Videos
by: Li, Yayuan, et al.
Published: (2025)

Combining Absolute and Semi-Generalized Relative Poses for Visual Localization
by: Panek, Vojtech, et al.
Published: (2024)

A Guide to Structureless Visual Localization
by: Panek, Vojtech, et al.
Published: (2025)

DeltaVLM: Interactive Remote Sensing Image Change Analysis via Instruction-guided Difference Perception
by: Deng, Pei, et al.
Published: (2025)

Facial Attribute Based Text Guided Face Anonymization
by: Muştu, Mustafa İzzet, et al.
Published: (2025)

Hierarchical Image-Guided 3D Point Cloud Segmentation in Industrial Scenes via Multi-View Bayesian Fusion
by: Zhu, Yu, et al.
Published: (2025)

Image-Based Leopard Seal Recognition: Approaches and Challenges in Current Automated Systems
by: Salazar, Jorge Yero, et al.
Published: (2024)

Dense Motion Captioning
by: Xu, Shiyao, et al.
Published: (2025)

TD3Net: A temporal densely connected multi-dilated convolutional network for lipreading
by: Lee, Byung Hoon, et al.
Published: (2025)