| Main Authors: | Jamali, Mahtab, Davidsson, Paul, Khoshkangini, Reza, Ljungqvist, Martin Georg, Mihailescu, Radu-Casian |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2503.23249 |
Similar Items
CLIP-Joint-Detect: End-to-End Joint Training of Object Detectors with Contrastive Vision-Language Supervision
by: Raoufi, Behnam, et al.
Published: (2025)
SelvaBox: A high-resolution dataset for tropical tree crown detection
by: Baudchon, Hugo, et al.
Published: (2025)
From eye to AI: studying rodent social behavior in the era of machine Learning
by: Chindemi, Giuseppe, et al.
Published: (2025)
SPMamba-YOLO: An Underwater Object Detection Network Based on Multi-Scale Feature Enhancement and Global Context Modeling
by: Liao, Guanghao, et al.
Published: (2026)
Optimizing the image correction pipeline for pedestrian detection in the thermal-infrared domain
by: Karam, Christophe, et al.
Published: (2024)
Mask-Conditioned Voxel Diffusion for Joint Geometry and Color Inpainting
by: Sumuk, Aarya
Published: (2026)
Pedestrian Detection in Low-Light Conditions: A Comprehensive Survey
by: Ghari, Bahareh, et al.
Published: (2024)
PhysVideoGenerator: Towards Physically Aware Video Generation via Latent Physics Guidance
by: Satish, Siddarth Nilol Kundur, et al.
Published: (2026)
FlowIBR: Leveraging Pre-Training for Efficient Neural Image-Based Rendering of Dynamic Scenes
by: Büsching, Marcel, et al.
Published: (2023)
IMASHRIMP: Automatic White Shrimp (Penaeus vannamei) Biometrical Analysis from Laboratory Images Using Computer Vision and Deep Learning
by: González, Abiam Remache, et al.
Published: (2025)
OCC-MLLM-CoT-Alpha: Towards Multi-stage Occlusion Recognition Based on Large Language Models via 3D-Aware Supervision and Chain-of-Thoughts Guidance
by: Wang, Chaoyi, et al.
Published: (2025)
NOAH: Benchmarking Narrative Prior driven Hallucination and Omission in Video Large Language Models
by: Lee, Kyuho, et al.
Published: (2025)
Reducing Object Hallucination in LVLMs via Emphasizing Image-negative Tokens
by: Shen, Meng, et al.
Published: (2026)
Towards a Generalizable Fusion Architecture for Multimodal Object Detection
by: Berjawi, Jad, et al.
Published: (2025)
A Simple Baseline for Streaming Video Understanding
by: Shen, Yujiao, et al.
Published: (2026)
EmoVerse: A MLLMs-Driven Emotion Representation Dataset for Interpretable Visual Emotion Analysis
by: Guo, Yijie, et al.
Published: (2025)
SimWorld: A Unified Benchmark for Simulator-Conditioned Scene Generation via World Model
by: Li, Xinqing, et al.
Published: (2025)
SCA-Net: Spatial-Contextual Aggregation Network for Enhanced Small Building and Road Change Detection
by: Gholibeigi, Emad, et al.
Published: (2026)
Synthetic-Child: An AIGC-Based Synthetic Data Pipeline for Privacy-Preserving Child Posture Estimation
by: Zeng, Taowen
Published: (2026)
Exploring Surround-View Fisheye Camera 3D Object Detection
by: Li, Changcai, et al.
Published: (2025)
A Vision-Language Model for Focal Liver Lesion Classification
by: Jian, Song, et al.
Published: (2025)
Car Object Counting and Position Estimation via Extension of the CLIP-EBC Framework
by: Jung, Seoik, et al.
Published: (2025)
Pixel-Level Pavement Distress Assessment Using Instance Segmentation
by: Dewick, Logan, et al.
Published: (2026)
Vi-SAFE: A Spatial-Temporal Framework for Efficient Violence Detection in Public Surveillance
by: Chang, Ligang, et al.
Published: (2025)
A Reverse Causal Framework to Mitigate Spurious Correlations for Debiasing Scene Graph Generation
by: Sun, Shuzhou, et al.
Published: (2025)
Action Anticipation from SoccerNet Football Video Broadcasts
by: Dalal, Mohamad, et al.
Published: (2025)
A Recipe for Geometry-Aware 3D Mesh Transformers
by: Farazi, Mohammad, et al.
Published: (2024)
Application of YOLOv8 in monocular downward multiple Car Target detection
by: Lyu, Shijie
Published: (2025)
Motion-Guided Semantic Alignment with Negative Prompts for Zero-Shot Video Action Recognition
by: Wang, Yiming, et al.
Published: (2026)
UnCageNet: Tracking and Pose Estimation of Caged Animal
by: Dutta, Sayak, et al.
Published: (2025)
A Light Perspective for 3D Object Detection
by: Pederiva, Marcelo Eduardo, et al.
Published: (2025)
Mistake Attribution: Fine-Grained Mistake Understanding in Egocentric Videos
by: Li, Yayuan, et al.
Published: (2025)
Combining Absolute and Semi-Generalized Relative Poses for Visual Localization
by: Panek, Vojtech, et al.
Published: (2024)
A Guide to Structureless Visual Localization
by: Panek, Vojtech, et al.
Published: (2025)
DeltaVLM: Interactive Remote Sensing Image Change Analysis via Instruction-guided Difference Perception
by: Deng, Pei, et al.
Published: (2025)
Facial Attribute Based Text Guided Face Anonymization
by: Muştu, Mustafa İzzet, et al.
Published: (2025)
Hierarchical Image-Guided 3D Point Cloud Segmentation in Industrial Scenes via Multi-View Bayesian Fusion
by: Zhu, Yu, et al.
Published: (2025)
Image-Based Leopard Seal Recognition: Approaches and Challenges in Current Automated Systems
by: Salazar, Jorge Yero, et al.
Published: (2024)
Dense Motion Captioning
by: Xu, Shiyao, et al.
Published: (2025)
TD3Net: A temporal densely connected multi-dilated convolutional network for lipreading
by: Lee, Byung Hoon, et al.
Published: (2025)