Bibliographic Details
Main Authors:	Foss, Aaron; Evans, Chloe; Mitts, Sasha; Sinha, Koustuv; Rizvi, Ammar; Kao, Justine T.
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; I.2.10; I.4.8
Online Access:	https://arxiv.org/abs/2506.09943
Similar Items

CLIP-Joint-Detect: End-to-End Joint Training of Object Detectors with Contrastive Vision-Language Supervision
by: Raoufi, Behnam, et al.
Published: (2025)

A Reverse Causal Framework to Mitigate Spurious Correlations for Debiasing Scene Graph Generation
by: Sun, Shuzhou, et al.
Published: (2025)

PhysVideoGenerator: Towards Physically Aware Video Generation via Latent Physics Guidance
by: Satish, Siddarth Nilol Kundur, et al.
Published: (2026)

NOAH: Benchmarking Narrative Prior driven Hallucination and Omission in Video Large Language Models
by: Lee, Kyuho, et al.
Published: (2025)

From eye to AI: studying rodent social behavior in the era of machine Learning
by: Chindemi, Giuseppe, et al.
Published: (2025)

Semi supervised GAN for smart microscopy, fast and data efficient cell cycle classification
by: Manick, Rajeev, et al.
Published: (2026)

Perceptual Flow Network for Visually Grounded Reasoning
by: Li, Yangfu, et al.
Published: (2026)

A Simple Baseline for Streaming Video Understanding
by: Shen, Yujiao, et al.
Published: (2026)

Action Anticipation from SoccerNet Football Video Broadcasts
by: Dalal, Mohamad, et al.
Published: (2025)

SimWorld: A Unified Benchmark for Simulator-Conditioned Scene Generation via World Model
by: Li, Xinqing, et al.
Published: (2025)

Lifelong Learning in Vision-Language Models: Enhanced EWC with Cross-Modal Knowledge Retention
by: Durrani, Hamza Ahmed, et al.
Published: (2026)

Prompt Sensitivity in Vision-Language Grounding: How Small Changes in Wording Affect Object Detection
by: Deka, Dawar Jyoti, et al.
Published: (2026)

Mistake Attribution: Fine-Grained Mistake Understanding in Egocentric Videos
by: Li, Yayuan, et al.
Published: (2025)

OmniAcc: Personalized Accessibility Assistant Using Generative AI
by: Karki, Siddhant, et al.
Published: (2025)

DejaVid: Encoder-Agnostic Learned Temporal Matching for Video Classification
by: Ho, Darryl, et al.
Published: (2025)

VLM-VPI: A Vision-Language Reasoning Framework for Improving Automated Vehicle-Pedestrian Interactions
by: Pu, Qingwen, et al.
Published: (2026)

Motion-Guided Semantic Alignment with Negative Prompts for Zero-Shot Video Action Recognition
by: Wang, Yiming, et al.
Published: (2026)

Beyond Localization: A Comprehensive Diagnosis of Perspective-Conditioned Spatial Reasoning in MLLMs from Omnidirectional Images
by: Chen, Yuangong, et al.
Published: (2026)

PhysicsNeRF: Physics-Guided 3D Reconstruction from Sparse Views
by: Barhdadi, Mohamed Rayan, et al.
Published: (2025)

Context in object detection: a systematic literature review
by: Jamali, Mahtab, et al.
Published: (2025)

Mask-Conditioned Voxel Diffusion for Joint Geometry and Color Inpainting
by: Sumuk, Aarya
Published: (2026)

Pedestrian Detection in Low-Light Conditions: A Comprehensive Survey
by: Ghari, Bahareh, et al.
Published: (2024)

FlowIBR: Leveraging Pre-Training for Efficient Neural Image-Based Rendering of Dynamic Scenes
by: Büsching, Marcel, et al.
Published: (2023)

IMASHRIMP: Automatic White Shrimp (Penaeus vannamei) Biometrical Analysis from Laboratory Images Using Computer Vision and Deep Learning
by: González, Abiam Remache, et al.
Published: (2025)

OCC-MLLM-CoT-Alpha: Towards Multi-stage Occlusion Recognition Based on Large Language Models via 3D-Aware Supervision and Chain-of-Thoughts Guidance
by: Wang, Chaoyi, et al.
Published: (2025)

Reducing Object Hallucination in LVLMs via Emphasizing Image-negative Tokens
by: Shen, Meng, et al.
Published: (2026)

Towards a Generalizable Fusion Architecture for Multimodal Object Detection
by: Berjawi, Jad, et al.
Published: (2025)

EmoVerse: A MLLMs-Driven Emotion Representation Dataset for Interpretable Visual Emotion Analysis
by: Guo, Yijie, et al.
Published: (2025)

SCA-Net: Spatial-Contextual Aggregation Network for Enhanced Small Building and Road Change Detection
by: Gholibeigi, Emad, et al.
Published: (2026)

Synthetic-Child: An AIGC-Based Synthetic Data Pipeline for Privacy-Preserving Child Posture Estimation
by: Zeng, Taowen
Published: (2026)

Exploring Surround-View Fisheye Camera 3D Object Detection
by: Li, Changcai, et al.
Published: (2025)

A Vision-Language Model for Focal Liver Lesion Classification
by: Jian, Song, et al.
Published: (2025)

Car Object Counting and Position Estimation via Extension of the CLIP-EBC Framework
by: Jung, Seoik, et al.
Published: (2025)

Pixel-Level Pavement Distress Assessment Using Instance Segmentation
by: Dewick, Logan, et al.
Published: (2026)

Vi-SAFE: A Spatial-Temporal Framework for Efficient Violence Detection in Public Surveillance
by: Chang, Ligang, et al.
Published: (2025)

A Recipe for Geometry-Aware 3D Mesh Transformers
by: Farazi, Mohammad, et al.
Published: (2024)

TACIT Benchmark: A Programmatic Visual Reasoning Benchmark for Generative and Discriminative Models
by: Medeiros, Daniel Nobrega
Published: (2026)

VDPP: Video Depth Post-Processing for Speed and Scalability
by: Yoon, Daewon, et al.
Published: (2026)

Habitat Classification from Ground-Level Imagery Using Deep Neural Networks
by: Shi, Hongrui, et al.
Published: (2025)

Deep Learning-based Depth Estimation Methods from Monocular Image and Videos: A Comprehensive Survey
by: Rajapaksha, Uchitha, et al.
Published: (2024)