| Main Authors: | Foss, Aaron, Evans, Chloe, Mitts, Sasha, Sinha, Koustuv, Rizvi, Ammar, Kao, Justine T. |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.09943 |
Similar Items
CLIP-Joint-Detect: End-to-End Joint Training of Object Detectors with Contrastive Vision-Language Supervision
by: Raoufi, Behnam, et al.
Published: (2025)
A Reverse Causal Framework to Mitigate Spurious Correlations for Debiasing Scene Graph Generation
by: Sun, Shuzhou, et al.
Published: (2025)
PhysVideoGenerator: Towards Physically Aware Video Generation via Latent Physics Guidance
by: Satish, Siddarth Nilol Kundur, et al.
Published: (2026)
NOAH: Benchmarking Narrative Prior driven Hallucination and Omission in Video Large Language Models
by: Lee, Kyuho, et al.
Published: (2025)
From eye to AI: studying rodent social behavior in the era of machine Learning
by: Chindemi, Giuseppe, et al.
Published: (2025)
Semi supervised GAN for smart microscopy, fast and data efficient cell cycle classification
by: Manick, Rajeev, et al.
Published: (2026)
Perceptual Flow Network for Visually Grounded Reasoning
by: Li, Yangfu, et al.
Published: (2026)
A Simple Baseline for Streaming Video Understanding
by: Shen, Yujiao, et al.
Published: (2026)
Action Anticipation from SoccerNet Football Video Broadcasts
by: Dalal, Mohamad, et al.
Published: (2025)
SimWorld: A Unified Benchmark for Simulator-Conditioned Scene Generation via World Model
by: Li, Xinqing, et al.
Published: (2025)
Lifelong Learning in Vision-Language Models: Enhanced EWC with Cross-Modal Knowledge Retention
by: Durrani, Hamza Ahmed, et al.
Published: (2026)
Prompt Sensitivity in Vision-Language Grounding: How Small Changes in Wording Affect Object Detection
by: Deka, Dawar Jyoti, et al.
Published: (2026)
Mistake Attribution: Fine-Grained Mistake Understanding in Egocentric Videos
by: Li, Yayuan, et al.
Published: (2025)
OmniAcc: Personalized Accessibility Assistant Using Generative AI
by: Karki, Siddhant, et al.
Published: (2025)
DejaVid: Encoder-Agnostic Learned Temporal Matching for Video Classification
by: Ho, Darryl, et al.
Published: (2025)
VLM-VPI: A Vision-Language Reasoning Framework for Improving Automated Vehicle-Pedestrian Interactions
by: Pu, Qingwen, et al.
Published: (2026)
Motion-Guided Semantic Alignment with Negative Prompts for Zero-Shot Video Action Recognition
by: Wang, Yiming, et al.
Published: (2026)
Beyond Localization: A Comprehensive Diagnosis of Perspective-Conditioned Spatial Reasoning in MLLMs from Omnidirectional Images
by: Chen, Yuangong, et al.
Published: (2026)
PhysicsNeRF: Physics-Guided 3D Reconstruction from Sparse Views
by: Barhdadi, Mohamed Rayan, et al.
Published: (2025)
Context in object detection: a systematic literature review
by: Jamali, Mahtab, et al.
Published: (2025)
Mask-Conditioned Voxel Diffusion for Joint Geometry and Color Inpainting
by: Sumuk, Aarya
Published: (2026)
Pedestrian Detection in Low-Light Conditions: A Comprehensive Survey
by: Ghari, Bahareh, et al.
Published: (2024)
FlowIBR: Leveraging Pre-Training for Efficient Neural Image-Based Rendering of Dynamic Scenes
by: Büsching, Marcel, et al.
Published: (2023)
IMASHRIMP: Automatic White Shrimp (Penaeus vannamei) Biometrical Analysis from Laboratory Images Using Computer Vision and Deep Learning
by: González, Abiam Remache, et al.
Published: (2025)
OCC-MLLM-CoT-Alpha: Towards Multi-stage Occlusion Recognition Based on Large Language Models via 3D-Aware Supervision and Chain-of-Thoughts Guidance
by: Wang, Chaoyi, et al.
Published: (2025)
Reducing Object Hallucination in LVLMs via Emphasizing Image-negative Tokens
by: Shen, Meng, et al.
Published: (2026)
Towards a Generalizable Fusion Architecture for Multimodal Object Detection
by: Berjawi, Jad, et al.
Published: (2025)
EmoVerse: A MLLMs-Driven Emotion Representation Dataset for Interpretable Visual Emotion Analysis
by: Guo, Yijie, et al.
Published: (2025)
SCA-Net: Spatial-Contextual Aggregation Network for Enhanced Small Building and Road Change Detection
by: Gholibeigi, Emad, et al.
Published: (2026)
Synthetic-Child: An AIGC-Based Synthetic Data Pipeline for Privacy-Preserving Child Posture Estimation
by: Zeng, Taowen
Published: (2026)
Exploring Surround-View Fisheye Camera 3D Object Detection
by: Li, Changcai, et al.
Published: (2025)
A Vision-Language Model for Focal Liver Lesion Classification
by: Jian, Song, et al.
Published: (2025)
Car Object Counting and Position Estimation via Extension of the CLIP-EBC Framework
by: Jung, Seoik, et al.
Published: (2025)
Pixel-Level Pavement Distress Assessment Using Instance Segmentation
by: Dewick, Logan, et al.
Published: (2026)
Vi-SAFE: A Spatial-Temporal Framework for Efficient Violence Detection in Public Surveillance
by: Chang, Ligang, et al.
Published: (2025)
A Recipe for Geometry-Aware 3D Mesh Transformers
by: Farazi, Mohammad, et al.
Published: (2024)
TACIT Benchmark: A Programmatic Visual Reasoning Benchmark for Generative and Discriminative Models
by: Medeiros, Daniel Nobrega
Published: (2026)
VDPP: Video Depth Post-Processing for Speed and Scalability
by: Yoon, Daewon, et al.
Published: (2026)
Habitat Classification from Ground-Level Imagery Using Deep Neural Networks
by: Shi, Hongrui, et al.
Published: (2025)
Deep Learning-based Depth Estimation Methods from Monocular Image and Videos: A Comprehensive Survey
by: Rajapaksha, Uchitha, et al.
Published: (2024)