:: Library Catalog

Bibliographic Details
Main Authors:	Ghazanfari, Mahyar; Wei, Peng
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2604.13292

Similar Items

A Computer Vision Approach for Autonomous Cars to Drive Safe at Construction Zone
by: Ahammed, Abu Shad, et al.
Published: (2024)

SpotEdit: Evaluating Visually-Guided Image Editing Methods
by: Ghazanfari, Sara, et al.
Published: (2025)

VLN-Pilot: Large Vision-Language Model as an Autonomous Indoor Drone Operator
by: Dominguez-Dager, Bessie, et al.
Published: (2026)

Seeing More, Saying More: Lightweight Language Experts are Dynamic Video Token Compressors
by: Wang, Xiangchen, et al.
Published: (2025)

UCorr: Wire Detection and Depth Estimation for Autonomous Drones
by: Kolbeinsson, Benedikt, et al.
Published: (2025)

Do You See What I Say? Generalizable Deepfake Detection based on Visual Speech Recognition
by: Bora, Maheswar, et al.
Published: (2025)

YOLOMG: Vision-based Drone-to-Drone Detection with Appearance and Pixel-Level Motion Fusion
by: Guo, Hanqing, et al.
Published: (2025)

DroneMOT: Drone-based Multi-Object Tracking Considering Detection Difficulties and Simultaneous Moving of Drones and Objects
by: Wang, Peng, et al.
Published: (2024)

Seeing the Evidence, Missing the Answer: Tool-Guided Vision-Language Models on Visual Illusions
by: Wang, Xuesong, et al.
Published: (2026)

See-in-Pairs: Reference Image-Guided Comparative Vision-Language Models for Medical Diagnosis
by: Jin, Ruinan, et al.
Published: (2025)

C2FDrone: Coarse-to-Fine Drone-to-Drone Detection using Vision Transformer Networks
by: Rebbapragada, Sairam VC, et al.
Published: (2024)

Are VLMs Seeing or Just Saying? Uncovering the Illusion of Visual Re-examination
by: Shi, Chufan, et al.
Published: (2026)

An Autonomous Drone Swarm for Detecting and Tracking Anomalies among Dense Vegetation
by: Nathan, Rakesh John Amala Arokia, et al.
Published: (2024)

Vision-Language Models Can't See the Obvious
by: Dahou, Yasser, et al.
Published: (2025)

Seeing Beyond the Scene: Enhancing Vision-Language Models with Interactional Reasoning
by: Liang, Dayong, et al.
Published: (2025)

Learn "No" to Say "Yes" Better: Improving Vision-Language Models via Negations
by: Singh, Jaisidh, et al.
Published: (2024)

C-CoT: Counterfactual Chain-of-Thought with Vision-Language Models for Safe Autonomous Driving
by: Tian, Kefei, et al.
Published: (2026)

Tables Guide Vision: Learning to See the Heart through Tabular Data
by: Hasny, Marta, et al.
Published: (2025)

Saliency-Guided Deep Learning for Bridge Defect Detection in Drone Imagery
by: Hebbache, Loucif, et al.
Published: (2025)

See It, Say It, Sorted: An Iterative Training-Free Framework for Visually-Grounded Multimodal Reasoning in LVLMs
by: Zhang, Yongchang, et al.
Published: (2026)

SYNCR: A Cross-Video Reasoning Benchmark with Synthetic Grounding
by: Ghazanfari, Sara, et al.
Published: (2026)

SynDroneVision: A Synthetic Dataset for Image-Based Drone Detection
by: Lenhard, Tamara R., et al.
Published: (2024)

Spatial-aware Vision Language Model for Autonomous Driving
by: Wei, Weijie, et al.
Published: (2025)

Bad Seeing or Bad Thinking? Rewarding Perception for Vision-Language Reasoning
by: Wang, Haozhe, et al.
Published: (2026)

Language-Guided Token Compression with Reinforcement Learning in Large Vision-Language Models
by: Cao, Sihan, et al.
Published: (2026)

Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching
by: Chu, Meng, et al.
Published: (2023)

Seeing Like Radiologists: Context- and Gaze-Guided Vision-Language Pretraining for Chest X-rays
by: Liu, Kang, et al.
Published: (2026)

See it. Say it. Sorted: Agentic System for Compositional Diagram Generation
by: Zhang, Hantao, et al.
Published: (2025)

Drone Detection with Event Cameras
by: Magrini, Gabriele, et al.
Published: (2025)

Low-Cost Stereo Vision for Robust 3D Positioning of Thin Radiata Pine Branches in Autonomous Drone Pruning
by: Lin, Yida, et al.
Published: (2026)

Toward Autonomous Laboratory Safety Monitoring with Vision Language Models: Learning to See Hazards Through Scene Structure
by: Chakraborty, Trishna, et al.
Published: (2026)

Seeing is Believing? Enhancing Vision-Language Navigation using Visual Perturbations
by: Zhang, Xuesong, et al.
Published: (2024)

How Well Can Vision Language Models See Image Details?
by: Gou, Chenhui, et al.
Published: (2024)

BioDrone: A Bionic Drone-based Single Object Tracking Benchmark for Robust Vision
by: Zhao, Xin, et al.
Published: (2024)

Seeing Right but Saying Wrong: Inter- and Intra-Layer Refinement in MLLMs without Training
by: Song, Shezheng, et al.
Published: (2026)

HCCM: Hierarchical Cross-Granularity Contrastive and Matching Learning for Natural Language-Guided Drones
by: Ruan, Hao, et al.
Published: (2025)

See-Saw Modality Balance: See Gradient, and Sew Impaired Vision-Language Balance to Mitigate Dominant Modality Bias
by: Kwon, JuneHyoung, et al.
Published: (2025)

Seeing No Evil: Blinding Large Vision-Language Models to Safety Instructions via Adversarial Attention Hijacking
by: Li, Jingru, et al.
Published: (2026)

Seeing the Abstract: Translating the Abstract Language for Vision Language Models
by: Talon, Davide, et al.
Published: (2025)

Seeing What You Say: Expressive Image Generation from Speech
by: Lee, Jiyoung, et al.
Published: (2025)