| Main Authors: | Ghazanfari, Mahyar, Wei, Peng |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.13292 |
Similar Items
A Computer Vision Approach for Autonomous Cars to Drive Safe at Construction Zone
by: Ahammed, Abu Shad, et al.
Published: (2024)
SpotEdit: Evaluating Visually-Guided Image Editing Methods
by: Ghazanfari, Sara, et al.
Published: (2025)
VLN-Pilot: Large Vision-Language Model as an Autonomous Indoor Drone Operator
by: Dominguez-Dager, Bessie, et al.
Published: (2026)
Seeing More, Saying More: Lightweight Language Experts are Dynamic Video Token Compressors
by: Wang, Xiangchen, et al.
Published: (2025)
UCorr: Wire Detection and Depth Estimation for Autonomous Drones
by: Kolbeinsson, Benedikt, et al.
Published: (2025)
Do You See What I Say? Generalizable Deepfake Detection based on Visual Speech Recognition
by: Bora, Maheswar, et al.
Published: (2025)
YOLOMG: Vision-based Drone-to-Drone Detection with Appearance and Pixel-Level Motion Fusion
by: Guo, Hanqing, et al.
Published: (2025)
DroneMOT: Drone-based Multi-Object Tracking Considering Detection Difficulties and Simultaneous Moving of Drones and Objects
by: Wang, Peng, et al.
Published: (2024)
Seeing the Evidence, Missing the Answer: Tool-Guided Vision-Language Models on Visual Illusions
by: Wang, Xuesong, et al.
Published: (2026)
See-in-Pairs: Reference Image-Guided Comparative Vision-Language Models for Medical Diagnosis
by: Jin, Ruinan, et al.
Published: (2025)
C2FDrone: Coarse-to-Fine Drone-to-Drone Detection using Vision Transformer Networks
by: Rebbapragada, Sairam VC, et al.
Published: (2024)
Are VLMs Seeing or Just Saying? Uncovering the Illusion of Visual Re-examination
by: Shi, Chufan, et al.
Published: (2026)
An Autonomous Drone Swarm for Detecting and Tracking Anomalies among Dense Vegetation
by: Nathan, Rakesh John Amala Arokia, et al.
Published: (2024)
Vision-Language Models Can't See the Obvious
by: Dahou, Yasser, et al.
Published: (2025)
Seeing Beyond the Scene: Enhancing Vision-Language Models with Interactional Reasoning
by: Liang, Dayong, et al.
Published: (2025)
Learn "No" to Say "Yes" Better: Improving Vision-Language Models via Negations
by: Singh, Jaisidh, et al.
Published: (2024)
C-CoT: Counterfactual Chain-of-Thought with Vision-Language Models for Safe Autonomous Driving
by: Tian, Kefei, et al.
Published: (2026)
Tables Guide Vision: Learning to See the Heart through Tabular Data
by: Hasny, Marta, et al.
Published: (2025)
Saliency-Guided Deep Learning for Bridge Defect Detection in Drone Imagery
by: Hebbache, Loucif, et al.
Published: (2025)
See It, Say It, Sorted: An Iterative Training-Free Framework for Visually-Grounded Multimodal Reasoning in LVLMs
by: Zhang, Yongchang, et al.
Published: (2026)
SYNCR: A Cross-Video Reasoning Benchmark with Synthetic Grounding
by: Ghazanfari, Sara, et al.
Published: (2026)
SynDroneVision: A Synthetic Dataset for Image-Based Drone Detection
by: Lenhard, Tamara R., et al.
Published: (2024)
Spatial-aware Vision Language Model for Autonomous Driving
by: Wei, Weijie, et al.
Published: (2025)
Bad Seeing or Bad Thinking? Rewarding Perception for Vision-Language Reasoning
by: Wang, Haozhe, et al.
Published: (2026)
Language-Guided Token Compression with Reinforcement Learning in Large Vision-Language Models
by: Cao, Sihan, et al.
Published: (2026)
Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching
by: Chu, Meng, et al.
Published: (2023)
Seeing Like Radiologists: Context- and Gaze-Guided Vision-Language Pretraining for Chest X-rays
by: Liu, Kang, et al.
Published: (2026)
See it. Say it. Sorted: Agentic System for Compositional Diagram Generation
by: Zhang, Hantao, et al.
Published: (2025)
Drone Detection with Event Cameras
by: Magrini, Gabriele, et al.
Published: (2025)
Low-Cost Stereo Vision for Robust 3D Positioning of Thin Radiata Pine Branches in Autonomous Drone Pruning
by: Lin, Yida, et al.
Published: (2026)
Toward Autonomous Laboratory Safety Monitoring with Vision Language Models: Learning to See Hazards Through Scene Structure
by: Chakraborty, Trishna, et al.
Published: (2026)
Seeing is Believing? Enhancing Vision-Language Navigation using Visual Perturbations
by: Zhang, Xuesong, et al.
Published: (2024)
How Well Can Vision Language Models See Image Details?
by: Gou, Chenhui, et al.
Published: (2024)
BioDrone: A Bionic Drone-based Single Object Tracking Benchmark for Robust Vision
by: Zhao, Xin, et al.
Published: (2024)
Seeing Right but Saying Wrong: Inter- and Intra-Layer Refinement in MLLMs without Training
by: Song, Shezheng, et al.
Published: (2026)
HCCM: Hierarchical Cross-Granularity Contrastive and Matching Learning for Natural Language-Guided Drones
by: Ruan, Hao, et al.
Published: (2025)
See-Saw Modality Balance: See Gradient, and Sew Impaired Vision-Language Balance to Mitigate Dominant Modality Bias
by: Kwon, JuneHyoung, et al.
Published: (2025)
Seeing No Evil: Blinding Large Vision-Language Models to Safety Instructions via Adversarial Attention Hijacking
by: Li, Jingru, et al.
Published: (2026)
Seeing the Abstract: Translating the Abstract Language for Vision Language Models
by: Talon, Davide, et al.
Published: (2025)
Seeing What You Say: Expressive Image Generation from Speech
by: Lee, Jiyoung, et al.
Published: (2025)