| Main Authors: | Khan, Zaid, Fu, Yun |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2404.10193 |
Similar Items
Variational Visual Question Answering for Uncertainty-Aware Selective Prediction
by: Wieczorek, Tobias Jan, et al.
Published: (2025)
Self-Training Large Language Models for Improved Visual Program Synthesis With Visual Reinforcement
by: Khan, Zaid, et al.
Published: (2024)
Analyzing the Sensitivity of Vision Language Models in Visual Question Answering
by: Shah, Monika, et al.
Published: (2025)
Selectively Answering Visual Questions
by: Eisenschlos, Julian Martin, et al.
Published: (2024)
Self-Bootstrapped Visual-Language Model for Knowledge Selection and Question Answering
by: Hao, Dongze, et al.
Published: (2024)
Human Uncertainty-Aware Data Selection and Automatic Labeling in Visual Question Answering
by: Lan, Jian, et al.
Published: (2025)
Large Vision-Language Models for Remote Sensing Visual Question Answering
by: Siripong, Surasakdi, et al.
Published: (2024)
BLaVe-CoT: Consistency-Aware Visual Question Answering for Blind and Low Vision Users
by: Cheng, Wanyin, et al.
Published: (2025)
Fusion of Domain-Adapted Vision and Language Models for Medical Visual Question Answering
by: Ha, Cuong Nhat, et al.
Published: (2024)
HORNet: Task-Guided Frame Selection for Video Question Answering with Vision-Language Models
by: Bai, Xiangyu, et al.
Published: (2026)
Adapting Lightweight Vision Language Models for Radiological Visual Question Answering
by: Shourya, Aditya, et al.
Published: (2025)
Few-Shot Image Classification and Segmentation as Visual Question Answering Using Vision-Language Models
by: Meng, Tian, et al.
Published: (2024)
Black-Box Visual Prompt Engineering for Mitigating Object Hallucination in Large Vision Language Models
by: Woo, Sangmin, et al.
Published: (2025)
SimpleLLM4AD: An End-to-End Vision-Language Model with Graph Visual Question Answering for Autonomous Driving
by: Zheng, Peiru, et al.
Published: (2024)
Pretraining Vision-Language Model for Difference Visual Question Answering in Longitudinal Chest X-rays
by: Cho, Yeongjae, et al.
Published: (2024)
Vision-Language Models for Medical Report Generation and Visual Question Answering: A Review
by: Hartsock, Iryna, et al.
Published: (2024)
Guiding Vision-Language Model Selection for Visual Question-Answering Across Tasks, Domains, and Knowledge Types
by: Sinha, Neelabh, et al.
Published: (2024)
Test-Time Hinting for Black-Box Vision-Language Models
by: Hou, Kaihua, et al.
Published: (2026)
Object Retrieval for Visual Question Answering with Outside Knowledge
by: Kan, Shichao, et al.
Published: (2024)
Spatially Grounded Explanations in Vision Language Models for Document Visual Question Answering
by: Lagos, Maximiliano Hormazábal, et al.
Published: (2025)
Research on Vision-Language Question Answering Models for Industrial Robots
by: Li, Ping, et al.
Published: (2026)
Object Attribute Matters in Visual Question Answering
by: Li, Peize, et al.
Published: (2023)
WikiSeeker: Rethinking the Role of Vision-Language Models in Knowledge-Based Visual Question Answering
by: Zhu, Yingjian, et al.
Published: (2026)
Where do Large Vision-Language Models Look at when Answering Questions?
by: Xing, Xiaoying, et al.
Published: (2025)
Questioning the Stability of Visual Question Answering
by: Rosenfeld, Amir, et al.
Published: (2025)
3D Question Answering via only 2D Vision-Language Models
by: Wang, Fengyun, et al.
Published: (2025)
Are Vision Language Models Ready for Clinical Diagnosis? A 3D Medical Benchmark for Tumor-centric Visual Question Answering
by: Chen, Yixiong, et al.
Published: (2025)
Uncertainty-Guided Self-Questioning and Answering for Video-Language Alignment
by: Chen, Jin, et al.
Published: (2024)
VisualSimpleQA: A Benchmark for Decoupled Evaluation of Large Vision-Language Models in Fact-Seeking Question Answering
by: Wang, Yanling, et al.
Published: (2025)
Prompting Medical Large Vision-Language Models to Diagnose Pathologies by Visual Question Answering
by: Guo, Danfeng, et al.
Published: (2024)
VideoDistill: Language-aware Vision Distillation for Video Question Answering
by: Zou, Bo, et al.
Published: (2024)
Language Models as Black-Box Optimizers for Vision-Language Models
by: Liu, Shihong, et al.
Published: (2023)
A Two-Stage Multitask Vision-Language Framework for Explainable Crop Disease Visual Question Answering
by: Hossain, Md. Zahid, et al.
Published: (2026)
Visual Question Answering Instruction: Unlocking Multimodal Large Language Model To Domain-Specific Visual Multitasks
by: Lee, Jusung, et al.
Published: (2024)
Trust the Unreliability: Inward Backward Dynamic Unreliability Driven Coreset Selection for Medical Image Classification
by: Liang, Yan, et al.
Published: (2026)
CoGR-MoE: Concept-Guided Expert Routing with Consistent Selection and Flexible Reasoning for Visual Question Answering
by: Zeng, Xiyin, et al.
Published: (2026)
Targeted Visual Prompting for Medical Visual Question Answering
by: Tascon-Morales, Sergio, et al.
Published: (2024)
Visual Robustness Benchmark for Visual Question Answering (VQA)
by: Ishmam, Md Farhan, et al.
Published: (2024)
Visually Interpretable Subtask Reasoning for Visual Question Answering
by: Cheng, Yu, et al.
Published: (2025)
Retrieval-Augmented Natural Language Reasoning for Explainable Visual Question Answering
by: Lim, Su Hyeon, et al.
Published: (2024)