| Main authors: | Shourya, Aditya; Dumontier, Michel; Sun, Chang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2506.14451 |
Similar items
Free Form Medical Visual Question Answering in Radiology
by: Narayanan, Abhishek, et al.
Published: (2024)
Multi-Frame, Lightweight & Efficient Vision-Language Models for Question Answering in Autonomous Driving
by: Gopalkrishnan, Akshay, et al.
Published: (2024)
CompDiff: Hierarchical Compositional Diffusion for Fair and Zero-Shot Intersectional Medical Image Generation
by: Ibrahim, Mahmoud, et al.
Published: (2026)
Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded Visual Question Answering in Robotic Surgery
by: Wang, Guankun, et al.
Published: (2024)
Research on Vision-Language Question Answering Models for Industrial Robots
by: Li, Ping, et al.
Published: (2026)
CLARIFY: A Specialist-Generalist Framework for Accurate and Lightweight Dermatological Visual Question Answering
by: Saha, Aranya, et al.
Published: (2025)
Spatially Grounded Explanations in Vision Language Models for Document Visual Question Answering
by: Lagos, Maximiliano Hormazábal, et al.
Published: (2025)
Hierarchical Question-Answering for Driving Scene Understanding Using Vision-Language Models
by: Mohamud, Safaa Abdullahi Moallim, et al.
Published: (2025)
Are Large Vision Language Models Truly Grounded in Medical Images? Evidence from Italian Clinical Visual Question Answering
by: Felizzi, Federico, et al.
Published: (2025)
SimpleLLM4AD: An End-to-End Vision-Language Model with Graph Visual Question Answering for Autonomous Driving
by: Zheng, Peiru, et al.
Published: (2024)
RadImageNet-VQA: A Large-Scale CT and MRI Dataset for Radiologic Visual Question Answering
by: Butsanets, Léo, et al.
Published: (2025)
Prompting Medical Large Vision-Language Models to Diagnose Pathologies by Visual Question Answering
by: Guo, Danfeng, et al.
Published: (2024)
RS-MoE: A Vision-Language Model with Mixture of Experts for Remote Sensing Image Captioning and Visual Question Answering
by: Lin, Hui, et al.
Published: (2024)
Probing Visual Concepts in Lightweight Vision-Language Models for Automated Driving
by: Theodoridis, Nikos, et al.
Published: (2026)
VoQA: Visual-only Question Answering
by: An, Jianing, et al.
Published: (2025)
Object Attribute Matters in Visual Question Answering
by: Li, Peize, et al.
Published: (2023)
VQA$^2$: Visual Question Answering for Video Quality Assessment
by: Jia, Ziheng, et al.
Published: (2024)
Bridging Vision Language Models and Symbolic Grounding for Video Question Answering
by: Ma, Haodi, et al.
Published: (2025)
SCRA-VQA: Summarized Caption-Rerank for Augmented Large Language Models in Visual Question Answering
by: Zhang, Yan, et al.
Published: (2025)
Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction Tuning
by: Zeng, Xingchen, et al.
Published: (2024)
Knowledge Detection by Relevant Question and Image Attributes in Visual Question Answering
by: Ahir, Param, et al.
Published: (2023)
Eliminating the Language Bias for Visual Question Answering with fine-grained Causal Intervention
by: Liu, Ying, et al.
Published: (2024)
PlantVillageVQA: A Visual Question Answering Dataset for Benchmarking Vision-Language Models in Plant Science
by: Sakib, Syed Nazmus, et al.
Published: (2025)
Hierarchical Modeling for Medical Visual Question Answering with Cross-Attention Fusion
by: Zhang, Junkai, et al.
Published: (2025)
Guiding Vision-Language Model Selection for Visual Question-Answering Across Tasks, Domains, and Knowledge Types
by: Sinha, Neelabh, et al.
Published: (2024)
JSynFlow: Japanese Synthesised Flowchart Visual Question Answering Dataset built with Large Language Models
by: Sasaki, Hiroshi
Published: (2026)
POLYCHARTQA: Benchmarking Large Vision-Language Models with Multilingual Chart Question Answering
by: Xu, Yichen, et al.
Published: (2025)
Saliency Guided Longitudinal Medical Visual Question Answering
by: Wu, Jialin, et al.
Published: (2025)
Multi-Sourced Compositional Generalization in Visual Question Answering
by: Li, Chuanhao, et al.
Published: (2025)
Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts
by: Özdemir, Övgü, et al.
Published: (2024)
TM-PATHVQA:90000+ Textless Multilingual Questions for Medical Visual Question Answering
by: Rajkhowa, Tonmoy, et al.
Published: (2024)
Uncertainty-Guided Self-Questioning and Answering for Video-Language Alignment
by: Chen, Jin, et al.
Published: (2024)
Cause-Effect Driven Optimization for Robust Medical Visual Question Answering with Language Biases
by: Zhu, Huanjia, et al.
Published: (2025)
LaViC: Adapting Large Vision-Language Models to Visually-Aware Conversational Recommendation
by: Jeon, Hyunsik, et al.
Published: (2025)
Evaluating Vision Language Models (VLMs) for Radiology: A Comprehensive Analysis
by: Li, Frank, et al.
Published: (2025)
RadVLM: A Multitask Conversational Vision-Language Model for Radiology
by: Deperrois, Nicolas, et al.
Published: (2025)
Hallucination Benchmark in Medical Visual Question Answering
by: Wu, Jinge, et al.
Published: (2024)
Ask and Remember: A Questions-Only Replay Strategy for Continual Visual Question Answering
by: Marouf, Imad Eddine, et al.
Published: (2025)
DarkQA: Benchmarking Vision-Language Models on Visual-Primitive Question Answering in Low-Light Indoor Scenes
by: Park, Yohan, et al.
Published: (2025)
MedLVR: Latent Visual Reasoning for Reliable Medical Visual Question Answering
by: Xi, Suyang, et al.
Published: (2026)