| Main Authors: | Cheng, Yu; Goel, Arushi; Bilen, Hakan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.08084 |
Similar Items
HumMorph: Generalized Dynamic Human Neural Fields from Few Views
by: Zadrożny, Jakub, et al.
Published: (2025)
MV-CoRe: Multimodal Visual-Conceptual Reasoning for Complex Visual Question Answering
by: Peng, Jingwei, et al.
Published: (2025)
Selectively Answering Visual Questions
by: Eisenschlos, Julian Martin, et al.
Published: (2024)
Targeted Visual Prompting for Medical Visual Question Answering
by: Tascon-Morales, Sergio, et al.
Published: (2024)
Visual Robustness Benchmark for Visual Question Answering (VQA)
by: Ishmam, Md Farhan, et al.
Published: (2024)
ORCA: Orchestrated Reasoning with Collaborative Agents for Document Visual Question Answering
by: Lassoued, Aymen, et al.
Published: (2026)
Retrieval-Augmented Natural Language Reasoning for Explainable Visual Question Answering
by: Lim, Su Hyeon, et al.
Published: (2024)
STRIDE-QA: Visual Question Answering Dataset for Spatiotemporal Reasoning in Urban Driving Scenes
by: Ishihara, Keishi, et al.
Published: (2025)
MedLVR: Latent Visual Reasoning for Reliable Medical Visual Question Answering
by: Xi, Suyang, et al.
Published: (2026)
Questioning the Stability of Visual Question Answering
by: Rosenfeld, Amir, et al.
Published: (2025)
Universal representations: The missing link between faces, text, planktons, and cat breeds
by: Bilen, Hakan, et al.
Published: (2017)
Geospatial Chain of Thought Reasoning for Enhanced Visual Question Answering on Satellite Imagery
by: Shanker, Shambhavi, et al.
Published: (2025)
Elevating Visual Question Answering through Implicitly Learned Reasoning Pathways in LVLMs
by: Jing, Liu, et al.
Published: (2025)
Enhancing Scientific Visual Question Answering through Multimodal Reasoning and Ensemble Modeling
by: Movva, Prahitha, et al.
Published: (2025)
Dynamic Clue Bottlenecks: Towards Interpretable-by-Design Visual Question Answering
by: Fu, Xingyu, et al.
Published: (2023)
ReasonVQA: A Multi-hop Reasoning Benchmark with Structural Knowledge for Visual Question Answering
by: Tran, Duong T., et al.
Published: (2025)
Question-Aware Gaussian Experts for Audio-Visual Question Answering
by: Kim, Hongyeob, et al.
Published: (2025)
PAOLI: Pose-free Articulated Object Learning from Sparse-view Images
by: Deng, Jianning, et al.
Published: (2025)
Evaluating Variance in Visual Question Answering Benchmarks
by: SR, Nikitha
Published: (2025)
Odd-One-Out: Anomaly Detection by Comparing with Neighbors
by: Bhunia, Ankan, et al.
Published: (2024)
Articulate your NeRF: Unsupervised articulated object modeling via conditional view synthesis
by: Deng, Jianning, et al.
Published: (2024)
Looking 3D: Anomaly Detection with 2D-3D Alignment
by: Bhunia, Ankan, et al.
Published: (2024)
Multimodal Rationales for Explainable Visual Question Answering
by: Li, Kun, et al.
Published: (2024)
Breaking the Visual Shortcuts in Multimodal Knowledge-Based Visual Question Answering
by: Lee, Dosung, et al.
Published: (2025)
Enhancing Scientific Visual Question Answering via Vision-Caption aware Supervised Fine-Tuning
by: Kapuriya, Janak, et al.
Published: (2025)
PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering
by: Zhang, Xiaoman, et al.
Published: (2023)
Enhanced Multimodal RAG-LLM for Accurate Visual Question Answering
by: Xue, Junxiao, et al.
Published: (2024)
WSI-VQA: Interpreting Whole Slide Images by Generative Visual Question Answering
by: Chen, Pingyi, et al.
Published: (2024)
Towards Top-Down Reasoning: An Explainable Multi-Agent Approach for Visual Question Answering
by: Wang, Zeqing, et al.
Published: (2023)
Progressive Multimodal Search and Reasoning for Knowledge-Intensive Visual Question Answering
by: Choi, Changin, et al.
Published: (2025)
Thought-For-Food: Reasoning Chain Induced Food Visual Question Answering
by: Jain, Riddhi, et al.
Published: (2025)
Question-Instructed Visual Descriptions for Zero-Shot Video Question Answering
by: Romero, David, et al.
Published: (2024)
StaR-KVQA: Structured Reasoning Traces for Implicit-Knowledge Visual Question Answering
by: Wen, Zhihao, et al.
Published: (2025)
VQ-VA World: Towards High-Quality Visual Question-Visual Answering
by: Gou, Chenhui, et al.
Published: (2025)
VIHD: Visual Intervention-based Hallucination Detection for Medical Visual Question Answering
by: Chen, Jiayi, et al.
Published: (2026)
Index Light, Reason Deep: Deferred Visual Ingestion for Visual-Dense Document Question Answering
by: Xu, Tao
Published: (2026)
DriveLM: Driving with Graph Visual Question Answering
by: Sima, Chonghao, et al.
Published: (2023)
Object Retrieval for Visual Question Answering with Outside Knowledge
by: Kan, Shichao, et al.
Published: (2024)
RECODE: Reasoning Through Code Generation for Visual Question Answering
by: Shen, Junhong, et al.
Published: (2025)
Improving Semantic Correspondence with Viewpoint-Guided Spherical Maps
by: Mariotti, Octave, et al.
Published: (2023)