| Main Authors: | Bhat, Sharat, Khandelwal, Harshita, Kataria, Tushar, Gupta, Vivek |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2602.10518 |
Similar Items
MapIQ: Evaluating Multimodal Large Language Models for Map Question Answering
by: Srivastava, Varun, et al.
Published: (2025)
NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models
by: Pandya, Pranshu, et al.
Published: (2024)
FlowVQA: Mapping Multimodal Logic in Visual Question Answering with Flowcharts
by: Singh, Shubhankar, et al.
Published: (2024)
TransientTables: Evaluating LLMs' Reasoning on Temporally Evolving Semi-structured Tables
by: Shankarampeta, Abhilash, et al.
Published: (2025)
Enhancing Scientific Visual Question Answering through Multimodal Reasoning and Ensemble Modeling
by: Movva, Prahitha, et al.
Published: (2025)
BoundarySeg: An Embarrassingly Simple Method To Boost Medical Image Segmentation Performance for Low Data Regimes
by: Kataria, Tushar, et al.
Published: (2025)
MMTABREAL: Real-World Benchmark for Multimodal Table Understanding
by: Titiya, Prasham, et al.
Published: (2025)
Cropland Mapping using Geospatial Embeddings
by: Zvonkov, Ivan, et al.
Published: (2025)
Re:Verse -- Can Your VLM Read a Manga?
by: Baranwal, Aaditya, et al.
Published: (2025)
Geospatial Chain of Thought Reasoning for Enhanced Visual Question Answering on Satellite Imagery
by: Shanker, Shambhavi, et al.
Published: (2025)
Knowledge-Aware Reasoning over Multimodal Semi-structured Tables
by: Mathur, Suyash Vardhan, et al.
Published: (2024)
DomainVerse: A Benchmark Towards Real-World Distribution Shifts For Tuning-Free Adaptive Domain Generalization
by: Hou, Feng, et al.
Published: (2024)
Map-based Modular Approach for Zero-shot Embodied Question Answering
by: Sakamoto, Koya, et al.
Published: (2024)
MASSM: An End-to-End Deep Learning Framework for Multi-Anatomy Statistical Shape Modeling Directly From Images
by: Ukey, Janmesh, et al.
Published: (2024)
Exploiting Contextual Uncertainty of Visual Data for Efficient Training of Deep Models
by: Agarwal, Sharat
Published: (2024)
BIMBA: Selective-Scan Compression for Long-Range Video Question Answering
by: Islam, Md Mohaiminul, et al.
Published: (2025)
MapGCLR: Geospatial Contrastive Learning of Representations for Online Vectorized HD Map Construction
by: Merkert, Jonas, et al.
Published: (2026)
MorphoFlow: Sparse-Supervised Generative Shape Modeling with Adaptive Latent Relevance
by: Karanam, Mokshagna Sai Teja, et al.
Published: (2026)
Evaluating Variance in Visual Question Answering Benchmarks
by: SR, Nikitha
Published: (2025)
On the Viability of Semi-Supervised Segmentation Methods for Statistical Shape Modeling
by: Khan, Asma, et al.
Published: (2024)
Authentic Emotion Mapping: Benchmarking Facial Expressions in Real News
by: Zhang, Qixuan, et al.
Published: (2024)
Visual Robustness Benchmark for Visual Question Answering (VQA)
by: Ishmam, Md Farhan, et al.
Published: (2024)
IMPLICITSTAINER: Resolution Agnostic Data-Efficient Virtual Staining Using Neural Implicit Functions
by: Kataria, Tushar, et al.
Published: (2025)
StainDiffuser: MultiTask Dual Diffusion Model for Virtual Staining
by: Kataria, Tushar, et al.
Published: (2024)
Exploring Real World Map Change Generalization of Prior-Informed HD Map Prediction Models
by: Bateman, Samuel M., et al.
Published: (2024)
MedConcept: Unsupervised Concept Discovery for Interpretability in Medical VLMs
by: Haque, Md Rakibul, et al.
Published: (2026)
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering
by: Tang, Jingqun, et al.
Published: (2024)
Combining Satellite and Weather Data for Crop Type Mapping: An Inverse Modelling Approach
by: Ravirathinam, Praveen, et al.
Published: (2024)
SynthVerse: A Large-Scale Diverse Synthetic Dataset for Point Tracking
by: Zhao, Weiguang, et al.
Published: (2026)
Map2World: Segment Map Conditioned Text to 3D World Generation
by: Chung, Jaeyoung, et al.
Published: (2026)
Low-Rank Adaptation of Geospatial Foundation Models for Wildfire Mapping Using Sentinel-2 Data
by: Shibli, Ali, et al.
Published: (2026)
Structure-Semantic Decoupled Modulation of Global Geospatial Embeddings for High-Resolution Remote Sensing Mapping
by: Lyu, Jienan, et al.
Published: (2026)
Landslide Hazard Mapping with Geospatial Foundation Models: Geographical Generalizability, Data Scarcity, and Band Adaptability
by: Li, Wenwen, et al.
Published: (2025)
Overview of TREC 2024 Medical Video Question Answering (MedVidQA) Track
by: Gupta, Deepak, et al.
Published: (2024)
DisasterVQA: A Visual Question Answering Benchmark Dataset for Disaster Scenes
by: Al-Mohannadi, Aisha, et al.
Published: (2026)
Beyond the Destination: A Novel Benchmark for Exploration-Aware Embodied Question Answering
by: Jiang, Kaixuan, et al.
Published: (2025)
MirrorVerse: Pushing Diffusion Models to Realistically Reflect the World
by: Dhiman, Ankit, et al.
Published: (2025)
Answering Diverse Questions via Text Attached with Key Audio-Visual Clues
by: Ye, Qilang, et al.
Published: (2024)
Hallucination Benchmark in Medical Visual Question Answering
by: Wu, Jinge, et al.
Published: (2024)
ChartCheck: Explainable Fact-Checking over Real-World Chart Images
by: Akhtar, Mubashara, et al.
Published: (2023)