Library Catalog

Bibliographic Details
Main Authors:	Bhat, Sharat; Khandelwal, Harshita; Kataria, Tushar; Gupta, Vivek
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2602.10518

Similar Items

MapIQ: Evaluating Multimodal Large Language Models for Map Question Answering
by: Srivastava, Varun, et al.
Published: (2025)

NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models
by: Pandya, Pranshu, et al.
Published: (2024)

FlowVQA: Mapping Multimodal Logic in Visual Question Answering with Flowcharts
by: Singh, Shubhankar, et al.
Published: (2024)

TransientTables: Evaluating LLMs' Reasoning on Temporally Evolving Semi-structured Tables
by: Shankarampeta, Abhilash, et al.
Published: (2025)

Enhancing Scientific Visual Question Answering through Multimodal Reasoning and Ensemble Modeling
by: Movva, Prahitha, et al.
Published: (2025)

BoundarySeg: An Embarrassingly Simple Method To Boost Medical Image Segmentation Performance for Low Data Regimes
by: Kataria, Tushar, et al.
Published: (2025)

MMTABREAL: Real-World Benchmark for Multimodal Table Understanding
by: Titiya, Prasham, et al.
Published: (2025)

Cropland Mapping using Geospatial Embeddings
by: Zvonkov, Ivan, et al.
Published: (2025)

Re:Verse -- Can Your VLM Read a Manga?
by: Baranwal, Aaditya, et al.
Published: (2025)

Geospatial Chain of Thought Reasoning for Enhanced Visual Question Answering on Satellite Imagery
by: Shanker, Shambhavi, et al.
Published: (2025)

Knowledge-Aware Reasoning over Multimodal Semi-structured Tables
by: Mathur, Suyash Vardhan, et al.
Published: (2024)

DomainVerse: A Benchmark Towards Real-World Distribution Shifts For Tuning-Free Adaptive Domain Generalization
by: Hou, Feng, et al.
Published: (2024)

Map-based Modular Approach for Zero-shot Embodied Question Answering
by: Sakamoto, Koya, et al.
Published: (2024)

MASSM: An End-to-End Deep Learning Framework for Multi-Anatomy Statistical Shape Modeling Directly From Images
by: Ukey, Janmesh, et al.
Published: (2024)

Exploiting Contextual Uncertainty of Visual Data for Efficient Training of Deep Models
by: Agarwal, Sharat
Published: (2024)

BIMBA: Selective-Scan Compression for Long-Range Video Question Answering
by: Islam, Md Mohaiminul, et al.
Published: (2025)

MapGCLR: Geospatial Contrastive Learning of Representations for Online Vectorized HD Map Construction
by: Merkert, Jonas, et al.
Published: (2026)

MorphoFlow: Sparse-Supervised Generative Shape Modeling with Adaptive Latent Relevance
by: Karanam, Mokshagna Sai Teja, et al.
Published: (2026)

Evaluating Variance in Visual Question Answering Benchmarks
by: SR, Nikitha
Published: (2025)

On the Viability of Semi-Supervised Segmentation Methods for Statistical Shape Modeling
by: Khan, Asma, et al.
Published: (2024)

Authentic Emotion Mapping: Benchmarking Facial Expressions in Real News
by: Zhang, Qixuan, et al.
Published: (2024)

Visual Robustness Benchmark for Visual Question Answering (VQA)
by: Ishmam, Md Farhan, et al.
Published: (2024)

IMPLICITSTAINER: Resolution Agnostic Data-Efficient Virtual Staining Using Neural Implicit Functions
by: Kataria, Tushar, et al.
Published: (2025)

StainDiffuser: MultiTask Dual Diffusion Model for Virtual Staining
by: Kataria, Tushar, et al.
Published: (2024)

Exploring Real World Map Change Generalization of Prior-Informed HD Map Prediction Models
by: Bateman, Samuel M., et al.
Published: (2024)

MedConcept: Unsupervised Concept Discovery for Interpretability in Medical VLMs
by: Haque, Md Rakibul, et al.
Published: (2026)

MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering
by: Tang, Jingqun, et al.
Published: (2024)

Combining Satellite and Weather Data for Crop Type Mapping: An Inverse Modelling Approach
by: Ravirathinam, Praveen, et al.
Published: (2024)

SynthVerse: A Large-Scale Diverse Synthetic Dataset for Point Tracking
by: Zhao, Weiguang, et al.
Published: (2026)

Map2World: Segment Map Conditioned Text to 3D World Generation
by: Chung, Jaeyoung, et al.
Published: (2026)

Low-Rank Adaptation of Geospatial Foundation Models for Wildfire Mapping Using Sentinel-2 Data
by: Shibli, Ali, et al.
Published: (2026)

Structure-Semantic Decoupled Modulation of Global Geospatial Embeddings for High-Resolution Remote Sensing Mapping
by: Lyu, Jienan, et al.
Published: (2026)

Landslide Hazard Mapping with Geospatial Foundation Models: Geographical Generalizability, Data Scarcity, and Band Adaptability
by: Li, Wenwen, et al.
Published: (2025)

Overview of TREC 2024 Medical Video Question Answering (MedVidQA) Track
by: Gupta, Deepak, et al.
Published: (2024)

DisasterVQA: A Visual Question Answering Benchmark Dataset for Disaster Scenes
by: Al-Mohannadi, Aisha, et al.
Published: (2026)

Beyond the Destination: A Novel Benchmark for Exploration-Aware Embodied Question Answering
by: Jiang, Kaixuan, et al.
Published: (2025)

MirrorVerse: Pushing Diffusion Models to Realistically Reflect the World
by: Dhiman, Ankit, et al.
Published: (2025)

Answering Diverse Questions via Text Attached with Key Audio-Visual Clues
by: Ye, Qilang, et al.
Published: (2024)

Hallucination Benchmark in Medical Visual Question Answering
by: Wu, Jinge, et al.
Published: (2024)

ChartCheck: Explainable Fact-Checking over Real-World Chart Images
by: Akhtar, Mubashara, et al.
Published: (2023)