:: Library Catalog

Bibliographic Details
Main Authors:	Pal, Ankit, Lee, Jung-Oh, Zhang, Xiaoman, Sankarasubbu, Malaikannan, Roh, Seunghyeon, Kim, Won Jung, Lee, Meesun, Rajpurkar, Pranav
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Computational Engineering, Finance, and Science; Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2506.04353

Similar Items

Gemini Goes to Med School: Exploring the Capabilities of Multimodal Large Language Models on Medical Challenge Problems & Hallucinations
by: Pal, Ankit, et al.
Published: (2024)

ReXSonoVQA: A Video QA Benchmark for Procedure-Centric Ultrasound Understanding
by: Wang, Xucheng, et al.
Published: (2026)

ReX-MLE: The Autonomous Agent Benchmark for Medical Imaging Challenges
by: Kenia, Roshan, et al.
Published: (2025)

ReXGradient-160K: A Large-Scale Publicly Available Dataset of Chest Radiographs with Free-text Reports
by: Zhang, Xiaoman, et al.
Published: (2025)

FactCheXcker: Mitigating Measurement Hallucinations in Chest X-ray Report Generation Models
by: Heiman, Alice, et al.
Published: (2024)

Uncovering Knowledge Gaps in Radiology Report Generation Models through Knowledge Graphs
by: Zhang, Xiaoman, et al.
Published: (2024)

XLQA: A Benchmark for Locale-Aware Multilingual Open-Domain Question Answering
by: Roh, Keon-Woo, et al.
Published: (2025)

KoBBQ: Korean Bias Benchmark for Question Answering
by: Jin, Jiho, et al.
Published: (2023)

ReXrank: A Public Leaderboard for AI-Powered Radiology Report Generation
by: Zhang, Xiaoman, et al.
Published: (2024)

CXReasonBench: A Benchmark for Evaluating Structured Diagnostic Reasoning in Chest X-rays
by: Lee, Hyungyung, et al.
Published: (2025)

Do Mixed-Vendor Multi-Agent LLMs Improve Clinical Diagnosis?
by: Yuan, Grace Chang, et al.
Published: (2026)

ReXInTheWild: A Unified Benchmark for Medical Photograph Understanding
by: Banerjee, Oishi, et al.
Published: (2026)

AdvisorQA: Towards Helpful and Harmless Advice-seeking Question Answering with Collective Intelligence
by: Kim, Minbeom, et al.
Published: (2024)

Question-Aware Gaussian Experts for Audio-Visual Question Answering
by: Kim, Hongyeob, et al.
Published: (2025)

Denoising Table-Text Retrieval for Open-Domain Question Answering
by: Kang, Deokhyung, et al.
Published: (2024)

MedVersa: A Generalist Foundation Model for Medical Image Interpretation
by: Zhou, Hong-Yu, et al.
Published: (2024)

Who's Asking? Evaluating LLM Robustness to Inquiry Personas in Factual Question Answering
by: Akpinar, Nil-Jana, et al.
Published: (2025)

Actions and Objects Pathways for Domain Adaptation in Video Question Answering
by: Mohamud, Safaa Abdullahi Moallim, et al.
Published: (2024)

Confidence-guided Refinement Reasoning for Zero-shot Question Answering
by: Jang, Youwon, et al.
Published: (2025)

ReSCORE: Label-free Iterative Retriever Training for Multi-hop Question Answering with Relevance-Consistency Supervision
by: Lee, Dosung, et al.
Published: (2025)

Learning Trimodal Relation for Audio-Visual Question Answering with Missing Modality
by: Park, Kyu Ri, et al.
Published: (2024)

ReXTrust: A Model for Fine-Grained Hallucination Detection in AI-Generated Radiology Reports
by: Hardy, Romain, et al.
Published: (2024)

ReXErr: Synthesizing Clinically Meaningful Errors in Diagnostic Radiology Reports
by: Rao, Vishwanatha M., et al.
Published: (2024)

FIQ: Fundamental Question Generation with the Integration of Question Embeddings for Video Question Answering
by: Oh, Ju-Young, et al.
Published: (2025)

A Perspective for Adapting Generalist AI to Specialized Medical AI Applications and Their Challenges
by: Wang, Zifeng, et al.
Published: (2024)

Grounding Chest X-Ray Visual Question Answering with Generated Radiology Reports
by: Serra, Francesco Dalla, et al.
Published: (2025)

SCARE: A Benchmark for SQL Correction and Question Answerability Classification for Reliable EHR Question Answering
by: Lee, Gyubok, et al.
Published: (2025)

DivCon-NeRF: Diverse and Consistent Ray Augmentation for Few-Shot NeRF
by: Lee, Ingyun, et al.
Published: (2025)

Thunder-NUBench: A Benchmark for LLMs' Sentence-Level Negation Understanding
by: So, Yeonkyoung, et al.
Published: (2025)

Evaluating Contextual Intelligence in Recyclability: A Comprehensive Study of Image-Based Reasoning Systems
by: Park, Eliot, et al.
Published: (2025)

IPQA: A Benchmark for Core Intent Identification in Personalized Question Answering
by: Kim, Jieyong, et al.
Published: (2025)

Breaking the Visual Shortcuts in Multimodal Knowledge-Based Visual Question Answering
by: Lee, Dosung, et al.
Published: (2025)

Thunder-KoNUBench: A Corpus-Aligned Benchmark for Korean Negation Understanding
by: Jung, Sungmok, et al.
Published: (2026)

Voice-guided Orchestrated Intelligence for Clinical Evaluation (VOICE): A Voice AI Agent System for Prehospital Stroke Assessment
by: Acosta, Julian, et al.
Published: (2025)

SEAL: Semantic-aware Single-image Sticker Personalization with a Large-scale Sticker-tag Dataset
by: Roh, Changhyun, et al.
Published: (2026)

ColonCrafter: A Depth Estimation Model for Colonoscopy Videos Using Diffusion Priors
by: Hardy, Romain, et al.
Published: (2025)

Lunguage: A Benchmark for Structured and Sequential Chest X-ray Interpretation
by: Moon, Jong Hak, et al.
Published: (2025)

ReGraM: Region-First Knowledge Graph Reasoning for Medical Question Answering
by: Lee, Chaerin, et al.
Published: (2026)

Pretraining Vision-Language Model for Difference Visual Question Answering in Longitudinal Chest X-rays
by: Cho, Yeongjae, et al.
Published: (2024)

Translation Deserves Better: Analyzing Translation Artifacts in Cross-lingual Visual Question Answering
by: Park, ChaeHun, et al.
Published: (2024)