| Main Authors: | Pal, Ankit; Lee, Jung-Oh; Zhang, Xiaoman; Sankarasubbu, Malaikannan; Roh, Seunghyeon; Kim, Won Jung; Lee, Meesun; Rajpurkar, Pranav |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2506.04353 |
Similar Items
Gemini Goes to Med School: Exploring the Capabilities of Multimodal Large Language Models on Medical Challenge Problems & Hallucinations
by: Pal, Ankit, et al.
Published: (2024)
ReXSonoVQA: A Video QA Benchmark for Procedure-Centric Ultrasound Understanding
by: Wang, Xucheng, et al.
Published: (2026)
ReX-MLE: The Autonomous Agent Benchmark for Medical Imaging Challenges
by: Kenia, Roshan, et al.
Published: (2025)
ReXGradient-160K: A Large-Scale Publicly Available Dataset of Chest Radiographs with Free-text Reports
by: Zhang, Xiaoman, et al.
Published: (2025)
FactCheXcker: Mitigating Measurement Hallucinations in Chest X-ray Report Generation Models
by: Heiman, Alice, et al.
Published: (2024)
Uncovering Knowledge Gaps in Radiology Report Generation Models through Knowledge Graphs
by: Zhang, Xiaoman, et al.
Published: (2024)
XLQA: A Benchmark for Locale-Aware Multilingual Open-Domain Question Answering
by: Roh, Keon-Woo, et al.
Published: (2025)
KoBBQ: Korean Bias Benchmark for Question Answering
by: Jin, Jiho, et al.
Published: (2023)
ReXrank: A Public Leaderboard for AI-Powered Radiology Report Generation
by: Zhang, Xiaoman, et al.
Published: (2024)
CXReasonBench: A Benchmark for Evaluating Structured Diagnostic Reasoning in Chest X-rays
by: Lee, Hyungyung, et al.
Published: (2025)
Do Mixed-Vendor Multi-Agent LLMs Improve Clinical Diagnosis?
by: Yuan, Grace Chang, et al.
Published: (2026)
ReXInTheWild: A Unified Benchmark for Medical Photograph Understanding
by: Banerjee, Oishi, et al.
Published: (2026)
AdvisorQA: Towards Helpful and Harmless Advice-seeking Question Answering with Collective Intelligence
by: Kim, Minbeom, et al.
Published: (2024)
Question-Aware Gaussian Experts for Audio-Visual Question Answering
by: Kim, Hongyeob, et al.
Published: (2025)
Denoising Table-Text Retrieval for Open-Domain Question Answering
by: Kang, Deokhyung, et al.
Published: (2024)
MedVersa: A Generalist Foundation Model for Medical Image Interpretation
by: Zhou, Hong-Yu, et al.
Published: (2024)
Who's Asking? Evaluating LLM Robustness to Inquiry Personas in Factual Question Answering
by: Akpinar, Nil-Jana, et al.
Published: (2025)
Actions and Objects Pathways for Domain Adaptation in Video Question Answering
by: Mohamud, Safaa Abdullahi Moallim, et al.
Published: (2024)
Confidence-guided Refinement Reasoning for Zero-shot Question Answering
by: Jang, Youwon, et al.
Published: (2025)
ReSCORE: Label-free Iterative Retriever Training for Multi-hop Question Answering with Relevance-Consistency Supervision
by: Lee, Dosung, et al.
Published: (2025)
Learning Trimodal Relation for Audio-Visual Question Answering with Missing Modality
by: Park, Kyu Ri, et al.
Published: (2024)
ReXTrust: A Model for Fine-Grained Hallucination Detection in AI-Generated Radiology Reports
by: Hardy, Romain, et al.
Published: (2024)
ReXErr: Synthesizing Clinically Meaningful Errors in Diagnostic Radiology Reports
by: Rao, Vishwanatha M., et al.
Published: (2024)
FIQ: Fundamental Question Generation with the Integration of Question Embeddings for Video Question Answering
by: Oh, Ju-Young, et al.
Published: (2025)
A Perspective for Adapting Generalist AI to Specialized Medical AI Applications and Their Challenges
by: Wang, Zifeng, et al.
Published: (2024)
Grounding Chest X-Ray Visual Question Answering with Generated Radiology Reports
by: Serra, Francesco Dalla, et al.
Published: (2025)
SCARE: A Benchmark for SQL Correction and Question Answerability Classification for Reliable EHR Question Answering
by: Lee, Gyubok, et al.
Published: (2025)
DivCon-NeRF: Diverse and Consistent Ray Augmentation for Few-Shot NeRF
by: Lee, Ingyun, et al.
Published: (2025)
Thunder-NUBench: A Benchmark for LLMs' Sentence-Level Negation Understanding
by: So, Yeonkyoung, et al.
Published: (2025)
Evaluating Contextual Intelligence in Recyclability: A Comprehensive Study of Image-Based Reasoning Systems
by: Park, Eliot, et al.
Published: (2025)
IPQA: A Benchmark for Core Intent Identification in Personalized Question Answering
by: Kim, Jieyong, et al.
Published: (2025)
Breaking the Visual Shortcuts in Multimodal Knowledge-Based Visual Question Answering
by: Lee, Dosung, et al.
Published: (2025)
Thunder-KoNUBench: A Corpus-Aligned Benchmark for Korean Negation Understanding
by: Jung, Sungmok, et al.
Published: (2026)
Voice-guided Orchestrated Intelligence for Clinical Evaluation (VOICE): A Voice AI Agent System for Prehospital Stroke Assessment
by: Acosta, Julian, et al.
Published: (2025)
SEAL: Semantic-aware Single-image Sticker Personalization with a Large-scale Sticker-tag Dataset
by: Roh, Changhyun, et al.
Published: (2026)
ColonCrafter: A Depth Estimation Model for Colonoscopy Videos Using Diffusion Priors
by: Hardy, Romain, et al.
Published: (2025)
Lunguage: A Benchmark for Structured and Sequential Chest X-ray Interpretation
by: Moon, Jong Hak, et al.
Published: (2025)
ReGraM: Region-First Knowledge Graph Reasoning for Medical Question Answering
by: Lee, Chaerin, et al.
Published: (2026)
Pretraining Vision-Language Model for Difference Visual Question Answering in Longitudinal Chest X-rays
by: Cho, Yeongjae, et al.
Published: (2024)
Translation Deserves Better: Analyzing Translation Artifacts in Cross-lingual Visual Question Answering
by: Park, ChaeHun, et al.
Published: (2024)