:: Library Catalog

Bibliographic Details
Main Authors:	Mao, Yuren; Xu, Wenyi; Qin, Yuyang; Gao, Yunjun
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2505.16229

Similar Items

MARCH: Multi-Agent Radiology Clinical Hierarchy for CT Report Generation
by: Lin, Yi, et al.
Published: (2026)

RadImageNet-VQA: A Large-Scale CT and MRI Dataset for Radiologic Visual Question Answering
by: Butsanets, Léo, et al.
Published: (2025)

MedErr-CT: A Visual Question Answering Benchmark for Identifying and Correcting Errors in CT Reports
by: Kyung, Sunggu, et al.
Published: (2025)

Warehouse Spatial Question Answering with LLM Agent
by: Huang, Hsiang-Wei, et al.
Published: (2025)

MedRegion-CT: Region-Focused Multimodal LLM for Comprehensive 3D CT Report Generation
by: Kyung, Sunggu, et al.
Published: (2025)

Radiology Report Conditional 3D CT Generation with Multi Encoder Latent diffusion Model
by: Amirrajab, Sina, et al.
Published: (2025)

CT-GLIP: 3D Grounded Language-Image Pretraining with CT Scans and Radiology Reports for Full-Body Scenarios
by: Lin, Jingyang, et al.
Published: (2024)

Free Form Medical Visual Question Answering in Radiology
by: Narayanan, Abhishek, et al.
Published: (2024)

Enhanced Multimodal RAG-LLM for Accurate Visual Question Answering
by: Xue, Junxiao, et al.
Published: (2024)

CT2Rep: Automated Radiology Report Generation for 3D Medical Imaging
by: Hamamci, Ibrahim Ethem, et al.
Published: (2024)

VDMA: Video Question Answering with Dynamically Generated Multi-Agents
by: Kugo, Noriyuki, et al.
Published: (2024)

GenerateCT: Text-Conditional Generation of 3D Chest CT Volumes
by: Hamamci, Ibrahim Ethem, et al.
Published: (2023)

VideoMultiAgents: A Multi-Agent Framework for Video Question Answering
by: Kugo, Noriyuki, et al.
Published: (2025)

3D-CT-GPT: Generating 3D Radiology Reports through Integration of Large Vision-Language Models
by: Chen, Hao, et al.
Published: (2024)

Learning to Search: A Decision-Based Agent for Knowledge-Based Visual Question Answering
by: Chen, Zhuohong, et al.
Published: (2026)

Evaluating Hallucination in Text-to-Image Diffusion Models with Scene-Graph based Question-Answering Agent
by: Qin, Ziyuan, et al.
Published: (2024)

Imitating Radiological Scrolling: A Global-Local Attention Model for 3D Chest CT Volumes Multi-Label Anomaly Classification
by: Di Piazza, Theo, et al.
Published: (2025)

Grounding Chest X-Ray Visual Question Answering with Generated Radiology Reports
by: Serra, Francesco Dalla, et al.
Published: (2025)

ORCA: Orchestrated Reasoning with Collaborative Agents for Document Visual Question Answering
by: Lassoued, Aymen, et al.
Published: (2026)

Reflective Dialogue between Teacher and Solver Agents for Video Question Answering
by: Murakawa, Takuya, et al.
Published: (2026)

ChartAgent: A Multimodal Agent for Visually Grounded Reasoning in Complex Chart Question Answering
by: Kaur, Rachneet, et al.
Published: (2025)

Adapting Lightweight Vision Language Models for Radiological Visual Question Answering
by: Shourya, Aditya, et al.
Published: (2025)

3D-MoRe: Unified Modal-Contextual Reasoning for Embodied Question Answering
by: Xu, Rongtao, et al.
Published: (2025)

SnapNTell: Enhancing Entity-Centric Visual Question Answering with Retrieval Augmented Multimodal LLM
by: Qiu, Jielin, et al.
Published: (2024)

PETWB-REP: A Multi-Cancer Whole-Body FDG PET/CT and Radiology Report Dataset for Medical Imaging Research
by: Xue, Le, et al.
Published: (2025)

Multimodal Rationales for Explainable Visual Question Answering
by: Li, Kun, et al.
Published: (2024)

MTA-Agent: An Open Recipe for Multimodal Deep Search Agents
by: Peng, Xiangyu, et al.
Published: (2026)

D-PerceptCT: Deep Perceptual Enhancement for Low-Dose CT Images
by: Nabila, Taifour Yousra, et al.
Published: (2025)

Sketch2CT: Multimodal Diffusion for Structure-Aware 3D Medical Volume Generation
by: An, Delin, et al.
Published: (2026)

Opportunistic Promptable Segmentation: Leveraging Routine Radiological Annotations to Guide 3D CT Lesion Segmentation
by: Church, Samuel, et al.
Published: (2026)

Barriers in Integrating Medical Visual Question Answering into Radiology Workflows: A Scoping Review and Clinicians' Insights
by: Mishra, Deepali, et al.
Published: (2025)

Towards Top-Down Reasoning: An Explainable Multi-Agent Approach for Visual Question Answering
by: Wang, Zeqing, et al.
Published: (2023)

3D Question Answering for City Scene Understanding
by: Sun, Penglei, et al.
Published: (2024)

MobileFlow: A Multimodal LLM For Mobile GUI Agent
by: Nong, Songqin, et al.
Published: (2024)

MedLSAM: Localize and Segment Anything Model for 3D CT Images
by: Lei, Wenhui, et al.
Published: (2023)

Enhancing Visual Question Answering with Multimodal LLMs via Chain-of-Question Guided Retrieval-Augmented Generation
by: Xu, Quanxing, et al.
Published: (2026)

A Short Review and Evaluation of SAM2's Performance in 3D CT Image Segmentation
by: He, Yufan, et al.
Published: (2024)

GLeVE: Graph-Guided Lesion Grounding with Proposal Verification in 3D CT
by: Jiang, Shuo, et al.
Published: (2026)

Space3D-Bench: Spatial 3D Question Answering Benchmark
by: Szymanska, Emilia, et al.
Published: (2024)

Foundation VAEs for 3D CT Reconstruction, Augmentation, and Generation
by: Chen, Qi, et al.
Published: (2026)