Staff View: FMBench: Benchmarking Fairness in Multimodal Large Language Models on Medical Tasks :: Library Catalog

Bibliographic Details
Main Authors:	Wu, Peiran; Liu, Che; Chen, Canyu; Li, Jun; Bercea, Cosmin I.; Arcucci, Rossella
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2410.01089

_version_	1866908068456955904
author	Wu, Peiran; Liu, Che; Chen, Canyu; Li, Jun; Bercea, Cosmin I.; Arcucci, Rossella
author_facet	Wu, Peiran; Liu, Che; Chen, Canyu; Li, Jun; Bercea, Cosmin I.; Arcucci, Rossella
contents	Advancements in Multimodal Large Language Models (MLLMs) have significantly improved performance on medical tasks such as Visual Question Answering (VQA) and Report Generation (RG). However, the fairness of these models across diverse demographic groups remains underexplored, despite its importance in healthcare. This oversight is partly due to the lack of demographic diversity in existing medical multimodal datasets, which complicates the evaluation of fairness. In response, we propose FMBench, the first benchmark designed to evaluate the fairness of MLLM performance across diverse demographic attributes. FMBench has the following key features: (1) It covers four demographic attributes (race, ethnicity, language, and gender) across two tasks, VQA and RG, under zero-shot settings. (2) Its VQA task is free-form, which enhances real-world applicability and mitigates the biases associated with predefined choices. (3) It uses both lexical metrics and LLM-based metrics, aligned with clinical evaluations, to assess models not only for linguistic accuracy but also from a clinical perspective. Furthermore, we introduce a new metric, Fairness-Aware Performance (FAP), to evaluate how fairly MLLMs perform across various demographic attributes. We thoroughly evaluate the performance and fairness of eight state-of-the-art open-source MLLMs, including both general and medical MLLMs, ranging from 7B to 26B parameters, on the proposed benchmark. We aim for FMBench to assist the research community in refining model evaluation and driving future advancements in the field. All data and code will be released upon acceptance.
format	Preprint
id	arxiv_https___arxiv_org_abs_2410_01089
institution	arXiv
publishDate	2024
record_format	arxiv
title	FMBench: Benchmarking Fairness in Multimodal Large Language Models on Medical Tasks
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2410.01089
