Staff View: :: Library Catalog

Bibliographic Details
Main Authors:	Shen, Jialu; Lyu, Han; Zhong, Suyang; Li, Hanzheng; Tao, Haoyi; Wang, Nan; Chen, Changhong; Fang, Xi
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2604.28039

_version_	1866918475935514624
author	Shen, Jialu; Lyu, Han; Zhong, Suyang; Li, Hanzheng; Tao, Haoyi; Wang, Nan; Chen, Changhong; Fang, Xi
author_facet	Shen, Jialu; Lyu, Han; Zhong, Suyang; Li, Hanzheng; Tao, Haoyi; Wang, Nan; Chen, Changhong; Fang, Xi
contents	Spectra are a prevalent yet highly information-dense form of scientific imagery, presenting substantial challenges to multimodal large language models (MLLMs) due to their unstructured and domain-specific characteristics. Here we introduce SpecVQA, a professional scientific-image benchmark for evaluating multimodal models on scientific spectral understanding, covering 7 representative spectrum types with expert-annotated question-answer pairs. The aim comprises two aspects: spectra scientific QA evaluation and corresponding underlying task evaluation. SpecVQA contains 620 figures and 3100 QA pairs curated from peer-reviewed literature, targeting both direct information extraction and domain-specific reasoning. To effectively reduce token length while preserving essential curve characteristics, we propose a spectral data sampling and interpolation reconstruction approach. Ablation studies further confirm that the approach achieves substantial performance improvements on the proposed benchmark. We test the capability of prominent MLLMs in scientific spectral understanding on our benchmark and present a leaderboard. This work represents an essential step toward enhancing spectral understanding in multimodal large models and suggests promising directions for extending visual-language models to broader scientific research and data analysis.
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_28039
institution	arXiv
publishDate	2026
record_format	arxiv
title	SpecVQA: A Benchmark for Spectral Understanding and Visual Question Answering in Scientific Images
topic	Artificial Intelligence
url	https://arxiv.org/abs/2604.28039
