Staff View: RamanBench: A Large-Scale Benchmark for Machine Learning on Raman Spectroscopy :: Library Catalog

Bibliographic Details
Main Authors:	Koddenbrock, Mario; Lange, Christoph; Legner, Robin; Jäger, Martin; Kögler, Martin; Bournazou, Mariano N. Cruz; Neubauer, Peter; Biessmann, Felix; Rodner, Erik
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2605.02003

_version_	1866917463041507328
author	Koddenbrock, Mario; Lange, Christoph; Legner, Robin; Jäger, Martin; Kögler, Martin; Bournazou, Mariano N. Cruz; Neubauer, Peter; Biessmann, Felix; Rodner, Erik
author_facet	Koddenbrock, Mario; Lange, Christoph; Legner, Robin; Jäger, Martin; Kögler, Martin; Bournazou, Mariano N. Cruz; Neubauer, Peter; Biessmann, Felix; Rodner, Erik
contents	Machine Learning (ML) has transformed many scientific fields, yet key applications still lack standardized benchmarks. Raman spectroscopy, a widely used technique for non-invasive molecular analysis, is one such field where progress is limited by fragmented datasets, inconsistent evaluation, and models that fail to capture the structure of spectral data. We introduce RamanBench, the first large-scale, fully reproducible benchmark for ML on Raman spectroscopy, consisting of streamlined data access, evaluation protocols and code, and a live leaderboard. It unifies 74 datasets (including 16 first released with this benchmark) across four domains, comprising 325,668 spectra and spanning classification and regression tasks under diverse experimental conditions. We benchmark 28 models under a standardized protocol, including classical methods (e.g., PLS), Raman-specific models (e.g., RamanNet), Tabular Foundation Models (TFMs; e.g., TabPFN), and time-series approaches (e.g., ROCKET). TFMs consistently outperform domain-specific and gradient-boosting baselines, while time-series models remain competitive. However, no method generalizes across all datasets, revealing a fundamental gap. We therefore invite the community to contribute new approaches to our living benchmark, which has the potential to accelerate advances in critical applications such as medical diagnostics, biological research, and materials science.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_02003
institution	arXiv
publishDate	2026
record_format	arxiv
title	RamanBench: A Large-Scale Benchmark for Machine Learning on Raman Spectroscopy
topic	Machine Learning; Artificial Intelligence
url	https://arxiv.org/abs/2605.02003
