Bibliographic Details
Main Authors: Anand, Emile; Steinhardt, Charles; Hansen, Martin
Format: Preprint
Published: 2022
Subjects: Quantitative Methods; Databases; Machine Learning
Online Access: https://arxiv.org/abs/2211.14708
author Anand, Emile
Steinhardt, Charles
Hansen, Martin
contents Civilizations have tried to make drinking water safe to consume for thousands of years. Methods for detecting water contaminants have evolved as the contaminants themselves, such as pesticides and heavy metals, have grown more complex. The routine procedure for determining water safety is targeted analysis, which searches for specific substances from a known list; however, we do not know in advance which substances belong on that list. Before experimentally determining which substances are contaminants, how do we solve the sampling problem of identifying all the substances in the water? Here, we present an approach to predict the names of all the substances in a sample, as well as their respective concentrations, building on the work of Jaanus Liigand et al., who used non-targeted analysis, which conducts a broader search on the sample, to develop a random-forest regression model [1]. This work uses techniques from dimensionality reduction and linear decompositions to build a more accurate model using data from the European Massbank Metabolome Library. The model produces a global list of chemicals that researchers can then identify and test for when purifying water.
format Preprint
id arxiv_https___arxiv_org_abs_2211_14708
institution arXiv
publishDate 2022
record_format arxiv
title Identifying Chemicals Through Dimensionality Reduction
topic Quantitative Methods
Databases
Machine Learning
68T99
I.2; I.m
url https://arxiv.org/abs/2211.14708