Staff View: Kamae: Bridging Spark and Keras for Seamless ML Preprocessing :: Library Catalog

Bibliographic Details
Main Authors:	Barrowclough, George; Andrecki, Marian; Shinner, James; Donghi, Daniele
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2507.06021

_version_	1866911045538283520
author	Barrowclough, George; Andrecki, Marian; Shinner, James; Donghi, Daniele
author_facet	Barrowclough, George; Andrecki, Marian; Shinner, James; Donghi, Daniele
contents	In production recommender systems, feature preprocessing must be faithfully replicated across training and inference environments. This often requires duplicating logic between offline and online environments, increasing engineering effort and introducing risks of dataset shift. We present Kamae, an open-source Python library that bridges this gap by translating PySpark preprocessing pipelines into equivalent Keras models. Kamae provides a suite of configurable Spark transformers and estimators, each mapped to a corresponding Keras layer, enabling consistent, end-to-end preprocessing across the ML lifecycle. The framework's utility is illustrated on real-world use cases, including the MovieLens dataset and Expedia's Learning-to-Rank pipelines. The code is available at https://github.com/ExpediaGroup/kamae.
format	Preprint
id	arxiv_https___arxiv_org_abs_2507_06021
institution	arXiv
publishDate	2025
record_format	arxiv
title	Kamae: Bridging Spark and Keras for Seamless ML Preprocessing
topic	Machine Learning
url	https://arxiv.org/abs/2507.06021