Saved in:
Bibliographic Details
Main Authors: Simchoni, Giora, Rosset, Saharon
Format: Preprint
Published: 2025
Subjects:
Online Access:https://arxiv.org/abs/2510.22198
Tags: Add Tag
No Tags, Be the first to tag this record!
Table of Contents:
  • We present MMbeddings, a probabilistic embedding approach that reinterprets categorical embeddings through the lens of nonlinear mixed models, effectively bridging classical statistical theory with modern deep learning. By treating embeddings as latent random effects within a variational autoencoder framework, our method substantially decreases the number of parameters -- from the conventional embedding approach of cardinality $\times$ embedding dimension, which quickly becomes infeasible with large cardinalities, to a significantly smaller, cardinality-independent number determined primarily by the encoder architecture. This reduction dramatically mitigates overfitting and computational burden in high-cardinality settings. Extensive experiments on simulated and real datasets, encompassing collaborative filtering and tabular regression tasks using varied architectures, demonstrate that MMbeddings consistently outperforms traditional embeddings, underscoring its potential across diverse machine learning applications.