Bibliographic Details
Main Authors: Bao, Zhiwei; Liao-Liao, Liu; Wu, Zhiyu; Zhou, Yifan; Fan, Dan; Aibin, Michal; Coady, Yvonne; Brownsword, Andrew
Format: Preprint
Published: 2024
Subjects:
Online Access: https://arxiv.org/abs/2405.03708
author Bao, Zhiwei
Liao-Liao, Liu
Wu, Zhiyu
Zhou, Yifan
Fan, Dan
Aibin, Michal
Coady, Yvonne
Brownsword, Andrew
contents The exponential growth of artificial intelligence (AI) and machine learning (ML) applications has created a need for efficient storage solutions for vector and tensor data. This paper presents a novel approach for tensor storage in a Lakehouse architecture using Delta Lake. The approach applies the multidimensional array storage strategy of array databases, together with sparse encoding methods, to Delta Lake tables. Experiments show notable improvements in both space and time efficiency compared with traditional serialization of tensors. These results provide insights for the development and implementation of optimized vector and tensor storage solutions in data-intensive applications, contributing to more efficient data management practices for AI and ML in cloud-native environments.
format Preprint
id arxiv_https___arxiv_org_abs_2405_03708
institution arXiv
publishDate 2024
record_format arxiv
title Delta Tensor: Efficient Vector and Tensor Storage in Delta Lake
topic Distributed, Parallel, and Cluster Computing
Databases
Machine Learning
url https://arxiv.org/abs/2405.03708
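
The contents field mentions sparse encoding of tensors into table rows. As a rough illustration only, the sketch below uses coordinate (COO) encoding, a generic sparse format chosen here as an assumption; this record does not say which encodings the paper uses. The helper names (`shape_of`, `encode_coo`, `decode_coo`) are hypothetical, the tensor is assumed to be a non-empty rectangular nested list, and writing the rows to a Delta Lake table is deliberately left out.

```python
from itertools import product


def shape_of(tensor):
    """Return the shape of a non-empty, rectangular nested list."""
    shape = []
    while isinstance(tensor, list):
        shape.append(len(tensor))
        tensor = tensor[0]
    return tuple(shape)


def encode_coo(tensor):
    """Encode a dense tensor as (shape, rows).

    Each row is (index_tuple, value) and only non-zero cells are kept,
    which is the part that saves space for sparse data. In a table,
    each row would map to one record (hypothetical layout).
    """
    shape = shape_of(tensor)
    rows = []
    for index in product(*(range(n) for n in shape)):
        value = tensor
        for i in index:
            value = value[i]
        if value != 0:
            rows.append((index, value))
    return shape, rows


def decode_coo(shape, rows):
    """Rebuild the dense nested list from (shape, rows)."""
    def zeros(dims):
        if len(dims) == 1:
            return [0] * dims[0]
        return [zeros(dims[1:]) for _ in range(dims[0])]

    dense = zeros(shape)
    for index, value in rows:
        cell = dense
        for i in index[:-1]:
            cell = cell[i]
        cell[index[-1]] = value
    return dense


# A 2 x 2 x 3 tensor with three non-zero entries out of twelve.
tensor = [[[0, 0, 3], [0, 0, 0]], [[0, 5, 0], [0, 0, 7]]]
shape, rows = encode_coo(tensor)
restored = decode_coo(shape, rows)
```

Here only three of the twelve cells are stored, and decoding restores the original tensor. Whether such an encoding pays off depends on how sparse the data is; for dense tensors the index columns add overhead.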