| Main Authors: | Bao, Zhiwei; Liao-Liao, Liu; Wu, Zhiyu; Zhou, Yifan; Fan, Dan; Aibin, Michal; Coady, Yvonne; Brownsword, Andrew |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | Distributed, Parallel, and Cluster Computing; Databases; Machine Learning |
| Online Access: | https://arxiv.org/abs/2405.03708 |
| author | Bao, Zhiwei; Liao-Liao, Liu; Wu, Zhiyu; Zhou, Yifan; Fan, Dan; Aibin, Michal; Coady, Yvonne; Brownsword, Andrew |
|---|---|
| contents | The exponential growth of artificial intelligence (AI) and machine learning (ML) applications has necessitated the development of efficient storage solutions for vector and tensor data. This paper presents a novel approach for tensor storage in a Lakehouse architecture using Delta Lake. The approach applies the multidimensional array storage strategy of array databases, together with sparse encoding methods, to Delta Lake tables. Experiments show notable improvements in both space and time efficiency compared to traditional serialization of tensors. These results provide valuable insights for the development and implementation of optimized vector and tensor storage solutions in data-intensive applications, contributing to the evolution of efficient data management practices in AI and ML domains in cloud-native environments. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2405_03708 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| title | Delta Tensor: Efficient Vector and Tensor Storage in Delta Lake |
| topic | Distributed, Parallel, and Cluster Computing; Databases; Machine Learning |
| url | https://arxiv.org/abs/2405.03708 |
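
The abstract mentions sparse encoding of tensors into table rows but this record gives no implementation details. As a rough illustration only, the following pure-Python sketch shows a generic coordinate (COO) encoding, where each non-zero element becomes one `(index, value)` row. The function names and the row layout are assumptions made for this example and are not taken from the paper or from Delta Lake.

```python
from itertools import product


def shape_of(tensor):
    """Return the shape of a non-empty, rectangular nested list."""
    shape = []
    while isinstance(tensor, list):
        shape.append(len(tensor))
        tensor = tensor[0]
    return tuple(shape)


def get_at(tensor, index):
    """Look up one element of a nested list by its multi-dimensional index."""
    for i in index:
        tensor = tensor[i]
    return tensor


def to_coo_rows(tensor):
    """Encode a dense tensor as (index, value) rows, keeping only non-zeros.

    Each row could map to one record in a table (hypothetical layout).
    """
    shape = shape_of(tensor)
    rows = []
    for index in product(*(range(n) for n in shape)):
        value = get_at(tensor, index)
        if value != 0:
            rows.append((index, value))
    return shape, rows


def from_coo_rows(shape, rows):
    """Rebuild the dense nested list from the shape and the non-zero rows."""
    def zeros(dims):
        if len(dims) == 1:
            return [0] * dims[0]
        return [zeros(dims[1:]) for _ in range(dims[0])]

    dense = zeros(shape)
    for index, value in rows:
        cell = dense
        for i in index[:-1]:
            cell = cell[i]
        cell[index[-1]] = value
    return dense


# A 2 x 2 x 3 tensor with only two non-zero entries.
dense = [
    [[0, 0, 3], [0, 0, 0]],
    [[0, 5, 0], [0, 0, 0]],
]
shape, rows = to_coo_rows(dense)
restored = from_coo_rows(shape, rows)
```

Here the twelve-element tensor is represented by two rows plus its shape, which is the kind of space saving sparse encodings aim for when most elements are zero. A dense or nearly dense tensor would gain nothing from this layout.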