Bibliographic Details
Main Authors: Sadia, Mushtari, Chowdhury, Amrita Roy, Chen, Ang
Format: Preprint
Published: 2025
Subjects: Databases, Artificial Intelligence
Online Access: https://arxiv.org/abs/2509.14601
author Sadia, Mushtari
Chowdhury, Amrita Roy
Chen, Ang
contents Unstructured data, such as text, images, audio, and video, comprises the vast majority of the world's information, yet it remains poorly supported by traditional data systems that rely on structured formats for computation. We argue for a new paradigm, which we call computing on unstructured data, built around three stages: extraction of latent structure, transformation of this structure through data processing techniques, and projection back into unstructured formats. This bi-directional pipeline allows unstructured data to benefit from the analytical power of structured computation, while preserving the richness and accessibility of unstructured representations for human and AI consumption. We illustrate this paradigm through two use cases and present the research components that need to be developed in a new data system called MXFlow.
format Preprint
id arxiv_https___arxiv_org_abs_2509_14601
institution arXiv
publishDate 2025
record_format arxiv
title A Case for Computing on Unstructured Data
topic Databases
Artificial Intelligence
url https://arxiv.org/abs/2509.14601