
Bibliographic Details
Main Authors:	Rizk, Basem; Walsh, Joel; Core, Mark; Nye, Benjamin
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Computation and Language; Information Retrieval
Online Access:	https://arxiv.org/abs/2510.01513

Table of Contents:

Analyzing multi-modal content can be difficult and computationally expensive, and it often requires significant engineering effort. A large body of work applies pre-trained models to static data, yet fusing these open-source models and methods with complex data such as videos remains challenging. In this paper, we present a framework for efficiently prototyping pipelines for multi-modal content analysis. We craft a candidate recipe for a pipeline that combines a set of pre-trained models to convert videos into a temporal semi-structured data format. We then translate this structure into a frame-level indexed knowledge graph representation that is queryable and supports continual learning, enabling new domain-specific knowledge to be incorporated dynamically through an interactive medium.
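
As an illustration only, the idea of a frame-level indexed, queryable knowledge graph might be sketched as below. This is not the authors' implementation: the record layout (`frame`, `triples`), the `FrameIndexedGraph` class, and the `build_graph` helper are all hypothetical names, and the per-frame records stand in for the semi-structured output that a set of pre-trained models would produce.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One (subject, predicate, object) fact; a hypothetical schema."""
    subject: str
    predicate: str
    obj: str


class FrameIndexedGraph:
    """Toy knowledge graph in which every fact is indexed by frame number."""

    def __init__(self):
        self._frames_by_triple = defaultdict(set)

    def add(self, frame, subject, predicate, obj):
        # Adding facts after the initial build mimics incremental updates.
        self._frames_by_triple[Triple(subject, predicate, obj)].add(frame)

    def frames_where(self, subject=None, predicate=None, obj=None):
        """Return sorted frame numbers whose facts match the given pattern."""
        hits = set()
        for triple, frames in self._frames_by_triple.items():
            if subject is not None and triple.subject != subject:
                continue
            if predicate is not None and triple.predicate != predicate:
                continue
            if obj is not None and triple.obj != obj:
                continue
            hits |= frames
        return sorted(hits)


def build_graph(records):
    """Turn temporal semi-structured records into a frame-indexed graph."""
    graph = FrameIndexedGraph()
    for record in records:
        for subject, predicate, obj in record["triples"]:
            graph.add(record["frame"], subject, predicate, obj)
    return graph


# Made-up per-frame records, standing in for model output on a video.
records = [
    {"frame": 0, "triples": [("person", "holds", "cup")]},
    {"frame": 1, "triples": [("person", "holds", "cup"), ("cup", "on", "table")]},
    {"frame": 2, "triples": [("cup", "on", "table")]},
]

graph = build_graph(records)
print(graph.frames_where(subject="person", predicate="holds"))  # [0, 1]

# New knowledge can be added later without rebuilding the graph.
graph.add(3, "person", "holds", "phone")
```

Keying each fact by frame number keeps temporal queries simple in this sketch; a real system would likely use a graph database and a richer schema.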

Similar Items