Bibliographic Details
Main Authors: Rodriguez, Julian; Lopez, Piotr; Lerma, Emiliano; Medrano, Rafael; Hernandez, Jacobo
Format: Preprint
Published: 2025
Online Access: https://arxiv.org/abs/2512.10312
Table of Contents:
  • This document reports the practices and methodologies carried out during the Big Data course. It details the workflow, beginning with the processing of the Epsilon dataset through group and individual strategies, followed by text analysis and classification with RestMex and movie feature analysis with IMDb. Finally, it describes the technical implementation of a distributed computing cluster with Apache Spark on Linux using Scala.