Bibliographic Details
Main Authors: Dr. J. Jebamalar Tamilselvi, Dr. K. Sutha, Dr. S. Sweetlin Susilabai, Mr. Raman Raguraman, Dr. Surya Susan Thomas
Format: Digital resource
Language:
Published: Zenodo 2025
Online Access: https://doi.org/10.5281/zenodo.15664496
Summary:
  • Effective data management is essential in Big Data Analytics, where vast amounts of data are collected, processed, and analyzed to extract valuable insights. Duplicate data presents significant challenges, including increased storage costs, diminished analytical efficiency, and compromised data quality, all of which can lead to erroneous decision-making. Conventional deduplication techniques based on exact matching or simplistic heuristics often fall short in addressing data variability, semantic equivalence, and the massive scale typical of big data. This research proposes a token-based framework enhanced by a hybrid machine learning approach, which combines token-based strategies with both supervised and unsupervised learning techniques to accurately detect and eliminate duplicate data in Big Data Analytics environments.