Bibliographic Details
Main Authors: Rawat, Shreyash, Vijayarajan, V., Prasath, V. B. Surya
Format: Preprint
Published: 2024
Subjects: Information Retrieval
Online Access: https://arxiv.org/abs/2402.03380
_version_ 1866929234661867520
author Rawat, Shreyash
Vijayarajan, V.
Prasath, V. B. Surya
contents Text extraction is a highly subjective problem that depends on the dataset one is working with and on the kind of summarization details that need to be extracted. All the steps, from preprocessing of the data to the choice of an optimal model for predictions, depend on the problem and the corpus at hand. In this paper, we describe a text extraction model whose aim is to extract word-specified information relating to the semantics, so that we can get all related and meaningful information about that word in a succinct format. This model can obtain meaningful results and can augment a ubiquitous search model or a normal clustering or topic modelling algorithm. By utilizing a new technique called the two cluster assignment technique with a K-means model, we improved the ontology of the retrieved text. We further apply the vector average damping technique for flexible movement of clusters. Our experimental results on a recent COVID-19 corpus show that we obtain good results based on main keywords.
format Preprint
id arxiv_https___arxiv_org_abs_2402_03380
institution arXiv
publishDate 2024
record_format arxiv
title Modified K-means with Cluster Assignment -- Application to COVID-19 Data
topic Information Retrieval
url https://arxiv.org/abs/2402.03380