Bibliographic Details
Main Authors: Yeganova, Lana E., Kim, Won G., Tian, Shubo, Xie, Natalie, Comeau, Donald C., Wilbur, W. John, Lu, Zhiyong
Format: Preprint
Published: 2026
Online Access: https://arxiv.org/abs/2602.20224
_version_ 1866915812911087616
author Yeganova, Lana E.
Kim, Won G.
Tian, Shubo
Xie, Natalie
Comeau, Donald C.
Wilbur, W. John
Lu, Zhiyong
contents The rapid expansion of biomedical publications creates challenges for organizing knowledge and detecting emerging trends, underscoring the need for scalable and interpretable methods. Common clustering and topic modeling approaches such as K-means and LDA remain sensitive to initialization and prone to local optima, which limits reproducibility and complicates evaluation. We propose a reformulation of a convex-optimization-based clustering algorithm that produces stable, fine-grained topics by selecting exemplars from the data and guaranteeing a global optimum. Applied to about 12,000 PubMed articles on aging and longevity, our method uncovers topics validated by medical experts. It yields interpretable topics ranging from molecular mechanisms to dietary supplements, physical activity, and gut microbiota. The method performs favorably and, most importantly, its reproducibility and interpretability distinguish it from common clustering approaches, including K-means, LDA, and BERTopic. This work provides a basis for developing scalable, web-accessible tools for knowledge discovery.
format Preprint
id arxiv_https___arxiv_org_abs_2602_20224
institution arXiv
publishDate 2026
record_format arxiv
title Exploring Anti-Aging Literature via ConvexTopics and Large Language Models
topic Machine Learning
Artificial Intelligence
Computation and Language
url https://arxiv.org/abs/2602.20224