Bibliographic Details
Main Authors: Li, Rongbin, Chen, Wenbo, Li, Zhao, Munoz-Castaneda, Rodrigo, Li, Jinbo, Maurya, Neha S., Solanki, Arnav, He, Huan, Xing, Hanwen, Ramlakhan, Meaghan, Wise, Zachary, Johansen, Nelson, Wu, Zhuhao, Xu, Hua, Hawrylycz, Michael, Zheng, W. Jim
Format: Preprint
Published: 2025
Subjects: Artificial Intelligence
Online Access: https://arxiv.org/abs/2510.17064
contents Single-cell RNA sequencing has transformed our ability to identify diverse cell types and their transcriptomic signatures. However, annotating these signatures, especially those involving poorly characterized genes, remains a major challenge. Traditional methods, such as Gene Set Enrichment Analysis (GSEA), depend on well-curated annotations and often perform poorly in these contexts. Large Language Models (LLMs) offer a promising alternative but struggle to represent complex biological knowledge within structured ontologies. To address this, we present BRAINCELL-AID (https://biodataai.uth.edu/BRAINCELL-AID), a novel multi-agent AI system that integrates free-text descriptions with ontology labels to enable more accurate and robust gene set annotation. By incorporating retrieval-augmented generation (RAG), we developed an agentic workflow that refines predictions using relevant PubMed literature, reducing hallucinations and enhancing interpretability. Using this workflow, we achieved correct annotations for 77% of mouse gene sets among their top predictions. Applying this approach, we annotated 5,322 brain cell clusters from the comprehensive mouse brain cell atlas generated by the BRAIN Initiative Cell Census Network, enabling novel insights into brain cell function by identifying region-specific gene co-expression patterns and inferring functional roles of gene ensembles. BRAINCELL-AID also identifies basal ganglia-related cell types with neurologically meaningful descriptions. Hence, we create a valuable resource to support community-driven cell type annotation.
id arxiv_https___arxiv_org_abs_2510_17064
institution arXiv
record_format arxiv
title A Brain Cell Type Resource Created by Large Language Models and a Multi-Agent AI System for Collaborative Community Annotation
topic Artificial Intelligence