Bibliographic details
Main authors: Truong, Van Q., Ritchie, Marylyn D.
Format: Digital resource
Language:
Published in: Zenodo 2025
Subjects:
Online access: https://doi.org/10.5281/zenodo.16891723
_version_ 1866902176755875840
author Truong, Van Q.
Ritchie, Marylyn D.
author_facet Truong, Van Q.
Ritchie, Marylyn D.
contents <p>Researchers increasingly rely on free and open-source software (FOSS) for computational analysis across the life sciences. However, the growing volume and diversity of available tools make it difficult to discover, understand, and select appropriate software for specific tasks. We present <em>ToolsyBio</em>, a modular system that uses retrieval-augmented generation (RAG) to help researchers explore the bioinformatics software landscape through natural language queries. <em>ToolsyBio</em> is built on structured metadata from the <em>bio.tools</em> registry and is semantically enriched with concepts from <em>EDAM</em>, a controlled vocabulary for bioscientific data analysis and data management. The system retrieves relevant tool descriptions from a <em>ChromaDB</em> vector store and generates grounded responses with a large language model (LLM) served locally via <em>Ollama</em>. We describe the system’s architecture, its implementation, and its potential to improve the findability and usability of bioinformatics tools through a conversational interface.</p>
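The retrieve-then-generate flow described in the abstract can be illustrated with a minimal, self-contained sketch. This is not the authors' implementation. It assumes a toy bag-of-words cosine similarity in place of the ChromaDB vector store, it stops at prompt assembly rather than calling an LLM through Ollama, and the tool records, function names, and prompt wording are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical tool records; real entries would come from the bio.tools registry
# and carry EDAM annotations. These names and descriptions are invented.
TOOLS = [
    {"name": "aligner-x",
     "description": "Aligns short sequencing reads to a reference genome.",
     "edam": ["Sequence alignment"]},
    {"name": "variant-y",
     "description": "Calls genetic variants from aligned reads.",
     "edam": ["Variant calling"]},
    {"name": "plot-z",
     "description": "Draws phylogenetic trees from multiple alignments.",
     "edam": ["Phylogenetic tree visualisation"]},
]


def vectorize(text):
    """Toy stand-in for an embedding: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, tools, k=2):
    """Return up to k tools whose description plus EDAM terms best match the query."""
    q = vectorize(query)
    scored = [
        (cosine(q, vectorize(t["description"] + " " + " ".join(t["edam"]))), t)
        for t in tools
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for score, tool in scored[:k] if score > 0]


def build_prompt(query, hits):
    """Assemble a grounded prompt; a real system would send this to the LLM."""
    context = "\n".join(
        f"- {t['name']}: {t['description']} (EDAM: {', '.join(t['edam'])})"
        for t in hits
    )
    return (
        "Answer using only the tools listed below.\n"
        f"{context}\n\nQuestion: {query}"
    )


if __name__ == "__main__":
    question = "call variants from reads"
    print(build_prompt(question, retrieve(question, TOOLS)))
```

Restricting the prompt to the retrieved records is what keeps the generated answer grounded in registry metadata rather than in the model's general knowledge.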
format Digital resource
id zenodo_https___doi_org_10_5281_zenodo_16891723
institution Zenodo
language
publishDate 2025
publisher Zenodo
record_format zenodo
title ToolsyBio: A retrieval-augmented generation system for navigating the bioinformatics software landscape
topic large language models (LLMs)
retrieval-augmented generation (RAG)
open source software
semantic search
software discovery
url https://doi.org/10.5281/zenodo.16891723