| Main authors: | Truong, Van Q.; Ritchie, Marylyn D. |
|---|---|
| Format: | Digital resource |
| Language: | |
| Published in: | Zenodo, 2025 |
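| Subjects: | large language models (LLMs); retrieval-augmented generation (RAG); open source software; semantic search; software discovery |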
| Online access: | https://doi.org/10.5281/zenodo.16891723 |

| Field | Value |
|---|---|
| author | Truong, Van Q.; Ritchie, Marylyn D. |
| contents | Researchers increasingly rely on free and open-source software (FOSS) for computational analysis across the life sciences. However, the growing volume and diversity of available tools make it difficult to discover, understand, and select appropriate software for specific tasks. We present *ToolsyBio*, a modular system that uses retrieval-augmented generation (RAG) to help researchers explore the bioinformatics software landscape through natural language queries. *ToolsyBio* is built on structured metadata from the *bio.tools* registry, semantically enriched with concepts from *EDAM*, a controlled vocabulary for bioscientific data analysis and data management. The system retrieves relevant tool descriptions from a *ChromaDB* vector store and generates grounded responses with a large language model (LLM) served locally via *Ollama*. We describe the system's architecture and implementation, and its potential to improve the findability and usability of bioinformatics tools through a conversational interface. |
| format | Digital resource |
| id | zenodo_https___doi_org_10_5281_zenodo_16891723 |
| institution | Zenodo |
| language | |
| publishDate | 2025 |
| publisher | Zenodo |
| record_format | zenodo |
| title | ToolsyBio: A retrieval-augmented generation system for navigating the bioinformatics software landscape |
| topic | large language models (LLMs); retrieval-augmented generation (RAG); open source software; semantic search; software discovery |
| url | https://doi.org/10.5281/zenodo.16891723 |
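The abstract describes a retrieve-then-generate flow: tool metadata enriched with EDAM concepts is indexed, the entries most relevant to a natural language query are retrieved, and an LLM answers using only those entries. The dependency-free Python sketch below illustrates that general flow under stated assumptions. It is not ToolsyBio's implementation. The tool records and field names are hypothetical. A toy bag-of-words cosine similarity stands in for a real embedding model and the ChromaDB vector store. The Ollama call is omitted, so the sketch stops once it has built the grounded prompt.

```python
import math
import re
from collections import Counter

# Hypothetical, illustrative records loosely shaped like registry entries.
# Names, fields, and values are assumptions, not real bio.tools data.
TOOLS = [
    {"name": "ExampleAligner",
     "description": "Aligns short sequencing reads to a reference genome.",
     "edam_operations": ["Read mapping"]},
    {"name": "ExampleQC",
     "description": "Reports quality metrics for raw sequencing reads.",
     "edam_operations": ["Sequencing quality control"]},
    {"name": "ExampleCaller",
     "description": "Identifies genetic variants from aligned reads.",
     "edam_operations": ["Variant calling"]},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def enrich(tool):
    """Build the indexed text: free-text description plus EDAM concept labels."""
    return tool["description"] + " EDAM: " + "; ".join(tool["edam_operations"])


def embed(text):
    """Toy bag-of-words vector, a stand-in for a real embedding model."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# In-memory index of (name, enriched text, vector); a vector store such as
# ChromaDB would fill this role in a real deployment.
INDEX = [(t["name"], enrich(t), embed(enrich(t))) for t in TOOLS]


def retrieve(query, k=2):
    """Return the k (name, enriched text) pairs most similar to the query."""
    q = embed(query)
    ranked = sorted(INDEX, key=lambda item: cosine(q, item[2]), reverse=True)
    return [(name, text) for name, text, _ in ranked[:k]]


def build_prompt(query, hits):
    """Assemble a grounded prompt restricting the LLM to the retrieved tools."""
    context = "\n".join(f"- {name}: {text}" for name, text in hits)
    return ("Answer using only the tools listed below.\n"
            f"Tools:\n{context}\n\nQuestion: {query}")


if __name__ == "__main__":
    question = "Which tool checks sequencing read quality?"
    print(build_prompt(question, retrieve(question)))
```

Appending EDAM labels to each description gives a query phrased in controlled-vocabulary terms, such as "sequencing quality control", something to match even when a tool's free-text description is worded differently. In a real system, the embedding model, not raw token overlap, would capture that similarity.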