Guardado en:
Detalles Bibliográficos
Autores principales: Carrami, Eli M, Sharifzadeh, Sahand
Formato: Preprint
Publicado: 2024
Materias:
Acceso en línea:https://arxiv.org/abs/2402.13653
Etiquetas: Agregar Etiqueta
Sin Etiquetas, Sea el primero en etiquetar este registro!
_version_ 1866929595960262656
author Carrami, Eli M
Sharifzadeh, Sahand
author_facet Carrami, Eli M
Sharifzadeh, Sahand
contents Understanding protein structure and function is crucial in biology. However, current computational methods are often task-specific and resource-intensive. To address this, we propose zero-shot Protein Question Answering (PQA), a task designed to answer a wide range of protein-related queries without task-specific training. The success of PQA hinges on high-quality datasets and robust evaluation strategies, both of which are lacking in current research. Existing datasets suffer from biases, noise, and lack of evolutionary context, while current evaluation methods fail to accurately assess model performance. We introduce the Pika framework to overcome these limitations. Pika comprises a curated, debiased dataset tailored for PQA and a biochemically relevant benchmarking strategy. We also propose multimodal large language models as a strong baseline for PQA, leveraging their natural language processing and knowledge. This approach promises a more flexible and efficient way to explore protein properties, advancing protein research. Our comprehensive PQA framework, Pika, including dataset, code, and model checkpoints, is openly accessible on github.com/EMCarrami/Pika, promoting wider research in the field.
format Preprint
id arxiv_https___arxiv_org_abs_2402_13653
institution arXiv
publishDate 2024
record_format arxiv
spellingShingle PQA: Zero-shot Protein Question Answering for Free-form Scientific Enquiry with Large Language Models
Carrami, Eli M
Sharifzadeh, Sahand
Machine Learning
Understanding protein structure and function is crucial in biology. However, current computational methods are often task-specific and resource-intensive. To address this, we propose zero-shot Protein Question Answering (PQA), a task designed to answer a wide range of protein-related queries without task-specific training. The success of PQA hinges on high-quality datasets and robust evaluation strategies, both of which are lacking in current research. Existing datasets suffer from biases, noise, and lack of evolutionary context, while current evaluation methods fail to accurately assess model performance. We introduce the Pika framework to overcome these limitations. Pika comprises a curated, debiased dataset tailored for PQA and a biochemically relevant benchmarking strategy. We also propose multimodal large language models as a strong baseline for PQA, leveraging their natural language processing and knowledge. This approach promises a more flexible and efficient way to explore protein properties, advancing protein research. Our comprehensive PQA framework, Pika, including dataset, code, and model checkpoints, is openly accessible on github.com/EMCarrami/Pika, promoting wider research in the field.
title PQA: Zero-shot Protein Question Answering for Free-form Scientific Enquiry with Large Language Models
topic Machine Learning
url https://arxiv.org/abs/2402.13653