Bibliographic Details
Main Authors: Abdelwahab, Omar, Torkamaneh, Davoud
Format: Preprint
Published: 2024
Subjects: Genomics
Online Access: https://arxiv.org/abs/2408.00659
_version_ 1866917738674388992
author Abdelwahab, Omar
Torkamaneh, Davoud
author_facet Abdelwahab, Omar
Torkamaneh, Davoud
contents Variant calling refinement is crucial for distinguishing true genetic variants from technical artifacts in high-throughput sequencing data. Manual review is time-consuming, while heuristic filtering often yields suboptimal results. Traditional variant calling methods often struggle with accuracy, especially in regions of low read coverage, leading to false-positive or false-negative calls. Here, we introduce VariantTransformer, a Transformer-based deep learning model designed to automate variant calling refinement directly from VCF files in low-coverage data (10-15X). VariantTransformer, trained on two million variants, including SNPs and short InDels, from low-coverage sequencing data, achieved an accuracy of 89.26% and a ROC AUC of 0.88. When integrated into conventional variant calling pipelines, VariantTransformer outperformed traditional heuristic filters and approached the performance of state-of-the-art AI-based variant callers like DeepVariant. Comparative analysis demonstrated VariantTransformer's superiority in functionality, variant type coverage, training size, and input data type. VariantTransformer represents a significant advancement in variant calling refinement for low-coverage genomic studies.
format Preprint
id arxiv_https___arxiv_org_abs_2408_00659
institution arXiv
publishDate 2024
record_format arxiv
title Refinement of genetic variants needs attention
topic Genomics
url https://arxiv.org/abs/2408.00659