RExPRT: a machine learning tool to predict pathogenicity of tandem repeat loci.

Title: RExPRT: a machine learning tool to predict pathogenicity of tandem repeat loci.
Publication Type: Journal Article
Year of Publication: 2024
Authors: Fazal S, Danzi MC, Xu I, Kobren SNadimpalli, Sunyaev S, Reuter C, Marwaha S, Wheeler M, Dolzhenko E, Lucas F, Wuchty S, Tekin M, Züchner S, Aguiar-Pulido V
Journal: Genome Biol
Volume: 25
Issue: 1
Pagination: 39
Date Published: 2024 Jan 31
ISSN: 1474-760X
Keywords: Machine Learning, Tandem Repeat Sequences, Virulence
Abstract

Expansions of tandem repeats (TRs) cause approximately 60 monogenic diseases. We expect that the discovery of additional pathogenic repeat expansions will narrow the diagnostic gap in many diseases. A growing number of TR expansions are being identified, and interpreting them is a challenge. We present RExPRT (Repeat EXpansion Pathogenicity pRediction Tool), a machine learning tool for distinguishing pathogenic from benign TR expansions. Our results demonstrate that an ensemble approach classifies TRs with an average precision of 93% and recall of 83%. RExPRT's high precision will be valuable in large-scale discovery studies, which require prioritization of candidate loci for follow-up studies.

DOI: 10.1186/s13059-024-03171-4
Alternate Journal: Genome Biol
PubMed ID: 38297326
PubMed Central ID: PMC10832122
Grant List: R01 NS072248 / NS / NINDS NIH HHS / United States