Novel Polygenic Method Developed to Predict Genetic Liability of Traits and Disorders

Journal Published on October 15, 2019

Originally published in Nature Communications

Key findings

A novel Bayesian polygenic prediction method, PRS-CS, uses continuous shrinkage priors to enable simple and efficient multivariate block update of single nucleotide polymorphism effect sizes
In simulation studies using genetic data, PRS-CS demonstrated better predictive accuracy than four existing methods across a wide range of genetic architectures, especially when the training sample size was large
When applied to specimens from the Partners HealthCare Biobank, PRS-CS was more accurate than the other methods in predicting five of six common complex diseases and all of six quantitative traits

Polygenic risk scores (PRS) summarize the effects of genome-wide genetic markers to measure the genetic liability of a complex trait or disorder. Polygenic prediction is likely to become useful in clinical care because it may facilitate prevention, early detection and risk stratification of common diseases, and may also allow personalized treatment.
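
In its simplest form, a polygenic score is a weighted sum of an individual's allele counts, with per-SNP effect sizes as the weights. The following is a minimal Python sketch of that calculation; the function name, the toy genotypes and the effect sizes are hypothetical and are not drawn from the study.

```python
import numpy as np

def polygenic_score(genotypes, effect_sizes):
    """Return one score per individual: the weighted sum of allele counts."""
    genotypes = np.asarray(genotypes, dtype=float)        # shape (individuals, SNPs)
    effect_sizes = np.asarray(effect_sizes, dtype=float)  # shape (SNPs,)
    return genotypes @ effect_sizes

# Toy data (hypothetical): 3 individuals x 4 SNPs, allele counts in {0, 1, 2}
genotypes = np.array([[0, 1, 2, 0],
                      [1, 1, 0, 2],
                      [2, 0, 1, 1]])
effect_sizes = np.array([0.10, -0.05, 0.20, 0.00])

scores = polygenic_score(genotypes, effect_sizes)
print(scores)
```

The methods discussed in this article differ mainly in how the weights are obtained from GWAS summary statistics, not in this final summation step.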

Conventional methods of calculating PRS use a subset of genetic markers, after pruning out single nucleotide polymorphisms (SNPs) in local linkage disequilibrium (LD), and apply a P-value threshold to summary statistics from genome-wide association studies (GWAS). Recent research conducted at the Harvard T.H. Chan School of Public Health and published in the American Journal of Human Genetics suggests that this approach, while keeping computations simpler, discards too much information and limits prediction accuracy.
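
The conventional approach described above can be sketched in a few lines of Python. This is an illustrative simplification, not the procedure of any particular software package: the function name, both thresholds and the toy data are assumptions, and the greedy pruning compares every pair of SNPs rather than working within genomic windows.

```python
import numpy as np

def prune_and_threshold(betas, pvalues, ld_r2, p_max=5e-8, r2_max=0.1):
    """Keep significant SNPs that are not in strong LD with a better SNP.

    Returns a weight vector in which discarded SNPs have weight zero.
    """
    betas = np.asarray(betas, dtype=float)
    pvalues = np.asarray(pvalues, dtype=float)
    ld_r2 = np.asarray(ld_r2, dtype=float)

    kept = []
    for j in np.argsort(pvalues):      # most significant SNP first
        if pvalues[j] > p_max:
            break                      # every remaining SNP fails the threshold
        if all(ld_r2[j, k] < r2_max for k in kept):
            kept.append(j)

    kept = np.array(kept, dtype=int)
    weights = np.zeros_like(betas)
    weights[kept] = betas[kept]
    return weights

# Toy example (hypothetical): SNP 0 and SNP 1 are in strong LD (r^2 = 0.9)
betas = [0.30, 0.28, -0.20, 0.01]
pvalues = [1e-10, 1e-9, 1e-8, 0.5]
ld_r2 = np.eye(4)
ld_r2[0, 1] = ld_r2[1, 0] = 0.9

weights = prune_and_threshold(betas, pvalues, ld_r2)
print(weights)
```

In this toy case SNP 1 is dropped because of its LD with SNP 0, and SNP 3 is dropped by the P-value threshold, which shows how signal carried by discarded markers never reaches the score.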

More sophisticated Bayesian polygenic prediction methods that can incorporate genome-wide markers, such as LDpred, have been developed. However, these methods use discrete mixture priors on SNP effect sizes, which impose daunting computational challenges. The model space grows exponentially with the number of markers, and the priors do not allow for a block update of effect sizes and thus may result in an inaccurate adjustment for LD patterns.

Tian Ge, PhD, a junior faculty member in the Psychiatric and Neurodevelopmental Genetics Unit (PNGU) at Massachusetts General Hospital, PNGU Director Jordan W. Smoller, MD, ScD, and colleagues have developed a novel polygenic prediction method, dubbed PRS-CS, that uses a conceptually different class of priors. The researchers report in Nature Communications that in both simulations and real-world analyses, PRS-CS demonstrated better prediction accuracy than existing methods.

The Concept

PRS-CS is a form of Bayesian regression that places continuous shrinkage priors on SNP effect sizes. The amount of shrinkage applied to each genetic marker is adaptive to the strength of its associated signal in GWAS, so the technique accommodates diverse underlying genetic architectures.

In addition, effect sizes for SNPs in each LD block are updated jointly, in multivariate fashion, instead of updating the effect size for each marker separately and sequentially. The idea is that PRS-CS will accurately model local LD patterns and provide substantial computational improvements.
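
A joint update of one LD block can be sketched as follows. The sketch assumes standardized genotypes, marginal GWAS estimates beta_hat, a within-block LD matrix D, and a normal prior on each SNP whose variance is proportional to a per-SNP scale; under those assumptions the conditional posterior mean of the block is (D + T^{-1})^{-1} beta_hat, where T is the diagonal matrix of prior scales. All function and variable names are hypothetical, and the prior scales are held fixed here, whereas a full sampler would also update them at each iteration.

```python
import numpy as np

def block_posterior_mean(beta_hat, ld, prior_scale):
    """Jointly shrink all SNPs in one LD block: (D + T^{-1})^{-1} beta_hat."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    ld = np.asarray(ld, dtype=float)
    prior_scale = np.asarray(prior_scale, dtype=float)
    precision = ld + np.diag(1.0 / prior_scale)
    return np.linalg.solve(precision, beta_hat)

def sample_block(beta_hat, ld, prior_scale, noise_scale, rng):
    """Draw the whole block at once; noise_scale stands in for sigma^2 / N."""
    ld = np.asarray(ld, dtype=float)
    prior_scale = np.asarray(prior_scale, dtype=float)
    precision = ld + np.diag(1.0 / prior_scale)
    mean = np.linalg.solve(precision, np.asarray(beta_hat, dtype=float))
    cov = noise_scale * np.linalg.inv(precision)
    return rng.multivariate_normal(mean, cov)

# Toy block (hypothetical): two SNPs with correlation 0.8. Only SNP 0 has a
# true effect (0.10), so its neighbour shows a marginal effect of 0.08 via LD.
ld = np.array([[1.0, 0.8],
               [0.8, 1.0]])
beta_hat = np.array([0.10, 0.08])

weak_shrinkage = block_posterior_mean(beta_hat, ld, prior_scale=[1e6, 1e6])
strong_shrinkage = block_posterior_mean(beta_hat, ld, prior_scale=[1e-6, 1e-6])
draw = sample_block(beta_hat, ld, [1.0, 1.0], 1e-4, np.random.default_rng(0))
print(weak_shrinkage, strong_shrinkage, draw)
```

With very weak shrinkage the joint solve attributes the signal to SNP 0 and assigns almost none to its correlated neighbour, which a one-marker-at-a-time treatment of the marginal estimates would not do. With very strong shrinkage both effects are pulled toward zero.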

PRS-CS can be applied to individual-level data, but only GWAS summary statistics and an external LD reference panel are required. In the current study, the 1000 Genomes Project European sample (N=503) was used as the external LD reference.

The journal article presents the conceptual framework in detail. The priors are represented as global–local scale mixtures of normals in which a global scaling parameter, Φ, is shared across genetic markers and controls the degree of sparseness of the model. The researchers also developed PRS-CS-auto, a fully Bayesian approach that enables automatic learning of Φ from GWAS summary statistics.
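
A global–local scale mixture of normals can be written schematically as below. The notation is an assumption made for illustration and should be checked against the journal article: M is the number of markers, N the GWAS sample size, sigma^2 the residual variance, psi_j a marker-specific local scale with an unspecified mixing density g, and Phi the global scale shared by all markers. The last line gives the resulting shrinkage of a marginal estimate in the special case of a single marker with no LD.

```latex
\begin{align*}
\beta_j \mid \Phi, \psi_j &\sim \mathcal{N}\!\left(0,\ \frac{\sigma^2}{N}\,\Phi\,\psi_j\right),
  \qquad j = 1, \dots, M, \\
\psi_j &\sim g, \\
\mathrm{E}\!\left[\beta_j \mid \hat{\beta}_j, \Phi, \psi_j\right]
  &= \left(1 - \frac{1}{1 + \Phi\,\psi_j}\right)\hat{\beta}_j
  \quad \text{(no LD)}.
\end{align*}
```

Under this parameterization a small Phi pulls every effect size toward zero and so favors a sparse model, while a large psi_j lets an individual marker with a strong signal escape that shrinkage.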

Simulations

Using UK Biobank data, the researchers conducted simulation studies of the predictive power of PRS-CS and PRS-CS-auto across different genetic architectures and GWAS sample sizes. They compared the results with those of four other polygenic prediction methods:

Polygenic scoring based on all genetic markers (unadjusted PRS)
P-value thresholding (P+T)
LDpred
LDpred-inf (LDpred specialized to an infinitesimal prior)

This analysis found that:

Methods that account for local LD patterns (LDpred, PRS-CS and PRS-CS-auto) outperformed P+T, which discards LD information
When the genetic architecture was sparse, the prediction accuracy of LDpred decreased dramatically as the GWAS sample size grew, probably because LDpred does not accurately adjust for the LD structure. In contrast, PRS-CS and PRS-CS-auto were minimally affected in these scenarios
In a few scenarios where the GWAS sample was small, PRS-CS had lower prediction accuracy than LDpred, but it outperformed LDpred as the GWAS sample grew across all genetic architectures
PRS-CS-auto did not perform well when the GWAS sample size was small and the genetic architecture was sparse, but it approached the accuracy of PRS-CS as the sample size increased

Real-Data Analyses

The researchers then evaluated how well PRS-CS, PRS-CS-auto and the other methods could predict six common complex diseases and six quantitative traits using samples from the Partners HealthCare Biobank. They found that:

For breast cancer and rheumatoid arthritis, PRS-CS was substantially more accurate than LDpred
For coronary artery disease, depression and type 2 diabetes, LDpred and PRS-CS performed similarly, and both were dramatically better than P+T
PRS-CS was inferior to LDpred only in the prediction of inflammatory bowel disease. Among all diseases and traits, inflammatory bowel disease had the smallest GWAS sample, so this result was consistent with the simulations
PRS-CS-auto was less accurate than LDpred for all diseases except breast cancer. To date, the GWAS samples for most diseases may not be large enough for PRS-CS-auto to accurately learn the global shrinkage parameter
For all traits evaluated (height, body mass index, high-density lipoproteins, low-density lipoproteins, cholesterol and triglycerides), both PRS-CS and PRS-CS-auto were the most accurate methods

Although PRS-CS represents a substantial improvement over existing methods for polygenic prediction, its accuracy is still lower than what is needed clinically. As GWAS sample sizes continue to grow, however, PRS-CS and PRS-CS-auto should demonstrate even greater advantages.