- A novel Bayesian polygenic prediction method, PRS-CS, uses continuous shrinkage priors to enable simple and efficient multivariate block update of single nucleotide polymorphism effect sizes
- In simulation studies using genetic data, PRS-CS demonstrated better predictive accuracy than four existing methods across a wide range of genetic architectures, especially when the training sample size was large
- When applied to specimens from the Partners HealthCare Biobank, PRS-CS was more accurate than the other methods in predicting five of six common complex diseases and all of six quantitative traits
Polygenic risk scores (PRS) summarize the effects of genome-wide genetic markers to measure the genetic liability of a complex trait or disorder. Polygenic prediction is likely to become useful in clinical care because it may facilitate prevention, early detection and risk stratification of common diseases—as well as allow personalized treatment.
Conventional methods of calculating PRS use a subset of genetic markers, after pruning out single nucleotide polymorphisms (SNPs) in local linkage disequilibrium (LD), and apply a P-value threshold to summary statistics from genome-wide association studies (GWAS). Recent research conducted at the Harvard T.H. Chan School of Public Health and published in American Journal of Human Genetics suggests that this approach, while keeping computations simpler, discards too much information and limits prediction accuracy.
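To make the conventional approach concrete, here is a minimal sketch of pruning and thresholding followed by scoring. It is an illustration only, not the procedure used in any cited study: the function names, the greedy pruning rule, the cutoffs (`p_cutoff`, `r2_cutoff`) and the toy numbers are all assumptions.

```python
def prune_and_threshold(p_values, ld_r2, p_cutoff=0.05, r2_cutoff=0.1):
    """Return indices of SNPs kept by a greedy prune-then-threshold rule.

    p_values: GWAS P-values, one per SNP.
    ld_r2:    square matrix (list of lists) of pairwise LD r^2 values.
    Both cutoffs are illustrative defaults, not recommended settings.
    """
    # Visit SNPs from most to least significant.
    order = sorted(range(len(p_values)), key=lambda j: p_values[j])
    kept = []
    for j in order:
        if p_values[j] > p_cutoff:
            break  # every remaining SNP also fails the P-value threshold
        # Keep SNP j only if it is not in high LD with a SNP already kept.
        if all(ld_r2[j][k] < r2_cutoff for k in kept):
            kept.append(j)
    return sorted(kept)


def polygenic_score(allele_counts, beta_hat, kept):
    """Weighted sum of one person's allele counts over the kept SNPs."""
    return sum(allele_counts[j] * beta_hat[j] for j in kept)


# Toy summary statistics for four SNPs; SNPs 0 and 1 are in strong LD.
beta_hat = [0.30, 0.28, -0.10, 0.02]
p_values = [1e-6, 1e-5, 0.01, 0.60]
ld_r2 = [
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
genotypes = [[2, 2, 0, 1], [0, 1, 1, 2]]  # allele counts for two people

kept = prune_and_threshold(p_values, ld_r2)
scores = [polygenic_score(person, beta_hat, kept) for person in genotypes]
```

In this toy case SNP 1 is dropped because it is in LD with the more significant SNP 0, and SNP 3 is dropped by the P-value cutoff, so neither contributes to the score. This is the information loss that the genome-wide methods described next try to avoid.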
More sophisticated Bayesian polygenic prediction methods have been developed, such as LDpred, that can incorporate genome-wide markers. However, these methods use discrete mixture priors on SNP effect sizes, which impose daunting computational challenges. The model space grows exponentially with the number of markers, and the priors do not allow for a block update of effect sizes and thus may result in an inaccurate adjustment for LD patterns.
Tian Ge, PhD, a junior faculty member in the Psychiatric and Neurodevelopmental Genetics Unit (PNGU) at Massachusetts General Hospital, PNGU Director Jordan W. Smoller, MD, ScD, and colleagues have developed a novel polygenic prediction method, dubbed PRS-CS, that uses a conceptually different class of priors. The researchers report in Nature Communications that in both simulations and real-world analyses, PRS-CS demonstrated better prediction accuracy than existing methods.
PRS-CS is a form of Bayesian regression that places continuous shrinkage priors on SNP effect sizes. The amount of shrinkage applied to each genetic marker is adaptive to the strength of its associated signal in GWAS, so the technique accommodates diverse underlying genetic architectures.
In addition, effect sizes for SNPs in each LD block are updated jointly, in multivariate fashion, instead of updating the effect size for each marker separately and sequentially. The idea is that PRS-CS will accurately model local LD patterns and provide substantial computational improvements.
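The difference between a joint block update and a marker-by-marker update can be sketched with a small linear-algebra example. This is a simplified illustration under stated assumptions, not the PRS-CS sampler itself: it assumes standardized genotypes and phenotype, treats the per-SNP prior scales as fixed, and computes only a conditional posterior mean rather than drawing samples. The function name and the `shrinkage` argument are hypothetical.

```python
import numpy as np


def block_posterior_mean(beta_marginal, ld_block, shrinkage):
    """Conditional posterior mean of joint SNP effects in one LD block.

    Assumes a normal prior on each effect whose variance is proportional
    to shrinkage[j] (a hypothetical stand-in for the per-SNP prior scale).
    Under that assumption the mean solves (D + S^-1) beta = beta_marginal,
    where D is the block's LD matrix and S = diag(shrinkage).
    """
    precision = ld_block + np.diag(1.0 / np.asarray(shrinkage))
    return np.linalg.solve(precision, beta_marginal)


# Two SNPs in strong LD that tag overlapping signal (toy numbers).
ld_block = np.array([[1.0, 0.8],
                     [0.8, 1.0]])
beta_marginal = np.array([0.10, 0.08])

# Joint update: both effects are solved for together, so LD is accounted for.
joint = block_posterior_mean(beta_marginal, ld_block, [1.0, 1.0])

# Separate update that ignores LD: each marginal effect is shrunk on its own.
separate = beta_marginal / (1.0 + 1.0 / np.array([1.0, 1.0]))

# Adaptive shrinkage: a small prior scale pulls the second effect toward zero.
sparse = block_posterior_mean(beta_marginal, ld_block, [1.0, 0.01])
```

With these toy inputs the joint solution assigns less total effect to the two correlated SNPs than the separate update does, because the shared signal is not counted twice. Lowering the prior scale for the second SNP shrinks its effect toward zero while leaving the first largely intact.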
PRS-CS can be applied to individual-level data, but only GWAS summary statistics and an external LD reference panel are required. In the current study, the 1000 Genomes Project European sample (N=503) was used as the external LD reference.
The journal article presents the conceptual framework in detail. The priors are represented as global–local scale mixtures of normals, in which a global scaling parameter, Φ, is shared across genetic markers and controls the degree of sparseness of the model. The researchers also developed PRS-CS-auto, a fully Bayesian approach that enables automatic learning of Φ from GWAS summary statistics.
Using UK Biobank data, the researchers conducted simulation studies of the predictive power of PRS-CS and PRS-CS-auto across different genetic architectures and GWAS sample sizes. They compared the results with those of four other polygenic prediction methods:
- Polygenic scoring based on all genetic markers (unadjusted PRS)
- LD-informed pruning and P-value thresholding (P+T)
- LDpred
- LDpred-inf (LDpred specialized to an infinitesimal prior)
This analysis found that:
- Methods that account for local LD patterns (LDpred, PRS-CS and PRS-CS-auto) outperformed P+T, which discards LD information
- When the genetic architecture was sparse, the prediction accuracy of LDpred decreased dramatically as the GWAS sample size grew, probably because LDpred does not accurately adjust for the LD structure. In contrast, PRS-CS and PRS-CS-auto were minimally affected in these scenarios
- In a few scenarios where the GWAS sample was small, PRS-CS had lower prediction accuracy than LDpred, but across all genetic architectures it outperformed LDpred as the GWAS sample grew
- PRS-CS-auto did not perform well when the GWAS sample size was small and the genetic architecture was sparse, but it approached the accuracy of PRS-CS as the sample size increased
The researchers then evaluated how well PRS-CS, PRS-CS-auto and the other methods could predict six common complex diseases and six quantitative traits using samples from the Partners HealthCare Biobank. They found that:
- For breast cancer and rheumatoid arthritis, PRS-CS was substantially more accurate than LDpred
- For coronary artery disease, depression and type 2 diabetes, LDpred and PRS-CS performed similarly, and both were dramatically better than P+T
- PRS-CS was inferior to LDpred only in the prediction of inflammatory bowel disease. Among all diseases and traits, inflammatory bowel disease had the smallest GWAS sample, so this result was consistent with the simulations
- PRS-CS-auto was less accurate than LDpred for all diseases except breast cancer. To date, the GWAS samples for most diseases may not be large enough for PRS-CS-auto to accurately learn the global shrinkage parameter
- For all traits evaluated (height, body mass index, high-density lipoproteins, low-density lipoproteins, cholesterol and triglycerides), both PRS-CS and PRS-CS-auto were the most accurate methods
Although PRS-CS represents a substantial improvement over existing methods for polygenic prediction, its accuracy is still lower than what is needed clinically. As GWAS sample sizes continue to grow, however, PRS-CS and PRS-CS-auto should demonstrate even greater advantages.