Lessons for Developing Polygenic Risk Scores With Multi-Ancestry Genome-wide Association Studies

Journal Published on March 7, 2023

Originally published on Cell Genomics

Key findings

The Global Biobank Meta-analysis Initiative (GBMI) was used to study the effects of multi-ancestry genome-wide association study (GWAS) discovery data, polygenicity and polygenic risk score (PRS) methods on prediction performance in diverse target cohorts
Across 14 phenotypes and diverse ancestries, large-scale meta-analyses from the GBMI significantly improved PRS accuracy compared with previous studies that had smaller sample sizes and less diverse cohorts
Based on the results of the analyses, the authors provide guidelines for developing PRSs with multi-ancestry discovery data, including best practices for characteristics of the discovery GWAS, PRS model fitting, and the target cohort

The accuracy of polygenic risk scores (PRSs) is increasing as the power of genome-wide association studies (GWAS) improves. However, many different methods are used to compute PRSs and assess their clinical impact, each with strengths and weaknesses.


Guidelines for computing PRSs are sorely needed but will require harmonized genetic data spanning diverse phenotypes, participants, and ascertainment strategies.

As a step toward developing best practices, Massachusetts General Hospital researchers explored methodological considerations and PRS performance using data from the Global Biobank Meta-analysis Initiative (GBMI). Ying Wang, PhD, a research fellow, Alicia R. Martin, PhD, a principal investigator in the Analytic and Translational Genetics Unit, and colleagues review practical considerations and lessons learned in Cell Genomics.

The Global Biobank Meta-analysis Initiative

The GBMI contains paired genetic and clinical data from more than 2.2 million individuals across diverse ancestries: mostly European, but also East Asian, African, Central and South Asian, admixed American and Middle Eastern. The GBMI's pilot meta-analyses, published in Cell Genomics, identified genetic loci for 14 diseases and endpoints.

In the current analyses, the researchers used nine biobanks to construct PRSs using the classic pruning and thresholding (P + T) method and PRS–continuous shrinkage (PRS-CS). For both methods, using a European-based linkage disequilibrium (LD) reference panel resulted in comparable or better prediction accuracy than several non-European-based panels.

Overall, PRS-CS outperformed P + T, but prediction accuracy was heterogeneous across endpoints, biobanks and ancestries.
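To make the classic P + T approach concrete, here is a minimal, illustrative Python sketch. It assumes summary statistics arrive as (variant ID, effect size, P value) tuples and that pairwise LD r² values have already been computed from a reference panel. Real clumping tools also limit LD comparisons to a physical window, which this sketch omits. The helper names `clump` and `pt_score` and the r² cutoff of 0.1 are hypothetical choices, not the settings used in the study.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical input layout: (variant_id, beta, p_value).
SumStat = Tuple[str, float, float]
# Hypothetical LD lookup: r^2 keyed by the unordered pair of variant IDs.
LDTable = Dict[FrozenSet[str], float]


def clump(sumstats: List[SumStat], ld_r2: LDTable, r2_max: float = 0.1) -> List[SumStat]:
    """Greedy LD pruning: walk variants from most to least significant and keep
    one only if its r^2 with every already-kept variant is below r2_max."""
    kept: List[SumStat] = []
    for var in sorted(sumstats, key=lambda s: s[2]):
        if all(ld_r2.get(frozenset((var[0], k[0])), 0.0) < r2_max for k in kept):
            kept.append(var)
    return kept


def pt_score(dosages: Dict[str, float], sumstats: List[SumStat], ld_r2: LDTable,
             p_threshold: float, r2_max: float = 0.1) -> float:
    """P + T score: sum of beta * allele dosage over clumped variants with
    p < p_threshold. Variants missing from `dosages` contribute 0."""
    selected = [s for s in clump(sumstats, ld_r2, r2_max) if s[2] < p_threshold]
    return sum(beta * dosages.get(vid, 0.0) for vid, beta, _ in selected)


if __name__ == "__main__":
    sumstats = [("rs1", 0.20, 1e-8), ("rs2", 0.15, 1e-6), ("rs3", -0.10, 1e-3)]
    ld = {frozenset(("rs1", "rs2")): 0.8}  # rs2 is in strong LD with rs1, so it is pruned
    person = {"rs1": 2, "rs2": 1, "rs3": 1}
    for threshold in (1e-5, 1e-2):
        print(threshold, round(pt_score(person, sumstats, ld, threshold), 3))
```

In this toy run the score is 0.4 at the stricter threshold and 0.3 at the looser one, because rs3 only enters once the P-value threshold is relaxed; the threshold is the tuning knob that P + T results depend on most.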

Lessons for Developing PRSs

At the heart of the new paper are the authors' lessons and guidance for developing PRSs with multi-ancestry discovery data:

Characteristics of the discovery GWAS

Use the largest and most diverse discovery cohort available
Consider polygenicity when selecting a PRS model: choose models that adapt to the trait's genetic architecture; if that is not possible, choose hyperparameters that reflect polygenicity, such as smaller phi values for less polygenic traits and larger values for more polygenic traits in the PRS-CS grid model
Confirm that single-nucleotide polymorphism (SNP)-based heritability is significantly greater than 0
Consider ancestry composition for the choice of the LD reference panel and to benchmark PRS accuracy
Apply standard quality control more cautiously when using a multi-ancestry discovery GWAS: in addition to filtering variants on minor allele frequency, filter on per-variant effective sample size. If that information is unavailable, use only HapMap3 variants
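One way the minor allele frequency and effective-sample-size filters could look in practice is sketched below. It uses the common case-control formula N_eff = 4 / (1/N_cases + 1/N_controls), summed over contributing cohorts, and keeps variants whose N_eff reaches a fixed fraction of the maximum. The record layout, the helper names, the 1% MAF cutoff and the 0.8 fraction are illustrative assumptions, not values reported in the paper, and the HapMap3 ID set must be supplied by the user.

```python
from typing import Dict, List, Optional, Set

# Hypothetical record layout per variant:
#   {"id": "rs123", "maf": 0.12,
#    "cohorts": [{"cases": 900, "controls": 20000}, ...]}  # "cohorts" may be absent


def effective_n(n_cases: int, n_controls: int) -> float:
    """Case-control effective sample size, 4 / (1/N_cases + 1/N_controls)."""
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


def variant_neff(variant: Dict) -> Optional[float]:
    """Sum of per-cohort effective sizes, or None if counts are not reported."""
    cohorts = variant.get("cohorts")
    if not cohorts:
        return None
    return sum(effective_n(c["cases"], c["controls"]) for c in cohorts)


def qc_filter(variants: List[Dict], hapmap3_ids: Set[str],
              maf_min: float = 0.01, neff_min_frac: float = 0.8) -> List[str]:
    """Return IDs of variants passing the MAF filter and the per-variant N_eff filter.

    If any variant lacks per-cohort counts, the N_eff filter cannot be applied
    consistently, so fall back to HapMap3 variants that pass the MAF filter.
    """
    common = [v for v in variants if v["maf"] >= maf_min]
    if not common:
        return []
    neffs = [variant_neff(v) for v in common]
    if any(n is None for n in neffs):
        return [v["id"] for v in common if v["id"] in hapmap3_ids]
    cutoff = neff_min_frac * max(neffs)
    return [v["id"] for v, n in zip(common, neffs) if n >= cutoff]
```

Filtering on a fraction of the maximum N_eff, rather than an absolute count, is one simple way to drop variants genotyped or imputed in only a subset of the meta-analyzed cohorts.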

PRS model fitting

When using an external LD reference panel, match its ancestry to the dominant ancestry in the discovery GWAS. When no ancestry is dominant, the ancestry composition of the reference panel should be proportional to that of the discovery GWAS
When the target population includes diverse ancestries, use a target ancestry–matched tuning cohort
The predictive accuracy of P + T was much less sensitive to LD-related parameters than to the choice of P-value threshold. The PRS-CS auto model, which does not require post hoc tuning of the phi parameter, can be used when the discovery GWAS is large enough
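The two model-fitting choices above, sizing a reference panel to mirror the discovery ancestry mix and tuning phi in an ancestry-matched cohort, can be sketched as follows. This assumes PRSs have already been computed for each phi value (for example with separate PRS-CS runs) and uses Pearson correlation as a simple stand-in for whatever accuracy metric an analysis actually uses. The grid of phi values is the example suggested in the PRS-CS documentation; the function names, panel size and toy data are hypothetical.

```python
from math import sqrt
from typing import Dict, List

# Example phi grid of the kind suggested in the PRS-CS documentation.
PHI_GRID = [1e-6, 1e-4, 1e-2, 1.0]


def panel_counts(discovery_n: Dict[str, int], panel_size: int) -> Dict[str, int]:
    """Allocate reference-panel samples in proportion to each ancestry's share of
    the discovery GWAS. Rounding can shift the total by a sample or two."""
    total = sum(discovery_n.values())
    return {anc: round(panel_size * n / total) for anc, n in discovery_n.items()}


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient (inputs must be non-constant, same length)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def select_phi(scores_by_phi: Dict[float, List[float]], phenotype: List[float]) -> float:
    """Return the phi whose PRS correlates most strongly (positively) with the
    phenotype in the ancestry-matched tuning cohort."""
    return max(scores_by_phi, key=lambda phi: pearson_r(scores_by_phi[phi], phenotype))


if __name__ == "__main__":
    print(panel_counts({"EUR": 600_000, "EAS": 300_000, "AFR": 100_000}, panel_size=1000))
    # Toy tuning cohort; real PRS values per phi would come from PRS-CS runs.
    pheno = [1.0, 2.0, 3.0, 4.0, 5.0]
    toy = {PHI_GRID[1]: [2.0, 1.0, 4.0, 3.0, 5.0], PHI_GRID[2]: [1.1, 2.0, 2.9, 4.2, 5.0]}
    print(select_phi(toy, pheno))
```

When the discovery GWAS is large enough to use the PRS-CS auto model, the `select_phi` step is unnecessary because phi is learned from the data.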

Target population

Perform quality control by ancestry if diverse ancestries are included in the target population
Report PRS distribution statistics (e.g., median PRS) and relative accuracy when benchmarking the performance of different PRS predictors
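A minimal sketch of the reporting step, assuming PRS values and an accuracy metric (for example R² or AUC) have already been computed separately for each ancestry group after per-ancestry quality control. Defining relative accuracy as a group's accuracy divided by that of a chosen reference group is an assumption made here for illustration, as are the function name and the toy numbers.

```python
from statistics import median
from typing import Dict, List


def benchmark_report(prs_by_ancestry: Dict[str, List[float]],
                     accuracy_by_ancestry: Dict[str, float],
                     reference_ancestry: str) -> Dict[str, Dict[str, float]]:
    """For each target ancestry, report the median PRS and the PRS accuracy
    relative to a chosen reference ancestry (accuracy / reference accuracy)."""
    ref_acc = accuracy_by_ancestry[reference_ancestry]
    return {
        anc: {
            "median_prs": median(scores),
            "relative_accuracy": accuracy_by_ancestry[anc] / ref_acc,
        }
        for anc, scores in prs_by_ancestry.items()
    }


if __name__ == "__main__":
    prs = {"EUR": [0.1, 0.2, 0.3], "AFR": [-0.2, 0.0, 0.4]}  # toy PRS values
    acc = {"EUR": 0.20, "AFR": 0.05}  # toy accuracies, e.g. R^2 in each group
    for ancestry, stats in benchmark_report(prs, acc, "EUR").items():
        print(ancestry, stats)
```

Reporting the distribution alongside relative accuracy makes shifts in PRS location between groups visible even when the accuracy metric alone looks similar.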

Future Directions

Considerations besides genetics, such as environmental exposures and demographics, may affect the predictive power of PRSs within and across ancestries. How to model these factors along with PRSs is an open question for future research.

Original journal article: Cell Genomics, published January 4, 2023 (subscription may be required)
