
Artifactual Bias in Large MRI Analyses of Brain Development Is Pervasive, Requires Multipronged Intervention

Key findings

  • In this era of “big data” in clinical research, a key question is whether large sample size can overcome the risk of errant findings that arise from common artifacts, especially those that can be time-consuming to troubleshoot
  • Following detailed manual image quality assessments of >10,000 pediatric MRI scans from the Adolescent Brain Cognitive Development cohort, the team compared automated quality control indices to user ratings. Most automated ratings fell well short of what was needed to prevent type I and type II errors related to subtle motion artifacts, which were present in the majority of images
  • Comparisons of manual and automated data enabled the team to identify specific thresholds of one automated measure, surface hole number, that approximated manual quality control ratings. However, subtle clinical associations of brain structure and behavior remained susceptible to error, even when limited to participants with relatively high-quality scans
  • In these high-quality scans, manual editing of subtle artifacts was still associated with significant changes in cortical thickness and surface area — changes that, in some regions, were replicated in a non-overlapping clinical cohort
  • “Bigger is not necessarily better” in pediatric neuroimaging studies, due to non-random biases caused by common artifacts. But a range of automated and manual measures can be deployed to protect against errant findings. Findings of small effect size require more time-intensive quality control measures

Longitudinal MRI studies of adolescent brain development, such as the ongoing Adolescent Brain Cognitive Development (ABCD) study, require thousands of participants because effect sizes for relationships between psychopathology and MRI indices tend to be small. Another challenge is that structural MRI (sMRI) scans of children and adolescents are particularly susceptible to artifacts due to participant motion.

Joshua L. Roffman, MD, MMSc, director of the Mass General Early Brain Development Initiative, and colleagues have determined that a large sample size doesn’t necessarily compensate for errant sMRI measurements arising from inclusion of poorer-quality images. In Nature Neuroscience, they explain how visual quality control (QC) and manual editing, conducted in concert with automated QC, can mitigate bias.

Variable Image Quality in Baseline Scans

The nationally representative ABCD study enrolled 11,875 participants, age 9 or 10 at baseline, at 22 U.S. sites. Dr. Roffman’s group applied a manual QC protocol (MQC) to 10,295 T1 baseline scans:

  • Visual inspection of the full volume in all three planes to identify areas of signal dropout or large defects
  • Slice-by-slice assessment for smaller problems that would require manual edits
  • Rating of scan quality from 1 (no or only a small number of problems that would require ~30 minutes for manual editing) to 4 (very large number of problems that would be impractical to edit)

In the ABCD study, all but 325 scans were designated as “recommended for use.” In the current study, the 10,295 scans fell disproportionately within higher MQC groups: only 0.4% were rated MQC=1 and 1.4% were MQC=2, whereas 10.6% were MQC=3 and 48.9% were MQC=4.

Effects of Image Quality on Cortical Measurements

Automated measures of cortical thickness, surface area, and volume are commonly used as markers of psychopathology in neuroimaging research. Using FreeSurfer software, the researchers found increasingly strong effects on each of these measures as MQC ratings worsened. 

The surface hole number (SHN) outperformed all other automated QC metrics in predicting MQC ratings. By combining SHN data with MQC ratings, the researchers developed a four-tier, automated sMRI QC rubric that reliably classified the quality of individual scans. 
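As a rough illustration of how a threshold-based rubric of this kind can work, the minimal Python sketch below maps an SHN value to one of four tiers. The cutoff values and the function name are hypothetical placeholders, not the thresholds published by the authors, and the sketch assumes that a higher SHN indicates poorer reconstruction quality.

```python
from bisect import bisect_right

# HYPOTHETICAL cutoffs for illustration only; they are NOT the study's
# thresholds. An SHN below the first cutoff maps to tier 1 (best), and
# an SHN at or above the last cutoff maps to tier 4 (worst).
HYPOTHETICAL_SHN_CUTOFFS = (50, 100, 200)


def shn_to_tier(shn, cutoffs=HYPOTHETICAL_SHN_CUTOFFS):
    """Map a surface hole number to a quality tier from 1 to 4."""
    if shn < 0:
        raise ValueError("surface hole number cannot be negative")
    # bisect_right counts how many cutoffs are <= shn.
    return bisect_right(cutoffs, shn) + 1


# Example: tier assignments for a few made-up SHN values.
tiers = {shn: shn_to_tier(shn) for shn in (10, 50, 150, 500)}
```

A lookup like this is fast enough to screen an entire cohort, which is the appeal of an automated rubric. Any real cutoffs should come from the authors' published protocol.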

Risk of Error in Clinical Analyses

The researchers also demonstrated that unaccounted variance in scan quality can affect associations between MRI and clinical indices. When only the highest-quality (MQC=1) scans were included in analyses corrected for SHN (n=4,617), three cortical regions showed significant associations between cortical volume and the Child Behavior Checklist externalizing symptoms score. The number of significant associations increased as QC thresholds were relaxed, such that when all available scans were included without regard to MQC ratings and without SHN correction (n=10,257), 43 regions were significantly associated.

These findings suggest that using only the highest-quality scans can result in underpowered analyses (type II error), whereas inclusion of lower-quality scans can result in false positives (type I error).
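To see why uncorrected variation in scan quality can produce false positives, consider the following minimal simulation. It uses entirely synthetic data and assumes, for illustration only, that a quality index such as SHN is related to both the measured cortical volume and the symptom score, while the two have no direct relationship. Removing the quality index from both variables (a simple partial correlation) is one generic way to adjust for it. The source does not specify the statistical model the authors used, so this is a sketch of the general idea, not their method.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def residualize(y, covariate):
    """Residuals of y after a least-squares fit on a single covariate."""
    n = len(y)
    mc, my = sum(covariate) / n, sum(y) / n
    slope = (sum((c - mc) * (v - my) for c, v in zip(covariate, y))
             / sum((c - mc) ** 2 for c in covariate))
    return [v - my - slope * (c - mc) for c, v in zip(covariate, y)]


# SYNTHETIC data (assumption): a standardized quality index drives both
# variables; volume and symptoms are otherwise unrelated.
rng = random.Random(0)
n = 2000
shn = [rng.gauss(0, 1) for _ in range(n)]
volume = [-0.5 * s + rng.gauss(0, 1) for s in shn]
symptoms = [0.5 * s + rng.gauss(0, 1) for s in shn]

# Unadjusted association, inflated by the shared dependence on quality.
raw_r = pearson(volume, symptoms)
# Association after removing the quality index from both variables.
adjusted_r = pearson(residualize(volume, shn), residualize(symptoms, shn))
```

In this setup the unadjusted correlation is clearly nonzero even though no direct relationship was simulated, and it shrinks toward zero after adjustment.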

Effects of Manual Edits on sMRI Measurements

For manual cortical edits, 150 baseline scans with MQC=1 and 30 with MQC=2 were randomly selected. The effects of the edits were most pronounced for cortical thickness and volume, both of which tended to decrease. The changes reached statistical significance for cortical thickness in 40 regions and cortical volume in 28 regions.
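A before-and-after comparison of this kind is naturally a paired design, because each scan is measured before and after editing. The sketch below computes a paired t statistic for a single region, using made-up thickness values in millimeters. The numbers, variable names, and choice of test are assumptions for illustration; the source does not state which test the authors applied.

```python
from math import sqrt
from statistics import mean, stdev

# HYPOTHETICAL thickness values (mm) for one region in five scans,
# before and after manual editing. These are not study data.
pre_edit = [2.80, 2.75, 2.90, 2.85, 2.70]
post_edit = [2.76, 2.73, 2.85, 2.82, 2.69]

# Per-scan change; negative values mean thickness decreased after editing.
diffs = [post - pre for pre, post in zip(pre_edit, post_edit)]

mean_diff = mean(diffs)
# Standard error of the mean difference (sample standard deviation).
std_err = stdev(diffs) / sqrt(len(diffs))
# Paired t statistic with len(diffs) - 1 degrees of freedom.
t_stat = mean_diff / std_err
```

In a real analysis, this calculation would be repeated for every region, followed by a correction for multiple comparisons.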

To assess the reproducibility and developmental specificity of the effects of cortical editing, the researchers compared ABCD results to those of a second, non-overlapping MRI cohort of 292 youths, ages 8–18 years, who underwent MRI at Massachusetts General Hospital and were assessed as being free of pathology. Of the 40 regions that demonstrated significant effects of manual edits on thickness in ABCD, 18 showed nominally significant effects of edits in the same direction in the Mass General cohort. Differences in pre-edit to post-edit mean thickness were greater at ages 8–10 years than at other ages.

Guidance for Manual and Automated QC

Beyond best practices to minimize participant motion, relatively labor-intensive approaches — visual QC, manual editing, and the use of automated measures such as SHN — are needed to protect against errant sMRI findings in youth cohorts. The authors have posted an extended protocol on Zenodo that offers specific guidance, including a time-saving approach based solely on SHN.

