
Electronic Health Records Can Be Used to Validate Genetic Determinants of Treatment Response in IBD

Key findings

  • Development of tools to personalize therapeutic decisions in patients with inflammatory bowel disease (IBD) has been limited by the need for carefully followed prospective cohorts
  • In this study of 326 IBD patients on tumor necrosis factor (TNF) inhibitor therapy, a natural language processing system derived nonresponse scores from the free-text notes in electronic health records (EHRs)
  • The nonresponse scores were significantly inversely correlated with genetic risk scores, and the associations were similar for Crohn's disease and ulcerative colitis
  • The genetic risk scores were also significantly inversely correlated with narrative mentions of individual symptoms: diarrhea, fatigue and bleeding
  • The findings suggest that narrative text in EHRs can be used to efficiently validate genetic determinants of durable response to anti-TNF therapy in IBD

The drug classes used to treat Crohn's disease (CD) and ulcerative colitis (UC) have different immunologic mechanisms of action, so biomarkers that predict response to each class are needed. A barrier to developing such tools is the need for large prospective cohorts and systematic assessment of treatment response using validated indexes.

Data in electronic health records (EHRs) have been used to classify diseases, and now Ashwin N. Ananthakrishnan, MD, MPH, director of the Crohn's and Colitis Center at Massachusetts General Hospital, Ramnik J. Xavier, MD, PhD, director of the Center for the Study of IBD, and colleagues have extended that effort to defining therapeutic outcomes in inflammatory bowel disease (IBD). In Clinical Gastroenterology and Hepatology, they report that narrative text from EHRs can be used efficiently to validate genetic determinants of durable response to tumor necrosis factor (TNF) inhibitor therapy.

Study Design

The researchers integrated data from two sources:

  1. A prospective patient registry: In a previous report published in Inflammatory Bowel Diseases, the researchers described weighted genetic risk scores that predicted a durable response to anti-TNF therapy in IBD. These scores consisted of IBD-related and other immune function–related single nucleotide polymorphisms genotyped on a microarray.
  2. An EHR–based IBD cohort: In another previous study, also published in Inflammatory Bowel Diseases, the researchers searched the narrative (free-text) portion of EHRs over the year after patients with IBD initiated anti-TNF therapy. Using a natural language processing system, they validated a "likelihood of nonresponse score" comprising the weighted count of narrative mentions of the concepts "diarrhea" and "fatigue" during inpatient and outpatient care. The scores successfully differentiated complete responders, partial responders and nonresponders.
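Both scores are weighted sums, which the minimal sketch below illustrates under stated assumptions. The SNP names, allele weights and concept weights are hypothetical placeholders, not the published panel or the validated weighting (which may, for example, weight inpatient and outpatient mentions differently). The sketch also assumes that the natural language processing step has already produced a count of mentions per concept.

```python
from typing import Mapping

# Hypothetical weights for illustration only; the published SNP panel and
# its weights are not given in this article.
SNP_WEIGHTS = {"snp_a": 1.0, "snp_b": 2.0, "snp_c": 1.0}

# Hypothetical concept weights; the validated score's actual weighting
# (including any inpatient vs. outpatient weighting) is not given here.
CONCEPT_WEIGHTS = {"diarrhea": 1.0, "fatigue": 0.5}


def genetic_risk_score(allele_counts: Mapping[str, int],
                       weights: Mapping[str, float]) -> float:
    """Weighted sum of response-associated allele counts (0, 1 or 2 per SNP)."""
    for snp, count in allele_counts.items():
        if count not in (0, 1, 2):
            raise ValueError(f"{snp}: allele count must be 0, 1 or 2, got {count}")
    # SNPs missing from allele_counts contribute nothing to the score.
    return sum(w * allele_counts.get(snp, 0) for snp, w in weights.items())


def nonresponse_score(mention_counts: Mapping[str, int],
                      weights: Mapping[str, float]) -> float:
    """Weighted count of NLP-extracted concept mentions over the follow-up year."""
    return sum(w * mention_counts.get(c, 0) for c, w in weights.items())


grs = genetic_risk_score({"snp_a": 2, "snp_b": 1, "snp_c": 0}, SNP_WEIGHTS)
nrs = nonresponse_score({"diarrhea": 3, "fatigue": 2}, CONCEPT_WEIGHTS)
print(grs, nrs)  # 4.0 4.0
```

In this toy example, a patient carrying more response-associated alleles gets a higher genetic score, while more weighted symptom mentions in the notes raise the nonresponse score.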

The current study included 326 patients (69% CD, 31% UC), of whom 271 were on infliximab and 55 were on adalimumab. Forty-three patients had not been included in the prior publications.

The researchers evaluated the relationship between the weighted genetic risk scores and the outputs of natural language processing, including both the nonresponse score and mentions of individual IBD symptoms.

Results

The genetic risk scores ranged from 7 to 25 for both CD and UC, with higher scores indicating a greater likelihood of durable response. In the entire cohort, the narrative nonresponse score showed a significant inverse correlation with the genetic risk scores (ρ = −.15, P = .009).
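The article reports the correlation as ρ, which is conventionally Spearman's rank correlation; this is an assumption, since the statistical method is not named here. A minimal sketch of Spearman's ρ (Pearson correlation computed on average ranks, so tied scores share a rank) is shown below. The input values are made-up toy numbers, not study data, and the sketch does not compute a P value.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rank correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Toy data: genetic risk scores vs. narrative nonresponse scores.
print(spearman_rho([7, 10, 15, 20, 25], [9, 7, 8, 3, 1]))  # approximately -0.9
```

A negative ρ, as in the study, means that patients ranked higher on the genetic score tend to rank lower on the nonresponse score.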

The genetic risk scores were also significantly inversely correlated with narrative mentions of individual symptoms (diarrhea, fatigue and bleeding) but not with mentions of pain. The associations between genetic risk scores and narrative nonresponse scores were similar for CD and UC.

On multivariable analysis, patients in the highest tertile of the genetic risk scores had significantly lower narrative nonresponse scores than those in the lowest tertile (P for trend = .035).
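The tertile comparison groups patients into thirds by genetic risk score. The sketch below shows only that grouping step, followed by an unadjusted mean of the nonresponse score per tertile, on made-up toy values. It is not the study's multivariable model: the covariates and the trend test used by the researchers are not described here, and this simple rank-based split may place tied scores in different tertiles.

```python
from statistics import mean


def tertile_labels(scores):
    """Label each score 1 (lowest third), 2 or 3 (highest third) by rank order."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    labels = [0] * n
    for rank, idx in enumerate(order):
        labels[idx] = 1 + (3 * rank) // n
    return labels


def mean_by_tertile(grs, outcome):
    """Unadjusted mean outcome within each genetic-risk-score tertile."""
    groups = {1: [], 2: [], 3: []}
    for label, y in zip(tertile_labels(grs), outcome):
        groups[label].append(y)
    return {t: mean(v) for t, v in groups.items() if v}


# Toy genetic risk scores and nonresponse scores for six hypothetical patients.
print(mean_by_tertile([7, 9, 12, 15, 20, 25], [6, 5, 4, 4, 2, 1]))
```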

Implications for Translational Research

EHR-based IBD cohorts offer considerable promise for expanding the sample sizes of studies that seek to identify biomarkers or validate biomarker discoveries. This approach should increase efficiency, lower costs and improve applicability across institutions and diseases. In the future, machine learning methods may even be useful for interpreting imaging and endoscopy results.
