Machine learning model provides rapid prediction of C. difficile infection risk

Every year nearly 30,000 Americans die from infection with Clostridium difficile (C. difficile), an aggressive, gut-infecting bacterium that is resistant to many common antibiotics and can flourish when antibiotic treatment kills off the beneficial bacteria that normally keep it at bay. Investigators from Massachusetts General Hospital (MGH), the University of Michigan (U-M) and the Massachusetts Institute of Technology (MIT) have now developed investigational "machine learning" models, specifically tailored to individual institutions, that can predict a patient's risk of developing C. difficile infection much earlier than it would be diagnosed with current methods. Preliminary data from their study, which is being published today in Infection Control and Hospital Epidemiology, were presented last October at the ID Week 2017 conference.

The authors note that most previous models of C. difficile infection risk were designed as "one size fits all" approaches and included only a few risk factors, which limited their usefulness. Co-lead authors Jeeheh Oh, a U-M graduate student in Computer Science and Engineering, and Maggie Makar, MS, of MIT's Computer Science and Artificial Intelligence Laboratory and their colleagues took a "big data" approach that analyzed the whole electronic health record (EHR) to predict a patient's C. difficile risk throughout the course of hospitalization. Their method allows the development of institution-specific models that could accommodate different patient populations, different EHR systems and factors specific to each institution.

"When data are simply pooled into a one-size-fits-all model, institutional differences in patient populations, hospital layouts, testing and treatment protocols, or even in the way staff interact with the EHR can lead to differences in the underlying data distributions and ultimately to poor performance of such a model," says IHPI member Jenna Wiens, PhD, assistant professor of Computer Science and Engineering at U-M and co-senior author of the study. "To mitigate these issues, we take a hospital-specific approach, training a model tailored to each institution."
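The contrast Wiens describes, one pooled model versus one model per institution, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the article does not say which learning algorithm the study used, so a one-feature logistic regression is swapped in, and the site names and records are synthetic toy data rather than anything from the study.

```python
import math
from collections import defaultdict

def fit_logistic(rows, epochs=500, lr=0.1):
    """Fit a one-feature logistic regression by plain gradient descent.

    Illustrative stand-in only; not the model used in the study.
    """
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in rows:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # predicted probability
            grad_w += (p - y) * x
            grad_b += (p - y)
        w -= lr * grad_w / len(rows)
        b -= lr * grad_b / len(rows)
    return w, b

def fit_per_institution(records):
    """Group (institution, feature, label) records and fit one model per site."""
    by_site = defaultdict(list)
    for site, x, y in records:
        by_site[site].append((x, y))
    return {site: fit_logistic(rows) for site, rows in by_site.items()}

# Synthetic toy records (hypothetical): the same feature points in opposite
# directions at the two sites, an exaggerated version of differing
# "underlying data distributions".
records = [
    ("hospital_1", 0.0, 0), ("hospital_1", 1.0, 1),
    ("hospital_1", 0.0, 0), ("hospital_1", 1.0, 1),
    ("hospital_2", 0.0, 1), ("hospital_2", 1.0, 0),
    ("hospital_2", 0.0, 1), ("hospital_2", 1.0, 0),
]

site_models = fit_per_institution(records)                  # one (w, b) per site
pooled_model = fit_logistic([(x, y) for _, x, y in records])  # one-size-fits-all
```

In this deliberately extreme toy case the per-site models learn weights of opposite sign, while the pooled model's weight stays at zero because the two sites cancel each other out. Real institutional differences would be far subtler, but the mechanism is the same.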

Using their machine-learning-based model, the investigators analyzed de-identified data from the EHRs of almost 257,000 patients admitted to either MGH or Michigan Medicine, U-M's academic medical center, over periods of two years and six years, respectively. The data included individual patient demographics and medical history, details of their admission and daily hospitalization, and the likelihood of exposure to C. difficile. The model generated daily risk scores for each patient; when a score exceeded a set threshold, the patient was classified as being at high risk.
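The thresholding step can be sketched in a few lines. This is a hedged illustration, not the study's code: the `DailyScore` record, the `first_high_risk_day` helper and the 0.5 cutoff are all hypothetical, since the article does not report the threshold used or how scores were stored.

```python
from dataclasses import dataclass

# Hypothetical cutoff; the article does not state the study's threshold.
HIGH_RISK_THRESHOLD = 0.5

@dataclass
class DailyScore:
    patient_id: str
    hospital_day: int
    risk: float  # model output for that day, assumed to lie in [0, 1]

def first_high_risk_day(scores, threshold=HIGH_RISK_THRESHOLD):
    """Map each patient to the first hospital day on which the daily risk
    score exceeds the threshold, or None if it never does."""
    flagged = {}
    for s in sorted(scores, key=lambda s: (s.patient_id, s.hospital_day)):
        flagged.setdefault(s.patient_id, None)
        if flagged[s.patient_id] is None and s.risk > threshold:
            flagged[s.patient_id] = s.hospital_day
    return flagged

# Synthetic example scores for two made-up patients.
scores = [
    DailyScore("A", 1, 0.10), DailyScore("A", 2, 0.62), DailyScore("A", 3, 0.40),
    DailyScore("B", 1, 0.05), DailyScore("B", 2, 0.08),
]
result = first_high_risk_day(scores)
print(result)  # {'A': 2, 'B': None}
```

Recording the first day a patient crosses the threshold, rather than only a yes/no flag, reflects the article's emphasis on identifying risk earlier in the hospital stay.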