September 6, 2022

How we applied AI to prevent sepsis in preterm babies

A case study on using XGBoost for time series forecasting to predict the onset of sepsis in preterm infants within a 12-hour prediction horizon.

Improving the chances of preterm-born infants

About 20% of the preterm-born infants admitted to the newborn intensive care unit (NICU) will develop sepsis, which is associated with higher mortality and adverse long-term effects. In the AI for Health - Sepsis Prevention project we applied machine learning to predict whether a preterm baby is going to develop sepsis. Sepsis is a reaction to an infection and can be life-threatening. Early prediction of the onset gives doctors the necessary time to apply preventative measures.

Healthcare professionals meet data scientists

Our team of 4 data scientists worked for 20 weeks to address the problem. We worked in close collaboration with the hospital. Intensive care is an extremely sensitive area of medicine, and we had to build a solid basic understanding of the application area. Since we were all located in the Netherlands, we even got a chance to visit the premises of the NICU in person for some hands-on experience.

AI for Health engineers at the NICU of UMC Utrecht
We decided to split the team and work on two solutions. One aimed to improve the existing logistic regression model that UMC Utrecht had developed. The other was to create a new XGBoost model that would outperform the improved existing model.

Existing logistic regression model got an upgrade

We started with improving the already existing logistic regression model UMC Utrecht used. We applied two methods: 

  1. hyperparameter optimization and
  2. feature engineering

For the hyperparameter optimization we used random search and grid search with cross-validation. Due to computational limitations, we ran this on the original set of features and on only a portion of the patients. We still managed to improve the ROC AUC score from 0.56 to 0.67.
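A random search with cross-validation can be sketched as follows. This is a minimal illustration with scikit-learn, not the project’s actual code: the synthetic dataset, the search ranges, the number of iterations and the 5-fold split are all assumptions.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

# Synthetic stand-in for the patient features (assumption: the real data is not available here).
X, y = make_classification(n_samples=300, n_features=8, n_informative=4, random_state=0)

search = RandomizedSearchCV(
    estimator=LogisticRegression(max_iter=1000),
    param_distributions={
        "C": loguniform(1e-3, 1e2),            # inverse regularization strength
        "class_weight": [None, "balanced"],    # optional re-weighting of the classes
    },
    n_iter=20,                                  # number of sampled configurations
    scoring="roc_auc",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    random_state=0,
)
search.fit(X, y)

best_params = search.best_params_
best_auc = search.best_score_   # mean cross-validated ROC AUC of the best configuration
print(best_params, round(best_auc, 3))
```

A grid search follows the same pattern with `GridSearchCV` and explicit lists of values instead of distributions.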

The goal of the second approach was to create new features based on the minimum, mean, variance, peaks, and drops of those original features that have a low number of null values. We computed feature importance using a simple regression model and a gradient boosting model to measure each feature’s influence on the outcome of the model.
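As a rough sketch of what such interval features could look like, the snippet below computes a few of them for one interval of heart rate and oxygen saturation samples. The function names and the two thresholds are hypothetical placeholders chosen for illustration, not the clinical definitions used in the project.

```python
from statistics import mean, pvariance

def count_drops(values, threshold):
    """Count episodes where the signal falls below `threshold`.

    Consecutive samples below the threshold count as a single episode.
    """
    episodes, below = 0, False
    for v in values:
        if v < threshold and not below:
            episodes += 1
        below = v < threshold
    return episodes

def interval_features(heart_rate, spo2, brady_bpm=100, spo2_drop_pct=85):
    """Aggregate one interval of samples into summary features.

    brady_bpm and spo2_drop_pct are illustrative thresholds (assumptions).
    """
    return {
        "hf_min": min(heart_rate),
        "hf_mean": mean(heart_rate),
        "hf_variance": pvariance(heart_rate),
        "bradycardia": count_drops(heart_rate, brady_bpm),
        "spo2_drops": count_drops(spo2, spo2_drop_pct),
    }

# Tiny made-up interval; real intervals would hold 120 one-minute samples.
heart_rate = [150, 148, 95, 90, 152, 151, 98, 149]
spo2 = [96, 95, 84, 83, 92, 97, 96, 80]
feats = interval_features(heart_rate, spo2)
print(feats)
```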

We computed a correlation matrix, which gave us the opportunity to manually explore possible feature combinations for our logistic regression model. Finally, we used a sequential feature selector to find the best 5 to 8 features for our model. We found that the best combination of features was:

  • HF mean (2h) - heart frequency mean over a 2 hour interval;
  • SpO2 drops (2h) - how many oxygen saturation drops occur in a 2 hour interval;
  • HF variance (2h) - heart frequency variance over a 2 hour interval;
  • Bradycardia (2h) - how often a too slow heart rate occurs in a 2 hour interval;
  • AdemF mean - breathing frequency.
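The sequential selection step could look roughly like the sketch below, which uses scikit-learn’s `SequentialFeatureSelector` on synthetic data. The dataset, the forward direction, the 4-fold split and the fixed target of 5 features are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the engineered feature table (assumption).
X, y = make_classification(n_samples=200, n_features=12, n_informative=5, random_state=0)

selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=5,     # the article reports searching for 5 to 8 features
    direction="forward",        # add one feature at a time
    scoring="roc_auc",
    cv=4,
)
selector.fit(X, y)

selected = selector.get_support(indices=True)   # column indices of the kept features
print(selected)
```

Repeating the fit for each target size from 5 to 8 and comparing the cross-validated scores would be one way to pick the final subset.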

Despite improving the ROC AUC score from 0.56 to 0.67, we were still operating close to chance level. The change had to be more fundamental for a usable outcome. We had to reformulate the problem. We realized quite early that:

  • The logistic regression model didn’t cater to the problem - it just flagged an off-the-chart value (or combination of values) when something was already happening. This might not give the medical staff the time window they need to maximize the probability of a successful intervention.
  • We would have to move away from the real-time analysis of a data stream and towards predicting future events from historical chunks of data. A classic time series forecasting problem!

“We needed a model to predict which patient will develop sepsis within the next 12 hours. A traffic police officer that doesn’t just direct & flag real-time data traffic, but one that knows on which intersection important stuff will happen. Show me what traffic crosses the intersection right now and I’ll tell you what happens in the next 12 hours of time - that’s the goal.” - *Kamal Elsayed, AI for Health engineer*

*Developing a new XGBoost time series forecasting model*

The goal here was to create an XGBoost time series forecasting classification model that predicts the onset of sepsis in preterm infants within a 12-hour prediction horizon.

The model was trained on these features: 

  1. arterial blood pressure
  2. arterial blood pressure diastole
  3. arterial blood pressure systole
  4. incubator measured temperature
  5. monitor temperature
  6. heart rate pulse
  7. heart rate pleth
  8. monitor heart rate
  9. respiratory rate
  10. O2 saturation
  11. gestation age
  12. gender

Data Pre-processing

The aforementioned features are minute-by-minute time-series data streams which were recorded from a set of invasive and non-invasive medical instruments in the incubator. Each feature data stream could extend to multiple days. An event feature shows per timestamp if a notable medical intervention or an administrative event occurred. Notable events include: admission, discharge, death, negative or positive blood cultures. A positive blood culture confirms a positive sepsis case, and the corresponding timestamp is marked as sepsis onset (t_sepsis).

For every patient, each of the 10 physiological markers was subset to the 12 hours of data that directly preceded a sepsis timestamp in a case patient, or a control timestamp in a control patient. To maintain a balanced dataset, the number of negative patients was set equal to the number of positive patients: 398 control patients were pseudo-randomly drawn from the pool of 2196 controls. The selection was constrained to match the distribution of the gestation age.
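One simple way to draw controls so that their gestation age distribution matches the cases is to sample within each age group. The sketch below assumes matching on whole weeks and one control per case; the function name, the data layout and the exact-match rule are hypothetical, since the article does not describe the matching procedure in detail.

```python
import random
from collections import Counter, defaultdict

def draw_matched_controls(case_ages, control_ages, seed=0):
    """Draw one control per case, matched on gestation age in whole weeks.

    case_ages: list of gestation ages (weeks) of the sepsis cases.
    control_ages: dict mapping control patient id -> gestation age (weeks).
    Returns the list of selected control ids.
    """
    rng = random.Random(seed)            # fixed seed makes the pseudo-random draw repeatable
    pool = defaultdict(list)
    for pid, age in control_ages.items():
        pool[age].append(pid)

    selected = []
    for age, n_needed in sorted(Counter(case_ages).items()):
        candidates = sorted(pool[age])
        if len(candidates) < n_needed:
            raise ValueError(f"not enough controls at {age} weeks")
        selected.extend(rng.sample(candidates, n_needed))
    return selected

# Made-up example: 4 cases and a pool of 18 controls.
case_ages = [25, 25, 26, 28]
pool_ages = [25] * 5 + [26] * 4 + [27] * 3 + [28] * 6
control_ages = {f"c{i}": age for i, age in enumerate(pool_ages)}
chosen = draw_matched_controls(case_ages, control_ages)
print(chosen)
```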

Over this extracted 12-hour segment, a sliding window of length 3 hours was run on every physiological marker to aggregate a set of 8 statistical features. This created a total of 320 training features per patient (10 physiological markers x 8 statistics x 4 intervals of 3 hours). Additionally, we added the gestation age and gender as features.
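The arithmetic behind the 320 features can be sketched with NumPy. The article does not list the 8 statistics, so the set below is an assumption, as is the stride equal to the window length (which is what yields exactly 4 intervals from 12 hours of minute-by-minute data).

```python
import numpy as np

# Eight summary statistics. This particular set is an assumption.
STATS = {
    "min": np.nanmin,
    "max": np.nanmax,
    "mean": np.nanmean,
    "median": np.nanmedian,
    "std": np.nanstd,
    "var": np.nanvar,
    "p25": lambda x: np.nanpercentile(x, 25),
    "p75": lambda x: np.nanpercentile(x, 75),
}

def window_features(segment, marker_names, window_minutes=180):
    """Aggregate a (minutes, markers) array into per-window statistics.

    segment: array of shape (720, n_markers), one row per minute.
    Windows do not overlap: the stride equals window_minutes.
    """
    n_minutes, _ = segment.shape
    features = {}
    for w, start in enumerate(range(0, n_minutes, window_minutes)):
        window = segment[start:start + window_minutes]
        for m, name in enumerate(marker_names):
            for stat_name, fn in STATS.items():
                features[f"{name}_{stat_name}_int{w + 1}"] = float(fn(window[:, m]))
    return features

# Random stand-in for one patient's 12-hour segment of 10 markers.
rng = np.random.default_rng(0)
markers = [f"marker_{i}" for i in range(10)]
segment = rng.normal(size=(720, 10))
feats = window_features(segment, markers)
print(len(feats))  # 10 markers x 8 statistics x 4 intervals
```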

The targets used in model training were derived from the event feature. A 12-hour segment from a patient was assigned the positive class if it directly preceded a sepsis timestamp. If no sepsis event followed, the segment was assigned the negative class.

*Model Validation*

This model was trained and evaluated using a *repeated nested cross-validation procedure* to simultaneously search for the optimal parameters and evaluate the test scores. Both the inner and outer cross-validation loops used k = 4 and each loop was repeated 10 times. The inner loop used a random search over a set of probability distributions.
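A repeated nested cross-validation of this shape can be sketched with scikit-learn. To keep the example free of extra dependencies it uses `GradientBoostingClassifier` as a stand-in for XGBoost; the search distributions, the synthetic data and the choice of average precision as the scoring metric are assumptions.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import (
    RandomizedSearchCV,
    RepeatedStratifiedKFold,
    cross_val_score,
)

def nested_cv_scores(X, y, n_repeats=10, n_iter=10, seed=0):
    """Outer loop estimates test performance; inner loop tunes hyperparameters."""
    inner = RepeatedStratifiedKFold(n_splits=4, n_repeats=n_repeats, random_state=seed)
    outer = RepeatedStratifiedKFold(n_splits=4, n_repeats=n_repeats, random_state=seed + 1)
    search = RandomizedSearchCV(
        GradientBoostingClassifier(random_state=seed),   # stand-in for XGBoost
        param_distributions={
            "n_estimators": randint(20, 60),
            "max_depth": randint(2, 5),
            "learning_rate": uniform(0.01, 0.3),   # samples from [0.01, 0.31]
            "subsample": uniform(0.6, 0.4),        # samples from [0.6, 1.0]
        },
        n_iter=n_iter,
        scoring="average_precision",
        cv=inner,
        random_state=seed,
    )
    # Each outer training fold runs its own inner search, so the outer test fold
    # never influences the hyperparameter choice.
    return cross_val_score(search, X, y, scoring="average_precision", cv=outer)

X, y = make_classification(n_samples=120, n_features=10, random_state=0)
# Reduced settings so the demo runs quickly; the defaults mirror the 10 repeats in the text.
scores = nested_cv_scores(X, y, n_repeats=1, n_iter=3)
print(scores.mean())
```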

The model reached an average precision of 0.90 on the test set, as seen in the graph below. This shows promise for actual implementation of the model. Further testing on new, unseen data is needed to establish generalizability and clinical feasibility.
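If “average precision” here refers to the ranking metric that summarizes the precision-recall curve (an assumption; it could also mean precision averaged over folds), it can be computed by hand as the mean of the precision values at the rank of each positive example. The labels and scores below are made up.

```python
def average_precision(y_true, scores):
    """Mean of the precision values at the rank of each positive example.

    Assumes no tied scores; ties would need to be grouped by threshold.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)   # precision at this positive's rank
    if not precisions:
        raise ValueError("y_true contains no positive examples")
    return sum(precisions) / len(precisions)

# Positives sit at ranks 1, 3 and 4, so AP = (1/1 + 2/3 + 3/4) / 3.
ap = average_precision([1, 0, 1, 1, 0], [0.9, 0.8, 0.7, 0.6, 0.2])
print(round(ap, 3))  # 0.806
```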

Results of cross-validation

Explainable AI in action

We developed a prediction interpretability analysis of the XGBoost model using SHAP. *SHAP* is a novel model-agnostic technique that is used to explain predictions and model the decision process using so-called Shapley values.
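To make the idea of Shapley values concrete, the sketch below computes them exactly for a tiny hand-written model by averaging each feature’s marginal contribution over all coalitions. This is not the SHAP library and not the project’s model: the toy risk function, its weights and the baseline values are hypothetical, and the brute-force approach only scales to a handful of features.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction.

    Features outside a coalition are replaced by their baseline value.
    The cost grows as 2**n_features.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

# Hypothetical toy model on [temperature, heart_rate, spo2].
def toy_risk(features):
    temp, hr, spo2 = features
    return 0.5 * (36.5 - temp) + 0.01 * (hr - 150) + 0.02 * (95 - spo2)

x = [35.5, 170, 90]            # the observation to explain
baseline = [36.5, 150, 95]     # reference values standing in for "feature absent"
phi = shapley_values(toy_risk, x, baseline)
print([round(p, 3) for p in phi])  # [0.5, 0.2, 0.1]
```

The contributions add up to the difference between the prediction for `x` and the prediction for the baseline, which is the property that makes per-feature attributions comparable across patients.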

Top 6 features, averaged absolute SHAP 
Top 6 features, SHAP of individual predictions
The most important findings from this model were:

  1. Minimal incubator measured temperature (int. 3) had the highest average absolute SHAP value.
  2. Both mean and median heart frequency ranked among the 20 most impactful features.
  3. The most prominent features, without consideration of filters, were incubator measured temperature, arterial blood pressure systole and heart frequency.
  4. Unlike in the original logistic regression model, in our XGBoost model O2 saturation was significantly less dominant. It wasn’t present in the top 10 features.

Due to the different pre-processing of the dataset, we couldn’t compare the performance of the two models and the influence of the measured features one-to-one. The strength of the XGBoost model lies in its prediction capability - it gives the hospital enough advance time to pay attention to specific patients. Our suggestion to the hospital team was to further validate the model’s accuracy on patient data in clinical practice.

Missing data held back more advanced techniques

Our team tried to include a more advanced filtering technique, the fast Fourier transform, to build new features from the heart rate variable. But this technique proved difficult to implement due to the large amount of missing data. Dealing with the missing data and working in a virtual environment with limited memory proved to be the biggest hurdles throughout the Challenge.
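The snippet below illustrates why gaps are a problem for a Fourier transform: a single stretch of missing samples turns the whole spectrum into NaN. Linear interpolation is shown as one possible workaround, which is an assumption for illustration and not something the project applied; the synthetic heart rate signal is made up.

```python
import numpy as np

def spectral_power(signal):
    """Power spectrum of a mean-centred, evenly sampled signal."""
    centred = signal - np.mean(signal)
    return np.abs(np.fft.rfft(centred)) ** 2

def fill_gaps(signal):
    """Linearly interpolate over NaN gaps (one possible workaround)."""
    idx = np.arange(signal.size)
    ok = ~np.isnan(signal)
    return np.interp(idx, idx[ok], signal[ok])

minutes = np.arange(180)                                   # one 3-hour window
heart_rate = 150 + 5 * np.sin(2 * np.pi * minutes / 30)    # synthetic 30-minute rhythm
with_gaps = heart_rate.copy()
with_gaps[40:50] = np.nan                                  # ten minutes of missing samples

broken = spectral_power(with_gaps)                 # NaN propagates to every frequency bin
repaired = spectral_power(fill_gaps(with_gaps))

freqs = np.fft.rfftfreq(minutes.size, d=1.0)       # cycles per minute
dominant_period = 1.0 / freqs[np.argmax(repaired)]
print(np.isnan(broken).all(), dominant_period)
```

Interpolation works here because the gap is short compared to the rhythm; with long or frequent gaps it would distort the spectrum, which is consistent with the difficulty described above.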

We know that it’s extremely difficult to take measurements on preterm babies consistently, but it would be of great benefit. Measuring patient data without interruption would ensure a low number of missing values for crucial features like heart rate or O2 saturation, which would improve the prediction models considerably.

*What I learnt about applying ML in real life (and about the need for Explainable AI)*

What surprised me the most in this Challenge was how many factors come into play when you design a machine learning solution for a particular purpose. It is not just about which algorithm performs the best. In this case it was very important that the outcome of the algorithm was explainable. Doctors have to make medical decisions based on these outputs and therefore can’t just trust them blindly.

UMC Utrecht considered our results a success and is already planning similar initiatives to deploy AI for clinical purposes. Both sides learned a lot from each other; our team of data scientists got seasoned in medical AI and the hospital got valuable machine learning models as well as a blueprint for similar projects. 

AI for Health - Preventing Sepsis team of engineers
I’d love to give a shout-out to the entire AI against Sepsis team. We did well and learned along the way. By improving the existing model and creating a new one, we hope that sepsis can be signaled early on in more preterm babies. When the babies receive their treatment earlier, severe consequences can be prevented and this might even save some lives.

Laura Didden

AI for Health Engineer 

*AI for Health - Predicting Sepsis Team:* Kamal Elsayed, Simon Sukup, Simona Stoyanova, Laura Didden
