About 20% of the preterm-born infants admitted to the neonatal intensive care unit (NICU) will develop sepsis, which is associated with higher mortality and adverse long-term effects. In the AI for Health - Sepsis Prevention project, we applied machine learning to accurately predict whether a preterm baby is going to develop sepsis. Sepsis is a reaction to an infection and can be life-threatening. Early prediction of the onset gives doctors the necessary time to apply preventative measures.
Our team of 4 data scientists worked for 20 weeks to address the problem. We worked in very close collaboration with the hospital. Intensive care is an extremely sensitive topic in medicine, and we had to establish a good basic understanding of the application area. Since we were all located in the Netherlands, we even got a chance to visit the NICU in person for some hands-on experience. We decided to split the team and work on two solutions. One was to improve the existing logistic regression model that UMC Utrecht had developed. The other was to create a new XGBoost model that would outperform the improved existing model.
We started by improving the existing logistic regression model that UMC Utrecht used. We applied two methods: hyperparameter optimization and feature engineering.
For the hyperparameter optimization, we used random search and grid search with cross-validation. We did this on the original set of features, on only a portion of the patients, due to computational limitations. We still managed to improve the ROC AUC score from 0.56 to 0.67.
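As an illustration of the hyperparameter search described above, here is a minimal sketch using scikit-learn. The data is synthetic, and the two-stage setup (random search first, then a grid around its best value), the tuned parameters, and the fold count are assumptions made for the example, not the configuration used in the project.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the patient feature table (NOT the hospital data).
X, y = make_classification(n_samples=300, n_features=10, n_informative=4, random_state=0)

pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Stage 1 (assumed): random search draws the regularization strength C
# from a log-uniform distribution and scores each draw by cross-validated ROC AUC.
random_search = RandomizedSearchCV(
    pipeline,
    param_distributions={
        "logisticregression__C": loguniform(1e-3, 1e2),
        "logisticregression__class_weight": [None, "balanced"],
    },
    n_iter=20,
    scoring="roc_auc",
    cv=5,
    random_state=0,
)
random_search.fit(X, y)
best_c = random_search.best_params_["logisticregression__C"]

# Stage 2 (assumed): grid search refines C around the best random-search value.
grid_search = GridSearchCV(
    pipeline,
    param_grid={
        "logisticregression__C": [best_c / 3, best_c, best_c * 3],
        "logisticregression__class_weight": [None, "balanced"],
    },
    scoring="roc_auc",
    cv=5,
)
grid_search.fit(X, y)
print(grid_search.best_params_, round(grid_search.best_score_, 3))
```

Because the grid contains the best random-search value of C, the refined score can only match or improve on the first stage for the same folds.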
The goal of the second approach was to create new features based on the minimum, mean, variance, peaks, and drops of the original features that have few null values. We computed their feature importance using a simple regression model and a gradient boosting model to measure each feature’s influence on the outcome of the model.
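A rough sketch of what such derived features could look like for a single vital-sign series is shown below. The definition of "peaks" and "drops" as samples more than two standard deviations from the mean is an assumption for illustration; the function name `summarize_signal` and the example values are hypothetical.

```python
import numpy as np

def summarize_signal(values, peak_z=2.0):
    """Summarize one vital-sign series into simple statistical features.

    Peaks and drops are counted as samples more than `peak_z` standard
    deviations above or below the mean (an assumed definition).
    """
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]  # ignore missing samples
    if x.size == 0:
        return {k: np.nan for k in ("min", "mean", "var", "n_peaks", "n_drops")}
    mean, std = x.mean(), x.std()
    return {
        "min": float(x.min()),
        "mean": float(mean),
        "var": float(x.var()),
        "n_peaks": int(np.sum(x > mean + peak_z * std)),
        "n_drops": int(np.sum(x < mean - peak_z * std)),
    }

# Made-up heart-rate samples with one missing value, one spike, and one dip.
heart_rate = [150, 152, np.nan, 149, 151, 200, 150, 148, 100, 151]
features = summarize_signal(heart_rate)
print(features)
```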
We computed a correlation matrix, which gave us the opportunity to manually explore possible feature combinations for our logistic regression model. Finally, we used a sequential feature selector to find the best 5 to 8 features for our model. We found that the best combination of features was:
Despite improving the ROC AUC score from 0.56 to 0.67, we were still operating at near-chance level. The change had to be more fundamental to produce a usable outcome, so we had to reformulate the problem. We realized quite early that:
“We needed a model to predict which patient will develop sepsis within the next 12 hours. A traffic police officer that doesn’t just direct & flag real-time data traffic, but one that knows on which intersection important stuff will happen. Show me what traffic crosses the intersection right now and I’ll tell you what happens in the next 12 hours of time - that’s the goal.” - *Kamal Elsayed, AI for Health engineer*
The goal here was to create an XGBoost time series forecasting classification model that predicts the onset of sepsis in preterm infants within a 12-hour prediction horizon.
The model was trained on these features:
The aforementioned features are minute-by-minute time-series data streams which were recorded from a set of invasive and non-invasive medical instruments in the incubator. Each feature data stream could extend to multiple days. An event feature shows per timestamp if a notable medical intervention or an administrative event occurred. Notable events include: admission, discharge, death, negative or positive blood cultures. A positive blood culture confirms a positive sepsis case, and the corresponding timestamp is marked as sepsis onset (t_sepsis).
For every patient, each of the 10 physiological markers was subset to the 12 hours of data that directly preceded a sepsis timestamp in a case patient, or a control timestamp in a control patient. The number of positive patients equaled the number of negative patients. To maintain a balanced dataset, 398 control patients were pseudo-randomly drawn from the pool of 2196 controls. The selection was constrained to match the distribution of gestational age.
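One way to draw controls so that their gestational-age distribution matches that of the cases is to sample within age bins. The sketch below does this on synthetic data; the 2-week bins, the function name `sample_matched_controls`, and the uniform age distribution are assumptions, since the exact matching procedure is not described here.

```python
import numpy as np

def sample_matched_controls(case_ga, control_ga, bin_edges, seed=0):
    """Pick control indices so each gestational-age bin holds as many controls as cases."""
    rng = np.random.default_rng(seed)  # fixed seed makes the draw reproducible
    case_bins = np.digitize(case_ga, bin_edges)
    control_bins = np.digitize(control_ga, bin_edges)
    chosen = []
    for b in np.unique(case_bins):
        n_needed = int(np.sum(case_bins == b))
        candidates = np.flatnonzero(control_bins == b)
        if candidates.size < n_needed:
            raise ValueError(f"not enough controls in gestational-age bin {b}")
        chosen.extend(rng.choice(candidates, size=n_needed, replace=False))
    return np.array(chosen)

# Synthetic gestational ages in weeks (NOT the hospital data).
data_rng = np.random.default_rng(42)
case_ga = data_rng.uniform(24, 32, size=398)
control_ga = data_rng.uniform(24, 32, size=2196)
bin_edges = np.arange(24, 33, 2)  # assumed 2-week bins: 24-26, 26-28, ...

idx = sample_matched_controls(case_ga, control_ga, bin_edges)
print(len(idx), "controls selected")
```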
Over this extracted 12-hour segment, a sliding window of 3 hours was run on every physiological marker to aggregate a set of 8 statistical features. This created a total of 320 training features per patient (10 physiological markers x 8 statistical features x 4 three-hour intervals). Additionally, we added gestational age and gender as features.
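The windowed aggregation can be sketched as follows. The write-up does not name the 8 statistics, so the ones below are placeholders; the example also assumes non-overlapping 3-hour windows (which is what yields 4 intervals per 12-hour segment), minute-level sampling, and hypothetical marker names.

```python
import numpy as np

MINUTES_PER_HOUR = 60
WINDOW_HOURS = 3

# Placeholder statistics: the 8 features actually used are not listed in the text.
STATS = {
    "mean": np.nanmean,
    "std": np.nanstd,
    "min": np.nanmin,
    "max": np.nanmax,
    "median": np.nanmedian,
    "q25": lambda w: np.nanpercentile(w, 25),
    "q75": lambda w: np.nanpercentile(w, 75),
    "range": lambda w: np.nanmax(w) - np.nanmin(w),
}

def window_features(segment, marker_names):
    """Aggregate a (minutes, n_markers) segment into per-window statistics."""
    window_len = WINDOW_HOURS * MINUTES_PER_HOUR
    n_windows = segment.shape[0] // window_len
    features = {}
    for m, marker in enumerate(marker_names):
        # Split this marker's series into consecutive, non-overlapping windows.
        windows = segment[: n_windows * window_len, m].reshape(n_windows, window_len)
        for w, window in enumerate(windows):
            for stat_name, stat in STATS.items():
                features[f"{marker}_w{w}_{stat_name}"] = float(stat(window))
    return features

rng = np.random.default_rng(0)
markers = [f"marker_{i}" for i in range(10)]  # hypothetical marker names
segment = rng.normal(size=(12 * MINUTES_PER_HOUR, len(markers)))  # synthetic 12-hour segment
feats = window_features(segment, markers)
print(len(feats))  # 10 markers x 4 windows x 8 statistics
```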
The targets used in model training were derived from the event feature. A 12-hour segment from a patient was assigned the positive class if it directly preceded a sepsis timestamp. If no sepsis event followed, the segment was assigned the negative class.
This model was trained and evaluated using a *repeated nested cross-validation procedure* to simultaneously search for the optimal parameters and evaluate the test scores. Both the inner and outer cross-validation loops used k = 4 and each loop was repeated 10 times. The inner loop used a random search over a set of probability distributions.
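A minimal sketch of repeated nested cross-validation is given below. To keep it dependency-light it swaps in scikit-learn's `GradientBoostingClassifier` for XGBoost, runs on synthetic data, and uses made-up search distributions; only the k = 4 folds, the random search in the inner loop, and the default of 10 repeats come from the description above.

```python
import numpy as np
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold, cross_val_score

def repeated_nested_cv(X, y, n_repeats=10, k=4, n_iter=4, seed=0):
    """Return one outer-fold test score per (repeat, outer fold)."""
    scores = []
    for r in range(n_repeats):
        # Fresh shuffles per repeat so every repetition sees different splits.
        inner = StratifiedKFold(n_splits=k, shuffle=True, random_state=seed + r)
        outer = StratifiedKFold(n_splits=k, shuffle=True, random_state=seed + 1000 + r)
        # Inner loop: random search over probability distributions (assumed ranges).
        search = RandomizedSearchCV(
            GradientBoostingClassifier(random_state=0),
            param_distributions={
                "n_estimators": randint(10, 41),
                "learning_rate": uniform(0.01, 0.3),
                "max_depth": randint(2, 5),
            },
            n_iter=n_iter,
            scoring="average_precision",
            cv=inner,
            random_state=seed + r,
        )
        # Outer loop: each outer training fold gets its own tuned model,
        # which is then scored on the held-out outer test fold.
        scores.extend(cross_val_score(search, X, y, cv=outer, scoring="average_precision"))
    return np.array(scores)

X, y = make_classification(n_samples=120, n_features=8, n_informative=4, random_state=0)
scores = repeated_nested_cv(X, y, n_repeats=2)  # 2 repeats only to keep the demo fast
print(scores.mean(), scores.std())
```

Tuning inside the outer training folds keeps the outer test folds untouched by the parameter search, which is why the resulting scores are a less optimistic estimate than a single cross-validated search.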
The model reached an average precision of 0.90 on the test set, as seen in the graph below. This shows promise for actual implementation of the model. Further testing on new, unseen data is needed to establish generalizability and clinical feasibility.
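For readers unfamiliar with the metric, average precision summarizes the precision-recall curve as the mean of the precision values at each rank where a true positive is retrieved. The toy example below computes it by hand on invented labels and scores (it assumes no tied scores); it is not the project's evaluation code.

```python
import numpy as np

def average_precision(y_true, y_score):
    """Mean of precision@k over the ranks k at which a positive appears."""
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(y_score, dtype=float), kind="stable")
    hits = y_true[order]  # labels sorted from highest to lowest score
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float(np.sum(precision_at_k * hits) / hits.sum())

# Invented example: positives are ranked 1st, 3rd, and 4th.
y_true = [1, 0, 1, 1, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.5]
ap = average_precision(y_true, y_score)
print(round(ap, 3))  # (1/1 + 2/3 + 3/4) / 3 = 0.806
```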
We developed a prediction interpretability analysis of the XGBoost model using SHAP. *SHAP* is a novel model-agnostic technique that is used to explain predictions and model the decision process using Shapley values. The most important findings from this model were:
Because the dataset was pre-processed differently for each model, we couldn’t compare the performance of the two models or the influence of the measured features one-to-one. The advantage of the XGBoost model lies in its prediction capability: it gives the hospital enough advance time to pay attention to specific patients. Our suggestion to the hospital team was to further validate the model's accuracy on patient data in clinical practice.
Our team tried to include a more advanced filtering technique, the fast Fourier transform, to build new features from the heart rate variable. But this technique proved difficult to implement due to the large amount of missing data. Dealing with the missing data and working in a virtual environment with limited memory proved to be the biggest hurdles throughout the Challenge.
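To show why missing data complicates this, here is a hedged sketch of one possible way to derive a frequency feature from a gappy heart-rate series: skip windows with too many gaps, linearly interpolate the rest, then take the FFT. The 20% missingness threshold, the interpolation strategy, and the function name `dominant_frequency` are assumptions for illustration and are not what the team implemented.

```python
import numpy as np

def dominant_frequency(values, sample_period_s=60.0, max_missing=0.2):
    """Return the dominant non-zero frequency in Hz, or None if the window is too sparse."""
    x = np.asarray(values, dtype=float)
    missing = np.isnan(x)
    if missing.mean() > max_missing:
        return None  # too many gaps for interpolation to be trustworthy
    idx = np.arange(x.size)
    x = np.interp(idx, idx[~missing], x[~missing])  # fill gaps linearly
    x = x - x.mean()  # remove the constant (DC) component
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=sample_period_s)
    return float(freqs[1:][np.argmax(power[1:])])  # skip the zero-frequency bin

# Synthetic minute-by-minute heart rate with a 30-minute oscillation and a few gaps.
t = np.arange(180)
heart_rate = 140 + 10 * np.sin(2 * np.pi * t / 30)
heart_rate[[10, 50, 51, 120]] = np.nan
freq = dominant_frequency(heart_rate)
print(freq)  # close to 1 / (30 * 60) Hz

# A window where half of the samples are missing is rejected.
sparse = heart_rate.copy()
sparse[::2] = np.nan
print(dominant_frequency(sparse))
```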
We know that it’s extremely difficult to take measurements on preterm babies consistently, but it would be of great benefit. Measuring patient data without interruption would ensure a low number of missing values for crucial features like heart rate or O2 saturation, improving the prediction models immensely.
What surprised me the most in this Challenge was how many factors come into play when you design a machine learning solution for a particular purpose. It is not just about which algorithm performs the best. In this case it was very important that the outcome of the algorithm was explainable. Doctors have to make medical decisions based on these outputs and therefore can’t just trust them blindly.
UMC Utrecht considered our results a success and is already planning similar initiatives to deploy AI for clinical purposes. Both sides learned a lot from each other: our team of data scientists gained experience in medical AI, and the hospital got valuable machine learning models as well as a blueprint for similar projects. I’d love to give a shout-out to the entire AI against Sepsis team. We did good and learned along the way. By improving the existing model and creating a new one, we hope that sepsis in more preterm babies can be signaled early on. When babies receive their treatment earlier, severe consequences can be prevented, and this might even save some lives.
Laura Didden
AI for Health Engineer
*AI for Health - Predicting Sepsis Team:* Kamal Elsayed, Simon Sukup, Simona Stoyanova, Laura Didden