Abstract:
OBJECTIVE To develop and evaluate a machine learning model for predicting bacterial bloodstream infections based on routine test data.
METHODS In this retrospective study, a total of 5 421 patients hospitalized in 3 medical institutions from Jan. 2015 to Dec. 2022 were enrolled, of whom 1 914 were assigned to the bloodstream infection group and 3 507 to the non-bloodstream infection group. Baseline data, including gender and age, and the results of routine laboratory tests were collected from the enrolled patients. Three machine learning algorithms (logistic regression, support vector machine and random forest) were compared to select the optimal prediction model; the contribution of each feature variable to the model's predictions was interpreted with SHapley Additive exPlanations (SHAP). The feature variables of the model were optimized by recursive feature elimination, and the predictive performance of the model was evaluated by the area under the receiver operating characteristic (ROC) curve (AUC).
RESULTS A total of 26 variables, comprising age, gender and routine blood test indexes, were included. Random forest was selected as the optimal machine learning algorithm for the bloodstream infection prediction model, which achieved an accuracy of 0.709 and an AUC of 0.706. SHAP analysis indicated that age, hematocrit and red blood cell distribution width-CV (RDW-CV) had a marked effect on the model's correct predictions. For distinguishing gram-positive from gram-negative bacterial bloodstream infections, the prediction model with 17 variables performed better than the model with 26 variables, with an AUC of 0.715, a sensitivity of 0.701 and a specificity of 0.632.
CONCLUSIONS A machine learning prediction model established on routine blood test indexes can predict bacterial bloodstream infection. In addition, the feature selection strategy can further improve the predictive performance of the model while reducing its dimensionality.
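
ILLUSTRATIVE SKETCH The workflow described in METHODS (comparing three classifiers by AUC, then reducing the feature set by recursive feature elimination) might look roughly like the following in scikit-learn. This is a minimal sketch under stated assumptions and not the authors' implementation: the data are synthetic (generated with make_classification), the train/test split and all hyperparameters are arbitrary choices, and the figures 26 and 17 are borrowed from the abstract only to set the feature counts. The SHAP step is omitted, and the sketch applies feature elimination to a generic binary task, whereas the abstract reports the 17-variable model for the gram-positive versus gram-negative comparison.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: synthetic stand-in for the study data (26 features, binary label).
X, y = make_classification(
    n_samples=600, n_features=26, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# The three algorithm families named in the abstract; settings are assumptions.
candidates = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "svm": make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Fit each candidate and score it by AUC on the held-out split.
auc_by_model = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]
    auc_by_model[name] = roc_auc_score(y_test, scores)

# Recursive feature elimination: drop one feature per round until 17 remain.
selector = RFE(
    RandomForestClassifier(n_estimators=200, random_state=0),
    n_features_to_select=17,
    step=1,
)
selector.fit(X_train, y_train)

# RFE refits its estimator on the retained features, so it can score directly.
auc_reduced = roc_auc_score(y_test, selector.predict_proba(X_test)[:, 1])

# Sensitivity and specificity of the reduced model at the default threshold.
tn, fp, fn, tp = confusion_matrix(y_test, selector.predict(X_test)).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)

for name, auc in auc_by_model.items():
    print(f"{name}: AUC = {auc:.3f}")
print(f"random forest after RFE (17 features): AUC = {auc_reduced:.3f}")
print(f"sensitivity = {sensitivity:.3f}, specificity = {specificity:.3f}")
```

On real data, the feature count to retain would normally be chosen by cross-validation (for example with scikit-learn's RFECV) rather than fixed in advance, and the comparison between full and reduced models would be made on data not used for selection.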