Construction and evaluation of a machine learning model for predicting bloodstream infections based on routine laboratory test indicators

  • Abstract:
    OBJECTIVE To construct and evaluate a machine learning model that predicts bacterial bloodstream infections from routine laboratory test data.
    METHODS In this retrospective study, a total of 5 421 inpatients from 3 medical institutions between Jan. 2015 and Dec. 2022 were enrolled, including 1 914 in the bloodstream infection group and 3 507 in the non-bloodstream infection group. General data, including gender and age, and routine laboratory test results were collected. Three machine learning algorithms (logistic regression, support vector machine and random forest) were compared to select the optimal prediction model. SHAP was used to interpret the contribution of each feature to the model's predictions, recursive feature elimination was used to optimize the feature variables, and the area under the curve (AUC) of the receiver operating characteristic (ROC) curve was used to evaluate predictive performance.
    RESULTS A total of 26 variables covering age, gender and routine blood test indicators were included. Random forest was selected as the optimal algorithm for building the bloodstream infection prediction model, with an accuracy of 0.709 and an AUC of 0.706. The SHAP analysis indicated that age, hematocrit and red cell distribution width-CV had a marked influence on the model's correct decisions. In the model distinguishing gram-positive from gram-negative bacterial infections, 17 variables performed better than the full set of 26 variables, with an AUC of 0.715, a sensitivity of 0.701 and a specificity of 0.632.
    CONCLUSIONS Machine learning algorithms applied to routine blood test data can predict bacterial bloodstream infections. In addition, a feature selection strategy can further improve the predictive performance of the model while reducing its dimensionality.
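    The performance measures quoted in the abstract (accuracy, AUC, sensitivity and specificity) can be illustrated with a minimal, dependency-free sketch. The labels and scores below are made-up toy values, not data from the study; the function names `classification_metrics` and `roc_auc` are hypothetical, and the 0.5 decision threshold is an assumption, since the abstract does not state which threshold was used.

```python
from typing import Sequence, Tuple


def classification_metrics(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, accuracy) for binary labels (1 = infected)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # share of true infections that were detected
    specificity = tn / (tn + fp)  # share of non-infections correctly ruled out
    accuracy = (tp + tn) / len(y_true)
    return sensitivity, specificity, accuracy


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation of the ROC AUC).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present in y_true")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (illustrative values only, not study data).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

sens, spec, acc = classification_metrics(y_true, y_pred)
print(sens, spec, acc)          # 0.75 0.75 0.75
print(roc_auc(y_true, scores))  # 0.9375
```

    Unlike sensitivity, specificity and accuracy, the AUC does not depend on the chosen threshold, which is one reason it is commonly used to compare candidate models.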

     
