Construction and evaluation of machine learning-based prediction models for adverse prognosis in patients with Enterococcus bloodstream infection

Abstract: OBJECTIVE To construct multiple machine learning models for predicting adverse prognosis in patients with Enterococcus bloodstream infection and to evaluate their predictive performance. METHODS The clinical data of 128 patients with Enterococcus bloodstream infection who were treated at Jiangning Hospital Affiliated to Nanjing Medical University from Jan. 1, 2021 to Dec. 31, 2024 were retrospectively analyzed. Variables significantly associated with adverse prognosis were screened by Lasso regression and multivariate logistic regression and then entered into the machine learning models. Prediction models were built with 7 machine learning methods: logistic regression, decision tree, random forest, extreme gradient boosting, light gradient boosting machine, support vector machine and artificial neural network. Precision, accuracy, sensitivity, F1 score and other metrics were compared to evaluate the predictive performance of the models. RESULTS In the test set, the accuracies of logistic regression, decision tree, random forest, extreme gradient boosting, light gradient boosting machine, support vector machine and artificial neural network were 83.33%, 84.44%, 87.78%, 86.67%, 82.22%, 86.67% and 86.67%, respectively; the precisions were 88.24%, 78.72%, 85.71%, 83.72%, 77.78%, 83.72% and 83.72%, respectively; the F1 scores were 0.800, 0.841, 0.867, 0.857, 0.814, 0.857 and 0.857, respectively; and the AUCs were 0.922, 0.922, 0.952, 0.933, 0.878, 0.916 and 0.942, respectively. The random forest model indicated that hypoproteinemia was the most influential factor. CONCLUSIONS Models that predict adverse prognosis in patients with Enterococcus bloodstream infection were successfully constructed. Among them, the random forest model showed the best predictive performance and can serve as an effective tool in clinical nursing practice for the early prediction and prevention of adverse prognosis in these patients.
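The abstract compares the models by accuracy, precision, sensitivity and F1 score. As a reading aid, the following minimal Python sketch shows how these four metrics are conventionally derived from a binary confusion matrix. The class and function names are illustrative, and the counts are hypothetical: the abstract does not report confusion matrices, so they are not data from the study, and the authors' actual computation may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (positive class = adverse prognosis)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def safe_div(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(c: ConfusionCounts) -> dict:
    """Compute accuracy, precision, sensitivity (recall) and F1 score."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = safe_div(c.tp + c.tn, total)
    precision = safe_div(c.tp, c.tp + c.fp)
    sensitivity = safe_div(c.tp, c.tp + c.fn)
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = safe_div(2 * precision * sensitivity, precision + sensitivity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
    }


# Hypothetical counts for illustration only; not taken from the study.
example = ConfusionCounts(tp=36, fp=6, fn=5, tn=43)
metrics = classification_metrics(example)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy and precision are reported in the abstract as percentages and the F1 score as a fraction, so the first two values printed here would be multiplied by 100 to match that presentation.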

     
