Development and assessment of a machine learning-based predictive model for poor prognosis in patients with Klebsiella pneumoniae bloodstream infection

  • Abstract:
    OBJECTIVE  To develop an interpretable machine learning model for predicting the risk of poor prognosis in patients with Klebsiella pneumoniae bloodstream infection (BSI).
    METHODS  Clinical data from 393 patients with K. pneumoniae BSI at the Affiliated Jiangning Hospital of Nanjing Medical University between Jan. 1, 2019, and Dec. 31, 2024, were retrospectively analyzed. Variables were selected by LASSO regression and logistic regression and then entered into the machine learning models. Seven machine learning methods (logistic regression, decision tree, random forest, extreme gradient boosting, light gradient boosting machine, support vector machine and artificial neural network) were used to build prediction models for poor prognosis in these patients. Predictive performance was evaluated with metrics including the F1 score and the area under the receiver operating characteristic curve (AUC). SHAP values were used to assess the contribution of each feature in the best-performing model.
    RESULTS  The values below are listed in the order logistic regression, decision tree, random forest, extreme gradient boosting, light gradient boosting machine, support vector machine and artificial neural network. In the test set, accuracy was 0.853, 0.844, 0.933, 0.875, 0.875, 0.718 and 0.853, respectively; precision was 0.957, 0.940, 0.974, 0.957, 0.957, 0.897 and 0.957; the F1 score was 0.920, 0.885, 0.949, 0.918, 0.918, 0.823 and 0.921; and the AUC was 0.985, 0.959, 0.987, 0.982, 0.985, 0.983 and 0.985. The random forest model performed best, with the highest value on all four metrics (accuracy 0.933, precision 0.974, F1 score 0.949, AUC 0.987). SHAP analysis identified the key factors influencing the prediction, including septic shock, prior antimicrobial use and a history of ICU/EICU admission.
    CONCLUSIONS  The random forest model showed the best performance in predicting the prognosis of K. pneumoniae BSI. SHAP analysis highlighted the key risk factors, providing a useful basis for auxiliary diagnosis.
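
The evaluation metrics named in the abstract (accuracy, precision, F1 score and AUC) follow standard definitions, which can be sketched in plain Python as below. This is only an illustration under stated assumptions: the abstract does not say which software the authors used, the function names are hypothetical, and the labels and scores are made-up toy values, not the study's data. Label 1 stands for poor prognosis.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), treating label 1 (poor prognosis) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 from hard 0/1 predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case receives a higher
    score than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values, NOT the study's data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Accuracy, precision and F1 depend on the decision threshold (0.5 here, an assumption), whereas AUC is computed from the ranking of the scores alone. That is one way a model can report a lower accuracy than its peers while having a similar AUC.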

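SHAP values are based on Shapley values from cooperative game theory: each feature's contribution is its average marginal effect on the model output over all subsets of the other features. The sketch below computes exact Shapley values by brute-force enumeration for a toy risk function. The feature names echo the factors named in the abstract, but the risk function, its numbers and the helper names are hypothetical and are not taken from the study's model; practical SHAP tooling relies on faster, model-specific algorithms rather than this enumeration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, List


def shapley_values(
    features: List[str], value: Callable[[FrozenSet[str]], float]
) -> Dict[str, float]:
    """Exact Shapley values by enumerating every subset of the other features."""
    n = len(features)
    result: Dict[str, float] = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):  # k = size of the subset that excludes f
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of f when added to subset s.
                total += weight * (value(s | {f}) - value(s))
        result[f] = total
    return result


def toy_risk(present: FrozenSet[str]) -> float:
    """Hypothetical risk score for one patient; all numbers are made up."""
    risk = 0.10  # baseline risk with no feature "switched on"
    if "septic_shock" in present:
        risk += 0.40
    if "prior_antimicrobial_use" in present:
        risk += 0.20
    if "icu_admission" in present:
        risk += 0.10
    if "septic_shock" in present and "icu_admission" in present:
        risk += 0.10  # interaction term, split equally by Shapley values
    return risk


features = ["septic_shock", "prior_antimicrobial_use", "icu_admission"]
phi = shapley_values(features, toy_risk)
```

In this toy case the values sum to the difference between the full-feature risk and the baseline (0.90 - 0.10 = 0.80), which is the additivity property that lets SHAP values be read as a per-feature decomposition of a single prediction.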
     
