Construction and validation of a machine learning-based prediction model for antibacterial drug-associated fever

Abstract: OBJECTIVE To construct a prediction model for antibacterial drug-associated fever (ADF) that can guide the rational use of antibacterial drugs. METHODS Cases of adverse reactions suspected to be drug fever in patients treated at Jiaozuo People's Hospital from January 1, 2019 to December 31, 2024 were collected. The patients' clinical data were extracted and split into a training set and a test set at a 7:3 ratio by stratified random sampling. The final predictive variables were selected with LASSO regression and logistic regression, prediction models were built with 9 machine learning algorithms, and the best model was selected. Model performance was evaluated with receiver operating characteristic (ROC) curves, decision curve analysis (DCA), precision-recall (PR) curves and calibration curves, and the influence of each predictive variable on the model was interpreted with the Shapley additive explanations (SHAP) method. RESULTS A total of 204 patients were enrolled, 96 of whom developed ADF. The differences before and after drug therapy in eosinophil percentage, monocyte percentage and red blood cell count, together with the neutrophil count and the red cell distribution width-standard deviation (RDW-SD) at the highest body temperature, were the characteristic variables for ADF. The categorical feature gradient boosting machine (Catboost) was the best model, with an area under the ROC curve (AUC) of 0.846 in the test set, and its predictions yielded a positive net clinical benefit at threshold probabilities between 25% and 78%. SHAP analysis ranked the characteristic variables in descending order of importance as follows: difference in eosinophil percentage, difference in monocyte percentage, RDW-SD at the highest body temperature, difference in red blood cell count, and neutrophil count at the highest body temperature. CONCLUSIONS The differences before and after drug therapy in eosinophil percentage, monocyte percentage and red blood cell count, together with the neutrophil count and RDW-SD at the highest body temperature, are characteristic variables for ADF. The Catboost model built on these five variables achieved the best predictive performance among the models compared and may, to some extent, help clinicians make rational treatment decisions.
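The 25% to 78% threshold-probability range reported in the abstract comes from decision curve analysis. As a hedged illustration (not the authors' code), net benefit at a threshold probability pt is conventionally computed as TP/n - (FP/n) x pt/(1 - pt) and compared with the "treat all" strategy and the "treat none" strategy (net benefit 0). The sketch below is plain Python; the function names `net_benefit` and `treat_all_benefit` and the toy labels and probabilities are hypothetical and are not data from this study.

```python
from typing import Sequence


def _check_threshold(pt: float) -> None:
    # pt/(1 - pt) is undefined at pt = 1 and meaningless at pt <= 0.
    if not 0.0 < pt < 1.0:
        raise ValueError("threshold probability must be in (0, 1)")


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], pt: float) -> float:
    """Net benefit of treating patients whose predicted risk is >= pt.

    Decision-curve formula: TP/n - FP/n * pt / (1 - pt).
    """
    _check_threshold(pt)
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 0)
    return tp / n - fp / n * pt / (1.0 - pt)


def treat_all_benefit(y_true: Sequence[int], pt: float) -> float:
    """Net benefit of treating every patient (everyone predicted positive)."""
    _check_threshold(pt)
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * pt / (1.0 - pt)


if __name__ == "__main__":
    # Hypothetical toy data: 1 = ADF, 0 = non-ADF (not from the study).
    y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    p = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.05]
    for pt in (0.25, 0.50, 0.75):
        model = net_benefit(y, p, pt)
        everyone = treat_all_benefit(y, pt)
        # The model is clinically useful where it beats treat-all and treat-none (0).
        print(f"pt={pt:.2f} model={model:.3f} treat_all={everyone:.3f}")
```

Sweeping pt over a fine grid and plotting the three curves gives the usual decision curve; the range of thresholds where the model curve lies above both reference strategies is what a statement such as "positive net benefit at 25% to 78%" summarizes.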
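The test-set AUC of 0.846 can be read as the probability that a randomly chosen ADF case receives a higher predicted score than a randomly chosen non-ADF case, with ties counted as one half (the Mann-Whitney interpretation of ROC AUC). A minimal pure-Python sketch of that rank-based definition follows; the function name `auc_pairwise` and the toy data are hypothetical, and an actual analysis would normally rely on an established routine such as scikit-learn's `sklearn.metrics.roc_auc_score`.

```python
from typing import Sequence


def auc_pairwise(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC AUC as the Mann-Whitney probability P(score_pos > score_neg).

    Ties between a positive and a negative score count as 0.5.
    O(n_pos * n_neg), which is acceptable for cohorts of a few hundred patients.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy labels and predicted probabilities (not study data).
    y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    p = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.05]
    print(f"AUC = {auc_pairwise(y, p):.3f}")
```

Because AUC depends only on the ordering of scores, it says nothing about calibration; that is why the abstract reports calibration curves and PR curves alongside the ROC analysis.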
