Research on a risk prediction model for tuberculous pleuritis based on multi-algorithm improved machine learning

Abstract: OBJECTIVE To construct a risk prediction model for tuberculous pleuritis (TPE) using machine learning. METHODS A total of 1,519 patients with pleural effusion treated at Nantong Sixth People's Hospital from January 2020 to February 2025 were selected, of whom 1,434 were included in the dataset: 527 with tuberculous pleuritis and 907 with non-tuberculous pleuritis. These patients were randomly divided into a training set and a validation set at a ratio of 7:3. Admission data and pleural effusion and peripheral blood measurements such as adenosine deaminase (ADA), red blood cells (RBC), and carcinoembryonic antigen (CEA), 39 indicators in total, were collected. Potential predictors were screened by univariate analysis, Lasso regression, and the Boruta feature selection algorithm. Prediction models were built with 7 machine learning algorithms. Model performance in predicting TPE was assessed by the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, accuracy, F1 score, calibration curve, and decision curve analysis (DCA), and the optimal model was identified. Finally, Shapley additive explanations (SHAP) were used to interpret the optimal model and to analyze how each feature acts and how it affects classification performance. RESULTS The three feature selection methods yielded 10 predictors in common. Among the 7 machine learning models, the random forest (RF) model performed best, with the highest AUC (0.906) and F1 score (0.786). Both the calibration curve and the DCA indicated that this model performed well. SHAP analysis of the RF model's feature variables showed that night sweats, fatigue, fever, age, pleural effusion ADA, pleural effusion RBC, pleural effusion CEA, blood lactate dehydrogenase (LDH), blood neutrophil count (NC), and blood ADA were predictors of TPE. CONCLUSIONS The TPE diagnostic model built on the RF algorithm had the best diagnostic performance of the models compared and can identify TPE more simply, quickly, and effectively.
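The discrimination metrics named in the abstract (AUC, sensitivity, specificity, accuracy, F1 score) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores below are all hypothetical and chosen only to show how each quantity is computed for a binary TPE / non-TPE classifier.

```python
from itertools import product


def _ratio(num, den):
    """Safe division: return 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as 0.5."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return _ratio(wins, len(pos) * len(neg))


def evaluate(y_true, y_score, threshold=0.5):
    """Compute the abstract's discrimination metrics.

    Labels: 1 = TPE, 0 = non-TPE. The threshold is an assumption made
    for illustration; the source does not state which cutoff was used.
    """
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    sensitivity = _ratio(tp, tp + fn)   # true positive rate (recall)
    specificity = _ratio(tn, tn + fp)   # true negative rate
    precision = _ratio(tp, tp + fp)
    return {
        "auc": auc(y_true, y_score),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": _ratio(tp + tn, len(y_true)),
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Hypothetical toy data: 4 TPE and 6 non-TPE cases with made-up scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.35, 0.2, 0.1, 0.05]
metrics = evaluate(y_true, y_score)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

AUC depends only on how the scores rank the cases, whereas sensitivity, specificity, accuracy, and F1 all change with the chosen threshold, which is one reason to report them side by side when comparing models.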

     
