Diagnostic model for severe pneumonia caused by adenovirus infection in children based on machine learning and SHAP

  • Abstract:
    OBJECTIVE To evaluate a diagnostic model based on machine learning and SHapley Additive exPlanations (SHAP) for severe pneumonia caused by adenovirus infection in children.
    METHODS A total of 562 children with adenovirus infection who were admitted to Wuhan Third Hospital from March 2023 to April 2024 were enrolled and divided into a non-pneumonia group (n=236) and a pneumonia group (n=326). The pneumonia group was split into a training set (n=245) and a validation set (n=81) at a ratio of 3:1. Within the training set, patients were classified by pneumonia severity into a severe group (n=90) and a non-severe group (n=155). Multivariate logistic regression was used to identify risk factors for severe pneumonia in children with adenovirus infection, and collinearity diagnosis was performed. Receiver operating characteristic (ROC) curves were used to assess the predictive performance of the models in the training and validation sets and to select the best-performing model, which was then interpreted using SHAP.
    RESULTS Compared with the non-pneumonia group, the pneumonia group had higher proportions of children aged < 2 years and of children with cough, wheezing, pulmonary consolidation, pleural effusion, mixed infection, or a history of allergy, as well as longer hospital stays and longer fever duration (P < 0.05). After univariate analysis and collinearity diagnosis, with confounding factors excluded, length of hospital stay (OR=1.112), fever duration (OR=1.964), wheezing (OR=2.430), pulmonary consolidation (OR=2.546), mixed infection (OR=2.617), and lactate dehydrogenase (LDH) level (OR=1.613) were identified as risk factors for severe pneumonia in children with adenovirus infection (P < 0.05). Among the eight machine learning models built from these risk factors, the gradient boosting machine (GBM) model performed best in predicting severe pneumonia in children with adenovirus infection, with areas under the curve (AUCs) of 0.796 in the training set and 0.785 in the validation set. SHAP analysis showed that the four features with the largest contributions were LDH level, wheezing, fever duration, and pulmonary consolidation.
    CONCLUSIONS Among the models compared, the GBM model performed best in predicting the risk of severe pneumonia in children with adenovirus infection. LDH level, wheezing, fever duration, and pulmonary consolidation were the most important predictive features, and the model may serve as a reference for clinical diagnosis and treatment.
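    The odds ratios reported in the abstract come from logistic regression, where an OR is the exponential of the fitted coefficient. The sketch below maps the reported ORs back to the log-odds scale and shows how an OR shifts a risk estimate. It is an illustration only: the abstract does not state the units of the continuous predictors, so each OR is assumed to apply per one-unit increase, and the function names and the 0.20 baseline risk are hypothetical.

```python
import math

# Odds ratios as reported in the abstract (multivariate logistic regression).
# Units of the continuous predictors are not stated there; "per one unit" is assumed.
REPORTED_OR = {
    "length_of_hospital_stay": 1.112,
    "fever_duration": 1.964,
    "wheezing": 2.430,
    "pulmonary_consolidation": 2.546,
    "mixed_infection": 2.617,
    "ldh_level": 1.613,
}


def log_odds_coefficient(odds_ratio: float) -> float:
    """Logistic-regression coefficient (change in log-odds) implied by an OR."""
    return math.log(odds_ratio)


def shifted_probability(baseline_prob: float, odds_ratio: float) -> float:
    """Probability obtained after multiplying the baseline odds by odds_ratio."""
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


coefficients = {name: log_odds_coefficient(v) for name, v in REPORTED_OR.items()}

# Hypothetical baseline risk of 0.20 (not a figure from the study), used only
# to show how an OR of 2.430 for wheezing would move a probability.
p_with_wheezing = shifted_probability(0.20, REPORTED_OR["wheezing"])

if __name__ == "__main__":
    for name, beta in coefficients.items():
        print(f"{name}: OR={REPORTED_OR[name]:.3f}, coefficient={beta:.3f}")
    print(f"risk with wheezing from a 0.20 baseline: {p_with_wheezing:.3f}")
```

    An OR multiplies the odds, not the probability, so the same OR produces different absolute risk changes at different baseline risks.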
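    The AUC used above to compare models can be read as the probability that a randomly chosen severe case receives a higher predicted score than a randomly chosen non-severe case. A minimal sketch of that pairwise definition follows. The labels and scores are toy values made up for illustration, not study data, and the authors' actual software is not described in the abstract.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win,
    which matches the area under the ROC curve.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (hypothetical): 1 = severe, 0 = non-severe.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)

if __name__ == "__main__":
    print(f"toy AUC = {toy_auc:.3f}")
```

    Computing the same statistic on the training and validation sets separately, as the abstract reports for the GBM model (0.796 and 0.785), gives a rough check on overfitting: a large gap between the two would suggest the model does not generalize.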
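    SHAP attributes a prediction to individual features using Shapley values from cooperative game theory. The sketch below computes exact Shapley values for a small model by enumerating feature coalitions, which is feasible only for a handful of features. Several assumptions apply: the risk score function and its weights are hypothetical and are not the study's GBM, features absent from a coalition are simply set to a baseline value, and practical SHAP implementations rely on faster, model-specific or sampling-based algorithms instead of full enumeration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict


def shapley_values(
    predict: Callable[[Dict[str, float]], float],
    instance: Dict[str, float],
    baseline: Dict[str, float],
) -> Dict[str, float]:
    """Exact Shapley values of predict at instance, relative to baseline.

    Features outside a coalition take their baseline value.
    """
    names = list(instance)
    n = len(names)

    def value(coalition: frozenset) -> float:
        x = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return predict(x)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


def toy_risk_score(x: Dict[str, float]) -> float:
    """Hypothetical score with one interaction term; weights are made up."""
    return (
        0.5 * x["ldh_level"]
        + 0.9 * x["wheezing"]
        + 0.7 * x["fever_duration"]
        + 0.9 * x["pulmonary_consolidation"]
        + 0.4 * x["wheezing"] * x["pulmonary_consolidation"]
    )


patient = {"ldh_level": 1.0, "wheezing": 1.0, "fever_duration": 2.0,
           "pulmonary_consolidation": 1.0}
reference = {name: 0.0 for name in patient}
phi = shapley_values(toy_risk_score, patient, reference)

if __name__ == "__main__":
    for name, contribution in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
        print(f"{name}: {contribution:+.3f}")
```

    The contributions sum to the difference between the prediction for the patient and the prediction at the baseline, and the interaction term is split equally between the two features involved. Ranking features by the magnitude of these contributions across patients is what yields an importance ordering like the one reported in the abstract.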

     
