Abstract:
OBJECTIVE To evaluate the diagnostic value of machine learning models interpreted with SHAP (SHapley Additive exPlanations) for severe pneumonia caused by adenovirus infection in children.
METHODS A total of 562 children with adenovirus infection admitted to Wuhan Third Hospital from March 2023 to April 2024 were enrolled and divided into a non-pneumonia group (n=236) and a pneumonia group (n=326). The pneumonia group was further split into a training set (n=245) and a validation set (n=81) at a ratio of 3∶1, and the training set was categorized by pneumonia severity into a severe group (n=90) and a non-severe group (n=155). Univariate analysis, collinearity diagnosis and multivariate logistic regression were used to identify risk factors for severe pneumonia in children with adenovirus infection. Eight machine learning models were constructed from these risk factors, their predictive performance was evaluated with receiver operating characteristic (ROC) curves in both the training and validation sets, and the optimal model was selected and interpreted using SHAP.
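The ROC-based comparison described in the methods ranks candidate models by the area under the ROC curve (AUC). As a minimal sketch, not the authors' code, the AUC can be computed directly from predicted risks as the Mann-Whitney probability that a severe case outscores a non-severe case. The selection rule (highest AUC), the helper names `roc_auc` and `select_best_model`, and the toy labels and scores are all illustrative assumptions; in practice a library routine such as scikit-learn's `sklearn.metrics.roc_auc_score` would typically be used.

```python
from itertools import product


def roc_auc(labels, scores):
    """Rank-based (Mann-Whitney) ROC AUC.

    Equals the probability that a randomly chosen positive case (label 1,
    e.g. severe pneumonia) receives a higher score than a randomly chosen
    negative case (label 0); tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


def select_best_model(labels, model_scores):
    """Return (name, auc) of the model with the highest AUC.

    ASSUMPTION: "optimal" means highest AUC on the given labels.
    `model_scores` maps a hypothetical model name to its predicted
    risks for the same cases, in the same order as `labels`.
    """
    aucs = {name: roc_auc(labels, s) for name, s in model_scores.items()}
    best = max(aucs, key=aucs.get)
    return best, aucs[best]


if __name__ == "__main__":
    # Made-up validation labels and scores for two hypothetical models.
    y = [1, 1, 0, 0, 1, 0]
    candidates = {
        "model_a": [0.9, 0.4, 0.35, 0.8, 0.7, 0.1],
        "model_b": [0.6, 0.7, 0.2, 0.3, 0.9, 0.4],
    }
    print(select_best_model(y, candidates))
```

The pairwise loop is quadratic in sample size, which is fine at the scale of a few hundred patients; sorting-based implementations compute the same value faster.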
RESULTS Compared with the non-pneumonia group, the pneumonia group had a higher proportion of children aged < 2 years and of children with cough, wheezing, pulmonary consolidation, pleural effusion, mixed infections and a history of allergy, as well as longer hospital stays and longer fever duration (P < 0.05). After univariate analysis and collinearity diagnosis to exclude confounding factors, multivariate logistic regression identified length of hospital stay (OR=1.112), fever duration (OR=1.964), wheezing (OR=2.430), pulmonary consolidation (OR=2.546), mixed infections (OR=2.617) and lactate dehydrogenase (LDH) level (OR=1.613) as risk factors for severe pneumonia in children with adenovirus infection (P < 0.05). Among the eight machine learning models constructed from these risk factors, the gradient boosting machine (GBM) model performed best, with areas under the curve (AUCs) of 0.796 and 0.785 in the training and validation sets, respectively. SHAP analysis showed that the four features contributing most to the predictions were LDH level, wheezing, fever duration and pulmonary consolidation.
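Because the ORs come from a logistic regression, each one equals exp(β) for the corresponding coefficient, holding the other predictors fixed. The sketch below is illustrative only and not the authors' analysis: it converts reported ORs to coefficients and applies them to a baseline risk. The units of the continuous predictors (hospital stay, fever duration, LDH) are not stated in the abstract, so the per-unit reading, the 30% baseline risk and the helper names are assumptions.

```python
import math

# Adjusted odds ratios as reported in the abstract.
# ASSUMPTION: continuous predictors are read "per unit"; the abstract does not
# state the units (e.g. days, U/L, or a scaled LDH value).
REPORTED_OR = {
    "hospital_stay": 1.112,
    "fever_duration": 1.964,
    "wheezing": 2.430,
    "pulmonary_consolidation": 2.546,
    "mixed_infection": 2.617,
    "ldh": 1.613,
}


def coefficient(odds_ratio: float) -> float:
    """Return the logistic-regression coefficient beta, using OR = exp(beta)."""
    return math.log(odds_ratio)


def odds_multiplier(odds_ratio: float, units: float = 1.0) -> float:
    """Multiplicative change in the odds for a `units`-unit increase: OR ** units."""
    return odds_ratio ** units


def updated_probability(baseline_p: float, odds_ratio: float, units: float = 1.0) -> float:
    """Apply an OR to a baseline probability, holding other predictors fixed."""
    if not 0.0 < baseline_p < 1.0:
        raise ValueError("baseline_p must be strictly between 0 and 1")
    odds = baseline_p / (1.0 - baseline_p)
    new_odds = odds * odds_multiplier(odds_ratio, units)
    return new_odds / (1.0 + new_odds)


if __name__ == "__main__":
    # Hypothetical 30% baseline risk; wheezing present vs absent.
    p = updated_probability(0.30, REPORTED_OR["wheezing"])
    print(f"wheezing: 0.30 -> {p:.2f}")
    # Two extra units of fever duration multiply the odds by OR ** 2.
    print(f"fever +2 units: odds x {odds_multiplier(REPORTED_OR['fever_duration'], 2):.3f}")
```

Under these assumptions, wheezing raises a hypothetical 30% baseline risk to roughly 51%. Because ORs act on odds rather than probabilities, the same OR produces a different absolute change at a different baseline.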
CONCLUSIONS Among the models tested, the GBM model showed the best performance in predicting the risk of severe pneumonia in children with adenovirus infection. LDH level, wheezing, fever duration and pulmonary consolidation were the most influential predictive features, providing a valuable reference for clinical diagnosis and treatment.
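The feature ranking above comes from SHAP values, which attribute each prediction to the input features using Shapley values from cooperative game theory. The `shap` library computes these efficiently for tree ensembles such as GBMs; the brute-force sketch below instead follows the Shapley definition directly, enumerating every feature subset with an interventional (background-averaged) value function, which is one of several conventions. The toy risk function, its weights, the feature names and the background rows are hypothetical and are not the study's GBM.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, background):
    """Exact Shapley values for one instance by enumerating feature subsets.

    value(S) = mean over background rows b of model(z), where z takes x's
    values for features in S and b's values elsewhere. The cost is
    exponential in the number of features, so this suits toy models only.
    """
    n = len(x)

    def value(subset):
        total = 0.0
        for b in background:
            z = [x[i] if i in subset else b[i] for i in range(n)]
            total += model(z)
        return total / len(background)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # subset sizes 0 .. n-1 drawn from the other features
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for combo in combinations(others, k):
                s = set(combo)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def mean_abs_importance(model, rows, background):
    """Global importance per feature: mean |Shapley value| across rows."""
    n = len(rows[0])
    totals = [0.0] * n
    for x in rows:
        for i, p in enumerate(shapley_values(model, x, background)):
            totals[i] += abs(p)
    return [t / len(rows) for t in totals]


# HYPOTHETICAL toy risk score; feature order and weights are invented.
FEATURES = ["ldh_scaled", "wheezing", "fever_days"]


def toy_risk(z):
    ldh, wheeze, fever = z
    return 0.8 * ldh + 0.5 * wheeze + 0.3 * fever + 0.2 * ldh * wheeze


if __name__ == "__main__":
    background = [[0.0, 0, 1.0], [1.0, 1, 2.0]]
    rows = [[2.0, 1, 3.0], [0.5, 0, 2.0]]
    importance = mean_abs_importance(toy_risk, rows, background)
    for name, score in sorted(zip(FEATURES, importance), key=lambda t: -t[1]):
        print(f"{name}: {score:.3f}")
```

A useful sanity check is the efficiency property: for each instance, the Shapley values sum to the model output minus the average output over the background data.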