OBJECTIVE To construct and validate a machine learning-based risk prediction model for peritoneal dialysis-associated peritonitis (PDAP), identify key predictive factors, and provide a decision-making tool for the early identification of high-risk populations in clinical settings.
METHODS A total of 271 patients undergoing peritoneal dialysis at Zhongshan Hospital of Traditional Chinese Medicine from January 2020 to December 2023 were enrolled and divided into a PDAP group (73 cases) and a non-PDAP group (198 cases) according to guidelines. The dataset was randomly split into a training set (204 cases) and a test set (67 cases) at a ratio of 3:1. Five machine learning models were constructed: Support Vector Machine, Random Forest, Extreme Gradient Boosting, K-Nearest Neighbor and Least Absolute Shrinkage and Selection Operator (LASSO) regression. Hyperparameters were optimized with five-fold cross-validation repeated three times, and model performance was evaluated primarily by the area under the receiver operating characteristic curve (AUC). The optimal model was selected through a comprehensive assessment using receiver operating characteristic curves, confusion matrices and decision curves, and was validated with calibration curves.
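The data split and resampling scheme described above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the authors' code: the function names `stratified_split` and `repeated_kfold` are hypothetical, the seed is arbitrary, and stratifying the split by outcome is an assumption (the abstract states only that the split was random).

```python
import random

def stratified_split(labels, test_fraction=0.25, seed=0):
    """Split indices into train/test, sampling each class separately.

    Assumption: stratification by outcome; the abstract only says "randomly".
    """
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = int(len(idx) * test_fraction)  # floor of the class share
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)

def repeated_kfold(indices, k=5, repeats=3, seed=0):
    """Yield (fit, validation) index lists for k-fold CV repeated `repeats` times."""
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = list(indices)
        rng.shuffle(shuffled)
        folds = [shuffled[i::k] for i in range(k)]  # k near-equal folds
        for i in range(k):
            fit = [j for m, fold in enumerate(folds) if m != i for j in fold]
            yield fit, folds[i]

# 73 PDAP cases (label 1) and 198 non-PDAP cases (label 0), as in the abstract.
labels = [1] * 73 + [0] * 198
train_idx, test_idx = stratified_split(labels)
splits = list(repeated_kfold(train_idx))
print(len(train_idx), len(test_idx), len(splits))
```

With the cohort sizes from the abstract, this sketch yields 204 training and 67 test cases and 15 resampling splits (5 folds × 3 repeats), within which each candidate hyperparameter setting would be scored.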
RESULTS Compared with the non-PDAP group, the PDAP group had a higher proportion of hypertension and higher blood glucose levels, and lower levels of phosphorus, uric acid, transferrin saturation, potassium, calcium, albumin, alanine aminotransferase, total cholesterol, low-density lipoprotein cholesterol and total protein (P<0.05). In the test set, the LASSO model showed the best overall performance, with an AUC of 0.844, sensitivity of 0.611, specificity of 0.959, positive predictive value of 0.846, negative predictive value of 0.870, F1 score of 0.710 and accuracy of 0.866. The LASSO model also yielded the highest net benefit in decision curve analysis across threshold probabilities of 0.1–0.8. The calibration curve showed a slope of 1.001, an intercept of 0.095 and a Brier score of 0.119. The LASSO model identified 10 important variables: comorbid hypertension and levels of total protein, total cholesterol, potassium, alanine aminotransferase, calcium, transferrin saturation, uric acid, phosphorus and albumin.
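The threshold-based metrics reported for the test set all derive from a single confusion matrix, which the following sketch makes explicit. The counts used here (11 true positives, 7 false negatives, 2 false positives, 47 true negatives) are not given in the abstract; they are an assumption, reconstructed as integer counts that sum to the 67 test cases and reproduce the reported values to three decimals. The function name `confusion_metrics` is hypothetical.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),            # true positive rate
        "specificity": tn / (tn + fp),            # true negative rate
        "ppv": tp / (tp + fp),                    # positive predictive value
        "npv": tn / (tn + fn),                    # negative predictive value
        "f1": 2 * tp / (2 * tp + fp + fn),        # harmonic mean of PPV and sensitivity
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }

# Assumed counts (not reported in the abstract), chosen to match the published metrics.
metrics = confusion_metrics(tp=11, fn=7, fp=2, tn=47)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these assumed counts, the model flags 13 of 67 test patients as high risk and misses 7 of 18 PDAP cases, which is what a sensitivity of 0.611 alongside a specificity of 0.959 means in practice.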
CONCLUSIONS The LASSO regression model demonstrated excellent performance and clinical utility in PDAP risk prediction. The 10 key indicators identified by the model provide a quantitative basis for the early identification of high-risk patients and for timely intervention.