Abstract:
OBJECTIVE To establish a risk prediction model for tuberculous pleuritis (TPE) using machine learning algorithms (MLAs).
METHODS A total of 1519 patients with pleural effusion who were treated in Nantong Sixth People's Hospital from Jan. 2020 to Feb. 2025 were enrolled in the study. Of these, 1434 patients, including 527 with tuberculous pleuritis and 907 with non-tuberculous pleuritis, were randomly divided into a training set and a validation set in a 7∶3 ratio. Medical data and 39 pleural effusion and peripheral blood indicators, such as adenosine deaminase (ADA), red blood cell (RBC) and carcino-embryonic antigen (CEA), were collected. Potential predictive factors were screened by univariate analysis, Lasso regression, and the Boruta feature selection algorithm. Prediction models were established with 7 machine learning algorithms. The performance of each model in predicting TPE was assessed by the area under the curve (AUC), sensitivity, specificity, accuracy, F1 score, calibration curve, and decision curve analysis (DCA), and the optimal model was determined. Finally, the optimal model was interpreted with SHAP to examine how each feature acted and how it affected classification performance.
RESULTS A total of 10 predictive factors were selected in common by the three feature selection methods. The random forest (RF) model performed best among the 7 machine learning algorithms, with the highest AUC (0.906) and F1 score (0.786). Both the calibration curve and DCA indicated that the model performed well. SHAP analysis of the feature variables of the RF model showed that night sweats, fatigue, fever, age, pleural effusion ADA, pleural effusion RBC, pleural effusion CEA, blood lactate dehydrogenase (LDH), blood neutrophils (NC) and blood ADA were the predictive factors for TPE.
CONCLUSIONS The TPE diagnostic model based on the RF algorithm showed the best diagnostic performance among the models compared, and it can identify TPE conveniently, quickly and effectively.
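
ILLUSTRATIVE SKETCH The random 7∶3 split and the threshold-based metrics named in METHODS (AUC, sensitivity, specificity, accuracy, F1 score) can be illustrated with a minimal standard-library Python sketch. This is not the authors' code: the function names, the fixed random seed, and the 0.5 decision threshold are assumptions made for illustration, and AUC is computed here by direct pairwise comparison of positive and negative cases.

```python
import random
from typing import Dict, List, Sequence, Tuple


def random_split(n_rows: int, train_fraction: float = 0.7,
                 seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly split row indices 0..n_rows-1 into training and validation parts."""
    rng = random.Random(seed)            # fixed seed (assumption) for reproducibility
    idx = list(range(n_rows))
    rng.shuffle(idx)
    cut = round(n_rows * train_fraction)  # 7:3 split when train_fraction is 0.7
    return sorted(idx[:cut]), sorted(idx[cut:])


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def _ratio(numerator: float, denominator: float) -> float:
    """Safe division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(y_true: Sequence[int], scores: Sequence[float],
                           threshold: float = 0.5) -> Dict[str, float]:
    """Compute sensitivity, specificity, accuracy, F1 and AUC for binary labels (1 = TPE)."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 0)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "accuracy": _ratio(tp + tn, len(pred)),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
        "auc": auc(y_true, scores),
    }
```

In use, `random_split(1434)` would give the row indices for the two sets, and `classification_metrics(labels, predicted_probabilities)` would be called on the validation set once per candidate model so that the models can be compared on the same metrics.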