Abstract:
OBJECTIVE To develop a machine learning-based model that integrates multi-dimensional clinical data to discriminate between culture-negative pulmonary tuberculosis and bacterial pneumonia, thereby providing an objective and reliable auxiliary diagnostic tool for clinical practice.
METHODS A total of 400 patients (280 with culture-negative pulmonary tuberculosis and 120 with bacterial pneumonia) treated at Tianjin Haihe Hospital from December 2023 to December 2024 were enrolled. Demographic characteristics, clinical symptoms, imaging findings and laboratory indicators were systematically collected. Candidate predictors were screened by univariate analysis, and independent predictors were identified by multivariate logistic regression. Four machine learning models (logistic regression, random forest, XGBoost and support vector machine) were then developed from the independent predictors. The dataset was split into a training set (70%) and a validation set (30%), and hyperparameters were optimized via 5-fold cross-validation. Discriminative performance was evaluated by accuracy, sensitivity, specificity and area under the receiver operating characteristic curve (AUC). Calibration and stability were assessed with calibration curves and the distribution of predicted probabilities.
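The four evaluation metrics named above can be illustrated with a minimal sketch. It assumes that culture-negative pulmonary tuberculosis is coded as the positive class (1) and that predictions are dichotomized at a probability threshold of 0.5; neither choice is stated in the abstract. The function name `evaluate` and the toy labels and probabilities are hypothetical and are not study data.

```python
from typing import Dict, Sequence


def evaluate(y_true: Sequence[int], y_prob: Sequence[float],
             threshold: float = 0.5) -> Dict[str, float]:
    """Compute accuracy, sensitivity, specificity and AUC for a binary classifier.

    Assumption: label 1 = culture-negative pulmonary tuberculosis,
    label 0 = bacterial pneumonia.
    """
    y_pred = [1 if p >= threshold else 0 for p in y_prob]

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")

    # AUC as the probability that a random positive case is scored higher
    # than a random negative case (ties count as one half).
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)

    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "auc": wins / (len(pos) * len(neg)),
    }


# Hypothetical toy data, for illustration only.
toy_labels = [1, 1, 1, 0, 0, 1, 0, 1]
toy_probs = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.1, 0.95]
metrics = evaluate(toy_labels, toy_probs)
print(metrics)
```

Accuracy, sensitivity and specificity depend on the chosen threshold, whereas AUC is threshold-free, which is why the two kinds of measure are usually reported together.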
RESULTS Univariate and multivariate logistic regression analyses identified six independent predictors. A positive T-SPOT test for tuberculosis infection (T-SPOT.TB) (OR=86.974), upper lobe involvement (OR=48.462), cavity formation (OR=7.271) and weight loss (OR=7.389) were risk factors for culture-negative pulmonary tuberculosis, whereas elevated procalcitonin (PCT) levels (OR=0.007) and purulent sputum production (OR=0.056) were predictive of bacterial pneumonia. The XGBoost model built on these six predictors performed best on the validation set, with an AUC of 1.000, an accuracy of 99.22%, a sensitivity of 99.01% and a specificity of 99.32%. It also demonstrated excellent calibration and generalization ability.
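To make the direction and relative size of the reported odds ratios easier to read, the sketch below converts each one to its logistic regression coefficient using the standard relation beta = ln(OR). It assumes the outcome is coded so that culture-negative pulmonary tuberculosis is the event of interest, which is consistent with the reported directions but is not stated explicitly. The dictionary keys and the helper name `log_odds_coefficient` are illustrative.

```python
import math

# Odds ratios as reported in the RESULTS paragraph.
ODDS_RATIOS = {
    "T-SPOT.TB positive": 86.974,
    "upper lobe involvement": 48.462,
    "cavity formation": 7.271,
    "weight loss": 7.389,
    "elevated PCT": 0.007,
    "purulent sputum": 0.056,
}


def log_odds_coefficient(odds_ratio: float) -> float:
    """Return the logistic regression coefficient beta = ln(OR)."""
    if odds_ratio <= 0:
        raise ValueError("an odds ratio must be positive")
    return math.log(odds_ratio)


coefficients = {name: log_odds_coefficient(value)
                for name, value in ODDS_RATIOS.items()}

# A positive coefficient shifts the prediction towards tuberculosis,
# a negative one towards bacterial pneumonia (under the coding assumed above).
for name, beta in sorted(coefficients.items(), key=lambda kv: kv[1], reverse=True):
    direction = "tuberculosis" if beta > 0 else "bacterial pneumonia"
    print(f"{name:24s} beta = {beta:+.3f}  favours {direction}")
```

On the log-odds scale, an odds ratio of 0.007 is about as strong a predictor as one of 86.974, only in the opposite direction, which the raw ratios tend to obscure.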
CONCLUSIONS Machine learning models, particularly XGBoost, were developed from six predictors: T-SPOT.TB positivity, upper lobe involvement, elevated PCT levels, cavity formation, weight loss and purulent sputum. The XGBoost model demonstrated excellent discriminative performance and robust generalization ability in differentiating culture-negative pulmonary tuberculosis from bacterial pneumonia, thereby providing a reliable auxiliary diagnostic tool for clinicians.