Abstract: | To tune and test the generalizability of a deep learning-based model for assessment of COVID-19 lung disease severity on chest radiographs (CXRs) from different patient populations.A published convolutional Siamese neural network-based model previously trained on hospitalized patients with COVID-19 was tuned using 250 outpatient CXRs. This model produces a quantitative measure of COVID-19 lung disease severity (pulmonary x-ray severity (PXS) score). The model was evaluated on CXRs from 4 test sets, including 3 from the United States (patients hospitalized at an academic medical center (N = 154), patients hospitalized at a community hospital (N = 113), and outpatients (N = 108)) and 1 from Brazil (patients at an academic medical center emergency department (N = 303)). Radiologists from both countries independently assigned reference standard CXR severity scores, which were correlated with the PXS scores as a measure of model performance (Pearson R). The Uniform Manifold Approximation and Projection (UMAP) technique was used to visualize the neural network results.Tuning the deep learning model with outpatient data showed high model performance in 2 United States hospitalized patient datasets (R = 0.88 and R = 0.90, compared to baseline R = 0.86). Model performance was similar, though slightly lower, when tested on the United States outpatient and Brazil emergency department datasets (R = 0.86 and R = 0.85, respectively). UMAP showed that the model learned disease severity information that generalized across test sets.A deep learning model that extracts a COVID-19 severity score on CXRs showed generalizable performance across multiple populations from 2 continents, including outpatients and hospitalized patients. |