A machine learning model based on clinical and CT features for predicting pulmonary invasive mucinous adenocarcinoma |
| |
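Cite this article: | Zhang Junjie, Hao Ligang, Xu Qian, Feng Hui, Zhang Ning, Shi Gaofeng. A machine learning model based on clinical and CT features for predicting pulmonary invasive mucinous adenocarcinoma[J]. Chinese Journal of General Practice (中华全科医学), 2023, 21(1): 6-9. DOI: 10.16766/j.cnki.issn.1674-4152.002799 |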
| |
Authors: | Zhang Junjie (张俊杰), Hao Ligang (郝李刚), Xu Qian (许茜), Feng Hui (冯会), Zhang Ning (张宁), Shi Gaofeng (时高峰) |
| |
Affiliation: | 1. Department of CT and Magnetic Resonance, The Fourth Hospital of Hebei Medical University, Shijiazhuang, Hebei 050011 |
| |
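Funding: | National Key Research and Development Program of China (2018YFC0116404); Key Research and Development Program of Xingtai City, Hebei Province (ZC20301, health and medical field) |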
| |
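Abstract: | Objective Pulmonary mucinous adenocarcinoma is a rare subtype of lung cancer with distinct molecular biological characteristics that affect the choice of treatment. This study aimed to improve the accuracy of pre-treatment diagnosis of mucinous adenocarcinoma by building a machine learning model for invasive mucinous adenocarcinoma. Methods Data from 620 patients with pulmonary invasive adenocarcinoma confirmed by needle biopsy or surgical pathology at the Fourth Hospital of Hebei Medical University between January 2017 and May 2022 were analysed retrospectively. After 1 : 1 propensity score matching (PSM), patients were randomly divided into a training set and a test set at a 7 : 3 ratio. Variables with statistically significant differences were used to build three machine learning models, namely support vector machine (SVM), random forest (RF) and logistic regression (LR), and the optimal model was selected by AUC. The AUC of the optimal model was assessed by 5-fold cross-validation, a decision curve analysis (DCA) curve was plotted, and a nomogram was constructed. Results Lesion location in the lower lobe, cystic lumen, the bronchial truncation sign and the ΔCTV value were independent predictors of invasive mucinous adenocarcinoma. These four features were used to build the machine learning prediction models, and comparison of the models showed that logistic regression (AUC = 0.801) was optimal. Of the 285 cases, 30% (85 cases) were randomly selected as the test set, and the remaining cases formed the training set, on which 5-fold cross-validation was performed. The logistic regression model achieved an AUC of 0.777 in the validation set, an AUC of 0.785 and an accuracy of 0.682 in the test set, and an AUC of 0.803 and an accuracy of 0.749 in the training set. Finally, a nomogram of the logistic regression model was constructed. The Brier score from the model's calibration curve was 0.149, and the DCA curve likewise indicated that the model had good predictive ability and stability. Conclusion By analysing machine learning models based on clinical and CT features, a clinical prediction model for primary pulmonary invasive mucinous adenocarcinoma was constructed; the model has potential value in guiding clinical diagnosis.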
|
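The partition described in the abstract (30% of 285 cases held out as a test set, with 5-fold cross-validation on the remainder) can be illustrated with a minimal sketch. This is not the authors' code: the function name `split_and_fold`, the random seed, the truncation of 85.5 to 85 and the unstratified shuffling are all assumptions made for illustration.

```python
import random


def split_and_fold(n_cases, test_fraction=0.3, n_folds=5, seed=0):
    """Shuffle case indices, hold out a test set, and deal the rest into folds.

    Hypothetical helper: the abstract does not say whether the split was
    stratified or how rounding was handled, so neither is modelled here.
    """
    rng = random.Random(seed)
    indices = list(range(n_cases))
    rng.shuffle(indices)

    # 285 * 0.3 = 85.5; truncating gives the 85 test cases reported.
    n_test = int(n_cases * test_fraction)
    test_idx = indices[:n_test]
    train_idx = indices[n_test:]

    # Each fold serves once as the validation set; the other folds train.
    folds = [train_idx[i::n_folds] for i in range(n_folds)]
    return train_idx, test_idx, folds


train_idx, test_idx, folds = split_and_fold(285)
print(len(train_idx), len(test_idx), [len(f) for f in folds])
```

Under these assumptions the 200 training cases divide evenly into five validation folds of 40 cases each.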
Keywords: | Primary lung cancer; Mucinous adenocarcinoma; CT features; Diagnosis; Model; Machine learning |
Received: | 2022-10-11 |
CT-derived model for the diagnosis of pulmonary invasive mucinous adenocarcinoma by machine learning |
| |
Affiliation: | Department of Computed Tomography and Magnetic Resonance, Hebei Medical University Fourth Affiliated Hospital, Shijiazhuang, Hebei 050011, China |
| |
Abstract: | Objective Pulmonary mucinous adenocarcinoma is a rare subtype of lung cancer with unique molecular biological characteristics that influence the choice of treatment. We explored a machine learning model based on clinical and CT features for diagnosing pulmonary invasive mucinous adenocarcinoma, aiming to improve the accuracy of pre-treatment diagnosis. Methods We retrospectively analysed 620 patients with pulmonary invasive adenocarcinoma confirmed by needle biopsy or surgical pathology at the Fourth Hospital of Hebei Medical University from January 2017 to May 2022. After 1 : 1 propensity score matching (PSM), the patients were randomly divided into a training set and a test set at a 7 : 3 ratio. Three machine learning models, namely support vector machine (SVM), random forest (RF) and logistic regression (LR), were constructed using the variables with statistically significant differences, and the optimal model was selected by AUC. The AUC of the optimal model was assessed by 5-fold cross-validation, a decision curve analysis (DCA) curve was drawn to evaluate the diagnostic efficiency of the model, and a nomogram was constructed. Results Lesion location in the lower lobe, cystic lumen, bronchial truncation and the ΔCTV value were independent predictors of invasive mucinous adenocarcinoma. These four features were used to build the machine learning models, which were then compared; the logistic regression model (AUC=0.801) was optimal. Of the 285 cases, 30% (n=85) were randomly selected as the test set, and the remaining cases were used as the training set for 5-fold cross-validation. The logistic regression model achieved an AUC of 0.777 in the validation set, an AUC of 0.785 and an accuracy of 0.682 in the test set, and an AUC of 0.803 and an accuracy of 0.749 in the training set. Finally, a nomogram of the logistic regression model was constructed. The Brier score from the calibration curve of the model was 0.149, and the DCA curve also showed that the model had good predictive ability and stability for potential clinical application. Conclusion Using machine learning models based on clinical and CT features, we constructed a clinical prediction model for primary pulmonary invasive mucinous adenocarcinoma, which has a potential role in guiding clinical diagnosis. |
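For readers unfamiliar with the three evaluation quantities named in the abstract (AUC, the Brier score from calibration, and the net benefit plotted in a DCA curve), the following is a minimal sketch of their standard definitions. It is not the authors' code; the function names and the six toy labels and probabilities are hypothetical and unrelated to the study data.

```python
def auc(y_true, y_prob):
    """AUC as the chance that a random positive case scores higher than a
    random negative case, counting ties as one half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome;
    lower values indicate better-calibrated, sharper predictions."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true, y_prob, threshold):
    """Net benefit at one threshold probability, as used in decision curve
    analysis: TP/n - FP/n * threshold / (1 - threshold)."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Toy example (made-up values, for illustration only).
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
print(auc(y_true, y_prob))
print(brier_score(y_true, y_prob))
print(net_benefit(y_true, y_prob, 0.5))
```

A DCA curve is obtained by evaluating `net_benefit` over a range of thresholds and comparing the result with the "treat all" and "treat none" strategies.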
| |
Keywords: | |
|
|