
Application of clustering analysis in medical expenses data mining

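Cite this article:	SHEN Pei, ZHANG Ji-kai. Application of clustering analysis in medical expenses data mining[J]. Guangdong Journal of Health and Epidemic Prevention, 2012, 0(1): 18-22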

Authors:	SHEN Pei, ZHANG Ji-kai

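Affiliations:	[1] School of Management, Huazhong University of Science and Technology, Wuhan, Hubei 430074; [2] School of Information, Guangdong University of Business Studies, Wuhan, Hubei 430074; [3] Guangdong Provincial Center for Disease Control and Prevention, Wuhan, Hubei 430074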

Funding:	Guangdong Provincial Medical Scientific Research Fund project (A2009071)

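Abstract:	Objective To establish a preprocessing method for medical-expense data mining that converts the dependent variable (a continuous variable with a skewed distribution) into a categorical variable, so as to obtain more scientific and reasonable results. Methods The study subjects were 115 patients drawn from a survey of medical expenses for hepatitis A (viral hepatitis type A) in Guangdong Province. Median-based classification and K-means clustering were each used as the preprocessing method to categorize the skewed dependent variable, medical expenses. A support vector machine (SVM) model was then built and used to analyze the factors influencing medical expenses. The optimal preprocessing method was determined by comparing the models' prediction accuracy, model gain, and the selected influencing factors. Results The median total hospitalization expense of the 115 hepatitis A patients was 2,744.69 yuan, with a skewed distribution. When the dependent variable was categorized by the median, factor screening with the SVM model showed that 7 variables had the greatest influence on medical expenses (the top three being hospital level, sex, and disease type). When cluster analysis was used for preprocessing, factor screening showed that 7 variables had the greatest influence (the top three being hospital level, length of hospital stay, and payment method). Compared with median-based classification, preprocessing with cluster analysis raised the prediction accuracy of the SVM model from 91.30% to 97.39%. The gain chart rose steeply to 100.00% and then gradually leveled off, indicating a better model gain, and the selected influencing factors were more scientific, more reasonable, and more consistent with actual practice. Conclusion Cluster analysis is an excellent data-mining preprocessing method with good applicability.
Keywords:	Data mining; Cluster analysis; Health care costs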
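The two preprocessing options compared in the abstract can be sketched in plain Python. This is a hedged illustration, not the authors' code. The abstract does not state the number of clusters, the initialization, or the software used, so the following are all assumptions: `k = 2`, the quantile-based starting centroids, and the function names `median_split` and `kmeans_1d`. The expense values are made up for illustration.

```python
import statistics
from typing import List, Tuple


def median_split(values: List[float]) -> List[int]:
    """Label each value 1 if it exceeds the sample median, else 0."""
    med = statistics.median(values)
    return [1 if v > med else 0 for v in values]


def kmeans_1d(values: List[float], k: int = 2,
              iters: int = 100) -> Tuple[List[int], List[float]]:
    """Lloyd's K-means on a single variable (assumed setup, not the study's).

    Centroids start at evenly spaced ranks of the sorted data so the result
    is deterministic. Returned labels are renumbered so that cluster 0 has
    the lowest centroid.
    """
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic initialisation: k points spread across the sorted data.
    centroids = [float(ordered[(2 * i + 1) * n // (2 * k)]) for i in range(k)]
    labels = [0] * n  # labels kept in the original input order
    for _ in range(iters):
        # Assignment step: nearest centroid (ties go to the lower index).
        new_labels = [min(range(k), key=lambda c: abs(v - centroids[c]))
                      for v in values]
        # Update step: cluster means; keep the old centroid if a cluster empties.
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, new_labels) if lab == c]
            new_centroids.append(sum(members) / len(members)
                                 if members else centroids[c])
        converged = new_labels == labels and new_centroids == centroids
        labels, centroids = new_labels, new_centroids
        if converged:
            break
    # Renumber clusters so ids increase with centroid value.
    order = sorted(range(k), key=lambda c: centroids[c])
    rank = {c: r for r, c in enumerate(order)}
    return [rank[lab] for lab in labels], [centroids[c] for c in order]


if __name__ == "__main__":
    # Hypothetical right-skewed expenses in yuan; NOT the study's data.
    expenses = [900, 1100, 1300, 1500, 1700, 2000,
                2300, 2600, 3000, 9000, 12000, 15000]
    print("median split:", median_split(expenses))
    labels, centroids = kmeans_1d(expenses, k=2)
    print("k-means labels:", labels)
    print("centroids:", [round(c, 2) for c in centroids])
```

On a right-skewed sample like this one, the median split always produces two equal halves. One-dimensional K-means, by contrast, can isolate the high-cost tail as its own class. In the study, the resulting class labels then served as the SVM target; that step is not shown here.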
Application of clustering analysis in medical expenses data mining

SHEN Pei,ZHANG Ji-kai. Application of clustering analysis in medical expenses data mining[J]. Guangdong Journal of Health and Epidemic Prevention, 2012, 0(1): 18-22

Authors:	SHEN Pei, ZHANG Ji-kai

Affiliation:	(School of Management, Huazhong University of Science and Technology, Wuhan 430074, China)

Abstract:	Objective To establish a preprocessing method for medical expense research that transforms the skewed, continuous dependent variable into a categorical variable, so as to obtain more reasonable results. Methods Data on 115 patients were obtained from a survey of medical costs for patients with viral hepatitis A in Guangdong Province. Median-based classification and K-means clustering were each used as the preprocessing method to classify the skewed dependent variable, medical expenses for hepatitis. A support vector machine (SVM) model was then built to analyze the factors influencing medical expenses. The optimal preprocessing method was determined by comparing forecasting accuracy, model gain, and the selected influence factors. Results The median hospitalization expense of the 115 patients with viral hepatitis was 2,774.69 yuan, with a skewed distribution. When the dependent variable was classified by the median, factor selection with the SVM model showed that seven variables had the greatest impact on medical costs (the top three were hospital level, gender, and disease type). When cluster analysis was used as the preprocessing method, factor selection showed that seven variables had the greatest impact on medical expenditure (the top three were hospital level, length of hospital stay, and payment method). Compared with median classification, clustering-based preprocessing yielded higher forecasting accuracy (from 91.30% to 97.39%) and better model gain (the gain chart rose steeply to 100% and then gradually leveled off). It also selected more reasonable and practical influence factors. Conclusion Clustering analysis is a good preprocessing method for data mining and shows good applicability.
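The gain chart cited in the results ranks cases by the model's predicted score. It then plots the share of all positive cases captured against the share of cases examined. A minimal sketch follows, assuming a binary target and a per-case score (for example, an SVM decision value). `cumulative_gains` is a hypothetical helper, and the scores and labels below are invented, not taken from the study.

```python
from typing import List, Tuple


def cumulative_gains(scores: List[float],
                     labels: List[int]) -> List[Tuple[float, float]]:
    """Return (fraction of cases targeted, fraction of positives captured).

    Cases are ranked by descending score; the curve starts at (0, 0).
    `labels` must be 0/1 with at least one positive case.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive case")
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n = len(ranked)
    points = [(0.0, 0.0)]
    captured = 0
    for i, (_, y) in enumerate(ranked, start=1):
        captured += y  # positives found among the top-i ranked cases
        points.append((i / n, captured / total_pos))
    return points


if __name__ == "__main__":
    # Invented scores and true classes for six cases; NOT the study's data.
    scores = [0.95, 0.90, 0.80, 0.40, 0.30, 0.20]
    labels = [1, 1, 1, 0, 0, 0]
    for x, y in cumulative_gains(scores, labels):
        print(f"{x:.2f} of cases -> {y:.2f} of positives")
```

A curve that climbs steeply to 100% and then stays flat, as reported for the clustering-based model, means the positive cases are concentrated at the top of the model's ranking.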

Keywords:	Data mining; Cluster analysis; Health care costs
This article is indexed in VIP and other databases.
