
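Research on Heterogeneity and Compatibility of Biomedical Field Metadata Supported by Ontology
Cite this article: Zhang Lulu, Yang Sheng, Shi Furen, Pan Hongjie, Wang Zhigang, Yang Xiaolin. Research on Heterogeneity and Compatibility of Biomedical Field Metadata Supported by Ontology[J]. Chinese Journal of Biomedical Engineering, 2019, 38(3): 324-331.
Authors: Zhang Lulu  Yang Sheng  Shi Furen  Pan Hongjie  Wang Zhigang  Yang Xiaolin
Institution: (Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences; School of Basic Medicine, Peking Union Medical College, Beijing 100005)
Funding: CAMS Innovation Fund for Medical Sciences (2018-I2M-AI-009); National Key Research and Development Program of China (2017YFC0908404); "Basic Medical Sciences Data Center of the National Population and Health Scientific Data Sharing Service Platform"
Abstract: Using ontologies to support the representation of data elements is an important way to make metadata more machine-understandable. Using data from caDSR, a repository of common data elements for biomedicine, this study evaluates the semantic heterogeneity between related data elements and applies machine learning to judge metadata compatibility. First, 60 pairs of common data elements were selected from caDSR, covering demographics, lifestyle, past medical history, and laboratory measurements. The essential components of each data element were extracted according to the ISO/IEC 11179 standard, and the similarity of each pair of related data elements was evaluated with ontology support from NCIT. Based on the semantic similarity of the components within the data elements, a support vector machine was used to predict compatibility between data elements, with an accuracy above 80%. The results show that metadata definitions in the caDSR database are currently quite heterogeneous, and that this heterogeneity is concentrated in the conceptual domain of the data elements. Even so, machine learning can still judge data compatibility automatically from the existing data element definitions. The method established in this study has some practical value for optimizing the data element construction process and enriching data standardization tools.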

Keywords: metadata  heterogeneity  common data element  ontology  support vector machine
Received: 2019-03-14

Research on Heterogeneity and Compatibility of Biomedical Field Metadata Supported by Ontology
Zhang Lulu, Yang Sheng, Shi Furen, Pan Hongjie, Wang Zhigang, Yang Xiaolin. Research on Heterogeneity and Compatibility of Biomedical Field Metadata Supported by Ontology[J]. Chinese Journal of Biomedical Engineering, 2019, 38(3): 324-331.
Authors: Zhang Lulu  Yang Sheng  Shi Furen  Pan Hongjie  Wang Zhigang  Yang Xiaolin
Institution: (Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences; School of Basic Medicine, Peking Union Medical College, Beijing 100005, China)
Abstract: Using ontologies to support the representation of data elements is an important means of improving the machine understandability of metadata. In this paper, we evaluated the semantic heterogeneity of related data elements in caDSR and assessed whether pairs of related data elements are compatible. First, 60 pairs of common data elements (CDEs) were selected from caDSR, covering demographics, lifestyle, medical history, and laboratory measurements. Next, the essential components of the data elements were extracted according to the ISO/IEC 11179 standard, and the similarity of these components between each pair of data elements was calculated with the support of NCIT. Finally, the compatibility between related data elements was predicted with a support vector machine (SVM) based on the semantic similarity between corresponding CDE components. The overall accuracy was above 80%. The results showed that there is currently considerable heterogeneity in the definition of metadata in the caDSR database, especially in the conceptual domain of data elements. Nevertheless, with the help of machine learning, our method could still judge data compatibility automatically from the existing data element definitions. The method established in this study has some value for optimizing the data element construction process and enriching data standardization tools.
Keywords: metadata  heterogeneity  common data element  ontology  support vector machine (SVM)  
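
The abstract describes scoring each pair of data elements by the semantic similarity of their corresponding components, then passing those scores to an SVM. It does not say which similarity measure was used, so the sketch below is only an illustration under stated assumptions: it uses Wu-Palmer similarity over a toy is-a hierarchy, the concept labels are hypothetical stand-ins rather than real NCIT codes, and the component names (`object_class`, `property`, `value_domain`) are one plausible reading of the ISO/IEC 11179 components, not the authors' confirmed feature set.

```python
from typing import Dict, List, Optional

# Toy is-a hierarchy (child -> parent). Labels are hypothetical,
# chosen for illustration only; they are not real NCIT concept codes.
PARENT: Dict[str, Optional[str]] = {
    "Thing": None,
    "Person": "Thing",
    "Patient": "Person",
    "Participant": "Person",
    "Property": "Thing",
    "Age": "Property",
    "Age at Diagnosis": "Age",
    "Unit": "Thing",
    "Year": "Unit",
    "Month": "Unit",
}

# Assumed component names for a data element (not confirmed by the source).
COMPONENTS = ("object_class", "property", "value_domain")


def path_to_root(concept: str) -> List[str]:
    """Return the chain from the concept up to the root, concept first."""
    path = []
    node: Optional[str] = concept
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(concept: str) -> int:
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(concept))


def wu_palmer(a: str, b: str) -> float:
    """Wu-Palmer similarity: 2 * depth(LCS) / (depth(a) + depth(b))."""
    ancestors_a = set(path_to_root(a))
    # Walking up from b, the first node shared with a is the
    # lowest common subsumer (LCS) in a single-parent hierarchy.
    lcs = next(n for n in path_to_root(b) if n in ancestors_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def similarity_vector(cde_a: Dict[str, str], cde_b: Dict[str, str]) -> List[float]:
    """One similarity score per component, in COMPONENTS order."""
    return [wu_palmer(cde_a[c], cde_b[c]) for c in COMPONENTS]


cde_a = {"object_class": "Patient", "property": "Age", "value_domain": "Year"}
cde_b = {"object_class": "Participant", "property": "Age at Diagnosis",
         "value_domain": "Month"}
features = similarity_vector(cde_a, cde_b)
```

Each pair of data elements yields one such feature vector. With expert-assigned compatible/incompatible labels for the pairs, the vectors could then be used to train a standard SVM classifier, for example scikit-learn's `sklearn.svm.SVC`.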
This article is indexed in CNKI and other databases.