Similar articles
1.

Objective

Concept extraction is the process of identifying phrases that refer to concepts of interest in unstructured text. It is a critical component of automated text processing. We investigate the performance of machine learning taggers for clinical concept extraction, particularly the portability of taggers across documents from multiple data sources.

Methods

We trained machine learning taggers with BioTagger-GM, a system we originally developed for detecting gene/protein names in the biology domain. Trained taggers were evaluated using the annotated clinical documents made available in the 2010 i2b2/VA Challenge workshop, consisting of documents from four data sources.

Results

As expected, performance of a tagger trained on one data source degraded when evaluated on another source, but the degradation of the performance varied depending on data sources. A tagger trained on multiple data sources was robust, and it achieved an F score as high as 0.890 on one data source. The results also suggest that performance of machine learning taggers is likely to improve if more annotated documents are available for training.

Conclusion

Our study shows how the performance of machine learning taggers is degraded when they are ported across clinical documents from different sources. The portability of taggers can be enhanced by training on datasets from multiple sources. The study also shows that BioTagger-GM can be easily extended to detect clinical concept mentions with good performance.

2.
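Electronic medical records (EMRs) and electronic health records (EHRs) are two important components of hospital information systems. The EHR is an advanced form of the EMR and plays a role that the EMR cannot replace. It accommodates patients' medical care and personal health care information, family health records, public health information, chronic disease follow-up records, and other information, and merges a resident's multiple health records into one. It enables the exchange, use, updating, and interactive application of data across different medical institutions; it can raise the utilization of information, serves as the basis for community health services and a tool for general practice, and plays an important role in regional health informatization.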

3.

Objectives

Drug repurposing, which finds new indications for existing drugs, has received great attention recently. The goal of our work is to assess the feasibility of using electronic health records (EHRs) and automated informatics methods to efficiently validate a recent drug repurposing association of metformin with reduced cancer mortality.

Methods

By linking two large EHRs from Vanderbilt University Medical Center and Mayo Clinic to their tumor registries, we constructed a cohort including 32 415 adults with a cancer diagnosis at Vanderbilt and 79 258 cancer patients at Mayo from 1995 to 2010. Using automated informatics methods, we further identified type 2 diabetes patients within the cancer cohort and determined their drug exposure information, as well as other covariates such as smoking status. We then estimated HRs for all-cause mortality and their associated 95% CIs using stratified Cox proportional hazard models. HRs were estimated according to metformin exposure, adjusted for age at diagnosis, sex, race, body mass index, tobacco use, insulin use, cancer type, and non-cancer Charlson comorbidity index.

Results

Among all Vanderbilt cancer patients, metformin was associated with a 22% decrease in overall mortality compared to other oral hypoglycemic medications (HR 0.78; 95% CI 0.69 to 0.88) and with a 39% decrease compared to type 2 diabetes patients on insulin only (HR 0.61; 95% CI 0.50 to 0.73). Diabetic patients on metformin also had a 23% improved survival compared with non-diabetic patients (HR 0.77; 95% CI 0.71 to 0.85). These associations were replicated using the Mayo Clinic EHR data. Many site-specific cancers including breast, colorectal, lung, and prostate demonstrated reduced mortality with metformin use in at least one EHR.

Conclusions

EHR data suggested that the use of metformin was associated with decreased mortality after a cancer diagnosis compared with diabetic and non-diabetic cancer patients not on metformin, indicating its potential as a chemotherapeutic regimen. This study serves as a model for robust and inexpensive validation studies for drug repurposing signals using EHR data.
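
The percentage changes quoted in this abstract follow directly from the hazard ratios: a hazard ratio of 0.78 corresponds to a 22% lower hazard. The short Python sketch below only illustrates that arithmetic; the function name is an assumption made for this example and is not part of the study's methods.

```python
def percent_change_in_hazard(hazard_ratio: float) -> float:
    """Percentage change in hazard relative to the comparison group.

    Negative values mean a lower hazard (for example, HR 0.78 gives -22).
    """
    return (hazard_ratio - 1.0) * 100.0


# Hazard ratios as reported in the abstract.
reported_hrs = {
    "vs other oral hypoglycemics": 0.78,
    "vs insulin only": 0.61,
    "vs non-diabetic patients": 0.77,
}

changes = {name: round(percent_change_in_hazard(hr)) for name, hr in reported_hrs.items()}
```

Rounding to whole percentages reproduces the 22%, 39%, and 23% figures given in the text.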

4.

Objective

To develop a system to extract follow-up information from radiology reports. The method may be used as a component in a system which automatically generates follow-up information in a timely fashion.

Methods

A novel method of combining an LSP (labeled sequential pattern) classifier with a CRF (conditional random field) recognizer was devised. The LSP classifier filters out irrelevant sentences, while the CRF recognizer extracts follow-up and time phrases from candidate sentences presented by the LSP classifier.
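
The two-stage design can be pictured as a filter followed by an extractor. The Python sketch below illustrates only that control flow: a keyword test stands in for the LSP classifier and a regular expression stands in for the CRF recognizer. The function names, cue words, and pattern are all hypothetical and are not taken from the paper.

```python
import re

# Hypothetical cue words; a stand-in for the learned LSP sentence classifier.
FOLLOW_UP_CUES = ("follow-up", "follow up", "recommend", "repeat")

# Hypothetical pattern; a stand-in for the CRF time-phrase recognizer.
TIME_PHRASE = re.compile(r"\bin \d+ (?:days?|weeks?|months?|years?)\b")


def is_candidate(sentence: str) -> bool:
    """Stage 1: discard sentences that carry no follow-up cue."""
    lowered = sentence.lower()
    return any(cue in lowered for cue in FOLLOW_UP_CUES)


def extract_time_phrases(sentence: str) -> list[str]:
    """Stage 2: pull time phrases out of a candidate sentence."""
    return TIME_PHRASE.findall(sentence.lower())


def extract_follow_ups(report: str) -> list[tuple[str, list[str]]]:
    """Run the filter, then the extractor, over each sentence of a report."""
    sentences = [s.strip() for s in report.split(".") if s.strip()]
    return [(s, extract_time_phrases(s)) for s in sentences if is_candidate(s)]


report = "No acute findings. Recommend follow-up CT in 6 months."
results = extract_follow_ups(report)
```

Running the filter first means the extractor only sees sentences that are likely to mention follow-up, which is the division of labor the abstract describes.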

Measurements

The standard performance metrics of precision (P), recall (R), and F measure (F) in the exact and inexact matching settings were used for evaluation.

Results

Four experiments conducted using 20 000 radiology reports showed that the CRF recognizer achieved high performance without time-consuming feature engineering and that the LSP classifier further improved the performance of the CRF recognizer. The performance of the current system is P=0.90, R=0.86, F=0.88 in the exact matching setting and P=0.98, R=0.93, F=0.95 in the inexact matching setting.
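
The reported F values are consistent with the balanced F measure, the harmonic mean of precision and recall. The abstract does not state the weighting, so the balanced form is an assumption in this minimal Python sketch, and the function name is chosen for illustration only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for the two matching settings.
exact = f_measure(0.90, 0.86)
inexact = f_measure(0.98, 0.93)
```

Rounded to two decimals, these give 0.88 and 0.95, matching the figures in the abstract.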

Conclusion

The experiments demonstrate that the system performs far better than a baseline rule-based system and is worth considering for deployment trials in an alert generation system. The LSP classifier successfully compensated for the inherent weakness of CRF, that is, its inability to use global information.

5.
6.

Objective

Despite at least 40 years of promising empirical performance, very few clinical natural language processing (NLP) or information extraction systems currently contribute to medical science or care. The authors address this gap by reducing the need for custom software and rules development with a graphical user interface-driven, highly generalizable approach to concept-level retrieval.

Materials and methods

A ‘learn by example’ approach combines features derived from open-source NLP pipelines with open-source machine learning classifiers to automatically and iteratively evaluate top-performing configurations. The Fourth i2b2/VA Shared Task Challenge's concept extraction task provided the data sets and metrics used to evaluate performance.

Results

Top F-measure scores for each of the tasks were medical problems (0.83), treatments (0.82), and tests (0.83). Recall lagged precision in all experiments. Precision was near or above 0.90 in all tasks.

Discussion

With no customization for the tasks and less than 5 min of end-user time to configure and launch each experiment, the average F-measure was 0.83, one point behind the mean F-measure of the 22 entrants in the competition. Strong precision scores indicate the potential of applying the approach for more specific clinical information extraction tasks. There was not one best configuration, supporting an iterative approach to model creation.

Conclusion

Acceptable levels of performance can be achieved using fully automated and generalizable approaches to concept-level information extraction. The described implementation and related documentation are available for download.

7.

Objective

To develop a supervised machine learning approach for discovering relations between medical problems, treatments, and tests mentioned in electronic medical records.

Materials and methods

A single support vector machine classifier was used to identify relations between concepts and to assign their semantic type. Several resources such as Wikipedia, WordNet, General Inquirer, and a relation similarity metric inform the classifier.

Results

The techniques reported in this paper were evaluated in the 2010 i2b2 Challenge and obtained the highest F1 score for the relation extraction task. When gold standard data for concepts and assertions were available, F1 was 73.7, precision was 72.0, and recall was 75.3. F1 is defined as 2*Precision*Recall/(Precision+Recall). Alternatively, when concepts and assertions were discovered automatically, F1 was 48.4, precision was 57.6, and recall was 41.7.
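
As a worked instance of the F1 definition given above, the figures for automatically discovered concepts and assertions (precision 57.6, recall 41.7) can be substituted directly. The symbols P and R are shorthand introduced here for precision and recall.

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 57.6 \times 41.7}{57.6 + 41.7}
    = \frac{4803.84}{99.3}
    \approx 48.4
```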

Discussion

Although a rich set of features was developed for the classifiers presented in this paper, little knowledge mining was performed from medical ontologies such as those found in UMLS. Future studies should incorporate features extracted from such knowledge sources, which we expect to further improve the results. Moreover, each relation discovery was treated independently. Joint classification of relations may further improve the quality of results, as may joint learning of the discovery of concepts, assertions, and relations.

Conclusion

Lexical and contextual features proved to be very important in relation extraction from medical texts. When they are not available to the classifier, the F1 score decreases by 3.7%. Similarly, when the similarity-based features are not available, the F1 score decreases by 1.1%.

8.
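Medical records are legally valid written records of diagnosis and treatment and important archival material for every medical institution. With the continuing development of network information technology, electronic medical records are now widely used in hospital information systems, and compared with paper records their adoption is a major advance. At present, however, China cannot yet achieve truly paperless electronic medical records; paper records cannot be fully replaced by electronic ones, and the two will coexist for some time to come. This paper analyzes the strengths and weaknesses of paper records, of electronic records, and of their coexistence. It argues that, as hospital informatization deepens and computer technology advances rapidly, the use of electronic medical records is an inevitable trend in hospital medical informatization, and that the concepts of medical record management must be updated and management capacity markedly strengthened.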

9.
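Objective: To improve the quality of electronic medical records in community health services, to standardize medical practices such as record writing, prescription writing, and the signing of informed consent forms, and to improve the quality of care. Methods: Electronic medical records of 94 community physicians in Dongcheng District, Beijing, from June 2006 to March 2007 were sampled and subjected to quality control following an electronic medical record quality control flowchart. Results: After quality control, the pass rate of diagnosis and treatment records rose from 88.55% to 98.86%, the pass rate of electronic prescriptions rose from 71.87% to 94.30%, and the rate of properly signed informed consent forms rose from 51.85% to 94.91%. Conclusion: Quality control of electronic medical records helps standardize medical practices in community health services such as record writing, prescription writing, and the signing of informed consent forms.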

10.

Objective

Substance use screening in adolescence is unstandardized and often documented in clinical notes, rather than in structured electronic health records (EHRs). The objective of this study was to integrate logic rules with state-of-the-art natural language processing (NLP) and machine learning technologies to detect substance use information from both structured and unstructured EHR data.

Materials and Methods

Pediatric patients (10-20 years of age) with any encounter between July 1, 2012, and October 31, 2017, were included (n = 3890 patients; 19 478 encounters). EHR data were extracted at each encounter, manually reviewed for substance use (alcohol, tobacco, marijuana, opiate, any use), and coded as lifetime use, current use, or family use. Logic rules mapped structured EHR indicators to screening results. A knowledge-based NLP system and a deep learning model detected substance use information from unstructured clinical narratives. System performance was evaluated using positive predictive value, sensitivity, negative predictive value, specificity, and area under the receiver-operating characteristic curve (AUC).

Results

The dataset included 17 235 structured indicators and 27 141 clinical narratives. Manual review of clinical narratives captured 94.0% of positive screening results, while structured EHR data captured 22.0%. Logic rules detected screening results from structured data with 1.0 and 0.99 for sensitivity and specificity, respectively. The knowledge-based system detected substance use information from clinical narratives with 0.86, 0.79, and 0.88 for AUC, sensitivity, and specificity, respectively. The deep learning model further improved detection capacity, achieving 0.88, 0.81, and 0.85 for AUC, sensitivity, and specificity, respectively. Finally, integrating predictions from structured and unstructured data achieved high detection capacity across all cases (0.96, 0.85, and 0.87 for AUC, sensitivity, and specificity, respectively).

Conclusions

It is feasible to detect substance use screening and results among pediatric patients using logic rules, NLP, and machine learning technologies.
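
Four of the evaluation measures named in this abstract (sensitivity, specificity, positive predictive value, and negative predictive value) can all be derived from a single confusion matrix. The Python sketch below shows the standard definitions; the class name, function name, and counts are invented for illustration and are not data from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def screening_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Standard screening metrics; assumes every denominator is non-zero."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


# Invented counts, for illustration only.
counts = ConfusionCounts(tp=40, fp=5, tn=45, fn=10)
metrics = screening_metrics(counts)
```

AUC is omitted because it needs ranked prediction scores rather than a single set of counts.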

11.
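The necessity and feasibility of paperless medical records   Total citations: 2, self-citations: 1, citations by others: 1
Going paperless is the direction in which medical record management is developing. By analyzing the current state of medical record management, comparing the advantages of paperless records, and describing the bottlenecks in paperless development and the basis for resolving them, this paper concludes that paperless medical records will become an inevitable trend in future health information management and have broad prospects.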

12.
Xu Y  Liu J  Wu J  Wang Y  Tu Z  Sun JT  Tsujii J  Chang EI 《J Am Med Inform Assoc》2012,19(5):897-905

Objective

To create a highly accurate coreference system in discharge summaries for the 2011 i2b2 challenge. The coreference categories include Person, Problem, Treatment, and Test.

Design

An integrated coreference resolution system was developed by exploiting Person attributes, contextual semantic clues, and world knowledge. It includes three subsystems: Person coreference system based on three Person attributes, Problem/Treatment/Test system based on numerous contextual semantic extractors and world knowledge, and Pronoun system based on a multi-class support vector machine classifier. The three Person attributes are patient, relative and hospital personnel. Contextual semantic extractors include anatomy, position, medication, indicator, temporal, spatial, section, modifier, equipment, operation, and assertion. The world knowledge is extracted from external resources such as Wikipedia.

Measurements

Micro-averaged precision, recall and F-measure in MUC, BCubed and CEAF were used to evaluate results.

Results

The system achieved an overall micro-averaged precision, recall and F-measure of 0.906, 0.925, and 0.915, respectively, on test data (from four hospitals) released by the challenge organizers. It achieved a precision, recall and F-measure of 0.905, 0.920 and 0.913, respectively, on test data without Pittsburgh data. We ranked first among 20 competing teams. Among the four sub-tasks on Person, Problem, Treatment, and Test, the highest F-measure was seen for Person coreference.

Conclusions

This system achieved encouraging results. The Person system can determine whether personal pronouns and proper names are coreferent or not. The Problem/Treatment/Test system benefits from both world knowledge in evaluating the similarity of two mentions and contextual semantic extractors in identifying semantic clues. The Pronoun system can automatically detect whether a Pronoun mention is coreferent to that of the other four types. This study demonstrates that it is feasible to accomplish the coreference task in discharge summaries.

13.
郭煜 《基层医学论坛》2013,(11):1361-1362
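Objective: To compare the quality of electronic medical records with that of traditional handwritten records, explore problems arising in the operation of electronic records, and evaluate their effect in use. Methods: 600 electronic records from January to June 2012 at one hospital and 600 handwritten records from the half year before that hospital introduced electronic records were randomly sampled for a controlled comparison of the grade-A record rate and the quality scores of each component. Results: Electronic records had a higher defect rate and a lower grade-A record rate than handwritten records. Quality scores for admission notes and progress notes were significantly lower in electronic records than in handwritten records, while scores for basic writing requirements were higher. Conclusion: Although electronic records improve the standardization and uniformity of writing, they are more prone to operating errors; real-time online monitoring, computer training for physicians, and staff awareness of common errors should be strengthened.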

14.
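Objective: To compare electronic and handwritten medical records and explore problems in the operation of electronic records and their countermeasures. Methods: 1 200 electronic records were randomly sampled as the test group and 1 200 handwritten records as the control group; defect items were tallied and the defect rates of the two groups compared. Results: The defect rate of electronic records was significantly higher than that of handwritten records (P<0.05). Electronic records had significantly higher defect rates for missing or incomplete items and for the content of recorded items (P<0.05), and a lower defect rate for recording time (P<0.05). Conclusion: Electronic records can raise the completion rate of time-limited progress notes but are more prone to omitted entries and unclear or erroneous content; training and supervision of physicians should be strengthened.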

15.
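Objective: To explore building a digital medical record management network from which rich information can be mined and distilled for decision-making, research, and teaching, so that medical record information is widely used. Methods: By digitizing paper records and managing electronic records in a standardized way, a digital medical record management network and a network-based information platform were developed, tightly integrating the medical record information management system with the hospital intranet. Results: A digital management system for paper records and a management model for electronic records were established, so that the creation, storage, and use of records all conform to management standards. The system aims at digital storage, security and reliability, complete data, fast and convenient use, and online sharing of information resources, and can meet the demands that hospital informatization places on medical record management. Conclusion: Information technology can raise the level of hospital medical record management.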

16.
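Implementation and application of an electronic medical record management system   Total citations: 3, self-citations: 0, citations by others: 3
Points out the problems of the traditional medical record management model, describes the implementation of an electronic medical record management system in terms of archiving old and new records and using electronic records, and explains the significance of electronic medical record management, including saving storage space and improving retrieval efficiency, record quality, and standardized management.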

17.
简哲  李燕 《医学信息学杂志》2016,37(12):10-13,21
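Analyzes why obstacles exist to applying natural language processing in medicine, proposes a method for evaluating natural language processing of electronic medical records, and reviews the content and development of such evaluations over the years, including the Text Retrieval Conference, medical natural language processing evaluations, the SHARe/CLEF evaluations, and the I2B2 evaluations.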

18.

Objective

The trade-off between the speed and simplicity of dictionary-based term recognition and the richer linguistic information provided by more advanced natural language processing (NLP) is an area of active discussion in clinical informatics. In this paper, we quantify this trade-off among text processing systems that balance speed and linguistic understanding differently. We tested both types of systems in three clinical research tasks: phase IV safety profiling of a drug, learning adverse drug–drug interactions, and learning used-to-treat relationships between drugs and indications.

Materials

We first benchmarked the accuracy of the NCBO Annotator and REVEAL in a manually annotated, publicly available dataset from the 2008 i2b2 Obesity Challenge. We then applied the NCBO Annotator and REVEAL to 9 million clinical notes from the Stanford Translational Research Integrated Database Environment (STRIDE) and used the resulting data for three research tasks.

Results

There is no significant difference between using the NCBO Annotator and REVEAL in the results of the three research tasks when using large datasets. In one subtask, REVEAL achieved higher sensitivity with smaller datasets.

Conclusions

For a variety of tasks, employing simple term recognition methods instead of advanced NLP methods results in little or no impact on accuracy when using large datasets. Simpler dictionary-based methods have the advantage of scaling well to very large datasets. Promoting the use of simple, dictionary-based methods for population level analyses can advance adoption of NLP in practice.
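
To make the idea of dictionary-based term recognition concrete, the Python sketch below matches a small lexicon against free text with a single compiled regular expression. It is a generic illustration, not the implementation of the NCBO Annotator or REVEAL; the lexicon, function names, and example sentence are hypothetical.

```python
import re


def build_matcher(terms: set[str]) -> re.Pattern:
    """Compile one case-insensitive pattern covering every lexicon term."""
    # Longest terms first, so multi-word terms are preferred over their parts.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b"
    return re.compile(pattern, flags=re.IGNORECASE)


def find_terms(text: str, matcher: re.Pattern) -> list[tuple[int, str]]:
    """Return (character offset, lower-cased term) for each match in text."""
    return [(m.start(), m.group(0).lower()) for m in matcher.finditer(text)]


# Hypothetical lexicon, for illustration only.
LEXICON = {"diabetes", "type 2 diabetes", "metformin", "obesity"}

matcher = build_matcher(LEXICON)
hits = find_terms("Patient with Type 2 diabetes started metformin.", matcher)
found = [term for _, term in hits]
```

A matcher of this kind needs no parsing or model inference, which is why such methods tend to scale to very large note collections; the cost is that it ignores negation, context, and spelling variants that are absent from the lexicon.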

19.

Objective

A system that translates narrative text in the medical domain into a structured representation is in great demand. The system performs three sub-tasks: concept extraction, assertion classification, and relation identification.

Design

The overall system consists of five steps: (1) pre-processing sentences, (2) marking noun phrases (NPs) and adjective phrases (APs), (3) extracting concepts, using a dosage-unit dictionary to switch dynamically between two models based on Conditional Random Fields (CRF), (4) classifying assertions by a vote among five classifiers, and (5) identifying relations using normalized sentences with a set of effective discriminating features.
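
Step (4) combines five classifiers by voting. The abstract does not say how votes are weighted or how ties are broken, so the Python sketch below assumes a plain majority vote with ties going to the label seen first. The function name and the example votes are illustrative only.

```python
from collections import Counter


def majority_vote(labels: list[str]) -> str:
    """Return the most frequent label; ties go to the label seen first."""
    if not labels:
        raise ValueError("at least one vote is required")
    return Counter(labels).most_common(1)[0][0]


# Hypothetical outputs of five assertion classifiers for one concept.
votes = ["present", "absent", "present", "possible", "present"]
winner = majority_vote(votes)
```

An odd number of voters avoids ties in two-class decisions, although ties remain possible when more than two assertion classes are in play.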

Measurements

Macro-averaged and micro-averaged precision, recall and F-measure were used to evaluate results.
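
The difference between the two averaging schemes is easiest to see on a small example: macro-averaging computes the metric per class and then averages the results, while micro-averaging pools the counts across classes first. The Python sketch below does this for precision only; the class names and counts are invented for illustration and are not results from the paper.

```python
def precision(tp: int, fp: int) -> float:
    """Precision from true-positive and false-positive counts."""
    return tp / (tp + fp) if tp + fp else 0.0


def macro_precision(per_class: dict[str, tuple[int, int]]) -> float:
    """Average of per-class precision: every class counts equally."""
    return sum(precision(tp, fp) for tp, fp in per_class.values()) / len(per_class)


def micro_precision(per_class: dict[str, tuple[int, int]]) -> float:
    """Precision over pooled counts: every prediction counts equally."""
    total_tp = sum(tp for tp, _ in per_class.values())
    total_fp = sum(fp for _, fp in per_class.values())
    return precision(total_tp, total_fp)


# Invented (true positive, false positive) counts per concept class.
counts = {"problem": (90, 10), "treatment": (40, 10), "test": (5, 5)}

macro = macro_precision(counts)  # (0.9 + 0.8 + 0.5) / 3
micro = micro_precision(counts)  # 135 / 160
```

Here the small, poorly handled class pulls the macro average down to about 0.73 while the micro average stays at about 0.84, which is why reporting both gives a fuller picture.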

Results

The performance is competitive with the state-of-the-art systems with micro-averaged F-measure of 0.8489 for concept extraction, 0.9392 for assertion classification and 0.7326 for relation identification.

Conclusions

The system exploits an array of common features and achieves state-of-the-art performance. Prudent feature engineering sets the foundation of our systems. In concept extraction, we demonstrated that switching models, one of which is especially designed for telegraphic sentences, improved extraction of the treatment concept significantly. In assertion classification, a set of features derived from a rule-based classifier proved effective for classes such as conditional and possible. These classes would suffer from data scarcity in conventional machine-learning methods. In relation identification, we use a two-stage architecture, the second stage of which applies pairwise classifiers to possible candidate classes. This architecture significantly improves performance.

20.
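Objective: To study the effect of implementing an electronic medical record system on hospital medical record quality. Methods: Based on quality control physicians' regular inspections of records across the hospital, record defect rates from 15 clinical departments were selected, and repeated-measures analysis in SAS was used to analyze defect rates over the four years before and after the electronic record system was implemented. Results: In its early phase, implementation raised the hospital's overall record defect rate. Five main factors contributed: (1) newly arrived interns and visiting physicians were continually writing records; (2) physicians had not formed the habit of signing records promptly after printing them, and missing signatures accounted for 28.5% of all record defects in 2012; (3) required items were left blank, accounting for 22.1% of all record defects in 2012, whereas these two defect types were essentially absent before electronic records were implemented; (4) electronic records expose defects and make them easier to check, so defects in the substantive quality of records accounted for 13.4% of record defects in 2012, compared with only 8.3% and 6.5% in 2010 and 2009, before implementation; (5) the steadily rising skill of quality control physicians and the widening scope of quality control strengthened checks on anesthesia records, blood transfusion forms, transfusion records, gastrointestinal endoscopy reports, and bronchoscopy reports, and defects found in these parts accounted for 2.5% of all record defects in 2012. Conclusion: Implementing electronic medical records benefits the monitoring and continuous improvement of hospital medical record quality.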
