Similar Articles (20 results)
1.

Objective

De-identified medical records are critical to biomedical research. Text de-identification software exists, including “resynthesis” components that replace real identifiers with synthetic identifiers. The goal of this research is to evaluate the effectiveness of resynthesis and to examine any bias it introduces into de-identification software.

Design

We evaluated the open-source MITRE Identification Scrubber Toolkit, which includes a resynthesis capability, with clinical text from Vanderbilt University Medical Center patient records. We investigated four record classes from over 500 patients' files, including laboratory reports, medication orders, discharge summaries and clinical notes. We trained and tested the de-identification tool on real and resynthesized records.

Measurements

We measured performance in terms of precision, recall, F-measure and accuracy for the detection of protected health identifiers as designated by the HIPAA Safe Harbor Rule.

Results

The de-identification tool was trained and tested on a collection of real and resynthesized Vanderbilt records. Results for training and testing on the real records were 0.990 accuracy and 0.960 F-measure. The results improved when trained and tested on resynthesized records, with 0.998 accuracy and 0.980 F-measure, but deteriorated moderately when trained on real records and tested on resynthesized records, with 0.989 accuracy and 0.862 F-measure. Moreover, the results declined significantly when trained on resynthesized records and tested on real records, with 0.942 accuracy and 0.728 F-measure.

Conclusion

The de-identification tool achieves high accuracy when training and test sets are homogeneous (ie, both real or resynthesized records). The resynthesis component regularizes the data to make them less “realistic,” resulting in a loss of performance, particularly when training on resynthesized data and testing on real data.
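As a concrete illustration of the four reported metrics, here is a minimal sketch (not the MITRE toolkit's code; the counts are hypothetical) computing precision, recall, F-measure, and accuracy from token-level PHI detection counts:

```python
# Illustrative only: the four metrics from token-level detection counts.
# tp/fp/fn/tn are hypothetical counts, not figures from the study.
def detection_metrics(tp, fp, fn, tn):
    precision = tp / (tp + fp)            # of predicted PHI, how much was PHI
    recall = tp / (tp + fn)               # of true PHI, how much was found
    f_measure = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f_measure, accuracy

p, r, f, a = detection_metrics(tp=96, fp=4, fn=4, tn=896)
```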

2.

Objective

To describe a new medication information extraction system—Textractor—developed for the ‘i2b2 medication extraction challenge’. The development, functionalities, and official evaluation of the system are detailed.

Design

Textractor is based on the Apache Unstructured Information Management Architecture (UIMA) framework, and uses a hybrid of machine-learning and pattern-matching methods. Two modules in the system are based on machine learning algorithms, while other modules use regular expressions, rules, and dictionaries, and one module embeds MetaMap Transfer.

Measurements

The official evaluation was based on a reference standard of 251 discharge summaries annotated by all teams participating in the challenge. The metrics used were recall, precision, and the F1-measure. They were calculated with exact and inexact matches, and were averaged at the level of systems and documents.

Results

The reference metric for this challenge, the system-level overall F1-measure, reached about 77% for exact matches, with a recall of 72% and a precision of 83%. Performance was best for route information (F1-measure of about 86%) and good for dosage and frequency information (F1-measures of about 82–85%). Results were lower for durations (F1-measures of 36–39%) and for reasons (F1-measures of 24–27%).

Conclusion

The official evaluation of Textractor for the i2b2 medication extraction challenge demonstrated satisfactory performance. This system was among the 10 best performing systems in this challenge.
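The two averaging levels mentioned above can be sketched as follows, under the common interpretation (an assumption here) that system-level scores pool counts across all documents (micro-average) while document-level scores average per-document F1 (macro-average); the counts are hypothetical:

```python
# Sketch of micro- vs macro-averaged F1 over per-document counts.
# The (tp, fp, fn) triples below are made up for illustration.
def f1(tp, fp, fn):
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

docs = [(10, 2, 3), (1, 4, 0), (7, 1, 1)]  # (tp, fp, fn) per document

# System level: pool counts, then score (micro-average).
micro = f1(sum(d[0] for d in docs),
           sum(d[1] for d in docs),
           sum(d[2] for d in docs))

# Document level: score each document, then average (macro-average).
macro = sum(f1(*d) for d in docs) / len(docs)
```

Macro-averaging weights every document equally, so a few short documents with poor extraction can pull it well below the micro-average.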

3.
Objective

Many tasks in natural language processing utilize lexical pattern-matching techniques, including information extraction (IE), negation identification, and syntactic parsing. However, it is generally difficult to derive patterns that achieve acceptable levels of recall while also remaining highly precise.

Materials and Methods

We present a multiple sequence alignment (MSA)-based technique that automatically generates patterns, thereby leveraging language usage to determine the context of words that influence a given target. MSAs capture the commonalities among word sequences and are able to reveal areas of linguistic stability and variation. In this way, MSAs provide a systematic approach to generating lexical patterns that are generalizable, which will both increase recall levels and maintain high levels of precision.

Results

The MSA-generated patterns exhibited consistent F1, F0.5, and F2 scores compared to two baseline techniques for IE across four different tasks. Both baseline techniques performed well for some tasks and less well for others, whereas MSA consistently performed at a high level on all four tasks.

Discussion

The performance of MSA on the four extraction tasks indicates the method's versatility. The results show that the MSA-based patterns are able to handle the extraction of individual data elements as well as relations between two concepts without the need for large amounts of manual intervention.

Conclusion

We presented an MSA-based framework for generating lexical patterns that showed consistently high levels of both precision and recall over four different extraction tasks when compared to baseline methods.
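The core idea, stable regions become literals while variable regions become wildcards, can be sketched with a toy pairwise alignment (a simplification of a true multiple sequence alignment, and not the authors' implementation):

```python
# Toy sketch: derive a lexical pattern from two aligned token sequences.
# Stable (matching) spans are kept as literals; divergent spans collapse
# to a "*" wildcard. A real MSA would align many sequences at once.
from difflib import SequenceMatcher

def pattern_from_pair(a, b):
    sm = SequenceMatcher(a=a, b=b)
    out = []
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op == "equal":
            out.extend(a[i1:i2])          # linguistically stable region
        elif not out or out[-1] != "*":
            out.append("*")               # variable region -> wildcard
    return out

s1 = "patient denies chest pain".split()
s2 = "patient denies abdominal pain".split()
pattern = pattern_from_pair(s1, s2)
```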

4.

Objective

To determine how well statistical text mining (STM) models can identify falls within clinical text associated with an ambulatory encounter.

Materials and Methods

2241 patients were selected with a fall-related ICD-9-CM E-code or matched injury diagnosis code while being treated as outpatients at one of four sites within the Veterans Health Administration. All clinical documents within a 48-h window of the recorded E-code or injury diagnosis code for each patient were obtained (n=26 010; 611 distinct document titles) and annotated for falls. Logistic regression, support vector machine, and cost-sensitive support vector machine (SVM-cost) models were trained on a stratified sample of 70% of the documents from one location (the A-train dataset) and then applied to the remaining unseen documents (datasets A-test through D).

Results

All three STM models obtained area under the receiver operating characteristic curve (AUC) scores above 0.950 on the four test datasets (A-test through D). The SVM-cost model obtained the highest AUC scores, ranging from 0.953 to 0.978. The SVM-cost model also achieved F-measure values ranging from 0.745 to 0.853, sensitivity from 0.890 to 0.931, and specificity from 0.877 to 0.944.
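The AUC statistic reported above can be read as the probability that a randomly chosen positive document outscores a randomly chosen negative one. A minimal sketch of that computation (not the study's code; the scores are hypothetical):

```python
# Pairwise-comparison AUC: fraction of (positive, negative) pairs where
# the positive document gets the higher classifier score (ties = 0.5).
def auc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical fall-classifier scores and gold labels
score = auc([0.9, 0.7, 0.6, 0.2], [1, 0, 1, 0])
```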

Discussion

The STM models performed well across a large heterogeneous collection of document titles. In addition, the models also generalized across other sites, including a traditionally bilingual site that had distinctly different grammatical patterns.

Conclusions

The results of this study suggest that STM-based models have the potential to improve surveillance of falls. Furthermore, the encouraging evidence shown here that STM is a robust technique for mining clinical documents bodes well for other surveillance-related topics.

5.

Objective

This paper presents Lancet, a supervised machine-learning system that automatically extracts medication events consisting of medication names and information pertaining to their prescribed use (dosage, mode, frequency, duration and reason) from lists or narrative text in medical discharge summaries.

Design

Lancet incorporates three supervised machine-learning models: a conditional random fields model for tagging individual medication names and associated fields, an AdaBoost model with decision stump algorithm for determining which medication names and fields belong to a single medication event, and a support vector machines disambiguation model for identifying the context style (narrative or list).

Measurements

The authors, from the University of Wisconsin-Milwaukee, participated in the third i2b2 shared task on challenges in natural language processing for clinical data: the medication extraction challenge. With the performance metrics provided by the i2b2 challenge, the micro F1 (precision/recall) scores are reported at both the horizontal and vertical levels.

Results

Among the top 10 teams, Lancet achieved the highest precision at 90.4% with an overall F1 score of 76.4% (horizontal system level with exact match), a gain of 11.2% and 12%, respectively, compared with the rule-based baseline system jMerki. By combining the two systems, the hybrid system further increased the F1 score by 3.4% from 76.4% to 79.0%.

Conclusions

Supervised machine-learning systems with minimal external knowledge resources can achieve high precision with a competitive overall F1 score. Lancet, based on this learning framework, does not rely on expensive manually curated rules. The system is available online at http://code.google.com/p/lancet/.

Pharmacotherapy is an important part of a patient's medical treatment, and nearly all patient records incorporate a significant amount of medication information. The administration of medication at a specific time point during the patient's medical diagnosis, treatment, or prevention of disease is referred to as a medication event,1–3 and the written representation of these events typically comprises the name of the medication and any of its associated fields, including but not limited to dosage, mode, frequency, etc.4 Accurately capturing medication events from patient records is an important step toward large-scale data mining and knowledge discovery,5 medication surveillance and clinical decision support,6 and medication reconciliation.7–10

In addition to its importance, medication event information (eg, treatment outcomes, medication reactions and allergy information) is often difficult to extract, as clinical records exhibit a range of different styles and grammatical structures for recording such information.4 Therefore, Informatics for Integrating Biology & the Bedside (i2b2) recognized automatic medication event extraction with natural language processing (NLP) approaches as one of the great challenges in medical informatics. As one of 20 groups that participated in the i2b2 medication extraction challenge, we report in this study on Lancet, which we developed for medication event extraction.

6.
Developing a Full-Text Literature Management System with Delphi
This paper analyzes the current state of reference management for research groups and individuals engaged in research projects and applies network database technology to the development of a literature management system, describing the system's design rationale, its client/server architecture, and its implementation and application. Using the object-oriented development tool Delphi, a literature management system based on either a local database or a server mode was designed. The system has been used in research projects with good results.

7.

Objective

To develop a computerized clinical decision support system (CDSS) for cervical cancer screening that can interpret free-text Papanicolaou (Pap) reports.

Materials and Methods

The CDSS comprises two rulebases: a free-text rulebase for interpreting Pap reports and a guideline rulebase. The free-text rulebase was developed by analyzing a corpus of 49 293 Pap reports. The guideline rulebase was constructed from national cervical cancer screening guidelines. The CDSS accesses the electronic medical record (EMR) system to generate patient-specific recommendations. For evaluation, the screening recommendations made by the CDSS for 74 patients were reviewed by a physician.

Results and Discussion

Evaluation revealed that the CDSS output the optimal screening recommendations for 73 of the 74 test patients, and it identified two cases for gynecology referral that the physician had missed. The CDSS aided the physician in amending recommendations in six cases. The one failure occurred because human papillomavirus (HPV) testing was sometimes performed separately from the Pap test, and those results were reported by a laboratory system that the CDSS did not query. After the CDSS was upgraded to look up the HPV results missed earlier, it generated the optimal recommendations for all 74 test cases.
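The two-rulebase design can be sketched roughly as below; the trigger phrases, result codes, and recommendations are hypothetical stand-ins, not the actual rulebases or screening guidelines:

```python
# Rough sketch of a two-layer rule-based CDSS: a free-text layer maps
# Pap report phrases to a coded result, and a guideline layer maps the
# coded result (plus an HPV result, when relevant) to a recommendation.
# All rules below are invented for illustration.
import re

TEXT_RULES = [
    (re.compile(r"negative for intraepithelial lesion", re.I), "NILM"),
    (re.compile(r"\bASC-?US\b", re.I), "ASC-US"),
    (re.compile(r"high[- ]grade squamous", re.I), "HSIL"),
]

GUIDELINE_RULES = {
    ("NILM", None): "routine screening",
    ("ASC-US", "positive"): "refer to gynecology",
    ("ASC-US", "negative"): "repeat co-testing",
    ("HSIL", None): "refer to gynecology",
}

def recommend(report_text, hpv_result=None):
    for pattern, code in TEXT_RULES:
        if pattern.search(report_text):
            key = (code, hpv_result if code == "ASC-US" else None)
            return GUIDELINE_RULES.get(key, "manual review")
    return "manual review"
```

Routing unmatched reports to "manual review" mirrors the failure mode discussed above: a rulebase can only be as complete as the data sources it queries.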

Limitations

Single-institution, single-expert study.

Conclusion

An accurate CDSS could be constructed for cervical cancer screening, given the standardized reporting of Pap tests and the availability of explicit guidelines. Overall, the study demonstrates that free text in the EMR can be effectively utilized through natural language processing to develop clinical decision support tools.

8.
9.
This paper reports on the development of an on-line automated medical record system suitable for nursing homes. The software was written in standard MUMPS (Massachusetts General Hospital Utilities Multi-Programming System) and is easy for the first-time user to operate. The data managed by the system may be divided into four major components, including text tables, the hierarchical problem dictionary, the patient admission data, and the care plan data. The system met all of the project's objectives and has been enthusiastically received by the nursing home staff.

10.
An interactive computer-controlled system is described that is used for visual studies, including Visual Evoked Potentials in humans and animals and Visual Receptive Field recordings in animals. Visual stimuli are generated by a display system, and brain activity is monitored by microelectrodes (for animal recordings) and scalp electrodes (for human recordings). The signals are amplified, digitized, and stored. The software uses a response-feedback algorithm for mapping the receptive fields. Initially, random patterns are presented on a TV monitor and the neural response is recorded. Depending on the response to a pattern and the light distribution within it, the algorithm calculates a new pattern, always trying to maximize the response. As the process continues, the stimulus patterns become near optimal, and the receptive field of the neuron is thus mapped automatically, a result that for many years had to be obtained by trial and error. The same system is used for analysis of the recorded results and for recordings of the Visual Evoked Potentials in animals and humans. For the human evoked potentials, different patterns are generated on the display monitor with a variety of choices, ranging from the simplest (checkerboards and gratings) to the most complicated (faces and scenes).

11.

Objective

This paper describes the approaches the authors developed while participating in the i2b2/VA 2010 challenge to automatically extract medical concepts and annotate assertions on concepts and relations between concepts.

Design

The authors' approaches rely on both rule-based and machine-learning methods. Natural language processing is used to extract features from the input texts; these features are then used in the authors' machine-learning approaches. The authors used Conditional Random Fields for concept extraction, and Support Vector Machines for assertion and relation annotation. Depending on the task, the authors tested various combinations of rule-based and machine-learning methods.

Results

The authors' assertion annotation system obtained an F-measure of 0.931, ranking fifth out of 21 participants at the i2b2/VA 2010 challenge. The authors' relation annotation system ranked third out of 16 participants with a 0.709 F-measure. The 0.773 F-measure the authors obtained on concept extraction did not place in the top 10.

Conclusion

On the one hand, the authors confirm that purely machine-learning methods are highly dependent on the annotated training data, and therefore performed better on well-represented classes. On the other hand, a purely rule-based method was not sufficient to deal with new types of data. Finally, hybrid approaches combining machine-learning and rule-based methods yielded the highest scores.

12.
Objective

Semantic role labeling (SRL), which extracts a shallow semantic relation representation from the different surface textual forms of free-text sentences, is important for understanding natural language. Few studies in SRL have been conducted in the medical domain, primarily due to a lack of annotated clinical SRL corpora, which are time-consuming and costly to build. The goal of this study is to investigate domain adaptation techniques for clinical SRL, leveraging resources built from newswire and biomedical literature to improve performance and save annotation costs.

Materials and Methods

Multisource Integrated Platform for Answering Clinical Questions (MiPACQ), a manually annotated clinical SRL corpus, was used as the target domain dataset. PropBank and NomBank from newswire and BioProp from biomedical literature were used as source domain datasets. Three state-of-the-art domain adaptation algorithms were employed: instance pruning, transfer self-training, and feature augmentation. SRL performance with the different domain adaptation algorithms was evaluated by 10-fold cross-validation on the MiPACQ corpus. Learning curves for the different methods were generated to assess the effect of sample size.

Results and Conclusion

When all three source domain corpora were used, the feature augmentation algorithm achieved a statistically significantly higher F-measure (83.18%) than the baseline using the MiPACQ dataset alone (F-measure 81.53%), indicating that domain adaptation algorithms may improve SRL performance on clinical text. To achieve performance comparable to the baseline method that used 90% of the MiPACQ training samples, the feature augmentation algorithm required <50% of the training samples in MiPACQ, demonstrating that the annotation costs of clinical SRL can be reduced significantly by leveraging existing SRL resources from other domains.
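Feature augmentation, commonly understood in this line of work as Daumé III's "frustratingly easy" scheme (an assumption here, since the abstract does not name the variant), can be sketched as duplicating each feature into a shared copy and a domain-specific copy, letting the learner keep transferable signal while isolating domain quirks:

```python
# Sketch of feature augmentation for domain adaptation: every feature
# appears twice, once in a shared namespace and once tagged with its
# domain. Feature names here are illustrative.
def augment(features, domain):
    out = {}
    for name, value in features.items():
        out[f"shared:{name}"] = value   # weight learned across domains
        out[f"{domain}:{name}"] = value # weight specific to this domain
    return out

clinical = augment({"pos=NN": 1, "lemma=pain": 1}, "clinical")
newswire = augment({"pos=NN": 1}, "newswire")
```

Because source and target instances overlap only in the shared namespace, the classifier can borrow strength from newswire or biomedical data without forcing clinical-specific behavior to match.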

13.
During the last decade, capillary electrophoresis (CE) has emerged as an interesting alternative to traditional analysis of serum, plasma and urine proteins by agarose gel electrophoresis. Initially there was a considerable difference in resolution between the two methods, but the quality of CE has improved significantly. We therefore evaluated a second-generation automated multicapillary instrument (Capillarys, Sebia, Paris, France) with the high-resolution (HR) buffer for serum or plasma protein analysis against an automated agarose gel electrophoresis system for the detection of M-components. The comparison between the two systems was performed with patient samples with and without M-components, including 76 serum samples with M-components >1 g/L. There was total agreement between the two methods for detection of these M-components. For samples containing oligoclonal bands or small M-components, there were differences between the two systems: the capillary electrophoresis system detected a slightly higher number of samples with oligoclonal bands, but the two systems found oligoclonal bands in different samples. Regarding resolution, the agarose gel electrophoresis system yielded slightly better resolution in the alpha and beta regions, but an experienced interpreter was required to benefit from the increased resolution. Capillary electrophoresis has shorter turnaround times and a bar-code reader that allows positive sample identification. The Capillarys in combination with the HR buffer gives better resolution of the alpha and beta regions than the same instrument with the beta1-beta2+ buffer or the Paragon CZE2000 (Beckman), the first generation of capillary electrophoresis systems.

14.
15.
This paper analyzes the content of a standards system for the digitization of ancient books, discusses principles for constructing a digitization standards system for ancient medical literature (normativity, systematicness, practicality, and extensibility), and proposes the basic, technical, personnel, and management standards for such a system, together with corresponding implementation measures.

16.

Objective

To provide a natural language processing method for the automatic recognition of events, temporal expressions, and temporal relations in clinical records.

Materials and Methods

A combination of supervised, unsupervised, and rule-based methods was used. Supervised methods include conditional random fields and support vector machines. A flexible automated feature selection technique was used to select the best subset of features for each supervised task. Unsupervised methods include Brown clustering on several corpora, which results in our method being considered semisupervised.

Results

On the 2012 Informatics for Integrating Biology and the Bedside (i2b2) shared task data, we achieved an overall event F1-measure of 0.8045, an overall temporal expression F1-measure of 0.6154, an overall temporal link detection F1-measure of 0.5594, and an end-to-end temporal link detection F1-measure of 0.5258. The most competitive system was our event recognition method, which ranked third out of the 14 participants in the event task.

Discussion

Analysis reveals the event recognition method has difficulty determining which modifiers to include/exclude in the event span. The temporal expression recognition method requires significantly more normalization rules, although many of these rules apply only to a small number of cases. Finally, the temporal relation recognition method requires more advanced medical knowledge and could be improved by separating the single discourse relation classifier into multiple, more targeted component classifiers.

Conclusions

Recognizing events and temporal expressions can be achieved accurately by combining supervised and unsupervised methods, even when only minimal medical knowledge is available. Temporal normalization and temporal relation recognition, however, are far more dependent on the modeling of medical knowledge.
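A minimal rule-based sketch of the temporal expression recognition step (hypothetical patterns, not the authors' system, which also uses CRFs and SVMs):

```python
# Toy temporal expression (TIMEX) spotter for clinical text.
# The alternatives below are invented examples of the kinds of rules
# such a recognizer accumulates; a real system needs many more,
# plus normalization to calendar values.
import re

TIMEX = re.compile(
    r"\b(\d{1,2}/\d{1,2}/\d{2,4}"          # dates like 3/14/2012
    r"|(post[- ]?operative )?day \d+"       # hospital-course days
    r"|(yesterday|today|tomorrow)"
    r"|\d+ (hours?|days?|weeks?|months?) (ago|later))\b",
    re.I,
)

def find_timex(text):
    return [m.group(0) for m in TIMEX.finditer(text)]

spans = find_timex("Seen on 3/14/2012; chest pain 2 days ago.")
```

The abstract's point holds even in this toy: spotting surface expressions is the easy part, while normalizing them and ordering events against them requires modeling medical knowledge.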

17.
18.
An automated clinical medical record and audit system was developed to evaluate the effect of modifying physician behavior at the control points in the ambulatory care process and to determine if this change was reflected in patient care cost outcomes. This study compared clinical and cost results of patients in an experimental group, who had the automated record and audit system, to a control group, who had a traditional clinic record without chart audit. Physicians responded to the automated audit suggestions at a rate of 50.25 in the experimental group and 37.3 in the control group. No major differences were observed in clinical outcomes, with the exception of the number of days of hospitalizations and, consequently, the cost of hospitalizations. The experimental group cost for hospitalizations was one-third of the control group and accounted for a majority of the differences in the total annual cost for the two groups.

19.
Objective: To explore the value of an automated nucleic acid purification system in real-time fluorescent quantitative PCR. Methods: HCV and enterovirus nucleic acid samples were extracted both manually and with the automated nucleic acid purification system using different magnetic-bead reagent kits. The RNA extracted by each method was tested by real-time fluorescent quantitative PCR, and Ct values were compared to assess extraction efficiency. Results: For the same assay, there was no significant difference between different reagents run on the automated system (t = 0.805 and 1.952, P > 0.05), but there was a significant difference between the automated system and the manual method (t = -3.314 and -3.576, P < 0.01). Manual extraction of 30 samples took about 2 h, whereas the automated system extracted 32 samples in under 1 h. Conclusion: The extraction efficiency of the automated nucleic acid purification system is clearly superior to the manual method. The magnetic-bead reagent interface of the NP968 automated system is fully open, and the system is simple and fast to operate with high extraction efficiency, making it worth promoting for clinical laboratory use.

20.