期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Never too old for anonymity: a statistical standard for demographic data sharing via the HIPAA Privacy Rule

Bradley Malin Kathleen Benitez Daniel Masys 《J Am Med Inform Assoc》2011,18(1):3-10

Objective

Healthcare organizations must de-identify patient records before sharing data. Many organizations rely on the Safe Harbor Standard of the HIPAA Privacy Rule, which enumerates 18 identifiers that must be suppressed (eg, ages over 89). An alternative model in the Privacy Rule, known as the Statistical Standard, can facilitate the sharing of more detailed data, but is rarely applied because of a lack of published methodologies. The authors propose an intuitive approach to de-identifying patient demographics in accordance with the Statistical Standard.

Design

The authors conduct an analysis of the demographics of patient cohorts in five medical centers developed for the NIH-sponsored Electronic Medical Records and Genomics network, with respect to the US census. They report the re-identification risk of patient demographics disclosed according to the Safe Harbor policy and the relative risk rate for sharing such information via alternative policies.

Measurements

The re-identification risk of Safe Harbor demographics ranged from 0.01% to 0.19%. The findings show alternative de-identification models can be created with risks no greater than Safe Harbor. The authors illustrate that the disclosure of patient ages over the age of 89 is possible when other features are reduced in granularity.

Limitations

The de-identification approach described in this paper was evaluated with demographic data only and should be evaluated with other potential identifiers.

Conclusion

Alternative de-identification policies to the Safe Harbor model can be derived for patient demographics to enable the disclosure of values that were previously suppressed. The method is generalizable to any environment in which population statistics are available. 相似文献

2.

High accuracy information extraction of medication information from clinical notes: 2009 i2b2 medication extraction challenge

Jon Patrick Min Li 《J Am Med Inform Assoc》2010,17(5):524-527

Objective

Medication information comprises a most valuable source of data in clinical records. This paper describes use of a cascade of machine learners that automatically extract medication information from clinical records.

Design

Authors developed a novel supervised learning model that incorporates two machine learning algorithms and several rule-based engines.

Measurements

Evaluation of each step included precision, recall and F-measure metrics. The final outputs of the system were scored using the i2b2 workshop evaluation metrics, including strict and relaxed matching with a gold standard.

Results

Evaluation results showed greater than 90% accuracy on five out of seven entities in the name entity recognition task, and an F-measure greater than 95% on the relationship classification task. The strict micro averaged F-measure for the system output achieved best submitted performance of the competition, at 85.65%.

Limitations

Clinical staff will only use practical processing systems if they have confidence in their reliability. Authors estimate that an acceptable accuracy for a such a working system should be approximately 95%. This leaves a significant performance gap of 5 to 10% from the current processing capabilities.

Conclusion

A multistage method with mixed computational strategies using a combination of rule-based classifiers and statistical classifiers seems to provide a near-optimal strategy for automated extraction of medication information from clinical records.Many of the potential benefits of the electronic medical record (EMR) rely significantly on our ability to automatically process the free-text content in the EMR. To understand the limitations and difficulties of exploiting the EMR we have designed an information extraction engine to identify medication events within patient discharge summaries, as specified by the i2b2 medication extraction shared task. 相似文献

3.

BoB,a best-of-breed automated text de-identification system for VHA clinical documents

Oscar Ferrández Brett R South Shuying Shen F Jeffrey Friedlin Matthew H Samore Stéphane M Meystre 《J Am Med Inform Assoc》2013,20(1):77-83

Objective

De-identification allows faster and more collaborative clinical research while protecting patient confidentiality. Clinical narrative de-identification is a tedious process that can be alleviated by automated natural language processing methods. The goal of this research is the development of an automated text de-identification system for Veterans Health Administration (VHA) clinical documents.

Materials and methods

We devised a novel stepwise hybrid approach designed to improve the current strategies used for text de-identification. The proposed system is based on a previous study on the best de-identification methods for VHA documents. This best-of-breed automated clinical text de-identification system (aka BoB) tackles the problem as two separate tasks: (1) maximize patient confidentiality by redacting as much protected health information (PHI) as possible; and (2) leave de-identified documents in a usable state preserving as much clinical information as possible.

Results

We evaluated BoB with a manually annotated corpus of a variety of VHA clinical notes, as well as with the 2006 i2b2 de-identification challenge corpus. We present evaluations at the instance- and token-level, with detailed results for BoB''s main components. Moreover, an existing text de-identification system was also included in our evaluation.

Discussion

BoB''s design efficiently takes advantage of the methods implemented in its pipeline, resulting in high sensitivity values (especially for sensitive PHI categories) and a limited number of false positives.

Conclusions

Our system successfully addressed VHA clinical document de-identification, and its hybrid stepwise design demonstrates robustness and efficiency, prioritizing patient confidentiality while leaving most clinical information intact. 相似文献

4.

Medication information extraction with linguistic pattern matching and semantic rules

Irena Spasi? Farzaneh Sarafraz John A Keane Goran Nenadi? 《J Am Med Inform Assoc》2010,17(5):532-535

Objective

This study presents a system developed for the 2009 i2b2 Challenge in Natural Language Processing for Clinical Data, whose aim was to automatically extract certain information about medications used by a patient from his/her medical report. The aim was to extract the following information for each medication: name, dosage, mode/route, frequency, duration and reason.

Design

The system implements a rule-based methodology, which exploits typical morphological, lexical, syntactic and semantic features of the targeted information. These features were acquired from the training dataset and public resources such as the UMLS and relevant web pages. Information extracted by pattern matching was combined together using context-sensitive heuristic rules.

Measurements

The system was applied to a set of 547 previously unseen discharge summaries, and the extracted information was evaluated against a manually prepared gold standard consisting of 251 documents. The overall ranking of the participating teams was obtained using the micro-averaged F-measure as the primary evaluation metric.

Results

The implemented method achieved the micro-averaged F-measure of 81% (with 86% precision and 77% recall), which ranked this system third in the challenge. The significance tests revealed the system''s performance to be not significantly different from that of the second ranked system. Relative to other systems, this system achieved the best F-measure for the extraction of duration (53%) and reason (46%).

Conclusion

Based on the F-measure, the performance achieved (81%) was in line with the initial agreement between human annotators (82%), indicating that such a system may greatly facilitate the process of extracting relevant information from medical records by providing a solid basis for a manual review process.The 2009 i2b2 medication extraction challenge¹ focused on the extraction of medication-related information including: medication name (m), dosage (do), mode (mo), frequency (f), duration (du) and reason (r) from discharge summaries. In other words, free-text medical records needed to be converted into a structured form by filling a template (a data structure with the predefined slots)² with the relevant information extracted (slot fillers). For example, the following sentence:“In the past two months, she had been taking Ativan of 3–4 mg q.d. for anxiety.”should be converted automatically into a structured form as follows:m=“ativan” ‖ do=“3–4 mg” ‖ mo=“nm” ‖ f=“q.d.” ‖ du=“two months” ‖ r=“for anxiety”Note that only explicitly mentioned information was to be extracted with no attempt to map it to standardized terminology or to interpret it semantically. 相似文献

5.

Automated identification of drug and food allergies entered using non-standard terminology

Richard H Epstein Paul St Jacques Michael Stockin Brian Rothman Jesse M Ehrenfeld Joshua C Denny 《J Am Med Inform Assoc》2013,20(5):962-968

Objective

An accurate computable representation of food and drug allergy is essential for safe healthcare. Our goal was to develop a high-performance, easily maintained algorithm to identify medication and food allergies and sensitivities from unstructured allergy entries in electronic health record (EHR) systems.

Materials and methods

An algorithm was developed in Transact-SQL to identify ingredients to which patients had allergies in a perioperative information management system. The algorithm used RxNorm and natural language processing techniques developed on a training set of 24 599 entries from 9445 records. Accuracy, specificity, precision, recall, and F-measure were determined for the training dataset and repeated for the testing dataset (24 857 entries from 9430 records).

Results

Accuracy, precision, recall, and F-measure for medication allergy matches were all above 98% in the training dataset and above 97% in the testing dataset for all allergy entries. Corresponding values for food allergy matches were above 97% and above 93%, respectively. Specificities of the algorithm were 90.3% and 85.0% for drug matches and 100% and 88.9% for food matches in the training and testing datasets, respectively.

Discussion

The algorithm had high performance for identification of medication and food allergies. Maintenance is practical, as updates are managed through upload of new RxNorm versions and additions to companion database tables. However, direct entry of codified allergy information by providers (through autocompleters or drop lists) is still preferred to post-hoc encoding of the data. Data tables used in the algorithm are available for download.

Conclusions

A high performing, easily maintained algorithm can successfully identify medication and food allergies from free text entries in EHR systems. 相似文献

6.

Evaluating re-identification risks with respect to the HIPAA privacy rule

Kathleen Benitez Bradley Malin 《J Am Med Inform Assoc》2010,17(2):169-177

Objective

Many healthcare organizations follow data protection policies that specify which patient identifiers must be suppressed to share “de-identified” records. Such policies, however, are often applied without knowledge of the risk of “re-identification”. The goals of this work are: (1) to estimate re-identification risk for data sharing policies of the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule; and (2) to evaluate the risk of a specific re-identification attack using voter registration lists.

Measurements

We define several risk metrics: (1) expected number of re-identifications; (2) estimated proportion of a population in a group of size g or less, and (3) monetary cost per re-identification. For each US state, we estimate the risk posed to hypothetical datasets, protected by the HIPAA Safe Harbor and Limited Dataset policies by an attacker with full knowledge of patient identifiers and with limited knowledge in the form of voter registries.

Results

The percentage of a state''s population estimated to be vulnerable to unique re-identification (ie, g=1) when protected via Safe Harbor and Limited Datasets ranges from 0.01% to 0.25% and 10% to 60%, respectively. In the voter attack, this number drops for many states, and for some states is 0%, due to the variable availability of voter registries in the real world. We also find that re-identification cost ranges from $0 to $17 000, further confirming risk variability.

Conclusions

This work illustrates that blanket protection policies, such as Safe Harbor, leave different organizations vulnerable to re-identification at different rates. It provides justification for locally performed re-identification risk estimates prior to sharing data. 相似文献

7.

Hybrid methods for improving information access in clinical documents: concept,assertion, and relation identification

Anne-Lyse Minard Anne-Laure Ligozat Asma Ben Abacha Delphine Bernhard Bruno Cartoni Louise Deléger Brigitte Grau Sophie Rosset Pierre Zweigenbaum Cyril Grouin 《J Am Med Inform Assoc》2011,18(5):588-593

Objective

This paper describes the approaches the authors developed while participating in the i2b2/VA 2010 challenge to automatically extract medical concepts and annotate assertions on concepts and relations between concepts.

Design

The authors''approaches rely on both rule-based and machine-learning methods. Natural language processing is used to extract features from the input texts; these features are then used in the authors'' machine-learning approaches. The authors used Conditional Random Fields for concept extraction, and Support Vector Machines for assertion and relation annotation. Depending on the task, the authors tested various combinations of rule-based and machine-learning methods.

Results

The authors''assertion annotation system obtained an F-measure of 0.931, ranking fifth out of 21 participants at the i2b2/VA 2010 challenge. The authors'' relation annotation system ranked third out of 16 participants with a 0.709 F-measure. The 0.773 F-measure the authors obtained on concept extraction did not make it to the top 10.

Conclusion

On the one hand, the authors confirm that the use of only machine-learning methods is highly dependent on the annotated training data, and thus obtained better results for well-represented classes. On the other hand, the use of only a rule-based method was not sufficient to deal with new types of data. Finally, the use of hybrid approaches combining machine-learning and rule-based approaches yielded higher scores. 相似文献

8.

Extracting medical information from narrative patient records: the case of medication-related information

Louise Deléger Cyril Grouin Pierre Zweigenbaum 《J Am Med Inform Assoc》2010,17(5):555-558

Objective

While essential for patient care, information related to medication is often written as free text in clinical records and, therefore, difficult to use in computerized systems. This paper describes an approach to automatically extract medication information from clinical records, which was developed to participate in the i2b2 2009 challenge, as well as different strategies to improve the extraction.

Design

Our approach relies on a semantic lexicon and extraction rules as a two-phase strategy: first, drug names are recognized and, then, the context of these names is explored to extract drug-related information (mode, dosage, etc) according to rules capturing the document structure and the syntax of each kind of information. Different configurations are tested to improve this baseline system along several dimensions, particularly drug name recognition—this step being a determining factor to extract drug-related information. Changes were tested at the level of the lexicons and of the extraction rules.

Results

The initial system participating in i2b2 achieved good results (global F-measure of 77%). Further testing of different configurations substantially improved the system (global F-measure of 81%), performing well for all types of information (eg, 84% for drug names and 88% for modes), except for durations and reasons, which remain problematic.

Conclusion

This study demonstrates that a simple rule-based system can achieve good performance on the medication extraction task. We also showed that controlled modifications (lexicon filtering and rule refinement) were the improvements that best raised the performance. 相似文献

9.

Automated concept-level information extraction to reduce the need for custom software and rules development

Leonard W D'Avolio Thien M Nguyen Sergey Goryachev Louis D Fiore 《J Am Med Inform Assoc》2011,18(5):607-613

Objective

Despite at least 40 years of promising empirical performance, very few clinical natural language processing (NLP) or information extraction systems currently contribute to medical science or care. The authors address this gap by reducing the need for custom software and rules development with a graphical user interface-driven, highly generalizable approach to concept-level retrieval.

Materials and methods

A ‘learn by example’ approach combines features derived from open-source NLP pipelines with open-source machine learning classifiers to automatically and iteratively evaluate top-performing configurations. The Fourth i2b2/VA Shared Task Challenge''s concept extraction task provided the data sets and metrics used to evaluate performance.

Results

Top F-measure scores for each of the tasks were medical problems (0.83), treatments (0.82), and tests (0.83). Recall lagged precision in all experiments. Precision was near or above 0.90 in all tasks.

Discussion

With no customization for the tasks and less than 5 min of end-user time to configure and launch each experiment, the average F-measure was 0.83, one point behind the mean F-measure of the 22 entrants in the competition. Strong precision scores indicate the potential of applying the approach for more specific clinical information extraction tasks. There was not one best configuration, supporting an iterative approach to model creation.

Conclusion

Acceptable levels of performance can be achieved using fully automated and generalizable approaches to concept-level information extraction. The described implementation and related documentation is available for download. 相似文献

10.

A classification approach to coreference in discharge summaries: 2011 i2b2 challenge

Xu Y Liu J Wu J Wang Y Tu Z Sun JT Tsujii J Chang EI 《J Am Med Inform Assoc》2012,19(5):897-905

Objective

To create a highly accurate coreference system in discharge summaries for the 2011 i2b2 challenge. The coreference categories include Person, Problem, Treatment, and Test.

Design

An integrated coreference resolution system was developed by exploiting Person attributes, contextual semantic clues, and world knowledge. It includes three subsystems: Person coreference system based on three Person attributes, Problem/Treatment/Test system based on numerous contextual semantic extractors and world knowledge, and Pronoun system based on a multi-class support vector machine classifier. The three Person attributes are patient, relative and hospital personnel. Contextual semantic extractors include anatomy, position, medication, indicator, temporal, spatial, section, modifier, equipment, operation, and assertion. The world knowledge is extracted from external resources such as Wikipedia.

Measurements

Micro-averaged precision, recall and F-measure in MUC, BCubed and CEAF were used to evaluate results.

Results

The system achieved an overall micro-averaged precision, recall and F-measure of 0.906, 0.925, and 0.915, respectively, on test data (from four hospitals) released by the challenge organizers. It achieved a precision, recall and F-measure of 0.905, 0.920 and 0.913, respectively, on test data without Pittsburgh data. We ranked the first out of 20 competing teams. Among the four sub-tasks on Person, Problem, Treatment, and Test, the highest F-measure was seen for Person coreference.

Conclusions

This system achieved encouraging results. The Person system can determine whether personal pronouns and proper names are coreferent or not. The Problem/Treatment/Test system benefits from both world knowledge in evaluating the similarity of two mentions and contextual semantic extractors in identifying semantic clues. The Pronoun system can automatically detect whether a Pronoun mention is coreferent to that of the other four types. This study demonstrates that it is feasible to accomplish the coreference task in discharge summaries. 相似文献

11.

Integrating existing natural language processing tools for medication extraction from discharge summaries

Son Doan Lisa Bastarache Sergio Klimkowski Joshua C Denny Hua Xu 《J Am Med Inform Assoc》2010,17(5):528-531

Objective

To develop an automated system to extract medications and related information from discharge summaries as part of the 2009 i2b2 natural language processing (NLP) challenge. This task required accurate recognition of medication name, dosage, mode, frequency, duration, and reason for drug administration.

Design

We developed an integrated system using several existing NLP components developed at Vanderbilt University Medical Center, which included MedEx (to extract medication information), SecTag (a section identification system for clinical notes), a sentence splitter, and a spell checker for drug names. Our goal was to achieve good performance with minimal to no specific training for this document corpus; thus, evaluating the portability of those NLP tools beyond their home institution. The integrated system was developed using 17 notes that were annotated by the organizers and evaluated using 251 notes that were annotated by participating teams.

Measurements

The i2b2 challenge used standard measures, including precision, recall, and F-measure, to evaluate the performance of participating systems. There were two ways to determine whether an extracted textual finding is correct or not: exact matching or inexact matching. The overall performance for all six types of medication-related findings across 251 annotated notes was considered as the primary metric in the challenge.

Results

Our system achieved an overall F-measure of 0.821 for exact matching (0.839 precision; 0.803 recall) and 0.822 for inexact matching (0.866 precision; 0.782 recall). The system ranked second out of 20 participating teams on overall performance at extracting medications and related information.

Conclusions

The results show that the existing MedEx system, together with other NLP components, can extract medication information in clinical text from institutions other than the site of algorithm development with reasonable performance. 相似文献

12.

Comprehensive temporal information detection from clinical text: medical events,time, and TLINK identification

Sunghwan Sohn Kavishwar B Wagholikar Dingcheng Li Siddhartha R Jonnalagadda Cui Tao Ravikumar Komandur Elayavilli Hongfang Liu 《J Am Med Inform Assoc》2013,20(5):836-842

Background

Temporal information detection systems have been developed by the Mayo Clinic for the 2012 i2b2 Natural Language Processing Challenge.

Objective

To construct automated systems for EVENT/TIMEX3 extraction and temporal link (TLINK) identification from clinical text.

Materials and methods

The i2b2 organizers provided 190 annotated discharge summaries as the training set and 120 discharge summaries as the test set. Our Event system used a conditional random field classifier with a variety of features including lexical information, natural language elements, and medical ontology. The TIMEX3 system employed a rule-based method using regular expression pattern match and systematic reasoning to determine normalized values. The TLINK system employed both rule-based reasoning and machine learning. All three systems were built in an Apache Unstructured Information Management Architecture framework.

Results

Our TIMEX3 system performed the best (F-measure of 0.900, value accuracy 0.731) among the challenge teams. The Event system produced an F-measure of 0.870, and the TLINK system an F-measure of 0.537.

Conclusions

Our TIMEX3 system demonstrated good capability of regular expression rules to extract and normalize time information. Event and TLINK machine learning systems required well-defined feature sets to perform well. We could also leverage expert knowledge as part of the machine learning features to further improve TLINK identification performance. 相似文献

13.

A sequence labeling approach to link medications and their attributes in clinical notes and clinical trial announcements for information extraction

Qi Li Haijun Zhai Louise Deleger Todd Lingren Megan Kaiser Laura Stoutenborough Imre Solti 《J Am Med Inform Assoc》2013,20(5):915-921

Objective

The goal of this work was to evaluate machine learning methods, binary classification and sequence labeling, for medication–attribute linkage detection in two clinical corpora.

Data and methods

We double annotated 3000 clinical trial announcements (CTA) and 1655 clinical notes (CN) for medication named entities and their attributes. A binary support vector machine (SVM) classification method with parsimonious feature sets, and a conditional random fields (CRF)-based multi-layered sequence labeling (MLSL) model were proposed to identify the linkages between the entities and their corresponding attributes. We evaluated the system''s performance against the human-generated gold standard.

Results

The experiments showed that the two machine learning approaches performed statistically significantly better than the baseline rule-based approach. The binary SVM classification achieved 0.94 F-measure with individual tokens as features. The SVM model trained on a parsimonious feature set achieved 0.81 F-measure for CN and 0.87 for CTA. The CRF MLSL method achieved 0.80 F-measure on both corpora.

Discussion and conclusions

We compared the novel MLSL method with a binary classification and a rule-based method. The MLSL method performed statistically significantly better than the rule-based method. However, the SVM-based binary classification method was statistically significantly better than the MLSL method for both the CTA and CN corpora. Using parsimonious feature sets both the SVM-based binary classification and CRF-based MLSL methods achieved high performance in detecting medication name and attribute linkages in CTA and CN. 相似文献

14.

Feature engineering combined with machine learning and rule-based methods for structured information extraction from narrative clinical discharge summaries

Xu Y Hong K Tsujii J Chang EI 《J Am Med Inform Assoc》2012,19(5):824-832

Objective

A system that translates narrative text in the medical domain into structured representation is in great demand. The system performs three sub-tasks: concept extraction, assertion classification, and relation identification.

Design

The overall system consists of five steps: (1) pre-processing sentences, (2) marking noun phrases (NPs) and adjective phrases (APs), (3) extracting concepts that use a dosage-unit dictionary to dynamically switch two models based on Conditional Random Fields (CRF), (4) classifying assertions based on voting of five classifiers, and (5) identifying relations using normalized sentences with a set of effective discriminating features.

Measurements

Macro-averaged and micro-averaged precision, recall and F-measure were used to evaluate results.

Results

The performance is competitive with the state-of-the-art systems with micro-averaged F-measure of 0.8489 for concept extraction, 0.9392 for assertion classification and 0.7326 for relation identification.

Conclusions

The system exploits an array of common features and achieves state-of-the-art performance. Prudent feature engineering sets the foundation of our systems. In concept extraction, we demonstrated that switching models, one of which is especially designed for telegraphic sentences, improved extraction of the treatment concept significantly. In assertion classification, a set of features derived from a rule-based classifier were proven to be effective for the classes such as conditional and possible. These classes would suffer from data scarcity in conventional machine-learning methods. In relation identification, we use two-staged architecture, the second of which applies pairwise classifiers to possible candidate classes. This architecture significantly improves performance. 相似文献

15.

Research and applications: Learning regular expressions for clinical text classification

Duy Duc An Bui Qing Zeng-Treitler 《J Am Med Inform Assoc》2014,21(5):850-857

Objectives

Natural language processing (NLP) applications typically use regular expressions that have been developed manually by human experts. Our goal is to automate both the creation and utilization of regular expressions in text classification.

Methods

We designed a novel regular expression discovery (RED) algorithm and implemented two text classifiers based on RED. The RED+ALIGN classifier combines RED with an alignment algorithm, and RED+SVM combines RED with a support vector machine (SVM) classifier. Two clinical datasets were used for testing and evaluation: the SMOKE dataset, containing 1091 text snippets describing smoking status; and the PAIN dataset, containing 702 snippets describing pain status. We performed 10-fold cross-validation to calculate accuracy, precision, recall, and F-measure metrics. In the evaluation, an SVM classifier was trained as the control.

Results

The two RED classifiers achieved 80.9–83.0% in overall accuracy on the two datasets, which is 1.3–3% higher than SVM''s accuracy (p<0.001). Similarly, small but consistent improvements have been observed in precision, recall, and F-measure when RED classifiers are compared with SVM alone. More significantly, RED+ALIGN correctly classified many instances that were misclassified by the SVM classifier (8.1–10.3% of the total instances and 43.8–53.0% of SVM''s misclassifications).

Conclusions

Machine-generated regular expressions can be effectively used in clinical text classification. The regular expression-based classifier can be combined with other classifiers, like SVM, to improve classification performance. 相似文献

16.

Building public trust in uses of Health Insurance Portability and Accountability Act de-identified data

Deven McGraw 《J Am Med Inform Assoc》2013,20(1):29-34

Objectives

The aim of this paper is to summarize concerns with the de-identification standard and methodologies established under the Health Insurance Portability and Accountability Act (HIPAA) regulations, and report some potential policies to address those concerns that were discussed at a recent workshop attended by industry, consumer, academic and research stakeholders.

Target audience

The target audience includes researchers, industry stakeholders, policy makers and consumer advocates concerned about preserving the ability to use HIPAA de-identified data for a range of important secondary uses.

Scope

HIPAA sets forth methodologies for de-identifying health data; once such data are de-identified, they are no longer subject to HIPAA regulations and can be used for any purpose. Concerns have been raised about the sufficiency of HIPAA de-identification methodologies, the lack of legal accountability for unauthorized re-identification of de-identified data, and insufficient public transparency about de-identified data uses. Although there is little published evidence of the re-identification of properly de-identified datasets, such concerns appear to be increasing. This article discusses policy proposals intended to address de-identification concerns while maintaining de-identification as an effective tool for protecting privacy and preserving the ability to leverage health data for secondary purposes. 相似文献

17.

Using global unique identifiers to link autism collections

Stephen B Johnson Glen Whitney Matthew McAuliffe Hailong Wang Evan McCreedy Leon Rozenblit Clark C Evans 《J Am Med Inform Assoc》2010,17(6):689-695

Objective

To propose a centralized method for generating global unique identifiers to link collections of research data and specimens.

Design

The work is a collaboration between the Simons Foundation Autism Research Initiative and the National Database for Autism Research. The system is implemented as a web service: an investigator inputs identifying information about a participant into a client application and sends encrypted information to a server application, which returns a generated global unique identifier. The authors evaluated the system using a volume test of one million simulated individuals and a field test on 2000 families (over 8000 individual participants) in an autism study.

Measurements

Inverse probability of hash codes; rate of false identity of two individuals; rate of false split of single individual; percentage of subjects for which identifying information could be collected; percentage of hash codes generated successfully.

Results

Large-volume simulation generated no false splits or false identity. Field testing in the Simons Foundation Autism Research Initiative Simplex Collection produced identifiers for 96% of children in the study and 77% of parents. On average, four out of five hash codes per subject were generated perfectly (only one perfect hash is required for subsequent matching).

Discussion

The system must achieve balance among the competing goals of distinguishing individuals, collecting accurate information for matching, and protecting confidentiality. Considerable effort is required to obtain approval from institutional review boards, obtain consent from participants, and to achieve compliance from sites during a multicenter study.

Conclusion

Generic unique identifiers have the potential to link collections of research data, augment the amount and types of data available for individuals, support detection of overlap between collections, and facilitate replication of research findings. 相似文献

18.

Linguistic approach for identification of medication names and related information in clinical narratives

Thierry Hamon Natalia Grabar 《J Am Med Inform Assoc》2010,17(5):549-554

Background

Pharmacotherapy is an integral part of any medical care process and plays an important role in the medical history of most patients. Information on medication is crucial for several tasks such as pharmacovigilance, medical decision or biomedical research.

Objectives

Within a narrative text, medication-related information can be buried within other non-relevant data. Specific methods, such as those provided by text mining, must be designed for accessing them, and this is the objective of this study.

Methods

The authors designed a system for analyzing narrative clinical documents to extract from them medication occurrences and medication-related information. The system also attempts to deduce medications not covered by the dictionaries used.

Results

Results provided by the system were evaluated within the framework of the I2B2 NLP challenge held in 2009. The system achieved an F-measure of 0.78 and ranked 7th out of 20 participating teams (the highest F-measure was 0.86). The system provided good results for the annotation and extraction of medication names, their frequency, dosage and mode of administration (F-measure over 0.81), while information on duration and reasons is poorly annotated and extracted (F-measure 0.36 and 0.29, respectively). The performance of the system was stable between the training and test sets. 相似文献

19.

Predefined headings in a multiprofessional electronic health record system

Annika Terner Helena Lindstedt Karin Sonnander 《J Am Med Inform Assoc》2012,19(6):1032-1038

相似文献

20.

A flexible framework for deriving assertions from electronic medical records

Kirk Roberts Sanda M Harabagiu 《J Am Med Inform Assoc》2011,18(5):568-573

Objective

This paper describes natural-language-processing techniques for two tasks: identification of medical concepts in clinical text, and classification of assertions, which indicate the existence, absence, or uncertainty of a medical problem. Because so many resources are available for processing clinical texts, there is interest in developing a framework in which features derived from these resources can be optimally selected for the two tasks of interest.

Materials and methods

The authors used two machine-learning (ML) classifiers: support vector machines (SVMs) and conditional random fields (CRFs). Because SVMs and CRFs can operate on a large set of features extracted from both clinical texts and external resources, the authors address the following research question: Which features need to be selected for obtaining optimal results? To this end, the authors devise feature-selection techniques which greatly reduce the amount of manual experimentation and improve performance.

Results

The authors evaluated their approaches on the 2010 i2b2/VA challenge data. Concept extraction achieves 79.59 micro F-measure. Assertion classification achieves 93.94 micro F-measure.

Discussion

Approaching medical concept extraction and assertion classification through ML-based techniques has the advantage of easily adapting to new data sets and new medical informatics tasks. However, ML-based techniques perform best when optimal features are selected. By devising promising feature-selection techniques, the authors obtain results that outperform the current state of the art.

Conclusion

This paper presents two ML-based approaches for processing language in the clinical texts evaluated in the 2010 i2b2/VA challenge. By using novel feature-selection methods, the techniques presented in this paper are unique among the i2b2 participants. 相似文献