期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Semantic Classification of Diseases in Discharge Summaries Using a Context-aware Rule-based Classifier

Illés Solt Domonkos Tikk Zsolt T. Kardkovács 《J Am Med Inform Assoc》2009,16(4):580-584

Objective

Automated and disease-specific classification of textual clinical discharge summaries is of great importance in human life science, as it helps physicians to make medical studies by providing statistically relevant data for analysis. This can be further facilitated if, at the labeling of discharge summaries, semantic labels are also extracted from text, such as whether a given disease is present, absent, questionable in a patient, or is unmentioned in the document. The authors present a classification technique that successfully solves the semantic classification task.

Design

The authors introduce a context-aware rule-based semantic classification technique for use on clinical discharge summaries. The classification is performed in subsequent steps. First, some misleading parts are removed from the text; then the text is partitioned into positive, negative, and uncertain context segments, then a sequence of binary classifiers is applied to assign the appropriate semantic labels.

Measurement

For evaluation the authors used the documents of the i2b2 Obesity Challenge and adopted its evaluation measures: F₁-macro and F₁-micro for measurements.

Results

On the two subtasks of the Obesity Challenge (textual and intuitive classification) the system performed very well, and achieved a F₁-macro = 0.80 for the textual and F₁-macro = 0.67 for the intuitive tasks, and obtained second place at the textual and first place at the intuitive subtasks of the challenge.

Conclusions

The authors show in the paper that a simple rule-based classifier can tackle the semantic classification task more successfully than machine learning techniques, if the training data are limited and some semantic labels are very sparse. 相似文献

2.

A Rule-based Approach for Identifying Obesity and Its Comorbidities in Medical Discharge Summaries

Ninad K. Mishra David M. Cummo Jason Bonander 《J Am Med Inform Assoc》2009,16(4):576-579

Objective

Evaluate the effectiveness of a simple rule-based approach in classifying medical discharge summaries according to indicators for obesity and 15 associated co-morbidities as part of the 2008 i2b2 Obesity Challenge.

Methods

The authors applied a rule-based approach that looked for occurrences of morbidity-related keywords and identified the types of assertions in which those keywords occurred. The documents were then classified using a simple scoring algorithm based on a mapping of the assertion types to possible judgment categories.

Measurements

Results for the challenge were evaluated based on macro F-measure. We report micro and macro F-measure results for all morbidities combined and for each morbidity separately.

Results

Our rule-based approach achieved micro and macro F-measures of 0.97 and 0.77, respectively, ranking fifth out of the entries submitted by 28 teams participating in the classification task based on textual judgments and substantially outperforming the average for the challenge.

Conclusions

As shown by its ranking in the challenge results, this approach performed relatively well under conditions in which limited training data existed for some judgment categories. Further, the approach held up well in relation to more complex approaches applied to this classification task. The approach could be enhanced by the addition of expert rules to model more complex medical reasoning. 相似文献

3.

Using Empiric Semantic Correlation to Interpret Temporal Assertions in Clinical Texts

George Hripcsak Noémie Elhadad PhD Yueh-Hsia Chen MS Li Zhou BMed PhD Frances P. Morrison MD MPH 《J Am Med Inform Assoc》2009,16(2):220

Objective

To measure the uncertainty of temporal assertions like “3 weeks ago” in clinical texts.

Design

Temporal assertions extracted from narrative clinical reports were compared to facts extracted from a structured clinical database for the same patients.

Measurements

The authors correlated the assertions and the facts to determine the dependence of the uncertainty of the assertions on the semantic and lexical properties of the assertions.

Results

The observed deviation between the stated duration and actual duration averaged about 20% of the stated deviation. Linear regression revealed that assertions about events further in the past tend to be more uncertain, smaller numeric values tend to be more uncertain (1 mo v. 30 d), and round numbers tend to be more uncertain (10 versus 11 yrs).

Conclusions

The authors empirically derived semantics behind statements of duration using “ago,” and verified intuitions about how numbers are used. 相似文献

4.

A System for Classifying Disease Comorbidity Status from Medical Discharge Summaries Using Automated Hotspot and Negated Concept Detection

Kyle H. Ambert Aaron M. Cohen MD MS 《J Am Med Inform Assoc》2009,16(4):590-595

Objective

Free-text clinical reports serve as an important part of patient care management and clinical documentation of patient disease and treatment status. Free-text notes are commonplace in medical practice, but remain an under-used source of information for clinical and epidemiological research, as well as personalized medicine. The authors explore the challenges associated with automatically extracting information from clinical reports using their submission to the Integrating Informatics with Biology and the Bedside (i2b2) 2008 Natural Language Processing Obesity Challenge Task.

Design

A text mining system for classifying patient comorbidity status, based on the information contained in clinical reports. The approach of the authors incorporates a variety of automated techniques, including hot-spot filtering, negated concept identification, zero-vector filtering, weighting by inverse class-frequency, and error-correcting of output codes with linear support vector machines.

Measurements

Performance was evaluated in terms of the macroaveraged F1 measure.

Results

The automated system performed well against manual expert rule-based systems, finishing fifth in the Challenge's intuitive task, and 13^th in the textual task.

Conclusions

The system demonstrates that effective comorbidity status classification by an automated system is possible. 相似文献

5.

A Text Mining Approach to the Prediction of Disease Status from Clinical Discharge Summaries

Hui Yang 《J Am Med Inform Assoc》2009,16(4):596-600

Objective

The authors present a system developed for the Challenge in Natural Language Processing for Clinical Data—the i2b2 obesity challenge, whose aim was to automatically identify the status of obesity and 15 related co-morbidities in patients using their clinical discharge summaries. The challenge consisted of two tasks, textual and intuitive. The textual task was to identify explicit references to the diseases, whereas the intuitive task focused on the prediction of the disease status when the evidence was not explicitly asserted.

Design

The authors assembled a set of resources to lexically and semantically profile the diseases and their associated symptoms, treatments, etc. These features were explored in a hybrid text mining approach, which combined dictionary look-up, rule-based, and machine-learning methods.

Measurements

The methods were applied on a set of 507 previously unseen discharge summaries, and the predictions were evaluated against a manually prepared gold standard. The overall ranking of the participating teams was primarily based on the macro-averaged F-measure.

Results

The implemented method achieved the macro-averaged F-measure of 81% for the textual task (which was the highest achieved in the challenge) and 63% for the intuitive task (ranked 7^th out of 28 teams—the highest was 66%). The micro-averaged F-measure showed an average accuracy of 97% for textual and 96% for intuitive annotations.

Conclusions

The performance achieved was in line with the agreement between human annotators, indicating the potential of text mining for accurate and efficient prediction of disease statuses from clinical discharge summaries. 相似文献

6.

A Randomized Trial Comparing Telemedicine Case Management with Usual Care in Older, Ethnically Diverse, Medically Underserved Patients with Diabetes Mellitus: 5 Year Results of the IDEATel Study

Steven Shea Ruth S. Weinstock Jeanne A. Teresi Walter Palmas Justin Starren James J. Cimino Albert M. Lai Lesley Field Philip C. Morin Robin Goland Roberto E. Izquierdo Susana Ebner Stephanie Silver Eva Petkova Jian Kong Joseph P. Eimicke IDEATel Consortium 《J Am Med Inform Assoc》2009,16(4):446-456

Context

Telemedicine is a promising but largely unproven technology for providing case management services to patients with chronic conditions and lower access to care.

Objectives

To examine the effectiveness of a telemedicine intervention to achieve clinical management goals in older, ethnically diverse, medically underserved patients with diabetes.

Design, Setting, and Patients

A randomized controlled trial was conducted, comparing telemedicine case management to usual care, with blinded outcome evaluation, in 1,665 Medicare recipients with diabetes, aged ≥ 55 years, residing in federally designated medically underserved areas of New York State.

Interventions

Home telemedicine unit with nurse case management versus usual care.

Main Outcome Measures

The primary endpoints assessed over 5 years of follow-up were hemoglobin A1c (HgbA1c), low density lipoprotein (LDL) cholesterol, and blood pressure levels.

Results

Intention-to-treat mixed models showed that telemedicine achieved net overall reductions over five years of follow-up in the primary endpoints (HgbA1c, p = 0.001; LDL, p < 0.001; systolic and diastolic blood pressure, p = 0.024; p < 0.001). Estimated differences (95% CI) in year 5 were 0.29 (0.12, 0.46)% for HgbA1c, 3.84 (−0.08, 7.77) mg/dL for LDL cholesterol, and 4.32 (1.93, 6.72) mm Hg for systolic and 2.64 (1.53, 3.74) mm Hg for diastolic blood pressure. There were 176 deaths in the intervention group and 169 in the usual care group (hazard ratio 1.01 [0.82, 1.24]).

Conclusions

Telemedicine case management resulted in net improvements in HgbA1c, LDL-cholesterol and blood pressure levels over 5 years in medically underserved Medicare beneficiaries. Mortality was not different between the groups, although power was limited.

Trial Registration

http://clinicaltrials.gov Identifier: NCT00271739. 相似文献

7.

Automated Database Mediation Using Ontological Metadata Mappings

Luis Marenco Rixin Wang Prakash Nadkarni 《J Am Med Inform Assoc》2009,16(5):723-737

相似文献

8.

Resistance Is Futile: But It Is Slowing the Pace of EHR Adoption Nonetheless

Eric W. Ford Nir Menachemi Lori T. Peterson 《J Am Med Inform Assoc》2009,16(3):274-281

Objective

The purpose of this study is to reassess the projected rate of Electronic Health Record (EHR) diffusion and examine how the federal government's efforts to promote the use of EHR technology have influenced physicians' willingness to adopt such systems. The study recreates and extends the analyses conducted by Ford et al.¹ The two periods examined come before and after the U.S. Federal Government's concerted activity to promote EHR adoption.

Design

Meta-analysis and bass modeling are used to compare EHR diffusion rates for two distinct periods of government activity. Very low levels of government activity to promote EHR diffusion marked the first period, before 2004. In 2004, the President of the United States called for a “Universal EHR Adoption” by 2014 (10 yrs), creating the major wave of activity and increased awareness of how EHRs will impact physicians' practices.

Measurement

EHR adoption parameters—external and internal coefficients of influence—are estimated using bass diffusion models and future adoption rates are projected.

Results

Comparing the EHR adoption rates before and after 2004 (2001-2004 and 2001-2007 respectively) indicate the physicians' resistance to adoption has increased during the second period. Based on current levels of adoption, less than half the physicians working in small practices will have implemented an EHR by 2014 (47.3%).

Conclusions

The external forces driving EHR diffusion have grown in importance since 2004 relative to physicians' internal motivation to adopt such systems. Several national forces are likely contributing to the slowing pace of EHR diffusion. 相似文献

9.

Using Semantic and Structural Properties of the Unified Medical Language System to Discover Potential Terminological Relationships

Chintan O. Patel James J. Cimino 《J Am Med Inform Assoc》2009,16(3):346-353

Objective

To use the semantic and structural properties in the Unified Medical Language System (UMLS) Metathesaurus to characterize and discover potential relationships.

Design

The UMLS integrates knowledge from several biomedical terminologies. This knowledge can be used to discover implicit semantic relationships between concepts. In this paper, the authors propose a problem-independent approach for discovering potential terminological relationships that employs semantic abstraction of indirect relationship paths to perform classification and analysis of network theoretical measures such as topological overlap, preferential attachment, graph partitioning, and number of indirect paths. Using different versions of the UMLS, the authors evaluate the proposed approach's ability to predict newly added relationships.

Measurements

Classification accuracy, precision-recall.

Results

Strong discriminative characteristics were observed with a semantic abstraction based classifier (classification accuracy of 91%), the average number of indirect paths, preferential attachment, and graph partitioning to identify potential relationships. The proposed relationship prediction algorithm resulted in 56% recall in top 10 results for new relationships added to subsequent versions of the UMLS between 2005 and 2007.

Conclusions

The UMLS has sufficient knowledge to enable discovery of potential terminological relationships. 相似文献

10.

Profiling Characteristics of Internet Medical Information Users

James B. Weaver III Darren Mays Gregg Lindner Do?an Ero?lu Frederick Fridinger Jay M. Bernhardt 《J Am Med Inform Assoc》2009,16(5):714-722

Objective

The Internet's potential to bolster health promotion and disease prevention efforts has attracted considerable attention. Existing research leaves two things unclear, however: the prevalence of online health and medical information seeking and the distinguishing characteristics of individuals who seek that information.

Design

This study seeks to clarify and extend the knowledge base concerning health and medical information use online by profiling adults using Internet medical information (IMI). Secondary analysis of survey data from a large sample (n = 6,119) representative of the Atlanta, GA, area informed this investigation.

Measurements

Five survey questions were used to assess IMI use and general computer and Internet use during the 30 days before the survey was administered. Five questions were also used to assess respondents' health care system use. Several demographic characteristics were measured.

Results

Contrary to most prior research, this study found relatively low prevalence of IMI-seeking behavior. Specifically, IMI use was reported by 13.2% of all respondents (n = 6,119) and by 21.1% of respondents with Internet access (n = 3,829). Logistic regression models conducted among respondents accessing the Internet in the previous 30 days revealed that, when controlling for several sociodemographic characteristics, home computer ownership, online time per week, and health care system use are all positively linked with IMI-seeking behavior.

Conclusions

The data suggest it may be premature to embrace unilaterally the Internet as an effective asset for health promotion and disease prevention efforts that target the public. 相似文献

11.

An Empiric Modification to the Probabilistic Record Linkage Algorithm Using Frequency-Based Weight Scaling

Vivienne J. Zhu Marc J. Overhage James Egg Shaun J. Grannis 《J Am Med Inform Assoc》2009,16(5):738-745

Objective

To incorporate value-based weight scaling into the Fellegi-Sunter (F-S) maximum likelihood linkage algorithm and evaluate the performance of the modified algorithm.

Background

Because healthcare data are fragmented across many healthcare systems, record linkage is a key component of fully functional health information exchanges. Probabilistic linkage methods produce more accurate, dynamic, and robust matching results than rule-based approaches, particularly when matching patient records that lack unique identifiers. Theoretically, the relative frequency of specific data elements can enhance the F-S method, including minimizing the false-positive or false-negative matches. However, to our knowledge, no frequency-based weight scaling modification to the F-S method has been implemented and specifically evaluated using real-world clinical data.

Methods

The authors implemented a value-based weight scaling modification using an information theoretical model, and formally evaluated the effectiveness of this modification by linking 51,361 records from Indiana statewide newborn screening data to 80,089 HL7 registration messages from the Indiana Network for Patient Care, an operational health information exchange. In addition to applying the weight scaling modification to all fields, we examined the effect of selectively scaling common or uncommon field-specific values.

Results

The sensitivity, specificity, and positive predictive value for applying weight scaling to all field-specific values were 95.4, 98.8, and 99.9%, respectively. Compared with nonweight scaling, the modified F-S algorithm demonstrated a 10% increase in specificity with a 3% decrease in sensitivity.

Conclusion

By eliminating false-positive matches, the value-based weight modification can enhance the specificity of the F-S method with minimal decrease in sensitivity. 相似文献

12.

The effect of root of rhododendron on the activation of NF-κB in a chronic glomerulonephritis rat model

Jing Xionga Zhonghua Zhua Jianshe Liua Yang Wangb a 《南京医科大学学报(英文版)》2009,23(1):73-78

Objective:We have explored the role of nuclear factor kappa B(NF-κB) in the pathogenesis of chronic glomerulonephritis,and investigated the effect of rhododendron root on the activation of NF-κB.Methods:Thirty-six Wistar rats were randomly divided into three groups:a control group,a glomerulonephritis model group and a therapy group(glomerulouephritis animals treated with the root of rhododendron).Bovine serum albumin(BSA) nephritis was induced by subcutaneous immunization and daily intraperitoneal administra-tion of BSA.Twenty-four-hour urinary protein and serum creatinine values were measured,and renal pathology was assessed histologi-cally by optical microscopy and electron microscopy.NF-κB activity was determined by an electrophoretic mobility shift assay(EMSA).Results:Compared with the control rats,glomerulonephrids model rats exhibited a significant increase in both 24 h urinary protein and serum creatinine,and had abnormal renal histology.The administration of the root of rhododendron ameliorated these changes.NF-κ B activity in glomerulonephritis model group was greater than that in rhododendron-treated group,and NF-κB activity was greater in both glomerulonephritis groups than in the control group(P<0.01).Conclusion:These observations suggest that NF-κ B plays a role in the pathogenesis of chronic glomerulonephritis,and rhododendron root may attenuate renal damages by downregulating the activation of NF-kB in this model. 相似文献

13.

Evaluation of a Method to Identify and Categorize Section Headers in Clinical Documents

Joshua C. Denny Anderson Spickard III Kevin B. Johnson Neeraja B. Peterson Josh F. Peterson Randolph A. Miller 《J Am Med Inform Assoc》2009,16(6):806-815

Objective

Clinical notes, typically written in natural language, often contain substructure that divides them into sections, such as “History of Present Illness” or “Family Medical History.” The authors designed and evaluated an algorithm (“SecTag”) to identify both labeled and unlabeled (implied) note section headers in “history and physical examination” documents (“H&P notes”).

Design

The SecTag algorithm uses a combination of natural language processing techniques, word variant recognition with spelling correction, terminology-based rules, and naive Bayesian scoring methods to identify note section headers. Eleven physicians evaluated SecTag's performance on 319 randomly chosen H&P notes.

Measurements

The primary outcomes were the algorithm's recall and precision in identifying all document sections and a predefined list of twenty-nine major sections. A secondary outcome was to evaluate the algorithm's ability to recognize the correct start and end boundaries of identified sections.

Results

The SecTag algorithm identified 16,036 total sections and 7,858 major sections. Physician evaluators classified 15,329 as true positives and identified 160 sections omitted by SecTag. The recall and precision of the SecTag algorithm were 99.0 and 95.6% for all sections, 98.6 and 96.2% for major sections, and 96.6 and 86.8% for unlabeled sections. The algorithm determined the correct starting and ending text boundaries for 94.8% of labeled sections and 85.9% of unlabeled sections.

Conclusions

The SecTag algorithm accurately identified both labeled and unlabeled sections in history and physical documents. This type of algorithm may assist in natural language processing applications, such as clinical decision support systems or competency assessment for medical trainees. 相似文献

14.

A Globally Optimal k-Anonymity Method for the De-Identification of Health Data

Khaled El Emam Fida Kamal Dankar Romeo Issa Elizabeth Jonker Daniel Amyot Elise Cogo Jean-Pierre Corriveau Mark Walker Sadrul Chowdhury Regis Vaillancourt Tyson Roffey Jim Bottomley 《J Am Med Inform Assoc》2009,16(5):670-682

Background

Explicit patient consent requirements in privacy laws can have a negative impact on health research, leading to selection bias and reduced recruitment. Often legislative requirements to obtain consent are waived if the information collected or disclosed is de-identified.

Objective

The authors developed and empirically evaluated a new globally optimal de-identification algorithm that satisfies the k-anonymity criterion and that is suitable for health datasets.

Design

Authors compared OLA (Optimal Lattice Anonymization) empirically to three existing k-anonymity algorithms, Datafly, Samarati, and Incognito, on six public, hospital, and registry datasets for different values of k and suppression limits.

Measurement

Three information loss metrics were used for the comparison: precision, discernability metric, and non-uniform entropy. Each algorithm's performance speed was also evaluated.

Results

The Datafly and Samarati algorithms had higher information loss than OLA and Incognito; OLA was consistently faster than Incognito in finding the globally optimal de-identification solution.

Conclusions

For the de-identification of health datasets, OLA is an improvement on existing k-anonymity algorithms in terms of information loss and performance. 相似文献

15.

The HL7-OMG Healthcare Services Specification Project: Motivation, Methodology, and Deliverables for Enabling a Semantically Interoperable Service-oriented Architecture for Healthcare

Kensaku Kawamoto Alan Honey 《J Am Med Inform Assoc》2009,16(6):874-881

Context

The healthcare industry could achieve significant benefits through the adoption of a service-oriented architecture (SOA). The specification and adoption of standard software service interfaces will be critical to achieving these benefits.

Objective

To develop a replicable, collaborative framework for standardizing the interfaces of software services important to healthcare.

Design

Iterative, peer-reviewed development of a framework for generating interoperable service specifications that build on existing and ongoing standardization efforts. The framework was created under the auspices of the Healthcare Services Specification Project (HSSP), which was initiated in 2005 as a joint initiative between Health Level7 (HL7) and the Object Management Group (OMG). In this framework, known as the HSSP Service Specification Framework, HL7 identifies candidates for service standardization and defines normative Service Functional Models (SFMs) that specify the capabilities and conformance criteria for these services. OMG then uses these SFMs to generate technical service specifications as well as reference implementations.

Measurements

The ability of the framework to support the creation of multiple, interoperable service specifications useful for healthcare.

Results

Functional specifications have been defined through HL7 for four services: the Decision Support Service; the Entity Identification Service; the Clinical Research Filtered Query Service; and the Retrieve, Locate, and Update Service. Technical specifications and commercial implementations have been developed for two of these services within OMG. Furthermore, three additional functional specifications are being developed through HL7.

Conclusions

The HSSP Service Specification Framework provides a replicable and collaborative approach to defining standardized service specifications for healthcare. 相似文献

16.

Describing and Modeling Workflow and Information Flow in Chronic Disease Care

Kim M. Unertl Matthew B. Weinger Kevin B. Johnson Nancy M. Lorenzi 《J Am Med Inform Assoc》2009,16(6):826-836

相似文献

17.

2010 i2b2/VA challenge on concepts,assertions, and relations in clinical text

?zlem Uzuner Brett R South Shuying Shen Scott L DuVall 《J Am Med Inform Assoc》2011,18(5):552-556

The 2010 i2b2/VA Workshop on Natural Language Processing Challenges for Clinical Records presented three tasks: a concept extraction task focused on the extraction of medical concepts from patient reports; an assertion classification task focused on assigning assertion types for medical problem concepts; and a relation classification task focused on assigning relation types that hold between medical problems, tests, and treatments. i2b2 and the VA provided an annotated reference standard corpus for the three tasks. Using this reference standard, 22 systems were developed for concept extraction, 21 for assertion classification, and 16 for relation classification.These systems showed that machine learning approaches could be augmented with rule-based systems to determine concepts, assertions, and relations. Depending on the task, the rule-based systems can either provide input for machine learning or post-process the output of machine learning. Ensembles of classifiers, information from unlabeled data, and external knowledge sources can help when the training data are inadequate. 相似文献

18.

Immune responses in wild-type mice against prion proteins induced using a DNA prime-protein boost strategy

Han Y Li Y Song J Wang Y Shi Q Chen C Zhang B Guo Y Li C Han J Dong X 《Biomedical and environmental sciences : BES》2011,24(5):523-529

Objective To break immune tolerance to prion (PrP) proteins using DNA vaccines.Methods Four different human prion DNA vaccine candidates were constructed based on the pcDNA3.1 vector:PrP‐WT expressing wild‐type PrP,Ubiq‐PrP expressing PrP fused to ubiquitin,PrP‐LII expressing PrP fused to the lysosomal integral membrane protein type II lysosome‐targeting signal,and PrP‐ER expressing PrP locating the ER.Using a prime‐boost strategy,three‐doses of DNA vaccine were injected intramuscularly into Balb/c mice,fol... 相似文献

19.

Feature engineering combined with machine learning and rule-based methods for structured information extraction from narrative clinical discharge summaries

Xu Y Hong K Tsujii J Chang EI 《J Am Med Inform Assoc》2012,19(5):824-832

Objective

A system that translates narrative text in the medical domain into structured representation is in great demand. The system performs three sub-tasks: concept extraction, assertion classification, and relation identification.

Design

The overall system consists of five steps: (1) pre-processing sentences, (2) marking noun phrases (NPs) and adjective phrases (APs), (3) extracting concepts that use a dosage-unit dictionary to dynamically switch two models based on Conditional Random Fields (CRF), (4) classifying assertions based on voting of five classifiers, and (5) identifying relations using normalized sentences with a set of effective discriminating features.

Measurements

Macro-averaged and micro-averaged precision, recall and F-measure were used to evaluate results.

Results

The performance is competitive with the state-of-the-art systems with micro-averaged F-measure of 0.8489 for concept extraction, 0.9392 for assertion classification and 0.7326 for relation identification.

Conclusions

The system exploits an array of common features and achieves state-of-the-art performance. Prudent feature engineering sets the foundation of our systems. In concept extraction, we demonstrated that switching models, one of which is especially designed for telegraphic sentences, improved extraction of the treatment concept significantly. In assertion classification, a set of features derived from a rule-based classifier were proven to be effective for the classes such as conditional and possible. These classes would suffer from data scarcity in conventional machine-learning methods. In relation identification, we use two-staged architecture, the second of which applies pairwise classifiers to possible candidate classes. This architecture significantly improves performance. 相似文献

20.

Methods for Building Sense Inventories of Abbreviations in Clinical Notes

Hua Xu Peter D. Stetson Carol Friedman 《J Am Med Inform Assoc》2009,16(1):103-108

Objective

To develop methods for building corpus-specific sense inventories of abbreviations occurring in clinical documents.

Design

A corpus of internal medicine admission notes was collected and instances of each clinical abbreviation in the corpus were clustered to different sense clusters. One instance from each cluster was manually annotated to generate a final list of senses. Two clustering-based methods (Expectation Maximization—EM and Farthest First—FF) and one random sampling method for sense detection were evaluated using a set of 12 clinical abbreviations.

Measurements

The clustering-based sense detection methods were evaluated using a set of clinical abbreviations that were manually sense annotated. “Sense Completeness” and “Annotation Cost” were used to measure the performance of different methods. Clustering error rates were also reported for different clustering algorithms.

Results

A clustering-based semi-automated method was developed to build corpus-specific sense inventories for abbreviations in hospital admission notes. Evaluation demonstrated that this method could largely reduce manual annotation cost and increase the completeness of sense inventories when compared with a manual annotation method using random samples.

Conclusion

The authors developed an effective clustering-based method for building corpus-specific sense inventories for abbreviations in a clinical corpus. To the best of the authors knowledge, this is the first time clustering technologies have been used to help building sense inventories of abbreviations in clinical text. The results demonstrated that the clustering-based method performed better than the manual annotation method using random samples for the task of building sense inventories of clinical abbreviations. 相似文献