期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

DEEPEN: A negation detection system for clinical text incorporating dependency relation into NegEx

《Journal of biomedical informatics》2015

In Electronic Health Records (EHRs), much of valuable information regarding patients’ conditions is embedded in free text format. Natural language processing (NLP) techniques have been developed to extract clinical information from free text. One challenge faced in clinical NLP is that the meaning of clinical entities is heavily affected by modifiers such as negation. A negation detection algorithm, NegEx, applies a simplistic approach that has been shown to be powerful in clinical NLP. However, due to the failure to consider the contextual relationship between words within a sentence, NegEx fails to correctly capture the negation status of concepts in complex sentences. Incorrect negation assignment could cause inaccurate diagnosis of patients’ condition or contaminated study cohorts. We developed a negation algorithm called DEEPEN to decrease NegEx’s false positives by taking into account the dependency relationship between negation words and concepts within a sentence using Stanford dependency parser. The system was developed and tested using EHR data from Indiana University (IU) and it was further evaluated on Mayo Clinic dataset to assess its generalizability. The evaluation results demonstrate DEEPEN, which incorporates dependency parsing into NegEx, can reduce the number of incorrect negation assignment for patients with positive findings, and therefore improve the identification of patients with the target clinical findings in EHRs. 相似文献

2.

Development of Automated Detection of Radiology Reports Citing Adrenal Findings

Zopf JJ Langer JM Boonn WW Kim W Zafar HM 《Journal of digital imaging》2012,25(1):43-49

相似文献

3.

NegAIT: A new parser for medical text simplification using morphological,sentential and double negation

《Journal of biomedical informatics》2017

Many different text features influence text readability and content comprehension. Negation is commonly suggested as one such feature, but few general-purpose tools exist to discover negation and studies of the impact of negation on text readability are rare. In this paper, we introduce a new negation parser (NegAIT) for detecting morphological, sentential, and double negation. We evaluated the parser using a human annotated gold standard containing 500 Wikipedia sentences and achieved 95%, 89% and 67% precision with 100%, 80%, and 67% recall, respectively. We also investigate two applications of this new negation parser. First, we performed a corpus statistics study to demonstrate different negation usage in easy and difficult text. Negation usage was compared in six corpora: patient blogs (4 K sentences), Cochrane reviews (91 K sentences), PubMed abstracts (20 K sentences), clinical trial texts (48 K sentences), and English and Simple English Wikipedia articles for different medical topics (60 K and 6 K sentences). The most difficult text contained the least negation. However, when comparing negation types, difficult texts (i.e., Cochrane, PubMed, English Wikipedia and clinical trials) contained significantly (p < 0.01) more morphological negations. Second, we conducted a predictive analytics study to show the importance of negation in distinguishing between easy and difficulty text. Five binary classifiers (Naïve Bayes, SVM, decision tree, logistic regression and linear regression) were trained using only negation information. All classifiers achieved better performance than the majority baseline. The Naïve Bayes’ classifier achieved the highest accuracy at 77% (9% higher than the majority baseline). 相似文献

4.

Detecting hedge cues and their scope in biomedical text with conditional random fields

Shashank Agarwal Hong Yu 《Journal of biomedical informatics》2010,43(6):953-961

ObjectiveHedging is frequently used in both the biological literature and clinical notes to denote uncertainty or speculation. It is important for text-mining applications to detect hedge cues and their scope; otherwise, uncertain events are incorrectly identified as factual events. However, due to the complexity of language, identifying hedge cues and their scope in a sentence is not a trivial task. Our objective was to develop an algorithm that would automatically detect hedge cues and their scope in biomedical literature.MethodologyWe used conditional random fields (CRFs), a supervised machine-learning algorithm, to train models to detect hedge cue phrases and their scope in biomedical literature. The models were trained on the publicly available BioScope corpus. We evaluated the performance of the CRF models in identifying hedge cue phrases and their scope by calculating recall, precision and F1-score. We compared our models with three competitive baseline systems.ResultsOur best CRF-based model performed statistically better than the baseline systems, achieving an F1-score of 88% and 86% in detecting hedge cue phrases and their scope in biological literature and an F1-score of 93% and 90% in detecting hedge cue phrases and their scope in clinical notes.ConclusionsOur approach is robust, as it can identify hedge cues and their scope in both biological and clinical text. To benefit text-mining applications, our system is publicly available as a Java API and as an online application at http://hedgescope.askhermes.org. To our knowledge, this is the first publicly available system to detect hedge cues and their scope in biomedical literature. 相似文献

5.

Customizing clinical narratives for the electronic medical record interface using cognitive methods

Sharda P Das AK Cohen TA Patel V 《International journal of medical informatics》2006,75(5):346-368

相似文献

6.

Automatic generation of repeated patient information for tailoring clinical notes

Meng F Taira RK Bui AA Kangarloo H Churchill BM 《International journal of medical informatics》2005,74(7-8):663-673

Generating clear, readable, and accurate reports can be a time-consuming task for physicians. Clinical notes, which document patient encounters, often contain a certain set of patient information including demographics, medical history, surgical history, examination results or the current medical condition that is propagated from one clinical note to all subsequent clinical notes for the same patient. To this end, we present a system, which automatically generates this patient information for the creation of a new clinical note. We use semantic patterns and an approximate sequence matching algorithm for capturing the discourse role of sentences, which we show to be a useful feature for determining whether the sentence should be repeated. Our system is shown to perform better than a simple baseline metric using precision/recall results. We believe such a system would allow clinical notes to be more complete, timely, and accurate. 相似文献

7.

ConText: An algorithm for determining negation,experiencer, and temporal status from clinical reports

Henk Harkema John N. Dowling Tyler Thornblade Wendy W. Chapman 《Journal of biomedical informatics》2009,42(5):839-851

In this paper we describe an algorithm called ConText for determining whether clinical conditions mentioned in clinical reports are negated, hypothetical, historical, or experienced by someone other than the patient. The algorithm infers the status of a condition with regard to these properties from simple lexical clues occurring in the context of the condition. The discussion and evaluation of the algorithm presented in this paper address the questions of whether a simple surface-based approach which has been shown to work well for negation can be successfully transferred to other contextual properties of clinical conditions, and to what extent this approach is portable among different clinical report types. In our study we find that ConText obtains reasonable to good performance for negated, historical, and hypothetical conditions across all report types that contain such conditions. Conditions experienced by someone other than the patient are very rarely found in our report set. A comprehensive solution to the problem of determining whether a clinical condition is historical or recent requires knowledge above and beyond the surface clues picked up by ConText. 相似文献

8.

Evaluation of Negation and Uncertainty Detection and its Impact on Precision and Recall in Search

Andrew S. Wu Bao H. Do Jinsuh Kim Daniel L. Rubin 《Journal of digital imaging》2011,24(2):234-242

Radiology reports contain information that can be mined using a search engine for teaching, research, and quality assurance purposes. Current search engines look for exact matches to the search term, but they do not differentiate between reports in which the search term appears in a positive context (i.e., being present) from those in which the search term appears in the context of negation and uncertainty. We describe RadReportMiner, a context-aware search engine, and compare its retrieval performance with a generic search engine, Google Desktop. We created a corpus of 464 radiology reports which described at least one of five findings (appendicitis, hydronephrosis, fracture, optic neuritis, and pneumonia). Each report was classified by a radiologist as positive (finding described to be present) or negative (finding described to be absent or uncertain). The same reports were then classified by RadReportMiner and Google Desktop. RadReportMiner achieved a higher precision (81%), compared with Google Desktop (27%; p < 0.0001). RadReportMiner had a lower recall (72%) compared with Google Desktop (87%; p = 0.006). We conclude that adding negation and uncertainty identification to a word-based radiology report search engine improves the precision of search results over a search engine that does not take this information into account. Our approach may be useful to adopt into current report retrieval systems to help radiologists to more accurately search for radiology reports. 相似文献

9.

北京城区16～30个月正常幼儿语法发育状况

郝波梁卫兰王爽季成叶杨艳玲张致祥左启华 Tardif T Fletcher P 《中国心理卫生杂志》2005,19(1):25-27

目的：调查北京市城区正常幼儿语法发育水平，了解幼儿语法发育的趋势和特点，以便有针对性地对父母提供指导。方法：采用现况定量研究方法。用多阶段分层不等比例抽样方法在北京4个城区中的2个区抽取样本。用”中文早期语言与沟通发展量表”及个人背景问卷，对北京城区1056名16～30个月幼儿母亲或养护人进行面对面问卷调查。用中位数、均数描述幼儿语法发育趋势和特点，用两样本成组秩和检验比较男女童语法发育水平。结果：调查地区16～30个月龄正常幼儿会说语法表达结构中位数随月龄呈直线上升趋势。平均会说语法表达结构得分由16个月时的5分增加到24个月的67分，30个月时达91分。30个月时幼儿会说的语法表达结构已占量表表达结构总分的90％。儿童早期语法发育有较大的个体差异，月龄越小，个体差异越大。结论：女童在23个月前、男童在25个月前是语法发育的快速期和关键期。儿童早期语法发育有较大的个体差异。相似文献

10.

Assessment of commercial NLP engines for medication information extraction from dictated clinical notes

Jagannathan V Mullett CJ Arbogast JG Halbritter KA Yellapragada D Regulapati S Bandaru P 《International journal of medical informatics》2009,78(4):284-291

PURPOSE: We assessed the current state of commercial natural language processing (NLP) engines for their ability to extract medication information from textual clinical documents. METHODS: Two thousand de-identified discharge summaries and family practice notes were submitted to four commercial NLP engines with the request to extract all medication information. The four sets of returned results were combined to create a comparison standard which was validated against a manual, physician-derived gold standard created from a subset of 100 reports. Once validated, the individual vendor results for medication names, strengths, route, and frequency were compared against this automated standard with precision, recall, and F measures calculated. RESULTS: Compared with the manual, physician-derived gold standard, the automated standard was successful at accurately capturing medication names (F measure=93.2%), but performed less well with strength (85.3%) and route (80.3%), and relatively poorly with dosing frequency (48.3%). Moderate variability was seen in the strengths of the four vendors. The vendors performed better with the structured discharge summaries than with the clinic notes in an analysis comparing the two document types. CONCLUSION: Although automated extraction may serve as the foundation for a manual review process, it is not ready to automate medication lists without human intervention. 相似文献

11.

Natural language processing to extract medical problems from electronic clinical documents: performance evaluation

Meystre S Haug PJ 《Journal of biomedical informatics》2006,39(6):589-599

In this study, we evaluate the performance of a Natural Language Processing (NLP) application designed to extract medical problems from narrative text clinical documents. The documents come from a patient’s electronic medical record and medical problems are proposed for inclusion in the patient’s electronic problem list. This application has been developed to help maintain the problem list and make it more accurate, complete, and up-to-date. The NLP part of this system—analyzed in this study—uses the UMLS MetaMap Transfer (MMTx) application and a negation detection algorithm called NegEx to extract 80 different medical problems selected for their frequency of use in our institution. When using MMTx with its default data set, we measured a recall of 0.74 and a precision of 0.756. A custom data subset for MMTx was created, making it faster and significantly improving the recall to 0.896 with a non-significant reduction in precision. 相似文献

12.

Extractive text summarization system to aid data extraction from full text in systematic review development

《Journal of biomedical informatics》2016

ObjectivesExtracting data from publication reports is a standard process in systematic review (SR) development. However, the data extraction process still relies too much on manual effort which is slow, costly, and subject to human error. In this study, we developed a text summarization system aimed at enhancing productivity and reducing errors in the traditional data extraction process.MethodsWe developed a computer system that used machine learning and natural language processing approaches to automatically generate summaries of full-text scientific publications. The summaries at the sentence and fragment levels were evaluated in finding common clinical SR data elements such as sample size, group size, and PICO values. We compared the computer-generated summaries with human written summaries (title and abstract) in terms of the presence of necessary information for the data extraction as presented in the Cochrane review’s study characteristics tables.ResultsAt the sentence level, the computer-generated summaries covered more information than humans do for systematic reviews (recall 91.2% vs. 83.8%, p < 0.001). They also had a better density of relevant sentences (precision 59% vs. 39%, p < 0.001). At the fragment level, the ensemble approach combining rule-based, concept mapping, and dictionary-based methods performed better than individual methods alone, achieving an 84.7% F-measure.ConclusionComputer-generated summaries are potential alternative information sources for data extraction in systematic review development. Machine learning and natural language processing are promising approaches to the development of such an extractive summarization system. 相似文献

13.

Extensions to decision curve analysis, a novel method for evaluating diagnostic tests, prediction models and molecular markers

Andrew J Vickers Angel M Cronin Elena B Elkin Mithat Gonen 《BMC medical informatics and decision making》2008,8(1):1-17

Background

Text-based patient medical records are a vital resource in medical research. In order to preserve patient confidentiality, however, the U.S. Health Insurance Portability and Accountability Act (HIPAA) requires that protected health information (PHI) be removed from medical records before they can be disseminated. Manual de-identification of large medical record databases is prohibitively expensive, time-consuming and prone to error, necessitating automatic methods for large-scale, automated de-identification.

Methods

We describe an automated Perl-based de-identification software package that is generally usable on most free-text medical records, e.g., nursing notes, discharge summaries, X-ray reports, etc. The software uses lexical look-up tables, regular expressions, and simple heuristics to locate both HIPAA PHI, and an extended PHI set that includes doctors' names and years of dates. To develop the de-identification approach, we assembled a gold standard corpus of re-identified nursing notes with real PHI replaced by realistic surrogate information. This corpus consists of 2,434 nursing notes containing 334,000 words and a total of 1,779 instances of PHI taken from 163 randomly selected patient records. This gold standard corpus was used to refine the algorithm and measure its sensitivity. To test the algorithm on data not used in its development, we constructed a second test corpus of 1,836 nursing notes containing 296,400 words. The algorithm's false negative rate was evaluated using this test corpus.

Results

Performance evaluation of the de-identification software on the development corpus yielded an overall recall of 0.967, precision value of 0.749, and fallout value of approximately 0.002. On the test corpus, a total of 90 instances of false negatives were found, or 27 per 100,000 word count, with an estimated recall of 0.943. Only one full date and one age over 89 were missed. No patient names were missed in either corpus.

Conclusion

We have developed a pattern-matching de-identification system based on dictionary look-ups, regular expressions, and heuristics. Evaluation based on two different sets of nursing notes collected from a U.S. hospital suggests that, in terms of recall, the software out-performs a single human de-identifier (0.81) and performs at least as well as a consensus of two human de-identifiers (0.94). The system is currently tuned to de-identify PHI in nursing notes and discharge summaries but is sufficiently generalized and can be customized to handle text files of any format. Although the accuracy of the algorithm is high, it is probably insufficient to be used to publicly disseminate medical data. The open-source de-identification software and the gold standard re-identified corpus of medical records have therefore been made available to researchers via the PhysioNet website to encourage improvements in the algorithm. 相似文献

14.

Yifei He Miriam Steines Gebhard Sammer Arne Nagels Tilo Kircher Benjamin Straube 《Brain topography》2018,31(5):838-847

Language and action have been thought of as closely related. Comprehending words or phrases that are related to actions commonly activates motor and premotor areas, and this comprehension process interacts with action preparation and/or execution. However, it remains unclear whether comprehending action-related language interacts with action observation. In the current study, we examined whether the observation of tool-use gesture subjects to interaction with language. In an electroencephalography (EEG) study (n?=?20), participants were presented with video clips of an actor performing tool-use (TU, e.g., hammering with a fist) and emblematic (EM, e.g., the thumb up sign for ‘good job’) gestures accompanied by either comprehensible German (G) or incomprehensible Russian sentences (R). Participants performed a semantic judging task, evaluating whether the co-speech gestures were object- or socially-related. Behavioral results from the semantic task showed faster response for the TU versus EM gestures only in the German condition. For EEG, we found that TU elicited beta power decrease (~?20 Hz) when compared to EM gestures, however this effect was reduced when gestures were accompanied by German instead of Russian sentences. We concluded that the processing of action-related sentences might facilitate gesture observation, in the sense that motor simulation required for TU gestures, as indexed by reduced beta power, was modulated when accompanied by comprehensible German speech. Our results corroborate the functional role of the beta oscillations during perception of hand gestures, and provide novel evidence concerning language–motor interaction. 相似文献

15.

Automatic analysis of medical dialogue in the home hemodialysis domain: structure induction and summarization

Lacson RC Barzilay R Long WJ 《Journal of biomedical informatics》2006,39(5):541-555

Spoken medical dialogue is a valuable source of information for patients and caregivers. This work presents a first step towards automatic analysis and summarization of spoken medical dialogue. We first abstract a dialogue into a sequence of semantic categories using linguistic and contextual features integrated in a supervised machine-learning framework. Our model has a classification accuracy of 73%, compared to 33% achieved by a majority baseline (p<0.01). We then describe and implement a summarizer that utilizes this automatically induced structure. Our evaluation results indicate that automatically generated summaries exhibit high resemblance to summaries written by humans. In addition, task-based evaluation shows that physicians can reasonably answer questions related to patient care by looking at the automatically generated summaries alone, in contrast to the physicians' performance when they were given summaries from a na?ve summarizer (p<0.05). This work demonstrates the feasibility of automatically structuring and summarizing spoken medical dialogue. 相似文献

16.

Accuracy of frozen section diagnosis in breast cancer detection. A review of 4436 biopsies and comparison with cytodiagnosis 总被引：3，自引：0，他引：3

L Fessia B Ghiringhello R Arisio G Botta V Aimone 《Pathology, research and practice》1984,179(1):61-66

Frozen section diagnosis (FSD) given in 4436 consecutive breast biopsies performed in 5 years in a single pathology laboratory were checked against the final pathological report. In 4284 cases (96.57%) there was no difference between the FSD and the definitive diagnosis. There were 74 (1.66%) false negative reports and no false positive diagnoses. The diagnosis was deferred to paraffin sections in 78 cases (1.75% of biopsies). The predictive value for positive results was 100% and for negative results 97.5%; the specificity was 100%, the sensitivity 94.6% and the accuracy 98.3%. Minimal breast cancer, in situ (CIS) especially, was the main source of false negative reports. In non minimal invasive cancers (NMIC) FSD was correct in 99.42%. In minimal invasive cancers (MIC) FSD was correct in 80.21%, false negatives and deferred diagnosis increased to 8.79% and 10.98%. In CIS false negatives increased to 76.82% and deferred diagnoses to 12.19%. The sensitivity of fine needle aspiration, performed before biopsy in a portion of the patients, was lower than FSD in NMIC (71.39% versus 99.21%) and in MIC (41.66% versus 80.55%), identical to FSD in CIS (7.40% versus 7.40%). The value of cytodiagnosis in addressing surgery is discussed. 相似文献

17.

Evaluation of electronic discharge summaries: a comparison of documentation in electronic and handwritten discharge summaries

Callen JL Alderton M McIntosh J 《International journal of medical informatics》2008,77(9):613-620

BACKGROUND: Hospital discharge summaries have traditionally been paper-based (handwritten or dictated), and deficiencies have often been reported. On the increase is the utilisation of electronic summaries, which are considered of higher quality than paper-based summaries. However, comparisons between electronic and paper-based summaries regarding documentation deficiencies have rarely been made and there have been none in recent years. OBJECTIVES: (1) To study the hospital discharge summaries, which were either handwritten or electronic, of a population of inpatients, with regard to documentation of information required for ongoing care; and (2) to compare the electronic with the handwritten summaries concerning documentation of this information. METHODS: The discharge summaries of 245 inpatients were examined for documentation of the items: discharge date; additional diagnoses; summary of the patient's progress in hospital; investigations; discharge medications; and follow-up (instructions to the patient's general practitioner). One hundred and fifty-one (62%) discharge summaries were electronically created and 94 (38%) were handwritten. Odds ratios (ORs) with their confidence intervals (CI) were estimated to show strength of association between the electronic summary and documentation of individual study items. RESULTS: Across all items studied, the electronic summaries contained a higher number of errors and/or omissions than the handwritten ones (OR 1.74, 95% CI 1.26-2.39, p<0.05). Electronic summaries more commonly documented a summary of the patient's progress in hospital (OR 18.3, 95% CI 3.33-100, p<0.05) and less commonly recorded date of discharge and additional diagnoses (respective ORs 0.17 (95% CI 0.09-0.31, p<0.05) and 0.33 (95% CI 0.15-0.89, p<0.05). CONCLUSION: It is not necessarily the case that electronic discharge summaries are of higher quality than handwritten ones, but free text items such as summary of the patient's progress may less likely be omitted in electronic summaries. It is unknown what factors contributed to incompleteness in creating the electronic discharge summaries investigated in this study. Possible causes for deficiencies include: insufficient training; insufficient education of, and thus realisation by, doctors regarding the importance of accurate, complete discharge summaries; inadequate computer literacy; inadequate user interaction design, and insufficient integration into routine work processes. Research into these factors is recommended. This study suggests that not enough care is taken by doctors when creating discharge summaries, and that this is independent of the type of method used. The importance of the discharge summary as a chief means of transferring patient information from the hospital to the primary care provider needs to be strongly emphasised. 相似文献

18.

Automatic Classification of Ultrasound Screening Examinations of the Abdominal Aorta

Craig Morioka Frank Meng Ricky Taira James Sayre Peter Zimmerman David Ishimitsu Jimmy Huang Luyao Shen Suzie El-Saden 《Journal of digital imaging》2016,29(6):742-748

Our work facilitates the identification of veterans who may be at risk for abdominal aortic aneurysms (AAA) based on the 2007 mandate to screen all veteran patients that meet the screening criteria. The main research objective is to automatically index three clinical conditions: pertinent negative AAA, pertinent positive AAA, and visually unacceptable image exams. We developed and evaluated a ConText-based algorithm with the GATE (General Architecture for Text Engineering) development system to automatically classify 1402 ultrasound radiology reports for AAA screening. Using the results from JAPE (Java Annotation Pattern Engine) transducer rules, we developed a feature vector to classify the radiology reports with a decision table classifier. We found that ConText performed optimally on precision and recall for pertinent negative (0.99 (0.98–0.99), 0.99 (0.99–1.00)) and pertinent positive AAA detection (0.98 (0.95–1.00), 0.97 (0.92–1.00)), and respectably for determination of non-diagnostic image studies (0.85 (0.77–0.91), 0.96 (0.91–0.99)). In addition, our algorithm can determine the AAA size measurements for further characterization of abnormality. We developed and evaluated a regular expression based algorithm using GATE for determining the three contextual conditions: pertinent negative, pertinent positive, and non-diagnostic from radiology reports obtained for evaluating the presence or absence of abdominal aortic aneurysm. ConText performed very well at identifying the contextual features. Our study also discovered contextual trigger terms to detect sub-standard ultrasound image quality. Limitations of performance included unknown dictionary terms, complex sentences, and vague findings that were difficult to classify and properly code. 相似文献

19.

Quality of discharge summaries prepared by first year internal medicine residents

KJ Legault J Ostro Z Khalid P Wasi JJ You 《BMC medical education》2012,12(1):77

ABSTRACT: BACKGROUND: Patients are particularly susceptible to medical error during transitions from inpatient to outpatient care. We evaluated discharge summaries produced by incoming postgraduate year 1 (PGY-1) internal medicine residents for their completeness, accuracy, and relevance to family physicians. METHODS: Consecutive discharge summaries prepared by PGY-1 residents for patients discharged from internal medicine wards were retrospectively evaluated by two independent reviewers for presence and accuracy of essential domains described by the Joint Commission for Hospital Accreditation. Family physicians rated the relevance of a separate sample of discharge summaries on domains that family physicians deemed important in previous studies. RESULTS: Ninety discharge summaries were assessed for completeness and accuracy. Most items were completely reported with a given item missing in 5 % of summaries or fewer, with the exception of the reason for medication changes, which was missing in 15.9 % of summaries. Discharge medication lists, medication changes, and the reason for medication changes---when present---were inaccurate in 35.7 %, 29.5 %, and 37.7 % of summaries, respectively. Twenty-one family physicians reviewed 68 discharge summaries. Communication of follow-up plans for further investigations was the most frequently identified area for improvement with 27.7 % of summaries rated as insufficient. CONCLUSIONS: This study found that medication details were frequently omitted or inaccurate, and that family physicians identified lack of clarity about follow-up plans regarding further investigations and visits to other consultants as the areas requiring the most improvement. Our findings will aid in the development of educational interventions for residents. 相似文献

20.

Applying semantic-based probabilistic context-free grammar to medical language processing--a preliminary study on parsing medication sentences

Xu H AbdelRahman S Lu Y Denny JC Doan S 《Journal of biomedical informatics》2011,44(6):1068-1075

Semantic-based sublanguage grammars have been shown to be an efficient method for medical language processing. However, given the complexity of the medical domain, parsers using such grammars inevitably encounter ambiguous sentences, which could be interpreted by different groups of production rules and consequently result in two or more parse trees. One possible solution, which has not been extensively explored previously, is to augment productions in medical sublanguage grammars with probabilities to resolve the ambiguity. In this study, we associated probabilities with production rules in a semantic-based grammar for medication findings and evaluated its performance on reducing parsing ambiguity. Using the existing data set from 2009 i2b2 NLP (Natural Language Processing) challenge for medication extraction, we developed a semantic-based CFG (Context Free Grammar) for parsing medication sentences and manually created a Treebank of 4564 medication sentences from discharge summaries. Using the Treebank, we derived a semantic-based PCFG (Probabilistic Context Free Grammar) for parsing medication sentences. Our evaluation using a 10-fold cross validation showed that the PCFG parser dramatically improved parsing performance when compared to the CFG parser. 相似文献