Similar Documents
20 similar documents found (search time: 864 ms)
1.

Objective

To determine how well statistical text mining (STM) models can identify falls within clinical text associated with an ambulatory encounter.

Materials and Methods

2241 patients were selected with a fall-related ICD-9-CM E-code or matched injury diagnosis code while being treated as an outpatient at one of four sites within the Veterans Health Administration. All clinical documents within a 48-h window of the recorded E-code or injury diagnosis code for each patient were obtained (n=26 010; 611 distinct document titles) and annotated for falls. Logistic regression, support vector machine, and cost-sensitive support vector machine (SVM-cost) models were trained on a stratified sample of 70% of documents from one location (dataset Atrain) and then applied to the remaining unseen documents (datasets Atest–D).

Results

All three STM models obtained area under the receiver operating characteristic curve (AUC) scores above 0.950 on the four test datasets (Atest–D). The SVM-cost model obtained the highest AUC scores, ranging from 0.953 to 0.978. The SVM-cost model also achieved F-measure values ranging from 0.745 to 0.853, sensitivity from 0.890 to 0.931, and specificity from 0.877 to 0.944.

Discussion

The STM models performed well across a large heterogeneous collection of document titles. In addition, the models generalized across sites, including a traditionally bilingual site with distinctly different grammatical patterns.

Conclusions

The results of this study suggest that STM-based models have the potential to improve surveillance of falls. Furthermore, the encouraging evidence that STM is a robust technique for mining clinical documents bodes well for other surveillance-related topics.
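The AUC metric used throughout these evaluations can be computed directly from classifier scores via the rank-sum (Mann-Whitney) formulation. A minimal sketch in pure Python; the labels and scores below are invented for illustration, not the study's data:

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank-sum formulation: the
    probability that a randomly chosen positive is scored above a
    randomly chosen negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical fall-detection scores for six documents (1 = fall documented).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.92, 0.85, 0.40, 0.55, 0.30, 0.10]
print(auc(labels, scores))  # 8/9: one negative outranks one positive
```

The quadratic pair loop is fine at this scale; production code would sort once and use ranks.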

2.

Objectives

Natural language processing (NLP) applications typically use regular expressions that have been developed manually by human experts. Our goal is to automate both the creation and utilization of regular expressions in text classification.

Methods

We designed a novel regular expression discovery (RED) algorithm and implemented two text classifiers based on RED. The RED+ALIGN classifier combines RED with an alignment algorithm, and RED+SVM combines RED with a support vector machine (SVM) classifier. Two clinical datasets were used for testing and evaluation: the SMOKE dataset, containing 1091 text snippets describing smoking status; and the PAIN dataset, containing 702 snippets describing pain status. We performed 10-fold cross-validation to calculate accuracy, precision, recall, and F-measure metrics. In the evaluation, an SVM classifier was trained as the control.

Results

The two RED classifiers achieved 80.9–83.0% overall accuracy on the two datasets, 1.3–3% higher than the SVM's accuracy (p<0.001). Similarly, small but consistent improvements were observed in precision, recall, and F-measure when the RED classifiers were compared with SVM alone. More significantly, RED+ALIGN correctly classified many instances that were misclassified by the SVM classifier (8.1–10.3% of the total instances and 43.8–53.0% of the SVM's misclassifications).

Conclusions

Machine-generated regular expressions can be used effectively in clinical text classification. The regular-expression-based classifier can be combined with other classifiers, such as SVM, to improve classification performance.
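A toy sketch of the general idea behind machine-generated regular expressions (not the authors' RED algorithm): generalize positive training snippets into candidate patterns, keep only patterns that are precise on the training set, and classify by matching. The snippets and the digit-generalization rule are assumptions for illustration:

```python
import re

def discover_regexes(snippets, labels, min_precision=1.0):
    """Generalize each positive snippet into a candidate pattern
    (runs of digits -> \\d+), then keep patterns whose precision on
    the training snippets meets the threshold."""
    candidates = set()
    for text, y in zip(snippets, labels):
        if y == 1:
            pat = re.escape(text.lower())
            pat = re.sub(r'\d+', r'\\d+', pat)  # generalize numbers
            candidates.add(pat)
    kept = []
    for pat in candidates:
        hits = [y for text, y in zip(snippets, labels)
                if re.search(pat, text.lower())]
        if hits and sum(hits) / len(hits) >= min_precision:
            kept.append(pat)
    return kept

def classify(text, patterns):
    """Positive iff any discovered pattern matches."""
    return int(any(re.search(p, text.lower()) for p in patterns))

# Invented smoking-status snippets (1 = current smoker).
snips = ["smokes 2 packs per day", "denies smoking", "smokes 1 pack per day"]
ys    = [1, 0, 1]
pats = discover_regexes(snips, ys)
print(classify("smokes 3 packs per day", pats))  # 1
print(classify("denies smoking", pats))          # 0
```

A real discovery algorithm would also generalize word classes and merge overlapping patterns; this sketch only shows the generate-then-filter loop.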

3.

Objective

Predicting patient outcomes from genome-wide measurements holds significant promise for improving clinical care. The large number of measurements (eg, single nucleotide polymorphisms (SNPs)), however, makes this task computationally challenging. This paper evaluates the performance of an algorithm that predicts patient outcomes from genome-wide data by efficiently model averaging over an exponential number of naive Bayes (NB) models.

Design

This model-averaged naive Bayes (MANB) method was applied to predict late onset Alzheimer's disease in 1411 individuals who each had 312 318 SNP measurements available as genome-wide predictive features. Its performance was compared to that of a naive Bayes algorithm without feature selection (NB) and with feature selection (FSNB).

Measurement

Performance of each algorithm was measured in terms of area under the ROC curve (AUC), calibration, and run time.

Results

The training time of MANB (16.1 s) was comparable to that of NB (15.6 s), while FSNB (1684.2 s) was considerably slower. Each of the three algorithms required less than 0.1 s to predict the outcome of a test case. MANB had an AUC of 0.72, significantly better than the AUC of 0.59 for NB (p<0.00001) but not significantly different from the AUC of 0.71 for FSNB. MANB was better calibrated than NB, and FSNB was better still. A limitation was that only one dataset and two comparison algorithms were included in this study.

Conclusion

MANB performed comparatively well in predicting a clinical outcome from a high-dimensional genome-wide dataset. These results provide support for including MANB in the methods used to predict outcomes from large, genome-wide datasets.
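A minimal Bernoulli naive Bayes over binary (SNP-like) features illustrates the base model that MANB averages over. This sketch is plain NB with Laplace smoothing, not the model-averaging method itself, and the toy data are invented:

```python
import math

def train_nb(X, y, alpha=1.0):
    """Bernoulli naive Bayes with Laplace smoothing over binary features."""
    n, d = len(X), len(X[0])
    classes = sorted(set(y))
    prior = {c: y.count(c) / n for c in classes}
    theta = {}
    for c in classes:
        rows = [x for x, yi in zip(X, y) if yi == c]
        theta[c] = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                    for j in range(d)]
    return prior, theta

def predict_proba(x, prior, theta):
    """Posterior P(class | x), computed in log space for stability."""
    logp = {}
    for c, p in prior.items():
        lp = math.log(p)
        for xj, tj in zip(x, theta[c]):
            lp += math.log(tj if xj else 1.0 - tj)
        logp[c] = lp
    m = max(logp.values())
    z = sum(math.exp(v - m) for v in logp.values())
    return {c: math.exp(v - m) / z for c, v in logp.items()}

# Toy binary features (e.g. minor-allele presence); labels are invented.
X = [[1, 0, 1], [1, 1, 1], [0, 0, 0], [0, 1, 0]]
y = [1, 1, 0, 0]
prior, theta = train_nb(X, y)
post = predict_proba([1, 0, 1], prior, theta)
print(post[1] > post[0])  # True
```

MANB's contribution is averaging predictions over the exponential family of feature subsets in closed form; the per-feature independence structure above is what makes that averaging tractable.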

4.

Introduction

Clinical databases require accurate entity resolution (ER). One approach is to use algorithms that assign questionable cases to manual review. Few studies have compared the performance of common algorithms for such a task, and previous work has been limited by a lack of objective methods for setting algorithm parameters. We compared the performance of common ER algorithms using algorithmic optimization rather than manual parameter tuning, evaluating both two-threshold classification (match/manual review/non-match) and single-threshold classification (match/non-match).

Methods

We manually reviewed 20 000 randomly selected, potential duplicate record-pairs to identify matches (10 000 training set, 10 000 test set). We evaluated the probabilistic expectation maximization, simple deterministic and fuzzy inference engine (FIE) algorithms. We used particle swarm to optimize algorithm parameters for a single and for two thresholds. We ran 10 iterations of optimization using the training set and report averaged performance against the test set.

Results

The overall estimated duplicate rate was 6%. The FIE and simple deterministic algorithms allowed a smaller manual review set than the probabilistic method (FIE 1.9%, simple deterministic 2.5%, probabilistic 3.6%; p<0.001). For a single threshold, the simple deterministic algorithm performed better than the probabilistic method (positive predictive value 0.956 vs 0.887, sensitivity 0.985 vs 0.887; p<0.001). ER with FIE classified 98.1% of record-pairs correctly (1/10 000 error rate), assigning the remainder to manual review.

Conclusions

Optimized deterministic algorithms outperform the probabilistic method. There is a strong case for considering optimized deterministic methods for ER.
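The two-threshold decision rule (match/manual review/non-match) can be sketched as follows. Here an exhaustive grid search stands in for the particle swarm optimization used in the study, and the similarity scores are invented:

```python
import itertools

def classify_pair(score, t_low, t_high):
    """Two-threshold decision: match at or above t_high, non-match below
    t_low, manual review in between."""
    if score >= t_high:
        return "match"
    if score < t_low:
        return "non-match"
    return "review"

def tune_thresholds(scores, labels, grid):
    """Exhaustive grid search (a simple stand-in for particle swarm
    optimization): prefer thresholds with no hard errors, then minimize
    the share of pairs sent to manual review."""
    best = None
    for t_low, t_high in itertools.product(grid, grid):
        if t_low > t_high:
            continue
        decisions = [classify_pair(s, t_low, t_high) for s in scores]
        errors = sum(1 for d, y in zip(decisions, labels)
                     if (d == "match" and y == 0)
                     or (d == "non-match" and y == 1))
        review = sum(d == "review" for d in decisions) / len(decisions)
        key = (errors, review)
        if best is None or key < best[0]:
            best = (key, (t_low, t_high))
    return best[1]

# Invented similarity scores for labeled record pairs (1 = true duplicate).
scores = [0.95, 0.90, 0.60, 0.40, 0.10, 0.05]
labels = [1,    1,    1,    0,    0,    0]
t_low, t_high = tune_thresholds(scores, labels, [i / 10 for i in range(11)])
print(classify_pair(0.97, t_low, t_high))  # match
```

On real data the objective would also weight review cost against error cost, which is where the optimizer earns its keep.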

5.
6.

Objective

To assess intensive care unit (ICU) nurses' acceptance of electronic health record (EHR) technology and examine the relationship between EHR design, implementation factors, and nurse acceptance.

Design

The authors analyzed data from two cross-sectional survey questionnaires distributed to nurses working in four ICUs at a northeastern US regional medical center, 3 months and 12 months after EHR implementation.

Measurements

Survey items were drawn from established instruments used to measure EHR acceptance and usability, and the usefulness of three EHR functionalities, specifically computerized provider order entry (CPOE), the electronic medication administration record (eMAR), and a nursing documentation flowsheet.

Results

On average, ICU nurses were more accepting of the EHR at 12 months as compared to 3 months. They also perceived the EHR as being more usable and both CPOE and eMAR as being more useful. Multivariate hierarchical modeling indicated that EHR usability and CPOE usefulness predicted EHR acceptance at both 3 and 12 months. At 3 months postimplementation, eMAR usefulness predicted EHR acceptance, but its effect disappeared at 12 months. Nursing flowsheet usefulness predicted EHR acceptance but only at 12 months.

Conclusion

As the push toward implementation of EHR technology continues, more hospitals will face issues related to acceptance of EHR technology by staff caring for critically ill patients. This research suggests that factors related to technology design have strong effects on acceptance, even 1 year following the EHR implementation.

7.

Objective

To examine the feasibility of using statistical text classification to automatically identify health information technology (HIT) incidents in the US Food and Drug Administration (FDA) Manufacturer and User Facility Device Experience (MAUDE) database.

Design

We used a subset of 570 272 incidents including 1534 HIT incidents reported to MAUDE between 1 January 2008 and 1 July 2010. Text classifiers using regularized logistic regression were evaluated with both ‘balanced’ (50% HIT) and ‘stratified’ (0.297% HIT) datasets for training, validation, and testing. Dataset preparation, feature extraction, feature selection, cross-validation, classification, performance evaluation, and error analysis were performed iteratively to further improve the classifiers. Feature-selection techniques such as removing short words and stop words, stemming, lemmatization, and principal component analysis were examined.

Measurements

κ statistic, F1 score, precision and recall.

Results

Classification performance was similar on both the stratified (0.954 F1 score) and balanced (0.995 F1 score) datasets. Stemming was the most effective technique, reducing the feature set size to 79% while maintaining comparable performance. Training with balanced datasets improved recall (0.989) but reduced precision (0.165).

Conclusions

Statistical text classification appears to be a feasible method for identifying HIT reports within large databases of incidents. Automated identification should enable more HIT problems to be detected, analyzed, and addressed in a timely manner. Semi-supervised learning may be necessary when applying machine learning to big data analysis of patient safety incidents and requires further investigation.
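The measurements reported above (precision, recall, F1, and the κ statistic) all derive from the binary confusion matrix. A self-contained sketch with hypothetical predictions:

```python
def classification_metrics(gold, pred):
    """Precision, recall, and F1 for the positive class, plus Cohen's
    kappa, computed from paired gold/predicted binary labels."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    tn = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 0)
    n = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    observed = (tp + tn) / n  # raw agreement
    # chance agreement from the marginal label frequencies
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (observed - expected) / (1 - expected) if expected != 1 else 1.0
    return {"precision": precision, "recall": recall, "f1": f1, "kappa": kappa}

# Hypothetical HIT-incident predictions against gold labels.
gold = [1, 1, 1, 0, 0, 0, 0, 0]
pred = [1, 1, 0, 1, 0, 0, 0, 0]
m = classification_metrics(gold, pred)
print(round(m["f1"], 3))  # 0.667
```

With a class prevalence as skewed as 0.297%, κ and F1 are far more informative than raw accuracy, which is why the abstract reports them.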

8.

Objective

To determine whether statistical and machine-learning methods, when applied to electronic health record (EHR) access data, could help identify suspicious (ie, potentially inappropriate) access to EHRs.

Methods

From EHR access logs and other organizational data collected over a 2-month period, the authors extracted 26 features likely to be useful in detecting suspicious accesses. Selected events were marked as either suspicious or appropriate by privacy officers, and served as the gold standard set for model evaluation. The authors trained logistic regression (LR) and support vector machine (SVM) models on 10-fold cross-validation sets of 1291 labeled events. The authors evaluated the sensitivity of final models on an external set of 58 events that were identified as truly inappropriate and investigated independently from this study using standard operating procedures.

Results

The area under the receiver operating characteristic curve of the models on the whole data set of 1291 events was 0.91 for LR, and 0.95 for SVM. The sensitivity of the baseline model on this set was 0.8. When the final models were evaluated on the set of 58 investigated events, all of which were determined as truly inappropriate, the sensitivity was 0 for the baseline method, 0.76 for LR, and 0.79 for SVM.

Limitations

The LR and SVM models may not generalize because of interinstitutional differences in organizational structures, applications, and workflows. Nevertheless, our approach for constructing the models using statistical and machine-learning techniques can be generalized. An important limitation is the relatively small sample used for the training set due to the effort required for its construction.

Conclusion

The results suggest that statistical and machine-learning methods can play an important role in helping privacy officers detect suspicious accesses to EHRs.

9.

Objective

An accurate computable representation of food and drug allergy is essential for safe healthcare. Our goal was to develop a high-performance, easily maintained algorithm to identify medication and food allergies and sensitivities from unstructured allergy entries in electronic health record (EHR) systems.

Materials and methods

An algorithm was developed in Transact-SQL to identify ingredients to which patients had allergies in a perioperative information management system. The algorithm used RxNorm and natural language processing techniques developed on a training set of 24 599 entries from 9445 records. Accuracy, specificity, precision, recall, and F-measure were determined for the training dataset and repeated for the testing dataset (24 857 entries from 9430 records).

Results

Accuracy, precision, recall, and F-measure for medication allergy matches were all above 98% in the training dataset and above 97% in the testing dataset for all allergy entries. Corresponding values for food allergy matches were above 97% and above 93%, respectively. Specificities of the algorithm were 90.3% and 85.0% for drug matches and 100% and 88.9% for food matches in the training and testing datasets, respectively.

Discussion

The algorithm had high performance for identification of medication and food allergies. Maintenance is practical, as updates are managed through upload of new RxNorm versions and additions to companion database tables. However, direct entry of codified allergy information by providers (through autocompleters or drop-down lists) is still preferable to post hoc encoding of the data. The data tables used in the algorithm are available for download.

Conclusions

A high-performing, easily maintained algorithm can successfully identify medication and food allergies from free-text entries in EHR systems.
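The matching step of such an algorithm can be sketched as normalization plus lexicon lookup. This toy version uses a tiny invented ingredient list in place of RxNorm and a handful of assumed noise words:

```python
import re

# A tiny, invented ingredient lexicon standing in for RxNorm ingredient names.
INGREDIENTS = {"penicillin", "sulfamethoxazole", "peanut", "latex", "codeine"}

def normalize(entry):
    """Lowercase, strip punctuation, and drop common reaction qualifiers
    so that free-text allergy entries reduce to candidate ingredient tokens.
    The noise-word list is an illustrative assumption."""
    entry = re.sub(r"[^a-z\s]", " ", entry.lower())
    noise = {"allergy", "allergies", "allergic", "to", "causes",
             "rash", "hives", "nka", "no", "known"}
    return [t for t in entry.split() if t and t not in noise]

def match_allergies(entry):
    """Return the lexicon ingredients mentioned in a free-text entry."""
    return sorted(t for t in normalize(entry) if t in INGREDIENTS)

print(match_allergies("PENICILLIN - causes rash/hives"))  # ['penicillin']
print(match_allergies("No known allergies"))              # []
```

The real algorithm additionally handles brand-to-ingredient mapping and multi-word ingredient names via RxNorm relationships, which simple token lookup cannot.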

10.

Objective

Despite at least 40 years of promising empirical performance, very few clinical natural language processing (NLP) or information extraction systems currently contribute to medical science or care. The authors address this gap by reducing the need for custom software and rules development with a graphical user interface-driven, highly generalizable approach to concept-level retrieval.

Materials and methods

A ‘learn by example’ approach combines features derived from open-source NLP pipelines with open-source machine learning classifiers to automatically and iteratively evaluate top-performing configurations. The Fourth i2b2/VA Shared Task Challenge's concept extraction task provided the data sets and metrics used to evaluate performance.

Results

Top F-measure scores for each of the tasks were medical problems (0.83), treatments (0.82), and tests (0.83). Recall lagged precision in all experiments. Precision was near or above 0.90 in all tasks.

Discussion

With no customization for the tasks and less than 5 min of end-user time to configure and launch each experiment, the average F-measure was 0.83, one point behind the mean F-measure of the 22 entrants in the competition. Strong precision scores indicate the potential of applying the approach for more specific clinical information extraction tasks. There was not one best configuration, supporting an iterative approach to model creation.

Conclusion

Acceptable levels of performance can be achieved using fully automated and generalizable approaches to concept-level information extraction. The described implementation and related documentation are available for download.

11.

Objective

To quantify and compare the time doctors and nurses spent on direct patient care, medication-related tasks, and interactions before and after electronic medication management system (eMMS) introduction.

Methods

Controlled pre–post, time and motion study of 129 doctors and nurses for 633.2 h on four wards in a 400-bed hospital in Sydney, Australia. We measured changes in proportions of time on tasks and interactions by period, intervention/control group, and profession.

Results

eMMS was associated with no significant change in proportions of time spent on direct care or medication-related tasks relative to control wards. In the post-period control ward, doctors spent 19.7% (2 h/10 h shift) of their time on direct care and 7.4% (44.4 min/10 h shift) on medication tasks, compared to intervention ward doctors (25.7% (2.6 h/shift; p=0.08) and 8.5% (51 min/shift; p=0.40), respectively). Control ward nurses in the post-period spent 22.1% (1.9 h/8.5 h shift) of their time on direct care and 23.7% on medication tasks compared to intervention ward nurses (26.1% (2.2 h/shift; p=0.23) and 22.6% (1.9 h/shift; p=0.28), respectively). We found intervention ward doctors spent less time alone (p=0.0003) and more time with other doctors (p=0.003) and patients (p=0.009). Nurses on the intervention wards spent less time with doctors following eMMS introduction (p=0.0001).

Conclusions

eMMS introduction did not result in redistribution of time away from direct care or towards medication tasks. Work patterns observed on these intervention wards were associated with previously reported significant reductions in prescribing error rates relative to the control wards.

12.

Objective

To evaluate the safety of shilajit after 91 days of repeated administration at different dose levels in rats.

Methods

Albino rats were divided into four groups: group I received vehicle, while groups II, III, and IV received 500, 2 500, and 5 000 mg/kg of shilajit, respectively. At the end of the study, animals were sacrificed and subjected to histopathology, and iron was estimated by flame atomic absorption spectroscopy and graphite furnace.

Results

There were no significant changes in the iron levels of treated groups compared with control, except in the liver at 5 000 mg/kg. Histological slides of all organs appeared normal, except for negligible changes in the liver and intestine at the highest dose of shilajit. The weights of all organs were normal compared with control.

Conclusions

The results suggest that black shilajit, an Ayurvedic formulation, is safe for long-term use as a dietary supplement for a number of disorders, such as iron-deficiency anaemia.

13.

Objective

A comprehensive and machine-understandable cancer drug–side effect (drug–SE) relationship knowledge base is important for in silico cancer drug target discovery, drug repurposing, and toxicity prediction, and for personalized risk–benefit decisions by cancer patients. While US Food and Drug Administration (FDA) drug labels capture well-known cancer drug SE information, much cancer drug SE knowledge remains buried in the published biomedical literature. We present a relationship extraction approach to extract cancer drug–SE pairs from the literature.

Data and methods

We used 21 354 075 MEDLINE records as the text corpus. We extracted drug–SE co-occurrence pairs using a cancer drug lexicon and a clean SE lexicon that we created. We then developed two filtering approaches to remove drug–disease treatment pairs and subsequently a ranking scheme to further prioritize filtered pairs. Finally, we analyzed relationships among SEs, gene targets, and indications.

Results

We extracted 56 602 cancer drug–SE pairs. The filtering algorithms improved the precision of extracted pairs from 0.252 at baseline to 0.426, representing a 69% improvement in precision with no decrease in recall. The ranking algorithm further prioritized filtered pairs and achieved a precision of 0.778 for top-ranked pairs. We showed that cancer drugs that share SEs tend to have overlapping gene targets and overlapping indications.

Conclusions

The relationship extraction approach is effective in extracting many cancer drug–SE pairs from the literature. This unique knowledge base, when combined with existing cancer drug SE knowledge, can facilitate drug target discovery, drug repurposing, and toxicity prediction.
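The co-occurrence extraction and treatment-pair filtering steps can be sketched as follows. The mini-lexicons and the substring cue filter are invented stand-ins for the study's curated lexicons and filtering approaches:

```python
import re

# Invented mini-lexicons; the study used a full cancer drug lexicon
# and a cleaned side-effect lexicon built from MEDLINE.
DRUGS = {"cisplatin", "imatinib"}
SIDE_EFFECTS = {"nausea", "neutropenia", "rash"}
TREATMENT_CUES = {"treat", "treatment", "therapy for", "indicated for"}

def extract_pairs(sentences):
    """Emit (drug, side-effect) co-occurrence pairs, skipping sentences
    that look like drug-disease *treatment* statements rather than
    adverse-event reports (a crude stand-in for the paper's filters)."""
    pairs = set()
    for s in sentences:
        low = s.lower()
        if any(cue in low for cue in TREATMENT_CUES):
            continue
        words = set(re.findall(r"[a-z]+", low))
        for d in DRUGS & words:
            for se in SIDE_EFFECTS & words:
                pairs.add((d, se))
    return pairs

sents = [
    "Cisplatin was associated with severe nausea in 40% of patients.",
    "Imatinib is indicated for treatment of chronic rash conditions.",  # filtered out
]
print(sorted(extract_pairs(sents)))  # [('cisplatin', 'nausea')]
```

The paper's ranking step would then order surviving pairs by corpus-level statistics rather than accepting every co-occurrence equally.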

14.

Objective

The goal of this work was to evaluate machine learning methods, binary classification and sequence labeling, for medication–attribute linkage detection in two clinical corpora.

Data and methods

We double annotated 3000 clinical trial announcements (CTA) and 1655 clinical notes (CN) for medication named entities and their attributes. A binary support vector machine (SVM) classification method with parsimonious feature sets, and a conditional random fields (CRF)-based multi-layered sequence labeling (MLSL) model were proposed to identify the linkages between the entities and their corresponding attributes. We evaluated the system's performance against the human-generated gold standard.

Results

The experiments showed that the two machine learning approaches performed statistically significantly better than the baseline rule-based approach. The binary SVM classification achieved 0.94 F-measure with individual tokens as features. The SVM model trained on a parsimonious feature set achieved 0.81 F-measure for CN and 0.87 for CTA. The CRF MLSL method achieved 0.80 F-measure on both corpora.

Discussion and conclusions

We compared the novel MLSL method with a binary classification and a rule-based method. The MLSL method performed statistically significantly better than the rule-based method. However, the SVM-based binary classification method was statistically significantly better than the MLSL method for both the CTA and CN corpora. Using parsimonious feature sets, both the SVM-based binary classification and CRF-based MLSL methods achieved high performance in detecting medication name and attribute linkages in CTA and CN.

15.

Background

Prognostic studies of breast cancer survivability have been aided by machine learning algorithms, which can predict the survival of a particular patient based on historical patient data. However, it is not easy to collect labeled patient records. It takes at least 5 years to label a patient record as ‘survived’ or ‘not survived’. Unguided trials of numerous types of oncology therapies are also very expensive. Confidentiality agreements with doctors and patients are also required to obtain labeled patient records.

Proposed method

These difficulties in the collection of labeled patient data have led researchers to consider semi-supervised learning (SSL), a recent machine learning approach, because it can also exploit unlabeled patient data, which are relatively easier to collect. SSL is therefore regarded as an algorithm that could circumvent these known difficulties. Even with SSL, however, more labeled data still lead to better prediction. To compensate for the lack of labeled patient data, we consider assigning virtual labels, or ‘pseudo-labels,’ to unlabeled patient data and treating them as if they were labeled.

Results

Our proposed algorithm, ‘SSL Co-training’, implements this concept based on SSL. SSL Co-training was tested using the Surveillance, Epidemiology, and End Results database for breast cancer, delivering a mean accuracy of 76% and a mean area under the curve of 0.81.
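The pseudo-label idea can be illustrated with plain self-training on 1-D toy data. This is a simplified stand-in for SSL Co-training, using a nearest-centroid learner and a confidence margin that are both assumptions:

```python
def centroid_predict(x, centroids):
    """Assign x to the class with the nearest centroid."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))

def self_train(labeled, unlabeled, rounds=3, margin=1.0):
    """Self-training with pseudo-labels: repeatedly fit class centroids
    on the labeled pool, pseudo-label the unlabeled points that sit
    clearly nearer one centroid, and fold them into the pool."""
    pool = list(labeled)
    remaining = list(unlabeled)
    centroids = {}
    for _ in range(rounds):
        for c in {y for _, y in pool}:
            xs = [x for x, y in pool if y == c]
            centroids[c] = sum(xs) / len(xs)
        confident = []
        for x in remaining:
            dists = sorted(abs(x - m) for m in centroids.values())
            if dists[1] - dists[0] >= margin:  # clear winner only
                confident.append((x, centroid_predict(x, centroids)))
        if not confident:
            break
        pool.extend(confident)
        taken = {xx for xx, _ in confident}
        remaining = [x for x in remaining if x not in taken]
    return pool, centroids

# Toy 1-D data: two clusters; only two points carry real labels.
labeled = [(0.0, "not survived"), (10.0, "survived")]
unlabeled = [0.5, 1.0, 9.0, 9.5]
pool, centroids = self_train(labeled, unlabeled)
print(centroid_predict(9.2, centroids))  # survived
```

Co-training proper uses two classifiers with complementary feature views labeling data for each other; the confidence-gated loop above is the shared core.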

16.

Objective

To improve identification of pertussis cases by developing a decision model that incorporates recent, local, population-level disease incidence.

Design

Retrospective cohort analysis of 443 infants tested for pertussis (2003–7).

Measurements

Three models (based on clinical data only, local disease incidence only, and a combination of clinical data and local disease incidence) to predict pertussis positivity were created with demographic, historical, physical exam, and state-wide pertussis data. Models were compared using sensitivity, specificity, area under the receiver-operating characteristics (ROC) curve (AUC), and related metrics.

Results

The model using only clinical data included cyanosis, cough for 1 week, and absence of fever, and was 89% sensitive (95% CI 79 to 99) and 27% specific (95% CI 22 to 32), with an area under the ROC curve of 0.80. The model using only local incidence data performed best when the proportion of positive pertussis cultures in the region exceeded 10% in the 8–14 days prior to the infant's associated visit, achieving 13% sensitivity, 53% specificity, and AUC 0.65. The combined model, built with patient-derived variables and local incidence data, included cyanosis, cough for 1 week, and the variable indicating that the proportion of positive pertussis cultures in the region exceeded 10% 8–14 days prior to the infant's associated visit. This model was 100% sensitive (p<0.04, 95% CI 92 to 100) and 38% specific (p<0.001, 95% CI 33 to 43), with AUC 0.82.

Conclusions

Incorporating recent, local population-level disease incidence improved the ability of a decision model to correctly identify infants with pertussis. Our findings support fostering bidirectional exchange between public health and clinical practice, and validate a method for integrating large-scale public health datasets with rich clinical data to improve decision-making and public health.
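The combined model's logic can be caricatured as a two-signal rule. The thresholds echo the abstract, but the rule below is an illustrative sketch, not the fitted model:

```python
def predict_pertussis(cyanosis, cough_one_week, fever, local_positive_rate):
    """Illustrative two-signal rule in the spirit of the combined model:
    clinical red flags OR elevated recent local incidence (>10% of
    regional pertussis cultures positive 8-14 days before the visit)
    flag the infant for testing. Feature choice and combination are
    assumptions for illustration only."""
    clinical = cyanosis or (cough_one_week and not fever)
    incidence = local_positive_rate > 0.10
    return clinical or incidence

# Afebrile infant with a week of cough, quiet local incidence -> flagged.
print(predict_pertussis(False, True, False, 0.02))  # True
# No clinical signs, but 15% of recent regional cultures positive -> flagged.
print(predict_pertussis(False, False, True, 0.15))  # True
```

The point of the abstract is that the OR-ing in of the population-level signal is what lifted sensitivity to 100%: cases the clinical features miss are caught when the community signal is hot.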

17.

Objective

To formulate gentamicin lipospheres by a solvent-melting method using lipids and polyethylene glycol 4 000 (PEG-4 000) for oral administration.

Methods

Gentamicin lipospheres were prepared by melt-emulsification using 30% w/w Phospholipon® 90H in beeswax as the lipid matrix containing PEG-4 000. The lipospheres were characterized by evaluating encapsulation efficiency, loading capacity, change in pH, and release profile. Antimicrobial activities were evaluated against Escherichia coli, Pseudomonas aeruginosa, Salmonella paratyphii and Staphylococcus aureus using the agar diffusion method.

Results

Photomicrographs revealed spherical particles within the micrometer range, with minimal growth after 1 month. The in vitro release of gentamicin varied widely with the PEG-4 000 content. Moreover, a significant (P>0.05) amount of gentamicin was released in vivo from the formulation. Encapsulation efficiency and loading capacity were both high, indicating the ability of the lipids to take up the drug. Antimicrobial activity was very high, especially against Pseudomonas compared to the other test organisms, strongly suggesting that the formulation retains its bioactive characteristics.

Conclusions

This study strongly suggests that the issues of gentamicin stability and poor absorption in oral formulations could be adequately addressed by tactical engineering of lipid drug delivery systems such as lipospheres.

18.

Methods

Clinical guideline adherence for diagnostic imaging (DI) and acceptance of electronic decision support in a rural community family practice clinic was assessed over 36 weeks. Physicians wrote 904 DI orders, 58% of which were addressed by the Canadian Association of Radiologists guidelines.

Results

Of the orders with applicable guidelines, 76% were ordered correctly; 24% were inappropriate or unnecessary, resulting in a prompt from clinical decision support. Physicians followed suggestions from decision support to improve their DI orders on 25% of the initially inappropriate orders. Use of decision support was not mandatory, and there were significant variations in use rate. Initially, 40% of physicians reported that decision support was disruptive to their workflow; this dropped to 16% as physicians gained experience with the software.

Conclusions

Physicians supported the concept of clinical decision support but were reluctant to change clinical habits to incorporate decision support into routine workflow.

19.

Objective

In the 6 years since the National Library of Medicine began monthly releases of RxNorm, it has become a central resource for communicating about clinical drugs and supporting interoperation between drug vocabularies.

Materials and methods

Built on the idea of a normalized name for a medication at a given level of abstraction, RxNorm provides a set of names and relationships based on 11 different external source vocabularies. The standard model enables decision support to take place for a variety of uses at the appropriate level of abstraction. With the incorporation of National Drug File Reference Terminology (NDF-RT) from the Veterans Administration, even more sophisticated decision support has become possible.

Discussion

While related products such as RxTerms, RxNav, MyMedicationList, and MyRxPad have been recognized as helpful for various uses, tasks such as identifying exactly what is and is not on the market remain a challenge.

20.

Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号