Similar literature
14 similar articles found
1.
Social media offer insights into patients’ medical problems such as drug side effects and treatment failures. Patient reports of adverse drug events from social media have great potential to improve the current practice of pharmacovigilance. However, extracting patient adverse drug event reports from social media continues to be an important challenge for health informatics research. In this study, we develop a research framework with advanced natural language processing techniques for integrated and high-performance extraction of patient-reported adverse drug events. The framework consists of medical entity extraction for recognizing patient discussions of drugs and events, adverse drug event extraction with a shortest-dependency-path kernel-based statistical learning method and semantic filtering with information from medical knowledge bases, and report source classification to tease out noise. To evaluate the proposed framework, a series of experiments were conducted on a test bed encompassing about postings from major diabetes and heart disease forums in the United States. The results reveal that each component of the framework significantly contributes to its overall effectiveness. Our framework significantly outperforms prior work.

2.
Health insurers maintain large databases containing information on medical services utilized by claimants, often spanning several healthcare services and providers. Proper use of these databases could facilitate better clinical and administrative decisions. In these data sets, there exist many unequally spaced events, such as hospital visits. However, data mining of temporal data and point processes is still a developing research area, and extracting useful information from such data series is a challenging task. In this paper, we developed a time series data mining approach to predict the number of days in hospital in the coming year for individuals from a general insured population based on their insurance claim data. In the proposed method, the data were windowed at four different timescales (bi-monthly, quarterly, half-yearly and yearly) to construct regularly spaced time series features extracted from such events, resulting in four associated prediction models. A comparison of these models indicates that the model using a half-yearly windowing scheme delivers the best performance on all three populations (the whole population, a senior sub-population and a non-senior sub-population). The superiority of the half-yearly model was found to be particularly pronounced in the senior sub-population. A bagged decision tree approach was able to predict ‘no hospitalization’ versus ‘at least one day in hospital’ with a Matthews correlation coefficient (MCC) of 0.426. This was significantly better than the corresponding yearly model, which achieved 0.375 for this group of customers. Further reducing the length of the analysis windows to three or two months did not produce further improvements.
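
The windowing step and the MCC metric mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `windowed_counts` and `mcc`, the sample visit dates, and the choice of simple event counts as the only feature are assumptions made for illustration.

```python
from datetime import date
from math import sqrt

def windowed_counts(event_dates, year, window_months):
    """Count events in fixed windows of `window_months` within one calendar year.

    Hypothetical helper: turns unequally spaced events into a regularly
    spaced series (2 = bi-monthly, 3 = quarterly, 6 = half-yearly, 12 = yearly).
    """
    if 12 % window_months != 0:
        raise ValueError("window_months must divide 12")
    counts = [0] * (12 // window_months)
    for d in event_dates:
        if d.year == year:
            counts[(d.month - 1) // window_months] += 1
    return counts

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from a 2x2 confusion matrix.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Made-up hospital visit dates for one claimant.
visits = [date(2010, 1, 5), date(2010, 1, 20), date(2010, 8, 2), date(2011, 3, 1)]
for months in (2, 3, 6, 12):
    print(months, windowed_counts(visits, 2010, months))
```

Each window length yields a feature vector of a different size for the same events, which is why the abstract ends up with four separate prediction models.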

3.
Objectives: Patients increasingly visit online health communities to get help on managing health. The large scale of these online communities makes it impossible for the moderators to engage in all conversations; yet, some conversations need their expertise. Our work explores the application of low-cost text classification methods to the new domain of determining whether a thread in an online health forum needs moderators’ help. Methods: We employed a binary classifier on WebMD’s online diabetes community data. To train the classifier, we considered three feature types: (1) word unigrams, (2) sentiment analysis features, and (3) thread length. We applied feature selection methods based on χ² statistics and undersampling to account for unbalanced data. We then performed a qualitative error analysis to investigate the appropriateness of the gold standard. Results: Using sentiment analysis features, feature selection methods, and balanced training data increased the AUC value up to 0.75 and the F1-score up to 0.54 compared to the baseline of using word unigrams with no feature selection methods on unbalanced data (0.65 AUC and 0.40 F1-score). The error analysis uncovered additional reasons why moderators respond to patients’ posts. Discussion: We showed how feature selection methods and balanced training data can improve the overall classification performance. We present implications of weighing precision versus recall for assisting moderators of online health communities. Our error analysis uncovered social, legal, and ethical issues around addressing community members’ needs. We also note challenges in producing a gold standard, and discuss potential solutions for addressing these challenges. Conclusion: Social media environments provide popular venues in which patients gain health-related information. Our work contributes to understanding scalable solutions for providing moderators’ expertise in these large-scale, social media environments.
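
The two preprocessing ideas named in this abstract, χ²-based feature scoring and undersampling, can be sketched as follows. This is an illustrative sketch only: the function names `chi2_2x2` and `undersample`, the fixed seed, and the toy thread identifiers are assumptions, and the abstract does not say which exact variant the authors used.

```python
import random

def chi2_2x2(a, b, c, d):
    """Chi-square statistic for one binary feature against a binary label.

    a: positive threads containing the word, b: negative threads containing it,
    c: positive threads lacking it,          d: negative threads lacking it.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom

def undersample(examples, labels, seed=0):
    """Randomly drop examples so that every class has as many as the rarest one."""
    rng = random.Random(seed)
    by_label = {}
    for x, y in zip(examples, labels):
        by_label.setdefault(y, []).append(x)
    k = min(len(xs) for xs in by_label.values())
    balanced = []
    for y, xs in by_label.items():
        balanced.extend((x, y) for x in rng.sample(xs, k))
    rng.shuffle(balanced)
    return balanced

# Toy data: 2 threads that needed a moderator, 6 that did not.
threads = ["t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8"]
needs_help = [1, 1, 0, 0, 0, 0, 0, 0]
balanced = undersample(threads, needs_help)
print(len(balanced), chi2_2x2(10, 0, 0, 10))
```

Words with a high χ² score are kept as features; undersampling is applied to the training data only, so that evaluation still reflects the real class balance.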

4.
5.
We present a dimensional information retrieval model for combining concept-based semantics and term statistics within multiple levels of document context to identify concise, variable length passages of text that answer a user query. Our results demonstrate improved search results in the presence of varying levels of semantic evidence, and higher performance using retrieval functions that combine document, as well as sentence and passage level information. Experimental results are promising. When ranking documents based on the most relevant extracted passages, the results exceed the state-of-the-art by 15.28% as assessed by the TREC 2005 Genomics track collection of 4.5 million MEDLINE citations.

6.
Knowledge of protein–protein interactions (PPI) and of their related pathways is equally important for understanding the biological functions of the living cell. Such information on human proteins is highly desirable to understand the mechanisms of several diseases such as cancer, diabetes, and Alzheimer’s disease. Because much of that information is buried in biomedical literature, an automated text mining system for visualizing human PPI and pathways is highly desirable. In this paper, we present HPIminer, a text mining system for visualizing human protein interactions and pathways from biomedical literature. HPIminer extracts human PPI information and PPI pairs from biomedical literature, and visualizes their associated interactions, networks and pathways using two curated databases, HPRD and KEGG. To our knowledge, HPIminer is the first system to build interaction networks from literature as well as curated databases. Further, interactions mined only from literature and not previously reported in databases are highlighted as new. A comparative study with other similar tools shows that the resultant network is more informative and provides additional information on interacting proteins and their associated networks.

7.
OBJECTIVE: Ontology in clinical domains is becoming a core research field in the realm of medical informatics. The objective of this study is to explore the potential role of formal concept analysis (FCA) in context-based ontology building support in a clinical domain (here, cardiovascular medicine). METHODOLOGY: We developed an ontology building support system that integrated an FCA module with a natural language processing (NLP) module. The user interface of the system was developed as a Protégé-2000 JAVA tab plug-in. A collection of 368 textual discharge summaries and a standard dictionary of Japanese diagnostic terms (MEDIS ver2.0) were used as the main knowledge sources. A preliminary evaluation was undertaken to show the usefulness of the system. RESULTS: Stability was shown on the MEDIS-based medical concept extraction with high precision. 73±14% (mean±S.D.) of the compound medical phrases extracted were sufficiently meaningful to form a medical concept from a clinical perspective. Also, 57.7% of attribute implication pairs (i.e. medical concept pairs) extracted were identified as positive from a clinical perspective. CONCLUSION: Under the framework of our ontology building support system using FCA, the clinical experts could reach a mass of both linguistic information and context-based knowledge that was demonstrated as useful to support their ontology building tasks.

8.
9.

Purpose

Despite the amount of health information available online, there are several barriers that limit the Internet from being adopted as a source of health information. The purpose of this study was to identify individual skill-related problems that users experience when accessing the Internet for health information and services.

Methods

Between November 2009 and February 2010, 88 subjects participated in a performance test in which participants had to complete health-related assignments on the Internet. Subjects were randomly selected from a telephone book. A selective quota sample was used and was divided over equal subsamples of gender, age, and education. Each subject was required to complete nine assignments on the Internet.

Results

The general population experiences many Internet skill-related problems, especially those related to information and strategic Internet skills. Aging and lower levels of education seemed to contribute to the number of operational and formal skill-related problems experienced. Saving files, bookmarking websites, and using search engines were troublesome for these groups of people. With respect to information skills, the higher the level of educational attainment, the fewer problems the participants experienced. Although younger subjects experienced far fewer operational and formal skill-related problems, it was revealed that older subjects were less likely to select and use irrelevant search results and unreliable sources. Concerning the strategic Internet skills, it was revealed that older subjects were less likely to make inappropriate decisions based on information gathered.

Conclusions

The amount of online health-related information and services is consistently growing; however, it appears that the general population experiences many skill-related problems, particularly those related to information and strategic Internet skills, which become very important when it comes to health. These skills are also problematic for younger generations, who are often seen as skilled Internet users. The results of the study call for policies that account for low levels of Internet skills.

10.
Introduce the notion of cross-sectional relatedness as an informational dependence relation between sentences in the conclusion section of a breast radiology report and sentences in the findings section of the same report. Assess inter-rater agreement of breast radiologists. Develop and evaluate a support vector machine (SVM) classifier for automatically detecting cross-sectional relatedness. A standard reference is manually created from 444 breast radiology reports by the first author. A subset of 37 reports is annotated by five breast radiologists. Inter-rater agreement is computed among their annotations and standard reference. Thirteen numerical features are developed to characterize pairs of sentences; the optimal feature set is sought through forward selection. Inter-rater agreement is F-measure 0.623. SVM classifier has F-measure of 0.699 in the 12-fold cross-validation protocol against standard reference. Report length does not correlate with the classifier’s performance (correlation coefficient = −0.073). SVM classifier has average F-measure of 0.505 against annotations by breast radiologists. Mediocre inter-rater agreement is possibly caused by: (1) definition is insufficiently actionable, (2) fine-grained nature of cross-sectional relatedness on sentence level, instead of, for instance, on paragraph level, and (3) higher-than-average complexity of 37-report sample. SVM classifier performs better against standard reference than against breast radiologists’ annotations. This is supportive of (3). SVM’s performance on standard reference is satisfactory. Since optimal feature set is not breast specific, results may transfer to non-breast anatomies. Applications include a smart report viewing environment and data mining.

Electronic supplementary material

The online version of this article (doi:10.1007/s10278-013-9612-9) contains supplementary material, which is available to authorized users.
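
The F-measure used in this abstract to score both inter-rater agreement and classifier output can be sketched as below. This is a minimal illustration, not the authors' evaluation code: the function name `f_measure`, the representation of an annotation as a set of (conclusion sentence, findings sentence) index pairs, and the toy pairs are all assumptions.

```python
def f_measure(predicted, reference):
    """Balanced F-measure (harmonic mean of precision and recall).

    Both arguments are sets of items marked as related, e.g. pairs
    (conclusion_sentence_index, findings_sentence_index).
    """
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)

# Toy annotations: sentence pairs marked as related by two annotators.
annotator_a = {(0, 1), (0, 2), (1, 4)}
annotator_b = {(0, 2), (1, 4), (2, 5)}
print(round(f_measure(annotator_a, annotator_b), 3))  # prints 0.667
```

Because the balanced F-measure is symmetric in its two arguments, it can serve as an agreement score between two annotators without designating either one as the gold standard.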

11.
PURPOSE: To investigate the impacts of the first phase of Taiwan's Bureau of National Health Insurance (TBNHI) smart card project on existing hospital information systems. SETTING: In November 1999, TBNHI launched a nationwide project to replace its paper-based health insurance cards with smart cards (NHI-IC cards). The NHI-IC cards have been used since 1 July 2003, and they have fully replaced the paper-based cards since 1 January 2004. Hospitals must support the cards in order to provide medical services for insured patients. METHODS: We made a comprehensive study of the current phase of the NHI-IC card system, and conducted a questionnaire survey (from 1 October to 30 November 2003) to investigate the impacts of NHI-IC cards on the existing hospital information systems. A questionnaire was distributed by mail to 479 hospitals, including 23 medical centers, 71 regional hospitals, and 355 district hospitals. The returned questionnaires were also collected by prepaid mail. RESULTS: The questionnaire return rates of the medical centers, regional hospitals and district hospitals were 39.1, 29.6 and 20.9%, respectively. In phase 1 of the project, the average numbers of card readers purchased per medical center, regional hospital, and district hospital were 202, 45 and 10, respectively. The average person-days for the enhancement of existing information systems of a medical center, regional hospital and district hospital were 175, 74 and 58, respectively. Three months after starting to use the NHI-IC cards, most hospitals (60.6%) experienced prolonged service time for their patients due to more interruptions caused mainly by: (1) impairment of the NHI-IC cards (31.2%), (2) failure in authentication of the SAMs (17.0%), (3) malfunction in card readers (15.3%) and (4) problems with interfaces between the card readers and hospital information systems (15.8%). The overall hospital satisfaction on the 5-point Likert scale was 2.86. Although most hospitals found the project acceptable, about 22% were dissatisfied or strongly dissatisfied, roughly twice the share of hospitals that were satisfied (about 10%). CONCLUSIONS: Our recommendations for those who are planning to implement similar projects are: (1) provide public-awareness programs or campaigns across the country to explain the smart card policy and educate the public on the proper usage and storage of the cards, (2) improve the quality of the NHI-IC cards, (3) conduct comprehensive tests of software and hardware components associated with NHI-IC cards before operating the systems and (4) perform further investigations into authentication approaches and develop tools that can quickly identify where and what the problems are.

12.
Using text to build semantic networks for pharmacogenomics
Most pharmacogenomics knowledge is contained in the text of published studies, and is thus not available for automated computation. Natural Language Processing (NLP) techniques for extracting relationships in specific domains often rely on hand-built rules and domain-specific ontologies to achieve good performance. In a new and evolving field such as pharmacogenomics (PGx), rules and ontologies may not be available. Recent progress in syntactic NLP parsing in the context of a large corpus of pharmacogenomics text provides new opportunities for automated relationship extraction. We describe an ontology of PGx relationships built starting from a lexicon of key pharmacogenomic entities and a syntactic parse of more than 87 million sentences from 17 million MEDLINE abstracts. We used the syntactic structure of PGx statements to systematically extract commonly occurring relationships and to map them to a common schema. Our extracted relationships have a precision of 70–87.7% and involve not only key PGx entities such as genes, drugs, and phenotypes (e.g., VKORC1, warfarin, clotting disorder), but also critical entities that are frequently modified by these key entities (e.g., VKORC1 polymorphism, warfarin response, clotting disorder treatment). The result of our analysis is a network of 40,000 relationships between more than 200 entity types with clear semantics. This network is used to guide the curation of PGx knowledge and provide a computable resource for knowledge discovery.

13.
A robust, automated pattern recognition system for polysomnography data, targeted at identifying sleep-waking states and stages, is presented. Five patterns were searched for: slow-delta and theta wave predominance in the background electro-encephalogram (EEG) activity; presence of sleep spindles in the EEG; presence of rapid eye movements in an electro-oculogram; and presence of muscle tone in an electromyogram. The performance of the automated system was measured indirectly by evaluating sleep staging, based on the experts' accepted methodology, to relate the detected patterns in infants over four months of post-term age. The set of sleep-waking classes included wakefulness, REM sleep and non-REM sleep stages I, II, and III–IV. Several noise and artifact rejection methods were implemented, including filters, fuzzy quality indices, windows of variable sizes and detectors of limb movements and wakefulness. Eleven polysomnographic recordings of healthy infants were studied. The ages of the subjects ranged from 6 to 13 months. Six recordings comprising 2665 epochs were included in the training set. Results on a test set (2369 epochs from five recordings) show an overall agreement of 87.7% (kappa 0.840) between the automated system and the human expert. These results show significant improvements compared with previous work.
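
The two agreement figures reported in this abstract (overall agreement and kappa) are related as shown in the following sketch, assuming that "kappa" refers to Cohen's kappa for two raters. The function name `cohen_kappa` and the toy epoch labels are hypothetical and are not taken from the study.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two raters corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of epochs with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each rater's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)

# Toy epoch labels: W = wakefulness, R = REM sleep, N = non-REM sleep.
system = ["W", "W", "R", "R"]
expert = ["W", "W", "R", "N"]
print(round(cohen_kappa(system, expert), 3))  # prints 0.6
```

In this toy case the raw agreement is 75% but kappa is only 0.6, because part of the agreement would be expected by chance given how often each label is used.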

14.
This work investigates whether openEHR, with its reference model, archetypes and templates, is suitable for the digital representation of demographic as well as clinical data. Moreover, it elaborates openEHR as a tool for modelling Hospital Information Systems on a regional level based on a national logical infrastructure. OpenEHR is a dual model approach developed for the modelling of Hospital Information Systems enabling semantic interoperability. The use of dual-model-based Electronic Healthcare Record systems represents a holistic solution to this. Modelling data in the field of obstetrics is a challenge, since different regions demand locally specific information for the process of treatment. Smaller health units in developing countries like Brazil or Malaysia, which until recently handled automatable processes like the storage of sensitive patient data in paper form, are starting organizational reconstruction processes. This archetype proof-of-concept investigation has tried out some elements of the openEHR methodology in cooperation with a health unit in Colombo, Brazil. Two legal forms provided by the Brazilian Ministry of Health have been analyzed and classified into demographic and clinical data. The LinkEHR-Ed editor was used to read, edit and create archetypes. Results show that 33 clinical and demographic concepts, which are necessary to cover data demanded by the Unified National Health System, were identified. Of these concepts, 61% were reused and 39% modified to cover domain requirements. The detailed process of reuse, modification and creation of archetypes is shown. We conclude that, although a major part of demographic and clinical patient data was already represented by existing archetypes, a significant part required major modifications. In this study openEHR proved to be a highly suitable tool for modelling complex health data. In combination with LinkEHR-Ed software it offers user-friendly and highly applicable tools, although the complexity created by the vast specifications requires expert networks to define generally accepted clinical models. Finally, this project has pointed out main benefits, including high coverage of obstetrics data on the Clinical Knowledge Manager, simple modelling, and a wide network and support for using openEHR. The barriers described include the allocation of clinical content to the respective archetypes, as well as the slow adoption of changes on the Clinical Knowledge Manager, which leads to redundant efforts in data contribution; these need to be addressed in future work.
