Similar Articles
1.

Objective

To characterize PubMed usage over a typical day and compare it to previous studies of user behavior on Web search engines.

Design

We performed a lexical and semantic analysis of 2,689,166 queries issued on PubMed over 24 consecutive hours on a typical day.

Measurements

We measured the number of queries, number of distinct users, queries per user, terms per query, common terms, Boolean operator use, common phrases, result set size, and MeSH categories. We also used semantic measurements to group queries into sessions and studied the addition and removal of terms across consecutive queries to gauge search strategies.

Results

The size of the result sets from a sample of queries showed a bimodal distribution, with peaks at approximately 3 and 100 results, suggesting that a large group of queries was tightly focused and another was broad. Like Web search engine sessions, most PubMed sessions consisted of a single query. However, PubMed queries contained more terms.

Conclusion

PubMed’s usage profile should be considered when educating users, building user interfaces, and developing future biomedical information retrieval systems.

2.

Background

Due to the high cost of manual curation of key aspects from the scientific literature, automated methods for assisting this process are greatly desired. Here, we report a novel approach to facilitate MeSH indexing, a challenging task of assigning MeSH terms to MEDLINE citations for their archiving and retrieval.

Methods

Unlike previous methods for automatic MeSH term assignment, we reformulate the indexing task as a ranking problem such that relevant MeSH headings are ranked higher than irrelevant ones. Specifically, for each document we retrieve 20 neighbor documents, obtain a list of MeSH main headings from the neighbors, and rank the MeSH main headings using ListNet, a learning-to-rank algorithm. We trained our algorithm on 200 documents and tested it on a previously used benchmark set of 200 documents and a larger dataset of 1,000 documents.
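The neighbor-based candidate step described above can be sketched in Python. This is a deliberately simplified stand-in: it ranks headings by how many retrieved neighbors carry them, whereas the paper learns the ranking with ListNet; the sample neighbor lists are hypothetical.

```python
from collections import Counter

def rank_mesh_headings(neighbor_headings, k=20):
    """Rank candidate MeSH main headings by how many of the k
    retrieved neighbor documents carry each heading. (The paper
    learns this ranking with ListNet; plain vote counting is used
    here only to illustrate the pipeline.)"""
    votes = Counter()
    for headings in neighbor_headings[:k]:
        votes.update(set(headings))  # one vote per neighbor document
    return [h for h, _ in votes.most_common()]

# Hypothetical neighbors of a target citation:
neighbors = [
    ["Humans", "Neoplasms", "Prognosis"],
    ["Humans", "Neoplasms"],
    ["Humans", "Risk Factors"],
]
print(rank_mesh_headings(neighbors))
# "Humans" ranks first (3 votes), then "Neoplasms" (2 votes)
```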

Results

Tested on the benchmark dataset, our method achieved a precision of 0.390, recall of 0.712, and mean average precision (MAP) of 0.626. In comparison to the state of the art, we observe statistically significant improvements as large as 39% in MAP (p-value <0.001). Similar significant improvements were also obtained on the larger document set.

Conclusion

Experimental results show that our approach makes the most accurate MeSH predictions to date, suggesting its great potential for a practical impact on MeSH indexing. Furthermore, as discussed, the proposed learning framework is robust and can be adapted to many other similar tasks beyond MeSH indexing in the biomedical domain. All data sets are available at: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/indexing.

3.

Objective

To evaluate: (1) the effectiveness of wireless handheld computers for online information retrieval in clinical settings; (2) the role of MEDLINE® in answering clinical questions raised at the point of care.

Design

A prospective single-cohort study: accompanying medical teams on teaching rounds, five internal medicine residents used and evaluated MD on Tap, an application for handheld computers, to seek answers in real time to clinical questions arising at the point of care.

Measurements

All transactions were stored by an intermediate server. Evaluators recorded clinical scenarios and questions, identified MEDLINE citations that answered the questions, and submitted daily and summative reports of their experience. A senior medical librarian corroborated the relevance of the selected citation to each scenario and question.

Results

Evaluators answered 68% of 363 background and foreground clinical questions during rounding sessions using a variety of MD on Tap features in an average session length of less than four minutes. The evaluator, the number and quality of query terms, the total number of citations found for a query, and the use of auto-spellcheck significantly contributed to the probability of query success.

Conclusion

Handheld computers with Internet access are useful tools for healthcare providers to access MEDLINE in real time. MEDLINE citations can answer specific clinical questions when several medical terms are used to form a query. The MD on Tap application is an effective interface to MEDLINE in clinical settings, allowing clinicians to quickly find relevant citations.

4.

Background

Visual information is a crucial aspect of medical knowledge. Building a comprehensive medical image base, in the spirit of the Unified Medical Language System (UMLS), would greatly benefit patient education and self-care. However, collection and annotation of such a large-scale image base is challenging.

Objective

To combine visual object detection techniques with medical ontology to automatically mine web photos and retrieve a large number of disease manifestation images with minimal manual labeling effort.

Methods

As a proof of concept, we first learnt five organ detectors on three detection scales for eyes, ears, lips, hands, and feet. Given a disease, we used information from the UMLS to select affected body parts, ran the pretrained organ detectors on web images, and combined the detection outputs to retrieve disease images.

Results

Compared with a supervised image retrieval approach that requires training images for every disease, our ontology-guided approach exploits shared visual information of body parts across diseases. In retrieving 2220 web images of 32 diseases, we reduced manual labeling effort to 15.6% while improving the average precision by 3.9% from 77.7% to 81.6%. For 40.6% of the diseases, we improved the precision by 10%.

Conclusions

The results confirm the concept that the web is a feasible source for automatic disease image retrieval for health image database construction. Our approach requires a small amount of manual effort to collect complex disease images, and to annotate them by standard medical ontology terms.

5.

Background

The constantly growing publication rate of medical research articles puts increasing pressure on medical specialists, who need to stay aware of recent developments in their field. Current literature retrieval systems allow researchers to find specific papers; however, the search task remains repetitive and time-consuming.

Aim

In this paper we describe a system that retrieves medical publications by automatically generating queries based on data from an electronic patient record. This allows the doctor to focus on medical issues and provide an improved service to the patient, with higher confidence that it is underpinned by current research.

Method

Our research prototype automatically generates query terms based on the patient record and adds weight factors for each term. Currently, the patient’s age is taken into account with a fuzzy-logic-derived weight, and terms describing blood-related anomalies are derived from recent blood test results. Conditionally selected homonyms are used for query expansion. The query retrieves matching records from a local index of PubMed publications and displays results in descending relevance for the given patient. Recent publications are clearly highlighted for instant recognition by the researcher.
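A minimal Python sketch of this query-generation idea follows. The field names, the age-weighting curve, and the blood-flag vocabulary are illustrative assumptions, not the prototype's actual rules.

```python
def build_weighted_query(patient):
    """Derive weighted query terms from a patient record.
    Field names and weighting functions are hypothetical:
    age terms get a fuzzy-style weight that grows toward the
    extremes of life, and abnormal blood tests map to terms."""
    terms = []
    age = patient["age"]
    if age >= 65:
        terms.append(("aged", min(1.0, 0.5 + (age - 65) / 40)))
    elif age <= 12:
        terms.append(("child", min(1.0, 0.5 + (12 - age) / 16)))
    # Terms derived from flagged blood-test results.
    for test, flag in patient.get("blood_flags", {}).items():
        if flag == "low":
            terms.append((f"{test} deficiency", 0.8))
        elif flag == "high":
            terms.append((f"elevated {test}", 0.8))
    return terms

record = {"age": 78, "blood_flags": {"hemoglobin": "low"}}
print(build_weighted_query(record))
```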

Results

Nine medical specialists from the Royal Adelaide Hospital evaluated the system and submitted pre-trial and post-trial questionnaires. Throughout the study we received positive feedback: doctors felt the support provided by the prototype was useful and said they would like to use it in their daily routine.

Conclusion

By supporting the time-consuming task of query formulation and iterative modification as well as by presenting the search results in order of relevance for the specific patient, literature retrieval becomes part of the daily workflow of busy professionals.

6.
7.

Objectives

To compare (1) concept-based search, using documents pre-indexed by a conceptual hierarchy; (2) context-sensitive search, using structured, labeled documents; and (3) traditional full-text search. Our hypotheses were: (1) more contexts lead to better retrieval accuracy; and (2) adding concept-based search to the other searches improves their baseline performance.

Design

We used our Vaidurya architecture to evaluate search and retrieval of structured documents, classified by a conceptual hierarchy, on a clinical guidelines test collection.

Measurements

Precision was computed at different levels of recall to assess the contribution of the retrieval methods. Precisions were compared at a recall of 0.5, using t-tests.

Results

Performance increased monotonically with the number of query context elements. Adding context-sensitive elements, mean improvement was 11.1% at recall 0.5. With three contexts, mean query precision was 42% ± 17% (95% confidence interval [CI], 31% to 53%); with two contexts, 32% ± 13% (95% CI, 27% to 38%); and one context, 20% ± 9% (95% CI, 15% to 24%). Adding context-based queries to full-text queries monotonically improved precision beyond the 0.4 level of recall. Mean improvement was 4.5% at recall 0.5. Adding concept-based search to full-text search improved precision to 19.4% at recall 0.5.

Conclusions

The study demonstrated usefulness of concept-based and context-sensitive queries for enhancing the precision of retrieval from a digital library of semi-structured clinical guideline documents. Concept-based searches outperformed free-text queries, especially when baseline precision was low. In general, the more ontological elements used in the query, the greater the resulting precision.

8.

Objective

To design, build, and evaluate a storage model able to manage heterogeneous digital imaging and communications in medicine (DICOM) images. The model must be simple, but flexible enough to accommodate variable content without structural modifications; must be effective on answering query/retrieval operations according to the DICOM standard; and must provide performance gains on querying/retrieving content to justify its adoption by image-related projects.

Methods

The proposal adapts the original decomposed storage model, incorporating structural and organizational characteristics present in DICOM image files. Tag values are stored according to their data types/domains, in a schema built on top of a standard relational database management system (RDBMS). Evaluation includes storing heterogeneous DICOM images, querying metadata using a variable number of predicates, and retrieving full-content images for different hierarchical levels.
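The decomposed-storage idea, storing each tag value in a table chosen by its data type so heterogeneous files need no schema changes, can be sketched with an in-memory SQLite database. Table and column names here are illustrative, not DCMDSM's actual schema.

```python
import sqlite3

# One table per value domain: string-valued tags and numeric tags
# are decomposed into separate relations keyed by image and tag.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tag_str (image_id INTEGER, tag TEXT, value TEXT);
CREATE TABLE tag_num (image_id INTEGER, tag TEXT, value REAL);
""")

def store_tag(image_id, tag, value):
    """Route a tag value to the table matching its type."""
    table = "tag_num" if isinstance(value, (int, float)) else "tag_str"
    conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?)",
                 (image_id, tag, value))

store_tag(1, "(0010,0010)", "DOE^JOHN")   # PatientName
store_tag(1, "(0028,0010)", 512)          # Rows
row = conn.execute(
    "SELECT value FROM tag_str WHERE image_id = 1 AND tag = '(0010,0010)'"
).fetchone()
print(row[0])
```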

Results and discussion

When compared to a well-established DICOM image archive, the proposal is 0.6–7.2 times slower in storing content; however, in querying individual tags, it is about 48.0% faster. In querying groups of tags, the DICOM decomposed storage model (DCMDSM) is outperformed in scenarios with a large number of tags and low selectivity (being 66.5% slower); however, when the number of tags is balanced with better selectivity predicates, the performance gains are up to 79.1%. In executing full-content retrieval, in turn, the proposal is about 48.3% faster.

Conclusions

DCMDSM is a model for storing heterogeneous DICOM content, based on a straightforward database design. The evaluation results attest to its suitability as a storage layer for projects where DICOM images are stored once and queried/retrieved whenever necessary.

9.

Objectives

The aim of this study was to improve naïve Bayes prediction of Medical Subject Headings (MeSH) assignment to documents using optimal training sets found by an active learning inspired method.

Design

The authors selected 20 MeSH terms whose occurrences cover a range of frequencies. For each MeSH term, they found an optimal training set, a subset of the whole training set. An optimal training set consists of all documents that include a given MeSH term (the C1 class) plus those documents not including the term (the C−1 class) that are closest to the C1 class. These small sets were used to predict MeSH assignments in the MEDLINE® database.
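The selection idea, keep every positive document and only the negatives nearest the positive class, can be sketched as follows. Closeness is approximated here by cosine similarity to the positive centroid; the paper's exact distance measure is not reproduced, and the toy bag-of-words vectors are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def optimal_training_set(docs, n_neg):
    """Keep all positive (C1) documents plus the n_neg negative
    documents most similar to the positive centroid."""
    pos = [d for d, y in docs if y == 1]
    neg = [d for d, y in docs if y == -1]
    centroid = {}
    for d in pos:
        for k, v in d.items():
            centroid[k] = centroid.get(k, 0.0) + v / len(pos)
    neg.sort(key=lambda d: cosine(d, centroid), reverse=True)
    return pos, neg[:n_neg]

docs = [
    ({"gene": 1, "cell": 1}, 1),
    ({"gene": 1, "protein": 1}, 1),
    ({"gene": 1, "mouse": 1}, -1),   # negative, but near the positives
    ({"car": 1, "road": 1}, -1),     # negative, far from the positives
]
pos, sel_neg = optimal_training_set(docs, n_neg=1)
print(sel_neg)  # the near negative is kept
```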

Measurements

Average precision was used to compare MeSH assignment using the naïve Bayes learner trained on the whole training set, optimal sets, and random sets. The authors compared 95% lower confidence limits of average precisions of naïve Bayes with upper bounds for average precisions of a K-nearest neighbor (KNN) classifier.
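Average precision, the comparison metric used here, is the mean of precision@k over the ranks where a relevant document appears. A short reference implementation:

```python
def average_precision(relevance):
    """Average precision over a ranked list of 0/1 relevance
    judgements: mean of precision@k at each relevant rank."""
    hits, total = 0, 0.0
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0

print(average_precision([1, 0, 1, 0]))  # (1/1 + 2/3) / 2 ≈ 0.833
```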

Results

For all 20 MeSH assignments, the optimal training sets produced nearly 200% improvement over use of the whole training sets. In 17 of those MeSH assignments, naïve Bayes using optimal training sets was statistically better than a KNN. In 15 of those, optimal training sets performed better than optimized feature selection. Overall naïve Bayes averaged 14% better than a KNN for all 20 MeSH assignments. Using these optimal sets with another classifier, C-modified least squares (CMLS), produced an additional 6% improvement over naïve Bayes.

Conclusion

Using a smaller optimal training set greatly improved learning with naïve Bayes. The performance is superior to a KNN. The small training set can be used with other sophisticated learning methods, such as CMLS, where using the whole training set would not be feasible.

10.

Objectives

To develop mechanisms to formulate queries over the semantic representation of cancer-related data services available through the cancer Biomedical Informatics Grid (caBIG).

Design

The semCDI query formulation uses a view of caBIG semantic concepts, metadata, and data as an ontology, and defines a methodology to specify queries using the SPARQL query language, extended with Horn rules. semCDI enables the joining of data that represent different concepts through associations modeled as object properties, and the merging of data representing the same concept in different sources through Common Data Elements (CDE) modeled as datatype properties, using Horn rules to specify additional semantics indicating conditions for merging data.

Validation

In order to validate this formulation, a prototype has been constructed, and two queries have been executed against currently available caBIG data services.

Discussion

The semCDI query formulation uses the rich semantic metadata available in caBIG to build queries and integrate data from multiple sources. Its promise will be further enhanced as more data services are registered in caBIG, and as more linkages can be achieved between the knowledge contained within caBIG's NCI Thesaurus and the data contained in the Data Services.

Conclusion

semCDI provides a formulation for the creation of queries on the semantic representation of caBIG. This constitutes the foundation to build a semantic data integration system for more efficient and effective querying and exploratory searching of cancer-related data.

11.

Objective:

To investigate the clinical use of cone beam computed tomography in the diagnosis of patients with odontogenic jaw keratocyst and to guide computer-aided surgical treatment planning.

Methods:

Imaging, image processing, and visualization technologies were used to support a clear diagnosis, proper treatment, and a favourable prognosis. Cone beam computed tomography was used to collect medical information, including the site, extent, shape, and other characteristic features of a large odontogenic jaw keratocyst in a patient.

Results:

The imaging technique produced excellent results in imaging, image processing, and three-dimensional (3D) visualization.

Conclusion:

The 3D digital reconstruction model of the odontogenic jaw keratocyst provided an intuitive visualization of the lesion.

12.

Objective

To describe the creation and evaluate the use of a wiki by medical residents, and to determine if a wiki would be a useful tool for improving the experience, efficiency, and education of housestaff.

Materials and methods

In 2008, a team of medical residents built a wiki containing institutional knowledge and reference information using Microsoft SharePoint. We tracked visit data for 3 years, and performed an audit of page views and updates in the second year. We evaluated the attitudes of medical residents toward the wiki using a survey.

Results

Users accessed the wiki 23 218, 35 094, and 40 545 times in each of three successive academic years from 2008 to 2011. In the year two audit, 85 users made a total of 1082 updates to 176 pages and of these, 91 were new page creations by 17 users. Forty-eight percent of residents edited a page. All housestaff felt the wiki improved their ability to complete tasks, and 90%, 89%, and 57% reported that the wiki improved their experience, efficiency, and education, respectively, when surveyed in academic year 2009–2010.

Discussion

A wiki is a useful and popular tool for organizing administrative and educational content for residents. Housestaff felt strongly that the wiki improved their workflow, but a smaller educational impact was observed. Nearly half of the housestaff edited the wiki, suggesting broad buy-in among the residents.

Conclusion

A wiki is a feasible and useful tool for improving information retrieval for house officers.

13.

Objectives

Large databases of published medical research can support clinical decision making by providing physicians with the best available evidence. The time required to obtain optimal results from these databases using traditional systems often makes accessing the databases impractical for clinicians. This article explores whether a hybrid approach of augmenting traditional information retrieval with knowledge-based methods facilitates finding practical clinical advice in the research literature.

Design

Three experimental systems were evaluated for their ability to find MEDLINE citations providing answers to clinical questions of different complexity. The systems (SemRep, Essie, and CQA-1.0), which rely on domain knowledge and semantic processing to varying extents, were evaluated separately and in combination. Fifteen therapy and prevention questions in three categories (general, intermediate, and specific questions) were searched. The first 10 citations retrieved by each system were randomized, anonymized, and evaluated on a three-point scale. The reasons for ratings were documented.

Measurements

Metrics evaluating the overall performance of a system (mean average precision, binary preference) and metrics evaluating the number of relevant documents in the first several presented to a physician were used.

Results

Scores (mean average precision = 0.57, binary preference = 0.71) for fusion of the retrieval results of the three systems are significantly (p < 0.01) better than those for any individual system. All three systems present three to four relevant citations in the first five for any question type.

Conclusion

The improvements in finding relevant MEDLINE citations due to knowledge-based processing show promise in assisting physicians to answer questions in clinical practice.

14.

Objectives

Effective health communication is often hindered by a “vocabulary gap” between language familiar to consumers and jargon used in medical practice and research. To present health information to consumers in a comprehensible fashion, we need to develop a mechanism to quantify health terms as being more likely or less likely to be understood by typical members of the lay public. Prior research has used approaches including syllable count, easy word list, and frequency count, all of which have significant limitations.

Design

In this article, we present a new method that predicts consumer familiarity using contextual information. The method was applied to a large query log data set and validated using results from two previously conducted consumer surveys.

Measurements

We measured the correlation between the survey result and the context-based prediction, syllable count, frequency count, and log normalized frequency count.

Results

The correlation coefficient between the context-based prediction and the survey result was 0.773 (p < 0.001), which was higher than the correlation coefficients between the survey result and the syllable count, frequency count, and log normalized frequency count (p ≤ 0.012).
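The comparison above rests on the Pearson correlation coefficient between each predictor's scores and the survey results. A self-contained implementation, applied to hypothetical familiarity scores (the numbers are invented for illustration):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical term-familiarity scores: survey vs. a predictor.
survey    = [0.9, 0.7, 0.4, 0.2]
predicted = [0.8, 0.6, 0.5, 0.1]
print(round(pearson(survey, predicted), 3))
```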

Conclusions

The context-based approach provides a good alternative to the existing term familiarity assessment methods.

15.

Objective

To build an effective co-reference resolution system tailored to the biomedical domain.

Methods

Experimental materials used in this study were provided by the 2011 i2b2 Natural Language Processing Challenge, which involved co-reference resolution in medical documents. Concept mentions were annotated in clinical texts, and the mentions that co-refer in each document are linked by co-reference chains. Normally, there are two ways of constructing a system to automatically discover co-referent links: one is to manually build rules for co-reference resolution; the other is to use machine learning systems that learn automatically from training datasets and then perform the resolution task on testing datasets.

Results

The existing co-reference resolution systems are able to find some of the co-referent links; our rule based system performs well, finding the majority of the co-referent links. Our system achieved 89.6% overall performance on multiple medical datasets.

Conclusions

Manually crafting rules based on observation of the training data is a valid way to achieve high performance on this co-reference resolution task in the critical biomedical domain.

16.

Objective

To develop a method for the automatic resolution of coreference between medical concepts in clinical records.

Materials and methods

A multiple pass sieve approach utilizing support vector machines (SVMs) at each pass was used to resolve coreference. Information such as lexical similarity, recency of a concept mention, synonymy based on Wikipedia redirects, and local lexical context were used to inform the method. Results were evaluated using an unweighted average of the MUC, CEAF, and B³ coreference evaluation metrics. The datasets used in these research experiments were made available through the 2011 i2b2/VA Shared Task on Coreference.
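The multi-pass control flow can be sketched in Python. This sketch replaces the paper's per-pass SVMs with two plain rules (exact-match, then head-word match) purely to show how higher-precision passes run before looser ones; the example mentions are hypothetical.

```python
def sieve_coreference(mentions):
    """Group mention indices into coreference chains with ordered
    passes: pass 1 links exact lexical matches, pass 2 merges
    chains whose head words match. (The paper runs an SVM at each
    pass; simple rules are used here only for illustration.)"""
    def head(text):
        return text.lower().split()[-1]

    # Pass 1: exact string match (case-insensitive).
    chains = {}
    for i, m in enumerate(mentions):
        chains.setdefault(m.lower(), []).append(i)

    # Pass 2: merge chains sharing the same head word.
    merged = {}
    for key, chain in chains.items():
        merged.setdefault(head(key), []).extend(chain)

    return [sorted(c) for c in merged.values() if len(c) > 1]

mentions = ["the chest x-ray", "chest x-ray", "the patient", "patient"]
print(sieve_coreference(mentions))  # [[0, 1], [2, 3]]
```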

Results

The method achieved an average F score of 0.821 on the ODIE dataset, with a precision of 0.802 and a recall of 0.845. These results compare favorably to the best-performing system with a reported F score of 0.827 on the dataset and the median system F score of 0.800 among the eight teams that participated in the 2011 i2b2/VA Shared Task on Coreference. On the i2b2 dataset, the method achieved an average F score of 0.906, with a precision of 0.895 and a recall of 0.918 compared to the best F score of 0.915 and the median of 0.859 among the 16 participating teams.

Discussion

Post hoc analysis revealed significant performance degradation on pathology reports. The pathology reports were characterized by complex synonymy and very few patient mentions.

Conclusion

The use of several simple lexical matching methods had the most impact on achieving competitive performance on the task of coreference resolution. Moreover, the ability to detect patients in electronic medical records helped to improve coreference resolution more than other linguistic analysis.

17.

Objective

Despite at least 40 years of promising empirical performance, very few clinical natural language processing (NLP) or information extraction systems currently contribute to medical science or care. The authors address this gap by reducing the need for custom software and rules development with a graphical user interface-driven, highly generalizable approach to concept-level retrieval.

Materials and methods

A ‘learn by example’ approach combines features derived from open-source NLP pipelines with open-source machine learning classifiers to automatically and iteratively evaluate top-performing configurations. The Fourth i2b2/VA Shared Task Challenge's concept extraction task provided the data sets and metrics used to evaluate performance.

Results

Top F-measure scores for each of the tasks were medical problems (0.83), treatments (0.82), and tests (0.83). Recall lagged precision in all experiments. Precision was near or above 0.90 in all tasks.

Discussion

With no customization for the tasks and less than 5 min of end-user time to configure and launch each experiment, the average F-measure was 0.83, one point behind the mean F-measure of the 22 entrants in the competition. Strong precision scores indicate the potential of applying the approach for more specific clinical information extraction tasks. There was not one best configuration, supporting an iterative approach to model creation.

Conclusion

Acceptable levels of performance can be achieved using fully automated and generalizable approaches to concept-level information extraction. The described implementation and related documentation are available for download.

18.

Objective

Explore the automated acquisition of knowledge in biomedical and clinical documents using text mining and statistical techniques to identify disease-drug associations.

Design

Biomedical literature and clinical narratives from the patient record were mined to gather knowledge about disease-drug associations. Two NLP systems, BioMedLEE and MedLEE, were applied to Medline articles and discharge summaries, respectively. Disease and drug entities were identified using the NLP systems in addition to MeSH annotations for the Medline articles. Focusing on eight diseases, co-occurrence statistics were applied to compute and evaluate the strength of association between each disease and relevant drugs.
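The co-occurrence step can be sketched as follows. The score used here, the fraction of disease-mentioning documents that also mention the drug, is a simple stand-in for the paper's association statistic, and the sample documents are invented.

```python
from collections import Counter

def disease_drug_scores(documents, disease):
    """Score each drug by its co-occurrence rate with a disease:
    the fraction of documents mentioning the disease that also
    mention the drug. `documents` holds entity sets as extracted
    by NLP or MeSH annotation."""
    n_disease = 0
    co = Counter()
    for entities in documents:
        if disease in entities["diseases"]:
            n_disease += 1
            co.update(entities["drugs"])
    return {drug: c / n_disease for drug, c in co.items()}

docs = [
    {"diseases": {"asthma"}, "drugs": {"albuterol", "prednisone"}},
    {"diseases": {"asthma"}, "drugs": {"albuterol"}},
    {"diseases": {"diabetes"}, "drugs": {"metformin"}},
]
scores = disease_drug_scores(docs, "asthma")
print(scores["albuterol"], scores["prednisone"])  # 1.0 0.5
```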

Results

Ranked lists of disease-drug pairs were generated and cutoffs calculated for identifying stronger associations among these pairs for further analysis. Differences and similarities between the text sources (i.e., biomedical literature and patient record) and annotations (i.e., MeSH and NLP-extracted UMLS concepts) with regards to disease-drug knowledge were observed.

Conclusion

This paper presents a method for acquiring disease-specific knowledge and a feasibility study of the method. The method is based on applying a combination of NLP and statistical techniques to both biomedical and clinical documents. The approach enabled extraction of knowledge about the drugs clinicians are using for patients with specific diseases based on the patient record, while also acquiring knowledge of drugs frequently involved in controlled trials for those same diseases. In comparing the disease-drug associations, we found the results to be appropriate: the two text sources contained consistent as well as complementary knowledge, and manual review of the top five disease-drug associations by a medical expert supported their correctness across the diseases.

19.

Background

Current image sharing is carried out by manual transportation of CDs by patients or organization-coordinated sharing networks. The former places a significant burden on patients and providers. The latter faces challenges to patient privacy.

Objective

To allow healthcare providers efficient access to medical imaging data acquired at other unaffiliated healthcare facilities while ensuring strong protection of patient privacy and minimizing burden on patients, providers, and the information technology infrastructure.

Methods

An image sharing framework is described that involves patients as an integral part of, and with full control of, the image sharing process. Central to this framework is the Patient Controlled Access-key REgistry (PCARE) which manages the access keys issued by image source facilities. When digitally signed by patients, the access keys are used by any requesting facility to retrieve the associated imaging data from the source facility. A centralized patient portal, called a PCARE patient control portal, allows patients to manage all the access keys in PCARE.
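The issue-sign-verify flow can be sketched in Python. This is a loose illustration only: HMAC with a shared secret stands in for the real digital-signature scheme, and all names and key material are hypothetical, not PCARE's actual protocol.

```python
import hashlib
import hmac
import json

PATIENT_SECRET = b"patient-private-key"   # hypothetical key material

def patient_sign(access_key: dict) -> dict:
    """Patient signs an access key issued by the source facility."""
    payload = json.dumps(access_key, sort_keys=True).encode()
    sig = hmac.new(PATIENT_SECRET, payload, hashlib.sha256).hexdigest()
    return {**access_key, "patient_sig": sig}

def registry_verify(signed_key: dict) -> bool:
    """Registry releases imaging data only for keys carrying a
    valid, untampered patient signature."""
    sig = signed_key.get("patient_sig", "")
    body = {k: v for k, v in signed_key.items() if k != "patient_sig"}
    payload = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(PATIENT_SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)

key = {"study": "CT-2013-0042", "source": "facility-A"}
signed = patient_sign(key)
print(registry_verify(signed))  # True for an untampered key
```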

Results

A prototype of the PCARE framework has been developed by extending open-source technology. The results for feasibility, performance, and user assessments are encouraging and demonstrate the benefits of patient-controlled image sharing.

Discussion

The PCARE framework is effective in many important clinical cases of image sharing and can be used to integrate organization-coordinated sharing networks. The same framework can also be used to realize a longitudinal virtual electronic health record.

Conclusion

The PCARE framework allows prior imaging data to be shared among unaffiliated healthcare facilities while protecting patient privacy with minimal burden on patients, providers, and infrastructure. A prototype has been implemented to demonstrate the feasibility and benefits of this approach.

20.

Objectives

The UMLS constitutes the largest existing collection of medical terms. However, little has been published about the users and uses of the UMLS. This study sheds light on these issues.

Design

We designed a questionnaire consisting of 26 questions and distributed it to the UMLS user mailing list. Participants were assured complete confidentiality of their replies. To further encourage list members to respond, we promised to provide them with early results prior to publication. Sector analysis of the responses, according to employment organization, was used to obtain insights into some responses.

Results

We received 70 responses. The study confirms two intended uses of the UMLS: access to source terminologies (75%) and mapping among them (44%). However, most access is to just a few sources, led by SNOMED, MeSH, and ICD. Of 119 reported purposes of use, terminology research (37), information retrieval (19), and terminology translation (14) lead. Several important observations emerged: the UMLS is widely used as a terminology (77%), even though it was not designed as one; many users want the NLM to mark concepts with multiple parents in an indented hierarchy (73%) and to derive a terminology from the UMLS (73%); and auditing the UMLS is a top budget priority for users (35%).

Conclusions

The study reports many uses of the UMLS in a variety of subjects from terminology research to decision support and phenotyping. The study confirms that the UMLS is used to access its source terminologies and to map among them. Two primary concerns of the existing user base are auditing the UMLS and the design of a UMLS-based derived terminology.
