Similar articles
 15 similar articles found
1.
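A bibliometric analysis of Chinese journal papers on natural language processing, covering five aspects (publication time, authors, journals, institutions, and keywords), reveals the current state and focal points of natural language processing research in China and offers a reference to help researchers in related fields choose research directions and pursue further in-depth work.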

2.
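Alzheimer's disease (AD) is an irreversible neurodegenerative disorder that mainly presents as impairment of memory, executive function, language, and other abilities; early detection and intervention can slow its progression. At present, cognitive function is commonly assessed in clinical practice with traditional paper-and-pencil neuropsychological scales, which are somewhat subjective and limited. Natural language processing in artificial intelligence...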

3.

Objective

To build an effective co-reference resolution system tailored to the biomedical domain.

Methods

Experimental materials used in this study were provided by the 2011 i2b2 Natural Language Processing Challenge. The 2011 i2b2 challenge involves co-reference resolution in medical documents. Concept mentions have been annotated in clinical texts, and the mentions that co-refer in each document are linked by co-reference chains. Normally, there are two ways of constructing a system to automatically discover co-referent links. One is to manually build rules for co-reference resolution; the other is to use machine learning systems to learn automatically from training datasets and then perform the resolution task on testing datasets.

Results

The existing co-reference resolution systems are able to find some of the co-referent links; our rule-based system performs well, finding the majority of the co-referent links. Our system achieved 89.6% overall performance on multiple medical datasets.

Conclusions

Manually crafting rules based on observation of training data is a valid way to achieve high performance in this co-reference resolution task for the critical biomedical domain.

4.
Objective

Many tasks in natural language processing utilize lexical pattern-matching techniques, including information extraction (IE), negation identification, and syntactic parsing. However, it is generally difficult to derive patterns that achieve acceptable levels of recall while also remaining highly precise.

Materials and Methods

We present a multiple sequence alignment (MSA)-based technique that automatically generates patterns, thereby leveraging language usage to determine the context of words that influence a given target. MSAs capture the commonalities among word sequences and are able to reveal areas of linguistic stability and variation. In this way, MSAs provide a systemic approach to generating lexical patterns that are generalizable, which will both increase recall levels and maintain high levels of precision.

Results

The MSA-generated patterns exhibited consistent F1-, F0.5-, and F2-scores compared to two baseline techniques for IE across four different tasks. Both baseline techniques performed well for some tasks and less well for others, but MSA was found to consistently perform at a high level for all four tasks.

Discussion

The performance of MSA on the four extraction tasks indicates the method's versatility. The results show that the MSA-based patterns are able to handle the extraction of individual data elements as well as relations between two concepts without the need for large amounts of manual intervention.

Conclusion

We presented an MSA-based framework for generating lexical patterns that showed consistently high levels of both performance and recall over four different extraction tasks when compared to baseline methods.
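The F1, F0.5, and F2 scores named in this abstract are all instances of the F-beta measure, a weighted harmonic mean of precision and recall. The sketch below is only an illustration of that standard formula; the function name `f_beta` and the precision and recall values are hypothetical and are not taken from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    beta < 1 weights precision more heavily; beta > 1 weights recall more heavily.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    b2 = beta * beta
    denominator = b2 * precision + recall
    if denominator == 0:
        return 0.0  # both precision and recall are zero
    return (1 + b2) * precision * recall / denominator


# Hypothetical values, chosen only to show how beta shifts the score.
precision, recall = 0.9, 0.6
for beta in (0.5, 1.0, 2.0):
    print(f"F{beta:g} = {f_beta(precision, recall, beta):.3f}")
```

With precision higher than recall, as in these made-up numbers, F0.5 comes out above F1 and F2 below it, which is why reporting all three gives a fuller picture of a pattern set's trade-off.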

5.
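Applications of text mining in the biomedical field and its system tools   Total citations: 2 (self-citations: 2; citations by others: 2)
This paper systematically introduces the workflow of biomedical text mining and the application of text mining techniques in the biomedical field, with particular attention to natural language processing and ontologies, named entity recognition, relation extraction, text classification and clustering, co-occurrence analysis, system tools and their evaluation, and visualization.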

6.
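Objective: To construct a knowledge graph for rational clinical drug use based on natural language processing. Methods: Using drug package inserts from the China Food and Drug Administration (CFDA), the U.S. Food and Drug Administration (FDA), and the drug database of a large tertiary grade-A hospital as data sources, a knowledge graph base for rational clinical drug use was built with a deep learning algorithm. A random sample of 500 package inserts was manually annotated, and the annotated data were divided into training, test, and validation sets. The deep learning model BERT was trained on the training set, its performance during training was checked on the validation set, and its performance after training was measured on the test set; the optimized model was then used to make predictions on unannotated package inserts. Results: More than 300,000 "entity-relation-entity" triples were extracted. The triples produced by the machine learning model and those annotated by domain experts were imported together into the Neo4j graph database and presented to clinical pharmacists as a knowledge graph. Conclusion: By building a knowledge base for rational clinical drug use with a deep learning algorithm, the medical relations and entities in drug package inserts were extracted while annotating only a small number of inserts. Automatically constructing a knowledge graph of rational drug use from package inserts can improve the automation and accuracy of rational drug use and reduce irrational drug use.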

7.

Objective

Relation extraction in biomedical text mining systems has largely focused on identifying clause-level relations, but increasing sophistication demands the recognition of relations at discourse level. A first step in identifying discourse relations involves the detection of discourse connectives: words or phrases used in text to express discourse relations. In this study supervised machine-learning approaches were developed and evaluated for automatically identifying discourse connectives in biomedical text.

Materials and Methods

Two supervised machine-learning models (support vector machines and conditional random fields) were explored for identifying discourse connectives in biomedical literature. In-domain supervised machine-learning classifiers were trained on the Biomedical Discourse Relation Bank, an annotated corpus of discourse relations over 24 full-text biomedical articles (∼112 000 word tokens), a subset of the GENIA corpus. Novel domain adaptation techniques were also explored to leverage the larger open-domain Penn Discourse Treebank (∼1 million word tokens). The models were evaluated using the standard evaluation metrics of precision, recall and F1 scores.

Results and Conclusion

Supervised machine-learning approaches can automatically identify discourse connectives in biomedical text, and the novel domain adaptation techniques yielded the best performance: 0.761 F1 score. A demonstration version of the fully implemented classifier BioConn is available at: http://bioconn.askhermes.org.

8.

Objective

Named entity recognition (NER) is one of the fundamental tasks in natural language processing. In the medical domain, there have been a number of studies on NER in English clinical notes; however, very limited NER research has been carried out on clinical notes written in Chinese. The goal of this study was to systematically investigate features and machine learning algorithms for NER in Chinese clinical text.

Materials and methods

We randomly selected 400 admission notes and 400 discharge summaries from Peking Union Medical College Hospital in China. For each note, four types of entity—clinical problems, procedures, laboratory test, and medications—were annotated according to a predefined guideline. Two-thirds of the 400 notes were used to train the NER systems and one-third for testing. We investigated the effects of different types of feature including bag-of-characters, word segmentation, part-of-speech, and section information, and different machine learning algorithms including conditional random fields (CRF), support vector machines (SVM), maximum entropy (ME), and structural SVM (SSVM) on the Chinese clinical NER task. All classifiers were trained on the training dataset and evaluated on the test set, and micro-averaged precision, recall, and F-measure were reported.

Results

Our evaluation on the independent test set showed that most types of feature were beneficial to Chinese NER systems, although the improvements were limited. The system achieved the highest performance by combining word segmentation and section information, indicating that these two types of feature complement each other. When the same types of optimized feature were used, CRF and SSVM outperformed SVM and ME. More specifically, SSVM achieved the highest performance of the four algorithms, with F-measures of 93.51% and 90.01% for admission notes and discharge summaries, respectively.
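Micro-averaged precision, recall, and F-measure, as reported in this abstract, pool the true-positive, false-positive, and false-negative counts over all entity types before computing the ratios. The following is a minimal sketch of that calculation, not the authors' evaluation code; the function name `micro_prf` and all counts are hypothetical.

```python
from typing import Dict, Tuple

# Each entity type maps to (true positives, false positives, false negatives).
Counts = Dict[str, Tuple[int, int, int]]


def micro_prf(counts: Counts) -> Tuple[float, float, float]:
    """Pool counts across entity types, then compute precision, recall, and F1."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical counts for the four entity types named in the abstract.
example_counts: Counts = {
    "clinical problem": (80, 10, 20),
    "procedure": (40, 10, 10),
    "laboratory test": (30, 5, 5),
    "medication": (50, 5, 5),
}
p, r, f = micro_prf(example_counts)
print(f"precision={p:.4f} recall={r:.4f} F-measure={f:.4f}")
```

Because counts are pooled, frequent entity types dominate the micro-average; a macro-average (mean of per-type scores) would weight every type equally instead.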

9.
Objective

Seizure frequency and seizure freedom are among the most important outcome measures for patients with epilepsy. In this study, we aimed to automatically extract this clinical information from unstructured text in clinical notes. If successful, this could improve clinical decision-making in epilepsy patients and allow for rapid, large-scale retrospective research.

Materials and Methods

We developed a finetuning pipeline for pretrained neural models to classify patients as being seizure-free and to extract text containing their seizure frequency and date of last seizure from clinical notes. We annotated 1000 notes for use as training and testing data and determined how well 3 pretrained neural models, BERT, RoBERTa, and Bio_ClinicalBERT, could identify and extract the desired information after finetuning.

Results

The finetuned models (BERTFT, Bio_ClinicalBERTFT, and RoBERTaFT) achieved near-human performance when classifying patients as seizure free, with BERTFT and Bio_ClinicalBERTFT achieving accuracy scores over 80%. All 3 models also achieved human performance when extracting seizure frequency and date of last seizure, with overall F1 scores over 0.80. The best combination of models was Bio_ClinicalBERTFT for classification, and RoBERTaFT for text extraction. Most of the gains in performance due to finetuning required roughly 70 annotated notes.

Discussion and Conclusion

Our novel machine reading approach to extracting important clinical outcomes performed at or near human performance on several tasks. This approach opens new possibilities to support clinical practice and conduct large-scale retrospective clinical research. Future studies can use our finetuning pipeline with minimal training annotations to answer new clinical questions.

10.
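Based on a survey of research on and applications of natural language retrieval from 2008 to 2012, this paper reviews current work on natural language retrieval, with the aim of overcoming the limitations of traditional web retrieval technology and supporting knowledge retrieval.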

11.
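A review of research methods for natural language processing in traditional Chinese medicine   Total citations: 2 (self-citations: 0; citations by others: 2)
This paper briefly introduces the application of natural language processing in traditional Chinese medicine and, through an analysis of the relevant literature, describes the characteristics and application directions of methods such as association rule mining, cluster analysis, information extraction, and machine learning. It summarizes methods for constructing traditional Chinese medicine knowledge networks and, building on them, proposes new ideas for future natural language processing research in traditional Chinese medicine.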

12.
Objective

Outcomes mentioned on online health communities (OHCs) by patients can serve as a source of evidence for off-label drug usage evaluation, but identifying these outcomes manually is tedious work. We have built a natural language processing model to identify off-label usage of drugs mentioned in these patient posts.

Materials and Methods

Single patient posts from 4 major OHCs were considered for this study. A text classification model was built to classify the posts as either relevant or not relevant based on patient experience. The relevant posts were passed through a spelling correction tool, CSpell, and then medications and indications from these posts were identified using cTAKES (clinical Text Analysis and Knowledge Extraction System), a named entity recognition tool. Drug and indication pairs were identified using a dependency parser. Finally, if the paired indication was not mentioned on the label of the drug approved by the U.S. Food and Drug Administration, it was tagged as off-label use of that drug.

Results

Using this algorithm, we identified 289 off-label indications, achieving a recall of 76%.

Conclusions

The method designed in this study identifies and extracts the semantic relationship between drugs and indications from demotic posts in OHCs. The results demonstrate the feasibility of using natural language processing techniques in identifying off-label drug usage across online health forums for a variety of drugs. Understanding patients' off-label use of drugs may be able to help manufacturers innovate to better address patients' needs and assist doctors' prescribing decisions.
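The final step described in this abstract, tagging a drug-indication pair as off-label when the indication is absent from the drug's approved label, amounts to a set lookup. The sketch below illustrates only that last step under strong assumptions: the pairs are taken as already extracted, matching is by exact normalized string, and the function names, drug names, and conditions are all hypothetical placeholders. It does not reproduce the paper's classifier, CSpell, cTAKES, or dependency-parsing stages.

```python
from typing import Dict, Iterable, List, Set, Tuple


def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(term.lower().split())


def tag_off_label(
    pairs: Iterable[Tuple[str, str]],
    label_indications: Dict[str, Set[str]],
) -> List[Tuple[str, str]]:
    """Return the (drug, indication) pairs whose indication is not on the drug's label."""
    approved = {
        normalize(drug): {normalize(ind) for ind in indications}
        for drug, indications in label_indications.items()
    }
    flagged = []
    for drug, indication in pairs:
        d, i = normalize(drug), normalize(indication)
        if d not in approved:
            continue  # no label on file, so no decision can be made
        if i not in approved[d]:
            flagged.append((d, i))
    return flagged


# Hypothetical label data and extracted pairs (placeholder names only).
labels = {"DrugA": {"condition x"}, "DrugB": {"condition y", "condition z"}}
extracted = [
    ("druga", "Condition X"),   # on label
    ("DrugA", "condition y"),   # not on DrugA's label
    ("DrugB", "condition z"),   # on label
    ("DrugC", "condition x"),   # unknown drug, skipped
]
print(tag_off_label(extracted, labels))
```

Exact string matching is the simplest possible choice here; a real pipeline would more plausibly map mentions to concept identifiers before comparing them with label indications.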

13.

Objectives

To test the feasibility of using text mining to depict meaningfully the experience of pain in patients with metastatic prostate cancer, to identify novel pain phenotypes, and to propose methods for longitudinal visualization of pain status.

Materials and methods

Text from 4409 clinical encounters for 33 men enrolled in a 15-year longitudinal clinical/molecular autopsy study of metastatic prostate cancer (Project to ELIminate lethal CANcer) was subjected to natural language processing (NLP) using Unified Medical Language System-based terms. A four-tiered pain scale was developed, and logistic regression analysis identified factors that correlated with experience of severe pain during each month.

Results

NLP identified 6387 pain and 13 827 drug mentions in the text. Graphical displays revealed the pain ‘landscape’ described in the textual records and confirmed dramatically increasing levels of pain in the last years of life in all but two patients, all of whom died from metastatic cancer. Severe pain was associated with receipt of opioids (OR=6.6, p<0.0001) and palliative radiation (OR=3.4, p=0.0002). Surprisingly, no severe or controlled pain was detected in two of 33 subjects’ clinical records. Additionally, the NLP algorithm proved generalizable in an evaluation using a separate data source (889 Informatics for Integrating Biology and the Bedside (i2b2) discharge summaries).

Discussion

Patterns in the pain experience, undetectable without the use of NLP to mine the longitudinal clinical record, were consistent with clinical expectations, suggesting that meaningful NLP-based pain status monitoring is feasible. Findings in this initial cohort suggest that ‘outlier’ pain phenotypes useful for probing the molecular basis of cancer pain may exist.

Limitations

The results are limited by a small cohort size and use of proprietary NLP software.

Conclusions

We have established the feasibility of tracking longitudinal patterns of pain by text mining of free text clinical records. These methods may be useful for monitoring pain management and identifying novel cancer phenotypes.

14.
Objective

Adherence to a treatment plan by HIV-positive patients is necessary to decrease their mortality and improve their quality of life; however, some patients display poor appointment adherence and become lost to follow-up (LTFU). We applied natural language processing (NLP) to analyze indications towards or against LTFU in HIV-positive patients' notes.

Materials and Methods

Unstructured lemmatized notes were labeled with an LTFU or Retained status using a 183-day threshold. An NLP and supervised machine learning system with a linear model and elastic net regularization was trained to predict this status. The prevalence of characteristic domains in the learned model weights was evaluated.

Results

We analyzed 838 LTFU vs 2964 Retained notes and obtained a weighted F1 mean of 0.912 via nested cross-validation; another experiment with notes from the same patients in both classes showed substantially lower metrics. "Comorbidities" were associated with LTFU through, for instance, "HCV" (hepatitis C virus) and likewise "Good adherence" with Retained, represented with "Well on ART" (antiretroviral therapy).

Discussion

Mentions of mental health disorders and substance use were associated with disparate retention outcomes; however, history vs active use was not investigated. There remains further need to model transitions between LTFU and being retained in care over time.

Conclusion

We provided an important step for the future development of a model that could eventually help to identify patients who are at risk for falling out of care and to analyze which characteristics could be factors for this. Further research is needed to enhance this method with structured electronic medical record fields.

15.
Objective

Electronic health record documentation by intensive care unit (ICU) clinicians may predict patient outcomes. However, it is unclear whether physician and nursing notes differ in their ability to predict short-term ICU prognosis. We aimed to investigate and compare the ability of physician and nursing notes, written in the first 48 hours of admission, to predict ICU length of stay and mortality using 3 analytical methods.

Materials and Methods

This was a retrospective cohort study with split sampling for model training and testing. We included patients ≥18 years of age admitted to the ICU at Beth Israel Deaconess Medical Center in Boston, Massachusetts, from 2008 to 2012. Physician or nursing notes generated within the first 48 hours of admission were used with standard machine learning methods to predict outcomes.

Results

For the primary outcome of composite score of ICU length of stay ≥7 days or in-hospital mortality, the gradient boosting model had better performance than the logistic regression and random forest models. Nursing and physician notes achieved areas under the curve (AUCs) of 0.826 and 0.796, respectively, with even better predictive power when combined (AUC, 0.839).

Discussion

Models using only nursing notes more accurately predicted short-term prognosis than did models using only physician notes, but in combination, the models achieved the greatest accuracy in prediction.

Conclusions

Our findings demonstrate that statistical models derived from text analysis in the first 48 hours of ICU admission can predict patient outcomes. Physicians' and nurses' notes are both uniquely important in mortality prediction and combining these notes can produce a better predictive model.
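The AUC values quoted in this abstract can be read as the probability that a randomly chosen positive case (here, a patient with the composite outcome) receives a higher predicted score than a randomly chosen negative case. The sketch below computes AUC directly from that pairwise definition; it is an illustration only, the function name `auc` is an assumption, and the labels and scores are hypothetical rather than data from the study.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical outcome labels (1 = outcome occurred) and model scores.
y_true = [1, 1, 0, 0]
y_score = [0.9, 0.4, 0.5, 0.1]
print(auc(y_true, y_score))
```

The pairwise loop is quadratic in the number of cases, which is fine for a small illustration; library implementations use a sort-based method that gives the same value.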
