Similar Articles
20 similar articles retrieved (search time: 15 ms)
1.
In today’s medical world, data on the symptoms of patients with various diseases are so widespread that analysis and consideration of all factors is simply not possible for a single person (doctor). There is therefore a clear need for an intelligent system that can weigh the various factors and identify a suitable model relating the different parameters. Data mining, as the foundation of such systems, has played a vital role in the advancement of medical science, especially in the diagnosis of various diseases. Type 2 diabetes is one such disease; its prevalence has increased in recent years, and late diagnosis can lead to serious complications. In this paper, several data mining methods and algorithms were applied to a set of screening data for type 2 diabetes collected in Tabriz, Iran. The performance of methods such as support vector machine, artificial neural network, decision tree, nearest neighbors, and Bayesian network was compared in an effort to find the best algorithm for diagnosing this disease. The artificial neural network, with an accuracy of 97.44%, had the best performance on the chosen dataset; accuracies for the support vector machine, decision tree, 5-nearest neighbor, and Bayesian network were 81.19%, 95.03%, 90.85%, and 91.60%, respectively. The simulation results show that the effectiveness of a classification technique depends on the application as well as on the nature and complexity of the dataset used, and no single technique can be expected to perform best in every case. Therefore, when data mining is used for the diagnosis or prediction of disease, consultation with specialists is essential for selecting the number and type of dataset parameters so as to obtain the best possible results.
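As an illustration of the kind of comparison described above, the following minimal sketch (not the authors' pipeline) evaluates the five classifier families named in the abstract with scikit-learn; the synthetic dataset, hyperparameters, and the Gaussian naive Bayes stand-in for a Bayesian network are assumptions for illustration only.

```python
# Minimal sketch (not the authors' pipeline): compare the five classifier
# families named in the abstract with scikit-learn. The synthetic dataset,
# hyperparameters, and the Gaussian naive Bayes stand-in for a Bayesian
# network are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Placeholder binary screening data (the real study used Tabriz screening records)
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

models = {
    "SVM": SVC(),
    "ANN": MLPClassifier(max_iter=1000, random_state=0),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "5-NN": KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes (Bayesian stand-in)": GaussianNB(),
}

for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=10, scoring="accuracy").mean()
    print(f"{name}: mean 10-fold accuracy {acc:.4f}")
```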

2.
Background: Personal insulin pumps have been shown to be effective in improving the quality of therapy for people with type 1 diabetes (T1D). However, the safety of this technology is limited by possible infusion site failures, which are linked with hyperglycemia and ketoacidosis. Thanks to the large amount of data collected by modern therapeutic technologies, machine learning algorithms have the potential to provide a new way to identify failures early and avert adverse events. Methods: A clinical dataset (N = 20) is used to evaluate a novel method for detecting infusion site failures in real time using unsupervised anomaly detection algorithms, previously proposed and developed on in-silico data. An adapted feature engineering procedure is introduced so that the method can operate in the absence of a closed-loop (CL) system and meal announcements. Results: In the optimal configuration, we obtained a sensitivity of 0.75 (15 out of 20 failures detected) and 0.08 FP/day, outperforming previously proposed algorithms from the literature. The algorithm was able to anticipate the replacement of malfunctioning infusion sets by ~2 h on average. Conclusions: On the considered dataset, the proposed algorithm showed the potential to improve the safety of patients treated with sensor-augmented pump systems.
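The sketch below illustrates, under stated assumptions, how the reported metrics (sensitivity, false positives per day, and mean anticipation time) could be computed from alarm timestamps and infusion-set replacement times; the 12-hour matching window and the example numbers are hypothetical and not taken from the paper.

```python
# Illustrative metric computation only (not the paper's detection algorithm):
# sensitivity, false positives per day, and mean anticipation time derived from
# alarm times and infusion-set replacement times. The 12 h matching window and
# the example numbers are hypothetical assumptions.
def evaluate_alarms(alarm_times_h, replacement_times_h, monitored_days, window_h=12.0):
    true_positives, lead_times, matched = 0, [], set()
    for r in replacement_times_h:
        hits = [a for a in alarm_times_h if 0.0 <= r - a <= window_h]
        if hits:
            true_positives += 1
            lead_times.append(r - min(hits))      # earliest in-window alarm -> lead time
            matched.update(hits)
    false_positives = sum(1 for a in alarm_times_h if a not in matched)
    sensitivity = true_positives / len(replacement_times_h)
    fp_per_day = false_positives / monitored_days
    mean_lead_h = sum(lead_times) / len(lead_times) if lead_times else 0.0
    return sensitivity, fp_per_day, mean_lead_h

# Toy example: three alarms and two set replacements over 100 monitored days
print(evaluate_alarms([10.0, 50.0, 120.0], [12.0, 125.0], monitored_days=100))
```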

3.
How does genome evolution affect the rate of diversification of biological lineages? Recent studies have suggested that the overall rate of genome evolution is correlated with the rate of diversification. If true, this claim has important consequences for understanding the process of diversification, and implications for the use of DNA sequence data to reconstruct evolutionary history. However, the generality and cause of this relationship have not been established. Here, we test the relationship between the rate of molecular evolution and net diversification with a 19-gene, 17-kb DNA sequence dataset from 64 families of birds. We show that rates of molecular evolution are positively correlated with net diversification in birds. Using a 7.6-kb dataset of protein-coding DNA, we show that the synonymous substitution rate, and therefore the mutation rate, is correlated with net diversification. Further analysis shows that the link between mutation rates and net diversification is unlikely to be an indirect result of correlations with life-history variables that may influence both quantities, suggesting that there might be a causal link between mutation rates and net diversification.

4.
Ma Luo. Viruses. 2022;14(6).
Natural immunity against HIV has been observed in many individuals around the world. Among them, a group of female sex workers enrolled in the Pumwani sex worker cohort remained HIV uninfected for more than 30 years despite high-risk sex work. Many studies have been carried out to understand this natural immunity to HIV in the hope of developing effective vaccines and prevention strategies. This review focuses on two such examples. These studies began by identifying immunogenetic or genetic associations with resistance to HIV acquisition, and followed up with in-depth investigations to understand the biological relevance of the correlates of protection and to develop and test novel vaccines and prevention approaches.

5.
Background: Accurate prognostic estimation for esophageal cancer (EC) patients plays an important role in clinical decision-making. The objective of this study was to develop an effective model to predict the 5-year survival status of EC patients using machine learning (ML) algorithms. Methods: We retrieved the records of patients diagnosed with EC between 2010 and 2015 from the Surveillance, Epidemiology, and End Results (SEER) Program, including 24 features. A total of 8 ML models were applied to the selected dataset to classify EC patients by 5-year survival status, including 3 newly developed gradient boosting models (GBM): XGBoost, CatBoost, and LightGBM; 2 commonly used tree-based models: gradient boosting decision trees (GBDT) and random forest (RF); and 3 other ML models: artificial neural networks (ANN), naive Bayes (NB), and support vector machines (SVM). Five-fold cross-validation was used to measure model performance. Results: After excluding records with missing data, the final study population comprised 10,588 patients. Feature selection was conducted based on the χ2 test; however, the experimental results showed that the complete dataset provided better prediction of outcomes than the dataset with non-significant features removed. Among the 8 models, XGBoost had the best performance [area under the receiver operating characteristic (ROC) curve (AUC): 0.852 for XGBoost, 0.849 for CatBoost, 0.850 for LightGBM, 0.846 for GBDT, 0.838 for RF, 0.844 for ANN, 0.833 for NB, and 0.789 for SVM]. The accuracy and logistic loss of XGBoost were 0.875 and 0.301, respectively, also the best among the models. In the XGBoost model, SHapley Additive exPlanations (SHAP) values were calculated, and the results indicated that four features (reason no cancer-directed surgery, Surg Prim Site, age, and stage group) had the greatest impact on the predicted outcomes. Conclusions: The XGBoost model and the complete dataset can be used to construct an accurate prognostic model for patients diagnosed with EC, which may be applicable in clinical practice in the future.
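As a hedged sketch of the modelling step (not the authors' SEER pipeline), the code below fits XGBoost, estimates the AUC with 5-fold cross-validation, and ranks feature impact by mean absolute SHAP value; it requires the third-party xgboost and shap packages, and the synthetic data merely stand in for the 24 SEER features.

```python
# Sketch of the modelling step only (not the authors' SEER pipeline); requires
# the third-party xgboost and shap packages. Synthetic data stand in for the
# 24 SEER features; 5-fold cross-validated AUC and a global SHAP ranking are
# computed as described in the abstract.
import numpy as np
import shap
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

X, y = make_classification(n_samples=2000, n_features=24, random_state=0)

model = XGBClassifier(eval_metric="logloss", random_state=0)
auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
print(f"5-fold cross-validated AUC: {auc:.3f}")

# Global feature importance: mean |SHAP| per feature on the fitted model
model.fit(X, y)
shap_values = shap.TreeExplainer(model).shap_values(X)
ranking = np.argsort(np.abs(shap_values).mean(axis=0))[::-1]
print("Most influential feature indices:", ranking[:4])
```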

6.
Coupled two-way clustering analysis of gene microarray data
We present a coupled two-way clustering approach to gene microarray data analysis. The main idea is to identify subsets of the genes and samples such that, when one of these is used to cluster the other, stable and significant partitions emerge. The search for such subsets is a computationally complex task. We present an algorithm, based on iterative clustering, that performs such a search. This analysis is especially suitable for gene microarray data, where the contributions of a variety of biological mechanisms to the gene expression levels are entangled in a large body of experimental data. The method was applied to two gene microarray datasets, on colon cancer and leukemia. By identifying relevant subsets of the data and focusing on them, we were able to discover partitions and correlations that were masked and hidden when the full dataset was used in the analysis. Some of these partitions have a clear biological interpretation; others can serve to identify possible directions for future research.
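A much-simplified sketch of the coupled two-way idea follows: samples are clustered on a subset of genes and genes on a subset of samples, alternating, and only partitions that are stable across restarts are kept. KMeans and the restart-agreement stability criterion are stand-ins for the authors' superparamagnetic clustering, and all data are random placeholders.

```python
# Much-simplified stand-in for coupled two-way clustering (the original used
# superparamagnetic clustering): alternately cluster samples on a gene subset
# and genes on a sample subset, keeping partitions that are stable across
# restarts. KMeans, the stability threshold, and the random data are all
# placeholder assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
expr = rng.normal(size=(500, 40))                 # genes x samples (placeholder)

def stable_partition(data, k=2, n_restarts=5):
    """Cluster the rows of `data`; call the partition stable if restarts agree."""
    labels = [KMeans(n_clusters=k, n_init=10, random_state=s).fit_predict(data)
              for s in range(n_restarts)]
    agreement = np.mean([adjusted_rand_score(labels[0], l) for l in labels[1:]])
    return labels[0], agreement > 0.9

gene_subset = np.arange(expr.shape[0])            # start from all genes
for _ in range(3):                                # a few coupling iterations
    sample_labels, stable_s = stable_partition(expr[gene_subset].T)        # cluster samples
    gene_labels, stable_g = stable_partition(expr[:, sample_labels == 0])  # cluster genes
    if stable_g:
        gene_subset = np.where(gene_labels == 0)[0]   # refine the gene subset
print(len(gene_subset), "genes retained; stable sample partition:", stable_s)
```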

7.
Polychlorinated biphenyls and thyroid status in humans: a review.
Lars Hagmar. Thyroid. 2003;13(11):1021-1028.
Animal studies show that exposure to polychlorinated biphenyls (PCBs) or other persistent organochlorine compounds can disrupt thyroid hormone homeostasis. In some reports, dietary exposure to PCBs has also been claimed to affect circulating levels of thyroid hormones and thyrotropin (TSH) in humans. The aim of the present study was to review the available epidemiologic studies in this field. A total of 13 studies fulfilled the inclusion criteria for the review. The overall impression is a lack of consistency in the correlations reported across studies, and there are no obvious interstudy dose-response associations. Thus, it cannot presently be concluded that PCB exposure has been convincingly shown to affect thyroid hormone homeostasis in humans. On the other hand, the available data do not exclude such associations. It is important to be aware of the intrinsic limitations of the cross-sectional epidemiologic studies used.

8.
The comprehensive properties of high-entropy alloys (HEAs) are highly dependent on their phases. Although a large number of machine learning (ML) algorithms have been successfully applied to the phase prediction of HEAs, the accuracies of different ML algorithms based on the same dataset vary significantly. Selecting an efficient ML algorithm would therefore significantly reduce the number and cost of experiments. In this work, phase prediction of HEAs (PPH) is proposed by integrating a criterion with a machine learning recommendation method (MLRM). First, a meta-knowledge table based on characteristics of HEAs and the performance of candidate algorithms is established, and meta-learning based on this table is adopted to recommend an algorithm with desirable accuracy. Second, an MLRM based on improved meta-learning is engineered to recommend a more desirable algorithm for phase prediction. Finally, considering the poor interpretability and generalization of single ML algorithms, a PPH combining the advantages of the MLRM and the criterion is proposed to improve the accuracy of phase prediction. The PPH is validated on 902 samples from 12 datasets, including 405 quinary HEAs, 359 senary HEAs, and 138 septenary HEAs. The experimental results show that the PPH achieves better performance than the traditional meta-learning method. The average prediction accuracy of PPH on all, quinary, senary, and septenary HEAs is 91.6%, 94.3%, 93.1%, and 95.8%, respectively.
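The following conceptual sketch (not the published PPH/MLRM code) shows one way a meta-knowledge table could drive algorithm recommendation: dataset meta-features are matched to the most similar previously seen dataset, and that dataset's best-performing algorithm is recommended. All meta-features, numbers, and algorithm names are illustrative assumptions.

```python
# Conceptual sketch of the recommendation step (not the published PPH/MLRM
# code): a hypothetical meta-knowledge table maps dataset meta-features to the
# algorithm that historically performed best, and a nearest-neighbour lookup
# recommends an algorithm for a new HEA dataset. All feature values and
# algorithm names are illustrative assumptions.
import numpy as np

meta_table = [
    # (meta-feature vector, best-performing algorithm on that dataset)
    (np.array([5, 0.12, 6.2]), "RandomForest"),
    (np.array([6, 0.08, 7.9]), "SVM"),
    (np.array([7, 0.15, 8.4]), "GradientBoosting"),
]

def recommend(meta_features, table=meta_table):
    """Recommend the algorithm used by the most similar previously seen dataset."""
    distances = [np.linalg.norm(meta_features - feats) for feats, _ in table]
    return table[int(np.argmin(distances))][1]

print(recommend(np.array([6, 0.10, 7.5])))   # closest to the second row -> "SVM"
```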

9.
Zhao Y, Tian X, Li Z. Clinical Rheumatology. 2007;26(9):1505-1512.
It has been reported that citrullinated fibrin(ogen) deposits in inflamed joints play an important role in the pathogenesis of rheumatoid arthritis (RA). Although antibodies to citrullinated fibrinogen (ACF) have been detected in the sera of RA patients, the associations between ACF and RA remain unclear. In this study, human fibrinogen was citrullinated by peptidylarginine deiminase in vitro, and ACF were detected by enzyme-linked immunosorbent assay in rheumatic patients, including 183 with RA, 121 with systemic lupus erythematosus, and 48 with osteoarthritis, as well as 108 healthy controls. The prevalence of ACF was determined, and the associations between ACF and RA were evaluated. The sensitivity and specificity of ACF in RA were 67.21% and 84.84%, respectively. There were significant correlations between ACF and erythrocyte sedimentation rate, anti-cyclic citrullinated peptide antibody, and anti-keratin antibodies (AKA). In terms of radiographic progression, RA patients with ACF had higher scores than those without ACF according to the Sharp–van der Heijde method. In addition, ACF was often positive in RA patients who were negative for IgM rheumatoid factor, AKA, or anti-perinuclear factor. The results indicate that the ACF assay is helpful for the diagnosis of RA.
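For readers who want to see where the reported figures come from, the arithmetic below reproduces the 67.21% sensitivity and 84.84% specificity, assuming sensitivity is computed over the 183 RA sera and specificity over the 277 pooled non-RA samples; the positive counts (123 and 42) are back-calculated from those percentages rather than taken from the paper.

```python
# Back-of-the-envelope check of the reported figures, assuming sensitivity is
# computed over the 183 RA sera and specificity over the 277 pooled non-RA
# samples (121 SLE + 48 OA + 108 healthy controls). The ACF-positive counts
# (123 and 42) are back-calculated from the percentages, not taken from the paper.
ra_total, ra_acf_positive = 183, 123
non_ra_total, non_ra_acf_positive = 121 + 48 + 108, 42

sensitivity = ra_acf_positive / ra_total                            # ~0.6721
specificity = (non_ra_total - non_ra_acf_positive) / non_ra_total   # ~0.8484
print(f"sensitivity {sensitivity:.2%}, specificity {specificity:.2%}")
```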

10.
Large-scale affinity purification and mass spectrometry studies have played important roles in the assembly and analysis of comprehensive protein interaction networks for lower eukaryotes. However, the development of such networks for human proteins has been slowed by the high cost and significant technical challenges associated with systematic studies of protein interactions. To address this challenge, we have developed a method for building local and focused networks. This approach couples vector algebra and statistical methods with normalized spectral abundance factors (NSAFs) derived from the analysis of affinity purifications via chromatography-based proteomics. After mathematical removal of contaminant proteins, the core components of multiprotein complexes are determined by singular value decomposition analysis and clustering. The probability of interactions within and between complexes is computed solely from NSAFs using a Bayesian approach. To demonstrate the application of this method to small-scale datasets, we analyzed an expanded human TIP49a and TIP49b dataset. This dataset contained proteins affinity-purified with 27 different epitope-tagged components of the chromatin-remodeling SRCAP, hINO80, and TRRAP/TIP60 complexes, and the nutrient-sensing complex Uri/Prefoldin. Within a core network of 65 unique proteins, we captured all known components of these complexes and novel protein associations, especially in the Uri/Prefoldin complex. Finally, we constructed a probabilistic human interaction network composed of 557 protein pairs.
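A schematic sketch of the analysis steps is shown below (not the authors' code): an NSAF matrix over purifications and proteins is decomposed by SVD and the purifications are clustered in the reduced space. The NSAF formula in the comments is the commonly used definition, and the data are random placeholders.

```python
# Schematic sketch of the described workflow (not the authors' code): build an
# NSAF matrix (purifications x proteins), reduce it by singular value
# decomposition, and cluster the purifications in the reduced space. NSAF for
# protein i is commonly computed as (SpC_i/L_i) / sum_j(SpC_j/L_j), with SpC
# the spectral count and L the protein length; the data here are random
# placeholders.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(1)
spectral_counts = rng.poisson(5, size=(27, 300))    # 27 purifications x 300 proteins
lengths = rng.integers(100, 2000, size=300)

saf = spectral_counts / lengths                     # spectral count / protein length
nsaf = saf / saf.sum(axis=1, keepdims=True)         # normalise within each purification

U, s, Vt = np.linalg.svd(nsaf, full_matrices=False)
reduced = U[:, :3] * s[:3]                          # keep the top singular components
clusters = fcluster(linkage(reduced, method="ward"), t=4, criterion="maxclust")
print("purification cluster labels:", clusters)
```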

11.
Genome-wide association studies (GWASs) seek to understand the relationship between complex phenotype(s) (e.g., height) and up to millions of single-nucleotide polymorphisms (SNPs). Early analyses of GWASs are commonly believed to have “missed” much of the additive genetic variance estimated from correlations between relatives. A more recent method, genome-wide complex trait analysis (GCTA), obtains much higher estimates of heritability using a model of random SNP effects correlated between genotypically similar individuals. GCTA has now been applied to many phenotypes from schizophrenia to scholastic achievement. However, recent studies question GCTA’s estimates of heritability. Here, we show that GCTA applied to current SNP data cannot produce reliable or stable estimates of heritability. We show first that GCTA depends sensitively on all singular values of a high-dimensional genetic relatedness matrix (GRM). When the assumptions in GCTA are satisfied exactly, we show that the heritability estimates produced by GCTA will be biased and the standard errors will likely be inaccurate. When the population is stratified, we find that GRMs typically have highly skewed singular values, and we prove that the many small singular values cannot be estimated reliably. Hence, GWAS data are necessarily overfit by GCTA which, as a result, produces high estimates of heritability. We also show that GCTA’s heritability estimates are sensitive to the chosen sample and to measurement errors in the phenotype. We illustrate our results using the Framingham dataset. Our analysis suggests that results obtained using GCTA, and the results’ qualitative interpretations, should be interpreted with great caution.
In recent years, genome-wide association studies (GWASs) have become an important tool for investigating the genetic contribution to complex phenotypes. These studies use statistical techniques to find associations between single nucleotide polymorphisms (SNPs) and phenotype(s) (e.g., continuous traits such as height or discrete traits such as presence/absence of a disease). A widely used measure of genetic influence on a phenotype is the (narrow-sense) heritability, defined as the ratio of the additive genetic variance to the total phenotypic variance. A major conundrum revealed by many analyses of GWAS data has been that the small number of significant associations explain much less of the heritability than is estimated from correlations between relatives [i.e., much heritability is “missing” (1–3)]. To address this problem, Yang et al. (4) posited that heritability is not missing but is “hidden.” The authors developed a statistical framework [genome-wide complex trait analysis (GCTA)] in which each SNP makes a random contribution to the phenotype, and these contributions are correlated between individuals who have similar genotypes. Applied to many GWASs, GCTA yields estimates of heritability far larger than those obtained using earlier analyses. GCTA has been used to estimate the heritability of many phenotypes from schizophrenia (5) to scholastic achievement (6). Despite its current wide use, recent studies (7, 8) have questioned the reliability of GCTA estimates.
We show here that the results produced using GCTA hinge on accurate estimation of a high-dimensional genetic relatedness matrix (GRM). We show that even when the assumptions in GCTA are satisfied exactly, heritability estimates produced by GCTA will be biased, and it is unlikely that the confidence intervals will be accurate. When there is genetic stratification in the population, we show that GCTA’s heritability estimates are guaranteed to be unstable and unreliable, which is especially relevant because stratification is common in human GWASs.
Our analysis has two other important consequences: (i) the heritability estimate produced by GCTA is sensitive to the choice of the sample used; and (ii) the estimate is sensitive to measurement errors in the phenotype. We argue that this instability and sensitivity are attributable to the fact that GCTA necessarily overfits typical GWASs. We show that a direct approach to eliminating this overfitting leads back to the small SNP heritability estimates derived previously from association studies. We illustrate our results using the Framingham dataset (9, 10) comprising information on 49,214 SNPs in 2,698 unrelated individuals.
We conclude that application of GCTA to GWAS data may not reliably improve our understanding of the genomic basis of phenotypic variability. Even when the assumptions for GCTA all hold, we recommend the use of diagnostic tests, and we describe one such test. We also discuss several ways of moving toward better methods.
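To make the central object concrete, the sketch below (not GCTA itself) builds a genetic relatedness matrix A = ZZ'/p from a standardized genotype matrix Z and inspects its singular values, the quantities the authors argue cannot all be estimated reliably; the genotypes are simulated placeholders.

```python
# Minimal sketch of the central object (not GCTA itself): the genetic
# relatedness matrix A = ZZ'/p built from a standardised genotype matrix Z
# (individuals x SNPs), whose spectrum of singular values is what the authors
# argue cannot all be estimated reliably. Genotypes are simulated placeholders.
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 5000                                   # individuals, SNPs
maf = rng.uniform(0.05, 0.5, size=p)               # minor allele frequencies
genotypes = rng.binomial(2, maf, size=(n, p)).astype(float)

Z = (genotypes - 2 * maf) / np.sqrt(2 * maf * (1 - maf))   # standardise each SNP
grm = Z @ Z.T / p                                          # n x n relatedness matrix

singular_values = np.linalg.svd(grm, compute_uv=False)
print("largest singular values: ", singular_values[:3])
print("smallest singular values:", singular_values[-3:])
```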

12.
Artificial neural networks are machine-learning algorithms designed to analyse data without a pre-existing hypothesis about any associations that may exist. This technique has not previously been applied to the risk stratification of patients referred with suspected deep vein thrombosis (DVT). Current assessment usually relies on a points-based clinical score, which may be combined with a D-dimer blood test. A neural network was trained to risk-stratify patients presenting with suspected DVT and its performance compared with existing tools. Data from 11,490 cases of suspected DVT presenting consecutively between 1 January 2011 and 31 December 2017 were analysed, and the 7080 cases for which all components of the Wells’ score, a D-dimer and an ultrasound result were available were included in the analysis. The data were split into a training set of 5270 patients, used to develop the algorithm, and a testing set of 1810 patients, used to assess the performance of the trained algorithm. The network was able to exclude DVT without the need for ultrasound in significantly more patients than existing risk assessment scores, whilst retaining a very low false negative rate. More generally, this approach may improve the analysis of complex data to support decision-making in other areas of clinical medicine.
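The sketch below is illustrative only (not the network trained in the study): a small feed-forward classifier is fitted to simulated Wells'-score components plus D-dimer, using training and testing splits of the sizes reported above, and the false-negative count on the test set is reported, since this is the quantity that must stay very low if ultrasound is to be safely omitted.

```python
# Illustrative sketch only (not the trained network from the study): a small
# feed-forward classifier over simulated Wells'-score components plus D-dimer,
# using train/test splits of the sizes reported above. The features, class
# balance, and architecture are assumptions.
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=7080, n_features=11,   # ~10 Wells items + D-dimer
                           weights=[0.85, 0.15], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=5270, test_size=1810, random_state=0)

net = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=1000, random_state=0)
net.fit(X_train, y_train)

# False negatives are the key safety metric if ultrasound is to be avoided
tn, fp, fn, tp = confusion_matrix(y_test, net.predict(X_test)).ravel()
print(f"false negatives: {fn} of {fn + tp} DVT-positive test cases")
```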

13.
Data obtained with any research tool must be reproducible, a concept referred to as reliability. Three techniques are often used to evaluate reliability of tools using continuous data in aging research: intraclass correlation coefficients (ICC), Pearson correlations, and paired t tests. These are often construed as equivalent when applied to reliability. This is not correct, and may lead researchers to select instruments based on statistics that may not reflect actual reliability. The purpose of this paper is to compare the reliability estimates produced by these three techniques and determine the preferable technique. A hypothetical dataset was produced to evaluate the reliability estimates obtained with ICC, Pearson correlations, and paired t tests in three different situations. For each situation two sets of 20 observations were created to simulate an intrarater or inter-rater paradigm, based on 20 participants with two observations per participant. Situations were designed to demonstrate good agreement, systematic bias, or substantial random measurement error. In the situation demonstrating good agreement, all three techniques supported the conclusion that the data were reliable. In the situation demonstrating systematic bias, the ICC and t test suggested the data were not reliable, whereas the Pearson correlation suggested high reliability despite the systematic discrepancy. In the situation representing substantial random measurement error where low reliability was expected, the ICC and Pearson coefficient accurately illustrated this. The t test suggested the data were reliable. The ICC is the preferred technique to measure reliability. Although there are some limitations associated with the use of this technique, they can be overcome.
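The worked toy example below reproduces the systematic-bias scenario: one rater scores every participant about 10 points higher than the other, so the Pearson correlation is near 1, the paired t test flags a difference, and ICC(2,1), implemented here from the standard Shrout and Fleiss two-way ANOVA formulas, is low. The simulated scores and the ICC(2,1) choice are assumptions; the paper does not specify which ICC form it used.

```python
# Worked toy example of the systematic-bias scenario: rater B scores every
# participant ~10 points higher than rater A, so Pearson r is near 1, the
# paired t test flags the difference, and ICC(2,1) -- implemented from the
# Shrout & Fleiss two-way ANOVA formulas -- is low. Simulated data; the
# specific ICC form is an assumption, since the paper does not state it.
import numpy as np
from scipy.stats import pearsonr, ttest_rel

rng = np.random.default_rng(0)
rater_a = rng.normal(50, 5, size=20)
rater_b = rater_a + 10 + rng.normal(0, 1, size=20)    # systematic bias of ~10 points

def icc_2_1(x):
    """ICC(2,1), absolute agreement, for an n-subjects x k-raters array."""
    n, k = x.shape
    grand = x.mean()
    msr = k * ((x.mean(axis=1) - grand) ** 2).sum() / (n - 1)       # subjects
    msc = n * ((x.mean(axis=0) - grand) ** 2).sum() / (k - 1)       # raters
    sse = ((x - grand) ** 2).sum() - msr * (n - 1) - msc * (k - 1)  # residual SS
    mse = sse / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

r, _ = pearsonr(rater_a, rater_b)
t, p = ttest_rel(rater_a, rater_b)
icc = icc_2_1(np.column_stack([rater_a, rater_b]))
print(f"Pearson r = {r:.3f}, paired t-test p = {p:.1e}, ICC(2,1) = {icc:.3f}")
```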

14.
Data obtained with any research tool must be reproducible, a concept referred to as reliability. Three techniques are often used to evaluate reliability of tools using continuous data in aging research: intraclass correlation coefficients (ICC), Pearson correlations, and paired t tests. These are often construed as equivalent when applied to reliability. This is not correct, and may lead researchers to select instruments based on statistics that may not reflect actual reliability. The purpose of this paper is to compare the reliability estimates produced by these three techniques and determine the preferable technique. A hypothetical dataset was produced to evaluate the reliability estimates obtained with ICC, Pearson correlations, and paired t tests in three different situations. For each situation two sets of 20 observations were created to simulate an intrarater or inter-rater paradigm, based on 20 participants with two observations per participant. Situations were designed to demonstrate good agreement, systematic bias, or substantial random measurement error. In the situation demonstrating good agreement, all three techniques supported the conclusion that the data were reliable. In the situation demonstrating systematic bias, the ICC and t test suggested the data were not reliable, whereas the Pearson correlation suggested high reliability despite the systematic discrepancy. In the situation representing substantial random measurement error where low reliability was expected, the ICC and Pearson coefficient accurately illustrated this. The t test suggested the data were reliable. The ICC is the preferred technique to measure reliability. Although there are some limitations associated with the use of this technique, they can be overcome.

15.
Interest has recently been expressed in developing an Australian adult cardiac surgical registry. Complete national registries of adult cardiac surgery have already been established in many European countries, the USA, Canada and elsewhere. Participating centres contributing to a national registry benefit by being able to benchmark themselves against norms for their particular country. A risk-adjusted database can help surgeons advise their patients of the chances of a good operative outcome. For a surgeon or a surgical unit, the only way to obtain a relevant risk model is to use their own data, and data from units in their particular country. It is also useful to have comparative data from other national registries to compare one's own country with international benchmarks. Since 1996, the European Cardiac Surgical Registry (ECSUR) has put considerable effort into producing unified datasets, harmonised with each other for worldwide use. In 1997, ECSUR launched a minimum cardiac surgical dataset. The worldwide launch of the full international adult cardiac surgical dataset is scheduled for July/August 2000. This dataset would be highly useful for application in Australia. The ECSUR organisation has the capability to analyse data from other countries and could perform this for Australia if requested. However, a better approach would be a national centre in Australia. Funding for national registries around the world has been obtained from Ministries of Health, participating surgical centres, and surgical software vendors. If an Australian national registry is indeed established it will find a ready-made, highly appropriate international cardiac surgical dataset sponsored by ECSUR and the Society of Thoracic Surgeons waiting for adoption by Australia.

16.
Pharmacologic therapy to achieve rate control in patients with atrial fibrillation is often difficult and inadequate. For this reason, ventricular pacing strategies have been developed as an alternative to drug therapy to alleviate symptoms due to rapid and irregular ventricular rates. Ventricular pacing in combination with AV junctional ablation provides palliative improvement in a wide range of clinical outcomes. Because of the irreversible complete AV block associated with this procedure, strategies to control the ventricular response to atrial fibrillation by ventricular pacing alone have been investigated. These strategies are primarily directed at regularizing the ventricular response by pacing at or near the mean intrinsically conducted ventricular rate. These specialized ventricular pacing algorithms provide striking ventricular regularity at rest but may be less effective during activity. No study has yet demonstrated clinically significant improvements in outcomes with these algorithms. The clinical benefits of rate regularization alone, without the strict rate control provided by AV junctional ablation, are likely to be very limited. Other device-based approaches to controlling the ventricular rate in atrial fibrillation include transvenous vagal stimulation. This strategy is in the early stages of development but may be promising.
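Purely as a conceptual illustration (not any manufacturer's algorithm), the sketch below shows a rate-regularization rule of the kind described: a running mean of recent ventricular intervals is tracked and a paced beat is scheduled slightly earlier than that mean, so late intrinsic beats are replaced by paced beats near the mean conducted rate. The window length and offset are arbitrary assumptions.

```python
# Conceptual illustration only -- not any manufacturer's pacing algorithm: a
# rate-regularization rule that tracks a running mean of recent ventricular
# (R-R) intervals and schedules a paced beat slightly earlier than that mean.
# The window length and 20 ms offset are arbitrary assumptions.
from collections import deque

def regularized_rhythm(rr_intervals_ms, window=8, offset_ms=20):
    """Return the delivered R-R interval for each incoming intrinsic interval."""
    recent, delivered = deque(maxlen=window), []
    for rr in rr_intervals_ms:
        if recent:
            pacing_interval = sum(recent) / len(recent) - offset_ms
            delivered.append(min(rr, pacing_interval))   # pace if the intrinsic beat is late
        else:
            delivered.append(rr)
        recent.append(delivered[-1])
    return delivered

# Irregularly conducted AF (ms): the delivered rhythm becomes far more regular
print(regularized_rhythm([620, 430, 980, 510, 760, 400, 890]))
```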

17.
Novel targeted therapies improve the survival of specific subgroups (defined by genetic variants) of patients with acute myeloid leukemia (AML), validating the paradigm of molecularly targeted therapy. However, identifying correlations between AML molecular attributes and effective therapies is challenging. Recent advances in high-throughput, in vitro drug sensitivity screening applied to primary AML blasts have been used to uncover such correlations; however, these methods cannot predict the response of leukemic stem cells. Our study aimed to predict in vitro response to targeted therapies based on molecular markers, with subsequent validation in leukemic stem cells. We performed ex vivo screening of sensitivity to 46 drugs on 29 primary AML samples taken at diagnosis or relapse. Using unsupervised hierarchical clustering analysis, we identified a group with sensitivity to several tyrosine kinase inhibitors, including the multi-tyrosine kinase inhibitor dasatinib, and searched for correlations between the response to dasatinib, exome sequencing, and gene expression in our dataset and in the Beat AML dataset. Unsupervised hierarchical clustering of gene expression separated dasatinib responders from non-responders, and in vitro response to dasatinib could be predicted from gene expression (area under the curve = 0.78). Furthermore, mutations in FLT3/ITD and PTPN11 were enriched in the dasatinib-sensitive samples, whereas mutations in TP53 were enriched in resistant samples. Based on these results, we selected FLT3/ITD AML samples and injected them into NSG-SGM3 mice. Our results demonstrate that in a subgroup of FLT3/ITD AML (4 out of 9 samples), dasatinib significantly inhibited leukemic stem cell engraftment. In summary, we show that dasatinib has an anti-leukemic effect both on bulk blasts and, more importantly, on leukemic stem cells from a subset of AML patients who can be identified based on mutational and expression profiles. Our data provide a rational basis for clinical trials of dasatinib in a molecularly selected subset of AML patients.
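The schematic sketch below mirrors the two analysis steps described above (it is not the study code): unsupervised hierarchical clustering of an ex vivo drug-sensitivity matrix, followed by cross-validated prediction of the resulting response label from gene expression. All matrices are synthetic placeholders for the 29-sample, 46-drug screen, and logistic regression stands in for whatever classifier underlies the reported AUC of 0.78.

```python
# Schematic sketch of the two analysis steps (not the study code): hierarchical
# clustering of an ex vivo drug-sensitivity matrix, then cross-validated
# prediction of the resulting response label from gene expression. The data are
# synthetic placeholders for the 29-sample, 46-drug screen, and logistic
# regression stands in for the classifier behind the reported AUC of 0.78.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
drug_response = np.vstack([rng.normal(0.0, 1.0, size=(15, 46)),   # two synthetic
                           rng.normal(2.0, 1.0, size=(14, 46))])  # response groups
expression = rng.normal(size=(29, 2000))                          # samples x genes

# 1) Unsupervised hierarchical clustering of samples by drug sensitivity
sample_clusters = fcluster(linkage(drug_response, method="ward"), t=2, criterion="maxclust")
dasatinib_sensitive = (sample_clusters == 1).astype(int)          # call one cluster "responders"

# 2) Predict the response label from gene expression (cross-validated AUC)
clf = LogisticRegression(max_iter=5000)
auc = cross_val_score(clf, expression, dasatinib_sensitive, cv=3, scoring="roc_auc").mean()
print(f"cross-validated AUC: {auc:.2f}")
```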

18.
The stability, activity, and solubility of a protein sequence are determined by a delicate balance of molecular interactions in a variety of conformational states. Even so, most computational protein design methods model sequences in the context of a single native conformation. Simulations that model the native state as an ensemble have been mostly neglected due to the lack of sufficiently powerful optimization algorithms for multistate design. Here, we have applied our multistate design algorithm to study the potential utility of various forms of input structural data for design. To facilitate a more thorough analysis, we developed new methods for the design and high-throughput stability determination of combinatorial mutation libraries based on protein design calculations. The application of these methods to the core design of a small model system produced many variants with improved thermodynamic stability and showed that multistate design methods can be readily applied to large structural ensembles. We found that exhaustive screening of our designed libraries helped to clarify several sources of simulation error that would have otherwise been difficult to ascertain. Interestingly, the lack of correlation between our simulated and experimentally measured stability values shows clearly that a design procedure need not reproduce experimental data exactly to achieve success. This surprising result suggests potentially fruitful directions for the improvement of computational protein design technology.

19.
20.
Although several costing instruments have been developed previously, few have been validated or applied systematically to the delivery of evidence-based practices (EBPs). Using data collected from 26 organizations implementing the same EBP, this article examines the reliability, validity, and applicability of the brief Treatment Cost Analysis Tool (TCAT-Lite). The TCAT-Lite demonstrated good reliability: correlations between replications averaged 0.61. Validity was also high: the correlation between TCAT-Lite estimates of treated episodes per $100,000 and independent data was 0.57. In terms of applicability, the cost calculations showed that if all organizations had operated at optimal scale (124 client episodes per year), existing funds could have supported 64% more clients.
