Similar articles
20 similar articles found (search time: 31 ms)
1.
Imaging technology and machine learning algorithms for disease classification set the stage for high-throughput phenotyping and open promising new avenues for genome-wide association studies (GWAS). Despite emerging algorithms, there has been no successful application in GWAS so far. We frame machine learning-based phenotyping in genetic association analysis as a misclassification problem. To evaluate chances and challenges, we performed a GWAS based on automatically classified age-related macular degeneration (AMD) in UK Biobank (images from 135,500 eyes; 68,400 persons). We quantified misclassification of automatically derived AMD in internal validation data (4,001 eyes; 2,013 persons) and developed a maximum likelihood approach (MLA) to account for it when estimating genetic association. We demonstrate that our MLA guards against bias and artifacts in simulation studies. By combining a GWAS on automatically derived AMD and our MLA in UK Biobank data, we were able to dissect true association (ARMS2/HTRA1, CFH) from artifacts (near HERC2) and identified eye color as associated with the misclassification. With this example, we provide a proof of concept that a GWAS using machine learning-derived disease classification yields relevant results and that misclassification needs to be accounted for in the analysis. These findings generalize to other phenotypes and emphasize the utility of genetic data for understanding the misclassification structure of machine learning algorithms.
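The attenuation that phenotype misclassification induces can be reproduced with a small simulation (a hypothetical numpy sketch with made-up effect sizes and error rates, not the authors' maximum likelihood approach): non-differential flipping of a binary disease label pulls the per-allele log-odds estimate toward the null.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Simulate a biallelic genotype (0/1/2) and a binary disease with a true log-odds effect.
g = rng.binomial(2, 0.3, n)
beta_true = 0.5
p = 1 / (1 + np.exp(-(-2.0 + beta_true * g)))
y = rng.binomial(1, p)

# Non-differential misclassification: fixed (hypothetical) sensitivity and specificity.
sens, spec = 0.85, 0.95
y_obs = np.where(y == 1, rng.binomial(1, sens, n), rng.binomial(1, 1 - spec, n))

def logodds_slope(y, g):
    # Crude per-allele log-odds slope from the three genotype groups.
    m = np.array([y[g == k].mean() for k in (0, 1, 2)])
    logodds = np.log(m / (1 - m))
    return (logodds[2] - logodds[0]) / 2

b_true_fit = logodds_slope(y, g)
b_naive = logodds_slope(y_obs, g)
print(b_true_fit, b_naive)  # the naive estimate is attenuated toward 0
```

With these error rates the naive slope falls well below the true value of 0.5, which is exactly the kind of bias a misclassification-aware likelihood must correct.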

2.
Deep learning is a class of machine learning algorithms that are popular for building risk prediction models. When observations are censored, the outcomes are only partially observed and standard deep learning algorithms cannot be directly applied. We develop a new class of deep learning algorithms for outcomes that are potentially censored. To account for censoring, the unobservable loss function used in the absence of censoring is replaced by a censoring unbiased transformation. The resulting class of algorithms can be used to estimate both survival probabilities and restricted mean survival. We show how the deep learning algorithms can be implemented by adapting software for uncensored data by using a form of response transformation. We provide comparisons of the proposed deep learning algorithms to existing risk prediction algorithms for predicting survival probabilities and restricted mean survival through both simulated datasets and analysis of data from breast cancer patients.
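One standard censoring unbiased transformation is inverse-probability-of-censoring weighting (IPCW). The sketch below (synthetic data, assuming censoring is independent of the event time; the paper's exact transformation may differ) replaces the restricted outcome min(T, τ) by a weighted pseudo-outcome whose mean matches the mean one would compute without censoring, so an uncensored loss can be applied to it.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
tau = 2.0                              # restriction time for restricted mean survival

t = rng.exponential(1.0, n)            # true event times
c = rng.exponential(2.0, n)            # independent censoring times
obs = np.minimum(t, c)                 # observed time
delta = (t <= c).astype(float)         # event indicator

# Kaplan-Meier estimate of the censoring survival function G(u) = P(C > u).
order = np.argsort(obs)
u, d = obs[order], 1 - delta[order]    # censoring "events" have d = 1
at_risk = np.arange(n, 0, -1)
G = np.cumprod(1 - d / at_risk)        # censoring survival at the sorted times

# IPCW pseudo-outcome for the restricted survival time min(T, tau):
y_restricted = np.minimum(u, tau)
observed_by_tau = (delta[order] == 1) | (u >= tau)   # min(T, tau) fully observed
G_at = np.where(u >= tau, np.interp(tau, u, G), G)
pseudo = np.where(observed_by_tau, y_restricted / np.maximum(G_at, 1e-8), 0.0)

true_rmst = np.minimum(t, tau).mean()
print(pseudo.mean(), true_rmst)        # the two means agree up to sampling noise
```

In a deep learning context the same pseudo-outcomes would simply replace the labels fed to an off-the-shelf regression network, which is the "response transformation" idea the abstract describes.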

3.
Objective: To compare two partial least squares (PLS)-based classification models for multi-class analysis of tumor gene expression data, and to examine how different methods of selecting differentially expressed genes affect the classification results. Methods: Using four tumor gene expression datasets, including NCI60, differentially expressed genes were selected by four different methods. On this basis, two PLS-based methods were applied for multi-class analysis. The first, PLS linear discriminant analysis, first uses PLS for dimension reduction and then uses the resulting components as input variables for linear discriminant analysis; the second, PLS discriminant analysis, performs classification directly via PLS regression. Classification performance was evaluated by leave-one-out and 10-fold cross-validation. Results: PLS discriminant analysis performed slightly better than linear discriminant analysis after PLS dimension reduction. Classification was best when differentially expressed genes were selected by the variable importance measure, followed by the SAM method. Conclusion: For multi-class analysis of tumor gene expression data, PLS is both an efficient dimension-reduction method and a practical classification method.
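The mechanics of PLS discriminant analysis can be sketched with a single NIPALS-style component fit to a 0/1 class label (synthetic two-class "gene expression" data; the paper's setting is multi-class on real tumor datasets):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 120, 500                        # few samples, many "genes"

# Two classes that differ in the mean of the first 20 features.
y = np.repeat([0.0, 1.0], n // 2)
X = rng.normal(size=(n, p))
X[y == 1, :20] += 1.0

# One-component PLS1: the weight vector is proportional to X'y on centered data.
Xc = X - X.mean(axis=0)
yc = y - y.mean()
w = Xc.T @ yc
w /= np.linalg.norm(w)
t_score = Xc @ w                       # latent component
b = (t_score @ yc) / (t_score @ t_score)
y_hat = y.mean() + b * t_score         # fitted class score

pred = (y_hat > 0.5).astype(float)
acc = (pred == y).mean()
print(acc)
```

The in-sample accuracy here only illustrates the mechanics; the study evaluates its classifiers with leave-one-out and 10-fold cross-validation and uses more than one component.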

4.
In the analysis of trends in health outcomes, an ongoing issue is how to separate and estimate the effects of age, period, and cohort. As these 3 variables are perfectly collinear by definition, regression coefficients in a general linear model are not unique. In this tutorial, we review why identification is a problem, and how this problem may be tackled using partial least squares and principal components regression analyses. Both methods produce regression coefficients that fulfill the same collinearity constraint as the variables age, period, and cohort. We show that, because the constraint imposed by partial least squares and principal components regression is inherent in the mathematical relation among the 3 variables, this leads to more interpretable results. We use one dataset from a Taiwanese health-screening program to illustrate how to use partial least squares regression to analyze the trends in body heights with 3 continuous variables for age, period, and cohort. We then use another dataset of hepatocellular carcinoma mortality rates for Taiwanese men to illustrate how to use partial least squares regression to analyze tables with aggregated data. We use the second dataset to show the relation between the intrinsic estimator, a recently proposed method for the age-period-cohort analysis, and partial least squares regression. We also show that the inclusion of all indicator variables provides a more consistent approach. R code for our analyses is provided in the eAppendix.
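The exact collinearity (cohort = period − age) and the principal-components remedy can be shown directly (a toy numerical illustration, not the paper's Taiwanese analyses): the centered design matrix is rank-deficient, and regressing on only the components with nonzero variance gives a unique minimum-norm solution that satisfies the same linear constraint as the variables themselves.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
age = rng.integers(20, 70, n).astype(float)
period = rng.integers(1980, 2020, n).astype(float)
cohort = period - age                  # exact linear dependence

X = np.column_stack([age, period, cohort])
Xc = X - X.mean(axis=0)
print(np.linalg.matrix_rank(Xc, tol=1e-6))   # 2, not 3: OLS coefficients are not unique

y = 0.03 * age + 0.01 * period + rng.normal(0, 0.5, n)

# Principal components regression: keep only components with nonzero variance.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
keep = s > 1e-6 * s[0]
gamma = (U[:, keep].T @ (y - y.mean())) / s[keep]
beta_pcr = Vt[keep].T @ gamma          # minimum-norm solution on the original scale

# The coefficients obey the same constraint as the variables: age - period + cohort = 0.
print(beta_pcr, beta_pcr[0] - beta_pcr[1] + beta_pcr[2])
```

The second printed value is numerically zero, which is the "inherent constraint" the tutorial argues makes PLS and PCR coefficients interpretable in the age-period-cohort setting.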

5.
Array comparative genomic hybridization (aCGH) provides genome-wide information on DNA copy number that is potentially useful for disease classification. One immediate problem is that the data contain many features (probes) but only a few samples. Existing approaches to overcome this problem include feature selection, ridge regression and partial least squares. However, these methods typically ignore the spatial characteristic of aCGH data. To explicitly make use of this spatial information we develop a procedure called the smoothed logistic regression (SLR) model. The procedure is based on a mixed logistic regression model, where the random component is a mixture distribution that controls smoothness and sparseness. Conceptually such a procedure is straightforward, but its implementation is complicated by computational problems. We develop a fast and reliable iterative weighted least-squares algorithm based on the singular value decomposition. Simulated data and two real data sets are used to illustrate the procedure. For the real data sets, error rates are calculated using the leave-one-out cross-validation procedure. For both simulated and real data examples, SLR achieves better misclassification error rates than previous methods. Copyright © 2009 John Wiley & Sons, Ltd.

6.
This work studies a new survival modeling technique based on least-squares support vector machines. We propose the use of a least-squares support vector machine combining ranking and regression. The advantage of this kernel-based model is threefold: (i) the problem formulation is convex and can be solved conveniently by a linear system; (ii) non-linearity is introduced by using kernels, componentwise kernels in particular are useful to obtain interpretable results; and (iii) introduction of ranking constraints makes it possible to handle censored data. In an experimental setup, the model is used as a preprocessing step for the standard Cox proportional hazard regression by estimating the functional forms of the covariates. The proposed model was compared with different survival models from the literature on the clinical German Breast Cancer Study Group data and on the high-dimensional Norway/Stanford Breast Cancer Data set. Copyright © 2009 John Wiley & Sons, Ltd.

7.
For the diagnosis and prediction of complications of type 2 diabetes mellitus (T2DM), traditional detection methods rely mainly on blood and urine tests, which are time-consuming and cannot provide early prediction. With the rising incidence of diabetes and the rapid growth of medical data, machine learning algorithms have quickly developed into effective tools for detecting and diagnosing diabetes. Using machine learning algorithms to analyse clinical indicators, explore the factors influencing T2DM complications, and build complication prediction models can effectively…

8.
Objective: To construct a model for predicting breast cancer mortality. Methods: Based on the daily intakes of seven elements (selenium, copper, zinc, cadmium, chromium, manganese, and arsenic) and adaptive boosting of partial least squares, an ensemble model for predicting breast cancer mortality was built and its performance evaluated. Results: The prediction accuracy of the ensemble model was clearly better than that of a single partial least squares model. Conclusion: The adaptive boosting strategy is a powerful tool for this type of task.

9.
10.
Objectives: This paper uses deep (machine) learning techniques to develop and test how motor behaviors, derived from location and movement sensor tracking data, may be associated with falls, delirium, and urinary tract infections (UTIs) in long-term care (LTC) residents. Design: Longitudinal observational study. Setting and Participants: A total of 23 LTC residents (81,323 observations) with cognitive impairment or dementia in 2 northeast Department of Veterans Affairs LTC facilities. Methods: More than 18 months of continuous (24/7) monitoring of motor behavior and activity levels used objective radiofrequency identification sensor data to track and record movement data. Occurrence of acute events was recorded each week. Unsupervised deep learning models were used to classify motor behaviors into 5 clusters; supervised decision tree algorithms used these clusters to predict acute health events (falls, delirium, and UTIs) the week before the week of the event. Results: Motor behaviors were classified into 5 categories (Silhouette score = 0.67), and these were significantly different from each other. Motor behavior classifications were sensitive and specific to falls, delirium, and UTI predictions 1 week before the week of the event (sensitivity range = 0.88–0.91; specificity range = 0.71–0.88). Conclusion and Implications: Intraindividual changes in motor behaviors predict some of the most common and detrimental acute events in LTC populations. Study findings suggest real-time locating system sensor data and machine learning techniques may be used in clinical applications to effectively prevent falls and lead to the earlier recognition of risk for delirium and UTIs in this vulnerable population.

11.
Rare cases are a central problem when an expert system is constructed from example cases with machine learning techniques. It is difficult to make a decision support system (DSS) to cover all possible clinical cases. An inductive learning program can be used to construct an expert system for detecting cases that differ from routine cases. The ID3 algorithm and the pessimistic pruning algorithm were tested in this study: a DSS was built directly from the data of patient records. A decision tree was generated, and the cases misclassified by the decision tree as compared with the classifications of a clinician were listed on a checklist, which formed the feedback to the clinician. In clinical situations about 5-10% of functional thyroid disorders may be misclassified. At this error level, the method found over 90% of the errors with a specificity of 95%. In simple medical classification tasks this dynamic self-learning system can be used to create a DSS that can assist in the quality control of clinical decision making.
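The checklist idea (train a tree, compare its labels with the clinician's, and list disagreements for review) can be sketched with a single-split decision stump standing in for ID3, on entirely hypothetical data (ID3 itself recurses on information gain over several attributes):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400

# Hypothetical lab value; clinician labels follow a threshold rule with rare exceptions.
tsh = rng.normal(2.0, 1.5, n)
clinician = (tsh > 3.0).astype(int)
exceptions = rng.random(n) < 0.05      # ~5% of cases deviate from the simple rule
clinician[exceptions] = 1 - clinician[exceptions]

def best_stump(x, y):
    # Exhaustively choose the threshold minimizing misclassification for rule x > thr.
    best = (np.inf, None)
    for thr in np.unique(x):
        err = ((x > thr) != y).mean()
        if err < best[0]:
            best = (err, thr)
    return best[1]

thr = best_stump(tsh, clinician)
tree_label = (tsh > thr).astype(int)

# Checklist: cases where the tree and the clinician disagree, flagged for manual review.
checklist = np.flatnonzero(tree_label != clinician)
print(len(checklist), "cases flagged out of", n)
```

The flagged fraction tracks the simulated exception rate, which mirrors the abstract's point: disagreements between a learned rule and the clinician concentrate on the rare, non-routine cases worth auditing.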

12.
Linking geospatial neighbourhood design characteristics to health and behavioural data from population-representative cohorts is limited by data availability and the difficulty of collecting information on environmental characteristics (e.g. greenery, building setbacks, dwelling structure). As an alternative, this study examined the feasibility of using Generative Adversarial Networks (GANs), a machine learning technique, to measure neighbourhood design from 'street view' and aerial imagery and explore the relationship between the built environment and physical function. This study included 3102 adults aged 45 years and older clustered in 200 neighbourhoods in 2016 from the How Areas in Brisbane Influence Health and Activity (HABITAT) project in Brisbane, Australia. Exposure data were Google Street View and Google Maps images from within the 200 neighbourhoods, and outcome data were self-reported physical function using the PF-10 (a subset of the SF-36). Physical function scores were aggregated to the neighbourhood level, and the 20 highest- and 20 lowest-scoring neighbourhoods were used in analysis. We found that the aerial imagery retrieved could not be used to adequately train the model, so the aerial imagery failed to produce meaningful results. Of the street view images, n = 56,330 images were downloaded and used to train the GAN model. Model outputs included augmented street view images between neighbourhoods classed as having high-function and low-function residents. The GAN model detected differences in neighbourhood design characteristics between neighbourhoods classed as high and low physical function at the aggregate level. Specifically, differences were identified in urban greenery (including tree heights) and dwelling structure (e.g. building height). This study provides important lessons for future work in this field, especially related to the uniqueness, diversity and amount of imagery required for successful applications of deep learning methods.

13.
Objective: To study kernel partial least squares regression, a kernel-based regression method for handling nonlinear or complex relationships between explanatory and response variables. Methods: Monte Carlo simulation was used to analyse the model-fitting and prediction performance of kernel partial least squares regression. Results: The simulation results show that kernel partial least squares regression has high estimation performance. Conclusion: Kernel partial least squares regression is a kernel-based nonlinear regression method in which the model is built on the samples rather than on the space of explanatory variables. The method is particularly suitable for the many types of data encountered in medical research and can effectively handle nonlinear or complex relationships between explanatory and response variables.

14.
A brain-computer interface (BCI) is a new communication pathway between man and machine. It identifies mental task patterns stored in the electroencephalogram (EEG): it extracts brain electrical activity recorded by EEG and transforms it into machine control commands. The main goal of BCI is to make assistive devices, such as computers, available to paralyzed people and so make their lives easier. This study deals with feature extraction and mental task pattern recognition for 2-D cursor control from EEG as an offline analysis approach. The hemispherical power density changes are computed and compared across the alpha–beta frequency bands using only mental imagination of cursor movements. First, power spectral density (PSD) features of the EEG signals are extracted, and the high-dimensional data are reduced by principal component analysis (PCA) and independent component analysis (ICA), which are statistical algorithms. In the last stage, all features are classified with two types of support vector machine (SVM), linear and least squares (LS-SVM), and three different artificial neural network (ANN) structures, learning vector quantization (LVQ), multilayer neural network (MLNN) and probabilistic neural network (PNN), and mental task patterns are successfully identified via the k-fold cross-validation technique.
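The pipeline described (band power features, dimension reduction, then a classifier) can be sketched on synthetic signals: simulated alpha-dominant (10 Hz) versus beta-dominant (20 Hz) segments, periodogram PSD features, PCA via SVD, and a nearest-centroid classifier standing in for the SVM/ANN variants the study compares.

```python
import numpy as np

rng = np.random.default_rng(5)
fs, dur, n_trials = 128, 2.0, 60
t = np.arange(0, dur, 1 / fs)

def make_trial(f):
    # A dominant oscillation at frequency f buried in noise.
    return np.sin(2 * np.pi * f * t) + rng.normal(0, 1.0, t.size)

# Class 0: alpha-dominant (10 Hz); class 1: beta-dominant (20 Hz).
X = np.array([make_trial(10 if k < n_trials // 2 else 20) for k in range(n_trials)])
y = (np.arange(n_trials) >= n_trials // 2).astype(int)

# PSD features via the periodogram, then PCA via SVD of the centered feature matrix.
psd = np.abs(np.fft.rfft(X, axis=1)) ** 2 / X.shape[1]
psd_c = psd - psd.mean(axis=0)
U, s, Vt = np.linalg.svd(psd_c, full_matrices=False)
feats = psd_c @ Vt[:2].T               # first two principal components

# Nearest-centroid classification of the reduced features.
c0, c1 = feats[y == 0].mean(axis=0), feats[y == 1].mean(axis=0)
pred = (np.linalg.norm(feats - c1, axis=1) < np.linalg.norm(feats - c0, axis=1)).astype(int)
acc = (pred == y).mean()
print(acc)
```

Because the class difference is concentrated in two spectral bins, the first principal components capture it almost entirely; real EEG would need bandpass filtering, artifact handling, and cross-validated evaluation.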

15.
Inverse probability weights used to fit marginal structural models are typically estimated using logistic regression. However, a data-adaptive procedure may be able to better exploit information available in measured covariates. By combining predictions from multiple algorithms, ensemble learning offers an alternative to logistic regression modeling to further reduce bias in estimated marginal structural model parameters. We describe the application of two ensemble learning approaches to estimating stabilized weights: super learning (SL), an ensemble machine learning approach that relies on V-fold cross validation, and an ensemble learner (EL) that creates a single partition of the data into training and validation sets. Longitudinal data from two multicenter cohort studies in Spain (CoRIS and CoRIS-MD) were analyzed to estimate the mortality hazard ratio for initiation versus no initiation of combined antiretroviral therapy among HIV positive subjects. Both ensemble approaches produced hazard ratio estimates further away from the null, and with tighter confidence intervals, than logistic regression modeling. Computation time for EL was less than half that of SL. We conclude that ensemble learning using a library of diverse candidate algorithms offers an alternative to parametric modeling of inverse probability weights when fitting marginal structural models. With large datasets, EL provides a rich search over the solution space in less time than SL with comparable results. Copyright © 2014 John Wiley & Sons, Ltd.
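The single-split ensemble learner (EL) can be sketched as follows: fit each candidate propensity model on a training half, score it by validation-half log-loss, and combine the candidates accordingly. This is a toy version with two candidates on simulated data, not the CoRIS analysis, and the inverse-loss weighting is an illustrative choice (a discrete learner would just take the argmin).

```python
import numpy as np

rng = np.random.default_rng(6)
n = 4000
x = rng.normal(size=n)
p_true = 1 / (1 + np.exp(-(0.5 + 1.5 * x)))
a = rng.binomial(1, p_true)            # treatment depends on x

def fit_logistic(X, y, iters=25):
    # Newton-Raphson for logistic regression; X includes an intercept column.
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ beta))
        H = X.T @ (X * (p * (1 - p))[:, None]) + 1e-6 * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(H, X.T @ (y - p))
    return beta

def log_loss(p, y):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p)).mean()

train, valid = np.arange(n) < n // 2, np.arange(n) >= n // 2
X1 = np.column_stack([np.ones(n)])     # candidate 1: intercept only
X2 = np.column_stack([np.ones(n), x])  # candidate 2: logistic in x

losses, preds = [], []
for X in (X1, X2):
    beta = fit_logistic(X[train], a[train])
    p = 1 / (1 + np.exp(-X @ beta))
    losses.append(log_loss(p[valid], a[valid]))
    preds.append(p)

# Weight candidates by inverse validation loss; the better candidate dominates.
w = 1 / np.array(losses)
w /= w.sum()
p_ens = w[0] * preds[0] + w[1] * preds[1]
print(losses, log_loss(p_ens[valid], a[valid]))
```

The single data split is what makes EL cheaper than super learning, which repeats this fit-and-score step across V cross-validation folds.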

16.
The current study examined the impact of a censored independent variable, after adjusting for a second independent variable, when estimating regression coefficients using 'naïve' ordinary least squares (OLS), 'partial' OLS and full-likelihood models. We used Monte Carlo simulations to determine the bias associated with all three regression methods. We demonstrated that substantial bias was introduced in the estimation of the regression coefficient associated with the variable subject to a ceiling effect when naïve OLS regression was used. Furthermore, minor bias was transmitted to the estimation of the regression coefficient associated with the second independent variable. High correlation between the two independent variables improved estimation of the censored variable's coefficient at the expense of estimation of the other coefficient. The use of 'partial' OLS and maximum-likelihood estimation were shown to result in, at most, negligible bias in estimation. Furthermore, we demonstrated that the full-likelihood method was robust under misspecification of the joint distribution of the independent random variables. Lastly, we provided an empirical example using National Population Health Survey (NPHS) data to demonstrate the practical implications of our main findings and the simple methods available to circumvent the bias identified in the Monte Carlo simulations. Our results suggest that researchers need to be aware of the bias associated with the use of naïve ordinary least-squares estimation when estimating regression models in which at least one independent variable is subject to a ceiling effect. Copyright © 2004 John Wiley & Sons, Ltd.

17.
Purpose To develop a classification algorithm and accompanying computer-based clinical decision support tool to help categorize injured workers toward optimal rehabilitation interventions based on unique worker characteristics. Methods Population-based historical cohort design. Data were extracted from a Canadian provincial workers' compensation database on all claimants undergoing work assessment between December 2009 and January 2011. Data were available on: (1) numerous personal, clinical, occupational, and social variables; (2) type of rehabilitation undertaken; and (3) outcomes following rehabilitation (receiving time loss benefits or undergoing repeat programs). Machine learning, concerned with the design of algorithms to discriminate between classes based on empirical data, was the foundation of our approach to build a classification system with multiple independent and dependent variables. Results The population included 8,611 unique claimants. Subjects were predominantly employed (85 %) males (64 %) with diagnoses of sprain/strain (44 %). Baseline clinician classification accuracy was high (ROC = 0.86) for selecting programs that lead to successful return-to-work. Classification performance for machine learning techniques outperformed the clinician baseline classification (ROC = 0.94). The final classifiers were multifactorial and included the variables: injury duration, occupation, job attachment status, work status, modified work availability, pain intensity rating, self-rated occupational disability, and 9 items from the SF-36 Health Survey. Conclusions The use of machine learning classification techniques appears to have resulted in classification performance better than clinician decision-making. The final algorithm has been integrated into a computer-based clinical decision support tool that requires additional validation in a clinical sample.

18.
19.
In most nonrandomized observational studies, differences between treatment groups may arise not only due to the treatment but also because of the effect of confounders. Therefore, causal inference regarding the treatment effect is not as straightforward as in a randomized trial. To adjust for confounding due to measured covariates, the average treatment effect is often estimated by using propensity scores. Typically, propensity scores are estimated by logistic regression. More recent suggestions have been to employ nonparametric classification algorithms from machine learning. In this article, we propose a weighted estimator combining parametric and nonparametric models. Some theoretical results regarding consistency of the procedure are given. Simulation studies are used to assess the performance of the newly proposed methods relative to existing methods, and a data analysis example from the Surveillance, Epidemiology and End Results database is presented.
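The ingredients can be sketched on a toy confounded example: a parametric (logistic) propensity model, a crude nonparametric one (treated fractions within quantile bins of the confounder), and a weighted combination feeding inverse-probability weighting. The fixed 50/50 combination below is purely illustrative; the paper derives the weighting of the two models, which this sketch does not reproduce.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 20_000
x = rng.normal(size=n)
p_treat = 1 / (1 + np.exp(-x))            # treatment depends on the confounder x
a = rng.binomial(1, p_treat)
tau = 1.0                                  # true average treatment effect
y = tau * a + 2.0 * x + rng.normal(size=n)

# Parametric propensity: logistic regression via Newton-Raphson.
X = np.column_stack([np.ones(n), x])
beta = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))
    H = X.T @ (X * (p * (1 - p))[:, None]) + 1e-6 * np.eye(2)
    beta = beta + np.linalg.solve(H, X.T @ (a - p))
p_param = 1 / (1 + np.exp(-X @ beta))

# Nonparametric propensity: treated fraction within 50 quantile bins of x.
bins = np.quantile(x, np.linspace(0, 1, 51))
idx = np.clip(np.digitize(x, bins[1:-1]), 0, 49)
p_nonpar = np.array([a[idx == k].mean() for k in range(50)])[idx]

# Illustrative fixed 50/50 combination of the two propensity models.
ps = np.clip(0.5 * p_param + 0.5 * p_nonpar, 0.01, 0.99)
w = a / ps + (1 - a) / (1 - ps)
ate_ipw = np.average(y[a == 1], weights=w[a == 1]) - np.average(y[a == 0], weights=w[a == 0])
ate_naive = y[a == 1].mean() - y[a == 0].mean()
print(ate_naive, ate_ipw)                  # IPW is far closer to the true effect of 1.0
```

The naive group difference absorbs the confounding through x, while the weighted contrast recovers the treatment effect; the appeal of combining models is robustness when either the parametric or the nonparametric propensity estimate is poor.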

20.
Background: Artificial Intelligence (AI) has great potential to transform health systems and improve the quality of healthcare services. However, AI is still new in Tanzania, and there is limited knowledge about the application of AI technology in the Tanzanian health sector. Objectives: This study aims to explore the current status, challenges, and opportunities for AI application in the health system in Tanzania. Methods: A scoping review was conducted following the Preferred Reporting Items for Systematic Review and Meta-Analysis Extensions for Scoping Review (PRISMA-ScR). We searched electronic databases including PubMed, Embase, African Journal Online, and Google Scholar. Results: Eighteen (18) studies met the inclusion criteria out of 2,017 studies retrieved from the electronic databases and known AI-related project websites. Amongst AI-driven solutions, the studies mostly used machine learning (ML) and deep learning for various purposes, including prediction and diagnosis of diseases and vaccine stock optimisation. The most commonly used algorithms were conventional machine learning algorithms, including Random Forest, Neural networks, Naive Bayes, K-Nearest Neighbour, and Logistic regression. Conclusions: This review shows that AI-based innovations may have a role in improving health service delivery, including early outbreak prediction and detection, disease diagnosis and treatment, and efficient management of healthcare resources in Tanzania. Our results indicate the need to develop national AI policies and regulatory frameworks for adopting responsible and ethical AI solutions in the health sector, in accordance with the World Health Organisation (WHO) guidance on ethics and governance of AI for health.
