Similar Documents
1.
In clinical studies with time-to-event as a primary endpoint, one main interest is to find the best treatment strategy to maximize patients' mean survival time. Due to patients' heterogeneity in response to treatments, great efforts have been devoted to developing optimal treatment regimes by integrating individuals' clinical and genetic information. A main challenge arises in the selection of important variables that can help to build reliable and interpretable optimal treatment regimes, as the dimension of predictors may be high. In this paper, we propose a robust loss-based estimation framework that can be easily coupled with shrinkage penalties for both estimation of optimal treatment regimes and variable selection. The asymptotic properties of the proposed estimators are studied. Moreover, a model-free estimator of restricted mean survival time under the derived optimal treatment regime is developed, and its asymptotic property is studied. Simulations are conducted to assess the empirical performance of the proposed method for parameter estimation, variable selection, and optimal treatment decision making. An application to an AIDS clinical trial data set illustrates the method. Copyright © 2014 John Wiley & Sons, Ltd.

2.
Motivated by high-throughput profiling studies in biomedical research, variable selection methods have become a focus for biostatisticians. In this paper, we consider semiparametric varying-coefficient accelerated failure time models for right-censored survival data with high-dimensional covariates. Instead of adopting traditional regularization approaches, we offer a novel sparse boosting (SparseL2Boosting) algorithm to conduct model-based prediction and variable selection. One main advantage of this new method is that it does not require the time-consuming selection of tuning parameters. Extensive simulations are conducted to examine the performance of our sparse boosting feature selection technique. We further illustrate our methods using a lung cancer data analysis.

3.

Objectives

Classification of breast cancer patients into different risk classes is very important in clinical applications. High-dimensional gene expression data are expected to improve patient classification. In this study, a new method is presented for transforming high-dimensional gene expression data into a low-dimensional space based on the wavelet transform (WT).

Methods

The proposed method was applied to three publicly available microarray data sets. After dimensionality reduction using the supervised wavelet transform, a predictive support vector machine (SVM) model was built on the reduced space. The proposed method was also compared with supervised principal component analysis (PCA).

Results

Based on the selected genes, both the supervised wavelet and supervised PCA approaches performed better than the signature genes identified in other studies. Furthermore, the supervised wavelet method generally outperformed supervised PCA in predicting the 5-year survival status of breast cancer patients from microarray data. In addition, the proposed method showed acceptable performance relative to other studies.

Conclusion

The results suggest the possibility of developing a new tool using wavelets for the dimension reduction of microarray data sets in the classification framework.
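As a rough illustration of the idea in item 3 (not the authors' exact supervised wavelet pipeline), a single-level Haar wavelet transform halves the feature dimension while preserving the data's energy; the approximation coefficients can then feed a classifier such as an SVM:

```python
import numpy as np

def haar_level1(X):
    """Single-level Haar DWT along the feature axis: returns the
    approximation sub-band (scaled pairwise sums) and the detail
    sub-band (scaled pairwise differences)."""
    n_features = X.shape[1] - (X.shape[1] % 2)  # drop an odd trailing column
    X = X[:, :n_features]
    even, odd = X[:, 0::2], X[:, 1::2]
    approx = (even + odd) / np.sqrt(2.0)
    detail = (even - odd) / np.sqrt(2.0)
    return approx, detail

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))          # 5 samples, 8 "expression" features
approx, detail = haar_level1(X)
assert approx.shape == (5, 4)        # feature dimension halved
# The Haar transform is orthonormal, so total energy is preserved
# across the two sub-bands.
assert np.allclose((approx ** 2 + detail ** 2).sum(), (X ** 2).sum())
```

In practice one would apply several decomposition levels and keep only the sub-bands most associated with the class label, which is what makes the wavelet "supervised".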

4.
The objective of finding a parsimonious representation of the observed data by a statistical model that is also capable of accurate prediction is commonplace in all domains of statistical applications. The parsimony of the solutions obtained by variable selection is usually counterbalanced by a limited prediction capacity. On the other hand, methodologies that assure high prediction accuracy usually lead to models that are neither simple nor easily interpretable. Regularization methodologies have proven to be useful in addressing both prediction and variable selection problems. The Bayesian approach to regularization constitutes a particularly attractive alternative as it is suitable for high-dimensional modeling, offers valid standard errors, and enables simultaneous estimation of regression coefficients and complexity parameters via computationally efficient MCMC techniques. Bayesian regularization falls within the versatile framework of Bayesian hierarchical models, which encompasses a variety of other approaches suited for variable selection such as spike and slab models and the MC(3) approach. In this article, we review these Bayesian developments and evaluate their variable selection performance in a simulation study for the classical small p large n setting. The majority of the existing Bayesian methodology for variable selection deals only with classical linear regression. Here, we present two applications in the contexts of binary and survival regression, where the Bayesian approach was applied to select markers prognostically relevant for the development of rheumatoid arthritis and for overall survival in acute myeloid leukemia patients.

5.
Lead-time bias and its control in observational studies of clinical outcomes
Lead-time bias may arise in observational studies of clinical outcomes. Using two examples, a study of the effect of screening and diagnostic tests on the survival time of cancer patients and a study of the effect of highly active antiretroviral therapy on the survival time of HIV/AIDS patients, this paper explains the concept of lead-time bias, how it arises, and how it can be controlled. It thereby offers ideas and methods for controlling this bias when evaluating the effect of interventions such as testing and treatment on diseases that progress through multiple stages.

6.
Many complex diseases are known to be affected by interactions between genetic variants and environmental exposures beyond the main genetic and environmental effects. The study of gene-environment (G×E) interactions is important for elucidating disease etiology. Existing Bayesian methods for G×E interaction studies are challenged by the high-dimensional nature of the study and the complexity of environmental influences. Many studies have shown the advantages of penalization methods in detecting G×E interactions in “large p, small n” settings. However, Bayesian variable selection, which can provide fresh insight into G×E studies, has not been widely examined. We propose a novel and powerful semiparametric Bayesian variable selection model that can investigate linear and nonlinear G×E interactions simultaneously. Furthermore, the proposed method can conduct structural identification by distinguishing nonlinear interactions from the main-effects-only case within the Bayesian framework. Spike-and-slab priors are incorporated on both the individual and group levels to identify the sparse main and interaction effects. The proposed method conducts Bayesian variable selection more efficiently than existing methods. Simulations show that the proposed model outperforms competing alternatives in terms of both identification and prediction. The proposed Bayesian method leads to the identification of main and interaction effects with important implications in a high-throughput profiling study with high-dimensional SNP data.

7.
Objective: To compare the variable screening ability of three penalized logistic regression methods, L1 regularization, L2 regularization, and the elastic net, on SNP data. Methods: SNP data were simulated under different conditions according to preset parameters, and the variable screening ability of the three penalized logistic regressions was evaluated in terms of the correct rate, the error rate, and the Youden index. Results: For the correct rate, L2-penalized logistic regression > elastic net > L1-penalized logistic regression; for the error rate, likewise L2 > elastic net > L1; for the Youden index, elastic net > L1 > L2. Conclusion: Overall, the elastic net screens variables best. By combining the ideas of L1 and L2 regularization, it preserves model sparsity in high-dimensional data analysis, which eases interpretation, while also allowing correlated covariates to enter the model together.
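A minimal numpy sketch of the elastic-net-penalized logistic regression compared in item 7, fitted by proximal gradient descent (the study itself presumably used standard packages; the data, step size, and penalty weights here are illustrative). Setting `alpha=1` gives the pure L1 penalty and `alpha=0` the pure L2 (ridge) penalty:

```python
import numpy as np

def fit_enet_logistic(X, y, lam=0.1, alpha=0.5, lr=0.1, n_iter=2000):
    """Minimize logistic loss + lam*(alpha*||b||_1 + (1-alpha)/2*||b||_2^2)
    by proximal gradient: a gradient step on the smooth part, then
    soft-thresholding (the proximal operator of the L1 part)."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (prob - y) / n + lam * (1 - alpha) * beta
        z = beta - lr * grad
        beta = np.sign(z) * np.maximum(np.abs(z) - lr * lam * alpha, 0.0)
    return beta

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
true_beta = np.array([2.0, -2.0] + [0.0] * 8)   # only 2 informative "SNPs"
y = (rng.random(200) < 1 / (1 + np.exp(-X @ true_beta))).astype(float)
beta = fit_enet_logistic(X, y)
assert beta[0] > 0.5 and beta[1] < -0.5         # true signals retained
assert np.all(np.abs(beta[2:]) < 0.6)           # noise SNPs shrunk toward 0
```

Screening with each penalty then amounts to ranking or thresholding the fitted `beta`, which is how the correct rate and error rate in the abstract would be computed.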

8.
Linear mixed models (LMMs) and their extensions have been widely used for high-dimensional genomic data analyses. While LMMs hold great promise for risk prediction research, the high dimensionality of the data and the differing effect sizes of genomic regions pose great analytical and computational challenges. In this work, we present a multikernel linear mixed model with adaptive lasso (KLMM-AL) to predict phenotypes using high-dimensional genomic data. We develop two algorithms for estimating the parameters of our model and establish the asymptotic properties of the LMM with adaptive lasso when only one dependent observation is available. The proposed KLMM-AL can account for heterogeneous effect sizes from different genomic regions, capture both additive and nonadditive genetic effects, and adaptively and efficiently select predictive genomic regions and their corresponding effects. Through simulation studies, we demonstrate that KLMM-AL outperforms most existing methods. Moreover, KLMM-AL achieves high sensitivity and specificity in selecting predictive genomic regions. KLMM-AL is further illustrated by an application to the sequencing dataset obtained from the Alzheimer's Disease Neuroimaging Initiative.

9.
Models for ordered multiple categorical (OMC) response variables have been extensively established and widely applied, but few studies have investigated linear regression problems with OMC predictors, especially in high-dimensional situations. In such settings, the pseudo-categories of the discrete variable and other irrelevant explanatory variables need to be selected automatically. This paper introduces a dummy-variable transformation for such OMC predictors and proposes an L1-penalized regression method based on the transformation. Model selection consistency of the proposed method is derived under some common assumptions for the high-dimensional situation. Both simulation studies and a real data analysis show good performance of the method, demonstrating its wide applicability in relevant regression analyses.
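One common way to realize such a dummy-variable transformation (a sketch of the general idea; the paper's exact coding scheme may differ) is cumulative coding with threshold indicators I(x >= level). Each dummy's coefficient is then the increment over the previous category, so an L1 penalty that zeroes an increment effectively merges two adjacent pseudo-categories:

```python
def cumulative_dummies(x, levels):
    """Encode an ordered categorical predictor as threshold indicators
    I(x >= level) for each non-baseline level.  Zeroing a dummy's
    coefficient merges that level with the one below it."""
    return [[1 if xi >= lv else 0 for lv in levels[1:]] for xi in x]

x = [0, 2, 3, 1]
Z = cumulative_dummies(x, levels=[0, 1, 2, 3])
assert Z == [[0, 0, 0], [1, 1, 0], [1, 1, 1], [1, 0, 0]]
```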

10.
Resampling techniques are often used to provide an initial assessment of accuracy for prognostic prediction models developed using high-dimensional genomic data with binary outcomes. In medical applications, however, risk prediction matters most when the outcome measure is a right-censored time-to-event variable such as survival. Although several methods have been developed for survival risk prediction with high-dimensional genomic data, there has been little evaluation of the use of resampling techniques for assessing such models. Using real and simulated datasets, we compared several resampling techniques for their ability to estimate the accuracy of risk prediction models. Our study showed that accuracy estimates for popular resampling methods, such as sample splitting and leave-one-out cross-validation (LOOCV), have a higher mean squared error than other methods. Moreover, the large variability of the split-sample and LOOCV estimates may make the resulting point estimates of accuracy unreliable, so they should be interpreted carefully. A k-fold cross-validation with k = 5 or 10 provided a good balance between bias and variability for a wide range of data settings and should be more widely adopted in practice.
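The recommended k-fold scheme can be sketched with the standard library alone (a generic illustration, not the paper's evaluation code): shuffle once, partition into k nearly equal folds, and hold each fold out exactly once:

```python
import random

def kfold_indices(n, k=5, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds.
    Returns (train, test) index pairs; each fold serves once as the
    held-out set for accuracy estimation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [(sorted(set(idx) - set(f)), sorted(f)) for f in folds]

splits = kfold_indices(n=23, k=5)
assert len(splits) == 5
held_out = [i for _, test in splits for i in test]
assert sorted(held_out) == list(range(23))             # each sample held out once
assert all(len(test) in (4, 5) for _, test in splits)  # nearly balanced folds
```

Averaging the accuracy estimate over the k held-out folds is what gives the bias/variance balance the abstract describes, in contrast to a single split (high variance) or LOOCV (k = n, also high variance).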

11.
Patients with cancer or other recurrent diseases may undergo a long process of initial treatment, disease recurrences, and salvage treatments. It is important to optimize the multi-stage treatment sequence in this process to maximally prolong patients' survival. Comparing disease-free survival for each treatment stage over-penalizes disease recurrences but under-penalizes treatment-related mortality. Moreover, treatment regimes used in practice are dynamic; that is, the choice of the next treatment depends on a patient's responses to previous therapies. In this article, using accelerated failure time models, we develop a method to optimize such dynamic treatment regimes. This method utilizes all the longitudinal data collected during the multi-stage process of disease recurrences and treatments, and identifies the optimal dynamic treatment regime for each individual patient by maximizing his or her expected overall survival. We illustrate the application of this method using data from a study of acute myeloid leukemia, for which the optimal treatment strategies for different patient subgroups are identified. Copyright © 2014 John Wiley & Sons, Ltd.

12.
Identification of biomarkers is an emerging area in oncology. In this article, we develop an efficient statistical procedure for the classification of protein markers according to their effect on cancer progression. A high-dimensional time-course dataset of protein markers for 80 patients motivated the development of the model. The threshold value is formulated as the level of a marker having maximum impact on cancer progression. A classification algorithm for high-dimensional time-course data is developed, and the algorithm is validated by comparing random components using both proportional hazards and accelerated failure time frailty models. The study elucidates the application of two separate joint modeling techniques, using an autoregressive-type model and a mixed-effect model for the time-course data and a proportional hazards model for the survival data, with proper utilization of Bayesian methodology. A prognostic score is also developed on the basis of a few selected genes and applied to the patients. This study helps identify relevant biomarkers from a set of candidate markers.

13.
In high‐throughput cancer genomic studies, markers identified from the analysis of single datasets may have unsatisfactory properties because of low sample sizes. Integrative analysis pools and analyzes raw data from multiple studies, and can effectively increase sample size and lead to improved marker identification results. In this study, we consider the integrative analysis of multiple high‐throughput cancer prognosis studies. In the existing integrative analysis studies, the interplay among genes, which can be described using the network structure, has not been effectively accounted for. In network analysis, tightly connected nodes (genes) are more likely to have related biological functions and similar regression coefficients. The goal of this study is to develop an analysis approach that can incorporate the gene network structure in integrative analysis. To this end, we adopt an AFT (accelerated failure time) model to describe survival. A weighted least squares approach, which has low computational cost, is adopted for estimation. For marker selection, we propose a new penalization approach. The proposed penalty is composed of two parts. The first part is a group MCP penalty, and conducts gene selection. The second part is a Laplacian penalty, and smoothes the differences of coefficients for tightly connected genes. A group coordinate descent approach is developed to compute the proposed estimate. Simulation study shows satisfactory performance of the proposed approach when there exist moderate‐to‐strong correlations among genes. We analyze three lung cancer prognosis datasets, and demonstrate that incorporating the network structure can lead to the identification of important genes and improved prediction performance.

14.
Variable selection has been discussed in many contexts, and a particularly large literature has been established for the analysis of right-censored failure time data. In this article, we discuss an interval-censored failure time situation with two sets of covariates, one being low-dimensional with possibly nonlinear effects and the other being high-dimensional. For this problem, we present a penalized estimation procedure for simultaneous variable selection and estimation, in which Bernstein polynomials are used to approximate the nonlinear functions involved. For implementation, a coordinate-wise optimization algorithm that can accommodate most commonly used penalty functions is developed. A numerical study performed to evaluate the proposed approach suggests that it works well in practical situations. Finally, the method is applied to the Alzheimer's disease study that motivated this investigation.

15.
In this article, we study the estimation of high-dimensional single-index models when the response variable is censored. We combine estimation methods for high-dimensional single-index models (without censoring) with those for univariate nonparametric models with randomly censored responses to estimate the index parameters and the link function, and we apply the proposed methods to analyze a genomic dataset from a study of diffuse large B-cell lymphoma. We evaluate the finite-sample performance of the proposed procedures via simulation studies and establish large-sample theory for the proposed estimators of the index parameter and the nonparametric link function under certain regularity conditions.

16.
Risk prediction procedures can be quite useful for the patient's treatment selection, prevention strategy, or disease management in evidence‐based medicine. Often, potentially important new predictors are available in addition to the conventional markers. The question is how to quantify the improvement from the new markers for prediction of the patient's risk in order to aid cost–benefit decisions. The standard method, using the area under the receiver operating characteristic curve, to measure the added value may not be sensitive enough to capture incremental improvements from the new markers. Recently, some novel alternatives to area under the receiver operating characteristic curve, such as integrated discrimination improvement and net reclassification improvement, were proposed. In this paper, we consider a class of measures for evaluating the incremental values of new markers, which includes the preceding two as special cases. We present a unified procedure for making inferences about measures in the class with censored event time data. The large sample properties of our procedures are theoretically justified. We illustrate the new proposal with data from a cancer study to evaluate a new gene score for prediction of the patient's survival. Copyright © 2012 John Wiley & Sons, Ltd.
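For intuition, the integrated discrimination improvement (IDI) mentioned in item 16 can be computed in the simple uncensored binary-outcome case (a simplification of the paper's censored-data setting; the toy risks below are invented for illustration) as the gain in mean predicted-risk separation between events and non-events:

```python
import numpy as np

def idi(risk_old, risk_new, event):
    """IDI = (mean new-model risk among events - among non-events)
           - (the same separation under the old model)."""
    event = np.asarray(event, dtype=bool)
    sep = lambda r: r[event].mean() - r[~event].mean()
    return sep(np.asarray(risk_new)) - sep(np.asarray(risk_old))

event    = [1, 1, 0, 0]
risk_old = [0.6, 0.5, 0.5, 0.4]   # old model: separation 0.10
risk_new = [0.8, 0.7, 0.3, 0.2]   # with new marker: separation 0.50
assert abs(idi(risk_old, risk_new, event) - 0.40) < 1e-12
```

A positive IDI means the new marker pushes predicted risks of events up and of non-events down, an improvement that a nearly saturated AUC can fail to register.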

17.
The wide availability of multi‐dimensional genomic data has spurred increasing interests in integrating multi‐platform genomic data. Integrative analysis of cancer genome landscape can potentially lead to deeper understanding of the biological process of cancer. We integrate epigenetics (DNA methylation and microRNA expression) and gene expression data in tumor genome to delineate the association between different aspects of the biological processes and brain tumor survival. To model the association, we employ a flexible semiparametric linear transformation model that incorporates both the main effects of these genomic measures as well as the possible interactions among them. We develop variance component tests to examine different coordinated effects by testing various subsets of model coefficients for the genomic markers. A Monte Carlo perturbation procedure is constructed to approximate the null distribution of the proposed test statistics. We further propose omnibus testing procedures to synthesize information from fitting various parsimonious sub‐models to improve power. Simulation results suggest that our proposed testing procedures maintain proper size under the null and outperform standard score tests. We further illustrate the utility of our procedure in two genomic analyses for survival of glioblastoma multiforme patients. Copyright © 2016 John Wiley & Sons, Ltd.

18.
Integrative analysis of high-dimensional omics datasets has been studied by many authors in recent years. By incorporating prior knowledge of the relationships among the variables, these analyses have been successful in elucidating the relationships between different sets of omics data. In this article, our goal is to identify important relationships between genomic expression and cytokine data from a human immunodeficiency virus vaccine trial. We propose a flexible partial least squares technique that incorporates group and subgroup structure in the modelling process. Our new method accounts for both the grouping of genetic markers (eg, gene sets) and temporal effects. The method generalises existing sparse modelling techniques in the partial least squares methodology and establishes theoretical connections to variable selection methods for supervised and unsupervised problems. Simulation studies are performed to investigate the performance of our methods relative to alternative sparse approaches. Our R package sgspls is available at https://github.com/matt-sutton/sgspls.

19.
Biomarkers that can help identify patients who will have an early clinical benefit from a treatment are important not only for patients' survival and quality of life, but also for the cost of health care. Owing to reasons such as biological variation and limited machine precision, biomarkers are sometimes measured with large errors. Adjusting for the measurement error in calculating the proportion of the treatment effect explained by markers has been a subject of research. The proportion of information gain (PIG), a new quantity to measure the importance of a biomarker, has not yet been studied for variables measured with error. In this article, we provide methods to account for the measurement error in the calculation of PIG for continuous, binary and time‐to‐event outcomes. Simulation shows that the adjusted estimator has little bias and has less variability compared to the naive estimator ignoring the measurement error. Data from an osteoporosis clinical study are used to illustrate the method for a binary outcome. Copyright © 2010 John Wiley & Sons, Ltd.

20.
Background: The development of classification methods for personalized medicine is highly dependent on the identification of predictive genetic markers. In survival analysis, it is often necessary to discriminate between influential and noninfluential markers. It is common to perform univariate screening using Cox scores, which quantify the associations between survival and each of the markers to provide a ranking. Since Cox scores do not account for dependencies between the markers, their use is suboptimal in the presence of highly correlated markers. Methods: As an alternative to the Cox score, we propose the correlation-adjusted regression survival (CARS) score for right-censored survival outcomes. By removing the correlations between the markers, the CARS score quantifies the associations between the outcome and the set of “decorrelated” marker values. Estimation of the scores is based on inverse probability weighting, which is applied to log-transformed event times. For high-dimensional data, estimation is based on shrinkage techniques. Results: The consistency of the CARS score is proven under mild regularity conditions. In simulations with high correlations, survival models based on CARS score rankings achieved higher areas under the precision-recall curve than competing methods. Two example applications on prostate and breast cancer confirmed these results. CARS scores are implemented in the R package carSurv. Conclusions: In research applications involving high-dimensional genetic data, the use of CARS scores for marker selection is a favorable alternative to Cox scores even when correlations between covariates are low. Having a straightforward interpretation and low computational requirements, CARS scores are an easy-to-use screening tool in personalized medicine research.
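The decorrelation idea behind the CARS score can be sketched on an uncensored continuous outcome (this is not the actual CARS estimator, which uses inverse-probability weighting for censoring and shrinkage estimation; the data and function name below are illustrative): whiten the markers by the inverse square root of their correlation matrix before correlating them with the outcome, so a marker gets no credit for signal carried by its neighbours:

```python
import numpy as np

def decorrelated_scores(X, y):
    """Correlation-adjusted marker scores: standardize, whiten the
    markers by R^{-1/2} (R = marker correlation matrix), then take
    the correlation of each whitened marker with the outcome."""
    Xs = (X - X.mean(0)) / X.std(0)
    ys = (y - y.mean()) / y.std()
    R = Xs.T @ Xs / len(y)
    vals, vecs = np.linalg.eigh(R)
    R_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
    marginal = Xs.T @ ys / len(y)      # plain marginal-score analogue
    return R_inv_sqrt @ marginal       # decorrelated scores

rng = np.random.default_rng(2)
x1 = rng.normal(size=500)              # x1 drives the outcome
x2 = x1 + 0.5 * rng.normal(size=500)   # x2 is a noisy copy of x1
y = x1 + 0.5 * rng.normal(size=500)
scores = decorrelated_scores(np.column_stack([x1, x2]), y)
assert abs(scores[0]) > abs(scores[1])  # credit goes to the true driver
```

With plain marginal scores the redundant marker x2 ranks almost as high as x1; after decorrelation the ranking separates the true driver from its correlated shadow, which is the behaviour the simulations in item 20 reward.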
