Similar literature
Found 20 similar articles; search took 312 ms
1.
In the analysis of gene expression data, dimension reduction techniques have been extensively adopted. The most popular one is perhaps PCA (principal component analysis). To generate more reliable and more interpretable results, the SPCA (sparse PCA) technique has been developed. With the “small sample size, high dimensionality” characteristic of gene expression data, the analysis results generated from a single dataset are often unsatisfactory. In contexts other than dimension reduction, integrative analysis techniques, which jointly analyze the raw data of multiple independent datasets, have been developed and shown to outperform “classic” meta‐analysis, other multi‐dataset techniques, and single‐dataset analysis. In this study, we conduct integrative analysis by developing the iSPCA (integrative SPCA) method. iSPCA achieves the selection and estimation of sparse loadings using a group penalty. To take advantage of the similarity across datasets and generate more accurate results, we further impose contrasted penalties. Different penalties are proposed to accommodate different data conditions. Extensive simulations show that iSPCA outperforms the alternatives under a wide spectrum of settings. The analysis of breast cancer and pancreatic cancer data further shows iSPCA's satisfactory performance.
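
To make the idea of a group penalty on loadings concrete, here is a minimal sketch of group soft-thresholding, a standard building block of group-penalized estimation. It is not taken from the abstract and is not claimed to be the iSPCA algorithm; the function name `group_soft_threshold`, the tuning value `lam`, and the example loadings are all hypothetical.

```python
import math


def group_soft_threshold(values, lam):
    """Shrink a group of loadings toward zero by `lam` in Euclidean norm.

    `values` holds one gene's loadings across several datasets (assumed layout).
    The whole group is set to zero when its norm does not exceed `lam`.
    """
    norm = math.sqrt(sum(v * v for v in values))
    if norm <= lam:
        return [0.0] * len(values)
    scale = 1.0 - lam / norm
    return [scale * v for v in values]


# Hypothetical loadings of two genes in three datasets.
strong_gene = [3.0, 4.0, 0.0]   # norm 5: shrunk but kept
weak_gene = [0.1, -0.2, 0.1]    # norm below lam: removed in all datasets
print(group_soft_threshold(strong_gene, 1.0))
print(group_soft_threshold(weak_gene, 1.0))
```

Because the threshold acts on the norm of the whole group, a gene is either kept in all datasets or dropped from all of them, which is how a group penalty selects variables jointly.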

2.
In profiling studies, the analysis of a single dataset often leads to unsatisfactory results because of the small sample size. Multi‐dataset analysis utilizes information from multiple independent datasets and outperforms single‐dataset analysis. Among the available multi‐dataset analysis methods, integrative analysis methods aggregate and analyze raw data and outperform meta‐analysis methods, which analyze multiple datasets separately and then pool summary statistics. In this study, we conduct integrative analysis and marker selection under the heterogeneity structure, which allows different datasets to have overlapping but not necessarily identical sets of markers. Under certain scenarios, it is reasonable to expect some similarity of identified marker sets – or equivalently, similarity of model sparsity structures – across multiple datasets. However, the existing methods do not have a mechanism to explicitly promote such similarity. To tackle this problem, we develop a sparse boosting method. This method uses a BIC/HDBIC criterion to select weak learners in boosting and encourages sparsity. A new penalty is introduced to promote the similarity of model sparsity structures across datasets. The proposed method has an intuitive formulation and is broadly applicable and computationally affordable. In numerical studies, we analyze right censored survival data under the accelerated failure time model. Simulation shows that the proposed method outperforms alternative boosting and penalization methods with more accurate marker identification. The analysis of three breast cancer prognosis datasets shows that the proposed method can identify marker sets with increased similarity across datasets and improved prediction performance. Copyright © 2016 John Wiley & Sons, Ltd.

3.
In cancer studies with high‐throughput genetic and genomic measurements, integrative analysis provides a way to effectively pool and analyze heterogeneous raw data from multiple independent studies and outperforms “classic” meta‐analysis and single‐dataset analysis. When marker selection is of interest, the genetic basis of multiple datasets can be described using the homogeneity model or the heterogeneity model. In this study, we consider marker selection under the heterogeneity model, which includes the homogeneity model as a special case and can be more flexible. Penalization methods have been developed in the literature for marker selection. This study advances beyond the published ones by introducing contrast penalties, which can accommodate the within‐ and across‐dataset structures of covariates/regression coefficients and, by doing so, further improve marker selection performance. Specifically, we develop a penalization method that accommodates the across‐dataset structures by smoothing over regression coefficients. An effective iterative algorithm, which calls an inner coordinate descent iteration, is developed. Simulation shows that the proposed method outperforms the benchmark with more accurate marker identification. The analysis of breast cancer and lung cancer prognosis studies with gene expression measurements shows that the proposed method identifies genes different from those using the benchmark and has better prediction performance.

4.
This article proposes a joint modeling framework for longitudinal insomnia measurements and a stochastic smoking cessation process in the presence of a latent permanent quitting state (i.e., ‘cure’). We use a generalized linear mixed‐effects model and a stochastic mixed‐effects model for the longitudinal measurements of insomnia symptoms and for the smoking cessation process, respectively. We link these two models together via the latent random effects. We develop a Bayesian framework and Markov Chain Monte Carlo algorithm to obtain the parameter estimates. We formulate and compute the likelihood functions involving time‐dependent covariates. We explore the within‐subject correlation between insomnia and smoking processes. We apply the proposed methodology to simulation studies and the motivating dataset, that is, the Alpha‐Tocopherol, Beta‐Carotene Lung Cancer Prevention study, a large longitudinal cohort study of smokers from Finland. Copyright © 2013 John Wiley & Sons, Ltd.

5.
In this paper, we formalize the application of multivariate meta‐analysis and meta‐regression to synthesize estimates of multi‐parameter associations obtained from different studies. This modelling approach extends the standard two‐stage analysis used to combine results across different sub‐groups or populations. The most straightforward application is for the meta‐analysis of non‐linear relationships, described for example by regression coefficients of splines or other functions, but the methodology easily generalizes to any setting where complex associations are described by multiple correlated parameters. The modelling framework of multivariate meta‐analysis is implemented in the package mvmeta within the statistical environment R. As an illustrative example, we propose a two‐stage analysis for investigating the non‐linear exposure–response relationship between temperature and non‐accidental mortality using time‐series data from multiple cities. Multivariate meta‐analysis represents a useful analytical tool for studying complex associations through a two‐stage procedure. Copyright © 2012 John Wiley & Sons, Ltd.
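
As a rough illustration of the second stage of a two-stage analysis, the sketch below pools study-specific estimates with inverse-variance weights. It is a deliberately simplified univariate, fixed-effect version, not the multivariate model of the abstract and not the mvmeta API; the function name `pool_fixed_effect` and the input numbers are hypothetical.

```python
def pool_fixed_effect(estimates, variances):
    """Inverse-variance (fixed-effect) pooling of one parameter across studies.

    Returns the pooled estimate and its variance.
    """
    if len(estimates) != len(variances):
        raise ValueError("estimates and variances must have the same length")
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    return pooled, 1.0 / total


# Hypothetical first-stage results from two cities: one coefficient each.
pooled, pooled_var = pool_fixed_effect([0.10, 0.30], [0.01, 0.01])
print(pooled, pooled_var)
```

In the multivariate setting the scalar weights become inverse covariance matrices, so that correlated parameters (for example spline coefficients) are pooled jointly rather than one at a time.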

6.
In this paper, we propose a class of Box–Cox transformation regression models with multidimensional random effects for analyzing multivariate responses for individual patient data in meta‐analysis. Our modeling formulation uses a multivariate normal response meta‐analysis model with multivariate random effects, in which each response is allowed to have its own Box–Cox transformation. Prior distributions are specified for the Box–Cox transformation parameters as well as the regression coefficients in this complex model, and the deviance information criterion is used to select the best transformation model. Because the model is quite complex, we develop a novel Monte Carlo Markov chain sampling scheme to sample from the joint posterior of the parameters. This model is motivated by a very rich dataset comprising 26 clinical trials involving cholesterol‐lowering drugs where the goal is to jointly model the three‐dimensional response consisting of low density lipoprotein cholesterol (LDL‐C), high density lipoprotein cholesterol (HDL‐C), and triglycerides (TG). Because the joint distribution of (LDL‐C, HDL‐C, TG) is not multivariate normal and in fact quite skewed, a Box–Cox transformation is needed to achieve normality. In the clinical literature, these three variables are usually analyzed univariately; however, a multivariate approach would be more appropriate because these variables are correlated with each other. We carry out a detailed analysis of these data by using the proposed methodology. Copyright © 2013 John Wiley & Sons, Ltd.
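
For readers unfamiliar with the transformation named above, here is a minimal sketch of the standard one-parameter Box–Cox transform, (y^λ − 1)/λ for λ ≠ 0 and log y for λ = 0. The function name `box_cox` and the sample values are illustrative assumptions; the abstract's model additionally estimates a separate λ for each response, which is not shown here.

```python
import math


def box_cox(y, lam):
    """Standard one-parameter Box-Cox transform of a positive value y."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)      # limiting case as lam -> 0
    return (y ** lam - 1.0) / lam


# Hypothetical skewed measurements, transformed with two choices of lambda.
values = [1.0, 4.0, 9.0]
print([box_cox(v, 0.5) for v in values])   # square-root-like scale
print([box_cox(v, 0.0) for v in values])   # log scale
```

Smaller values of λ compress the right tail more strongly, which is why the transform is used to bring skewed responses closer to normality.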

7.
Breast cancer is the leading cancer in women of reproductive age; more than a quarter of women diagnosed with breast cancer in the US are premenopausal. A common adjuvant treatment for this patient population is chemotherapy, which has been shown to cause premature menopause and infertility with serious consequences to quality of life. Luteinizing‐hormone‐releasing hormone (LHRH) agonists, which induce temporary ovarian function suppression (OFS), have been shown to be a useful alternative to chemotherapy in the adjuvant setting for estrogen‐receptor‐positive breast cancer patients. LHRH agonists have the potential to preserve fertility after treatment, thus reducing the negative effects on a patient's reproductive health. However, little is known about the association between a patient's underlying degree of OFS and disease‐free survival (DFS) after receiving LHRH agonists. Specifically, we are interested in whether patients with lower underlying degrees of OFS (i.e. higher estrogen production) after taking LHRH agonists are at a higher risk for late breast cancer events. In this paper, we propose a latent class joint model (LCJM) to analyze a data set from the International Breast Cancer Study Group (IBCSG) Trial VIII to investigate the association between OFS and DFS. Analysis of this data set is challenging due to the fact that the main outcome of interest, OFS, is unobservable and the available surrogates for this latent variable involve masked event and cured proportions. We employ a likelihood approach and the EM algorithm to obtain parameter estimates and present results from the IBCSG data analysis. Copyright © 2010 John Wiley & Sons, Ltd.

8.
Genome‐wide association studies (GWAS) are now routinely imputed for untyped single nucleotide polymorphisms (SNPs) based on various powerful statistical algorithms for imputation trained on reference datasets. The use of predicted allele counts for imputed SNPs as the dosage variable is known to produce valid score tests for genetic association. In this paper, we investigate how to best handle imputed SNPs in various modern complex tests for genetic associations incorporating gene–environment interactions. We focus on case‐control association studies where inference for an underlying logistic regression model can be performed using alternative methods that rely to varying degrees on an assumption of gene–environment independence in the underlying population. As increasingly large‐scale GWAS are being performed through consortium efforts where it is preferable to share only summary‐level information across studies, we also describe simple mechanisms for implementing score tests based on standard meta‐analysis of “one‐step” maximum‐likelihood estimates across studies. Applications of the methods in simulation studies and a dataset from GWAS of lung cancer illustrate the ability of the proposed methods to maintain type‐I error rates for the underlying testing procedures. For analysis of imputed SNPs, similar to typed SNPs, the retrospective methods can lead to considerable efficiency gain for modeling of gene–environment interactions under the assumption of gene–environment independence. Methods are made available for public use through the CGEN R software package.

9.
Multistate models are increasingly being used to model complex disease profiles. By modelling transitions between disease states, accounting for competing events at each transition, we can gain a much richer understanding of patient trajectories and how risk factors impact over the entire disease pathway. In this article, we concentrate on parametric multistate models, both Markov and semi‐Markov, and develop a flexible framework where each transition can be specified by a variety of parametric models including exponential, Weibull, Gompertz, Royston‐Parmar proportional hazards models or log‐logistic, log‐normal, generalised gamma accelerated failure time models, possibly sharing parameters across transitions. We also extend the framework to allow time‐dependent effects. We then use an efficient and generalisable simulation method to calculate transition probabilities from any fitted multistate model, and show how it facilitates the simple calculation of clinically useful measures, such as expected length of stay in each state, and differences and ratios of proportion within each state as a function of time, for specific covariate patterns. We illustrate our methods using a dataset of patients with primary breast cancer. User‐friendly Stata software is provided.
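
The simulation idea described above can be sketched for the simplest case: a three-state illness-death model with constant (exponential) hazards, where state occupation probabilities at time t are estimated by simulating many patient trajectories. This is an assumption-laden toy, not the authors' Stata implementation; the state names, hazard values, and function names are hypothetical.

```python
import math
import random


def simulate_state_at(t, h_ill, h_death, h_ill_death, rng):
    """Simulate one trajectory and return the state occupied at time t."""
    # Competing exponential transitions out of the initial "healthy" state.
    t_ill = rng.expovariate(h_ill)
    t_death = rng.expovariate(h_death)
    if min(t_ill, t_death) > t:
        return "healthy"
    if t_death < t_ill:
        return "dead"
    # Became ill first; death after illness has its own hazard.
    t_death_after_ill = t_ill + rng.expovariate(h_ill_death)
    return "ill" if t_death_after_ill > t else "dead"


def state_occupation(t, n, h_ill=0.2, h_death=0.05, h_ill_death=0.3, seed=1):
    """Estimate state occupation probabilities at time t from n simulations."""
    rng = random.Random(seed)
    counts = {"healthy": 0, "ill": 0, "dead": 0}
    for _ in range(n):
        counts[simulate_state_at(t, h_ill, h_death, h_ill_death, rng)] += 1
    return {state: c / n for state, c in counts.items()}


probs = state_occupation(t=2.0, n=20000)
print(probs)
# Analytic check for the first state: exp(-(h_ill + h_death) * t).
print(math.exp(-(0.2 + 0.05) * 2.0))
```

The same loop works unchanged if the exponential draws are replaced by draws from other parametric distributions, which is what makes simulation attractive when closed-form transition probabilities are unavailable.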

10.
The frailty model, an extension of the proportional hazards model, is often used to model clustered survival data. However, some extension of the ordinary frailty model is required when there exist competing risks within a cluster. Under competing risks, the underlying processes affecting the events of interest and competing events could be different but correlated. In this paper, the hierarchical likelihood method is proposed to infer the cause‐specific hazard frailty model for clustered competing risks data. The hierarchical likelihood incorporates fixed effects as well as random effects into an extended likelihood function, so that the method does not require intensive numerical methods to find the marginal distribution. Simulation studies are performed to assess the behavior of the estimators for the regression coefficients and the correlation structure among the bivariate frailty distribution for competing events. The proposed method is illustrated with a breast cancer dataset. Copyright © 2015 John Wiley & Sons, Ltd.

11.
We develop a multivariate cure survival model to estimate lifetime patterns of colorectal cancer screening. Screening data cover long periods of time, with sparse observations for each person. Some events may occur before the study begins or after the study ends, so the data are both left‐censored and right‐censored, and some individuals are never screened (the ‘cured’ population). We propose a multivariate parametric cure model that can be used with left‐censored and right‐censored data. Our model allows for the estimation of the time to screening as well as the average number of times individuals will be screened. We calculate likelihood functions based on the observations for each subject using a distribution that accounts for within‐subject correlation and estimate parameters using Markov chain Monte Carlo methods. We apply our methods to the estimation of lifetime colorectal cancer screening behavior in the SEER‐Medicare data set. Copyright © 2016 John Wiley & Sons, Ltd.

12.
Multivariate meta‐analysis allows the joint synthesis of effect estimates based on multiple outcomes from multiple studies, accounting for the potential correlations among them. However, standard methods for multivariate meta‐analysis for multiple outcomes are restricted to problems where the within‐study correlation is known or where individual participant data are available. This paper proposes an approach to approximating the within‐study covariances based on information about likely correlations between underlying outcomes. We developed methods for both continuous and dichotomous data and for combinations of the two types. An application to a meta‐analysis of treatments for stroke illustrates the use of the approximated covariance in multivariate meta‐analysis with correlated outcomes. Copyright © 2012 John Wiley & Sons, Ltd.
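
The basic identity behind any such approximation is cov = ρ · se₁ · se₂. The sketch below builds a 2×2 within-study covariance matrix from two reported standard errors and an assumed correlation. It shows only this identity, under the strong assumption that the correlation between the two effect estimates is supplied directly; the paper's actual approximations for continuous and dichotomous outcomes are more involved and are not reproduced here. The function name and inputs are hypothetical.

```python
def within_study_cov(se1, se2, rho):
    """2x2 covariance matrix of two effect estimates in one study.

    se1, se2: reported standard errors; rho: assumed correlation (hypothetical input).
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    cov = rho * se1 * se2
    return [[se1 ** 2, cov], [cov, se2 ** 2]]


# Hypothetical study: two outcomes with an assumed correlation of 0.5.
matrix = within_study_cov(0.2, 0.5, 0.5)
for row in matrix:
    print(row)
```

A sensitivity analysis over a plausible range of `rho` is a natural companion to this kind of approximation, since the true within-study correlation is unknown.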

13.
We address the problem of meta‐analysis of pairs of survival curves under heterogeneity. The starting point for the meta‐analysis is a set of studies, each comparing the same two treatments, containing information about multiple survival outcomes. Under heterogeneity, we model the number of events using an extension of the Poisson correlated gamma‐frailty model with serial within‐arm and positive between‐arm correlations. The parameters of the models are estimated following a two‐stage estimation procedure. In the first stage the underlying hazards and between‐study variance are estimated using the marginals, while a second stage is used to estimate both within‐arm and between‐arm correlations. The methodology is illustrated with an observational study on breast cancer. Copyright © 2009 John Wiley & Sons, Ltd.

14.
Multilevel item response theory (MLIRT) models have been widely used to analyze the multivariate longitudinal data of mixed types (e.g., categorical and continuous) in clinical studies. The MLIRT models often make a unidimensional assumption, that is, the multiple outcomes are clinical manifestations of a univariate latent variable. However, the unidimensional assumption may be unrealistic because some diseases may be heterogeneous and characterized by multiple impaired domains with variable clinical symptoms and disease progressions. We relax this assumption and propose a multidimensional latent trait linear mixed model (MLTLMM) to allow multiple latent variables and within‐item multidimensionality (one outcome can be a manifestation of more than one latent variable). We conduct extensive simulation studies to assess the unidimensional MLIRT model and the proposed MLTLMM model. The simulation studies suggest that the MLTLMM model outperforms the unidimensional model when the multivariate longitudinal outcomes are manifested by multiple latent variables. The proposed model is applied to two motivating studies of amyotrophic lateral sclerosis: a clinical trial of ceftriaxone and the Pooled Resource Open‐Access ALS Clinical Trials database. Copyright © 2017 John Wiley & Sons, Ltd.

15.
Flexible survival models are needed when modelling data from long term follow‐up studies. In many cases, the assumption of proportionality imposed by a Cox model will not be valid. Instead, a model that can identify time varying effects of fixed covariates can be used. Although there are several approaches that deal with this problem, it is not always straightforward how to choose which covariates should be modelled having time varying effects and which not. At the same time, it is up to the researcher to define appropriate time functions that describe the dynamic pattern of the effects. In this work, we suggest a model that can deal with both fixed and time varying effects and uses simple hypothesis tests to distinguish which covariates do have dynamic effects. The model is an extension of the parsimonious reduced rank model of rank 1. As such, the number of parameters is kept low, and thus, a flexible set of time functions, such as B‐splines, can be used. The basic theory is illustrated along with an efficient fitting algorithm. The proposed method is applied to a dataset of breast cancer patients and compared with a multivariate fractional polynomials approach for modelling time‐varying effects. Copyright © 2016 John Wiley & Sons, Ltd.

16.
Genome‐wide association studies (GWAS) for complex diseases have focused primarily on single‐trait analyses for disease status and disease‐related quantitative traits. For example, GWAS on risk factors for coronary artery disease analyze genetic associations of plasma lipids such as total cholesterol, LDL‐cholesterol, HDL‐cholesterol, and triglycerides (TGs) separately. However, traits are often correlated and a joint analysis may yield increased statistical power for association over multiple univariate analyses. Recently several multivariate methods have been proposed that require individual‐level data. Here, we develop metaUSAT (where USAT is unified score‐based association test), a novel unified association test of a single genetic variant with multiple traits that uses only summary statistics from existing GWAS. Although the existing methods either perform well when most correlated traits are affected by the genetic variant in the same direction or are powerful when only a few of the correlated traits are associated, metaUSAT is designed to be robust to the association structure of correlated traits. metaUSAT does not require individual‐level data and can test genetic associations of categorical and/or continuous traits. One can also use metaUSAT to analyze a single trait over multiple studies, appropriately accounting for overlapping samples, if any. metaUSAT provides an approximate asymptotic P‐value for association and is computationally efficient for implementation at a genome‐wide level. Simulation experiments show that metaUSAT maintains proper type‐I error at low error levels. It has similar and sometimes greater power to detect association across a wide array of scenarios compared to existing methods, which are usually powerful for some specific association scenarios only. When applied to plasma lipids summary data from the METSIM and the T2D‐GENES studies, metaUSAT detected genome‐wide significant loci beyond the ones identified by univariate analyses. Evidence from larger studies suggests that the variants additionally detected by our test are, indeed, associated with lipid levels in humans. In summary, metaUSAT can provide novel insights into the genetic architecture of a common disease or traits.

17.
End‐stage renal disease (ESRD) is one of the most serious diabetes complications. Numerous studies have been devoted to revealing the risk factors of the onset time of ESRD. In this article, we propose a proportional mean residual life (MRL) model with latent variables to assess the effects of observed and latent risk factors on the MRL function of ESRD in a cohort of Chinese type 2 diabetic patients. The proposed model generalizes the conventional proportional MRL model to accommodate the latent risk factor that cannot be measured by a single observed variable. We employ a factor analysis model to characterize the latent risk factors via multiple observed variables. We develop a borrow‐strength estimation procedure, which incorporates the expectation–maximization algorithm and an extended estimating equation approach. The asymptotic properties of the proposed estimators are established. Simulation shows that the performance of the proposed methodology is satisfactory. The application to the study of type 2 diabetes reveals insights into the prevention of ESRD. Copyright © 2016 John Wiley & Sons, Ltd.

18.
Motivated by the multivariate nature of microbiome data with hierarchical taxonomic clusters, counts that are often skewed and zero inflated, and repeated measures, we propose a Bayesian latent variable methodology to jointly model multiple operational taxonomic units within a single taxonomic cluster. This novel method can incorporate both negative binomial and zero‐inflated negative binomial responses, and can account for serial and familial correlations. We develop a Markov chain Monte Carlo algorithm that is built on a data augmentation scheme using Pólya‐Gamma random variables. Hierarchical centering and parameter expansion techniques are also used to improve the convergence of the Markov chain. We evaluate the performance of our proposed method through extensive simulations. We also apply our method to a human microbiome study.

19.
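曾艳华 (Zeng Yanhua). 《中国妇幼保健》 (Maternal and Child Health Care of China), 2012, 27(35): 5691-5694
Objective: To explore the risk factors for endometrial cancer in Jintan City. Methods: In a case-control study, 165 patients who attended the Department of Obstetrics and Gynecology of Jintan People's Hospital between December 2005 and June 2011 and were pathologically diagnosed with endometrial cancer formed the case group, and 528 people undergoing routine health check-ups formed the control group. Univariate and multivariate unconditional logistic regression were used to analyze risk factors for endometrial cancer. Results: Univariate analysis showed that age ≤50 years, age ≥61 years, overweight BMI, hypertension, diabetes, age at menarche ≤12 years, age at first delivery ≤20 years, and a history of breast, endometrial, colon, or ovarian cancer in first-degree relatives were associated with endometrial cancer. In multivariate stepwise logistic regression, the variables finally entered into the regression equation were age ≤50 years, age ≥61 years, overweight BMI, hypertension, diabetes, age at menarche ≤12 years, and a history of colon or ovarian cancer in first-degree relatives. Conclusion: Age ≥61 years, overweight BMI, hypertension, diabetes, age at menarche ≤12 years, and a history of colon or ovarian cancer in first-degree relatives are risk factors for endometrial cancer; age ≤50 years is a protective factor.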

20.
As part of the 9th Genetic Analysis Workshop held in Val Morin, Quebec, October 16-18, 1994, four workshop participants analyzed a large breast cancer data set. This data set consisted of phenotype and genetic marker data on 3884 individuals in 214 families with at least four cases of breast cancer contributed by members of an international breast cancer consortium. Two of the four papers [Barrett and Rigby; Commenges; this issue] utilized variants of affected pair methods to assess linkage of breast/ovarian cancer to the 17q markers in the data set. The third paper by Bansal et al. used a Monte-Carlo approach to examine the question of intrafamilial clustering of breast and ovarian cancer. The last paper in the series [Leal and Ott] described a method of computing support intervals for risk estimates when there are uncertainties associated with the parameter estimates used to compute these risks. © 1995 Wiley-Liss, Inc.
