Similar Documents (20 results)
1.
Genome‐wide association studies (GWAS) have become a very effective research tool for identifying genetic variants underlying various complex diseases. In spite of the success of GWAS in identifying thousands of reproducible associations between genetic variants and complex disease, the association between genetic variants and a single phenotype is usually weak. It is increasingly recognized that joint analysis of multiple phenotypes can be more powerful than univariate analysis and can shed new light on the underlying biological mechanisms of complex diseases. In this paper, we develop a novel variable reduction method for the joint analysis of multiple phenotypes in association studies, based on a hierarchical clustering method (HCM). The proposed method involves two steps. The first step applies a dimension reduction technique by using a representative phenotype for each cluster of phenotypes. Existing methods are then used in the second step to test the association between genetic variants and the representative phenotypes rather than the individual phenotypes. We perform extensive simulation studies to compare the powers of multivariate analysis of variance (MANOVA), the joint model of multiple phenotypes (MultiPhen), and the trait‐based association test that uses an extended Simes procedure (TATES), with and without HCM. Our simulation studies show that using HCM is more powerful than not using it in most scenarios. We also illustrate the usefulness of HCM by analyzing whole‐genome genotyping data from a lung function study.
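The two-step idea can be sketched on synthetic data. This is a minimal illustration, not the paper's implementation: the representative phenotype is taken to be the first principal component of each cluster, and a simple Pearson correlation test stands in for the MANOVA/MultiPhen/TATES association tests; all data and effect sizes are invented.

```python
import numpy as np
from scipy import stats
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
n, p = 500, 6                          # subjects, phenotypes
snp = rng.binomial(2, 0.3, size=n)     # genotype coded 0/1/2

# Two latent factors generate two blocks of correlated phenotypes;
# the SNP has a weak effect on the first block only.
f1 = 0.15 * snp + rng.normal(size=n)
f2 = rng.normal(size=n)
Y = np.column_stack(
    [f1 + 0.5 * rng.normal(size=n) for _ in range(3)]
    + [f2 + 0.5 * rng.normal(size=n) for _ in range(3)]
)

# Step 1: hierarchically cluster phenotypes on correlation distance.
D = 1 - np.abs(np.corrcoef(Y.T))
labels = fcluster(linkage(D[np.triu_indices(p, 1)], method="average"),
                  t=2, criterion="maxclust")

# Step 2: test the SNP against one representative phenotype per cluster
# (here: first principal component of the standardized cluster members).
for k in sorted(set(labels)):
    block = Y[:, labels == k]
    block = (block - block.mean(0)) / block.std(0)
    rep = np.linalg.svd(block, full_matrices=False)[0][:, 0]
    _, pval = stats.pearsonr(snp, rep)
    print(f"cluster {k}: association p-value = {pval:.3g}")
```

With correlated phenotype blocks like these, average linkage on the correlation-distance matrix recovers the blocks, and the association then only needs to be tested against one representative per cluster instead of every phenotype.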

2.
Semicontinuous data, characterized by a sizable number of zeros together with observations from a continuous distribution, are frequently encountered in health research concerning food consumption, physical activity, medical and pharmacy claims expenditures, and many other areas. In analyzing such semicontinuous data, it is imperative that the excessive zeros be adequately accounted for to obtain unbiased and efficient inference. Although many methods have been proposed in the literature for the modeling and analysis of semicontinuous data, little attention has been given to clustering of semicontinuous data to identify important patterns that could be indicative of certain health outcomes or intervention effects. We propose a Bernoulli-normal mixture model for clustering multivariate semicontinuous data and demonstrate its accuracy compared to the well-known clustering method based on the conventional normal mixture model. The proposed method is illustrated with data from a dietary intervention trial to promote healthy eating behavior among children with type 1 diabetes. In the trial, certain diabetes-friendly foods (e.g., total fruit, whole fruit, dark green and orange vegetables and legumes, whole grains) were consumed by only a proportion of study participants, yielding excessive zero values due to nonconsumption of the foods. Baseline food consumption data in the trial are used to explore preintervention dietary patterns among study participants. While the conventional normal mixture model approach fails to do so, the proposed Bernoulli-normal mixture model approach is able to identify a dietary profile that significantly differentiates the intervention effects from the others, as measured by the popular healthy eating index at the end of the trial.

3.
In longitudinal studies of patients with the human immunodeficiency virus (HIV), objectives of interest often include modeling of individual-level trajectories of HIV ribonucleic acid (RNA) as a function of time. Such models can be used to predict the effects of different treatment regimens or to classify subjects into subgroups with similar trajectories. Empirical evidence, however, suggests that individual trajectories often possess multiple points of rapid change, which may vary from subject to subject. Additionally, some individuals may end up dropping out of the study, and the tendency to drop out may be related to the level of the biomarker. Modeling of individual viral RNA profiles is challenging in the presence of these changes, and currently available methods do not address all of the issues (multiple changes, informative dropout, clustering, etc.) in a single model. In this article, we propose a new joint model, in which a multiple-changepoint model is specified for the longitudinal viral RNA response and a proportional hazards model for the time to dropout. Dirichlet process (DP) priors are used to model the distributions of the individual random effects and the errors. In addition to robustifying the model against possible misspecifications, the DP leads to a natural clustering of subjects with similar trajectories, which can be of importance in itself. Sharing of information among subjects with similar trajectories also results in improved parameter estimation. A fully Bayesian approach for model fitting and prediction is implemented using MCMC procedures on the ACTG 398 clinical trial data. The proposed model is seen to give rise to improved estimates of individual trajectories when compared with a parametric approach.

4.
A time‐varying latent variable model is proposed to jointly analyze multivariate mixed‐support longitudinal data. The proposal can be viewed as an extension of hidden Markov regression models with fixed covariates (HMRMFCs), which are the state of the art for modelling longitudinal data with a special focus on the underlying clustering structure. HMRMFCs are inadequate for applications in which a clustering structure can be identified in the distribution of the covariates, as the clustering is independent of the covariates' distribution. Here, hidden Markov regression models with random covariates are introduced by explicitly specifying state‐specific distributions for the covariates, with the aim of improving the recovery of the clusters in the data relative to the fixed-covariates paradigm. The class of hidden Markov regression models with random covariates is defined with a focus on the exponential family, in a generalized linear model framework. Model identifiability conditions are sketched, an expectation‐maximization algorithm is outlined for parameter estimation, and various implementation and operational issues are discussed. Properties of the estimators of the regression coefficients, as well as of the hidden path parameters, are evaluated through simulation experiments and compared with those of HMRMFCs. The method is applied to physical activity data.

5.
Joint models initially dedicated to a single longitudinal marker and a single time‐to‐event need to be extended to account for the rich longitudinal data of cohort studies. Multiple causes of clinical progression are indeed usually observed, and multiple longitudinal markers are collected when the true latent trait of interest is hard to capture (e.g., quality of life, functional dependency, and cognitive level). These multivariate and longitudinal data also usually have nonstandard distributions (discrete, asymmetric, bounded, etc.). We propose a joint model based on a latent process and latent classes to simultaneously analyze such multiple longitudinal markers of different natures and multiple causes of progression. A latent process model describes the latent trait of interest and links it to the observed longitudinal outcomes using flexible measurement models adapted to different types of data, and a latent class structure links the longitudinal and cause‐specific survival models. The joint model is estimated in the maximum likelihood framework. A score test is developed to evaluate the assumption of conditional independence of the longitudinal markers and each cause of progression given the latent classes. In addition, individual dynamic cumulative incidences of each cause of progression based on the repeated marker data are derived. The methodology is validated in a simulation study and applied to real data about cognitive aging obtained from a large population‐based study. The aim is to predict the risk of dementia by accounting for the competing risk of death according to the profiles of semantic memory measured by two asymmetric psychometric tests. Copyright © 2015 John Wiley & Sons, Ltd.

6.
Subgroup identification (clustering) is an important problem in biomedical research. Gene expression profiles are commonly utilized to define subgroups. Longitudinal gene expression profiles might provide additional information on disease progression beyond what is captured by baseline profiles alone. Therefore, subgroup identification could be more accurate and effective with the aid of longitudinal gene expression data. However, existing statistical methods are unable to fully utilize these data for patient clustering. In this article, we introduce a novel clustering method in the Bayesian setting based on longitudinal gene expression profiles. This method, called BClustLonG, adopts a linear mixed‐effects framework to model the trajectory of genes over time, while clustering is jointly conducted based on the regression coefficients obtained from all genes. In order to account for the correlations among genes and alleviate the challenges of high dimensionality, we adopt a factor analysis model for the regression coefficients. The Dirichlet process prior distribution is utilized for the means of the regression coefficients to induce clustering. Through extensive simulation studies, we show that BClustLonG has improved performance over other clustering methods. When applied to a dataset of severely injured (burn or trauma) patients, our model is able to identify interesting subgroups. Copyright © 2017 John Wiley & Sons, Ltd.
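The modeling strategy (summarize each patient's gene trajectories by regression coefficients, then cluster patients on those coefficients) can be sketched on simulated data. This toy version uses per-gene least-squares slopes and plain k-means in place of the paper's mixed-effects/factor-analysis model and Dirichlet process prior, so the number of clusters is fixed rather than inferred.

```python
import numpy as np

rng = np.random.default_rng(2)
n_pat, n_gene, n_time = 60, 20, 5
t = np.arange(n_time)

# Two latent subgroups: genes trend upward in group 1, stay flat in group 0.
group = np.repeat([0, 1], n_pat // 2)
slopes_true = np.where(group[:, None] == 1, 0.8, 0.0) * np.ones((1, n_gene))
expr = slopes_true[:, :, None] * t + rng.normal(0, 1.0, (n_pat, n_gene, n_time))

# Step 1: summarize each patient by per-gene least-squares slopes.
tc = t - t.mean()
B = (expr * tc).sum(-1) / (tc ** 2).sum()      # (n_pat, n_gene) slope matrix

# Step 2: cluster the slope vectors with Lloyd's k-means (k = 2),
# initialized from one patient at each end of the cohort.
C = B[[0, -1]].copy()
for _ in range(25):
    lab = np.argmin(((B[:, None, :] - C) ** 2).sum(-1), axis=1)
    C = np.array([B[lab == k].mean(0) for k in range(2)])

print("recovered cluster sizes:", np.bincount(lab))
```

Even with noisy expression values, pooling the slope estimates across 20 genes separates the two trajectory subgroups cleanly, which is the intuition behind clustering on regression coefficients rather than on raw expression.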

7.
The normality assumption for measurement error is widely used in joint models of longitudinal and survival data, but it may lead to unreasonable or even misleading results when the longitudinal data exhibit skewness. This paper proposes a new joint model for multivariate longitudinal and multivariate survival data by incorporating a nonparametric function into the trajectory function and the hazard function, and by assuming that measurement errors in the longitudinal measurement models follow a skew‐normal distribution. A Monte Carlo expectation‐maximization (EM) algorithm, together with the penalized-splines technique and the Metropolis–Hastings algorithm within the Gibbs sampler, is developed to estimate the parameters and nonparametric functions in the considered joint models. Case-deletion diagnostic measures are proposed to identify potentially influential observations, and an extended local influence method is presented to assess the local influence of minor perturbations. Simulation studies and a real example from a clinical trial are presented to illustrate the proposed methodologies. Copyright © 2017 John Wiley & Sons, Ltd.

8.
A common paradigm in dealing with heterogeneity across tumors in cancer analysis is to cluster the tumors into subtypes using marker data on the tumor, and then to analyze each of the clusters separately. A more specific target is to investigate the association between risk factors and specific subtypes and to use the results for personalized preventive treatment. This task is usually carried out in two steps: clustering and risk factor assessment. However, two sources of measurement error arise in these problems. The first is the measurement error in the biomarker values. The second is the misclassification error when assigning observations to clusters. We consider the case with a specified set of relevant markers and propose a unified single‐likelihood approach for normally distributed biomarkers. As an alternative, we consider a two‐step procedure with the tumor type misclassification error taken into account in the second‐step risk factor analysis. We describe our method for binary data and also for survival analysis data using a modified version of the Cox model. We present asymptotic theory for the proposed estimators. Simulation results indicate that our methods significantly lower the bias, with a small price paid in terms of variance. We present an analysis of breast cancer data from the Nurses' Health Study to demonstrate the utility of our method. Copyright © 2016 John Wiley & Sons, Ltd.

9.
We present a novel statistical method for linkage disequilibrium (LD) mapping of disease susceptibility loci in case-control studies. Such studies exploit the statistical correlation, or LD, that exists between variants physically close along the genome to identify those that correlate with disease status and might thus be close to a causative mutation, generally assumed unobserved. LD structure, however, varies markedly over short distances because of variation in local recombination rates, mutation, and genetic drift, among other factors. We propose a Bayesian multivariate probit model that flexibly accounts for the local spatial correlation between markers. In a case-control setting, we use a retrospective model that properly reflects the sampling scheme and identify regions where single- or multi-locus marker frequencies differ across cases and controls. We formally quantify these differences using information-theoretic distance measures, while the fully Bayesian approach naturally accommodates unphased or missing genotype data. We demonstrate our approach on simulated data and on real data from the CYP2D6 region, which has a confirmed role in drug metabolism.

10.
In clinical trials, it is often desirable to evaluate the effect of a prognostic factor such as a marker response on a survival outcome. However, the marker response and survival outcome are usually associated with some potentially unobservable factors. In this case, the conventional statistical methods that model these two outcomes separately may not be appropriate. In this paper, we propose a joint model for marker response and survival outcomes for clustered data, providing efficient statistical inference by considering these two outcomes simultaneously. We focus on a special type of marker response, a binary outcome, which is investigated together with survival data using a cluster-specific multivariate random effect variable. A multivariate penalized likelihood method is developed to make statistical inference for the joint model. However, the standard errors obtained from the penalized likelihood method are usually underestimated. This issue is addressed using a jackknife resampling method to obtain a consistent estimate of the standard errors. We conduct extensive simulation studies to assess the finite sample performance of the proposed joint model and inference methods in different scenarios. The simulation studies show that the proposed joint model has excellent finite sample properties compared to the separate models when there exists an underlying association between the marker response and survival data. Finally, we apply the proposed method to a symptom control study conducted by the Canadian Cancer Trials Group to explore the prognostic effect of covariates on pain control and overall survival.
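The jackknife correction for underestimated standard errors can be illustrated in miniature. This sketch applies the delete-one jackknife to a simple regression slope on simulated data; in the clustered setting of the paper one would delete whole clusters rather than single observations, but the resampling recipe is the same.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)

def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc * yc).sum() / (xc ** 2).sum()

theta = slope(x, y)

# Delete-one jackknife: refit leaving each observation out in turn,
# then scale the spread of the leave-one-out estimates.
loo = np.array([slope(np.delete(x, i), np.delete(y, i)) for i in range(n)])
se_jack = np.sqrt((n - 1) / n * ((loo - loo.mean()) ** 2).sum())

print(f"slope = {theta:.3f}, jackknife SE = {se_jack:.3f}")
```

The jackknife needs no analytic variance formula, which is what makes it attractive when the model-based (penalized-likelihood) standard errors are known to be too small.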

11.
We propose a semiparametric multivariate skew–normal joint model for multivariate longitudinal and multivariate survival data. One main feature of the posited model is that we relax the commonly used normality assumption for random effects and within‐subject error by using a centered Dirichlet process prior to specify the random effects distribution and using a multivariate skew–normal distribution to specify the within‐subject error distribution and model trajectory functions of longitudinal responses semiparametrically. A Bayesian approach is proposed to simultaneously obtain Bayesian estimates of unknown parameters, random effects and nonparametric functions by combining the Gibbs sampler and the Metropolis–Hastings algorithm. Particularly, a Bayesian local influence approach is developed to assess the effect of minor perturbations to within‐subject measurement error and random effects. Several simulation studies and an example are presented to illustrate the proposed methodologies. Copyright © 2014 John Wiley & Sons, Ltd.

12.
PURPOSE: In this study, we assessed whether multivariate models and clinical decision rules can be used to reliably diagnose influenza. METHODS: We conducted a systematic review of MEDLINE, bibliographies of relevant studies, and previous meta-analyses. We searched the literature (1962–2010) for articles evaluating the accuracy of multivariate models, clinical decision rules, or simple heuristics for the diagnosis of influenza. Each author independently reviewed and abstracted data from each article; discrepancies were resolved by consensus discussion. Where possible, we calculated sensitivity, specificity, predictive value, likelihood ratios, and areas under the receiver operating characteristic curve. RESULTS: A total of 12 studies met our inclusion criteria. No study prospectively validated a multivariate model or clinical decision rule, and no study performed a split-sample or bootstrap validation of such a model. Simple heuristics such as the so-called fever and cough rule and the fever, cough, and acute onset rule were each evaluated by several studies in populations of adults and children. The areas under the receiver operating characteristic curves were 0.70 and 0.79, respectively. We could not calculate a single summary estimate, however, as the diagnostic threshold varied among studies. CONCLUSIONS: The fever and cough, and the fever, cough, and acute onset heuristics have modest accuracy, but summary estimates could not be calculated. Further research is needed to develop and prospectively validate clinical decision rules to identify patients requiring testing, empiric treatment, or neither.
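The accuracy measures reported in such reviews are simple functions of a 2×2 table. With hypothetical counts (not taken from any of the reviewed studies), they can be computed as:

```python
# Hypothetical 2x2 counts for a "fever and cough" rule
# versus laboratory-confirmed influenza (invented numbers).
tp, fp, fn, tn = 80, 60, 40, 220

sens = tp / (tp + fn)               # sensitivity
spec = tn / (tn + fp)               # specificity
lr_pos = sens / (1 - spec)          # positive likelihood ratio
lr_neg = (1 - sens) / spec          # negative likelihood ratio
ppv = tp / (tp + fp)                # positive predictive value

print(f"sensitivity={sens:.2f} specificity={spec:.2f} "
      f"LR+={lr_pos:.2f} LR-={lr_neg:.2f} PPV={ppv:.2f}")
```

Note that predictive values depend on influenza prevalence in the study population, which is one reason summary estimates are hard to pool when the diagnostic threshold and setting vary across studies.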

13.
In cancer research, high‐throughput profiling studies have been extensively conducted, searching for genes/single nucleotide polymorphisms (SNPs) associated with prognosis. Despite seemingly significant differences, different subtypes of the same cancer (or different types of cancers) may share common susceptibility genes. In this study, we analyze prognosis data on multiple subtypes of the same cancer but note that the proposed approach is directly applicable to the analysis of data on multiple types of cancers. We describe the genetic basis of multiple subtypes using the heterogeneity model, which allows overlapping but different sets of susceptibility genes/SNPs for different subtypes. An accelerated failure time (AFT) model is adopted to describe prognosis. We develop a regularized gradient descent approach that conducts gene‐level analysis and identifies genes that contain important SNPs associated with prognosis. The proposed approach belongs to the family of gradient descent approaches, is intuitively reasonable, and has affordable computational cost. A simulation study shows that when prognosis‐associated SNPs are clustered in a small number of genes, the proposed approach outperforms alternatives, with significantly more true positives and fewer false positives. We analyze an NHL (non‐Hodgkin lymphoma) prognosis study with SNP measurements and identify genes associated with the three major subtypes of NHL, namely, DLBCL, FL, and CLL/SLL. The proposed approach identifies genes different from those identified by alternative approaches and has the best prediction performance.

14.
Multivariate outcomes measured longitudinally over time are common in medicine, public health, psychology and sociology. The typical (saturated) longitudinal multivariate regression model has a separate set of regression coefficients for each outcome. However, multivariate outcomes are often quite similar, and many outcomes can be expected to respond similarly to changes in covariate values. Given a set of outcomes likely to share common covariate effects, we propose the clustered outcome common predictor effect model and offer a two-step iterative algorithm to fit the model using available software for univariate longitudinal data. Outcomes that share predictor effects need not be chosen a priori; we propose model selection tools to let the data select outcome clusters. We apply the proposed methods to psychometric data from adolescent children of HIV+ parents. Copyright © 2009 John Wiley & Sons, Ltd.

15.
High‐dimensional longitudinal data involving latent variables such as depression and anxiety that cannot be quantified directly are often encountered in biomedical and social sciences. Multiple responses are used to characterize these latent quantities, and repeated measures are collected to capture their trends over time. Furthermore, substantive research questions may concern issues such as interrelated trends among latent variables that can only be addressed by modeling them jointly. Although statistical analysis of univariate longitudinal data has been well developed, methods for modeling multivariate high‐dimensional longitudinal data are still under development. In this paper, we propose a latent factor linear mixed model (LFLMM) for analyzing this type of data. This model combines factor analysis and multivariate linear mixed models. Under this modeling framework, we reduce the high‐dimensional responses to low‐dimensional latent factors with the factor analysis model, and then use the multivariate linear mixed model to study the longitudinal trends of these latent factors. We develop an expectation–maximization algorithm to estimate the model. We use simulation studies to investigate the computational properties of the expectation–maximization algorithm and compare the LFLMM with other approaches for high‐dimensional longitudinal data analysis. We use a real data example to illustrate the practical usefulness of the model. Copyright © 2013 John Wiley & Sons, Ltd.
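The two-stage logic (factor-analytic dimension reduction, then a longitudinal model on the factors) can be imitated on simulated data. In this sketch the leading singular vector replaces a full factor-analysis fit and a pooled OLS slope replaces the multivariate linear mixed model; note that the recovered trend is on the arbitrary scale of the unnormalized factor, so only its sign and shape are meaningful.

```python
import numpy as np

rng = np.random.default_rng(4)
n, q, T = 100, 12, 4                   # subjects, observed items, visits
t = np.arange(T)

# One latent factor drives all items and trends upward over visits.
load = rng.uniform(0.7, 1.3, q)                    # factor loadings
eta = 0.5 * t + rng.normal(0, 1, (n, T))           # latent factor scores
Y = eta[:, :, None] * load + rng.normal(0, 0.5, (n, T, q))

# Step 1 (dimension reduction): estimate factor scores by projecting
# the centered responses onto the leading right singular vector.
Ymat = Y.reshape(n * T, q)
Ymat = Ymat - Ymat.mean(0)
_, _, vt = np.linalg.svd(Ymat, full_matrices=False)
v1 = vt[0] if vt[0].sum() > 0 else -vt[0]          # fix sign ambiguity
scores = (Ymat @ v1).reshape(n, T)

# Step 2: model the longitudinal trend of the factor (pooled OLS slope).
tc = t - t.mean()
trend = (scores * tc).sum() / (n * (tc ** 2).sum())
print(f"estimated latent trend per visit: {trend:.2f}")
```

Reducing 12 noisy items to one factor score per visit before modeling the trend is the same dimension-reduction step the LFLMM performs, except that the LFLMM estimates both stages jointly rather than sequentially.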

16.
We propose a new method of linkage analysis based on using the grade of membership scores resulting from fuzzy clustering procedures to define new dependent variables for the various Haseman-Elston approaches. For a single continuous trait with low heritability, the aim was to identify subgroups such that the grade of membership scores to these subgroups would provide more information for linkage than the original trait. For a multivariate trait, the goal was to provide a means of data reduction and data mining. Simulation studies using continuous traits with relatively low heritability (H=0.1, 0.2, and 0.3) showed that the new approach does not enhance power for a single trait. However, for a multivariate continuous trait (with three components), it is more powerful than the principal component method and more powerful than the joint linkage test proposed by Mangin et al. ([1998] Biometrics 54:88-99) when there is pleiotropy.

17.
Multilevel item response theory (MLIRT) models have been widely used to analyze multivariate longitudinal data of mixed types (e.g., categorical and continuous) in clinical studies. MLIRT models often make a unidimensionality assumption, that is, that the multiple outcomes are clinical manifestations of a single latent variable. However, the unidimensionality assumption may be unrealistic because some diseases may be heterogeneous and characterized by multiple impaired domains with variable clinical symptoms and disease progressions. We relax this assumption and propose a multidimensional latent trait linear mixed model (MLTLMM) to allow multiple latent variables and within‐item multidimensionality (one outcome can be a manifestation of more than one latent variable). We conduct extensive simulation studies to assess the unidimensional MLIRT model and the proposed MLTLMM model. The simulation studies suggest that the MLTLMM model outperforms the unidimensional model when the multivariate longitudinal outcomes are manifested by multiple latent variables. The proposed model is applied to two motivating studies of amyotrophic lateral sclerosis: a clinical trial of ceftriaxone and the Pooled Resource Open‐Access ALS Clinical Trials database. Copyright © 2017 John Wiley & Sons, Ltd.

18.
We propose novel estimation approaches for generalized varying coefficient models that are tailored for unsynchronized, irregular and infrequent longitudinal designs/data. Unsynchronized longitudinal data refer to time‐dependent response and covariate measurements for each individual taken at distinct time points. Data from the Comprehensive Dialysis Study motivate the proposed methods. We model the potential age‐varying association between infection‐related hospitalization status and the inflammatory marker, C‐reactive protein, within the first 2 years from initiation of dialysis. We cannot directly apply traditional longitudinal modeling to unsynchronized data, and to date no method exists to estimate time‐varying or age‐varying effects for generalized outcomes (e.g., binary or count data). In addition, through the analysis of the Comprehensive Dialysis Study data and simulation studies, we show that preprocessing steps, such as binning, needed to synchronize data to apply traditional modeling can lead to significant loss of information in this context. In contrast, the proposed approaches discard no observation; they exploit the fact that although there is little information in a single subject trajectory because of irregularity and infrequency, the moments of the underlying processes can be accurately and efficiently recovered by pooling information from all subjects using functional data analysis. We derive subject‐specific mean response trajectory predictions and study finite sample properties of the estimators. Copyright © 2013 John Wiley & Sons, Ltd.

19.
Many cohort studies and clinical trials are designed to compare rates of change over time in one or more disease markers in several groups. One major problem in such longitudinal studies is missing data due to patient drop-out. The bias and efficiency of six different methods to estimate rates of change in longitudinal studies with incomplete observations were compared: generalized estimating equation (GEE) estimates proposed by Liang and Zeger (1986); the unweighted average of ordinary least squares estimates (OLSE) of individual rates of change (UWLS); the weighted average of OLSE (WLS); conditional linear model estimates (CLE), a covariate-type estimator proposed by Wu and Bailey (1989); random effects (RE) estimates; and joint multivariate RE (JMRE) estimates. The latter method combines a linear RE model for the underlying pattern of the marker with a log-normal survival model for an informative drop-out process. The performance of these methods in the presence of data missing completely at random (MCAR), missing at random (MAR) and non-ignorably missing (NIM) was compared in simulation studies. Data for the disease marker were generated under the linear random effects model with parameter values derived from realistic examples in HIV infection. Rates of drop-out, assumed to increase over time, were allowed to be independent of marker values or to depend either only on previous marker values or on both previous and current marker values. Under MCAR all six methods yielded unbiased estimates of both group mean rates and between-group differences. However, the cross-sectional view of the data in the GEE method resulted in seriously biased estimates under MAR and NIM drop-out processes. The bias in the estimates ranged from 30 per cent to 50 per cent. The degree of bias in the GEE estimates increases with the severity of non-randomness and with the proportion of MAR data. Under MCAR and MAR all of the other five methods performed relatively well. RE and JMRE estimates were more efficient (that is, had smaller variance) than UWLS, WLS and CLE estimates. Under NIM, WLS and particularly RE estimates tended to underestimate the average rate of marker change (bias approximately 10 per cent). Under NIM, UWLS, CLE and JMRE performed better in terms of bias (3-5 per cent), with JMRE giving the most efficient estimates. Given that markers are key variables related to disease progression, missing marker data are likely to be at least MAR. Thus, the GEE method may not be appropriate for analysing such longitudinal marker data. The potential biases due to incomplete data require greater recognition in reports of longitudinal studies. Sensitivity analyses to assess the effect of drop-outs on inferences about the target parameters are important.
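The central qualitative finding (cross-sectional pooling is biased under marker-dependent drop-out, while averaging subject-specific slopes is much less so) is easy to reproduce in a toy simulation with invented parameter values. Pooled OLS stands in for GEE with an independence working correlation, and the average of per-subject OLS slopes stands in for UWLS.

```python
import numpy as np

rng = np.random.default_rng(5)
n, T = 2000, 6
t = np.arange(T)
b0 = rng.normal(10, 2, n)                  # subject intercepts
b1 = rng.normal(-1, 0.5, n)                # subject slopes (true mean: -1)
y = b0[:, None] + b1[:, None] * t + rng.normal(0, 0.5, (n, T))

# MAR drop-out: a subject leaves after the first visit at which the
# observed marker falls below 8, so sicker subjects are censored early.
below = y < 8
first = np.where(below.any(1), below.argmax(1), T - 1)
obs = t[None, :] <= first[:, None]         # observed-visit indicator

# "Cross-sectional" pooled OLS slope over all observed (t, y) pairs.
tt, yy = np.broadcast_to(t, (n, T))[obs], y[obs]
pooled = np.polyfit(tt, yy, 1)[0]

# UWLS: average of per-subject OLS slopes (subjects with >= 2 visits).
keep = obs.sum(1) >= 2
slopes = [np.polyfit(t[o], yi[o], 1)[0] for o, yi in zip(obs[keep], y[keep])]
uwls = np.mean(slopes)

print(f"pooled (GEE-like) slope: {pooled:.2f}, UWLS slope: {uwls:.2f}")
```

Because low-marker subjects leave the sample, the later visits are dominated by healthier subjects, which flattens the pooled slope toward zero; the subject-level slopes do not suffer from this composition shift.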

20.
Data augmentation has been commonly utilized to analyze correlated binary data using multivariate probit models in Bayesian analysis. However, the identification issue in multivariate probit models necessitates a rigorous Metropolis-Hastings algorithm for sampling a correlation matrix, which may cause slow convergence and inefficiency of Markov chains. It is well known that parameter-expanded data augmentation, by introducing a working/artificial parameter or parameter vector, makes an identifiable model non-identifiable and improves the mixing and convergence of data augmentation components. This motivates us to develop efficient parameter-expanded data augmentations for analyzing correlated binary data using multivariate probit models. We investigate both the identifiable and non-identifiable multivariate probit models and develop the corresponding parameter-expanded data augmentation algorithms. We point out that the approaches based on one non-identifiable model circumvent a Metropolis-Hastings algorithm for sampling a correlation matrix and improve the convergence and mixing of the correlation parameters; the identifiable model may produce estimated regression parameters with smaller standard errors than the non-identifiable model does. We illustrate our proposed approaches using simulation studies and through the application to a longitudinal dataset from the Six Cities study.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (京ICP备09084417号)