Similar Documents
20 similar documents found (search time: 171 ms)
1.
In order to investigate currently used model fitting strategies for twin data, analysis of variance (ANOVA) and path-maximum-likelihood (PATH-ML) methods of analyzing twin data were compared using simulation studies of 50 monozygotic (MZ) and 50 dizygotic (DZ) twin pairs. Phenotypic covariance was partitioned into additive genetic effects (A), environmental effects common to cotwins (C), and environmental variance unique to individuals (E). ANOVA and PATH-ML had identical power to detect total covariance. The PATH-ML AE model was much more powerful than ANOVA comparisons of r_MZ and r_DZ to detect A. However, to be unbiased, the AE model requires the assumption that C = 0.0. To allow use of the AE model to estimate A, the null hypothesis C = 0.0 is tested by comparing the goodness of fit of the ACE and AE models. Simulation of 50 MZ and 50 DZ pairs revealed that C must be greater than 55% of total variance before the null hypothesis would be rejected (P < 0.05) 80% of the time. Several recent publications were reviewed in which the null hypothesis C = 0.0 was accepted and apparently upwardly biased estimates of A, containing C, were presented with unrealistic P values. It was concluded that use of the AE model to estimate A gives an inflated view of the power of relatively small twin studies. It was recommended that ANOVA or comparison of the ACE and CE PATH-ML models be used to estimate and test the significance of A, as neither requires that C = 0.0. © Wiley-Liss, Inc.
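A hedged sketch of the quantities this abstract turns on: simulate 50 MZ and 50 DZ pairs under an ACE model and recover A and C from the twin correlations with Falconer's formulas (A = 2(r_MZ − r_DZ), C = 2r_DZ − r_MZ). This is not the PATH-ML software compared in the study, and the variance fractions below are invented for illustration.

```python
# Minimal ACE twin simulation with Falconer's correlation-based estimates.
import numpy as np

rng = np.random.default_rng(0)
a2, c2, e2 = 0.4, 0.2, 0.4     # additive genetic, common env., unique env. fractions
n_pairs = 50                   # 50 MZ and 50 DZ pairs, as in the simulations

def simulate_pairs(n, r_additive):
    # Cotwins share all (MZ) or half (DZ) of the additive genetic variance.
    shared_a = rng.normal(size=n) * np.sqrt(a2 * r_additive)
    own_a = rng.normal(size=(n, 2)) * np.sqrt(a2 * (1 - r_additive))
    c = rng.normal(size=n) * np.sqrt(c2)         # common environment
    e = rng.normal(size=(n, 2)) * np.sqrt(e2)    # unique environment
    return shared_a[:, None] + own_a + c[:, None] + e

mz = simulate_pairs(n_pairs, 1.0)
dz = simulate_pairs(n_pairs, 0.5)
r_mz = np.corrcoef(mz[:, 0], mz[:, 1])[0, 1]
r_dz = np.corrcoef(dz[:, 0], dz[:, 1])[0, 1]
print(f"A = {2 * (r_mz - r_dz):.2f}, C = {2 * r_dz - r_mz:.2f}")  # Falconer
```

Running this repeatedly makes the abstract's point concrete: with only 50 + 50 pairs, the sampling noise in r_MZ and r_DZ is large relative to realistic values of C.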

2.
Familial aggregation and the role of genetic and environmental factors can be investigated through family studies analysed using the liability‐threshold model. The liability‐threshold model ignores the timing of events including the age of disease onset and right censoring, which can lead to estimates that are difficult to interpret and are potentially biased. We incorporate the time aspect into the liability‐threshold model for case‐control‐family data following the same approach that has been applied in the twin setting. Thus, the data are considered as arising from a competing risks setting and inverse probability of censoring weights are used to adjust for right censoring. In the case‐control‐family setting, recognising the existence of competing events is highly relevant to the sampling of control probands. Because of the presence of multiple family members who may be censored at different ages, the estimation of inverse probability of censoring weights is not as straightforward as in the twin setting but requires consideration. We propose to employ a composite likelihood conditioning on proband status that markedly simplifies adjustment for right censoring. We assess the proposed approach using simulation studies and apply it in the analysis of two Danish register‐based case‐control‐family studies: one on cancer diagnosed in childhood and adolescence, and one on early‐onset breast cancer. Copyright © 2017 John Wiley & Sons, Ltd.
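The core device here, inverse probability of censoring weighting (IPCW), can be sketched in a few lines: estimate the censoring distribution by Kaplan–Meier treating censorings as the events, then weight each observed event by the inverse of its censoring-survival probability. The sketch below uses synthetic data and omits the competing-risks bookkeeping and the composite likelihood conditioning on proband status that the paper develops.

```python
# Minimal IPCW sketch: Kaplan-Meier for the censoring distribution, then weights.
import numpy as np

rng = np.random.default_rng(1)
n = 500
event_time = rng.exponential(10, n)        # time to disease (or competing event)
cens_time = rng.uniform(0, 15, n)          # right-censoring time
time = np.minimum(event_time, cens_time)
event = (event_time <= cens_time).astype(int)   # 1 = event observed, 0 = censored

def censoring_survival(t):
    # Kaplan-Meier estimate of S_C(t); "events" for this curve are the censorings.
    order = np.argsort(time)
    t_s, c_s = time[order], 1 - event[order]
    at_risk = n - np.arange(n)             # risk-set size at each ordered time
    steps = np.where((c_s == 1) & (t_s <= t), 1 - 1 / at_risk, 1.0)
    return steps.prod()

# Observed events are up-weighted by 1/S_C(T); censored subjects get weight 0.
weights = np.array([event[i] / censoring_survival(time[i]) for i in range(n)])
print(f"mean weight among observed events: {weights[event == 1].mean():.2f}")
```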

3.
Regression diagnostic methods are developed and investigated under the Class A regressive model proposed by Bonney [(1984) Am J Med Genet 18:731–749]. We call a family whose phenotypic distribution does not conform to the same genetic model as the majority of the families an etiotic family. The exact case‐deletion approach for identifying etiotic families, based on examining the changes in each model parameter estimate by excluding one family at a time, is very time‐consuming. We propose three alternative diagnostic methods: the empirical influence function (EIF), the one‐step approximation, and the approximated one‐step approach. These methods can be computed efficiently and were incorporated into the existing software package S.A.G.E. A thorough Monte‐Carlo investigation of the performance of the diagnostic methods was conducted and generally supports the EIF approach as the recommended alternative. The phenotypic variance is the parameter whose associated regression diagnostic most frequently and correctly identified etiotic families in the models that were examined. An analysis of body mass index data from 402 individuals in 122 Muscatine, Iowa families is used to illustrate the methods. A Class A regressive model with a recessive major locus and equal mother‐offspring and father‐offspring correlations provided the best‐fitting model. The proposed regression diagnostics identified up to 7.4% of the 122 families as etiotic. As a result of this investigation, case‐deletion diagnostic assessment is now a practical component in the analysis of quantitative family data. Genet. Epidemiol. 17:174–187, 1999. © 1999 Wiley‐Liss, Inc.
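The exact case-deletion benchmark that the EIF approximates is easy to sketch: refit the model leaving out one family at a time and record the shift in each parameter estimate. The sketch below uses a plain linear model and synthetic data rather than Bonney's Class A regressive model, and the flagging threshold is ad hoc.

```python
# Exact case-deletion diagnostics on a toy family data set (not S.A.G.E.).
import numpy as np

rng = np.random.default_rng(2)
n_fam, fam_size = 40, 3
fam_id = np.repeat(np.arange(n_fam), fam_size)
x = rng.normal(size=n_fam * fam_size)
y = 1.0 + 0.5 * x + rng.normal(size=x.size)
y[fam_id == 7] += 6.0                      # one "etiotic" family with shifted mean

X = np.column_stack([np.ones_like(x), x])
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]

for f in range(n_fam):                     # leave one family out at a time
    keep = fam_id != f
    beta_f = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    delta = np.abs(beta_f - beta_full).max()
    if delta > 0.1:                        # ad hoc flagging threshold
        print(f"family {f}: max parameter change {delta:.2f}")
```

This loop is exactly the expensive computation the EIF and one-step approximations are designed to avoid when the model is a full regressive likelihood rather than least squares.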

4.
We wish to study the effects of genetic and environmental factors on disease risk, using data from families ascertained because they contain multiple cases of the disease. To do so, we must account for the way participants were ascertained, and for within-family correlations in both disease occurrences and covariates. We model the joint probability distribution of the covariates of ascertained family members, given family disease occurrence and pedigree structure. We describe two such covariate models: the random effects model and the marginal model. Both models assume a logistic form for the distribution of one person's covariates that involves a vector beta of regression parameters. The components of beta in the two models have different interpretations, and they differ in magnitude when the covariates are correlated within families. We describe ascertainment assumptions needed to consistently estimate the parameters beta(RE) of the random effects model and the parameters beta(M) of the marginal model. Under the ascertainment assumptions for the random effects model, we show that conditional logistic regression (CLR) of matched family data gives a consistent estimate beta-hat(RE) for beta(RE) and a consistent estimate for its covariance matrix. Under the ascertainment assumptions for the marginal model, we show that unconditional logistic regression (ULR) gives a consistent estimate beta-hat(M) for beta(M), and we give a consistent estimator for its covariance matrix. The random effects/CLR approach is simple to use and to interpret, but it can use data only from families containing both affected and unaffected members. The marginal/ULR approach uses data from all individuals, but its variance estimates require special computations. A C program to compute these variance estimates is available at http://www.stanford.edu/dept/HRP/epidemiology. We illustrate these pros and cons by application to data on the effects of parity on ovarian cancer risk in mother/daughter pairs, and use simulations to study the performance of the estimates.
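For the simplest case of the random effects/CLR route — families contributing one affected and one unaffected member, as in the mother/daughter application — the conditional likelihood depends only on the within-family covariate difference, so beta(RE) can be estimated in a few lines. This is a sketch of the CLR idea on synthetic data, not the authors' software or ascertainment machinery.

```python
# CLR for 1:1 matched family data via the conditional pair likelihood.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n_fam, beta_true = 200, 0.7
x = rng.normal(size=(n_fam, 2))                 # one covariate per sib, two sibs
u = rng.normal(size=n_fam)                      # shared family random effect
p = 1 / (1 + np.exp(-(-1.0 + beta_true * x + u[:, None])))
y = rng.binomial(1, p)
discordant = y.sum(axis=1) == 1                 # CLR uses discordant families only
d = np.where(y[discordant, 0] == 1,
             x[discordant, 0] - x[discordant, 1],
             x[discordant, 1] - x[discordant, 0])   # case-minus-control covariate

def neg_cloglik(beta):
    # conditional log-likelihood for 1:1 matched sets: sum of log sigmoid(beta*d)
    return np.sum(np.log1p(np.exp(-beta[0] * d)))

fit = minimize(neg_cloglik, x0=np.zeros(1), method="BFGS")
print(f"CLR estimate: {fit.x[0]:.2f} (subject-specific truth {beta_true})")
```

The family random effect u is conditioned out rather than estimated, which is why CLR recovers the subject-specific beta(RE) but discards concordant families.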

5.
Studies to detect genetic association with disease can be family-based, often using families with multiple affected members, or population-based, as in population-based case-control studies. If data on both study types are available from the same population, it is useful to combine them to improve power to detect genetic associations. Two aspects of the data need to be accommodated: the sampling scheme and potential residual correlations among family members. We propose two approaches for combining data from a case-control study and a family study that collected families with multiple cases. In the first approach, we view a family as the sampling unit and specify the joint likelihood for the family members using a two-level mixed effects model to account for random familial effects and for residual genetic correlations among family members. The ascertainment of the families is accommodated by conditioning on the ascertainment event. The individuals in the case-control study are treated as families of size one, and their unconditional likelihood is combined with the conditional likelihood for the families. This approach yields subject-specific maximum likelihood estimates of covariate effects. In the second approach, we view an individual as the sampling unit. The sampling scheme is accommodated using two-phase sampling techniques, marginal covariate effects are estimated, and correlations among family members are accounted for in the variance calculations. The models are compared in simulations. Data from a case-control and a family study from north-eastern Italy on melanoma and a low-risk melanoma-susceptibility gene, MC1R, are used to illustrate the approaches.

6.
The potential for bias due to misclassification error in regression analysis is well understood by statisticians and epidemiologists. Assuming little or no available data for estimating misclassification probabilities, investigators sometimes seek to gauge the sensitivity of an estimated effect to variations in the assumed values of those probabilities. We present an intuitive and flexible approach to such a sensitivity analysis, assuming an underlying logistic regression model. For outcome misclassification, we argue that a likelihood‐based analysis is the cleanest and the most preferable approach. In the case of covariate misclassification, we combine observed data on the outcome, error‐prone binary covariate of interest, and other covariates measured without error, together with investigator‐supplied values for sensitivity and specificity parameters, to produce corresponding positive and negative predictive values. These values serve as estimated weights to be used in fitting the model of interest to an appropriately defined expanded data set using standard statistical software. Jackknifing provides a convenient tool for incorporating uncertainty in the estimated weights into valid standard errors to accompany log odds ratio estimates obtained from the sensitivity analysis. Examples illustrate the flexibility of this unified strategy, and simulations suggest that it performs well relative to a maximum likelihood approach carried out via numerical optimization. Copyright © 2010 John Wiley & Sons, Ltd.
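A minimal sketch of the expanded-data recipe described above, under the simplifying assumption that the predictive values depend only on the error-prone covariate W (the paper also conditions on the outcome and the error-free covariates, and jackknifes the standard errors). The sensitivity and specificity values are investigator-supplied assumptions, and statsmodels' freq_weights is used for the weighted fit.

```python
# Expanded-data sensitivity analysis for a misclassified binary covariate.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n, se, sp = 2000, 0.85, 0.90                       # assumed sensitivity/specificity
x_true = rng.binomial(1, 0.3, n)                   # unobserved true covariate
w = np.where(x_true == 1, rng.binomial(1, se, n), rng.binomial(1, 1 - sp, n))
y = rng.binomial(1, 1 / (1 + np.exp(-(-1.0 + 1.2 * x_true))))

# Back out P(X=1) and the predictive values from P(W=1) = se*pi + (1-sp)*(1-pi).
pi = (w.mean() - (1 - sp)) / (se - (1 - sp))
ppv = se * pi / (se * pi + (1 - sp) * (1 - pi))
npv = sp * (1 - pi) / (sp * (1 - pi) + (1 - se) * pi)

# Expanded data: every record appears once with X=1 and once with X=0.
x_exp = np.concatenate([np.ones(n), np.zeros(n)])
y_exp = np.concatenate([y, y])
wts = np.concatenate([np.where(w == 1, ppv, 1 - npv),
                      np.where(w == 1, 1 - ppv, npv)])
X = sm.add_constant(x_exp)
fit = sm.GLM(y_exp, X, family=sm.families.Binomial(), freq_weights=wts).fit()
print(fit.params)   # SEs here ignore weight estimation; the paper jackknifes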

7.
Maximum likelihood methods are used to incorporate partially observed covariate values in fitting logistic regression models. We extend these methods to data collected through complex surveys using the pseudo-likelihood approach. One can obtain parameter estimates of the logistic regression model using standard statistical software and their standard errors by Taylor series expansion or the jackknife method. We apply the approach to data from a two-phase survey screening for dementia in a community sample of African Americans aged 65 and older living in Indianapolis. The binary response variable is dementia and the covariate with missing values is a daily functioning score collected from interviews with a relative of the study subject. © 1997 John Wiley & Sons, Ltd.

8.
Lynch Syndrome (LS) families harbor mutated mismatch repair genes, which predispose them to specific types of cancer. Because individuals within LS families can experience multiple cancers over their lifetime, we developed a progressive three‐state model to estimate the disease risk from a healthy state (state 0) to a first cancer (state 1) and then to a second cancer (state 2). Ascertainment correction of the likelihood was made to adjust for complex sampling designs, with carrier probabilities for family members with missing genotype information estimated using their family's observed genotype and phenotype information in a one‐step expectation–maximization algorithm. A sandwich variance estimator was employed to overcome possible model misspecification. The main objective of this paper is to estimate the disease risk (penetrance) for age at a second cancer after someone has experienced a first cancer that is also associated with a mutated gene. Simulation study results indicate that our approach generally provides unbiased risk estimates and low root mean squared errors across different family study designs, proportions of missing genotypes, and risk heterogeneities. An application to 12 large LS families from Newfoundland demonstrates that the risk for a second cancer was substantial and that the age at a first colorectal cancer significantly impacted the age at any subsequent LS cancer. This study provides new insights for developing more effective management of mutation carriers in LS families by providing more accurate multiple cancer risk estimates. Copyright © 2013 John Wiley & Sons, Ltd.

9.
This research is motivated by studying the progression of age‐related macular degeneration, where both a covariate and the response variable are subject to censoring. We develop a general framework to handle regression with a censored covariate, where the response can be of different types and the censoring can be random or subject to (constant) detection limits. Multiple imputation is a popular technique to handle missing data that requires compatibility between the imputation model and the substantive model to obtain valid estimates. With a censored covariate, we propose a novel multiple imputation‐based approach, namely, the semiparametric two‐step importance sampling imputation (STISI) method, to impute the censored covariate. Specifically, STISI imputes the missing covariate from a semiparametric accelerated failure time model conditional on fully observed covariates (Step 1), with the acceptance probability derived from the substantive model (Step 2). The two‐step procedure automatically ensures compatibility and takes full advantage of the relaxed semiparametric assumption in the imputation. Extensive simulations demonstrate that the STISI method yields valid estimates in all scenarios and outperforms some existing methods that are commonly used in practice. We apply STISI to data from the Age‐related Eye Disease Study to investigate the association between the progression time of the less severe eye and that of the more severe eye. We also illustrate the method by analyzing urine arsenic data for patients from the National Health and Nutrition Examination Survey (2003‐2004), where the response is binary and one covariate is subject to a detection limit.
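Not the STISI algorithm itself, but a simplified stand-in showing the multiple-imputation mechanics for a covariate left-censored at a detection limit: impute from a (crudely estimated) normal covariate model truncated at the limit, fit the substantive model per imputation, and pool with Rubin's rules. STISI instead draws from a semiparametric AFT model with an importance-sampling acceptance step to guarantee compatibility with the substantive model.

```python
# Multiple imputation for a detection-limit-censored covariate (simplified).
import numpy as np
from scipy import stats
import statsmodels.api as sm

rng = np.random.default_rng(5)
n, limit, n_imp = 800, -0.5, 20
x = rng.normal(size=n)                               # true covariate
y = rng.binomial(1, 1 / (1 + np.exp(-0.8 * x)))      # binary response
below = x < limit                                    # values below detection limit
x_obs = np.where(below, np.nan, x)

# Crude covariate model from the observed values (ignores truncation bias).
mu, sigma = np.nanmean(x_obs), np.nanstd(x_obs)
betas, variances = [], []
for _ in range(n_imp):
    x_imp = x_obs.copy()
    b = (limit - mu) / sigma                         # standardized upper bound
    x_imp[below] = stats.truncnorm.rvs(-np.inf, b, loc=mu, scale=sigma,
                                       size=below.sum(), random_state=rng)
    fit = sm.Logit(y, sm.add_constant(x_imp)).fit(disp=0)
    betas.append(fit.params[1]); variances.append(fit.bse[1] ** 2)

qbar = np.mean(betas)
total_var = np.mean(variances) + (1 + 1 / n_imp) * np.var(betas, ddof=1)  # Rubin
print(f"pooled beta: {qbar:.2f} (SE {np.sqrt(total_var):.2f})")
```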

10.
The ‘heritability’ of a phenotype measures the proportion of trait variance due to genetic factors in a population. In the past 50 years, studies with monozygotic and dizygotic twins have estimated heritability for 17,804 traits [1]; thus twin studies are popular for estimating heritability. Researchers are often interested in estimating heritability for non-normally distributed outcomes such as binary, count, skewed or heavy-tailed continuous traits. In these settings, the traditional normal ACE model (NACE) and Falconer's method can produce poor coverage of the true heritability. Therefore, we propose a robust generalized estimating equations (GEE2) framework for estimating the heritability of non-normally distributed outcomes. The traditional NACE and Falconer's method are derived within this unified GEE2 framework, which additionally provides robust standard errors. Although the traditional Falconer's method cannot adjust for covariates, the corresponding ‘GEE2-Falconer’ can incorporate mean- and variance-level covariate effects (e.g., letting heritability vary by sex or age). Given a non-normally distributed outcome, the GEE2 models are shown to attain better coverage of the true heritability compared to traditional methods. Finally, a scenario is demonstrated where NACE produces biased estimates of heritability while Falconer remains unbiased. Therefore, we recommend GEE2-Falconer for estimating the heritability of non-normally distributed outcomes in twin studies.

11.
Complex genetic traits are inherently heterogeneous, i.e., they may be caused by different genes, or non-genetic factors, in different individuals. So, for mapping genes responsible for these diseases using linkage analysis, heterogeneity must be accounted for in the model. Heterogeneity across different families can be modeled using a mixture distribution by letting each family have its own heterogeneity parameter denoting the probability that its disease-causing gene is linked to the marker map under consideration. A substantial gain in power is expected if covariates that can discriminate between the families of linked and unlinked types are incorporated in this modeling framework. To this end, we propose a hierarchical Bayesian model, in which the families are grouped according to various (categorized) levels of covariate(s). The heterogeneity parameters of families within each group are assigned a common prior, whose parameters are further assigned hyper-priors. The hyper-parameters are obtained by utilizing the empirical Bayes estimates. We also address related issues such as evaluating whether the covariate(s) under consideration are informative and grouping of families. We compare the proposed approach with one that does not utilize covariates and show that our approach leads to considerable gains in power to detect linkage and in precision of interval estimates through various simulation scenarios. An application to the asthma datasets of Genetic Analysis Workshop 12 also illustrates this gain in a real data analysis. Additionally, we compare the performances of microsatellite markers and single nucleotide polymorphisms for our approach and find that the latter clearly outperforms the former.

12.
For genome‐wide association studies with family‐based designs, we propose a Bayesian approach. We show that standard transmission disequilibrium test and family‐based association test statistics can naturally be implemented in a Bayesian framework, allowing flexible specification of the likelihood and prior odds. We construct a Bayes factor conditional on the offspring phenotype and parental genotype data and then use the data we conditioned on to inform the prior odds for each marker. In the construction of the prior odds, the evidence for association for each single marker is obtained at the population‐level by estimating its genetic effect size by fitting the conditional mean model. Since such genetic effect size estimates are statistically independent of the effect size estimation within the families, the actual data set can inform the construction of the prior odds without any statistical penalty. In contrast to Bayesian approaches that have recently been proposed for genome‐wide association studies, our approach does not require assumptions about the genetic effect size; this makes the proposed method entirely data‐driven. The power of the approach was assessed through simulation. We then applied the approach to a genome‐wide association scan to search for associations between single nucleotide polymorphisms and body mass index in the Childhood Asthma Management Program data. Genet. Epidemiol. 34:569–574, 2010. © 2010 Wiley‐Liss, Inc.

13.
In this paper we examine alternative measurement models for fitting data from health surveys. We show why a testlet‐based latent trait model that includes covariate information, embedded within a fully Bayesian framework, can allow multiple simultaneous inferences and aid interpretation. We illustrate our approach with a survey of breast cancer survivors that reveals how the attitudes of those patients change after diagnosis toward a focus on appreciating the here‐and‐now, and away from consideration of longer‐term goals. Using the covariate information, we also show the extent to which individual‐level variables such as race, age and Tamoxifen treatment are related to a patient's change in attitude. The major contribution of this research is to demonstrate the use of a hierarchical Bayesian IRT model with covariates in this application area; hence a novel case study, and one that is certainly closely aligned with but distinct from the educational testing applications that have made IRT the dominant test scoring model. Copyright © 2010 John Wiley & Sons, Ltd.

14.
Genetic association analyses of family-based studies with ordered categorical phenotypes are often conducted using methods either for quantitative or for binary traits, which can lead to suboptimal analyses. Here we present an alternative likelihood-based method of analysis for single nucleotide polymorphism (SNP) genotypes and ordered categorical phenotypes in nuclear families of any size. Our approach, which extends our previous work for binary phenotypes, permits straightforward inclusion of covariate, gene-gene and gene-covariate interaction terms in the likelihood, incorporates a simple model for ascertainment and allows for family-specific effects in the hypothesis test. Additionally, our method produces interpretable parameter estimates and valid confidence intervals. We assess the proposed method using simulated data, and apply it to a polymorphism in the C-reactive protein (CRP) gene typed in families collected to investigate human systemic lupus erythematosus. By including sex interactions in the analysis, we show that the polymorphism is associated with anti-nuclear autoantibody (ANA) production in females, while there appears to be no effect in males.
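The basic ingredient here — an ordered-logit (proportional odds) model linking SNP genotype to an ordered categorical phenotype, with a sex interaction — can be sketched with statsmodels' OrderedModel (available in recent versions). The sketch ignores the family structure, ascertainment model, and family-specific effects that the authors' likelihood handles, and all data are synthetic.

```python
# Ordered-logit sketch of a SNP-by-sex association with an ordinal phenotype.
import numpy as np
from statsmodels.miscmodels.ordinal_model import OrderedModel  # statsmodels >= 0.13

rng = np.random.default_rng(6)
n = 1000
snp = rng.binomial(2, 0.3, n).astype(float)    # additive genotype coding 0/1/2
sex = rng.binomial(1, 0.5, n).astype(float)    # 1 = female
latent = 0.5 * snp * sex + rng.logistic(size=n)   # genotype effect in females only
y = np.digitize(latent, [-0.5, 1.5])           # three ordered phenotype categories

X = np.column_stack([snp, sex, snp * sex])     # no intercept: thresholds absorb it
fit = OrderedModel(y, X, distr="logit").fit(method="bfgs", disp=False)
print(fit.params)                              # the snp*sex term carries the female effect
```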

15.
A novel semiparametric regression model is developed for evaluating the covariate‐specific accuracy of a continuous medical test or biomarker. Ideally, studies designed to estimate or compare medical test accuracy will use a separate, flawless gold‐standard procedure to determine the true disease status of sampled individuals. We treat this as a special case of the more complicated and increasingly common scenario in which disease status is unknown because a gold‐standard procedure does not exist or is too costly or invasive for widespread use. To compensate for missing data on disease status, covariate information is used to discriminate between diseased and healthy units. We thus model the probability of disease as a function of ‘disease covariates’. In addition, we model test/biomarker outcome data to depend on ‘test covariates’, which provides researchers the opportunity to quantify the impact of covariates on the accuracy of a medical test. We further model the distributions of test outcomes using flexible semiparametric classes. An important new theoretical result demonstrating model identifiability under mild conditions is presented. The modeling framework can be used to obtain inferences about covariate‐specific test accuracy and the probability of disease based on subject‐specific disease and test covariate information. The value of the model is illustrated using multiple simulation studies and data on the age‐adjusted ability of soluble epidermal growth factor receptor – a ubiquitous serum protein – to serve as a biomarker of lung cancer in men. SAS code for fitting the model is provided. Copyright © 2015 John Wiley & Sons, Ltd.

16.
Analyses were performed on lipid data from the NHLBI Veteran Twin Study. The analyses focused on longitudinal multivariate models, describing how the genetic effects on lipids vary over time. Our pedigree-based model selection approach allows simultaneous estimation of both covariance structure parameters and regression parameters. The analyses reveal strong correlations between additive genetic effects over time, implying that genetic effects on lipids are somewhat constant throughout the life span represented within this sample. Both univariate preliminary analyses and robust fitting applied to the longitudinal models indicate that several assumptions underlying the twin analyses are violated. Although variance component and correlation parameter estimates are not much changed by robust fitting analyses, questions remain about the behavior of parameter estimates in multivariate genetic models under departures from model assumptions. © 1993 Wiley-Liss, Inc.

17.
Joint effects of genetic and environmental factors have been increasingly recognized in the development of many complex human diseases. Despite the popularity of case‐control and case‐only designs, longitudinal cohort studies that can capture time‐varying outcome and exposure information have long been recommended for gene–environment (G × E) interactions. To date, literature on sampling designs for longitudinal studies of G × E interaction is quite limited. We therefore consider designs that can prioritize a subsample of the existing cohort for retrospective genotyping on the basis of currently available outcome, exposure, and covariate data. In this work, we propose stratified sampling based on summaries of individual exposures and outcome trajectories and develop a full conditional likelihood approach for estimation that adjusts for the biased sample. We compare the performance of our proposed design and analysis with combinations of different sampling designs and estimation approaches via simulation. We observe that the full conditional likelihood provides improved estimates for the G × E interaction and joint exposure effects over uncorrected complete‐case analysis, and the exposure enriched outcome trajectory dependent design outperforms other designs in terms of estimation efficiency and power for detection of the G × E interaction. We also illustrate our design and analysis using data from the Normative Aging Study, an ongoing longitudinal cohort study initiated by the Veterans Administration in 1963. Copyright © 2017 John Wiley & Sons, Ltd.

18.
Although gene‐environment (G × E) interactions play an important role in many biological systems, detecting these interactions within genome‐wide data can be challenging due to the loss in statistical power incurred by multiple hypothesis correction. To address the challenge of poor power and the limitations of existing multistage methods, we recently developed a screening‐testing approach for G × E interaction detection that combines elastic net penalized regression with joint estimation to support a single omnibus test for the presence of G × E interactions. In our original work on this technique, however, we did not assess type I error control or power and evaluated the method using just a single, small bladder cancer data set. In this paper, we extend the original method in two important directions and provide a more rigorous performance evaluation. First, we introduce a hierarchical false discovery rate approach to formally assess the significance of individual G × E interactions. Second, to support the analysis of truly genome‐wide data sets, we incorporate a score statistic‐based prescreening step to reduce the number of single nucleotide polymorphisms prior to fitting the first stage penalized regression model. To assess the statistical properties of our method, we compare the type I error rate and statistical power of our approach with competing techniques using both simple simulation designs as well as designs based on real disease architectures. Finally, we demonstrate the ability of our approach to identify biologically plausible SNP‐education interactions relative to Alzheimer's disease status using genome‐wide association study data from the Alzheimer's Disease Neuroimaging Initiative (ADNI).
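A rough sketch of the two-stage shape described above: a marginal score-type statistic prescreens SNPs, then an elastic net is fit over the retained main effects and G × E products; the surviving interactions would feed the omnibus/FDR testing step, which is omitted here. The screening cutoff and penalty settings are illustrative, not the paper's tuned values.

```python
# Screening-testing sketch: marginal prescreen, then elastic net over G x E terms.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
n, p = 500, 200
G = rng.binomial(2, 0.3, size=(n, p)).astype(float)   # SNP genotype matrix
E = rng.normal(size=n)                                # environmental exposure
# One causal SNP with both a main effect and a G x E interaction:
logit = -0.5 + 0.5 * G[:, 3] + 0.4 * G[:, 3] * E
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Stage 1: prescreen -- rank SNPs by a simple marginal association statistic.
t = np.abs(G[y == 1].mean(axis=0) - G[y == 0].mean(axis=0)) / (G.std(axis=0) + 1e-9)
keep = np.argsort(t)[-20:]                            # retain the top 20 SNPs

# Stage 2: elastic net over retained main effects, E, and G x E products.
X = np.column_stack([G[:, keep], E[:, None], G[:, keep] * E[:, None]])
enet = LogisticRegression(penalty="elasticnet", solver="saga",
                          l1_ratio=0.5, C=0.5, max_iter=5000).fit(X, y)
gxe = enet.coef_[0][len(keep) + 1:]                   # interaction coefficient block
print("SNPs with nonzero G x E coefficients:", keep[np.abs(gxe) > 1e-6])
```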

19.
This paper is concerned with evaluating whether an interaction between two sets of risk factors for a binary trait is removable and, when it is removable, fitting a parsimonious additive model using a suitable link function to estimate the disease odds (on the natural logarithm scale). Statisticians define the term ‘interaction’ as a departure from additivity in a linear model on a specific scale on which the data are measured. Certain interactions may be eliminated via a transformation of the outcome such that the relationship between the risk factors and the outcome is additive on the transformed scale. Such interactions are known as removable interactions. We develop a novel test statistic for detecting the presence of a removable interaction in case–control studies. We consider the Guerrero and Johnson family of transformations and show that this family constitutes an appropriate link function for fitting an additive model when an interaction is removable. We use simulation studies to examine the type I error and power of the proposed test and to show that, when an interaction is removable, an additive model based on the Guerrero and Johnson link function leads to more precise estimates of the disease odds parameters and a better fit. We illustrate the proposed test and use of the transformation by using case–control data from three published studies. Finally, we indicate how one can check that, after transformation, no further interaction is significant. Copyright © 2012 John Wiley & Sons, Ltd.
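A toy numeric illustration of a removable interaction, assuming invented risks that are multiplicative in two binary factors: the interaction contrast is nonzero on the additive scale but vanishes after a log transformation, which is the phenomenon the proposed test and the Guerrero–Johnson link family formalize.

```python
# Removable interaction: multiplicative risks, compared on two scales.
import numpy as np

risk = np.array([[0.02, 0.06],     # rows: factor A absent/present
                 [0.04, 0.12]])    # cols: factor B absent/present (0.12 = 0.02*2*3)
additive = risk[1, 1] - risk[1, 0] - risk[0, 1] + risk[0, 0]
log_scale = (np.log(risk[1, 1]) - np.log(risk[1, 0])
             - np.log(risk[0, 1]) + np.log(risk[0, 0]))
print(f"interaction contrast: additive {additive:.3f}, log scale {log_scale:.3f}")
# additive contrast is 0.040; on the log scale it is exactly 0
```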

20.
It is well known that measurement error in the covariates of regression models generally causes bias in parameter estimates. Correction for such biases requires information concerning the measurement error, which is often in the form of internal validation or replication data. Regression calibration (RC) is a popular approach to correct for covariate measurement error, which involves predicting the true covariate using error‐prone measurements. Likelihood methods have previously been proposed as an alternative approach to estimate the parameters in models affected by measurement error, but have been relatively infrequently employed in medical statistics and epidemiology, partly because of computational complexity and concerns regarding robustness to distributional assumptions. We show how a standard random‐intercepts model can be used to obtain maximum likelihood (ML) estimates when the outcome model is linear or logistic regression under certain normality assumptions, when internal error‐prone replicate measurements are available. Through simulations we show that for linear regression, ML gives more efficient estimates than RC, although the gain is typically small. Furthermore, we show that RC and ML estimates remain consistent even when the normality assumptions are violated. For logistic regression, our implementation of ML is consistent if the true covariate is conditionally normal given the outcome, in contrast to RC. In simulations, this ML estimator showed less bias in situations where RC gives non‐negligible biases. Our proposal makes the ML approach to dealing with covariate measurement error more accessible to researchers, which we hope will improve its viability as a useful alternative to methods such as RC. Copyright © 2009 John Wiley & Sons, Ltd.
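A minimal regression-calibration sketch with internal replicate measurements, the comparator method in this abstract: the true covariate is predicted by shrinking each subject's replicate mean toward the overall mean by the estimated reliability, and the outcome model is then fit on that prediction. Data and variances are synthetic; the random-intercepts ML alternative the authors propose is not shown.

```python
# Regression calibration for a linear outcome model with k replicates per subject.
import numpy as np

rng = np.random.default_rng(8)
n, k = 1000, 2                                   # k error-prone replicates each
x = rng.normal(size=n)                           # true covariate (unobserved)
w = x[:, None] + rng.normal(scale=0.8, size=(n, k))   # replicate measurements
y = 2.0 + 1.5 * x + rng.normal(size=n)           # linear outcome model

wbar = w.mean(axis=1)
sigma_u2 = np.mean(w.var(axis=1, ddof=1))        # within-person error variance
sigma_x2 = wbar.var(ddof=1) - sigma_u2 / k       # between-person (true) variance
lam = sigma_x2 / (sigma_x2 + sigma_u2 / k)       # reliability of the replicate mean
x_hat = wbar.mean() + lam * (wbar - wbar.mean()) # RC prediction of the true X

X = np.column_stack([np.ones(n), x_hat])
rc = np.linalg.lstsq(X, y, rcond=None)[0]
naive = np.linalg.lstsq(np.column_stack([np.ones(n), wbar]), y, rcond=None)[0]
print(f"naive slope {naive[1]:.2f} vs RC-corrected slope {rc[1]:.2f} (truth 1.5)")
```

The naive fit on the replicate mean is attenuated by roughly the reliability factor lam; dividing it out is exactly what the calibrated predictor accomplishes.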
