Similar articles
20 similar articles found.
1.
Most existing association tests for genome‐wide association studies (GWASs) fail to account for genetic heterogeneity. Zhou and Pan proposed a binomial‐mixture‐model‐based association test to account for possible genetic heterogeneity in case‐control studies. The idea is elegant; however, the proposed test requires an expectation‐maximization (EM)‐type iterative algorithm to identify the penalised maximum likelihood estimates and a permutation method to assess p‐values. The intensive computational burden induced by the EM algorithm and the permutation becomes prohibitive for direct applications to GWASs. This paper develops a likelihood ratio test (LRT) for GWASs under genetic heterogeneity based on a more general alternative mixture model. In particular, a closed‐form formula for the LRT statistic is derived to avoid the EM‐type iterative numerical evaluation. Moreover, an explicit asymptotic null distribution is also obtained, which avoids using the permutation to obtain p‐values. Thus, the proposed LRT is easy to implement for GWASs. Furthermore, numerical studies demonstrate that the LRT has power advantages over the commonly used Armitage trend test and other existing association tests under genetic heterogeneity. A breast cancer GWAS dataset is used to illustrate the newly proposed LRT.

2.
In family-based association studies, an optimal test statistic with asymptotic normal distribution is available when the underlying genetic model is known (e.g., recessive, additive, multiplicative, or dominant). In practice, however, genetic models for many complex diseases are usually unknown. Using a single test statistic optimal for one genetic model may lose substantial power when the model is mis-specified. When a family of genetic models is scientifically plausible, the maximum of several tests, each optimal for a specific genetic model, is robust against the model mis-specification. This robust test is preferred over a single optimal test. Recently, cost-effective group sequential approaches have been introduced to genetic studies. The group sequential approach allows interim analyses and has been applied to many test statistics, but not to the maximum statistic. When the group sequential method is applied, type I error should be controlled. We propose and compare several approaches for controlling type I error rates when group sequential analysis is conducted with the maximum test for family-based candidate-gene association studies. For a two-stage group sequential robust procedure with a single interim analysis, two critical values for the maximum tests are provided based on a given alpha spending function to control the desired overall type I error.

3.
Gene selection is an important issue in analyzing multiclass microarray data. Among many proposed selection methods, the traditional ANOVA F test statistic has been employed to identify informative genes for both class prediction (classification) and discovery problems. However, the F test statistic assumes equal variances, an assumption that may not be realistic for gene expression data. This paper explores alternative test statistics that can handle heterogeneity of the variances. We study five such test statistics, including the Brown-Forsythe and Welch test statistics. Their performance is evaluated and compared with that of the F statistic over different classification methods applied to publicly available microarray datasets.
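
As an illustration of one variance-robust alternative named above, the following is a minimal sketch of Welch's heteroscedastic F statistic used to rank genes. It is not the paper's code: the function names `welch_f` and `rank_genes`, the gene labels, and the data layout (a dict mapping a gene name to a list of per-class expression values) are all assumptions made for this example. Only the statistic and its denominator degrees of freedom are computed; a p-value would additionally need an F distribution from a statistics library.

```python
from statistics import mean, variance


def welch_f(groups):
    """Return Welch's F statistic and its denominator degrees of freedom.

    `groups` holds one list of expression values per class (hypothetical layout).
    Each class needs at least two values and a nonzero sample variance.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    # Precision weights n_i / s_i^2 down-weight the noisier classes.
    weights = [n / variance(g) for n, g in zip(sizes, groups)]
    total_w = sum(weights)
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / total_w
    between = sum(w * (m - weighted_mean) ** 2
                  for w, m in zip(weights, means)) / (k - 1)
    lam = sum((1 - w / total_w) ** 2 / (n - 1)
              for w, n in zip(weights, sizes))
    f_stat = between / (1 + 2 * (k - 2) * lam / (k ** 2 - 1))
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, df2


def rank_genes(expression):
    """Sort gene names by decreasing Welch statistic."""
    return sorted(expression,
                  key=lambda gene: welch_f(expression[gene])[0],
                  reverse=True)
```

Ranking by the statistic alone ignores that the denominator degrees of freedom differ between genes; whether that matters depends on how the selected genes are used downstream.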

4.
Locus heterogeneity is a common phenomenon in complex diseases and is one of the most important factors that affect the power of either linkage or linkage disequilibrium (LD) analysis. In linkage analysis, the heterogeneity LOD score (HLOD) rather than LOD itself is often used. However, the existing methods for detecting linkage disequilibrium, such as the TDT and many of its variants, do not take into account locus heterogeneity. We propose two novel likelihood-based methods, an LD-Het likelihood and an LD-multinomial likelihood, to test linkage disequilibrium (LD) that explicitly incorporate locus heterogeneity in the analysis. The LD-Het is applicable to general nuclear family data but requires a working penetrance model. The LD-multinomial is only applicable to affected sib-pair data but does not require specification of a trait model. For affected sib-pair data, both methods have similar power to detect LD under the recessive model, but the LD-multinomial model has greater power when the underlying model is dominant or additive.

5.
Traditional quantitative trait locus (QTL) analysis focuses on identifying loci associated with mean heterogeneity. Recent research has discovered loci associated with phenotype variance heterogeneity (vQTL), which is important in studying genetic association with complex traits, especially for identifying gene–gene and gene–environment interactions. While several tests have been proposed to detect vQTL for unrelated individuals, there are no tests for related individuals, commonly seen in family‐based genetic studies. Here we introduce a likelihood ratio test (LRT) for identifying mean and variance heterogeneity simultaneously or for either effect alone, adjusting for covariates and family relatedness using a linear mixed effect model approach. The LRT statistic for normally distributed quantitative traits approximately follows χ² distributions. To correct for inflated Type I error for non‐normally distributed quantitative traits, we propose a parametric bootstrap‐based LRT that removes the best linear unbiased prediction (BLUP) of family random effect. Simulation studies show that our family‐based test controls Type I error and has good power, while Type I error inflation is observed when family relatedness is ignored. We demonstrate the utility and efficiency gains of the proposed method using data from the Framingham Heart Study to detect loci associated with body mass index (BMI) variability.

6.
Multivariate linkage analysis has been suggested for the analysis of correlated traits, such as blood pressure (BP) and body mass index (BMI), because it may offer greater power and provide clearer results than univariate analyses. Currently, the most commonly used multivariate linkage methods are extensions of the univariate variance component model. One concern about those methods is their inherent sensitivity to the assumption of multivariate normality, which cannot be easily guaranteed in practice. Another problem possibly related to all multivariate linkage analysis methods is the difficulty in interpreting nominal p-values, because the asymptotic distribution of the test statistic has not been well characterized. Here we propose a regression-based multivariate linkage method in which a robust score statistic is used to detect linkage. The p-value of the statistic is evaluated by a simple and rapid simulation procedure. Theoretically, this method can be used for any number and type of traits and for general pedigree data. We apply this approach to a genome linkage analysis of blood pressure and body mass index data from the Beaver Dam Eye Study.

7.
Genome‐wide association studies (GWAS) are used to investigate genetic variants contributing to complex traits. Despite discovering many loci, a large proportion of “missing” heritability remains unexplained. Gene–gene interactions may help explain some of this gap. Traditionally, gene–gene interactions have been evaluated using parametric statistical methods such as linear and logistic regression, with multifactor dimensionality reduction (MDR) used to address sparseness of data in high dimensions. We propose a method for the analysis of gene–gene interactions across independent single‐nucleotide polymorphisms (SNPs) in two genes. Typical methods for this problem use statistics based on an asymptotic chi‐squared mixture distribution, which is not easy to use. Here, we propose a Kullback–Leibler‐type statistic, which follows an asymptotic positive normal distribution under the null hypothesis of no relationship between SNPs in the two genes and is normally distributed under the alternative hypothesis. The performance of the proposed method is evaluated by simulation studies, which show promising results. The method is also used to analyze real data and identifies gene–gene interactions among RAB3A, MADD, and PTPRN on type 2 diabetes (T2D) status.

8.
The likelihood ratio test of nested models for family data plays an important role in the assessment of genetic and environmental influences on the variation in traits. The test is routinely based on the assumption that the test statistic follows a chi-square distribution under the null, with the number of restricted parameters as degrees of freedom. However, tests of variance components constrained to be non-negative correspond to tests of parameters on the boundary of the parameter space. In this situation the standard test procedure yields p-values that are too large, and the use of the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC) for model selection is problematic. Focusing on the classical ACE twin model for univariate traits, we adapt existing theory to show that the asymptotic distribution for the likelihood ratio statistic is a mixture of chi-square distributions, and we derive the mixing probabilities. We conclude that when testing the AE or the CE model against the ACE model, the p-values obtained from using the χ² (1 df) distribution as the reference distribution should be halved. When the E model is tested against the ACE model, a mixture of χ² (0 df), χ² (1 df) and χ² (2 df) should be used as the reference distribution, and we provide a simple formula to compute the mixing probabilities. Similar results for tests of the AE, DE and E models against the ADE model are also derived. Failing to use the appropriate reference distribution can lead to invalid conclusions.
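
The halving rule described above is what a 50:50 mixture of χ² (0 df) and χ² (1 df) gives for any positive test statistic. The following minimal sketch shows the arithmetic using only the Python standard library. The function names are assumptions for this example, and the mixing probabilities for the E versus ACE test are left as an input because the abstract does not state the formula.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable for df in {0, 1, 2}."""
    x = max(x, 0.0)
    if df == 0:
        # Chi-square with 0 df is a point mass at zero.
        return 1.0 if x == 0.0 else 0.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only 0, 1 or 2 degrees of freedom are handled here")


def mixture_pvalue(lrt, weights):
    """P-value under a chi-square mixture; `weights` maps df -> probability."""
    return sum(p * chi2_sf(lrt, df) for df, p in weights.items())


def one_boundary_pvalue(lrt):
    """AE or CE against ACE: 50:50 mixture of 0 df and 1 df."""
    return mixture_pvalue(lrt, {0: 0.5, 1: 0.5})
```

For a statistic of 3.84, the naive χ² (1 df) p-value is about 0.05, so `one_boundary_pvalue(3.84)` is about 0.025. For the E versus ACE test, `mixture_pvalue` would be called with three weights supplied by the user.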

9.
We find accurate approximations for the expected number of three-cycles and unchorded four-cycles under a stochastic distribution for graphs that has been proposed for modelling yeast two-hybrid protein-protein interaction networks. We show that unchorded four-cycles are characteristic motifs under this model and that the count of unchorded four-cycles in the graph is a reliable statistic on which to base parameter estimation. Finally, we test our model against a range of experimental data, obtain parameter estimates from these data and investigate possible improvements in the model. Characterization of this model lays the foundation for its use as a prior distribution in a Bayesian analysis of yeast two-hybrid networks that can potentially aid in identifying false-positive and false-negative results.

10.
When phenotypic but no genotypic data are available for relatives of participants in genetic association studies, previous research has shown that family-based imputed genotypes can boost the statistical power when included in such studies. Here, using simulations, we compared the performance of two statistical approaches suitable to model imputed genotype data: the mixture approach, which involves the full distribution of the imputed genotypes, and the dosage approach, where the mean of the conditional distribution features as the imputed genotype. Simulations were run by varying sibship size, size of the phenotypic correlations among siblings, imputation accuracy and minor allele frequency of the causal SNP. Furthermore, as imputing sibling data and extending the model to include sibships of size two or greater requires modeling the familial covariance matrix, we inquired whether model misspecification affects power. Finally, the results obtained via simulations were empirically verified in two datasets with continuous phenotype data (height) and with a dichotomous phenotype (smoking initiation). Across the settings considered, the mixture and the dosage approach are equally powerful and both produce unbiased parameter estimates. In addition, the likelihood-ratio test in the linear mixed model appears to be robust to the considered misspecification in the background covariance structure, given low to moderate phenotypic correlations among siblings. Empirical results show that the inclusion in association analysis of imputed sibling genotypes does not always result in a larger test statistic. The actual test statistic may drop in value due to small effect sizes. That is, if the power benefit is small, so that the change in the distribution of the test statistic under the alternative is relatively small, the probability of obtaining a smaller test statistic is greater. As the genetic effects are typically hypothesized to be small, in practice, the decision on whether family-based imputation could be used as a means to increase power should be informed by prior power calculations and by the consideration of the background correlation.

11.
Jackola DR. Molecular Immunology, 2007, 44(10): 2549-2557
The complex inherited human atopic diseases are associated with adverse IgE-mediated immune responses, notably allergen-specific IgE that presumably involves the input from one or more genes. However, gene searches have met with limited success, possibly because a causally direct gene input-trait outcome assumption is not valid for these immune responses. To test this assumption, we determined the probability distributions of quantitative IgE responses associated with atopy, and used these to determine the statistical interdependence among first-degree relatives (parent-child and sibling-sibling) from families with a history of atopic asthma (total available N=1099). Each person was screened for asthma history, pulmonary responses by spirometry and atopic immune responses using serum total IgE and skin prick tests (SPT) to 14 allergens. Heritability estimates were made by variance components analysis for quantitative IgE traits. The serum total IgE distribution comprised statistically independent sub-sets when individuals were categorized as either SPT [-] or SPT [+], reflecting contributions from non-pathology associated basal IgE and pathology-associated allergen-specific IgE. However, heritability estimates were significant only for basal IgE, while total allergen-specific IgE production was a random variable independent of inheritance. Genes for specific IgE-mediated responses are not obligately inherited. Rather, gene products that modulate underlying stimulus-response coupling interactions and alter the probabilities influencing adverse immune responses are inherited, but an individual's specific pathologic outcome is a random variable. These results support a model of "stochastic bias" that "skews" an immune response to non-infectious antigens among people with an inherited predisposition for atopy.

12.
Population‐based genetic association analysis may suffer from the failure to control for confounders such as population stratification (PS). There has been extensive study on the influence of PS on candidate gene‐disease association analysis, but much less attention has been paid to its influence on marker‐disease association analysis. In this paper, we focus on the Pearson χ² test and the trend test for marker‐disease association analysis. The mean and variance of the test statistics are derived in the presence of PS, so that the power and inflated type I error rate can be evaluated. It is shown that the bias and the variance distortion are not zero in the presence of both PS and penetrance heterogeneity (PH). Unlike candidate gene‐disease association analysis, when PS is present, the bias is not zero no matter whether PH is present or not. This work generalises the published results, where only the fully recessive penetrance model is considered and only the bias is calculated. It is shown that candidate gene‐disease association analysis can be treated as a special case of marker‐disease association analysis. Consequently, our results extend previous studies on candidate gene‐disease association analysis. A simulation study confirms the theoretical findings.

13.
It has been shown that it is preferable to use a robust model that incorporates constraints on the genotype relative risk rather than rely on a model that assumes the disease operates in a recessive or dominant fashion. Previous methods are applicable to case-control studies, but not to family-based studies of case children along with their parents (triads). We show here how to implement analogous constraints while analyzing triad data. The likelihood, conditional on the parents' genotypes, is maximized over the appropriately constrained parameter space. The asymptotic distribution for the maximized likelihood ratio statistic is found and used to estimate the null distribution of the test statistics. The properties of several methods of testing for association are compared by simulation. The constrained method provides higher power across a wide range of genetic models with little cost when compared to methods that restrict to a dominant, recessive, or multiplicative model, or make no modeling restriction. The methods are applied to two SNPs on the methylenetetrahydrofolate reductase (MTHFR) gene with neural tube defect (NTD) triads.

14.
It has been known for some time that regional blood flows within an organ are not uniform. Useful measures of heterogeneity of regional blood flows are the standard deviation and coefficient of variation or relative dispersion of the probability density function (PDF) of regional flows obtained from the regional concentrations of tracers that are deposited in proportion to blood flow. When a mathematical model is used to analyze dilution curves after tracer solute administration, for many solutes it is important to account for flow heterogeneity and the wide range of transit times through multiple pathways in parallel. Failure to do so leads to bias in the estimates of volumes of distribution and membrane conductances. Since in practice the number of paths used should be relatively small, the analysis is sensitive to the choice of the individual elements used to approximate the distribution of flows or transit times. Presented here is a method for modeling heterogeneous flow through an organ using a scheme that covers both the high flow and long transit time extremes of the flow distribution. With this method, numerical experiments are performed to determine the errors made in estimating parameters when flow heterogeneity is ignored, in both the absence and presence of noise. The magnitude of the errors in the estimates depends upon the system parameters, the amount of flow heterogeneity present, and whether the shape of the input function is known. In some cases, some parameters may be estimated to within 10% when heterogeneity is ignored (homogeneous model), but errors of 15–20% may result, even when the level of heterogeneity is modest. In repeated trials in the presence of 5% noise, the mean of the estimates was always closer to the true value with the heterogeneous model than when heterogeneity was ignored, but the distributions of the estimates from the homogeneous and heterogeneous models overlapped for some parameters when outflow dilution curves were analyzed. The separation between the distributions was further reduced when tissue content curves were analyzed. It is concluded that multipath models accounting for flow heterogeneity are a vehicle for assessing the effects of flow heterogeneity under the conditions applicable to specific laboratory protocols, that efforts should be made to assess the actual level of flow heterogeneity in the organ being studied, and that the errors in parameter estimates are generally smaller when the input function is known rather than estimated by deconvolution.
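
The relative dispersion mentioned at the start of this abstract is the standard deviation of the regional flow distribution divided by its mean. A minimal sketch of that calculation follows. The function name, the optional per-region weights (for example, tissue mass fractions) and the use of the population rather than the sample standard deviation are assumptions made for this example, not details taken from the paper.

```python
import math


def relative_dispersion(flows, weights=None):
    """Return SD / mean of regional flows, optionally weighted per region."""
    if weights is None:
        # Assumption: equal weight for every region when none are given.
        weights = [1.0] * len(flows)
    total = sum(weights)
    mean_flow = sum(w * f for w, f in zip(weights, flows)) / total
    var = sum(w * (f - mean_flow) ** 2
              for w, f in zip(weights, flows)) / total
    return math.sqrt(var) / mean_flow
```

Perfectly uniform flows give a relative dispersion of zero, while two equally weighted regions with flows of 0.5 and 1.5 times the mean give 0.5.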

15.
Meta-analysis is a commonly used approach to increase the sample size for genome-wide association searches when individual studies are otherwise underpowered. Here, we present a meta-analysis procedure to estimate the heterogeneity of the quantitative trait variance attributable to genetic variants using Levene's test without needing to exchange individual-level data. The meta-analysis of Levene's test offers the opportunity to combine the considerable sample size of a genome-wide meta-analysis to identify the genetic basis of phenotypic variability and to prioritize single-nucleotide polymorphisms (SNPs) for gene–gene and gene–environment interactions. The use of Levene's test has several advantages, including robustness to departure from the normality assumption, freedom from the influence of the main effects of SNPs, and no assumption of an additive genetic model. We conducted a meta-analysis of the log-transformed body mass index of 5892 individuals and identified a variant with a highly suggestive Levene's test P-value of 4.28E-06 near the NEGR1 locus known to be associated with extreme obesity.
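
To make the building block concrete, here is a minimal sketch of the Levene statistic for a single SNP within one study, with trait values grouped by genotype. It does not implement the meta-analysis procedure the abstract describes. The function name and the choice of the median as the default group centre (the Brown-Forsythe variant) are assumptions, since the abstract does not say which centre was used. A p-value would come from an F distribution with (k − 1, N − k) degrees of freedom, which requires a statistics library.

```python
from statistics import mean, median


def levene_w(groups, center=median):
    """Levene's W for equality of variances across genotype groups.

    `groups` holds one list of trait values per genotype (hypothetical layout);
    `center` is the per-group centring function (median or mean).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviations of each observation from its own group's centre.
    z = []
    for g in groups:
        c = center(g)
        z.append([abs(y - c) for y in g])
    z_means = [mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_grand) ** 2
                  for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2
                 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within
```

Because the statistic is built from deviations around each group's own centre, shifting a genotype group's mean leaves it unchanged, which is the "freedom from the influence of the main effects of SNPs" noted above.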

16.
Most of the existing association tests for population-based case-control studies are based on comparing the mean genotype scores between the case and control groups, which may not be efficient under genetic heterogeneity. Given that most common diseases are genetically heterogeneous, caused by mutations in multiple loci, it may be beneficial to fully account for genetic heterogeneity in an association test. Here we first propose a binomial mixture model for such a purpose and develop a corresponding mixture likelihood ratio test (MLRT) for a single locus. We also consider two methods to combine single-locus-based MLRTs across multiple loci in linkage disequilibrium to boost power when causal SNPs are not genotyped. We show with a wide spectrum of numerical examples that under genetic heterogeneity the proposed tests are more powerful than some commonly used association tests.

17.
We present here a quantitative way to assess the impact of language-family boundaries on population differentiation and to evaluate the homogeneity of the genetic processes along these boundaries. Our estimator (delta a) of the impact of the boundary is based on an isolation by distance (IBD) model and measures the added genetic distance between populations located on different sides of the boundary. We compare this statistic with another estimator of group differentiation (F(CT)) computed under an analysis of variance framework that does not assume any particular spatial structure of the populations. Monte Carlo simulations are used to study the behaviour of these statistics under a two-dimensional stepping-stone model. Simulations show that F(CT) can suggest the existence of a frontier when populations only differ because of IBD. This spurious behaviour is much less frequent for the delta a statistic. However, the large variance associated with the delta a statistic, and the fact that it should only be computed in the presence of IBD, may limit the use of this statistic. Overall, the origin and the effect of the boundary are best understood by comparing different statistics and by testing for the presence of IBD on each side of the boundary as well as across the boundary. We illustrate our approach by examining the boundary between Afro-Asiatic and Indo-European populations. These populations are globally genetically differentiated, but the effect of the linguistic boundary on gene flow seems geographically very heterogeneous. This boundary appears to be the result of a secondary contact between two differentiation centres rather than an enhancer of population differentiation.

18.
This paper investigates the use of the scan statistic for evaluating the detectability of small nodules in medical images. The scan-statistic method is often used in applications in which random fields must be searched for abnormal local features. Several results of the detection with localization theory are reviewed and a generalization is presented using the noise nodule distribution obtained by scanning arbitrary areas. One benefit of the noise nodule model is that it enables determination of the scan-statistic distribution by using only a few image samples in a way suitable both for simulation and experimental setups. Also, based on the noise nodule model, the case of multiple targets per image is addressed and an image abnormality test using the likelihood ratio and an alternative test using multiple decision thresholds are derived. The results obtained reveal that in the case of low contrast nodules or multiple nodules the usual test strategy based on a single decision threshold underperforms compared with the alternative tests. That is a consequence of the fact that not only the contrast or the size, but also the number of suspicious nodules is a clue indicating the image abnormality. In the case of the likelihood ratio test, the multiple clues are unified in a single decision variable. Other tests that process multiple clues differently do not necessarily produce a unique ROC curve, as shown in examples using a test involving two decision thresholds. We present examples with two-dimensional time-of-flight (TOF) and non-TOF PET image sets analysed using the scan statistic for different search areas, as well as the fixed position observer.

19.
In case–control genetic association studies, a standard practice is to perform the Cochran‐Armitage (CA) trend test under the assumption of the additive model because of its robustness. We could even identify situations in which it outperformed the analysis model consistent with the underlying inheritance mode. In this article, we analytically reveal the statistical basis that leads to the phenomenon. By elucidating the origin of the CA trend test as a linear regression model, we decompose Pearson's χ²‐test statistic into two components: one is the CA trend test statistic, which measures the goodness of fit of the linear regression model, and the other measures the discrepancy between data and the linear regression model. Under this framework, we show that the additive coding scheme, as well as the multiplicative coding scheme, increases the coefficient of determination of the regression model by increasing the spread of data points. We also obtain the conditions under which the CA trend test statistic equals the MAX statistic and Pearson's χ²‐test statistic.
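
The decomposition described above can be checked numerically on a 2 × 3 genotype table. The sketch below computes the CA trend statistic with additive scores (0, 1, 2), Pearson's χ² statistic, and their difference as the departure-from-linearity component. The function names and the table layout (case and control counts per genotype) are assumptions for this example; the trend formula is the standard one, which equals N times the squared correlation between case status and genotype score.

```python
def ca_trend(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend statistic (1 df) for a 2 x k table."""
    n = [r + s for r, s in zip(cases, controls)]
    total, n_cases = sum(n), sum(cases)
    n_controls = total - n_cases
    sx_r = sum(x * r for x, r in zip(scores, cases))
    sx_n = sum(x * m for x, m in zip(scores, n))
    sxx_n = sum(x * x * m for x, m in zip(scores, n))
    num = total * (total * sx_r - n_cases * sx_n) ** 2
    den = n_cases * n_controls * (total * sxx_n - sx_n ** 2)
    return num / den


def pearson_chi2(cases, controls):
    """Pearson chi-square statistic (k - 1 df) for a 2 x k table."""
    n = [r + s for r, s in zip(cases, controls)]
    total, n_cases = sum(n), sum(cases)
    n_controls = total - n_cases
    stat = 0.0
    for r, s, m in zip(cases, controls, n):
        for observed, row_total in ((r, n_cases), (s, n_controls)):
            expected = row_total * m / total
            stat += (observed - expected) ** 2 / expected
    return stat


def decompose(cases, controls):
    """Split Pearson's statistic into trend and departure-from-trend parts."""
    trend = ca_trend(cases, controls)
    return trend, pearson_chi2(cases, controls) - trend
```

Calling `decompose((20, 50, 30), (40, 45, 15))` returns a trend component that accounts for most of Pearson's statistic, with a small non-negative remainder measuring how far the genotype-specific case proportions deviate from a straight line.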

20.
In the case‐parents design for testing candidate‐gene association, the conditional likelihood method based on genotype relative risks has been developed recently. A specific relation of the genotype relative risks is referred to as a genetic model. The efficient score tests have been used when the genetic model is correctly specified under the alternative hypothesis. In practice, however, it is usually not possible to specify the genetic model correctly. In the latter situation, tests such as the likelihood ratio test (LRT) and the MAX3 (the maximum of the three score statistics for dominant, additive, and recessive models) have been used. In this paper, we consider the restricted likelihood ratio test (RLRT). For a specific genetic model, simulation results demonstrate that RLRT is asymptotically equivalent to the score test, and both are more powerful than the LRT. When the genetic model cannot be correctly specified, the simulation results show that RLRT is the most robust and powerful in the situations we studied. MAX3 is the next most robust and powerful test. The TDT is the easiest statistic to compute, compared to MAX3 and RLRT. When the recessive model can be eliminated, it is also as robust and powerful as RLRT for other genetic models.
