Similar Documents
20 similar documents found (search time: 15 ms)
1.
Family-based study designs have an important role in the search for association between disease phenotypes and genetic markers. Unlike traditional case-control methods, family-based tests use within-family data to avoid identifying spurious associations that may result from population admixture. Many family-based association tests have been proposed to accommodate a variety of ascertainment schemes and patterns of missing data. In this report, we describe exact family-based association tests for biallelic data. Specifically, we discuss tests of the null hypotheses "no linkage and no association" and "linkage, but no association". These tests, which are valid under various models for inheritance and patterns of missingness, utilize the procedure proposed by Rabinowitz and Laird [2000: Hum Hered 50:211-223] that provides a unified framework for family-based association testing (FBAT). The conditioning approach implemented in FBAT makes an exact test conceptually straightforward but computationally difficult, since the minimal sufficient statistics upon which we condition do not have a conventional form. An exact test may be especially critical when accurate computation of the extreme tail area of the FBAT statistic is needed, such as when the study design necessitates multiple-comparisons adjustments. We describe the exact approach as a useful alternative to the asymptotic test and show that the exact tests for biallelic data may be most useful for the recessive disease model.
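FBAT's exact machinery conditions on minimal sufficient statistics and covers general pedigrees and missingness patterns; that computation is beyond a snippet. As a minimal illustration of the exact-testing idea for biallelic family data, though, the classical TDT reduces under "no linkage and no association" to an exact binomial test on allele transmissions from heterozygous parents. A sketch with made-up counts (not the FBAT algorithm itself):

```python
from scipy.stats import binomtest

# Transmissions of a candidate allele from heterozygous parents to affected
# offspring: under "no linkage and no association" each transmission is a
# fair coin flip, so the exact null distribution is Binomial(n, 1/2).
transmitted = 18    # hypothetical count of transmissions of the allele
informative = 25    # hypothetical number of heterozygous parental transmissions

result = binomtest(transmitted, n=informative, p=0.5, alternative="two-sided")
print(f"exact TDT p-value: {result.pvalue:.4f}")
```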

2.
A genome-wide association study (GWAS) correlates marker and trait variation in a study sample. Each subject is genotyped at a multitude of SNPs (single nucleotide polymorphisms) spanning the genome. Here, we assume that subjects are randomly sampled unrelated individuals and that trait values are normally distributed or can be transformed to normality. Over the past decade, geneticists have been remarkably successful in applying GWAS analysis to hundreds of traits. The massive amount of data produced in these studies presents unique computational challenges. Penalized regression with the ℓ1 (LASSO) or minimax concave (MCP) penalties is capable of selecting a handful of associated SNPs from millions of potential SNPs. Unfortunately, model selection can be corrupted by false positives and false negatives, obscuring the genetic underpinning of a trait. Here, we compare LASSO and MCP penalized regression to iterative hard thresholding (IHT). On GWAS regression data, IHT is better at model selection and comparable in speed to both methods of penalized regression. This conclusion holds for both simulated and real GWAS data. IHT fosters parallelization and scales well in problems with large numbers of causal markers. Our parallel implementation of IHT accommodates SNP genotype compression and exploits multiple CPU cores and graphics processing units (GPUs). This allows statistical geneticists to leverage commodity desktop computers in GWAS analysis and to avoid supercomputing. Availability: Source code is freely available at https://github.com/klkeys/IHT.jl.
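The paper's implementation is the Julia package IHT.jl, with genotype compression and GPU support. For intuition, here is a minimal dense NumPy sketch of plain iterative hard thresholding on a least-squares loss; the fixed step size and toy data are our assumptions, not the package's defaults:

```python
import numpy as np

def iht(X, y, k, n_iter=200):
    """Iterative hard thresholding for sparse least squares: after each
    gradient step, keep only the k largest-magnitude coefficients."""
    n, p = X.shape
    b = np.zeros(p)
    step = 1.0 / np.linalg.norm(X, 2) ** 2   # conservative fixed step size
    for _ in range(n_iter):
        b = b + step * (X.T @ (y - X @ b))   # gradient step on 0.5*||y - Xb||^2
        keep = np.argsort(np.abs(b))[-k:]    # indices of the k largest entries
        mask = np.zeros(p, dtype=bool)
        mask[keep] = True
        b[~mask] = 0.0                       # hard-threshold the rest to zero
    return b

# toy example: 5 causal "SNPs" among 1,000 predictors
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 1000))
beta = np.zeros(1000)
beta[:5] = 1.0
y = X @ beta + rng.standard_normal(500)
print(np.nonzero(iht(X, y, k=5))[0])         # should recover columns 0..4
```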

3.
The genetic dissection of quantitative traits, or endophenotypes, usually involves genetic linkage or association analysis in pedigrees and subsequent fine-mapping association analysis in the population. The ascertainment procedure for quantitative traits often results in unequal variance of observations. For example, some phenotypes may be clinically measured whilst others are from self-reports, or phenotypes may be the average of multiple measures but with the number of measurements varying. The resulting heterogeneity of variance poses no real problem for analysis, as long as it is properly modelled and thereby taken into account. However, if statistical significance is determined using an empirical permutation procedure, it is not obvious what the units of sampling are. We investigated a number of permutation approaches in a simulation study of an association analysis between a quantitative trait and a single nucleotide polymorphism. Our simulations were designed such that we knew the true p-value of the test statistics. A number of permutation methods were compared from the regression of true on empirical p-values and the precision of the empirical p-values. We show that the best procedure involves an implicit adjustment of the original data for the effects in the model before permutation, and that other methods, some of which seemed appropriate a priori, are relatively biased. Genet. Epidemiol. 33:710-716, 2009.
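The abstract's winning procedure adjusts the data for the model's effects before permuting. One standard realization of that idea is to fit the null (covariates-only) model and permute its residuals; a minimal weighted-least-squares sketch follows, with precision weights standing in for the varying number of measurements (the paper's exact scheme may differ):

```python
import numpy as np

def perm_pvalue(y, g, w, covariates, n_perm=5000, seed=1):
    """Permutation p-value for a SNP effect on a quantitative trait with
    heterogeneous variance handled by weights (e.g., number of measures
    averaged). The data are implicitly adjusted for the null-model effects
    before permutation: we permute the null-model residuals."""
    rng = np.random.default_rng(seed)
    Z = np.column_stack([np.ones_like(y), covariates])
    sw = np.sqrt(w)                          # weighted least squares via scaling
    Zw, yw, gw = Z * sw[:, None], y * sw, g * sw
    fit, *_ = np.linalg.lstsq(Zw, yw, rcond=None)
    resid = yw - Zw @ fit                    # adjusted (null-model) residuals
    gx = gw - Zw @ np.linalg.lstsq(Zw, gw, rcond=None)[0]  # SNP, covariates removed

    def stat(r):
        return abs(gx @ r)                   # monotone in the partial correlation

    t_obs = stat(resid)
    perms = np.array([stat(rng.permutation(resid)) for _ in range(n_perm)])
    return (1 + np.sum(perms >= t_obs)) / (1 + n_perm)

# toy data: trait variance depends on how many measures were averaged
rng = np.random.default_rng(0)
n = 300
age = rng.normal(50, 10, n)
g = rng.binomial(2, 0.3, n)
m = rng.integers(1, 5, n)                    # number of measurements per subject
y = 0.05 * age + 0.3 * g + rng.standard_normal(n) / np.sqrt(m)
print(perm_pvalue(y, g, w=m, covariates=age))
```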

4.
Lu X, Takala EP. Statistics in Medicine 2008;27(22):4549-4568
Musculoskeletal disorders are common in prolonged computer use. The dynamics of the relationship between musculoskeletal outcomes and duration of computer use is not easy to model by common statistical methods due to the nonlinearity of the data, and the use of inappropriate statistical models increases the likelihood of drawing wrong conclusions. In this paper we present a new mathematical methodology for modelling such a dynamic data set. Data from simultaneous direct measures of computer usage and questionnaire diaries of discomfort ratings were analysed using singular value decomposition as a means of tracking dominant dynamic trends, which captured the common characteristics of the relationship between computer-related workload and musculoskeletal outcomes over time. The relationship was constructed explicitly as a dose-response functional relationship parametrized by body-region parameters. Standard statistical software was used to quantify the variability of the estimates, given the complicated features of the data. A deterministic population model was then developed to simulate the dynamics of computer-related workload and musculoskeletal outcomes, and a stochastic model was also proposed to describe the stochastic nature of these dynamics for individual subjects. The model was verified by comparison with real data for forecasting purposes. Possibilities for extending the model to accommodate more complicated medical data are discussed. The mathematical model can serve both as a deterministic population model that projects demographic consequences and as a stochastic model that describes the evolution of these quantities for individual study subjects. The proposed methodology is flexible and broadly applicable across a variety of medical research settings.
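As a small illustration of using singular value decomposition to track a dominant dynamic trend in a subjects-by-days matrix (toy data; the paper's functional dose-response modelling goes further):

```python
import numpy as np

# rows = subjects, columns = days; entries = e.g. daily discomfort ratings
rng = np.random.default_rng(0)
days = np.arange(60.0)
shape = 1 / (1 + np.exp(-(days - 30) / 5))            # a common temporal pattern
ratings = np.outer(rng.uniform(0.5, 2.0, 40), shape)  # subject-specific magnitude
ratings += 0.1 * rng.standard_normal(ratings.shape)   # measurement noise

U, s, Vt = np.linalg.svd(ratings, full_matrices=False)
dominant_trend = Vt[0] * np.sign(Vt[0].sum())  # leading time course, sign-fixed
loadings = U[:, 0] * s[0]                      # per-subject weight on that trend
print(f"first mode explains {s[0]**2 / np.sum(s**2):.1%} of total variation")
```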

5.
Advances in statistical human genetics over the last 25 years
The past 25 years have seen an explosion in the number of genetic markers that can be measured on DNA samples at an ever decreasing cost. Although basic statistical methods for analysing such data, gathered on samples of either independent individuals or family members and examined one or two markers at a time, were already well developed before this explosion occurred, there has been a corresponding burst of activity to develop multiple-marker models that capitalize on the newly available data to increase the power to find disease-causing gene variants. This has required the concomitant development of faster algorithms to speed up the computation of various likelihoods. For linkage analysis, which aims to obtain approximate locations for genes of interest, Mendelian segregation models have been extended to be more realistic, and statistical models that do not assume specific modes of inheritance have been extended to allow for the analysis of larger pedigree structures. For association analysis, which aims to obtain more precise locations for genes of interest, the recent completion of the first stage of the HapMap project has spurred the development, still underway, of novel experimental designs and analytical methods to combat the curse of dimensionality and the resulting multiple testing problem. Perhaps the greatest current challenge concerns how best to gather and synthesize the many lines of evidence possible in order to discover the genetic determinants underlying complex diseases.

6.
7.
In this paper, we argue that causal effect models for realistic individualized treatment rules represent an attractive tool for analyzing sequentially randomized trials. Unlike a number of methods proposed previously, this approach does not rely on the assumption that intermediate outcomes are discrete or that models for the distributions of these intermediate outcomes given the observed past are correctly specified. In addition, it generalizes the methodology for performing pairwise comparisons between individualized treatment rules by allowing the user to posit a marginal structural model for all candidate treatment rules simultaneously. This is particularly useful if the number of such rules is large, in which case an approach based on individual pairwise comparisons would likely suffer from too much sampling variability to provide an informative answer. Moreover, such causal effect models represent an interesting alternative to methods previously proposed for selecting an optimal individualized treatment rule, in that they immediately give the user a sense of how the optimal outcome is estimated to change in the neighborhood of the identified optimum. We discuss an inverse-probability-of-treatment-weighted (IPTW) estimator for these causal effect models, which is straightforward to implement using standard statistical software, and develop an approach for constructing valid asymptotic confidence intervals based on the influence curve of this estimator. The methodology is illustrated in two simulation studies intended to mimic an HIV/AIDS trial.
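As a deliberately simplified, single-stage sketch of the IPTW idea for a marginal structural model over candidate treatment rules: the rules are "treat iff biomarker L exceeds θ", the randomization probability 0.5 is known, and a quadratic MSM in θ is posited. The variable names and data-generating model are illustrative assumptions only:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
L = rng.uniform(0, 10, n)                    # baseline biomarker
A = rng.integers(0, 2, n)                    # randomized treatment, P(A=1) = 0.5
Y = 1.0 + 0.3 * A * (L - 4) + rng.standard_normal(n)  # treatment helps iff L > 4

thetas = np.linspace(0, 10, 21)              # candidate rules: treat iff L > theta
iptw_means = []
for th in thetas:
    follows = A == (L > th).astype(int)      # observed treatment matches the rule
    w = follows / 0.5                        # inverse probability of treatment
    iptw_means.append(np.sum(w * Y) / np.sum(w))

# quadratic marginal structural model across all candidate rules at once
B = np.column_stack([np.ones_like(thetas), thetas, thetas ** 2])
beta, *_ = np.linalg.lstsq(B, np.array(iptw_means), rcond=None)
best = thetas[np.argmax(B @ beta)]
print(f"estimated optimal rule: treat iff L > {best:.1f} (truth: 4)")
```

Fitting the MSM across all rules at once, rather than comparing rules pairwise, is what stabilizes the estimate near the optimum.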

8.
The completion of the HapMap Project and the development of high-throughput single nucleotide polymorphism genotyping technologies have greatly enhanced the prospects of identifying and characterizing the genetic variants that influence complex traits. In principle, association analysis of haplotypes rather than single nucleotide polymorphisms may better capture an underlying causal variant, but the multiplicity of haplotypes can reduce statistical power because a large number of haplotypes must be tested (and corrected for). This paper presents a novel method based on clustering similar haplotypes to address this issue. The method, implemented in the CLUMPHAP program, is an extension of the CLUMP program designed for the analysis of multi-allelic markers (Sham and Curtis [1995] Ann. Hum. Genet. 59(Pt1):97-105). CLUMPHAP performs a hierarchical clustering of the haplotypes and then computes the χ² statistic between each haplotype cluster and disease; the statistical significance of the largest of the χ² statistics is obtained by permutation testing. A significant result suggests that a haplotype cluster, possibly harboring a disease-causing variant, is over-represented in cases. Using simulation studies, we have compared CLUMPHAP with more widely used approaches in terms of their statistical power to identify an untyped susceptibility locus. Our results show that CLUMPHAP tends to have greater power than the omnibus haplotype test and is comparable in power to multiple-regression locus-coding approaches.
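A rough sketch of the clustering-plus-permutation idea (not the CLUMPHAP implementation itself): hierarchically cluster chromosome haplotypes, take the largest cluster-versus-disease χ² over several cuts of the tree, and calibrate it by permuting case status:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

def max_cluster_chi2(cluster_ids, status):
    """Largest 2x2 chi-square over clusters (membership vs case status)."""
    best = 0.0
    for c in np.unique(cluster_ids):
        in_c = cluster_ids == c
        t = np.array([[np.sum(in_c & (status == 1)), np.sum(in_c & (status == 0))],
                      [np.sum(~in_c & (status == 1)), np.sum(~in_c & (status == 0))]],
                     dtype=float)
        if t.min() == 0:
            continue                          # skip degenerate tables
        exp = t.sum(1)[:, None] * t.sum(0)[None, :] / t.sum()
        best = max(best, float(np.sum((t - exp) ** 2 / exp)))
    return best

# one row per chromosome (0/1 alleles at 8 SNPs); case status of the carrier
rng = np.random.default_rng(0)
haps = rng.integers(0, 2, (400, 8))
status = rng.integers(0, 2, 400)

tree = linkage(pdist(haps, metric="hamming"), method="average")
cuts = [fcluster(tree, k, criterion="maxclust") for k in range(2, 9)]
stat = lambda s: max(max_cluster_chi2(c, s) for c in cuts)

t_obs = stat(status)
perms = [stat(rng.permutation(status)) for _ in range(999)]
print("permutation p =", (1 + sum(t >= t_obs for t in perms)) / 1000)
```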

9.
The development of a new method for testing the association of genetic markers with disease is presented. This approach is applicable when sampling nuclear families with one or more affected siblings and when neither, one, or both parents are missing marker genotype data. All siblings, affected and unaffected, are used to probabilistically infer the missing parental marker data. A likelihood ratio statistic, which treats marker allele frequencies as nuisance parameters, is presented to test whether all marker relative risks are equal to one (i.e., no marker association). This approach offers a solution for testing marker associations when parents are difficult to obtain.

10.
Unraveling the underlying biological mechanisms or pathways behind the effects of genetic variations on complex diseases remains one of the major challenges in the post-GWAS (genome-wide association study) era. To further explore the relationship between genetic variations, biomarkers, and diseases, and to elucidate the underlying pathological mechanisms, a huge effort has gone into examining pleiotropic and gene-environment interaction effects. We propose a novel genetic stochastic process model (GSPM) that can be applied to GWAS to jointly investigate the genetic effects on longitudinally measured biomarkers and the risks of diseases. This model is characterized by a more profound biological interpretation and takes into account the dynamics of biomarkers during follow-up when investigating the hazards of a disease. We illustrate the rationale and evaluate the performance of the proposed model through two GWAS. One is to detect single nucleotide polymorphisms (SNPs) having interaction effects on type 2 diabetes (T2D) with body mass index (BMI), and the other is to detect SNPs affecting the optimal BMI level for protection from T2D. We identified multiple SNPs that showed interaction effects with BMI on T2D, including a novel SNP rs11757677 in the CDKAL1 gene (P = 5.77 × 10^-7). We also found a SNP rs1551133 located on 2q14.2 that reversed the effect of BMI on T2D (P = 6.70 × 10^-7). In conclusion, the proposed GSPM provides a promising and useful tool in GWAS of longitudinal data for interrogating pleiotropic and interaction effects to gain more insight into the relationship between genes, quantitative biomarkers, and the risks of complex diseases.

11.
Due to the optional sampling effect in a sequential design, the maximum likelihood estimator (MLE) following sequential tests is generally biased. In a typical two-stage design employed in a phase II clinical trial in cancer drug screening, a fixed number of patients are enrolled initially. The trial may be terminated for lack of clinical efficacy if the observed number of treatment responses after the first stage is too small. Otherwise, an additional fixed number of patients are enrolled to accumulate further information on efficacy as well as on safety. There have been numerous suggestions for the design of such two-stage studies. Here we establish that, under the two-stage design, the sufficient statistic for the parameter of the binomial distribution, namely the stopping stage and the number of treatment responses, is also complete. Then, based on the Rao-Blackwell theorem, we derive the uniformly minimum variance unbiased estimator (UMVUE) as the conditional expectation, given the complete sufficient statistic, of an unbiased estimator, which in this case is simply the maximum likelihood estimator based only on the first-stage data. Our results generalize to multistage designs. We illustrate features of the UMVUE using two-stage phase II clinical trial design examples and present results of numerical studies on the properties of the UMVUE in comparison to the usual MLE.
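A sketch of the Rao-Blackwell construction for a two-stage binomial design, using the standard conditional-expectation formula for the UMVUE when the trial continues past stage 1 (the design constants below are toy values, not a recommended design):

```python
from math import comb

def umvue_two_stage(n1, n2, r1, stage, s):
    """UMVUE of the binomial response probability after a two-stage design:
    stop after stage 1 (n1 patients) if responses X1 <= r1, else enrol n2
    more. s = total responses observed at the stopping stage."""
    if stage == 1:
        return s / n1                        # stopped early: UMVUE equals the MLE
    # condition on (stage, s): X1 ranges over phase-1 counts consistent with
    # continuing (X1 > r1) and with the total s; the numerator uses the
    # identity (x/n1) * C(n1, x) = C(n1 - 1, x - 1)
    lo, hi = max(r1 + 1, s - n2), min(n1, s)
    num = sum(comb(n1 - 1, x - 1) * comb(n2, s - x) for x in range(lo, hi + 1))
    den = sum(comb(n1, x) * comb(n2, s - x) for x in range(lo, hi + 1))
    return num / den

# toy design: n1 = 19, r1 = 4, n2 = 16; 10 responses in all 35 patients
print(umvue_two_stage(19, 16, 4, stage=2, s=10), "vs MLE", 10 / 35)
```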

12.
Genome-wide association studies (GWAS) have revealed many fascinating insights into complex diseases, even from simple single-marker statistical tests. Most of these tests are designed for testing associations between a phenotype and an autosomal genotype and are therefore not applicable to X chromosome data. Testing for association on the X chromosome raises unique challenges that have motivated the development of X-specific statistical tests in the literature. However, to date there has been no study of these methods under a wide range of realistic study designs, allele frequencies, and disease models to assess the size and power of each test. To address this, we have performed an extensive simulation study to investigate the effects of the sex ratios in the case and control cohorts, as well as the allele frequencies, on the size and power of eight test statistics under three different disease models that each account for X-inactivation. We show that existing, but under-used, methods that make use of both male and female data are uniformly more powerful than popular methods that use only female data. In particular, we show that Clayton's one degree of freedom statistic [Clayton, 2008] is robust and powerful across a wide range of realistic simulation parameters. Our results provide guidance on selecting the most appropriate test statistic for analysing X chromosome data from GWAS and show that much power can be gained by a more careful analysis of X chromosome GWAS data.
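Clayton's statistic is a specific one degree of freedom score test; as a generic, hedged illustration of an X-aware analysis that uses both sexes, one can code male genotypes 0/2 (treating hemizygous males like homozygous females under X-inactivation), adjust for sex, and compare logistic models by likelihood ratio. All simulation settings below are assumptions:

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(0)
n = 2000
sex = rng.integers(0, 2, n)                  # 1 = male, 0 = female
maf = 0.3
g = np.where(sex == 1,
             2 * rng.binomial(1, maf, n),    # males: 0/2, mimicking X-inactivation
             rng.binomial(2, maf, n))        # females: 0/1/2
p = 1 / (1 + np.exp(-(-1.0 + 0.25 * g + 0.2 * sex)))
y = rng.binomial(1, p)

X1 = sm.add_constant(np.column_stack([sex, g]).astype(float))
X0 = sm.add_constant(sex.astype(float))
lrt = 2 * (sm.Logit(y, X1).fit(disp=0).llf - sm.Logit(y, X0).fit(disp=0).llf)
print(f"1-df LRT = {lrt:.2f}, p = {chi2.sf(lrt, 1):.2g}")
```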

13.
Populations of non-European ancestry are substantially underrepresented in genome-wide association studies (GWAS). Because genetic effects can differ between ancestries, owing to possibly different causal variants or linkage disequilibrium patterns, a meta-analysis that pools GWAS from all populations yields biased estimates in each population, and the bias disproportionately impacts non-European ancestry populations. This is because meta-analysis combines study-specific estimates with inverse variances as the weights, which pulls the combined estimate toward the studies with the largest sample sizes, typically those of European ancestry. In this paper, we propose two empirical Bayes (EB) estimators that borrow strength across populations while accounting for between-population heterogeneity. Extensive simulation studies show that the proposed EB estimators are largely unbiased and improve efficiency compared to the population-specific estimator. In contrast, even though the meta-analysis estimator has a much smaller variance, it is significantly biased when the genetic effect is heterogeneous across populations. We apply the proposed EB estimators to a large-scale trans-ancestry GWAS of stroke and demonstrate that they substantially reduce the variance of the population-specific estimator, with effect estimates close to the population-specific estimates.
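A minimal sketch of one natural EB shrinkage construction, estimating between-population heterogeneity by the DerSimonian-Laird moment method and shrinking each population-specific estimate toward the inverse-variance meta-analysis mean; the paper's estimators may be constructed differently:

```python
import numpy as np

def eb_estimates(beta, se):
    """Shrink each population-specific effect toward the inverse-variance
    meta-analysis mean; between-population variance tau^2 is estimated by
    the DerSimonian-Laird moment method."""
    beta, se = np.asarray(beta, float), np.asarray(se, float)
    w = 1.0 / se ** 2
    mu = np.sum(w * beta) / np.sum(w)        # meta-analysis (fixed-effect) mean
    Q = np.sum(w * (beta - mu) ** 2)
    k = len(beta)
    tau2 = max(0.0, (Q - (k - 1)) / (np.sum(w) - np.sum(w ** 2) / np.sum(w)))
    shrink = se ** 2 / (se ** 2 + tau2)      # noisier estimates shrink more
    return shrink * mu + (1.0 - shrink) * beta

# one SNP's effect on stroke in three ancestry groups (illustrative numbers)
print(eb_estimates(beta=[0.10, 0.02, 0.25], se=[0.02, 0.05, 0.12]))
```

When tau² is large (strong heterogeneity), the EB estimates stay close to the population-specific values; when tau² is near zero, they collapse toward the meta-analysis mean.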

14.
Bioinformatics approaches that examine gene-gene models provide a means to discover interactions between multiple genes that underlie complex disease. Extensive computational demands and adjustment for multiple testing make uncovering genetic interactions a challenge. Here, we address these issues using our knowledge-driven filtering method, Biofilter, to identify putative single nucleotide polymorphism (SNP) interaction models for cataract susceptibility, thereby reducing the number of models for analysis. Models were evaluated in 3,377 European Americans (1,185 controls, 2,192 cases) from the Marshfield Clinic, a study site of the Electronic Medical Records and Genomics (eMERGE) Network, using logistic regression. All statistically significant models from the Marshfield Clinic were then evaluated in an independent dataset of 4,311 individuals (742 controls, 3,569 cases), using independent samples from additional eMERGE Network study sites: Mayo Clinic, Group Health/University of Washington, Vanderbilt University Medical Center, and Geisinger Health System. Eighty-three SNP-SNP models replicated in the independent dataset at a likelihood ratio test P < 0.05. Among the most significant replicating models was rs12597188 (intron of CDH1)–rs11564445 (intron of CTNNB1). These genes are known to be involved in processes that include cell-to-cell adhesion signaling, cell-cell junction organization, and cell-cell communication. Further Biofilter analysis of all replicating models revealed a number of common functions among the genes harboring the 83 replicating SNP-SNP models, including signal transduction and the PI3K-Akt signaling pathway. These findings demonstrate the utility of Biofilter as a biology-driven method applicable to any genome-wide association study dataset.
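Biofilter's contribution is choosing which SNP pairs to test; the per-model evaluation is ordinary logistic regression with an interaction term. A minimal sketch of the likelihood ratio test for one simulated SNP-SNP model (genotypes and effect sizes are assumptions):

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(0)
n = 3000
g1 = rng.binomial(2, 0.3, n)                 # SNP 1 genotype (0/1/2)
g2 = rng.binomial(2, 0.4, n)                 # SNP 2 genotype (0/1/2)
eta = -1.0 + 0.1 * g1 + 0.1 * g2 + 0.3 * g1 * g2
case = rng.binomial(1, 1 / (1 + np.exp(-eta)))

# compare main-effects-only model against main effects + interaction
X_main = sm.add_constant(np.column_stack([g1, g2]).astype(float))
X_full = sm.add_constant(np.column_stack([g1, g2, g1 * g2]).astype(float))
lrt = 2 * (sm.Logit(case, X_full).fit(disp=0).llf
           - sm.Logit(case, X_main).fit(disp=0).llf)
print(f"interaction LRT = {lrt:.2f}, p = {chi2.sf(lrt, 1):.2g}")
```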

15.
Detecting association between a set of variants and a phenotype of interest is an important first step in genetic and genomic studies. Although the problem has attracted a large amount of attention in the scientific community and several related statistical approaches have been proposed in the literature, powerful and robust statistical tests in this area are still highly desired and yet to be developed. In this paper, we propose a powerful and robust association test that combines information from each individual single-nucleotide polymorphism based on sequential independent burden tests. We compare the proposed approach with some popular tests through a comprehensive simulation study and a real data application. Our results show that, in general, the new test is more powerful, and the gain in power can be substantial in many situations compared to other methods.

16.
Sara Lindström, Jennifer A. Brody, Constance Turman, Marine Germain, Traci M. Bartz, Erin N. Smith, Ming-Huei Chen, Marja Puurunen, Daniel Chasman, Jeffrey Hassler, Nathan Pankratz, Saonli Basu, Weihua Guan, Beata Gyorgy, Manal Ibrahim, Jean-Philippe Empana, Robert Olaso, Rebecca Jackson, Sigrid K. Brækkan, Barbara McKnight, Jean-Francois Deleuze, Cristopher J. O'Donnell, Xavier Jouven, Kelly A. Frazer, Bruce M. Psaty, Kerri L. Wiggins, Kent Taylor, Alexander P. Reiner, Susan R. Heckbert, Charles Kooperberg, Paul Ridker, John-Bjarne Hansen, Weihong Tang, Andrew D. Johnson, Pierre-Emmanuel Morange, David A. Trégouët, Peter Kraft, Nicholas L. Smith, Christopher Kabrhel. Genetic Epidemiology 2019;43(4):449-457
Although recent genome-wide association studies have identified novel associations for common variants, there has been no comprehensive exome-wide search for low-frequency variants that affect the risk of venous thromboembolism (VTE). We conducted a meta-analysis of 11 studies comprising 8,332 cases and 16,087 controls of European ancestry and 382 cases and 1,476 controls of African American ancestry, genotyped with the Illumina HumanExome BeadChip. We used the seqMeta package in R to conduct single-variant and gene-based rare-variant tests. In the single-variant analysis, we limited our analysis to the 64,794 variants with at least 40 minor alleles across studies (minor allele frequency [MAF] ~0.08%). We confirmed associations with previously identified VTE loci, including ABO, F5, F11, and FGA. After adjusting for multiple testing, we observed no novel significant findings in single-variant or gene-based analysis. Given our sample size, we had greater than 80% power to detect odds ratios of at least 1.5 and 1.8 for a single variant with a MAF of 0.01 and 0.005, respectively. Larger studies and sequence data may be needed to identify novel low-frequency and rare variants associated with VTE risk.
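seqMeta implements score-based single-variant and gene-based tests from study-level summaries; as a simplified stand-alone illustration of the gene-based burden idea, one can collapse rare minor alleles into a per-subject burden score and test it in a logistic model (all data and thresholds below are assumptions):

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

def burden_test(G, y, covar, maf_max=0.01):
    """Collapse rare variants into a per-subject burden score and test it
    with a 1-df likelihood ratio test in a logistic model."""
    maf = G.mean(axis=0) / 2.0               # sample minor allele frequencies
    rare = (maf > 0) & (maf < maf_max)
    burden = G[:, rare].sum(axis=1)          # count of rare minor alleles
    X1 = sm.add_constant(np.column_stack([covar, burden]))
    X0 = sm.add_constant(covar)
    lrt = 2 * (sm.Logit(y, X1).fit(disp=0).llf - sm.Logit(y, X0).fit(disp=0).llf)
    return lrt, chi2.sf(lrt, 1)

rng = np.random.default_rng(0)
G = rng.binomial(2, 0.005, (4000, 50))       # 50 rare variants in one gene
age = rng.normal(60, 10, 4000)
risk = 1 / (1 + np.exp(-(-1.0 + 0.15 * G.sum(axis=1))))
y = rng.binomial(1, risk)
print(burden_test(G, y, age))                # (LRT statistic, p-value)
```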

17.
To assess blood zinc and copper levels in children and adolescents in Guangzhou, we measured whole-blood zinc and copper in 355 primary and secondary school students in the city and derived normal (reference) ranges. Blood zinc followed a skewed distribution, so the lower limit of the normal range for each age group was determined by the percentile method (lower bound of the 95% reference range): 3.46 μg/ml for ages 6-9, 3.56 μg/ml for ages 10-13, and 4.23 μg/ml for ages 14-18. Blood copper followed a normal distribution, so the normal range for each age group was determined by the normal-distribution method: 0.77-1.09 μg/ml for ages 6-9, 0.68-0.94 μg/ml for ages 10-13, and 0.63-0.89 μg/ml for ages 14-18.
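A small sketch of the two reference-range constructions used above: percentile limits for the skewed analyte (zinc) and mean ± 1.96 SD for the normally distributed one (copper). The data are simulated, and we assume the percentile method targets the lower bound of a 95% reference interval:

```python
import numpy as np

def reference_range(x, normal=True):
    """95% reference range: mean +/- 1.96 SD if the analyte is normally
    distributed, otherwise the 2.5th and 97.5th percentiles."""
    x = np.asarray(x, float)
    if normal:
        m, sd = x.mean(), x.std(ddof=1)
        return m - 1.96 * sd, m + 1.96 * sd
    return tuple(np.percentile(x, [2.5, 97.5]))

rng = np.random.default_rng(0)
copper = rng.normal(0.93, 0.08, 355)             # roughly normal, ug/ml
zinc = rng.lognormal(np.log(4.5), 0.25, 355)     # right-skewed, ug/ml
print("copper 95% range:", reference_range(copper, normal=True))
print("zinc lower limit:", round(float(np.percentile(zinc, 2.5)), 2))
```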

18.
Four estimators of annual infection probability, pertinent to quantitative microbial risk analysis (QMRA), were compared. A stochastic model, the Gold Standard, was used as the benchmark: it builds the annual probability from a product of independent daily infection probabilities, which in turn are based on daily doses. An alternative and commonly used estimator, here referred to as the Naïve, assumes a single daily infection probability derived from a single value of daily dose. The typical use of this estimator in stochastic QMRA involves the generation of a distribution of annual infection probabilities, but since each of these is based on a single realisation of the dose distribution, the resultant annual infection probability distribution simply represents a set of inaccurate estimates. While the medians of both distributions were within an order of magnitude for our test scenario, the 95th percentiles, which are sometimes used in QMRA as conservative estimates of risk, differed by around one order of magnitude. The other two estimators examined, the Geometric and Arithmetic, were closely related to the Naïve, used the same equation, and both proved to be poor estimators. Lastly, this paper proposes a simple adjustment to the Gold Standard equation to accommodate periodic infection probabilities when the daily infection probabilities are unknown.
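A Monte Carlo sketch contrasting the Gold Standard and Naïve estimators under an assumed exponential dose-response model and lognormal daily doses (all parameter values are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
r = 0.005                                    # exponential dose-response parameter
n_mc = 10000

# Gold Standard: 365 independent daily doses per Monte Carlo iteration
dose = rng.lognormal(mean=0.0, sigma=1.0, size=(n_mc, 365))
p_daily = 1.0 - np.exp(-r * dose)            # daily infection probabilities
gold = 1.0 - np.prod(1.0 - p_daily, axis=1)  # annual probability per iteration

# Naive: a single daily dose draw per iteration, raised to the whole year
p_single = 1.0 - np.exp(-r * rng.lognormal(0.0, 1.0, n_mc))
naive = 1.0 - (1.0 - p_single) ** 365

for name, est in (("Gold Standard", gold), ("Naive", naive)):
    print(f"{name:13s} median={np.median(est):.3f} 95th={np.percentile(est, 95):.3f}")
```

Because each Naïve draw extrapolates one day's dose across the year, its distribution is far wider, which is why the 95th percentiles of the two estimators diverge.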

19.
Haplotype information could lead to more powerful tests of genetic association than single-locus analyses, but it is not easy to estimate haplotype frequencies from genotype data due to phase ambiguity. The challenge is compounded when individuals are pooled together to save costs or to increase sample size, which is crucial in the study of rare variants. Existing expectation-maximization (EM) type algorithms are slow and cannot cope with large pool sizes or long haplotypes. We show that by suitably collapsing the total allele frequencies of each pool, the maximum likelihood estimates of haplotype frequencies based on the collapsed data can be calculated very quickly, regardless of pool size and haplotype length. We provide a running-time analysis to demonstrate the considerable savings that the collapsed-data method can bring. The method is particularly well suited to estimating certain union probabilities useful in the study of rare variants. We provide theoretical and empirical evidence suggesting that the proposed estimation method loses little efficiency when the variants are rare. We use the method to analyze re-sequencing data collected from a case-control study involving 148 obese persons and 150 controls. Focusing on a region containing 25 rare variants around the MGLL gene, our method selects three rare variants as potentially causal. This is more parsimonious than the 12 variants selected by a recently proposed covering method. From another set of 32 rare variants around the FAAH gene, we discover an interesting potential interaction between two of them.
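For context, here is the classical individual-level EM for two-SNP haplotype frequencies, the kind of slow baseline that the collapsed-data method is designed to avoid for pools; with two biallelic SNPs, only double heterozygotes carry phase ambiguity:

```python
import numpy as np

def splits(g):
    """Ways to split a genotype (0/1/2 minor alleles) into an ordered allele pair."""
    return [(0, 0)] if g == 0 else [(1, 1)] if g == 2 else [(0, 1), (1, 0)]

def em_haplotypes(genos, n_iter=200):
    """EM estimates of two-SNP haplotype frequencies from unphased genotypes."""
    h = {hap: 0.25 for hap in [(0, 0), (0, 1), (1, 0), (1, 1)]}
    for _ in range(n_iter):
        counts = dict.fromkeys(h, 0.0)
        for a, b in genos:
            # enumerate phase-consistent haplotype pairs and their weights
            pairs = [((x1, y1), (x2, y2))
                     for x1, x2 in splits(a) for y1, y2 in splits(b)]
            probs = np.array([h[p] * h[q] for p, q in pairs])
            probs /= probs.sum()
            for (p, q), pr in zip(pairs, probs):   # E-step: expected counts
                counts[p] += pr
                counts[q] += pr
        total = sum(counts.values())
        h = {hap: c / total for hap, c in counts.items()}  # M-step
    return h

# simulate genotypes from known haplotype frequencies and recover them
rng = np.random.default_rng(0)
haps = [(0, 0), (0, 1), (1, 0), (1, 1)]
freqs = [0.55, 0.05, 0.10, 0.30]
idx = rng.choice(4, size=(500, 2), p=freqs)
genos = [(haps[i][0] + haps[j][0], haps[i][1] + haps[j][1]) for i, j in idx]
print(em_haplotypes(genos))
```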

20.
There is an extensive literature on modeling gene-gene interaction (GGI) and gene-environment interaction (GEI) in case-control studies, but limited literature on statistical methods for GGI and GEI in longitudinal cohort studies. We borrow ideas from the classical two-way analysis of variance literature to address the issue of robust modeling of interactions in repeated-measures studies. While the classical interaction models proposed by Tukey and Mandel express the interaction structure as a function of the main effects, a newer class of models, additive main effects and multiplicative interaction (AMMI) models, makes no similarly restrictive assumptions on the interaction structure. AMMI entails a singular value decomposition of the cell residual matrix after fitting the additive main effects and has been shown to perform well across various interaction structures. We consider these models for testing GGI and GEI from two perspectives: a likelihood ratio test based on cell means and a regression-based approach using individual observations. Simulation results indicate that both approaches for AMMI models lead to valid tests in terms of maintaining the type I error rate, with the regression approach having better power properties. The performance of these models was evaluated across different interaction structures and 12 common epistasis patterns. In summary, the AMMI model is robust to misspecification of the interaction structure and is a useful screening tool for interaction even in the absence of main effects. We use the proposed methods to examine the interplay between the hemochromatosis gene and cumulative lead exposure on pulse pressure in the Normative Aging Study.
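A minimal sketch of the AMMI decomposition on a cell-mean matrix: fit the additive main effects, then take the singular value decomposition of the residual matrix so the leading multiplicative term(s) capture the interaction (toy 3×3 cell means; the paper's tests add an inference layer on top):

```python
import numpy as np

def ammi(Y, n_terms=1):
    """Additive main effects + multiplicative interaction: remove the grand
    mean and row/column main effects, then SVD the residual matrix."""
    grand = Y.mean()
    row = Y.mean(axis=1, keepdims=True) - grand  # e.g., genotype main effects
    col = Y.mean(axis=0, keepdims=True) - grand  # e.g., exposure main effects
    resid = Y - grand - row - col                # interaction (cell residual) matrix
    U, s, Vt = np.linalg.svd(resid)
    fit = sum(s[k] * np.outer(U[:, k], Vt[k]) for k in range(n_terms))
    return fit, s

# 3x3 cell means: genotype (rows) x lead-exposure tertile (cols), toy numbers
Y = np.array([[50.0, 52.0, 53.0],
              [51.0, 54.0, 58.0],
              [52.0, 57.0, 66.0]])
fit, s = ammi(Y)
print("interaction variance in first multiplicative term:",
      round(s[0] ** 2 / np.sum(s ** 2), 3))
```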
