Found 20 similar articles (search time: 0 ms)
1.
Taking advantage of increasingly available high-density single nucleotide polymorphism (SNP) markers within genes and across genomes, more and more genetic association studies have begun to use multiple closely linked markers in candidate genes. A practical analytical challenge arising in such studies is the possibility that not all case chromosomes have inherited disease-causing mutations from a common ancestral chromosome (founder heterogeneity). To alleviate the problem, we propose a method that applies a clustering algorithm to haplotype similarity analysis. The method identifies a sequence of nested subsets of case chromosomes by a peeling procedure, where each subset is relatively homogeneous. The average similarity score estimated from each subset in the sequence is compared to that estimated in controls, and a raw (unadjusted for multiple comparisons) P value is obtained. The test for the association between the trait and the candidate gene is based on the minimum raw P value observed in the comparison sequence, with its significance level estimated by a permutation procedure. The method can be applied to both haplotype and genotype data. Simulation studies suggest that our method has the correct type I error rate, and is generally more powerful than existing methods of haplotype similarity analysis.
2.
It is well recognized that multiple genes likely contribute to the susceptibility of most common complex diseases. Studying one gene at a time might reduce our chance to identify disease susceptibility genes with relatively small effect sizes. Therefore, it is crucial to develop statistical methods that can assess the effect of multiple genes collectively. Motivated by the increasingly available high-density markers across the whole human genome, we propose a class of TDT-type methods that can jointly analyze haplotypes from multiple candidate genes (linked or unlinked). Our approach first uses a linear signed rank statistic to compare, at an individual gene level, the structural similarity among transmitted haplotypes against that among non-transmitted haplotypes. The results of the ranked comparisons from all considered genes are subsequently combined into global statistics, which can simultaneously test the association of the set of genes with the disease. Using simulation studies, we find that the proposed tests yield correct type I error rates in stratified populations. Compared with the gene-by-gene test, the new global tests appear to be more powerful in situations where all candidate genes are associated with the disease.
3.
It is usually assumed that detection of a disease susceptibility gene via marker polymorphisms in linkage disequilibrium with it is facilitated by consideration of marker haplotypes. However, capture of the marker haplotype information requires resolution of gametic phase, and this must usually be inferred statistically. Recently, we questioned the value of the marker haplotype information, and suggested that certain analyses of multivariate marker data, not based on haplotypes explicitly and not requiring resolution of gametic phase, are often more powerful than analyses based on haplotypes. Here, we review this work and assess more carefully the situations in which our conclusions might apply. We also relate these analyses to alternative approaches to haplotype analysis, namely those based on haplotype similarity and those inspired by cladistics.
4.
Tzeng JY 《Genetic epidemiology》2005,28(3):220-231
Haplotypes incorporate more information about the underlying polymorphisms than do genotypes for individual SNPs, and are considered a more informative data format in association analysis. Modeling haplotypes requires many degrees of freedom, which could decrease power and limit a model's capacity to incorporate other complex effects, such as gene-gene interactions. Even within haplotype blocks, high degrees of freedom are still a concern unless one chooses to discard rare haplotypes. To increase the efficiency and power of haplotype analysis, we adapt the evolutionary concepts of cladistic analyses and propose a grouping algorithm to cluster rare haplotypes to the corresponding ancestral haplotypes. The algorithm determines the cluster bases by preserving common haplotypes using a criterion built on the Shannon information content. Each haplotype is then assigned to its appropriate clusters probabilistically according to the cladistic relationship. Through this algorithm, we perform association analysis based on groups of haplotypes. Simulation results indicate power increases for performing tests on the haplotype clusters when compared to tests using original haplotypes or the truncated haplotype distribution.
5.
Multiple significance testing involving multiple phenotypes is not uncommon in the context of gene association studies but has remained largely unaddressed. If no adjustment is made for the multiple tests conducted, the type I error probability will exceed the nominal (per test) alpha level. Nevertheless, many investigators do not implement such adjustments. This may, in part, be because most available methods for adjusting the alpha rate either: 1) do not take the correlation structure among the variables into account and, therefore, tend to be overly stringent; or 2) do not allow statements to be made about specific variables but only about multivariate composites of variables. In this paper we develop a simulation-based method and computer program that holds the actual alpha rate to the nominal alpha rate but takes the correlation structure into account. We show that this method is more powerful than several common alternative approaches and that this power advantage increases as the number of variables and their intercorrelations increase. The method appears robust to marked non-normality and variance heterogeneity even with unequal numbers of subjects in each group. The fact that gene association studies with biallelic loci will have (at most) three groups (i.e., AA, Aa, aa) implies by the closure principle that, after detection of a significant result for a specific variable, pairwise comparisons for that variable can be conducted without further adjustment of the alpha level. Genet. Epidemiol. 15:87–101, 1998. © 1998 Wiley-Liss, Inc.
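As a rough illustration of how resampling can adjust for several correlated phenotypes at once, the following minimal sketch computes single-step adjusted P values from the permutation distribution of the maximum statistic. It is a generic permutation procedure swapped in for illustration, not the simulation-based program described in the abstract; the function names, the two-group design, and the toy data are all assumptions.

```python
import random


def mean_diff(values, labels):
    """Absolute difference in group means for one phenotype (labels are 0 or 1)."""
    g0 = [v for v, g in zip(values, labels) if g == 0]
    g1 = [v for v, g in zip(values, labels) if g == 1]
    return abs(sum(g1) / len(g1) - sum(g0) / len(g0))


def maxstat_adjusted_pvalues(phenotypes, labels, n_perm=2000, seed=0):
    """Single-step adjusted P values based on the permuted maximum statistic.

    Permuting the group labels keeps each subject's phenotypes together,
    so the correlation among phenotypes is preserved under the null.
    Statistics are raw mean differences, so phenotypes are assumed to be
    on comparable scales (hypothetical simplification).
    """
    rng = random.Random(seed)
    observed = [mean_diff(p, labels) for p in phenotypes]
    exceed = [0] * len(phenotypes)
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        max_stat = max(mean_diff(p, shuffled) for p in phenotypes)
        for j, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[j] += 1
    # Add one to numerator and denominator so a P value is never exactly zero.
    return [(count + 1) / (n_perm + 1) for count in exceed]


# Toy data: two genotype groups of six subjects, two phenotypes.
labels = [0] * 6 + [1] * 6
phenotypes = [
    [1, 2, 3, 4, 5, 6, 11, 12, 13, 14, 15, 16],  # clearly separated by group
    [2, 0, 3, 1, 2, 1, 1, 3, 0, 2, 1, 2],        # identical group means
]
adjusted = maxstat_adjusted_pvalues(phenotypes, labels)
```

Because each permuted maximum is compared with every observed statistic, a phenotype is declared significant only if it beats the best result expected by chance across all phenotypes.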
6.
Peter Kraft 《Genetic epidemiology》2001,21(Z1):S447-S452
Many family‐based tests of linkage disequilibrium are not valid when related nuclear families from larger pedigrees are used, or when independent nuclear families with multiple cases are used. The Pedigree Disequilibrium Test (PDT) proposed by Martin et al. [Am J Hum Genet 67:146–54, 2000] avoids these problems. This paper sketches an extension of the PDT that can account for measured covariates. Where the PDT is based on allele‐counting methods, this extension is based on conditional logistic regression. Versions of these statistics were used to test for association between disease and two known functional single nucleotide polymorphisms (SNPs) on gene 1 and gene 6 and one inert SNP on gene 7 in the first 25 replicates of the simulated population‐isolate data. The new method was also used to test for linkage disequilibrium after correcting for the effect of the environmental factor E1. The PDT and the conditional logistic extension had similar power to detect the functional SNPs (100% for gene 1, approximately 50% for gene 6) and appropriate type I error rates for the inert SNP. Correcting for E1 slightly increased power to detect the association between gene 6 and disease. © 2001 Wiley‐Liss, Inc.
7.
We consider the problem of detecting modifier genes that lead to variations in a disease‐related continuous variable (DRCV), such as the age of onset or a measure of disease severity, in a candidate gene strategy. We propose a novel method, the ordered transmission disequilibrium test (OTDT), to test for a relation between the clinical heterogeneity expressed by a DRCV and marker genotypes of a candidate gene. The OTDT applies to trio families with one patient and both parents, all three genotyped at a bi‐allelic marker M. The OTDT aims to find a critical value of the DRCV which separates the sample of families into two subsamples in which the transmission rates are significantly different. We investigate the power of the method by simulations under various genetic models and covariate distributions and compare it with a linear regression analysis. Genet. Epidemiol. 2008. © 2008 Wiley‐Liss, Inc.
8.
Morris AP 《Genetic epidemiology》2005,29(2):91-107
We describe a novel method for assessing the strength of disease association with single nucleotide polymorphisms (SNPs) in a candidate gene or small candidate region, and for estimating the corresponding haplotype relative risks of disease, using unphased genotype data directly. We begin by estimating the relative frequencies of haplotypes consistent with observed SNP genotypes. Under the Bayesian partition model, we specify cluster centres from this set of consistent SNP haplotypes. The remaining haplotypes are then assigned to the cluster with the "nearest" centre, where distance is defined in terms of SNP allele matches. Within a logistic regression modelling framework, each haplotype within a cluster is assigned the same disease risk, reducing the number of parameters required. Uncertainty in phase assignment is addressed by considering all possible haplotype configurations consistent with each unphased genotype, weighted in the logistic regression likelihood by their probabilities, calculated according to the estimated relative haplotype frequencies. We develop a Markov chain Monte Carlo algorithm to sample over the space of haplotype clusters and corresponding disease risks, allowing for covariates that might include environmental risk factors or polygenic effects. Application of the algorithm to SNP genotype data in an 890-kb region flanking the CYP2D6 gene illustrates that we can identify clusters of haplotypes with similar risk of poor drug metaboliser (PDM) phenotype, and can distinguish PDM cases carrying different high-risk variants. Further, the results of a detailed simulation study suggest that we can identify positive evidence of association for moderate relative disease risks with a sample of 1,000 cases and 1,000 controls.
9.
Larkin EK, Patel SR, Redline S, Mignot E, Elston RC, Hallmayer J 《Genetic epidemiology》2006,30(2):101-110
Evidence from both linkage analyses and association-based analyses has implicated Apolipoprotein E (ApoE) as a disease susceptibility locus for obstructive sleep apnea. To further assess the putative role of ApoE in sleep apnea, we performed genotyping, association, and linkage analyses in a cohort assembled to investigate the genetic epidemiology of sleep apnea. Among a subset of the Caucasian families, ten microsatellites, spanning 20 cM, were genotyped in a region near ApoE on chromosome 19 where previous suggestive linkage had been demonstrated using a 9.1-cM genome-wide scan. Haseman-Elston regression analysis, conducted with these fine mapping markers (n=196 sibling pairs, 56 families), showed evidence for linkage to marker AFM210yg9 (p=0.00034), which was increased over that observed with the original scan. ApoE genotyping also was performed on a larger set of data (n=1,211 from 271 families, ages 3-85 years) from the cohort with available DNA. To determine whether the ApoE genotype explains the linkage peak, we included the ApoE genotype as a covariate in regression models. Inclusion of the ApoE E2 allele as a covariate reduced the regression coefficient by 18%, suggesting that ApoE does not substantively explain the linkage signal. Finally, we repeated an association-based analysis in the larger sample of 1,211 individuals, and observed a higher prevalence of sleep apnea among individuals with the ApoE E2 allele. Overall, the evidence suggests that there is a disease susceptibility locus for obstructive sleep apnea in the region of ApoE, but ApoE itself is unlikely to be the causative locus.
10.
Penalized regression methods offer an attractive alternative to single marker testing in genetic association analysis. Penalized regression methods shrink toward zero the coefficients of markers that have little apparent effect on the trait of interest, resulting in a parsimonious subset of what we hope are true pertinent predictors. Here we explore the performance of penalization in selecting SNPs as predictors in genetic association studies. The strength of the penalty can be chosen either to select a good predictive model (via methods such as computationally expensive cross validation), through a maximum likelihood-based model selection criterion (such as the BIC), or to select a model that controls for type I error, as done here. We have investigated the performance of several penalized logistic regression approaches, simulating data under a variety of disease locus effect sizes and linkage disequilibrium patterns. We compared several penalties, including the elastic net, ridge, Lasso, MCP and the normal-exponential-γ shrinkage prior implemented in the hyperlasso software, to standard single locus analysis and simple forward stepwise regression. We examined how markers enter the model as penalties and P-value thresholds are varied, and report the sensitivity and specificity of each of the methods. Results show that penalized methods outperform single marker analysis, with the main difference being that penalized methods allow the simultaneous inclusion of a number of markers, and generally do not allow correlated variables to enter the model, producing a sparse model in which most of the identified explanatory markers are accounted for.
11.
Modern molecular techniques make discovery of numerous single nucleotide polymorphisms (SNPs) in candidate gene regions feasible. Conventional analysis relies on either independent tests with each variant or the use of haplotypes in association analysis. The first technique ignores the dependencies between SNPs. The second, though it may increase power, often introduces uncertainty by estimating haplotypes from population data. Additionally, as the number of loci expands for a haplotype, ambiguity in interpretation increases for determining the underlying genetic components driving a detected association. Here, we present a genotype-level analysis to jointly model the SNPs via a SNP interaction model with phase information (SIMPle) to capture the underlying haplotype structure. This analysis estimates both the risk associated with each variant and the importance of phase between pairwise combinations of SNPs. Thus, rather than selecting between genotype- or haplotype-level approaches, the SIMPle method frames the analysis of multilocus data in a model selection paradigm, the aim being to determine which SNPs, phase terms, and linear combinations best describe the relation between genetic variation and a trait of interest. To avoid unstable estimation due to sparse data and to incorporate both the dependencies among terms and the uncertainty in model selection, we propose a Bayes model averaging procedure. This highlights key SNPs and phase terms and yields a set of best representative models. Using simulations, we demonstrate the utility of the SIMPle model to identify crucial SNPs and underlying haplotype structures across a variety of causal models and genetic architectures.
12.
Genetic association is often determined in case-control studies by the differential distribution of alleles or genotypes. Recent work has demonstrated that association can also be assessed by deviations from the expected distributions of alleles or genotypes. Specifically, multiple methods motivated by the principles of Hardy-Weinberg equilibrium (HWE) have been developed. However, these methods do not take into account many of the assumptions of HWE. Therefore, we have developed a prevalence-based association test (PRAT) as an alternative method for detecting association in case-control studies. This method, also motivated by the principles of HWE, uses an estimated population allele frequency to generate expected genotype frequencies instead of using the case and control frequencies separately. Our method often has greater power, under a wide variety of genetic models, to detect association than genotypic, allelic or Cochran-Armitage trend association tests. Therefore, we propose PRAT as a powerful alternative method of testing for association.
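The step of generating expected genotype frequencies from an allele frequency under Hardy-Weinberg equilibrium can be sketched as follows. This shows only the standard HWE proportions (p², 2pq, q²) and a generic chi-square goodness-of-fit statistic; it is not the PRAT statistic itself, and the function names and example numbers are hypothetical.

```python
def hwe_expected_counts(p, n):
    """Expected genotype counts for n individuals under HWE.

    p is the frequency of allele A; q = 1 - p is the frequency of allele a.
    """
    q = 1.0 - p
    return {"AA": n * p * p, "Aa": 2 * n * p * q, "aa": n * q * q}


def chi_square_gof(observed, expected):
    """Pearson goodness-of-fit statistic comparing observed with expected counts."""
    return sum((observed[g] - expected[g]) ** 2 / expected[g] for g in expected)


# Hypothetical example: allele frequency 0.3 in a sample of 100 individuals.
expected = hwe_expected_counts(0.3, 100)
observed = {"AA": 9, "Aa": 42, "aa": 49}
stat = chi_square_gof(observed, expected)  # close to zero: counts match HWE
```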
13.
The possible evidence for association comprises three types of information: differences between cases and controls in allele frequencies, in parameters for Hardy‐Weinberg disequilibrium (HWD), and in parameters for linkage disequilibrium (LD). LD between marker and disease alleles results in a difference in at least one of the three types of parameters [Won and Elston, 2008]. However, the parameters for LD require knowledge about phase, which is usually unknown, making the LD contrast test without modification infeasible in practice. Methods for handling phase uncertainty are: (1) the most probable haplotype pair for each individual can be considered as the true phase; (2) a weighted average of haplotypes can be used; (3) we can consider the composite LD, which does not require any information about phase. We compare these methods to handle phase uncertainty in terms of validity and efficiency, and the effect on them of HWD in the population, at the same time confirming results for the three types of information. When the LD between markers is high, the LD contrast test that uses a weighted average of haplotypes or the most probable haplotypes to calculate the LD is recommended, but otherwise the LD contrast test that uses the composite LD is recommended. We conclude that, even though the difference in allele frequencies is usually the most informative test except in the case of a recessive disease, the LD contrast test can be more powerful if the markers are dense enough. Genet. Epidemiol. 33:463–478, 2009. © 2009 Wiley‐Liss, Inc.
14.
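Objective: To investigate the relationship between polymorphisms of the methylenetetrahydrofolate reductase (MTHFR) gene and susceptibility to chromosomal damage in coke-oven workers. Methods: A total of 140 coke-oven workers and 66 medical staff were enrolled. Individual chromosomal damage was assessed with the cytokinesis-block micronucleus assay, and urinary 1-hydroxypyrene concentration was measured as an indicator of the internal dose of polycyclic aromatic hydrocarbon exposure. Two single nucleotide polymorphism (SNP) sites in MTHFR (C677T and A1298C) were analyzed by polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP), and haplotypes were computed with the Bayesian method in the PHASE 2.1 software. After adjustment for age, sex, and urinary 1-hydroxypyrene level, analysis of covariance was used to compare micronucleus frequencies in peripheral blood lymphocytes among MTHFR genotypes or haplotypes. Results: The variant allele frequencies of MTHFR C677T and A1298C in the study subjects were 0.56 and 0.16, respectively, and both distributions conformed to Hardy-Weinberg equilibrium. The two SNPs were in linkage disequilibrium (D' = 0.99). Four haplotypes were present among the subjects: 677T-1298A, 677C-1298A, 677C-1298C, and 677T-1298C, with frequencies of 0.555, 0.279, 0.163, and 0.003, respectively. Among coke-oven workers, the micronucleus frequency of haplotype pairs other than 677C-1298A/677C-1298A was significantly higher than that of 677C-1298A/677C-1298A (1.00±0.67 vs 0.60±0.41, P=0.04); in particular, the 677T-1298A/677T-1298A haplotype pair had a significantly higher micronucleus frequency than 677C-1298A/677C-1298A (1.08±0.71 vs 0.60±0.41, P=0.04). In neither the coke-oven workers nor the controls was a significant association found between the two individual SNPs and micronucleus frequency. Conclusion: MTHFR haplotypes may be one of the genetic susceptibility factors affecting chromosomal damage in coke-oven workers.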
15.
OBJECTIVES: Genotyping errors can induce biases in frequency estimates for haplotypes of single nucleotide polymorphisms (SNPs). Here, we considered the impact of SNP allele misclassification on haplotype odds ratio estimates from case-control studies of unrelated individuals. METHODS: We calculated bias analytically, using the haplotype counts expected in cases and controls under genotype misclassification. We evaluated the bias due to allele misclassification across a range of haplotype distributions using empirical haplotype frequencies within blocks of limited haplotype diversity. We also considered simple two- and three-locus haplotype distributions to understand the impact of haplotype frequency and number of SNPs on misclassification bias. RESULTS: We found that for common haplotypes (>5% frequency), realistic genotyping error rates (0.1-1% chance of miscalling an allele), and moderate relative risks (2-4), the bias was always towards the null and increased in magnitude with increasing error rate and increasing odds ratio. For common haplotypes, bias generally increased with increasing haplotype frequency, while for rare haplotypes, bias generally increased with decreasing frequency. When the chance of miscalling an allele was 0.5%, the median bias in haplotype-specific odds ratios for common haplotypes was generally small (<4% on the log odds ratio scale), but the bias for some individual haplotypes was larger (10-20%). Bias towards the null leads to a loss in power; the relative efficiency using a test statistic based upon misclassified haplotype data compared to a test based on the unobserved true haplotypes ranged from roughly 60% to 80%, and worsened with increasing haplotype frequency. CONCLUSIONS: The cumulative effect of small allele-calling errors across multiple loci can induce noticeable bias and reduce power in realistic scenarios. This has implications for the design of candidate gene association studies that utilize multi-marker haplotypes.
16.
Interpretation of dense single nucleotide polymorphism (SNP) follow-up of genome-wide association or linkage scan signals can be facilitated by establishing expectations for the behaviour of primary mapping signals upon fine-mapping, under both null and alternative hypotheses. We examined the inferences that can be made regarding the posterior probability of a real genetic effect and considered different disease-mapping strategies and prior probabilities of association. We investigated the impact of the extent of linkage disequilibrium between the disease SNP and the primary analysis signal and the extent to which the disease gene can be physically localised under these scenarios. We found that large increases in significance (>2 orders of magnitude) appear in the exclusive domain of genuine genetic effects, especially in the follow-up of genome-wide association scans or consensus regions from multiple linkage scans. Fine-mapping significant association signals that reside directly under linkage peaks yields little improvement in an already high posterior probability of a real effect. Following fine-mapping, those signals that increase in significance also demonstrate improved localisation. We found local linkage disequilibrium patterns around the primary analysis signal(s) and tagging efficacy of typed markers to play an important role in determining a suitable interval for fine-mapping. Our findings help inform the interpretation and design of dense SNP-mapping follow-up studies, thus facilitating discrimination between a genuine genetic effect and chance fluctuation (false positive).
17.
Mensah FK, Gilthorpe MS, Davies CF, Keen LJ, Adamson PJ, Roman E, Morgan GJ, Bidwell JL, Law GR 《Genetic epidemiology》2007,31(4):348-357
Inferring haplotypes from genotype data is commonly undertaken in population genetic association studies. Within such studies the importance of accounting for uncertainty in the inference of haplotypes is well recognised. We investigate the effectiveness of correcting for uncertainty using simple methods based on the output provided by the PHASE haplotype inference methodology. In case-control analyses investigating non-Hodgkin lymphoma and haplotypes associated with immune regulation, we find little effect of adjusting for uncertainty in inferred haplotypes. Using simulation, we introduce a higher degree of haplotype uncertainty than was present in our study data. The simulation represents two genetic loci, physically close on a chromosome, forming haplotypes. Considering a range of allele frequencies, degrees of linkage between the loci, and frequency of missing genotype data, we detail the characteristics of genetic regions which may be susceptible to the influence of haplotype uncertainty. Within our evaluation we find that bias is avoided by considering haplotype probabilities or using multiple imputation, provided that for each of these methods haplotypes are inferred separately for case and control populations; furthermore, using multiple imputation provides the facility to incorporate haplotype uncertainty in the estimation of confidence intervals. We discuss the implications of our findings within the context of the complexity of haplotype inference for larger marker-rich regions as would typically be encountered in genetic analyses.
18.
Several versions of the transmission/disequilibrium test (TDT) were applied to the two candidate genes ACTHR and Golf for bipolar illness. Analyses were carried out separately for paternal and maternal transmission. Evidence for linkage and association was found for ACTHR for paternal transmission in support of a parent-of-origin effect. Possible evidence for segregation distortion was found for one of the two markers for Golf for maternal transmission. © 1997 Wiley-Liss, Inc.
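For orientation, the basic biallelic TDT can be written in a few lines. The sketch below uses the standard McNemar-type statistic (b - c)² / (b + c), where b and c count how often heterozygous parents transmitted or did not transmit the allele of interest; it is not one of the specific TDT versions applied in the study, and the function name and counts are hypothetical. A parent-of-origin analysis of the kind described could, under that assumption, call it separately on counts from fathers and from mothers.

```python
import math


def tdt(transmitted, not_transmitted):
    """Basic TDT for one allele, using heterozygous parents only.

    Returns the chi-square statistic (1 degree of freedom) and its P value.
    """
    b, c = transmitted, not_transmitted
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    stat = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: the allele was transmitted 30 times and not transmitted 10 times.
stat, p_value = tdt(30, 10)
```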
19.
Nicolae DL 《Genetic epidemiology》2006,30(8):703-717
Many genetic analyses are done with incomplete information; for example, unknown phase in haplotype-based association studies. Measures of the amount of available information can be used for efficient planning of studies and/or analyses. In particular, the linkage disequilibrium (LD) between two sets of markers can be interpreted as the amount of information one set of markers contains for testing allele frequency differences in the second set, and measuring LD can be viewed as quantifying information in a missing data problem. We introduce a framework for measuring the association between two sets of variables; for example, genotype data for two distinct groups of markers, or haplotype and genotype data for a given set of polymorphisms. The goal is to quantify how much information is in one data set, e.g. genotype data for a set of SNPs, for estimating parameters that are functions of frequencies in the second data set, e.g. haplotype frequencies, relative to the ideal case of actually observing the complete data, e.g. haplotypes. In the case of genotype data on two mutually exclusive sets of markers, the measure determines the amount of multi-locus LD, and is equal to the classical measure r², if the sets consist each of one bi-allelic marker. In general, the measures are interpreted as the asymptotic ratio of sample sizes necessary to achieve the same power in case-control testing. The focus of this paper is on case-control allele/haplotype tests, but the framework can be extended easily to other settings like regressing quantitative traits on allele/haplotype counts, or tests on genotypes or diplotypes. We highlight applications of the approach, including tools for navigating the HapMap database [The International HapMap Consortium, 2003], and genotyping strategies for positional cloning studies.
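The classical two-marker case mentioned above can be illustrated with a short sketch. It computes r² = D² / (p_A(1 - p_A) p_B(1 - p_B)) with D = p_AB - p_A p_B from haplotype and allele frequencies, and reads 1 / r² as the approximate factor by which the sample size must grow to keep the same power when testing the linked marker instead of the causal one. This covers only the classical pairwise measure, not the general framework of the paper; the function name and frequencies are hypothetical.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """Classical r^2 between two bi-allelic markers.

    p_ab is the frequency of the haplotype carrying allele A at the first
    marker and allele B at the second; p_a and p_b are the allele frequencies.
    """
    d = p_ab - p_a * p_b
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0:
        raise ValueError("both markers must be polymorphic")
    return d * d / denom


# Hypothetical frequencies: D = 0.4 - 0.25 = 0.15.
r2 = ld_r_squared(0.4, 0.5, 0.5)
# Approximate sample size inflation needed at the linked marker.
sample_size_ratio = 1.0 / r2
```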
20.
Haplotype analysis is essential to studies of the genetic factors underlying human disease, but requires a large sample size of phase-known data. Recently, directly haplotyping individuals was suggested as a means of maximizing the phase-known data from a sample. Haplotyping, however, is much more labor-intensive than indirectly inferring haplotypes from genotypes (genotyping). This study uses simulations to compare the power of each methodology to detect associations between a haplotype and a trait or disease locus under conditions of varying linkage disequilibrium. The relative power of haplotyping over genotyping in association studies increases with decreasing sample size, decreasing linkage disequilibrium, increasing [corrected] numbers of marker loci, and decreasing numbers of different haplotypes. In addition, the frequency of the haplotype of interest and the magnitude of its association with the disease affect the power. From a cost-benefit standpoint, genotyping would be favored with large multiplicative risks (relative risk of haplotype >2.5). If case numbers are limiting rather than cost, haplotyping would maximize the information obtained. At small haplotype frequencies (e.g., <0.05), haplotyping is relatively more efficient, but there is little absolute power to detect associations under either methodology. Given the much larger laboratory resources required for direct haplotyping, genotyping would probably be favored under most conditions, but this must be balanced against the unit costs associated with recruitment and phenotyping. In the context of multipurpose, prospective cohort studies (e.g., the UK Biobank study), there may be a general value in establishing a series of directly haplotyped individuals to serve as controls for a number of alternative studies.