1.
Haplotype inference for tightly linked markers from general pedigrees remains a challenging problem. Only a few methods are available to efficiently and accurately estimate haplotype frequencies and reconstruct haplotypes for a large number of tightly linked markers from general pedigrees in the presence of missing data, and their performance has not been carefully and extensively evaluated. In this paper, we compare four published methods for haplotype reconstruction and frequency estimation for tightly linked markers from general pedigrees: HAPLORE, GENEHUNTER, PedPhase, and MERLIN. We review these methods and discuss the differences between them in terms of the models and computational strategies employed. We assess their performance based on simulations using pedigrees and haplotypes on tightly linked single nucleotide polymorphisms from real studies. We investigate the effect of several factors, including the missing rate, the departure from Hardy-Weinberg equilibrium, and the sample size, on the accuracy of haplotype inference. We also compare these methods with a widely used method for haplotype inference from unrelated individuals, PHASE, by treating individuals within a pedigree as unrelated samples. This comparison allows us to investigate the relative efficiency of haplotype inference using pedigree data. Our results indicate that incorporating pedigree information can improve the precision of haplotype frequency estimation and the accuracy of haplotype reconstruction. Among the four haplotyping methods capable of analyzing general pedigrees, HAPLORE and MERLIN have comparable performance and outperform the other two methods in almost all situations.
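Because the comparison above treats pedigree members as unrelated samples, a minimal sketch of the classic expectation-maximization (EM) estimator of haplotype frequencies from unphased genotypes of unrelated individuals may help fix ideas. This is a swapped-in illustration, not HAPLORE, GENEHUNTER, PedPhase, MERLIN, or PHASE (PHASE uses a Bayesian model). It assumes exactly two biallelic SNPs, genotypes coded as counts (0, 1, 2) of allele 1, Hardy-Weinberg proportions, no missing data, and a hypothetical function name `em_two_snp`.

```python
from itertools import product

# Haplotypes as (allele at SNP 1, allele at SNP 2), alleles coded 0/1.
HAPLOTYPES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def em_two_snp(genotypes, max_iter=200, tol=1e-10):
    """Estimate two-SNP haplotype frequencies from unphased genotypes.

    genotypes: list of (g1, g2), each the count (0, 1, 2) of allele 1.
    """
    freqs = {h: 0.25 for h in HAPLOTYPES}  # uniform starting values
    n = len(genotypes)
    for _ in range(max_iter):
        counts = {h: 0.0 for h in HAPLOTYPES}
        for g1, g2 in genotypes:
            # E-step: ordered haplotype pairs compatible with the genotype,
            # weighted by their probability under the current frequencies.
            pairs = [(a, b) for a, b in product(HAPLOTYPES, repeat=2)
                     if a[0] + b[0] == g1 and a[1] + b[1] == g2]
            weights = [freqs[a] * freqs[b] for a, b in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: expected haplotype counts divided by 2n chromosomes.
        new = {h: counts[h] / (2 * n) for h in HAPLOTYPES}
        delta = max(abs(new[h] - freqs[h]) for h in HAPLOTYPES)
        freqs = new
        if delta < tol:
            break
    return freqs


genotypes = [(0, 0)] * 10 + [(2, 2)] * 10 + [(1, 1)] * 2
print(em_two_snp(genotypes))
```

With two SNPs only the double heterozygotes (1, 1) are phase-ambiguous; the E-step splits them between the 00/11 and 01/10 resolutions in proportion to the current frequency products, so here they are pulled toward 00/11.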
2.
In this article, we develop a powerful test for identifying single nucleotide polymorphism (SNP)-sets that are predictive of survival with data from genome-wide association studies. We first group typed SNPs into SNP-sets based on genomic features and then apply a score test to assess the overall effect of each SNP-set on the survival outcome through a kernel machine Cox regression framework. This approach uses genetic information from all SNPs in the SNP-set simultaneously and accounts for linkage disequilibrium (LD), leading to a powerful test with reduced degrees of freedom when the typed SNPs are in LD with each other. This type of test also has the advantage of capturing the potentially nonlinear effects of the SNPs, SNP-SNP interactions (epistasis), and the joint effects of multiple causal variants. By simulating SNP data based on the LD structure of real genes from the HapMap project, we demonstrate that our proposed test is more powerful than the standard single SNP minimum P-value-based test for association studies with censored survival outcomes. We illustrate the proposed test with a real data application.
3.
Mendelian randomization analysis of a time‐varying exposure for binary disease outcomes using functional data analysis methods
Genetic Epidemiology
A Mendelian randomization (MR) analysis is performed to analyze the causal effect of an exposure variable on a disease outcome in observational studies, by using genetic variants that affect the disease outcome only through the exposure variable. This method has recently gained popularity among epidemiologists given the success of genetic association studies. Many exposure variables of interest in epidemiological studies are time varying, for example, body mass index (BMI). Although longitudinal data have been collected in many cohort studies, current MR studies only use one measurement of a time‐varying exposure variable, which cannot adequately capture the long‐term time‐varying information. We propose using the functional principal component analysis method to recover the underlying individual trajectory of the time‐varying exposure from the sparsely and irregularly observed longitudinal data, and then conduct MR analysis using the recovered curves. We further propose two MR analysis methods. The first assumes a cumulative effect of the time‐varying exposure variable on the disease risk, while the second assumes a time‐varying genetic effect and employs functional regression models. We focus on statistical testing for a causal effect. Our simulation studies mimicking the real data show that the proposed functional data analysis based methods incorporating longitudinal data have substantial power gains compared to standard MR analysis using only one measurement. We used the Framingham Heart Study data to demonstrate the promising performance of the new methods as well as inconsistent results produced by the standard MR analysis that relies on a single measurement of the exposure at some arbitrary time point.
4.
Population-based case-control studies measuring associations between haplotypes of single nucleotide polymorphisms (SNPs) and disease are increasingly popular, in part because haplotypes of a few "tagging" SNPs may serve as surrogates for variation in relatively large sections of the genome. Due to current technological limitations, haplotypes in cases and controls must be inferred from unphased genotypic data. Using individual-specific inferred haplotypes as covariates in standard epidemiologic analyses (e.g., conditional logistic regression) is an attractive analysis strategy, as it allows adjustment for nongenetic covariates, provides omnibus and haplotype-specific tests of association, and can estimate haplotype and haplotype × environment interaction effects. In principle, some adjustment for the uncertainty in inferred haplotypes should be made. Via simulation, we compare the performance (bias and mean squared error of haplotype and haplotype × environment interaction effect estimates) of several analytic strategies using inferred haplotypes in the context of matched case-control data. These strategies include using only the most likely haplotype assignment, the expectation substitution approach described by Stram et al. ([2003b] Hum. Hered. 55:179-190) and others, and an improper version of multiple imputation. For relatively uncomplicated haplotype structures and moderate haplotype relative risks (= 2), all methods performed comparably well (small bias with appropriately sized confidence intervals). For larger relative risks, the most likely haplotype and multiple imputation strategies showed noticeable bias towards the null; the expectation substitution strategy still performed well. When there was more uncertainty in the inferred haplotypes, the most likely haplotype and multiple imputation strategies showed even more bias towards the null, while the expectation substitution method had slightly smaller than nominal confidence intervals for larger relative risks (≥ 5). An application to progesterone-receptor haplotypes and endometrial cancer further illustrates that the performance of all these methods depends on how well the observed haplotypes "tag" the unobserved causal variant.
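The difference between the "most likely haplotype" and "expectation substitution" covariates can be shown for a single subject. A minimal sketch, assuming the posterior probabilities of the compatible haplotype pairs have already been produced by some phasing step; the function names and haplotype labels are hypothetical.

```python
from collections import defaultdict


def expected_haplotype_counts(pair_posteriors):
    """Expectation substitution: posterior-weighted copies of each haplotype.

    pair_posteriors: list of ((h1, h2), posterior_probability) for one
    subject; the probabilities are assumed to sum to 1.
    """
    counts = defaultdict(float)
    for (h1, h2), prob in pair_posteriors:
        counts[h1] += prob
        counts[h2] += prob
    return dict(counts)


def most_likely_haplotype_counts(pair_posteriors):
    """Hard call: copies of each haplotype in the single most probable pair."""
    (h1, h2), _ = max(pair_posteriors, key=lambda item: item[1])
    counts = defaultdict(int)
    counts[h1] += 1
    counts[h2] += 1
    return dict(counts)


# Hypothetical subject whose phase is ambiguous between AB/ab and Ab/aB.
posteriors = [(("AB", "ab"), 0.7), (("Ab", "aB"), 0.3)]
print(most_likely_haplotype_counts(posteriors))
print(expected_haplotype_counts(posteriors))
```

The hard call codes one copy each of AB and ab and discards the 0.3 posterior mass on Ab/aB; expectation substitution keeps it, which is one way to see why the hard-call strategy is biased toward the null when phase uncertainty is large.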
5.
We have developed a single nucleotide polymorphism (SNP) association scan statistic that takes into account the complex distribution of human genome variation in the identification of chromosomal regions with significant SNP associations. This scan statistic has wide applicability for genetic analysis, whether to identify important chromosomal regions associated with common diseases based on whole-genome SNP association studies or to identify disease susceptibility genes based on dense SNP positional candidate studies. To illustrate this method, we analyzed patterns of SNP associations on chromosome 19 in a large cohort study. Among 2,944 SNPs, we found seven regions that contained clusters of significantly associated SNPs. The average width of these regions was 35 kb, with a range of 10-72 kb. We compared the scan statistic results to Fisher's product method using a sliding window approach, which detected 22 regions with significant clusters of SNP associations. The average width of these regions was 131 kb, with a range of 10.1-615 kb. Given that the distances between SNPs are not taken into consideration in the sliding window approach, it is likely that a large fraction of these regions represent false positives. However, all seven regions detected by the scan statistic were also detected by the sliding window approach. The linkage disequilibrium (LD) patterns within the seven regions were highly variable, indicating that the clusters of SNP associations were not due to LD alone. The scan statistic developed here can be used to make gene-based or region-based SNP inferences about disease association.
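For reference, Fisher's product method combines k independent P-values as X = -2 Σ ln p_i, which follows a chi-square distribution with 2k degrees of freedom under the null. A minimal sketch of the sliding-window comparison, assuming windows of a fixed number of consecutive SNPs; the window width and function names are hypothetical. As the abstract notes, this ignores inter-SNP distance, and LD within a window also violates the independence assumption.

```python
import math


def fisher_combined(pvalues):
    """Fisher's product method: X = -2 * sum(ln p) ~ chi-square with 2k df."""
    k = len(pvalues)
    x = -2.0 * sum(math.log(p) for p in pvalues)
    # For even df = 2k the chi-square survival function has a closed form:
    # P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    half = x / 2.0
    term, series = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        series += term
    return x, math.exp(-half) * series


def sliding_window_pvalues(pvalues, width):
    """Combined P-value for every window of `width` consecutive SNPs.

    Windows are defined by SNP order only; physical distance is ignored.
    """
    return [fisher_combined(pvalues[i:i + width])[1]
            for i in range(len(pvalues) - width + 1)]


print(sliding_window_pvalues([0.2, 0.01, 0.03, 0.6, 0.9], width=3))
```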
6.
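Objective: To investigate single nucleotide polymorphisms (SNPs) and mutations of the MASS1 gene in Chinese patients with febrile seizures. Methods: Peripheral blood was drawn from 44 patients with febrile seizures and DNA was extracted. Primers were designed for all 35 coding exons of the MASS1 gene; the exons were amplified by PCR and sequenced by the Sanger dideoxy chain-termination method. Results: Three SNPs were found. One, located in exon 13, was a novel SNP not previously reported in the literature, 2625A>C, which changes the codon CGA to CGC; both codons encode arginine. The other two SNPs matched previous reports: 6666G>A in exon 29 and 7798T>G in exon 33. No new mutations were found. Conclusion: The MASS1 gene harbors multiple SNPs; these findings add to the groundwork for further study of the molecular genetic mechanisms of febrile seizures in Chinese individuals.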
7.
8.
A method for meta-analysis of molecular association studies (cited 8 times; self-citations: 0; citations by others: 8)
Although population-based molecular association studies are becoming increasingly popular, methodology for the meta-analysis of these studies has been neglected, particularly with regard to two issues: testing Hardy-Weinberg equilibrium (HWE), and pooling results in a manner that reflects a biological model of gene effect. We propose a process for pooling results from population-based molecular association studies which consists of the following steps: (1) checking HWE using chi-square goodness of fit; we suggest performing sensitivity analysis with and without studies that are in HWE. (2) Heterogeneity is then checked, and if present, possible causes are explored. (3) If no heterogeneity is present, regression analysis is used to pool data and to determine the gene effect. (4) If there is a significant gene effect, pairwise group differences are analysed and these data are allowed to 'dictate' the best genetic model. (5) Data may then be pooled using this model. This method is easily performed using standard software, and has the advantage of not assuming an a priori genetic model.
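Step (1) can be sketched for a single biallelic SNP as below, assuming genotype counts from one study's control group; the function name is hypothetical. With three genotype classes and one estimated allele frequency, the goodness-of-fit statistic has 1 degree of freedom, and for 1 df the upper-tail probability is erfc(sqrt(χ²/2)).

```python
import math


def hwe_chisq(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test of HWE at one biallelic SNP."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("monomorphic SNP: HWE test undefined")
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chisq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # 3 classes - 1 - 1 estimated allele frequency = 1 df;
    # for 1 df, P(X > x) = erfc(sqrt(x / 2)).
    return chisq, math.erfc(math.sqrt(chisq / 2))


print(hwe_chisq(25, 50, 25))  # exactly HWE proportions
print(hwe_chisq(50, 0, 50))   # no heterozygotes: strong departure
```

For small samples or rare alleles an exact HWE test is usually preferred to the chi-square approximation.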
9.
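Objective: To explore susceptibility loci for schizophrenia in the human 13q32 region in a Chinese population. Methods: The subjects were 91 nuclear families, each consisting of a Han Chinese patient with schizophrenia and both healthy parents. Polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) was used to genotype two single nucleotide polymorphisms (SNPs) in the 13q32 region, rs1886089 at the STK24 locus and rs2892679 at the GPC6 locus. A chi-square goodness-of-fit test was used to assess whether genotype frequencies conformed to Hardy-Weinberg equilibrium, and haplotype relative risk (HRR) analysis and the transmission disequilibrium test (TDT) were used to analyze the genotype data. Results: (1) Genotype frequencies at STK24/GPC6 were in Hardy-Weinberg equilibrium (P > 0.05). (2) HRR showed no association between rs1886089 or rs2892679 and schizophrenia (P > 0.05). TDT showed no significant transmission disequilibrium between parents and affected offspring (P > 0.05); that is, heterozygous parents did not preferentially transmit either allele to affected offspring. (3) Alleles of STK24 rs1886089 were associated with two clinical symptoms of schizophrenia, true auditory hallucinations and affective flattening (χ² = 6.005, df = 1, P < 0.05; χ² = 6.074, df = 2, P < 0.05). Alleles of GPC6 rs2892679 were associated with poverty of thought (χ² = 6.094, df = 2, P < 0.05). Conclusion: The STK24 rs1886089 and GPC6 rs2892679 polymorphisms are associated with three clinical symptoms of schizophrenia.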
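The TDT used above compares, among heterozygous parents, the number b who transmit a given allele to an affected child with the number c who transmit the other allele. A minimal sketch of the McNemar-type statistic (b - c)² / (b + c) with 1 degree of freedom; the function name and the counts in the call are hypothetical, not the study's data.

```python
import math


def tdt(b, c):
    """Transmission disequilibrium test (McNemar form), 1 df.

    b: heterozygous parents transmitting allele 1 to an affected child
    c: heterozygous parents transmitting allele 2
    """
    if b + c == 0:
        raise ValueError("no informative (heterozygous) transmissions")
    chisq = (b - c) ** 2 / (b + c)
    # Upper-tail probability of a 1-df chi-square.
    return chisq, math.erfc(math.sqrt(chisq / 2))


print(tdt(30, 10))  # hypothetical transmission counts
```

Only heterozygous parents are informative, which is why the test is robust to population stratification.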
10.
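Objective: To investigate the association between a common single nucleotide polymorphism (rs3746444 A>G) in the seed sequence of the human microRNA hsa-miR-499-3p and lung cancer risk in a southern Chinese population. Methods: In a case-control study, 526 patients with primary lung cancer and 526 healthy controls were enrolled. The rs3746444 A>G genotype in the hsa-miR-499-3p seed sequence was determined by polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP). Unconditional logistic regression in SAS 9.13 was used to adjust for confounders and to analyze the association between the variant and lung cancer. Results: With the wild-type homozygous rs3746444 AA genotype as the reference, carriers of the AG genotype had a 32% increased risk of lung cancer (adjusted OR = 1.32, 95% CI = 1.01-1.84), and carriers of the GG genotype had a 122% increased risk (OR = 2.22, 95% CI = 1.52-3.23). The number of rs3746444 G variant alleles showed a dose-response relationship with lung cancer risk (P for trend = 3.04 × 10⁻⁵). Conclusion: The rs3746444 A>G variant in the hsa-miR-499-3p seed sequence can increase the risk of lung cancer in this population.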
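The dose-response idea, risk rising with the number of G alleles, can be illustrated with a Cochran-Armitage trend test on a 2 × 3 genotype table using scores 0, 1, 2. This is a sketch under stated assumptions: the study itself used logistic regression adjusted for confounders, which this unadjusted test does not reproduce, and the function name and counts in the call are hypothetical.

```python
import math


def trend_test(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend test; returns (Z, two-sided P).

    cases, controls: counts for genotypes AA, AG, GG (in score order).
    """
    n = [r + s for r, s in zip(cases, controls)]
    total_cases = sum(cases)
    total = sum(n)
    p = total_cases / total  # overall case fraction
    # Score-weighted excess of cases over expectation under no trend
    u = sum(t * (r - ni * p) for t, r, ni in zip(scores, cases, n))
    var = p * (1 - p) * (
        sum(t * t * ni for t, ni in zip(scores, n))
        - sum(t * ni for t, ni in zip(scores, n)) ** 2 / total
    )
    z = u / math.sqrt(var)
    return z, math.erfc(abs(z) / math.sqrt(2))


print(trend_test(cases=[10, 20, 30], controls=[30, 20, 10]))  # hypothetical
```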
11.
Most findings from genome-wide association studies (GWAS) are consistent with a simple disease model at a single nucleotide polymorphism, in which each additional copy of the risk allele increases risk by the same multiplicative factor, in contrast to dominance or interaction effects. As others have noted, departures from this multiplicative model are difficult to detect. Here, we seek to quantify this both analytically and empirically. We show that imperfect linkage disequilibrium (LD) between causal and marker loci distorts disease models, with the power to detect such departures dropping off very quickly: decaying as a function of r⁴, where r² is the usual squared correlation between the causal and marker loci, in contrast to the well-known result that power to detect a multiplicative effect decays as a function of r². We perform a simulation study with empirical patterns of LD to assess how this disease model distortion is likely to impact GWAS results. Among loci where association is detected, we observe that there is reasonable power to detect substantial deviations from the multiplicative model, such as for dominant and recessive models. Thus, it is worth explicitly testing for such deviations routinely. Genet. Epidemiol. 35: 278-290, 2011. © 2011 Wiley-Liss, Inc.
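A rough illustration of why the r⁴ decay matters, under an assumption made here only for illustration: that the non-centrality parameter λ of each 1-df test is proportional to the sample size n times the stated power of r. Holding λ (and hence power) fixed then gives the sample-size inflation relative to genotyping the causal locus itself (r² = 1):

```latex
\begin{aligned}
\lambda_{\text{mult}} &\propto n\,r^{2}, &\qquad \lambda_{\text{dev}} &\propto n\,r^{4},\\
n_{\text{mult}} &\propto r^{-2}, & n_{\text{dev}} &\propto r^{-4},\\
r^{2} = 0.5:\quad n_{\text{mult}} &\times 2, & n_{\text{dev}} &\times 4 .
\end{aligned}
```

So at a marker with r² = 0.5, detecting a departure from the multiplicative model would need roughly twice the sample-size inflation that detecting the main effect needs.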
12.
Lee WC. Genetic Epidemiology, 2004, 27(1):1-13
Family-based association studies have gained in popularity for mapping disease-susceptibility gene(s) of complex diseases. However, recruiting family controls is often more difficult than recruiting unrelated controls. The author proposes a case-control study, where the possible biases due to population stratification are controlled by matching in the design stage and by genomic control in the data-analytic stage. The matching is based on a set of "stratum-delineating variables," such as race, ethnicity, nationality, ancestry, and birthplace; the genomic control is based on typing a number of null markers across the genome and applying the principle of multiplicative scaling of the chi-square distribution. It pays to match carefully to obtain a higher proportion of correctly matched sets, as computer simulation showed that this increases the power of the study. If matching is crude, one loses power but still has the correct type I error rate after genomic control. Power studies showed that the numbers of affected subjects required for the pair-matched study are comparable to those required by the case-parents design, if the study is conducted in a homogeneous population. As the (control-to-case) matching ratio increases, the number of affected subjects required decreases. With the matching ratio tending toward infinity, the number required shrinks roughly by half. The case-control study with matching and genomic control frees us from family bondage, and a genetic problem as complicated as gene mapping can now be studied using simple epidemiologic methods.
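The genomic-control step can be sketched as follows, assuming the widely used median-based estimator of the inflation factor λ (the median of the null-marker statistics divided by the 1-df chi-square median, about 0.4549) and the common convention of not deflating when λ < 1. Whether the paper used exactly this estimator is an assumption; the function name is hypothetical.

```python
import statistics

CHI2_1DF_MEDIAN = 0.4549364  # median of the chi-square distribution, 1 df


def genomic_control(null_chisq, test_chisq):
    """Estimate the inflation factor lambda and rescale test statistics.

    null_chisq: 1-df association statistics at presumed-null markers.
    test_chisq: 1-df statistics at candidate markers.
    """
    lam = statistics.median(null_chisq) / CHI2_1DF_MEDIAN
    lam = max(lam, 1.0)  # common convention: never inflate statistics
    # Multiplicative scaling: chi-square / lambda is referred to chi-square(1).
    return lam, [x / lam for x in test_chisq]


lam, adjusted = genomic_control([0.2, 0.9, 1.4, 0.6, 2.3], [12.0, 4.1])
print(lam, adjusted)
```

The median is used rather than the mean because it is robust to the few null markers that may in fact be associated.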
13.
On the advantage of haplotype analysis in the presence of multiple disease susceptibility alleles (cited 21 times; self-citations: 0; citations by others: 21)
We investigated the effect of multiple susceptibility alleles at a single disease locus on the statistical power of a likelihood ratio test to detect association between alleles at a marker locus and a disease phenotype in a case-control design. Using simplifying assumptions to obtain the joint frequency distribution of marker and disease locus alleles, we present numerical results that illustrate the impact of historical variation in initial associations between marker alleles and susceptibility alleles on the power of a likelihood ratio test for association. Our results show that an increase in the number of susceptibility alleles produces a decrease in the power of the likelihood ratio test. The decrease in power in the presence of multiple susceptibility alleles, however, is smaller for markers with multiple alleles than for markers with two alleles. We investigate the implications of this observation for tests of association based on haplotypes made up of tightly linked single-nucleotide polymorphisms (SNPs). Our results suggest that an analysis based on haplotypes can be advantageous over an analysis based on individual SNPs in the presence of multiple susceptibility alleles, particularly when linkage disequilibrium between SNPs is weak. The results provide motivation for further development of statistical methods based on haplotypes for assessing the potential for association methods to identify and locate complex disease genes.
14.
The concept of haplotype sharing (HS) has received considerable attention recently, and several haplotype association methods have been proposed. Here, we extend the work of Beckmann and colleagues [2005 Hum. Hered. 59:67-78], who derived an HS statistic (BHS) as a special case of Mantel's space-time clustering approach. The Mantel-type HS statistic correlates genetic similarity with phenotypic similarity across pairs of individuals. While phenotypic similarity is measured as the mean-corrected cross product of phenotypes, we propose to incorporate information on the underlying genetic model in the measurement of genetic similarity. Specifically, for the recessive and dominant modes of inheritance, we suggest using the minimum and maximum, respectively, of the shared length of haplotypes around a marker locus for pairs of individuals. If the underlying genetic model is unknown, we propose a model-free HS Mantel statistic using the max-test approach. We compare our novel HS statistics to BHS using simulated case-control data and illustrate their use by re-analyzing data from a candidate region of chromosome 18q from the Rheumatoid Arthritis (RA) Consortium. We demonstrate that our approach is point-wise valid and superior to BHS. In the re-analysis of the RA data, we identified three regions with point-wise P-values < 0.005 containing six known genes (PMIP1, MC4R, PIGN, KIAA1468, TNFRSF11A and ZCCHC2) which might be worth follow-up.
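A minimal sketch of a Mantel-type statistic of the kind described: the sum over pairs of genetic similarity times the mean-corrected phenotype cross product, with a one-sided permutation P-value. The genetic similarity matrix is taken as given (for example, the minimum or maximum shared haplotype length around the marker for the recessive or dominant version); how it is computed, the function names, and the permutation count are assumptions.

```python
import random


def mantel_statistic(similarity, y):
    """Sum over pairs i < j of similarity[i][j] * (y_i - ybar) * (y_j - ybar)."""
    n = len(y)
    ybar = sum(y) / n
    return sum(similarity[i][j] * (y[i] - ybar) * (y[j] - ybar)
               for i in range(n) for j in range(i + 1, n))


def mantel_test(similarity, y, n_perm=999, seed=1):
    """One-sided permutation P-value: large values mean that genetically
    similar pairs also tend to have similar phenotypes."""
    observed = mantel_statistic(similarity, y)
    rng = random.Random(seed)
    permuted = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(permuted)  # break the genotype-phenotype link
        if mantel_statistic(similarity, permuted) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical: subjects 0-1 and 2-3 share long haplotype segments.
sim = [[0, 10, 0, 0], [10, 0, 0, 0], [0, 0, 0, 10], [0, 0, 10, 0]]
print(mantel_test(sim, [1, 1, 0, 0]))
```

Permuting phenotypes leaves the mean unchanged, so the mean-corrected cross products stay comparable across permutations.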
15.
Moskvina V, Norton N, Williams N, Holmans P, Owen M, O'Donovan M. Genetic Epidemiology, 2005, 28(3):273-282
Several groups have developed methods for estimating allele frequencies in DNA pools as a fast and cheap way of detecting allelic association between genetic markers and disease. To obtain accurate estimates of allele frequencies, a correction factor k for the degree to which measurement of allele-specific products is biased is generally applied. Factor k is usually obtained as the ratio of the two allele-specific signals in samples from heterozygous individuals, a step that can significantly impair throughput and increase cost. We have systematically investigated the properties of k through the use of empirical and simulated data. We show that for the dye terminator primer extension genotyping method we have applied, the correction factor k is substantially influenced by the dye terminators incorporated, but also by the terminal 3' base of the extension primer. We also show that the variation in k is large enough to result in unacceptable error rates if association studies are conducted without regard to k. We show that the impact of ignoring k can be neutralized by applying a correction factor k_max that can be easily derived, but at the potential cost of an increase in type I error. Finally, based upon observed distributions of k, we derive a method for estimating the probability that pooled data reflect significant differences in allele frequencies between the subjects comprising the pools. By controlling the error rates in the absence of knowledge of the appropriate SNP-specific correction factors, each approach enhances the performance of DNA pooling, while considerably streamlining the method by reducing time and cost.
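The correction described above can be sketched as: estimate k as the mean ratio of allele-A to allele-B signal in heterozygotes, then correct the pooled allele-A frequency as A / (A + k·B). This is a minimal sketch assuming that commonly used form of the correction; the k_max approach from the abstract is not reproduced, and the function names and signal values are hypothetical.

```python
def estimate_k(het_signals):
    """Mean ratio of allele-A to allele-B signal in heterozygous individuals.

    het_signals: list of (signal_a, signal_b) pairs; in a heterozygote the
    true allele ratio is 1:1, so any departure reflects measurement bias.
    """
    ratios = [a / b for a, b in het_signals]
    return sum(ratios) / len(ratios)


def pooled_frequency(signal_a, signal_b, k=1.0):
    """Allele-A frequency in a DNA pool, corrected for unequal signals.

    With k = 1 (no correction) this is the raw signal proportion.
    """
    return signal_a / (signal_a + k * signal_b)


k = estimate_k([(2.0, 1.0), (2.2, 1.1), (1.8, 0.9)])  # hypothetical
print(k, pooled_frequency(2.0, 1.0), pooled_frequency(2.0, 1.0, k))
```

In this example, ignoring k reports an allele frequency of about 0.67 for a pool whose signals, after correction, correspond to 0.5, which is the kind of error the abstract warns about.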
16.
In this paper we explore the use of biological knowledge to supplement statistical analysis in identifying genes associated with disease. It has been previously found that the 402H variant in complement factor H (CFH) is associated with risk of developing age-related macular degeneration (AMD). By focusing on the single nucleotide polymorphisms (SNPs) in the complement pathway, we were able to use the genotype data from a recently published AMD genome-wide association study to identify two additional genes, C7 and MBL2, as potentially associated with subtypes of AMD. Two SNPs situated in introns of C7 and MBL2 could help differentiate between two forms of AMD: wet (the more severe form of AMD) and dry (the milder form of AMD). We identified a C7 haplotype associated with protection against developing wet AMD among individuals homozygous for the CFH risk allele 402H (p-value 0.001 for wet AMD versus dry AMD, odds ratio (OR) 0.16, OR 95% CI 0.05-0.49) as well as among individuals with at least one CFH risk allele (p-value 0.007 for wet AMD versus dry AMD, OR 0.35, OR 95% CI 0.16-0.77). The fact that the statistical scores for the C7 and MBL2 SNPs were significant (low false discovery rate) at the pathway level, but not at the genome level, suggests that focusing at the pathway level can be beneficial for identifying SNP signals that would be lost at the genome-wide level.
17.
Wei Pan. Genetic Epidemiology, 2009, 33(6):497-507
We consider detecting associations between a trait and multiple single nucleotide polymorphisms (SNPs) in linkage disequilibrium (LD). To maximize the use of information contained in multiple SNPs while minimizing the cost of large degrees of freedom (DF) in testing multiple parameters, we first theoretically explore the sum test, derived under a working assumption of a common association strength between the trait and each SNP, which tests the corresponding parameter with only one DF. In scenarios where the association strengths between the trait and the SNPs are close to each other (and in the same direction), as considered by Wang and Elston [Am. J. Hum. Genet. [2007] 80:353–360], we show with simulated data that the sum test is powerful compared with several existing tests; otherwise, the sum test may have much reduced power. To overcome this limitation, and based on our theoretical analysis of the sum test, we propose five new tests that are closely related to each other and are shown to perform consistently well across a wide range of scenarios. We point out the close connection of the proposed tests to the Goeman test. Furthermore, we derive the asymptotic distributions of the proposed tests so that P-values can be easily calculated, in contrast to the computationally demanding permutations or simulations required for the Goeman test. A distinguishing feature of the five new tests is their use of a diagonal working covariance matrix, rather than a full covariance matrix as used in the usual Wald or score test. We recommend the routine use of two of the new tests, along with several other tests, to detect disease associations with multiple linked SNPs. Genet. Epidemiol. 33:497–507, 2009. © 2009 Wiley-Liss, Inc.
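The sum test can be sketched for a binary trait as a score test in which the per-SNP score components are added with equal weight; this is equivalent to a 1-DF score test on each subject's total allele count across the SNPs. The sketch assumes a logistic model with an intercept and no covariates, uses the null variance ȳ(1 - ȳ) Σ(s_i - s̄)², and does not implement the five new tests; the function name and data are hypothetical.

```python
import math


def sum_test(genotypes, y):
    """Sum test for a binary trait and several SNPs (1 df).

    genotypes: per-subject lists of allele counts at each SNP.
    y: per-subject 0/1 disease status.
    """
    n = len(y)
    s = [sum(g) for g in genotypes]  # equal weights: common-effect working model
    ybar = sum(y) / n
    sbar = sum(s) / n
    # Sum over SNPs of the score components U_j = sum_i (y_i - ybar) * x_ij
    score = sum((si - sbar) * (yi - ybar) for si, yi in zip(s, y))
    # Null variance of that sum under an intercept-only logistic model
    var = ybar * (1 - ybar) * sum((si - sbar) ** 2 for si in s)
    stat = score ** 2 / var
    return stat, math.erfc(math.sqrt(stat / 2))  # chi-square, 1 df


genotypes = [[0, 0], [2, 2], [0, 0], [2, 2]]  # hypothetical
print(sum_test(genotypes, [0, 1, 0, 1]))
```

Because the SNP scores are simply added, effects in opposite directions cancel, which is the loss of power the abstract describes when association strengths differ in sign.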
18.
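Objective: To investigate the association of polymorphisms in the 5-lipoxygenase-activating protein gene (ALOX5AP) with ischemic stroke (IS) and its subtypes. Methods: Using a phenotype-discordant sib-pair design, we analyzed the association of five ALOX5AP single nucleotide polymorphisms (SNPs) (rs10507391, rs9551963, rs12429692, rs4293222, rs4360791) with IS and its subtypes. Multivariable analysis used generalized estimating equations (GEE), and association analysis used the family-based association test (FBAT). Results: GEE analysis of 446 phenotype-discordant sib pairs from 240 IS families showed that diabetes, hypertension, and diastolic blood pressure, triglyceride, and high-density lipoprotein cholesterol levels were associated with IS risk (P < 0.05). FBAT showed that, under a dominant model, the rs9551963 A and rs4360791 A alleles were associated with IS (Z = -2.50, -2.52; P < 0.05), large-artery stroke (Z = -2.16, -2.27; P < 0.05), and diabetes (Z = -2.33, -2.51; P < 0.05); under an additive model, they were associated with small-artery occlusion stroke (Z = -2.38, 2.08; P < 0.05) and hyperlipidemia (Z = -2.63, 2.73; P < 0.05). Under a dominant model, the rs12429692 A allele was significantly associated with small-artery occlusion stroke (Z = 2.27, P = 0.02), and the rs4360791 A allele was significantly associated with hypertension (Z = 2.19, P = 0.03). Conclusion: ALOX5AP polymorphisms are associated with ischemic stroke.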
19.
With cost-effective high-throughput single nucleotide polymorphism (SNP) arrays now becoming widely available, it is highly anticipated that SNPs will soon become the markers of choice in whole genome screens. This optimism raises a great deal of interest in assessing whether dense SNP maps offer at least as much information as their microsatellite (MS) counterparts. Factors considered to date include information content, strength of linkage signals, and the effect of linkage disequilibrium. In the current report, we focus on investigating the relative merits of SNPs vs. MS markers for disease gene localization. For our comparisons, we consider three novel confidence interval estimation procedures based on confidence set inference (CSI) using affected sib-pair data. Two of these procedures are multipoint in nature, enabling them to capitalize on dense SNPs with limited heterozygosity. The other procedure uses markers one at a time (two-point), but is much more computationally efficient. In addition to marker type, we also assess the effects of a number of other factors, including map density and marker heterozygosity, on disease gene localization through an extensive simulation study. Our results clearly show that confidence intervals derived from the CSI multipoint procedures can place the trait locus in much shorter chromosomal segments using densely saturated SNP maps than using sparse MS maps. Finally, it is interesting (although not surprising) to note that, should one wish to perform a quick preliminary genome screen, the two-point CSI procedure would be a preferred, computationally cost-effective choice.