Similar literature
20 similar articles found
1.
African Americans are admixed with genetic contributions from European and African ancestral populations. Admixture mapping leverages this information to map genes influencing differential disease risk across populations. We performed admixture and association mapping in 3,300 African American current or former smokers from the COPDGene Study. We analyzed estimated local ancestry and SNP genotype information to identify regions associated with FEV1/FVC, the ratio of forced expiratory volume in one second to forced vital capacity, measured by spirometry performed after bronchodilator administration. Global African ancestry was inversely associated with FEV1/FVC (P = 0.035). Genome‐wide admixture analysis, controlling for age, gender, body mass index, current smoking status, pack‐years smoked, and four principal components summarizing the genetic background of African Americans in the COPDGene Study, identified a region on chromosome 12q14.1 associated with FEV1/FVC (P = 2.1 × 10⁻⁶) when regressed on local ancestry. Allelic association in this region of chromosome 12 identified an intronic variant in FAM19A2 (rs348644) as associated with FEV1/FVC (P = 1.76 × 10⁻⁶). By combining admixture and association mapping, a marker on chromosome 12q14.1 was identified as being associated with reduced FEV1/FVC ratio among African Americans in the COPDGene Study.

2.
Accurate assignment of copy number at known copy number variant (CNV) loci is important both for increasing understanding of the structural evolution of genomes and for carrying out association studies of copy number with disease. As with calling SNP genotypes, the task can be framed as a clustering problem, but for a number of reasons assigning copy number is much more challenging. CNV assays have lower signal‐to‐noise ratios than SNP assays, often display heavy‐tailed and asymmetric intensity distributions, contain outlying observations, and may exhibit systematic technical differences among different cohorts. In addition, the number of copy‐number classes at a CNV in the population may be unknown a priori. Due to these complications, automatic and robust assignment of copy number from array data remains a challenging problem. We have developed a copy number assignment algorithm, CNVCALL, for a targeted CNV array, such as that used by the Wellcome Trust Case Control Consortium's recent CNV association study. We use a Bayesian hierarchical mixture model that robustly identifies both the number of different copy number classes at a specific locus and the relative copy number for each individual in the sample. This approach is fully automated, which is a critical requirement when analyzing large numbers of CNVs. We illustrate the method's performance using real data from the Wellcome Trust Case Control Consortium's CNV association study and using simulated data. Genet. Epidemiol. 35:536‐548, 2011. © 2011 Wiley‐Liss, Inc.

3.
Through genome‐wide association studies, numerous genes have been shown to be associated with multiple phenotypes. To determine the overlap of genetic susceptibility of correlated phenotypes, one can apply multivariate regression or dimension reduction techniques, such as principal components analysis, and test for the association with the principal components of the phenotypes rather than the individual phenotypes. However, as these approaches test whether there is a genetic effect for at least one of the phenotypes, a significant test result does not necessarily imply pleiotropy. Recently, a method called Pleiotropy Estimation and Test Bootstrap (PET‐B) has been proposed to specifically test for pleiotropy (i.e., that two normally distributed phenotypes are both associated with the single nucleotide polymorphism of interest). Although the method examines the genetic overlap between the two quantitative phenotypes, the extension to binary phenotypes, three or more phenotypes, and rare variants is not straightforward. We provide two approaches to formally test this pleiotropic relationship in multiple scenarios. These approaches depend on permuting the phenotypes of interest and comparing the set of observed P‐values to the set of permuted P‐values in relation to the origin (e.g., a vector of zeros) either using the Hausdorff metric or a cutoff‐based approach. These approaches are appropriate for categorical and quantitative phenotypes, more than two phenotypes, common variants and rare variants. We evaluate these approaches under various simulation scenarios and apply them to the COPDGene study, a case‐control study of chronic obstructive pulmonary disease in current and former smokers.

4.
Copy number variations (CNVs) in the human genome provide exciting candidates for functional polymorphisms. Hence, we now assess association between CNV carrier status and disease status by evaluating the signal intensity of SNP genotyping assays. Here, we present a novel statistical method designed to perform such inference and apply this method to a known CNV in a bipolar disorder linkage region. Using Bayesian computations, we calculate the posterior probability for carrier status of a CNV in each individual of a sample by jointly analyzing genotype information and hybridization intensity. We model the signal intensity as a mixture of normal distributions, allowing for locus‐specific and allele‐specific distributions. Using an expectation maximization algorithm, we estimate the parameters of these distributions and use these estimates for inferring the carrier status of each individual and the boundaries of the CNV. We applied the method to a previously described common deletion on 8q24, a region consistently showing linkage to bipolar disorder, in a sample of 3,512 individuals, and unambiguously inferred 172 heterozygous and 1 homozygous deletion carrier. We observed no significant association between bipolar disorder and carrier status. We carefully assessed the validity of the inferred carrier status and observed no indication of errors. Furthermore, the algorithm precisely identifies the boundaries of the CNV. Finally, we assessed the power of this algorithm to detect shorter CNVs by sub‐sampling from the SNPs covered by this deletion, demonstrating that our EM algorithm produces precise estimates of carrier status. Genet. Epidemiol. 2009. © 2008 Wiley‐Liss, Inc.
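The mixture-of-normals idea in the abstract above can be illustrated with a deliberately simplified sketch: a one-dimensional, two-component normal mixture fitted by expectation maximization, with the per-sample posterior probability used to infer carrier status. This is not the authors' model, which is locus- and allele-specific and jointly uses genotype information. The function names, the simulated intensities, and the assumption that deletion carriers show lower intensity are all hypothetical.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def em_two_normals(values, n_iter=200):
    """Fit a two-component 1-D normal mixture by EM (illustrative only).

    Returns (weights, means, sds, posteriors); posteriors[i][k] is the
    posterior probability that observation i belongs to component k.
    Component 0 starts at the lowest value, component 1 at the highest.
    """
    lo, hi = min(values), max(values)
    means = [lo, hi]                     # crude start: the two extremes
    spread = (hi - lo) / 4.0 or 1.0      # shared starting spread
    sds = [spread, spread]
    weights = [0.5, 0.5]
    post = []
    for _ in range(n_iter):
        # E-step: posterior component membership for every observation.
        post = []
        for x in values:
            dens = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)]
            total = sum(dens) or 1e-300  # guard against underflow to zero
            post.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and spread of each component.
        for k in range(2):
            resp = max(sum(p[k] for p in post), 1e-12)
            weights[k] = resp / len(values)
            means[k] = sum(p[k] * x for p, x in zip(post, values)) / resp
            var = sum(p[k] * (x - means[k]) ** 2 for p, x in zip(post, values)) / resp
            sds[k] = max(math.sqrt(var), 1e-3)  # floor avoids a collapsed component
    return weights, means, sds, post


# Hypothetical data: 5 deletion carriers (lower intensity) and 95 non-carriers.
random.seed(0)
intensities = [random.gauss(-0.5, 0.05) for _ in range(5)]
intensities += [random.gauss(0.0, 0.05) for _ in range(95)]

weights, means, sds, post = em_two_normals(intensities)
carriers = [i for i, p in enumerate(post) if p[0] > 0.5]
print("estimated means:", [round(m, 3) for m in means])
print("inferred carriers (indices):", carriers)
```

Because the two simulated clusters are far apart relative to their spread, the posterior probabilities are close to 0 or 1; with real, noisier intensities the posteriors would carry more of the uncertainty.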

5.
Insertions and deletions (INDELs) represent a significant fraction of interindividual variation in the human genome, yet their contribution to phenotypes is poorly understood. To confirm the quality of imputed INDELs and investigate their roles in mediating cardiometabolic phenotypes, genome‐wide association and linkage analyses were performed for 15 phenotypes with 1,273,952 imputed INDELs in 1,024 Mexican‐origin Americans. Imputation quality was validated using whole exome sequencing with an average kappa of 0.93 in common INDELs (minor allele frequencies [MAFs] ≥ 5%). Association analysis revealed one genome‐wide significant association signal for the cholesterylester transfer protein gene (CETP) with high‐density lipoprotein levels (rs36229491, P = 3.06 × 10⁻¹²); linkage analysis identified two peaks with logarithm of the odds (LOD) > 5 (rs60560566, LOD = 5.36 with insulin sensitivity (S_I) and rs5825825, LOD = 5.11 with adiponectin levels). Suggestive overlapping signals between linkage and association were observed: rs59849892 in the WSC domain containing 2 gene (WSCD2) was associated and nominally linked with S_I (P = 1.17 × 10⁻⁷, LOD = 1.99). This gene has been implicated in glucose metabolism in human islet cell expression studies. In addition, rs201606363 was linked and nominally associated with low‐density lipoprotein (P = 4.73 × 10⁻⁴, LOD = 3.67), apolipoprotein B (P = 1.39 × 10⁻³, LOD = 4.64), and total cholesterol (P = 1.35 × 10⁻², LOD = 3.80) levels. rs201606363 is an intronic variant of the UBE2F‐SCLY (where UBE2F is ubiquitin‐conjugating enzyme E2F and SCLY is selenocysteine lyase) fusion gene that may regulate cholesterol through selenium metabolism. In conclusion, these results confirm the feasibility of imputing INDELs from array‐based single nucleotide polymorphism (SNP) genotypes. Analysis of these variants using association and linkage replicated previously identified SNP signals and identified multiple novel INDEL signals. These results support the inclusion of INDELs into genetic studies to more fully interrogate the spectrum of genetic variation.

6.
Copy number variants (CNVs) play an important role in a number of human diseases, but the accurate calling of CNVs remains challenging. Most current approaches to CNV detection use raw read alignments, which are computationally intensive to process. We use a regression tree-based approach to call germline CNVs from whole-genome sequencing (WGS, >18x) variant call sets in 6,898 samples across four European cohorts, and describe a rich large variation landscape comprising 1,320 CNVs. Eighty-one percent of detected events have been previously reported in the Database of Genomic Variants. Twenty-three percent of high-quality deletions affect entire genes, and we recapitulate known events such as the GSTM1 and RHD gene deletions. We test for association between the detected deletions and 275 protein levels in 1,457 individuals to assess the potential clinical impact of the detected CNVs. We describe complex CNV patterns underlying an association with levels of the CCL3 protein (MAF = 0.15, p = 3.6 × 10⁻¹²) at the CCL3L3 locus, and a novel cis-association between a low-frequency NOMO1 deletion and NOMO1 protein levels (MAF = 0.02, p = 2.2 × 10⁻⁷). This study demonstrates that existing population-wide WGS call sets can be mined for germline CNVs with minimal computational overhead, delivering insight into a less well-studied, yet potentially impactful class of genetic variant.

7.
Most findings from genome‐wide association studies (GWAS) are consistent with a simple disease model at a single nucleotide polymorphism, in which each additional copy of the risk allele increases risk by the same multiplicative factor, in contrast to dominance or interaction effects. As others have noted, departures from this multiplicative model are difficult to detect. Here, we seek to quantify this both analytically and empirically. We show that imperfect linkage disequilibrium (LD) between causal and marker loci distorts disease models, with the power to detect such departures dropping off very quickly: decaying as a function of r⁴, where r² is the usual correlation between the causal and marker loci, in contrast to the well‐known result that power to detect a multiplicative effect decays as a function of r². We perform a simulation study with empirical patterns of LD to assess how this disease model distortion is likely to impact GWAS results. Among loci where association is detected, we observe that there is reasonable power to detect substantial deviations from the multiplicative model, such as for dominant and recessive models. Thus, it is worth explicitly testing for such deviations routinely. Genet. Epidemiol. 35: 278‐290, 2011. © 2011 Wiley‐Liss, Inc.
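A back-of-the-envelope reading of the scaling in the abstract above: if the signal for a multiplicative effect at the marker shrinks in proportion to r² and the signal for a departure from that model shrinks in proportion to r⁴, then the sample size needed to keep the same power grows roughly as 1/r² and 1/r⁴ respectively. The sketch below only tabulates those two factors. The function name and the "sample-size inflation" framing are assumptions made for illustration and are not taken from the paper.

```python
def sample_size_inflation(r2: float) -> dict:
    """Rough sample-size multipliers at a marker in LD r2 with the causal locus.

    Assumes the test signal scales with r^2 for a multiplicative effect and
    with r^4 = (r^2)^2 for a departure from the multiplicative model.
    """
    if not 0.0 < r2 <= 1.0:
        raise ValueError("r2 must be in (0, 1]")
    return {
        "multiplicative": 1.0 / r2,
        "departure": 1.0 / (r2 ** 2),
    }


for r2 in (1.0, 0.8, 0.5):
    factors = sample_size_inflation(r2)
    print(f"r2={r2}: multiplicative x{factors['multiplicative']:.2f}, "
          f"departure x{factors['departure']:.2f}")
```

At r² = 0.5, for instance, the multiplicative test needs about twice the sample while the test for a departure needs about four times, which is one way to see why such departures are hard to detect at tag SNPs.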

8.
Although there is increasing support for an important contribution of copy number variation (CNV) to the genetic architecture of complex disease, few methods have been developed for the analysis of such variation in the context of genetic association studies. In this paper, we propose a generalization of family-based association tests (FBATs) to allow for the analysis of CNVs at a genome-wide level. We translate the popular FBAT approach so that, instead of genotypes, raw intensity values that reflect copy number are used directly in the test statistic, thereby bypassing the need for a CNV genotyping algorithm. Moreover, both inherited and de novo CNVs can be analyzed without any prior knowledge about the type of CNV, making it easily applicable to large-scale association studies. All robustness properties of the genotype FBAT approach are maintained and all previously developed FBAT extensions, including FBATs for time-to-onset, multivariate FBATs, and FBAT-testing strategies, can be directly transferred to the analysis of CNVs. Using simulation studies, we evaluate the power and the robustness of the new approach. Furthermore, for those CNVs that can be genotyped, we compare FBATs based on genotype calls with FBATs that are directly based on the intensity data. An application to one of the first CNV genome-wide-association studies of asthma identifies a very plausible candidate gene. A software implementation of the approach is freely available at http://www.hsph.harvard.edu/research/iuliana-ionita/software. The approach has also been completely integrated in the PBAT software package.

9.
Genetic association studies have increasingly recognized variant effects on multiple phenotypes. Chronic obstructive pulmonary disease (COPD) is a heterogeneous disease with environmental and genetic causes. Multiple genetic variants have been associated with COPD, many of which show significant associations with additional phenotypes. However, it is unknown whether these associations represent biological pleiotropy or whether they exist through correlation of related phenotypes (“mediated pleiotropy”). Using 6,670 subjects from the COPDGene study, we describe the association of known COPD susceptibility loci with other COPD-related phenotypes and distinguish whether these act directly on the phenotypes (i.e., biological pleiotropy) or whether the association is due to correlation (i.e., mediated pleiotropy). We identified additional associated phenotypes for 13 of 25 known COPD loci. Tests for pleiotropy between genotype and associated outcomes were significant for all loci. In cases of significant pleiotropy, we performed mediation analysis to test whether SNPs had a direct association with the phenotype. Most loci showed a mediated effect through the hypothesized causal pathway. However, many loci also had direct associations, suggesting causal explanations (i.e., emphysema leading to reduced lung function) are incomplete. Our results highlight the high degree of pleiotropy in complex disease-associated loci and provide novel insights into the mechanisms underlying COPD.

10.
A major concern for all copy number variation (CNV) detection algorithms is their reliability and repeatability. However, it is difficult to evaluate the reliability of CNV-calling strategies due to the lack of gold-standard data that would tell us which CNVs are real. We propose that if CNVs are called in duplicate samples, or inherited from parent to child, then these can be considered validated CNVs. We used two large family-based genome-wide association study (GWAS) datasets from the GENEVA consortium to look at concordance rates of CNV calls between duplicate samples, parent-child pairs, and unrelated pairs. Our goal was to make recommendations for ways to filter and use CNV calls in GWAS datasets that do not include family data. We used PennCNV as our primary CNV-calling algorithm, and tested CNV calls using different datasets and marker sets, and with various filters on CNVs and samples. Using the Illumina core HumanHap550 single nucleotide polymorphism (SNP) set, we saw duplicate concordance rates of approximately 55% and parent-child transmission rates of approximately 28% in our datasets. GC model adjustment and sample quality filtering had little effect on these reliability measures. Stratification on CNV size and DNA sample type did have some effect. Overall, our results show that it is probably not possible to find a CNV-calling strategy (including filtering and algorithm) that will give us a set of "reliable" CNV calls using current chip technologies. But if we understand the error process, we can still use CNV calls appropriately in genetic association studies.

11.
The ultimate goal of genome‐wide association (GWA) studies is to identify genetic variants contributing effects to complex phenotypes in order to improve our understanding of the biological architecture underlying the trait. One approach to meeting this challenge is to consider more refined sub‐phenotypes of disease, defined by pattern of symptoms, for example, which may be physiologically distinct, and thus may have different underlying genetic causes. The disadvantage of sub‐phenotype analysis is that large disease cohorts are sub‐divided into smaller case categories, thus reducing power to detect association. To address this issue, we have developed a novel test of association within a multinomial regression modeling framework, allowing for heterogeneity of genetic effects between sub‐phenotypes. The modeling framework is extremely flexible, and can be generalized to any number of distinct sub‐phenotypes. Simulations demonstrate the power of the multinomial regression‐based analysis over existing methods when genetic effects differ between sub‐phenotypes, with minimal loss of power when these effects are homogeneous for the unified phenotype. Application of the multinomial regression analysis to a genome‐wide association study of type 2 diabetes, with cases categorized according to body mass index, highlights previously recognized differential mechanisms underlying obese and non‐obese forms of the disease, and provides evidence of a potential novel association that warrants follow‐up in independent replication cohorts. Genet. Epidemiol. 34: 335–343, 2010. © 2009 Wiley‐Liss, Inc.

12.
The enhancement of ventilatory drive induced by amino acids (AA) has only been demonstrated in healthy subjects and in patients with nutritional depletion. The ventilatory effects of AA are more pronounced with branched chain amino acid (BCAA)-enriched AA solutions. This study examined the ventilatory effects of AA in patients with chronic obstructive pulmonary disease (COPD). Six hundred and seventy ml (nitrogen intake = 8 g) of BCAA-enriched AA solution (Valinor®) was infused over a 4-hour period in 5 patients with COPD. Tidal volume, minute-ventilation and the ventilatory response to CO2 increased in all 5 patients during the AA infusion, while PaCO2 decreased in three. These results indicate that amino acids enhance ventilatory drive in COPD patients.

13.
Genotype imputation is a critical technique for following up genome‐wide association studies. Efficient methods are available for dealing with the probabilistic nature of imputed single nucleotide polymorphisms (SNPs) in population‐based designs, but not for family‐based studies. We have developed a new analytical approach (FBATdosage), using imputed allele dosage in the general framework of family‐based association tests to bridge this gap. Simulation studies showed that FBATdosage yielded highly consistent type I error rates, whatever the level of genotype uncertainty, and a much higher power than the best‐guess genotype approach. FBATdosage allows fast linkage and association testing of several million imputed variants with binary or quantitative phenotypes in nuclear families of arbitrary size with arbitrary missing data for the parents. The application of this approach to a family‐based association study of leprosy susceptibility successfully refined the association signal at two candidate loci, C1orf141‐IL23R on chromosome 1 and RAB32‐C6orf103 on chromosome 6.

14.
An increasing number of bioinformatic tools designed to detect CNVs (copy number variants) in tumor samples based on paired exome data, where a matched healthy tissue constitutes the reference, have been published in recent years. The idea of using a pool of unrelated healthy DNA as reference has previously been formulated but not thoroughly validated. As of today, the gold standard for CNV calling is still aCGH, but there is an increasing interest in detecting CNVs by exome sequencing. We propose a metric allowing the comparison of two CNV profiles, independently of the technique used, and assess the validity of using a pool of unrelated healthy DNA instead of a matched healthy tissue as reference in exome‐based CNV detection. We compared the CNV profiles obtained with three different approaches (aCGH, exome sequencing with a matched healthy tissue as reference, exome sequencing with a pool of eight unrelated healthy tissues as reference) on three multiple myeloma samples. We show that the usual analyses performed to compare CNV profiles (deletion/amplification ratios and CNV size distribution) lack precision when confronted with low LRR values, as they only consider the binary status of each CNV. We show that the metric‐based distance constitutes a more accurate comparison of two CNV profiles. Based on these analyses, we conclude that a reliable picture of CNV alterations in multiple myeloma samples can be obtained from whole‐exome sequencing in the absence of a matched healthy sample.

15.
Linkage analysis of complex traits has had limited success in identifying trait‐influencing loci. Recently, coding variants have been implicated as the basis for some biomedical associations. We tested whether coding variants are the basis for linkage peaks of complex traits in 42 African‐American (n = 596) and 90 Hispanic (n = 1,414) families in the Insulin Resistance Atherosclerosis Family Study (IRASFS) using Illumina HumanExome Beadchips. A total of 92,157 variants in African Americans (34%) and 81,559 (31%) in Hispanics were polymorphic and tested using two‐point linkage and association analyses with 37 cardiometabolic phenotypes. In African Americans, 77 LOD scores greater than 3 were observed. The highest LOD score was 4.91 with the APOE SNP rs7412 (MAF = 0.13) with plasma apolipoprotein B (ApoB). This SNP was associated with ApoB (P‐value = 4 × 10⁻¹⁹) and accounted for 16.2% of the variance in African Americans. In Hispanic families, 104 LOD scores were greater than 3. The strongest evidence of linkage (LOD = 4.29) was with rs5882 (MAF = 0.46) in CETP with HDL. CETP variants were strongly associated with HDL (4.6 × 10⁻¹² < P‐value < 0.00049), accounting for up to 4.5% of the variance. These loci have previously been shown to have effects on the biomedical traits evaluated here. Thus, evidence of strong linkage in this genome‐wide survey of primarily coding variants was uncommon. Loci with strong evidence of linkage were characterized by large contributions to the variance, and, in these cases, are common variants. Less compelling evidence of linkage and association was observed with additional loci that may require larger family sets to confirm.

16.
Background: Agricultural exposure is a risk factor for the development of chronic obstructive pulmonary disease (COPD). However, there are no good estimates of the number of COPD patients with a history of agricultural exposure. Methods: We conducted a telephone interview of subjects with COPD identified by reviewing all pulmonary function tests at the Omaha Veterans Administration Hospital between November 2004 and March 2005. Obstructive lung disease was defined as an FEV1/FVC ratio of less than 70%. The survey detailed demographic data, smoking history, pulmonary symptoms, and history of agricultural exposures. Results: Participants included 150 veterans (mean age 68.2 ± 10.8 years). A history of agricultural exposure was elicited in 68% of subjects. Of those who had worked in agriculture, the types of exposures varied, with 14% in hog confinement barns, 20% on dairy farms, 8% on poultry farms, and 87% exposed to grain dust. There was a trend of diminishing FEV1 with increasing years of agricultural exposure. Conclusions: In health systems that serve rural areas, patients with COPD commonly have a history of agricultural exposures that may contribute to the development of COPD. Health care workers in these areas should include agricultural exposures as an important part of the social/occupational history in these patients.
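The spirometric definition used in the Methods above (FEV1/FVC below 70%) amounts to a one-line rule. A minimal sketch follows; the function name, the argument names, and the example volumes are hypothetical and are not study data.

```python
def is_obstructive(fev1_litres: float, fvc_litres: float, cutoff: float = 0.70) -> bool:
    """Return True when the FEV1/FVC ratio falls below the cutoff (default 70%)."""
    if fvc_litres <= 0:
        raise ValueError("FVC must be positive")
    return fev1_litres / fvc_litres < cutoff


# Made-up example volumes, in litres.
print(is_obstructive(1.8, 3.0))  # ratio 0.60
print(is_obstructive(3.2, 4.0))  # ratio 0.80
```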

17.
While recent genomic surveys reveal growing numbers of di‐allelic copy number variations, it is genes with multiallelic (>2) copy numbers that have shown association with distinct phenotypes. Current high‐throughput laboratory methods are restricted to enumerating total gene copy numbers (GCNs) per individual and not the “genotype,” i.e., gene copies per chromosome. Thus, association studies of multiallelic GCNs have been limited to comparison of median copies in different groups. Our new nonparametric statistical approach is based on GCN information within a trio‐based study design. We present the theoretical derivation of the statistics and the results of simulation studies that show the robustness of our approach and its power under several genetic models. Genet. Epidemiol. 34:2–6, 2010. © 2009 Wiley‐Liss, Inc.

18.
Although genome‐wide association studies (GWAS) have now discovered thousands of genetic variants associated with common traits, such variants cannot explain the large degree of “missing heritability,” likely due to rare variants. The advent of next generation sequencing technology has allowed rare variant detection and association with common traits, often by investigating specific genomic regions for rare variant effects on a trait. Although multiple correlated phenotypes are often concurrently observed in GWAS, most studies analyze only single phenotypes, which may lessen statistical power. To increase power, multivariate analyses, which consider correlations between multiple phenotypes, can be used. However, few existing multivariate analyses can identify rare variants associated with multiple phenotypes. Here, we propose Multivariate Association Analysis using Score Statistics (MAAUSS), to identify rare variants associated with multiple phenotypes, based on the widely used sequence kernel association test (SKAT) for a single phenotype. We applied MAAUSS to whole exome sequencing (WES) data from a Korean population of 1,058 subjects to discover genes associated with multiple traits of liver function. We then validated those genes in a replication study, using an independent dataset of 3,445 individuals. Notably, we detected the gene ZNF620 among five significant genes. We then performed a simulation study to compare MAAUSS's performance with existing methods. Overall, MAAUSS successfully conserved type 1 error rates and in many cases had a higher power than the existing methods. This study illustrates a feasible and straightforward approach for identifying rare variants correlated with multiple phenotypes, with likely relevance to missing heritability.

19.
Lung cancer is the leading cause of cancer death worldwide. Although several genetic variants associated with lung cancer have been identified in the past, stringent selection criteria of genome‐wide association studies (GWAS) can lead to missed variants. The objective of this study was to uncover missed variants by using the known association between lung cancer and first‐degree family history of lung cancer to enrich the variant prioritization for lung cancer susceptibility regions. In this two‐stage GWAS study, we first selected a list of variants associated with both lung cancer and family history of lung cancer in four GWAS (3,953 cases, 4,730 controls), then replicated our findings for 30 variants in a meta‐analysis of four additional studies (7,510 cases, 7,476 controls). The top ranked genetic variant rs12415204 in chr10q23.33 encoding FFAR4 in the Discovery set was validated in the Replication set with an overall OR of 1.09 (95% CI = 1.04, 1.14, P = 1.63 × 10⁻⁴). When combining the two stages of the study, the strongest association was found in rs1158970 at chr4p15.2 encoding KCNIP4 with an OR of 0.89 (95% CI = 0.85, 0.94, P = 9.64 × 10⁻⁶). We performed a stratified analysis of rs12415204 and rs1158970 across all eight studies by age, gender, smoking status, and histology, and found consistent results across strata. Four of the 30 replicated variants act as expression quantitative trait loci (eQTL) sites in 1,111 nontumor lung tissues and meet the genome‐wide 10% FDR threshold.

20.
Genome‐wide association studies (GWAS) of common disease have been hugely successful in implicating loci that modify disease risk. The bulk of these associations have proven robust and reproducible, in part due to community adoption of statistical criteria for claiming significant genotype‐phenotype associations. As the cost of sequencing continues to drop, assembling large samples in global populations is becoming increasingly feasible. Sequencing studies interrogate not only common variants, as was true for genotyping‐based GWAS, but variation across the full allele frequency spectrum, yielding many more (independent) statistical tests. We sought to empirically determine genome‐wide significance thresholds for various analysis scenarios. Using whole‐genome sequence data, we simulated sequencing‐based disease studies of varying sample size and ancestry. We determined that future sequencing efforts in >2,000 samples of European, Asian, or admixed ancestry should set genome‐wide significance at approximately P = 5 × 10⁻⁹, and studies of African samples should apply a more stringent genome‐wide significance threshold of P = 1 × 10⁻⁹. Adoption of a revised multiple test correction will be crucial in avoiding irreproducible claims of association.
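One way to read the thresholds in the abstract above is through a Bonferroni-style correction, where the per-test threshold equals the family-wise error rate divided by the number of independent tests. The study derived its thresholds empirically from simulation, so the Bonferroni reading and the 5% family-wise error rate used here are assumptions for illustration, as are the function names.

```python
def bonferroni_threshold(n_tests: float, alpha: float = 0.05) -> float:
    """Per-test significance threshold for n_tests independent tests."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def implied_independent_tests(threshold: float, alpha: float = 0.05) -> float:
    """Number of independent tests a given per-test threshold corresponds to."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return alpha / threshold


# Under these assumptions, the conventional array-based GWAS threshold of
# 5e-8 corresponds to about one million independent tests.
for label, p in (("array GWAS", 5e-8), ("sequencing", 5e-9), ("sequencing, African", 1e-9)):
    print(f"{label}: P = {p:g} -> ~{implied_independent_tests(p):,.0f} independent tests")
```

Read this way, moving from 5 × 10⁻⁸ to 5 × 10⁻⁹ corresponds to roughly a tenfold increase in the effective number of independent tests, and 1 × 10⁻⁹ to a fiftyfold increase.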
