Similar articles
Found 20 similar articles (search time: 15 ms)
1.
Logistic regression is the primary analysis tool for binary traits in genome-wide association studies (GWAS). Multinomial regression extends logistic regression to multiple categories. However, many phenotypes more naturally take ordered, discrete values. Examples include (a) subtypes defined from multiple sources of clinical information and (b) derived phenotypes generated by specific phenotyping algorithms for electronic health records (EHR). GWAS of ordinal traits have been problematic. Dichotomizing can lead to a range of arbitrary cutoff values, generating inconsistent, hard-to-interpret results. Using multinomial regression ignores trait value hierarchy and potentially loses power. Treating ordinal data as quantitative can lead to misleading inference. To address these issues, we analyze ordinal traits with an ordered, multinomial model. This approach increases power and leads to more interpretable results. We derive efficient algorithms for computing test statistics, making ordinal trait GWAS computationally practical for biobank-scale data. Our method is available as the Julia package OrdinalGWAS.jl. Application to a COPDGene study confirms previously found signals based on binary case–control status, but with greater significance. Additionally, we demonstrate the capability of our package to run on UK Biobank data by analyzing hypertension as an ordinal trait.
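
To make the ordered multinomial idea in this abstract concrete, the following is a minimal Python sketch of category probabilities under a proportional odds (cumulative logit) model, P(Y ≤ j | x) = logistic(θ_j − βx). It is not the OrdinalGWAS.jl implementation: the logit link is an assumption (the abstract does not name one), and the function name, cutpoints, and effect size are hypothetical values chosen for illustration.

```python
import math

def ordinal_probs(cutpoints, beta, x):
    """Category probabilities under a proportional odds model.

    P(Y <= j | x) = 1 / (1 + exp(-(theta_j - beta * x))) for ordered
    cutpoints theta_1 < ... < theta_{J-1}. All inputs are illustrative.
    """
    # Cumulative probabilities for the first J-1 categories
    cum = [1.0 / (1.0 + math.exp(-(t - beta * x))) for t in cutpoints]
    cum.append(1.0)  # P(Y <= J) is always 1
    # Differences of adjacent cumulative probabilities give P(Y = j)
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

# Hypothetical 4-level trait and genotype dosages 0, 1, 2
for dosage in (0, 1, 2):
    probs = ordinal_probs([-1.0, 0.5, 2.0], 0.4, dosage)
    print(dosage, [round(p, 3) for p in probs])
```

With a positive β, each extra copy of the allele shifts probability mass toward the higher categories, which is the ordering information that dichotomizing or unordered multinomial regression would discard.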

2.
Previous studies have suggested that vitamin D (VD) is associated with psychiatric diseases, but efforts to elucidate the functional relevance of VD to depression and anxiety from a genetic perspective have been limited. Based on the UK Biobank cohort, we first calculated a polygenic risk score (PRS) for VD from genome-wide association study (GWAS) data of VD. Linear and logistic regression analyses were conducted to evaluate the associations of VD traits with depression and anxiety traits, respectively. Then, using individual genotype and phenotype data from the UK Biobank, genome-wide environment interaction studies (GWEIS) were performed to identify the potential effects of gene × VD interactions on the risks of depression and anxiety traits. In the UK Biobank cohort, we observed significant associations of blood VD level with depression and anxiety traits, as well as significant associations between the VD PRS and depression and anxiety traits. GWEIS identified multiple candidate loci, such as rs114086183 (p = 4.11 × 10⁻⁸, LRRTM4) for self-reported depression status and rs149760119 (p = 3.88 × 10⁻⁸, GNB5) for self-reported anxiety status. Our study results suggested that VD was negatively associated with depression and anxiety. GWEIS identified multiple candidate genes interacting with VD, providing novel clues for understanding the biological mechanisms underlying potential associations between VD and psychiatric disorders.
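
A polygenic risk score of the kind used in this abstract is commonly computed as a weighted sum of effect-allele dosages. The sketch below shows that arithmetic only, assuming per-allele GWAS effect sizes as weights; the function name and the numbers are hypothetical, and the abstract does not say how variants were selected or weighted.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1, or 2 per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))

# Hypothetical per-allele effect sizes for three variants
weights = [0.12, -0.05, 0.30]
# One individual's dosages: 2*0.12 + 1*(-0.05) + 0*0.30
print(polygenic_risk_score([2, 1, 0], weights))
```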

3.
The large-scale open-access whole-exome sequencing (WES) data of ~200,000 UK Biobank participants are accelerating a new wave of genetic association studies aiming to identify rare and functional loss-of-function (LoF) variants associated with complex traits and diseases. We proposed to merge the WES genotypes and the genome-wide genotyping (GWAS) genotypes of 167,000 UKB homogeneous European participants into a combined reference panel, and then to impute 241,911 UKB homogeneous European participants who had the GWAS genotypes only. We then used the imputed data to replicate associations identified in the discovery WES sample. The average imputation accuracy measure r² is modest to high for LoF variants at all minor allele frequency intervals: 0.942 at MAF interval (0.01, 0.5), 0.807 at (1.0 × 10⁻³, 0.01), 0.805 at (1.0 × 10⁻⁴, 1.0 × 10⁻³), 0.664 at (1.0 × 10⁻⁵, 1.0 × 10⁻⁴) and 0.410 at (0, 1.0 × 10⁻⁵). As applications, we studied associations of LoF variants with estimated heel BMD and four lipid traits. In addition to replicating dozens of previously reported genes, we also identified three novel associations: two genes, PLIN1 and ANGPTL3, for high-density-lipoprotein cholesterol and one gene, PDE3B, for triglycerides. Our results highlighted the strength of WES-based genotype imputation and provided useful imputed data within the UKB cohort.
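
The abstract reports imputation accuracy as r² without defining it. One common definition is the squared Pearson correlation between true genotypes and imputed dosages for a variant, and the sketch below assumes that definition; the function name and the example data are hypothetical.

```python
def imputation_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes and imputed dosages."""
    n = len(true_genotypes)
    mean_t = sum(true_genotypes) / n
    mean_i = sum(imputed_dosages) / n
    cov = sum((t - mean_t) * (i - mean_i)
              for t, i in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_i = sum((i - mean_i) ** 2 for i in imputed_dosages)
    if var_t == 0 or var_i == 0:
        return float("nan")  # undefined for a monomorphic variant
    return cov * cov / (var_t * var_i)

# Hypothetical variant: sequenced genotypes versus imputed dosages
print(round(imputation_r2([0, 1, 2, 0, 1], [0.1, 0.9, 1.8, 0.2, 1.1]), 3))
```

For very rare variants most individuals carry zero copies, so a handful of mis-imputed carriers can pull r² down sharply, which is consistent with the lower values the abstract reports in the rarest frequency bins.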

4.
It has been hypothesised that nonsyndromic cleft lip/palate (nsCL/P) and cancer may share aetiological risk factors. Population studies have found inconsistent evidence for increased incidence of cancer in nsCL/P cases, but several genes (e.g., CDH1, AXIN2) have been implicated in the aetiologies of both phenotypes. We aimed to evaluate shared genetic aetiology between nsCL/P and oral cavity/oropharyngeal cancers (OC/OPC), which affect similar anatomical regions. Using a primary sample of 5,048 OC/OPC cases and 5,450 controls of European ancestry and a replication sample of 750 cases and 336,319 controls from UK Biobank, we estimated genetic overlap using nsCL/P polygenic risk scores (PRS), with Mendelian randomization analyses performed to evaluate potential causal mechanisms. In the primary sample, we found strong evidence for an association between a nsCL/P PRS and increased odds of OC/OPC (per standard deviation increase in score, odds ratio [OR]: 1.09; 95% confidence interval [CI]: 1.04, 1.13; p = .000053). Although confidence intervals overlapped with the primary estimate, we did not find confirmatory evidence of an association between the PRS and OC/OPC in UK Biobank (OR 1.02; 95% CI: 0.95, 1.10; p = .55). Mendelian randomization analyses provided evidence that major nsCL/P risk variants are unlikely to influence OC/OPC. Our findings suggest possible shared genetic influences on nsCL/P and OC/OPC.

5.

Objectives

Recent genetic association studies have provided convincing evidence that several novel loci and single nucleotide polymorphisms (SNPs) are associated with the risk of developing type 2 diabetes mellitus (T2DM). The aims of this study were: 1) to develop a predictive model of T2DM using genetic and clinical data; and 2) to compare misclassification rates of different models.

Methods

We selected 212 individuals with newly diagnosed T2DM and 472 controls aged in their 60s from the Korean Genome and Epidemiology Study. A total of 499 known SNPs from 87 T2DM-related genes were genotyped using germline DNA. SNPs were analyzed for significant association with T2DM using various classification algorithms, including QUEST (Quick, Unbiased, Efficient, Statistical tree), Support Vector Machine, C4.5, logistic regression, and K-nearest neighbor.

Results

We tested these models using the complete Korean Genome and Epidemiology Study cohort (n = 10,038) and computed the T2DM misclassification rates for each model. Average misclassification rates ranged from 28.2% to 52.7%. The misclassification rates for the logistic and machine-learning algorithms were lower than those for the statistical tree algorithms. Using 1-to-1 matched data, the misclassification rate of the statistical tree QUEST algorithm using body mass index and SNP variables was the lowest, but overall the logistic regression performed best.

Conclusions

The K-nearest neighbor method exhibited more robust results than other algorithms. For clinical and genetic data, our “multistage adjustment” model outperformed other models in yielding lower rates of misclassification. To improve the performance of these models, further studies are warranted, and strategies to estimate better classifiers for the quantification of SNPs need to be developed.

6.
Serum C-reactive protein (CRP), an important inflammatory marker, has been associated with age-related macular degeneration (AMD) in observational studies; however, the findings are inconsistent. It remains unclear whether the association between circulating CRP levels and AMD is causal. We used two-sample Mendelian randomization (MR) to evaluate the potential causal relationship between serum CRP levels and AMD risk. We derived genetic instruments for serum CRP levels in 418,642 participants of European ancestry from UK Biobank, and then conducted a genome-wide association study for 12,711 advanced AMD cases and 14,590 controls of European descent from the International AMD Genomics Consortium. Genetic variants which predicted elevated serum CRP levels were associated with advanced AMD (odds ratio [OR] per standard deviation increase in serum CRP levels: 1.31, 95% confidence interval [CI]: 1.19–1.44, P = 5.2 × 10⁻⁸). The OR for the increase in advanced AMD risk when moving from low (< 3 mg/L) to high (> 3 mg/L) CRP levels is 1.29 (95% CI: 1.17–1.41). Our results were unchanged in sensitivity analyses using MR models which make different modelling assumptions. Our findings were broadly similar across the different forms of AMD (intermediate AMD, choroidal neovascularization, and geographic atrophy). We used multivariable MR to adjust for the effects of other potential AMD risk factors including smoking, body mass index, blood pressure and cholesterol; this did not alter our findings. Our study provides strong genetic evidence that higher circulating CRP levels lead to increases in risk for all forms of AMD. These findings highlight the potential utility of circulating CRP as a biomarker in future trials aimed at modulating AMD risk via systemic therapies.
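
Two-sample MR combines per-variant associations with the exposure and with the outcome into a single causal estimate. The abstract does not name the estimator used, so the sketch below assumes the widely used fixed-effect inverse-variance weighted (IVW) estimator; the function name and the summary statistics are hypothetical and are not the study's data.

```python
import math

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted (IVW) causal estimate.

    Equivalent to averaging the per-variant ratios beta_outcome / beta_exposure
    with weights beta_exposure**2 / se_outcome**2.
    """
    triples = list(zip(beta_exposure, beta_outcome, se_outcome))
    numerator = sum(bx * by / se ** 2 for bx, by, se in triples)
    denominator = sum(bx ** 2 / se ** 2 for bx, by, se in triples)
    estimate = numerator / denominator
    std_error = math.sqrt(1.0 / denominator)
    return estimate, std_error

# Hypothetical summary statistics for three genetic instruments
est, se = ivw_estimate([0.10, 0.20, 0.15], [0.03, 0.05, 0.04], [0.01, 0.01, 0.02])
print(f"log-OR per unit exposure: {est:.3f} (SE {se:.3f}), OR = {math.exp(est):.2f}")
```

The sensitivity analyses mentioned in the abstract would typically swap this estimator for ones that relax the assumption that every instrument is valid.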

7.
Observational studies find an association between increased body mass index (BMI) and short self-reported sleep duration in adults. However, the underlying biological mechanisms that underpin these associations are unclear. Recent findings from the UK Biobank suggest a weak genetic correlation between BMI and self-reported sleep duration. However, the potential shared genetic aetiology between these traits has not been examined using a comprehensive approach. To investigate this, we created a polygenic risk score (PRS) of BMI and examined its association with self-reported sleep duration in a combination of individual participant data and summary-level data, with a total sample size of 142,209 individuals. Although we observed a nonsignificant genetic correlation between BMI and sleep duration using LD score regression (rg = −0.067 [SE = 0.039], P = 0.092), we found that a PRS of BMI is associated with a decrease in sleep duration (unstandardized coefficient = −1.75 min [SE = 0.67], P = 6.13 × 10⁻⁷), but explained only 0.02% of the variance in sleep duration. Our findings suggest that BMI and self-reported sleep duration possess a small amount of shared genetic aetiology and that other mechanisms must underpin these associations.

8.
In genome-wide association studies (GWAS) for thousands of phenotypes in biobanks, most binary phenotypes have substantially fewer cases than controls. Many widely used approaches for joint analysis of multiple phenotypes produce inflated type I error rates for such extremely unbalanced case-control phenotypes. In this research, we develop a method to jointly analyze multiple unbalanced case-control phenotypes to circumvent this issue. We first group multiple phenotypes into different clusters based on a hierarchical clustering method, then we merge phenotypes in each cluster into a single phenotype. In each cluster, we use the saddlepoint approximation to estimate the p value of an association test between the merged phenotype and a single nucleotide polymorphism (SNP), which eliminates the issue of inflated type I error rate of the test for extremely unbalanced case-control phenotypes. Finally, we use the Cauchy combination method to obtain an integrated p value for all clusters to test the association between multiple phenotypes and a SNP. We use extensive simulation studies to evaluate the performance of the proposed approach. The results show that the proposed approach can control type I error rate very well and is more powerful than other available methods. We also apply the proposed approach to phenotypes in category IX (diseases of the circulatory system) in the UK Biobank. We find that the proposed approach can identify more significant SNPs than the other viable methods we compared with.
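
The final step described in this abstract, combining per-cluster p values with the Cauchy combination method, can be sketched as follows. The abstract does not spell out the formula or the weights, so this sketch assumes the standard form of the test statistic, a weighted average of tan((0.5 − p_i)π), and equal weights by default; the function name is hypothetical.

```python
import math

def cauchy_combination(p_values, weights=None):
    """Combine p values with the Cauchy combination test.

    The statistic is a weighted average of tan((0.5 - p_i) * pi); under the
    null it is approximately standard Cauchy, which gives the combined p value.
    """
    if weights is None:
        weights = [1.0] * len(p_values)  # equal weights are an assumption
    total_weight = sum(weights)
    statistic = sum(w * math.tan((0.5 - p) * math.pi)
                    for w, p in zip(weights, p_values)) / total_weight
    return 0.5 - math.atan(statistic) / math.pi

# Hypothetical per-cluster p values for one SNP
print(cauchy_combination([0.001, 0.5, 0.8]))
```

A design point worth noting is that the combined p value is dominated by the smallest inputs and remains approximately valid when the per-cluster tests are correlated, which suits clusters built from related phenotypes. For extremely small p values the tangent term loses precision in floating point, and a production implementation would need to handle that case separately.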

9.
Objective: Osteoporosis (OP) is the most common bone disease. Genetic and metabolic factors play important roles in OP development. However, the genetic basis of OP is still elusive. The study aimed to explore the relationships between OP and dietary habits. Methods: This study used large-scale genome-wide association study (GWAS) summary statistics from the UK Biobank to explore potential associations between OP and 143 dietary habits. The GWAS summary data of OP included 9434 self-reported OP cases and 444,941 controls, and the GWAS summary data of the dietary habits included 455,146 participants of European ancestry. Linkage disequilibrium score regression (LDSC) was used to detect the genetic correlations between OP and each of the 143 dietary habits, followed by Mendelian randomization (MR) analysis to further assess the causal relationship between OP and candidate dietary habits identified by LDSC. Results: The LDSC analysis identified seven candidate dietary habits that showed genetic associations with OP, including cereal type such as biscuit cereal (coefficient = −0.1693, p value = 0.0183), servings of raw vegetables per day (coefficient = 0.0837, p value = 0.0379), and spirits measured per month (coefficient = 0.115, p value = 0.0353). MR analysis indicated clear causal relationships between OP and PC17 (butter) (odds ratio [OR] = 0.974, 95% confidence interval [CI] = (0.973, 0.976), p value = 0.000970), PC35 (decaffeinated coffee) (OR = 0.985, 95% CI = (0.983, 0.987), p value = 0.00126), PC36 (overall processed meat intake) (OR = 1.035, 95% CI = (1.033, 1.037), p value = 0.000976), PC39 (spirits measured per month) (OR = 1.014, 95% CI = (1.011, 1.015), p value = 0.00153), and servings of raw vegetables per day (OR = 0.978, 95% CI = (0.977, 0.979), p value = 0.000563). Conclusions: Our findings provide new clues for understanding the genetic mechanisms of OP, which focus on the possible role of dietary habits in OP pathogenesis.

10.
Informative and accurate survival prediction with individualized dynamic risk profiles over time is critical for personalized disease prevention and clinical management. The massive genetic data, such as SNPs from genome-wide association studies (GWAS), together with well-characterized time-to-event phenotypes provide unprecedented opportunities for developing effective survival prediction models. Recent advances in deep learning have made extraordinary achievements in establishing powerful prediction models in the biomedical field. However, the applications of deep learning approaches in survival prediction are limited, especially in utilizing the rich GWAS data. Motivated by developing powerful prediction models for the progression of an eye disease, age-related macular degeneration (AMD), we develop and implement a multilayer deep neural network (DNN) survival model to effectively extract features and make accurate and interpretable predictions. Various simulation studies are performed to compare the prediction performance of the DNN survival model with several other machine learning-based survival models. Finally, using the GWAS data from two large-scale randomized clinical trials in AMD with over 7800 observations, we show that the DNN survival model not only outperforms several existing survival prediction models in terms of prediction accuracy (e.g., c-index = 0.76), but also successfully detects clinically meaningful risk subgroups by effectively learning the complex structures among genetic variants. Moreover, we obtain a subject-specific importance measure for each predictor from the DNN survival model, which provides valuable insights into the personalized early prevention and clinical management for this disease.
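
The c-index quoted in this abstract measures how often a model ranks subjects correctly by risk. As a rough illustration, the sketch below computes Harrell's concordance index for right-censored data in plain Python; whether the study used exactly this variant is not stated in the abstract, and the function name and toy data are hypothetical.

```python
def concordance_index(times, events, risks):
    """Harrell's c-index for right-censored survival data.

    A pair (i, j) is comparable when subject i had an observed event before
    time j. It is concordant when the earlier event has the higher predicted
    risk; tied risks count as half.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical follow-up times, event indicators (1 = event), and risk scores
print(concordance_index([1, 2, 3, 4], [1, 1, 0, 1], [0.9, 0.4, 0.7, 0.1]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported 0.76 sits well above chance.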

11.
The concept of ‘public consultation’ and the idea of ‘democratic deliberation’ describe different forms of engagement of various citizens and stakeholders in the governance of science and technology projects (STPs). On the one hand, public consultation is concerned with enhancing the quality of decisions through public understanding of a complex STP. On the other hand, democratic deliberation is concerned with taking quality decisions through communicative action and free argumentation between all parties affected. This article focuses on the STP of the UK Biobank, addressing the following question: which form of upstream engagement is required in governing the next phase of the UK Biobank for the public good of health? Drawing on political theory debates and qualitative evidence, it is argued that although ideal democratic governance of the (next phase of) UK Biobank requires a transition from public consultation to democratic deliberation, the latter faces practical limitations. Thus, deliberative engagement cannot be full in specific STPs for the public good of health.

12.
Genome-wide association studies (GWAS) have successfully identified thousands of genetic variants contributing to disease and other phenotypes. However, significant obstacles hamper our ability to elucidate causal variants, identify genes affected by causal variants, and characterize the mechanisms by which genotypes influence phenotypes. The increasing availability of genome-wide functional annotation data is providing unique opportunities to incorporate prior information into the analysis of GWAS to better understand the impact of variants on disease etiology. Although there have been many advances in incorporating prior information into prioritization of trait-associated variants in GWAS, functional annotation data have played a secondary role in the joint analysis of GWAS and molecular (i.e., expression) quantitative trait loci (eQTL) data in assessing evidence for association. To address this, we develop a novel mediation framework, iFunMed, to integrate GWAS and eQTL data with the utilization of publicly available functional annotation data. iFunMed extends the scope of standard mediation analysis by incorporating information from multiple genetic variants at a time and leveraging variant-level summary statistics. Data-driven computational experiments convey how informative annotations improve single-nucleotide polymorphism (SNP) selection performance while emphasizing robustness of iFunMed to noninformative annotations. Application to Framingham Heart Study data indicates that iFunMed is able to boost detection of SNPs with mediation effects that can be attributed to regulatory mechanisms.

13.
14.
Genotype misclassification occurs frequently in human genetic association studies. When cases and controls are subject to the same misclassification model, Pearson's chi-square test has the correct type I error but may lose power. Most current methods adjusting for genotyping errors assume that the misclassification model is known a priori or can be assessed by a gold standard instrument. But in practical applications, the misclassification probabilities may not be completely known or the gold standard method can be too costly to be available. The repeated measurement design provides an alternative approach for identifying misclassification probabilities. With this design, a proportion of the subjects are measured repeatedly (five or more repeats) for the genotypes when the error model is completely unknown. We investigate the applications of the repeated measurement method in genetic association analysis. A cost-effectiveness study shows that if the phenotyping-to-genotyping cost ratio or the misclassification rates are relatively large, the repeat sampling can gain power over the regular case-control design. We also show that the power gain is not sensitive to the genetic model, genetic relative risk and the population high-risk allele frequency, all of which are typically important ingredients in association studies. An important implication of this result is that whatever the genetic factors are, the repeated measurement method can be applied if the genotyping errors must be accounted for or the phenotyping cost is high.

15.
Genome-wide association studies (GWAS) have been a standard practice in identifying single nucleotide polymorphisms (SNPs) for disease susceptibility. We propose a new approach, termed integrative GWAS (iGWAS), that exploits the information of gene expressions to investigate the mechanisms of the association of SNPs with a disease phenotype, and to incorporate the family-based design for genetic association studies. Specifically, the relations among SNPs, gene expression, and disease are modeled within the mediation analysis framework, which allows us to disentangle the genetic effect on a disease phenotype into two parts: an effect mediated through a gene expression (mediation effect, ME) and an effect through other biological mechanisms or environment-mediated mechanisms (alternative effect, AE). We develop omnibus tests for the ME and AE that are robust to underlying true disease models. Numerical studies show that the iGWAS approach is able to facilitate discovering genetic association mechanisms, and outperforms the SNP-only method for testing genetic associations. We conduct a family-based iGWAS of childhood asthma that integrates genetic and genomic data. The iGWAS approach identifies six novel susceptibility genes (MANEA, MRPL53, LYCAT, ST8SIA4, NDFIP1, and PTCH1) using the omnibus test with false discovery rate less than 1%, whereas no gene using SNP-only analyses survives with the same cut-off. The iGWAS analyses further characterize that genetic effects of these genes are mostly mediated through their gene expressions. In summary, the iGWAS approach provides a new analytic framework to investigate the mechanism of genetic etiology, and identifies novel susceptibility genes of childhood asthma that are biologically meaningful.

16.
17.
Understanding the genetic background of complex diseases and disorders plays an essential role in the promising field of precision medicine. The evaluation of candidate genes, however, requires time-consuming and expensive experiments given the large number of possibilities. Thus, computational methods have seen increasing applications in predicting gene-disease associations. We proposed a bioinformatics framework, Prioritization of Autism-genes using Network-based Deep-learning Approach (PANDA). Our approach aims to identify autism-genes across the human genome based on patterns of gene–gene interactions and topological similarity of genes in the interaction network. PANDA trains a graph deep learning classifier using the input of the human molecular interaction network and predicts and ranks the probability of autism association of every node (gene) in the network. PANDA was able to achieve a high classification accuracy of 89%, outperforming three other commonly used machine learning algorithms. Moreover, the gene prioritization ranking list produced by PANDA was evaluated and validated using an independent large-scale exome-sequencing study. The top 10% of PANDA-ranked genes were found significantly enriched for autism association.

18.
Genome-wide association studies (GWAS) often measure gene–environment interactions (G × E). We consider the problem of accurately estimating a G × E in a case–control GWAS when a subset of the controls have silent, or undiagnosed, disease and the frequency of the silent disease varies by the environmental variable. We show that using case–control status without accounting for misdiagnosis can lead to biased estimates of the G × E. We further propose a pseudolikelihood approach to remove the bias and accurately estimate how the relationship between the genetic variant and the true disease status varies by the environmental variable. We demonstrate our method in extensive simulations and apply our method to a GWAS of prostate cancer.

19.
Unraveling the underlying biological mechanisms or pathways behind the effects of genetic variations on complex diseases remains one of the major challenges in the post-GWAS (where GWAS is genome-wide association study) era. To further explore the relationship between genetic variations, biomarkers, and diseases and to elucidate underlying pathological mechanisms, a huge effort has been placed on examining pleiotropic and gene-environmental interaction effects. We propose a novel genetic stochastic process model (GSPM) that can be applied to GWAS and jointly investigate the genetic effects on longitudinally measured biomarkers and risks of diseases. This model is characterized by more profound biological interpretation and takes into account the dynamics of biomarkers during follow-up when investigating the hazards of a disease. We illustrate the rationale and evaluate the performance of the proposed model through two GWAS. One is to detect single nucleotide polymorphisms (SNPs) having interaction effects on type 2 diabetes (T2D) with body mass index (BMI) and the other is to detect SNPs affecting the optimal BMI level for protecting from T2D. We identified multiple SNPs that showed interaction effects with BMI on T2D, including a novel SNP rs11757677 in the CDKAL1 gene (P = 5.77 × 10⁻⁷). We also found a SNP rs1551133 located on 2q14.2 that reversed the effect of BMI on T2D (P = 6.70 × 10⁻⁷). In conclusion, the proposed GSPM provides a promising and useful tool in GWAS of longitudinal data for interrogating pleiotropic and interaction effects to gain more insights into the relationship between genes, quantitative biomarkers, and risks of complex diseases.

20.
Background

Physical inactivity increases the risk of chronic disease and mortality. The high prevalence of physical inactivity in the UK is likely to increase financial pressure on the National Health Service. The UK Biobank Study offered an opportunity to assess the impact of physical inactivity on healthcare use and spending using individual-level data and objective measures of physical activity. The objective of this study was to assess the associations between objectively measured physical activity levels and future inpatient days and costs in adults in the UK Biobank study.

Methods

We conducted an econometric analysis of the UK Biobank study, a large prospective cohort study. The participants (n = 86,066) were UK adults aged 43–79 who had provided sufficient valid accelerometer data. Hospital inpatient days and costs were discounted and standardised to mean monthly values per person to adjust for the variation in follow-up times. Econometric models adjusted for BMI, long-standing illness, and other sociodemographic factors.

Results

Mean follow-up time for the sample was 28.11 (SD 7.65) months. Adults in the most active group experienced 0.037 fewer inpatient days per month (0.059–0.016) and 14.1% lower inpatient costs (−£3.81 [−£6.71 to −£0.91] monthly inpatient costs) compared to adults in the least active group. The relationship between physical activity and inpatient costs was stronger in women compared to men and amongst those in the lowest income group compared to others. The findings remained significant across various sensitivity analyses.

Conclusions

Increasing physical activity levels in the UK may reduce inpatient hospitalisations and costs, especially in women and lower-income groups.

