An association test of the spatial distribution of rare missense variants within protein structures identifies Alzheimer's disease–related patterns
Authors:Bowen Jin  John A. Capra  Penelope Benchek  Nicholas Wheeler  Adam C. Naj  Kara L. Hamilton-Nelson  John J. Farrell  Yuk Yee Leung  Brian Kunkle  Badri Vadarajan  Gerard D. Schellenberg  Richard Mayeux  Li-San Wang  Lindsay A. Farrer  Margaret A. Pericak-Vance  Eden R. Martin  Jonathan L. Haines  Dana C. Crawford  William S. Bush
Abstract: More than 90% of genetic variants are rare in most modern sequencing studies, such as the Alzheimer's Disease Sequencing Project (ADSP) whole-exome sequencing (WES) data. Furthermore, 54% of the rare variants in ADSP WES are singletons. However, both single variant and unit-based tests are limited in their statistical power to detect an association between rare variants and phenotypes. To best use missense rare variants and investigate their biological effect, we examine their association with phenotypes in the context of protein structures. We developed a protein structure–based approach, protein optimized kernel evaluation of missense nucleotides (POKEMON), which evaluates rare missense variants based on their spatial distribution within a protein rather than their allele frequency. The hypothesis behind this test is that the three-dimensional spatial distribution of variants within a protein structure provides functional context to power an association test. POKEMON identified three candidate genes (TREM2, SORL1, and EXOC3L4) and another suggestive gene from the ADSP WES data. For TREM2 and SORL1, two known Alzheimer's disease (AD) genes, the signal from the spatial cluster is stable even if we exclude known AD risk variants, indicating the presence of additional low-frequency risk variants within these genes. EXOC3L4 is a novel AD risk gene that has a cluster of variants primarily shared by case subjects around the Sec6 domain. This cluster is also validated in an independent replication data set and a validation data set with a larger sample size.

High-throughput DNA sequencing of diverse humans has identified millions of genetic variants, the vast majority of which are exceptionally rare. A survey of ∼60,000 individuals from the Exome Aggregation Consortium (ExAC) found that out of ∼7 million variants, 99% have a frequency <1% and 54% are singletons (Taliun et al. 2021). Similarly, in the Alzheimer's Disease Sequencing Project (ADSP) whole-exome sequencing (WES) of ∼10,000 individuals, 97% of identified variants have a minor allele frequency <1%, and 23% are singletons (Butkiewicz et al. 2018). However, the effect of most rare variants on diseases of interest remains unknown because of insufficient statistical power to detect the associations between these variants and phenotypes.

We hypothesized that rare missense variants contribute to common diseases by disrupting the protein function and are likely to form clustered or dispersed patterns within protein structures when examined in population-based studies. Therefore, incorporating spatial context will improve rare variant association tests. Prior studies have shown that missense variants show nonrandom patterns in protein structures, such as cancer-associated hotspot regions with a high density of missense somatic mutations (Tokheim et al. 2016). Our group (Sivley et al. 2018) also found that germline causal missense variants for Mendelian diseases show nonrandom patterns in three-dimensional (3D) space. These patterns include clusters that likely reflect disruption of a key functional region and dispersions that likely reflect depletion of variants within a sensitive protein core.

To test this hypothesis within sequencing studies of disease traits, we developed a kernel function to quantify genetic similarity among individuals by using protein structure information.
When two individuals have different missense variants distal in genomic coordinates but close in 3D protein structure, these individuals will be assigned a high genetic similarity through our kernel function. When applied over an entire data set, our kernel function captures differences in the spatial patterns of rare missense variants among cases and controls or over continuous traits. Using a statistical framework similar to SKAT (Wu et al. 2011), we test the association of rare variants with quantitative and dichotomous phenotypes using this structure-based kernel. We call this approach protein optimized kernel evaluation of missense nucleotides (POKEMON). We validated that POKEMON can identify trait associations with spatial patterns formed by missense variants both in simulation studies and real-world data.
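
The idea of a structure-based kernel can be illustrated with a minimal sketch. It is not the POKEMON implementation: the Gaussian distance decay, the `scale` parameter, the function names `structure_kernel` and `score_statistic`, and the toy coordinates are all assumptions chosen for illustration. The sketch only shows how two individuals carrying different variants that sit close together in 3D space receive a high similarity, and how a SKAT-style variance-component score could be formed from such a kernel. Computing a p-value from the score is omitted.

```python
import numpy as np


def structure_kernel(genotypes, coords, scale=10.0):
    """Similarity between individuals based on where their variants sit in 3D.

    genotypes: (n_individuals, n_variants) minor-allele counts.
    coords:    (n_variants, 3) coordinates of the affected residues, in angstroms.
    scale:     hypothetical decay length for the Gaussian weight (an assumption).
    """
    genotypes = np.asarray(genotypes, dtype=float)
    coords = np.asarray(coords, dtype=float)

    # Pairwise Euclidean distances between variant positions.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Assumed weighting: variants that are close in space get a weight near 1,
    # distant variants get a weight near 0.
    weights = np.exp(-(dist ** 2) / (2.0 * scale ** 2))

    # Individuals are similar when their variants are spatially close.
    return genotypes @ weights @ genotypes.T


def score_statistic(kernel, phenotype):
    """SKAT-style score Q = r^T K r with residuals from an intercept-only null model."""
    phenotype = np.asarray(phenotype, dtype=float)
    resid = phenotype - phenotype.mean()
    return float(resid @ kernel @ resid)


# Toy data: three individuals, each carrying one distinct singleton variant.
# Variants 0 and 1 are 3 angstroms apart; variant 2 is far from both.
genotypes = np.eye(3)
coords = np.array([[0.0, 0.0, 0.0],
                   [3.0, 0.0, 0.0],
                   [60.0, 0.0, 0.0]])
phenotype = np.array([1.0, 1.0, 0.0])  # two cases, one control

K = structure_kernel(genotypes, coords)
print(np.round(K, 3))
print(score_statistic(K, phenotype))
```

In this toy example the first two individuals share no variant, yet their kernel entry is close to 1 because their variants are neighbors in the structure, while the third individual is nearly unrelated to both. A frequency-weighted linear kernel would instead treat all three singletons as equally dissimilar.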