Similar Documents
12.
Genetic variants drive the evolution of traits and diseases. We previously modeled these variants as small displacements in fitness landscapes and estimated their functional impact by differentiating the evolutionary relationship between genotype and phenotype. Conversely, here we integrate these derivatives to identify genes steering specific traits. Across cancer cohorts, integration identified 460 likely tumor-driving genes. Many have literature and experimental support but had eluded prior genomic searches for positive selection in tumors. Beyond providing cancer insights, these results introduce a general calculus of evolution to quantify the genotype–phenotype relationship and discover genes associated with complex traits and diseases.
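The pairing of differentiation (EA) and integration (CI) in the abstract can be written schematically. This is a minimal sketch in our own notation, which the text does not define: $\phi$ stands for fitness or phenotype, $\gamma$ for genotype, $f$ for the genotype–phenotype relationship, and $\delta\gamma_v$ for the small genotype change caused by variant $v$. The last line is the gradient theorem, used here only as a loose analogy in which the variants are viewed as successive small steps along a path from $\gamma_0$ to $\gamma_1$; it is not presented as the authors' derivation.

```latex
\begin{align}
  \phi &= f(\gamma)
    && \text{genotype--phenotype relationship} \\
  \delta\phi_v &\approx \nabla f \cdot \delta\gamma_v
    && \text{displacement from one variant } v \text{, estimated by EA} \\
  \sum_{v} \delta\phi_v &\approx \int_{\gamma_0}^{\gamma_1} \nabla f \cdot d\gamma
    = f(\gamma_1) - f(\gamma_0)
    && \text{many small moves add up to the net change}
\end{align}
```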

From short-term disease risk to long-term species evolution, the genotype–phenotype relationship describes how genetic variations induce biological change (Manolio et al. 2009). Experimental screens that track the effects of these variations include RNA interference (RNAi) (Boutros and Ahringer 2008), CRISPR-Cas9 knockouts (Mali et al. 2013; Koike-Yusa et al. 2014; Wang et al. 2014b; Hart et al. 2015), and deep mutational scans (Fowler et al. 2014), within the limits of achievable perturbations and assays (Mak and Justman 2017). Alternatively, statistical analyses of genome-wide association identify variants that are overrepresented in case-control studies. These variants presumably influence the phenotype common to case subjects (Hirschhorn and Daly 2005; Hardy and Singleton 2009), although sample size, signal quality (McCarthy et al. 2008), and rate biases (Korte and Farlow 2013) may limit accuracy.

Here, we propose a different approach to recovering the genotype–phenotype relationship, based on representing genetic variations as moves in the fitness landscape (Wright 1932). Prior theory suggests that these moves should generally be small and nearly neutral (Nei 2005). Against this background, we hypothesize that gene mutations driving new phenotypes are the result of abnormally large moves in the fitness landscape. Testing this hypothesis requires a metric for motions in the fitness landscape. We propose to use the evolutionary action (EA) of mutations on fitness, described in prior work as the derivative of the genotype–phenotype relationship (Katsonis and Lichtarge 2014). In practice, the EA score correlates with the experimental effects of mutations (Katsonis and Lichtarge 2014) and, in blinded assessments of deleterious-mutation prediction, consistently performs well against state-of-the-art statistical and machine learning methods (Katsonis and Lichtarge 2017, 2019).
A limitation of EA, however, is that it describes only the impact of single mutations, that is, individual moves in the fitness landscape. This is not sufficient to interpret complex polygenic phenotypes that arise from multiple causal variants. To identify groups of gene variations that in aggregate drive patients to a disease region of the fitness landscape, we therefore propose a new operation, called Cohort Integration (CI), which sums the individual effects of variants, measured with EA, over all genes and over all patients. Calculus suggests that this summation should reverse the differential operation that produced EA in the first place and thus recover the genotype–phenotype relationship, that is, uncover the genes that drive cohort-specific traits.

Here, we test this model in cancer. Tumor genomes evolve (Greaves and Maley 2012) by acquiring advantageous somatic mutations that, when considered collectively across a cohort of cancer patients, should be associated with a large displacement in the fitness landscape. The average number of coding mutations per tumor can be as few as about eight in leukemia but is more often confoundingly large, such as about 1600 in colorectal cancers (Vogelstein et al. 2013). Among these, however, cancer-driving somatic mutations are relatively few, three to five by some estimates (Tomasetti et al. 2015), and finding these drivers remains both difficult and critical for personalized therapy (Chin et al. 2011). To search for cancer genes that harbor driver mutations, state-of-the-art methods (Dees et al. 2012; Davoli et al. 2013; Vogelstein et al. 2013; Lawrence et al. 2014; Tokheim et al. 2016; Martincorena et al. 2017; Bailey et al. 2018; Dietlein et al. 2020; Martínez-Jiménez et al. 2020) combine statistics and machine learning to detect signs of positive selection in cancer genes, including mutation frequency (Dees et al. 2012; Lawrence et al. 2014), surrounding nucleotide context (Dietlein et al. 2020), inactivation bias (Greenman et al. 2007; Van den Eynden et al. 2015; Martincorena et al. 2017), functional impact (Gonzalez-Perez and Lopez-Bigas 2012; Davoli et al. 2013), and structural or functional clustering (Tamborero et al. 2013; Porta-Pardo and Godzik 2014). Challenges in identifying driver genes include inaccurate background mutation rates (Lawrence et al. 2013), too few mutations per gene (Van den Eynden et al. 2015), and unbalanced distributions of passenger mutations (Bignell et al. 2010) that lead to a repertoire of mutational signatures (Alexandrov et al. 2013, 2020). Notably, as much as 60% to 80% of the genes identified by one method are not found by others (Tokheim et al. 2016), and many rare cancer drivers in individual patients remain hidden because they lack a population-wide role (Garraway and Lander 2013; Chang et al. 2016). The sequencing of diverse types of somatic tumor tissue from a large number of patients by The Cancer Genome Atlas (TCGA) (The Cancer Genome Atlas Research Network 2008; The Cancer Genome Atlas Research Network et al. 2013; Tomczak et al. 2015; The ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium 2020) yields a rich data set for discovering new cancer genes. Here, we used CI to prioritize the collective fitness impact of variants across all genes and all patients in a cohort in order to discover candidate cancer-driver genes, and we compare this approach with other cancer gene identification techniques (Greenman et al. 2007; Dees et al. 2012; Gonzalez-Perez and Lopez-Bigas 2012; Davoli et al. 2013; Tamborero et al. 2013; Lawrence et al. 2014; Porta-Pardo and Godzik 2014; Van den Eynden et al. 2015; Tokheim et al. 2016; Martincorena et al. 2017; Bailey et al. 2018).
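The summation that defines CI can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the input format (one `(patient, gene, EA score)` record per variant), the 0–100 score range, and the gene names are hypothetical, and only the per-gene sum over patients described in the text is modeled.

```python
from collections import defaultdict

# Hypothetical input: one record per somatic coding variant in a cohort,
# given as (patient_id, gene, ea_score). EA scores are ASSUMED to lie on a
# 0-100 scale, with larger values meaning a larger predicted fitness impact.
Variant = tuple[str, str, float]


def cohort_integration(variants: list[Variant]) -> dict[str, float]:
    """Naively sum EA scores per gene over all patients in the cohort.

    Only the summation step described in the text is modeled; any background
    correction or significance testing used in practice is omitted.
    """
    totals: dict[str, float] = defaultdict(float)
    for _patient, gene, ea in variants:
        if not 0.0 <= ea <= 100.0:
            raise ValueError(f"EA score out of range: {ea}")
        totals[gene] += ea
    return dict(totals)


def rank_genes(totals: dict[str, float]) -> list[tuple[str, float]]:
    """Order genes by integrated score (largest first), ties broken by name."""
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Toy cohort with hypothetical gene names.
    cohort = [
        ("P1", "GENE_A", 92.0),
        ("P1", "GENE_B", 10.0),
        ("P2", "GENE_A", 85.0),
        ("P2", "GENE_C", 40.0),
        ("P3", "GENE_B", 15.0),
    ]
    for gene, score in rank_genes(cohort_integration(cohort)):
        print(gene, score)
```

Running the script prints `GENE_A 177.0`, `GENE_C 40.0`, and `GENE_B 25.0`. A raw sum like this will tend to favor long or heavily mutated genes; the text does not say how CI handles such effects, so no correction is attempted here.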

17.
Genome-wide association studies (GWAS) have been highly informative in discovering disease-associated loci but are not designed to capture all structural variations in the human genome. Using long-read sequencing data, we discovered widespread structural variation within SINE-VNTR-Alu (SVA) elements, a class of great ape-specific transposable elements with gene-regulatory roles, which represents a major source of structural variability in the human population. We highlight the presence of structurally variable SVAs (SV-SVAs) in neurological disease–associated loci, and we further associate SV-SVAs with disease-associated SNPs and differential gene expression using luciferase assays and expression quantitative trait loci data. Finally, we genetically deleted SV-SVAs in the BIN1 and CD2AP Alzheimer's disease–associated risk loci and in the BCKDK Parkinson's disease–associated risk locus and assessed multiple aspects of their gene-regulatory influence in a human neuronal context. Together, this study reveals a novel layer of genetic variation in transposable elements that may help identify the structural variants that actually drive the disease associations of GWAS loci.

Discovering the genetic variation underlying human diseases is a common goal in human genetics, and the rapid growth in the number of genome-wide association studies (GWAS) has generated a vast catalog of single-nucleotide polymorphisms (SNPs) associated with specific traits and diseases (MacArthur et al. 2017). In most cases, GWAS do not identify the genetic variant that drives the trait but instead use SNPs as markers that highlight trait-associated loci through linkage disequilibrium (LD) (Edwards et al. 2013). This calls for elaborate post-GWAS analyses to shed light on the genes and mechanisms involved in specific traits (Backman et al. 2021; Mortezaei and Tavallaei 2021). A comprehensive view of the structural variants that exist within loci containing trait-associated SNPs is an essential first step toward assessing how these variants may lead to disease susceptibility at both the genetic and functional levels (Eichler 2019).

One source of structural variation that has not been sufficiently considered comes from transposable elements (TEs), which together constitute >42% of the human genome (Smit 1999; International Human Genome Sequencing Consortium 2001; Audano et al. 2019; Linthorst et al. 2020). Although the vast majority of TEs do not alter coding regions of our genome, some TE classes harbor strong gene-regulatory potential that can directly affect gene expression levels (Jacobs et al. 2014; Wang et al. 2014; Chuong et al. 2016; Fuentes et al. 2018; Pontis et al. 2019). The TE-mediated regulatory effect on genes is highly tissue-specific and has been shown to be particularly prominent in a neuronal environment (Jacob-Hirsch et al. 2018; Trizzino et al. 2018; Pontis et al. 2019; Miao et al. 2020; Sundaram and Wysocka 2020). TEs are activated during aging, neurodegeneration, and neurological diseases, but in many cases it remains unknown whether this activation is a cause or a consequence of the disease pathology (Frank et al. 2005; Li et al. 2013; Van Meter et al. 2014; Guo et al. 2018; Shpyleva et al. 2018).

Only TEs of the Alu, L1, and SVA (SINE-VNTR-Alu) families can still actively spread through the genome, and new insertions cause variation between individuals in the form of presence/absence TE-insertional polymorphisms (Kazazian et al. 1988; Batzer et al. 1991; Brouha et al. 2003; Ostertag et al. 2003). TEs can alter gene regulation in the locus in which they insert, such that the presence or absence of a TE can lead to inter-individual differences in gene expression. Approximately 60,475 Alu, 10,018 L1, and 6417 SVA TE-insertional polymorphisms are known, with a new insertion occurring in roughly one of every 40, 63, and 63 births, respectively (Feusier et al. 2019; Collins et al. 2020). Some of these new insertions have been linked to diseases (Makino et al. 2007; Hancks and Kazazian 2016; Sekar et al. 2016; Payer et al. 2017; Payer and Burns 2019; Pfaff et al. 2021). In addition to presence/absence TE polymorphisms, structural variation within fixed TEs (TE insertions observed in all individuals in the human population) has also been reported (Savage et al. 2013, 2014), although the prevalence of this type of structural variation has remained elusive. The repetitive nature of TEs increases the propensity for unequal crossover events or DNA polymerase slippage during meiosis, to which variable number tandem repeats (VNTRs) are especially susceptible (Brookes 2013). SVA elements harbor unusually large VNTRs as their internal segment and have a unique sequence composition compared with other VNTRs in our genome. Structural variation in VNTRs is particularly interesting because VNTRs are often associated with gene-regulatory functions, and many genes have accrued VNTRs as essential regulatory elements for their expression (International Human Genome Sequencing Consortium 2001; Fondon et al. 2008).

It is becoming increasingly clear that the gene-regulatory properties of TEs were co-opted during evolution, leading to the integration of TEs as novel gene-regulatory elements in preexisting gene expression networks (Cordaux and Batzer 2009; Chuong et al. 2016). As such, TEs have become an integral part of normal human gene regulation. Because our genome has become dependent on TEs for specific aspects of gene regulation, structural variation within fixed TEs could account for inter-individual differences in temporal or spatial aspects of gene expression. Despite the possible roles that structurally variable TEs may play in human health and disease, this level of structural variation has remained largely undocumented. This is mainly a result of technical limitations: the highly repetitive DNA sequences within TEs make identifying their structural variations with short-read sequencing strategies extremely challenging. The development of long-read sequencing techniques provides, for the first time, the opportunity to accurately assess the level of structural variation (Eichler 2019). This allows for the evaluation of possible associations between disease susceptibility and specific structural variations found in fixed TEs in our genome (Audano et al. 2019; Chaisson et al. 2019; Sulovari et al. 2019; Ewing et al. 2020; Ebert et al. 2021; Porubsky et al. 2021).

In this study we discovered that SVA retrotransposons, a great ape-specific class of TEs, constitute a major source of hidden genetic variation that is not taken into account by conventional genetic case-control studies. We set out to investigate the biological consequences of structural variability in SVAs, focusing on SV-SVAs in Alzheimer's disease (AD)– and Parkinson's disease (PD)–associated GWAS loci. We assessed the gene-regulatory influence of SV-SVAs in a human neuronal context by genetically deleting SVAs in three disease-associated loci. Our findings highlight the importance of carefully mapping structural variations within fixed TEs in the human population and argue for including them in complex trait genetics as a layer of genetic variation that may, in some cases, confer the actual disease susceptibility to a GWAS-identified locus.
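The insertion frequencies quoted in this introduction (one new Alu, L1, and SVA insertion per roughly 40, 63, and 63 births) can be combined into an overall rate. This is a minimal sketch that uses only those figures and assumes they are simple averages; expected counts then add across families by linearity of expectation.

```python
from fractions import Fraction

# Approximate births per new insertion, as quoted in the text.
BIRTHS_PER_INSERTION = {"Alu": 40, "L1": 63, "SVA": 63}


def insertions_per_birth(births_per_insertion: dict[str, int]) -> Fraction:
    """Expected number of new insertions per birth, summed over TE families.

    Expected counts add across families (linearity of expectation).
    """
    return sum((Fraction(1, n) for n in births_per_insertion.values()), Fraction(0))


rate = insertions_per_birth(BIRTHS_PER_INSERTION)
print(f"{float(rate):.4f} new insertions per birth")
print(f"about one new insertion every {float(1 / rate):.1f} births")
```

The script prints `0.0567 new insertions per birth` and `about one new insertion every 17.6 births`, that is, on average a new Alu, L1, or SVA insertion in roughly one of every 18 births.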

18.
Genetic drift can dramatically change allele frequencies in small populations and lead to reduced levels of genetic diversity, including the loss of segregating variants. However, quantitative studies of how genetic diversity changes over time in natural populations are scarce, especially on genome-wide scales. Here, we analyzed whole-genome sequences from 76 wolves of a highly inbred Scandinavian population, founded by only one female and two males, sampled over a period of 30 yr. We obtained chromosome-level haplotypes of all three founders and found that 10%–24% of their diploid genomes had been lost after about 20 yr of inbreeding (approximately five generations). Lost haplotypes spanned large genomic regions, as expected from the limited amount of recombination during this short time period. Altogether, 160,000 SNP alleles were lost from the population, which may include adaptive variants as well as wild-type alleles masking recessive deleterious alleles. Although not sampled, we could indirectly infer that the two male founders had megabase-sized runs of homozygosity and that all three founders showed significant haplotype sharing, meaning that the six copies of each autosome that the founders brought into the population contained on average only 4.2 unique haplotypes. This violates the assumption of unrelated founder haplotypes often made in the conservation and management of endangered species. Our study provides a novel view of how whole-genome resequencing of temporally stratified samples can be used to visualize and directly quantify the consequences of genetic drift in a small inbred population.

Genetic diversity is a key component of the long-term viability of populations in a changing environment (Lande and Shannon 1996; Lacy 1997; Saccheri et al. 1998; Reed and Frankham 2003; Sommer 2005; Lai et al. 2019). When the size of a population decreases, maintaining genetic diversity becomes challenging. In small populations, genetic drift (random sampling of alleles) and inbreeding (mating of closely related individuals) tend to erode genetic diversity. Whereas drift has a direct effect on allele frequencies in a population, inbreeding increases the frequency of homozygotes, which in turn reduces the effective population size and the effective frequency of recombination (Charlesworth 2003). This may lead to the accumulation of recessive deleterious alleles across the genome (Charlesworth and Charlesworth 1999; Rogers and Slatkin 2017) and an associated risk of inbreeding depression (Charlesworth and Willis 2009; Hedrick and Garcia-Dorado 2016).

There is a well-established theoretical framework for studying inbreeding and genetic drift and how they contribute to the loss of genetic diversity (Wright 1931). Empirically, loss of genetic diversity may be estimated indirectly by analyzing pedigree information (Lacy 1997; Grueber and Jamieson 2008; Jansson and Laikre 2014), although this is limited to the few populations for which such information is available. Many conservation genetic studies have quantified genetic diversity in populations using molecular analyses, which are now feasible on a genome-wide scale (e.g., Prado-Martinez et al. 2013; Abascal et al. 2016; Kardos et al. 2018). Typically, these studies provide a snapshot of contemporary levels of diversity in a population, which in itself does not easily translate into the conservation status of populations (Ellegren et al. 1993; Dobrynin et al. 2015; Díez-del-Molino et al. 2018). Moreover, monitoring the actual loss of genetic diversity requires temporal studies, including analyses of change in genomic parameters such as heterozygosity and inbreeding coefficient (Díez-del-Molino et al. 2018). Temporal data may not be easy to collect from natural populations, and studies on genetic drift therefore tend to be restricted to model organisms (Nené et al. 2018; Subramanian 2018; Ørsted et al. 2019) and museum collections (Díez-del-Molino et al. 2018; Ewart et al. 2019; Turvey et al. 2019).

A direct but largely untested approach to studying genomic erosion in a population is to follow the survival of individual haplotypes over time. The Scandinavian gray wolf (Canis lupus) population provides an excellent opportunity for this kind of study. After being widely distributed across Europe until modern times, wolves were eradicated by human persecution, including in Scandinavia (Haglund 1968; Wabakken et al. 2001; Hindrikson et al. 2017; Wolf and Ripple 2017). After functional extinction in the late 1960s, a wolf population was reestablished in Scandinavia by three immigrant founders: a breeding pair in 1983 and a second male in 1991 (Wabakken et al. 2001; Vilà et al. 2003). The small number of founders and the absence of gene flow from neighboring populations resulted in a rapid increase in inbreeding (Vilà et al. 2003; Liberg et al. 2005; Åkesson et al. 2016). Nevertheless, the population has grown and currently numbers about 480 individuals, including additional immigrants that have recently contributed to reproduction (Åkesson et al. 2016; Svensson et al. 2021).

We have shown previously that individuals of this population have accumulated long runs of homozygosity, some being inbred to the extent that entire chromosome pairs are identical by descent (Kardos et al. 2018). Here, we use whole-genome resequencing data from 76 Scandinavian wolves sampled over a period of 30 yr after the reestablishment to directly quantify the tempo of genomic erosion in terms of haplotype and allele loss. Specifically, by deriving phased chromosome-level haplotypes of the founders and following their fate over time, we provide novel empirical insight into how founder relatedness and the rapid loss of large founder haplotype segments facilitate the high inbreeding level observed in the population.
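The drift theory cited in this introduction (Wright 1931) can be illustrated with a toy neutral Wright–Fisher simulation that tracks how many of six founder haplotype copies survive at a single locus. This is a sketch under simplifying assumptions that are ours, not the study's: a constant population of N = 10 diploid individuals for five generations (both hypothetical values), random mating, no selection, no immigration, no recombination within the locus, and six initially distinct founder haplotypes, even though the abstract reports only about 4.2 unique haplotypes among the six real founder copies. The helper `expected_heterozygosity` evaluates Wright's classic expectation $H_t = H_0 (1 - 1/(2N))^t$ for comparison.

```python
import random


def surviving_founder_haplotypes(
    n_founder_copies: int,
    pop_sizes: list[int],
    rng: random.Random,
) -> list[int]:
    """Count distinct founder haplotypes left at one neutral locus per generation.

    pop_sizes[t] is the (hypothetical) number of diploid individuals in
    generation t + 1. Each generation draws 2 * N gene copies with replacement
    from the previous generation's gene pool (Wright-Fisher sampling), so a
    haplotype that is lost can never reappear.
    """
    pool = list(range(n_founder_copies))  # label each founder copy 0..n-1
    counts = []
    for n_diploid in pop_sizes:
        pool = rng.choices(pool, k=2 * n_diploid)
        counts.append(len(set(pool)))
    return counts


def expected_heterozygosity(h0: float, n_diploid: int, generations: int) -> float:
    """Wright's expectation H_t = H_0 * (1 - 1/(2N))**t under constant N."""
    return h0 * (1.0 - 1.0 / (2 * n_diploid)) ** generations


if __name__ == "__main__":
    rng = random.Random(1)
    sizes = [10] * 5  # hypothetical: 10 diploid individuals for 5 generations
    runs = [surviving_founder_haplotypes(6, sizes, rng) for _ in range(10_000)]
    mean_final = sum(run[-1] for run in runs) / len(runs)
    print(f"mean distinct founder haplotypes after 5 generations: {mean_final:.2f} of 6")
    print(f"expected H5/H0 at N=10: {expected_heterozygosity(1.0, 10, 5):.3f}")
```

With N = 10, the expected heterozygosity after five generations is $0.95^5 \approx 0.774$ of its initial value. The simulated number of surviving founder haplotypes depends on the random seed and is not meant to reproduce the 10%–24% genome loss reported for the real population.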


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Filing No. 09084417