Similar literature
20 similar articles found; search took 31 ms
1.
Large-scale analysis of balanced chromosomal translocation breakpoints has shown nonhomologous end joining and microhomology-mediated repair to be the main drivers of interchromosomal structural aberrations. Breakpoint sequences of de novo unbalanced translocations have not yet been investigated systematically. We analyzed 12 de novo unbalanced translocations and mapped the breakpoints in nine. Surprisingly, in contrast to balanced translocations, we identify nonallelic homologous recombination (NAHR) between (retro)transposable elements and especially long interspersed elements (LINEs) as the main mutational mechanism. This finding shows yet another involvement of (retro)transposons in genomic rearrangements and exposes a profoundly different mutational mechanism compared with balanced chromosomal translocations. Furthermore, we show the existence of compound maternal/paternal derivative chromosomes, reinforcing the hypothesis that human cleavage stage embryogenesis is a cradle of chromosomal rearrangements.

With the advent of genomic technologies, our understanding of the mechanisms underlying human chromosomal rearrangements has expanded rapidly in recent years. Analysis of the breakpoints of copy number variants (CNVs) has revealed that “nonrecurrent CNVs” are mainly generated by microhomology-mediated end joining (MMEJ) (Liang et al. 1996; McVey and Lee 2008), nonhomologous end joining (NHEJ) (Vissers et al. 2009; Conrad et al. 2010), replication fork stalling and template switching (FoSTeS), or microhomology-mediated break-induced replication (MMBIR) (Lee et al. 2007; Bauters et al. 2008; Hastings et al. 2009a). “Recurrent CNVs” are caused by NAHR between low copy repeats (LCRs) (Sharp et al. 2006). In addition to LCRs, NAHR between Alu repetitive elements has also emerged as a major driver for CNV formation (Lehrman et al. 1985; Shaw and Lupski 2005; Luo et al. 2011). The mechanisms underlying chromosomal translocations have been less intensively studied.
Recurrent balanced translocations can result from NAHR between paralogous low copy repeats (Giglio et al. 2002; Ou et al. 2011) or from rearrangements between palindromic AT-rich repeats (Edelmann et al. 2001; Kurahashi and Emanuel 2001). Two recent studies reported the first systematic high-throughput sequence-based studies of nonrecurrent balanced chromosomal translocations (Higgins et al. 2008; Chiang et al. 2012). In 30.5% of 141 de novo breakpoints, microhomology was detected, suggesting MMEJ or MMBIR, whereas the remaining breakpoints showed <2 bp of homology, implying NHEJ, with or without prior processing of the exposed DNA ends (Lieber 2008). In contrast to balanced translocations, the breakpoint sequences of de novo unbalanced translocations have barely been investigated.

Unbalanced translocations, defined as derivative chromosomes comprising a terminal deletion and a duplication of a terminal segment of another chromosome, are found in 0.7%–1.1% of individuals with developmental disabilities (Ravnan et al. 2006; Ballif et al. 2007; Shao et al. 2008) and are generally assumed to result from the unbalanced transmission of a derivative chromosome from a balanced translocation carrier. During our systematic screen of patients with developmental disabilities, we noticed that a substantial number (0.23% of the patients, comprising 30% of all unbalanced translocations) actually arise de novo. To determine when those de novo unbalanced translocations originate and to learn about the mutational mechanisms leading to those rearrangements, we performed a systematic analysis.

2.
3.
4.
5.
6.
Genome Research, 2009, 19(9): 1682-1690
We present a database of copy number variations (CNVs) detected in 2026 disease-free individuals, using high-density, SNP-based oligonucleotide microarrays. This large cohort, comprised mainly of Caucasians (65.2%) and African-Americans (34.2%), was analyzed for CNVs in a single study using a uniform array platform and computational process. We have catalogued and characterized 54,462 individual CNVs, 77.8% of which were identified in multiple unrelated individuals. These nonunique CNVs mapped to 3272 distinct regions of genomic variation spanning 5.9% of the genome; 51.5% of these were previously unreported, and >85% are rare. Our annotation and analysis confirmed and extended previously reported correlations between CNVs and several genomic features such as repetitive DNA elements, segmental duplications, and genes. We demonstrate the utility of this data set in distinguishing CNVs with pathologic significance from normal variants. Together, this analysis and annotation provide a useful resource to assist with the assessment of CNVs in the contexts of human variation, disease susceptibility, and clinical molecular diagnostics.

Copy number variation (CNV) in the human genome significantly influences human diversity and predisposition to disease (Sebat et al. 2004, 2007; Sharp et al. 2005; Conrad et al. 2006; Feuk et al. 2006; Hinds et al. 2006; McCarroll et al. 2006; Redon et al. 2006; Kidd et al. 2008; Perry et al. 2008; Walsh et al. 2008). CNVs arise from genomic rearrangements, primarily owing to deletion, duplication, insertion, and unbalanced translocation events. The pathogenic role of CNVs in genetic disorders has been well documented (Lupski and Stankiewicz 2005), yet the extent to which CNVs contribute to phenotypic variation and complex disease predisposition remains poorly understood. CNVs have been known to contribute to genetic disease through different mechanisms, resulting in either imbalance of gene dosage or gene disruption in most cases.
In addition to their direct correlation with genetic disorders, CNVs are known to mediate phenotypic changes that can be deleterious (Feuk et al. 2006; Freeman et al. 2006). Recently, several studies have reported an increased burden of rare or de novo CNVs in complex disorders such as autism, ADHD, and schizophrenia as compared to normal controls, highlighting the potential pathogenicity of rare or unique CNVs (Sebat et al. 2007; International Schizophrenia Consortium 2008; Stefansson et al. 2008; Walsh et al. 2008; Xu et al. 2008; Elia et al. 2009). Thus, more thorough analysis of genomic CNVs is necessary in order to determine their role in conveying disease risk.

Several approaches have been used to examine CNVs in the genome, including array CGH and genotyping microarrays (Albertson and Pinkel 2003; Iafrate et al. 2004; Sebat et al. 2004; Sharp et al. 2005; Redon et al. 2006; Wong et al. 2007). Results from more than 30 studies comprising 21,000 CNVs have been reported in public repositories (Iafrate et al. 2004). However, a majority of these studies have been performed on limited numbers of individuals using a variety of nonuniform technologies, reporting methods, and disease states. In addition, these data are both substantially reiterative and enriched in CNV events that are frequently observed in one or more populations. Thus, extreme care is needed in determining whether a particular structural variant plays a role in disease susceptibility or progression. To address these challenges, we identified and characterized the constellation of CNVs observed in a large cohort of healthy children and their parents, when available. This study uses uniform measures to detect and assess CNVs within the context of genomic and functional annotations, as well as to demonstrate the utility of this information in assessing their impact on abnormal phenotypes.
Our analysis and annotation provide a useful resource to assist with the assessment of structural variants in the contexts of human variation, disease susceptibility, and clinical molecular diagnostics.

7.
8.
9.
10.
Evolution is fueled by phenotypic diversity, which is in turn due to underlying heritable genetic (and potentially epigenetic) variation. While environmental factors are well known to influence the accumulation of novel variation in microorganisms and human cancer cells, the extent to which the natural environment influences the accumulation of novel variation in plants is relatively unknown. Here we use whole-genome and whole-methylome sequencing to test if a specific environmental stress (high-salinity soil) changes the frequency and molecular profile of accumulated mutations and epimutations (changes in cytosine methylation status) in mutation accumulation (MA) lineages of Arabidopsis thaliana. We first show that stressed lineages accumulate ∼100% more mutations, and that these mutations exhibit a distinctive molecular mutational spectrum (specific increases in relative frequency of transversion and insertion/deletion [indel] mutations). We next show that stressed lineages accumulate ∼45% more differentially methylated cytosine positions (DMPs) at CG sites (CG-DMPs) than controls, and also show that while many (∼75%) of these CG-DMPs are inherited, some can be lost in subsequent generations. Finally, we show that stress-associated CG-DMPs arise more frequently in genic than in nongenic regions of the genome. We suggest that commonly encountered natural environmental stresses can accelerate the accumulation and change the profiles of novel inherited variants in plants. Our findings are significant because stress exposure is common among plants in the wild, and they suggest that environmental factors may significantly alter the rates and patterns of incidence of the inherited novel variants that fuel plant evolution.

In The Origin of Species, Darwin identified heritable variation as fundamental to biological evolution (Darwin 1859), although he could not define that variation.
We now understand that the heritable variation underlying evolution is substantially due to genetic (e.g., DNA sequence mutation) and potentially to epigenetic (e.g., altered cytosine methylation or histone modification status) change (Mitchell-Olds and Schmitt 2006; Baer et al. 2007; Nordborg and Weigel 2008; Richards 2008; Atwell et al. 2010; Liu et al. 2010; Schmitz et al. 2011, 2013; Schmitz and Ecker 2012). Furthermore, recent advances have provided a provisional genome-wide picture of how genetic and epigenetic changes accumulate during successive generations. For example, previous studies have characterized the de novo variants accumulating in mutation accumulation (MA) lineages of the genetic model plant Arabidopsis thaliana (Ossowski et al. 2010; Becker et al. 2011; Schmitz et al. 2011). These studies have revealed the frequencies and patterns with which mutations and epimutations accumulate in MA lineages grown in relatively sheltered artificial laboratory environments. However, the natural environment is rarely as benign as these laboratory environments, and plants growing in nature are frequently exposed to varying combinations of environmental stresses (Easterling et al. 2000; Parmesan and Yohe 2003; Mittler 2006). Furthermore, the phenomenon of stress-induced mutagenesis (SIM), in which mutation is promoted when cells are poorly adapted to their environment, is well established in bacteria (Al Mamun et al. 2012) and more recently identified in yeast and human cancer cells (Bindra et al. 2007; Shor et al. 2013). We therefore sought to determine if propagation of A. thaliana MA lineages in a stressful environment changes the rates and profiles of de novo variant accumulation, reasoning that such changes could have important implications for the understanding of plant genome evolution in nature.

Soil salinity is a widespread source of plant abiotic stress, affecting ∼6% of global land area (Hasegawa et al. 2000; Zhu 2002; Munns and Tester 2008).
Plants have evolved mechanisms to circumvent the effects of high soil Na+, the most prevalent ionic form of natural soil salinity (Zhu 2002). While the short-term physiological consequences of exposure to high soil salinity are increasingly well understood (Rus et al. 2001; Zhu 2002; Ren et al. 2005; Munns and Tester 2008; Baxter et al. 2010; Jiang et al. 2012, 2013; Zhou et al. 2012), the longer-term evolutionary genetic and epigenetic consequences are not. For example, it was not previously known whether multiple successive generations of exposure to soil-salinity stress change the properties of genome-wide accumulated de novo variants, thus in turn affecting evolutionary processes. Here we directly address this issue and show that A. thaliana (The Arabidopsis Genome Initiative 2000) MA lineages grown for 10 successive generations on saline soil display an increased frequency of accumulated de novo mutations and epimutations (differentially methylated cytosine positions, DMPs). We also show that the mutations accumulating during soil-salinity stress exhibit a distinctive molecular mutational spectrum that differs from that of mutations accumulating in nonstressed control MA lineages. Our observations have important implications for the understanding of plant genome evolution in the stressful natural environment.

11.
Understanding patterns of spontaneous mutations is of fundamental interest in studies of human genome evolution and genetic disease. Here, we used extremely rare variants in humans to model the molecular spectrum of single-nucleotide mutations. Compared to common variants in humans and human–chimpanzee fixed differences (substitutions), rare variants, on average, arose more recently in the human lineage and are less affected by the potentially confounding effects of natural selection, population demographic history, and biased gene conversion. We analyzed variants obtained from a population-based sequencing study of 202 genes in >14,000 individuals. We observed considerable variability in the per-gene mutation rate, which was correlated with local GC content, but not recombination rate. Using >20,000 variants with a derived allele frequency ≤10⁻⁴, we examined the effect of local GC content and recombination rate on individual variant subtypes and performed comparisons with common variants and substitutions. The influence of local GC content on rare variants differed from that on common variants or substitutions, and the differences varied by variant subtype. Furthermore, recombination rate and recombination hotspots have little effect on rare variants of any subtype, yet both have a relatively strong impact on multiple variant subtypes in common variants and substitutions. This observation is consistent with the effect of biased gene conversion or selection-dependent processes. Our results highlight the distinct biases inherent in the initial mutation patterns and subsequent evolutionary processes that affect segregating variants.

Mutation is one of the most fundamental processes in biology. It is the ultimate source of genetic variation and one of the driving forces of evolution. Mutation also plays a significant role in the etiology of human diseases.
There is considerable interest in understanding the underlying pattern and molecular spectrum of spontaneous mutations. Historically, two approaches were applied to estimate the single-nucleotide mutation rate in humans. The first analyzes divergent sites between humans and another species, typically chimpanzee. According to Kimura's neutral theory, the majority of substitutions are neutral and therefore the extent of between-species divergence can be used to estimate the neutral mutation rate (Kimura 1983). Many groups have applied this approach to estimate the spontaneous mutation rate in humans (Drake et al. 1998; Nachman and Crowell 2000; Kumar and Subramanian 2002; Silva and Kondrashov 2002). However, several forces, including natural selection, biased gene conversion (BGC), and demographic history, can alter fixation probabilities and reshape the spectrum and genomic distribution of between-species substitution patterns. A second, more direct approach, pioneered by Haldane (1935), uses incidence rates of dominant disorders in humans to estimate the mutation rate (Sommer 1995; Sommer and Ketterling 1996; Kondrashov 2003; Lynch 2010). This approach, however, is limited by the fact that only a small subset of new mutations manifest as disease variants (Nachman 2004).

The mutation rates from these studies represent a genome-wide average. However, there is extensive variability among different genes or genomic regions in both between-species divergence and within-species diversity (Wolfe et al. 1989; Nachman and Crowell 2000; Sachidanandam et al. 2001; Smith and Lercher 2002; Kondrashov 2003; Hodgkinson et al. 2009). This suggests that spontaneous mutation rates are not constant throughout the genome, although the reasons behind this variability are unclear.

Local nucleotide composition is a frequently studied feature that could contribute to mutation rate variability.
One study showed that AT > GC (an A base replaced with a G or a T base replaced with a C) common variants segregate at a higher frequency in regions with higher GC content (Webster et al. 2003), and others similarly reported increased fixation bias toward GC base pairs in GC-rich regions (Lercher and Hurst 2002a; Lercher et al. 2002). However, analyses of GC content and variant patterns often reported contradicting findings. For example, while some studies showed that GC content is positively correlated with both divergence rates between humans and chimpanzee (Smith et al. 2002; Webster et al. 2003; Arndt and Hwa 2005; Duret and Arndt 2008) and within-human nucleotide diversity (Sachidanandam et al. 2001; Hellmann et al. 2005), another study found a negative correlation (Cai et al. 2009). Furthermore, while some studies reported increasing GC > AT substitution rates with increasing GC content (Smith et al. 2002; Webster et al. 2003), others showed a decrease (Arndt and Hwa 2004; Duret and Arndt 2008). These inconsistencies could be partly explained by differences in the allele frequency, and therefore the evolutionary time scale of the variants analyzed in different studies. Consequently the observed patterns could be the result of confounding factors, such as selection and demography, instead of alterations in the actual mutation rate.

Recombination is known to influence patterns of common variation and substitution rates. Correlations between recombination rate and nucleotide diversity or between species substitution rates have been observed in humans (Nachman et al. 1998; Nachman 2001; Lercher and Hurst 2002b; Hellmann et al. 2003, 2005; Spencer et al. 2006; Duret and Arndt 2008; Cai et al. 2009; Lohmueller et al. 2011), Drosophila (Begun and Aquadro 1992; Begun et al. 2007; Kulathinal et al. 2008), and several plant species (Dvorak et al. 1998; Kraft et al. 1998; Stephan and Langley 1998; Tenaillon et al. 2004).
Three major theories exist to explain these observations. First, recombination may be directly mutagenic, leading to increased mutation rates in regions of high recombination and thus higher diversity (Lercher and Hurst 2002b; Hellmann et al. 2003, 2008). Second, while background selection and selective sweeps reduce haplotype diversity, recombination generates new haplotypes by shuffling variants onto different backgrounds, thereby maintaining diversity in regions of high recombination rates (Kaplan et al. 1989; Charlesworth et al. 1993, 1995; Hudson and Kaplan 1995; Nachman 2001). A third explanation is BGC, a recombination-associated process that preferentially repairs AT/GC mismatches produced during recombination to GC bases, leading to preferential fixation of GC alleles (for review, see Duret and Galtier 2009). Over time, the observed effect of BGC can mimic that of natural selection, leading to an excess of “weak” (W) A/T bases converted to “strong” (S) G/C bases as if the latter were under positive selection (Berglund et al. 2009; Galtier et al. 2009; Necsulea et al. 2011). The reports hypothesizing a mutagenic effect of recombination relied on common variants and substitutions (Lercher and Hurst 2002b; Hellmann et al. 2003, 2008). Several lines of evidence argue against the mutagenic recombination theory and instead suggest that a selection-dependent mechanism or BGC can explain the observed correlation between diversity and recombination rate (Duret and Arndt 2008; Berglund et al. 2009; Galtier et al. 2009; Lohmueller et al. 2011).

Previous studies using common variants within humans and substitutions between humans and chimpanzees are effectively dealing with mutations accumulated over many generations. Their patterns, therefore, reflect the cumulative influence of many processes, including natural selection, population demographic history, and BGC.
A major challenge in the field is to elucidate the extent to which these forces alter the distribution of variants over time and to distinguish their relative contributions. To minimize the effects of selection, many studies restrict their analysis to noncoding regions of the genome. However, widespread signatures of recent positive selection, even within supposedly neutral regions (Williamson et al. 2007), suggest that noncoding regions may also be influenced by selection.

Rare variants represent a newly available and expanding resource that can overcome some of these limitations. Rare variants are relatively young, predominantly because they are the result of recent mutation events. Therefore, rare variants are typically less affected by population demographic history or natural selection (Messer 2009). Furthermore, as BGC acts only on variants after they have arisen in the population (Duret and Galtier 2009), it does not influence innate mutation rates. Rare variants, therefore, are an appropriate resource for studying the spectrum and genomic distribution of mutations while minimizing the potentially confounding influences. In addition, while family-based whole-genome sequencing has begun to identify de novo mutations that provide more direct measures of mutation rates (The 1000 Genomes Project Consortium 2010; Conrad et al. 2011; Campbell et al. 2012; Kong et al. 2012), the identified mutations sparsely cover the genome. For example, if whole-genome sequencing of each parent-offspring trio yields ∼40 de novo mutations (Conrad et al. 2011), 500 such trios would need to be sequenced to accumulate roughly 20,000 mutations. These mutations, however, would occur once per 150 kb on average, and the data would lack the spatial resolution necessary to detect the effect of local genomic context on a finer scale.

We studied a set of rare variants discovered via targeted resequencing of 202 genes in >14,000 unrelated individuals.
We analyzed the per-gene mutation rate as well as the probability of each site to contain a variant of a specific subtype relative to local GC content, recombination rate, and recombination hotspots. In order to compare mutation rate inferences based on rare variants with those obtained by within- and between-species data, we compared rare variant patterns to common variant data from The 1000 Genomes Project Consortium and substitution sites between humans and chimpanzee. These three variant classes cover different evolutionary time scales, and the differences between them allow us to examine the distinct influence of genomic context on the initial mutation process, the subsequent rise of some mutations to become common variants, and eventual fixation.
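
The trio arithmetic quoted in the abstract above (about 40 de novo mutations per trio, 500 trios for roughly 20,000 mutations, one mutation per 150 kb) can be checked with a short sketch. The function names and the genome size of about 3.0 Gb are assumptions made here for illustration; the abstract does not state which genome size underlies the 150 kb figure.

```python
# Back-of-envelope check of the trio arithmetic quoted in the abstract.
# Only the ~40 mutations per trio and the 20,000-mutation target come from
# the text; the genome size below is an assumed approximation.
HAPLOID_GENOME_BP = 3.0e9  # assumption: approximate human genome size in bp


def trios_needed(target_mutations: int, mutations_per_trio: int) -> int:
    """Number of trios required to reach the target count, rounded up."""
    return -(-target_mutations // mutations_per_trio)  # ceiling division


def mean_spacing_bp(n_mutations: int, genome_bp: float = HAPLOID_GENOME_BP) -> float:
    """Average distance in bp between mutations spread over the genome."""
    return genome_bp / n_mutations


if __name__ == "__main__":
    trios = trios_needed(20_000, 40)
    spacing_kb = mean_spacing_bp(20_000) / 1_000
    print(f"trios needed: {trios}")
    print(f"mean spacing: {spacing_kb:.0f} kb")
```

Under these assumptions the sketch reproduces both figures given in the text, which suggests the 150 kb spacing was derived from a genome of roughly 3 Gb.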

12.
13.
14.
15.
16.
17.
18.
Pan-American mitochondrial DNA (mtDNA) haplogroup C1 has been recently subdivided into three branches, two of which (C1b and C1c) are characterized by ages and geographical distributions that are indicative of an early arrival from Beringia with Paleo-Indians. In contrast, the estimated ages of C1d, the third subset of C1, looked too young to fit the above scenario. To define the origin of this enigmatic C1 branch, we completely sequenced 63 C1d mitochondrial genomes from a wide range of geographically diverse, mixed, and indigenous American populations. The revised phylogeny not only brings the age of C1d within the range of that of its two sister clades, but reveals that there were two C1d founder genomes for Paleo-Indians. Thus, the recognized maternal founding lineages of Native Americans number at least 15, indicating that the overall number of Beringian or Asian founder mitochondrial genomes will probably increase extensively when all Native American haplogroups reach the same level of phylogenetic and genomic resolution as obtained here for C1d.

While debate is still ongoing among scientists from several disciplines regarding the number of migratory events, their timing, and entry routes into the Americas (Wallace and Torroni 1992; Torroni et al. 1993; Forster et al. 1996; Kaufman and Golla 2000; Goebel et al. 2003, 2008; Schurr and Sherry 2004; Wang et al. 2007; Waters and Stafford 2007; Dillehay et al. 2008; Gilbert et al. 2008a; O'Rourke and Raff 2010), the general consensus is that modern Native American populations ultimately trace their gene pool to Asian groups who colonized northeast Siberia, including parts of Beringia, prior to the last glacial period. These ancestral population(s) probably retreated into refugial areas during the Last Glacial Maximum (LGM), where their genetic variation was reshaped by drift.
Thus, pre-LGM haplotypes of Asian ancestry were differently preserved and lost in Beringian enclaves, but at the same time, novel haplotypes and alleles arose in situ due to new mutations, often becoming predominant because of major founder events (Tamm et al. 2007; Achilli et al. 2008; Bourgeois et al. 2009; Perego et al. 2009; Schroeder et al. 2009). The scenario of a temporally important differentiation stage in Beringia explains the predominance in Native Americans of private alleles and haplogroups such as the autosomal 9-repeat at microsatellite locus D9S1120 (Phillips et al. 2008; Schroeder et al. 2009), the Y chromosome haplogroup Q1a3a-M3 (Bortolini et al. 2003; Karafet et al. 2008; Rasmussen et al. 2010), and the pan-American mtDNA haplogroups A2, B2, C1b, C1c, C1d, D1, and D4h3a (Tamm et al. 2007; Achilli et al. 2008; Fagundes et al. 2008; Perego et al. 2009).

In the millennia after the initial Paleo-Indian migrations, other groups from Beringia or eastern Siberia expanded into North America. If the gene pool of the source population(s) had in the meantime partially changed, not only because of drift, but also due to the admixture with population groups newly arrived from regions located west of Beringia, this would have resulted in the entry of additional Asian lineages into North America. This scenario, sometimes invoked to explain the presence of certain mtDNA haplogroups such as A2a, A2b, D2a, D3, and X2a only in populations of northern North America (Torroni et al. 1992; Brown et al. 1998; Schurr and Sherry 2004; Helgason et al. 2006; Achilli et al. 2008; Gilbert et al. 2008b; Perego et al. 2009), has recently received support from nuclear and morphometric data showing that some native groups from northern North America harbor stronger genetic similarities with some eastern Siberian groups than with Native American groups located more in the South (González-José et al. 2008; Bourgeois et al. 2009; Wang et al. 2009; Rasmussen et al. 2010).

As for the pan-American mtDNA haplogroups, when analyzed at the highest level of molecular resolution (Bandelt et al. 2003; Tamm et al. 2007; Fagundes et al. 2008; Perego et al. 2009), they all reveal, with the exception of C1d, entry times of 15–18 thousand years ago (kya), which are suggestive of a (quasi) concomitant post-LGM arrival from Beringia with early Paleo-Indians. A similar entry time is also shown for haplogroup X2a, whose restricted geographical distribution in northern North America appears to be due not to a later arrival, but to its entry route through the ice-free corridor (Perego et al. 2009). Despite its continent-wide distribution, C1d was hitherto characterized by an expansion time of only 7.6–9.7 ky (Perego et al. 2009). This major discrepancy has been attributed to a poor and possibly biased representation of complete C1d mtDNA sequences (only 10) in the available data sets (Achilli et al. 2008; Malhi et al. 2010). To clarify the issue of the age of haplogroup C1d and its role as a founding Paleo-Indian lineage, we sequenced and analyzed 63 C1d mtDNAs from populations distributed over the entire geographical range of the haplogroup.

19.
20.