Similar Documents
20 similar documents found (search time: 46 ms)
1.
2.
Kveraga et al. (2002, Exp Brain Res 146(3):307–14) reported that saccade latencies are immune to the effects of stimulus-response uncertainty and constitute one of the few response systems that violate Hick's law. Similar effects have been reported for keypresses triggered by vibrations of the fingertips, but robust uncertainty effects were subsequently revealed using weak, low-frequency vibrations (Ten Hoopen et al. 1982, Acta Psychol 50:143–157). We wondered whether the immunity of saccadic responses would show a similar intensity dependence and therefore re-examined the effects of response entropy on saccade latencies using near-threshold visual stimuli. Saccadic latencies remained independent of stimulus-response uncertainty, indicating that saccadic motor programming is unaffected by the duration of the target detection process.
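Hick's law, in its Hick-Hyman form, predicts that reaction time grows linearly with response entropy: RT = a + b·H, where H = log2(n) bits for n equiprobable alternatives. "Violating" the law therefore means the slope b is close to zero. The minimal Python sketch below contrasts the two cases; the intercepts and slopes are hypothetical values chosen for illustration, not data from the studies cited above.

```python
import math


def response_entropy(n_alternatives: int) -> float:
    """Entropy in bits of n equiprobable stimulus-response alternatives."""
    return math.log2(n_alternatives)


def hick_hyman_rt(n_alternatives: int, intercept_ms: float, slope_ms_per_bit: float) -> float:
    """Predicted reaction time RT = a + b * H (Hick-Hyman law)."""
    return intercept_ms + slope_ms_per_bit * response_entropy(n_alternatives)


if __name__ == "__main__":
    # Hypothetical parameters: a keypress-like system with a positive slope and
    # a saccade-like system with zero slope (latency independent of entropy).
    for n in (1, 2, 4, 8):
        keypress = hick_hyman_rt(n, intercept_ms=200.0, slope_ms_per_bit=150.0)
        saccade = hick_hyman_rt(n, intercept_ms=180.0, slope_ms_per_bit=0.0)
        print(f"n={n}  H={response_entropy(n):.0f} bits  "
              f"keypress={keypress:.0f} ms  saccade={saccade:.0f} ms")
```

A flat saccade curve across n is what "immune to stimulus-response uncertainty" looks like in this framing.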

3.
Object motor representation and language (total citations: 1; self-citations: 0; citations by others: 1)
Results of kinematic studies on the control of the reaching–grasping motor act (Gentilucci 2003, Exp Brain Res 149:395–400) suggest that grasp is guided by a single motor representation, which codes all the possible types of interactions with the objects. Neuroimaging studies in humans (Chao and Martin 2000, Neuroimage 12:478–484; Grabowski et al. 1998, Neuroimage 7:232–243; Grafton et al. 1997, Neuroimage 6:231–236; Martin et al. 1995, Science 270:102–105) suggest that these representations are coded in the premotor cortex and are automatically activated when naming the object or viewing it without the execution of an overt action. If an object motor representation is accessed by language, naming of object properties related to sensory-motor transformation can automatically influence the object motor representation. This hypothesis was verified by behavioural experiments (Gentilucci and Gangitano 1998, Eur J Neurosci 10:752–756; Gentilucci et al. 2000, Exp Brain Res 133:468–490; Glover and Dixon 2002, Exp Brain Res 146:383–387), which showed that automatic reading (and probably silent naming; MacLeod 1991, Psychol Bull 109:163–203) of adjectives related to object properties analysed for planning the reaching–grasping motor act influenced the control of the arm movement. A new study tested whether word class can selectively influence motor control. Participants were required to reach for and grasp an object located either on the right or on the left, and to place it on the opposite side. Either a verb ("place" SPOSTA versus "lift" ALZA) or an adjective ("lateral" LATERALE versus "high" ALTO) was printed on the target. The verbs influenced the kinematics of the action more than the adjectives did. In particular, when the verb ALZA was printed on the object, hand-path height and the vertical component of arm velocity were higher than when the adjective ALTO was presented on the object.
The data support the hypothesis that the object motor representation is mainly coded in terms of possible interactions with the object.

4.
Although an increasing amount of human genetic variation is being identified and recorded, determining variants within repeated sequences of the human genome remains a challenge. Most population and genome-wide association studies have therefore been unable to consider variation in these regions. Core to the problem is the lack of a sequencing technology that produces reads with sufficient length and accuracy to enable unique mapping. Here, we present a novel methodology that uses read clouds, obtained by accurate short-read sequencing of DNA derived from long fragment libraries, to confidently align short reads within repeat regions and enable accurate variant discovery. Our novel algorithm, Random Field Aligner (RFA), captures the relationships among the short reads governed by the long-read process via a Markov Random Field. We utilized a modified version of the Illumina TruSeq synthetic long-read protocol, which yielded shallow-sequenced read clouds. We test RFA through extensive simulations and apply it to discover variants on the NA12878 human sample, for which shallow TruSeq read cloud sequencing data are available, and on an invasive breast carcinoma genome that we sequenced using the same method. We demonstrate that RFA facilitates accurate recovery of variation in 155 Mb of the human genome, including 94% of 67 Mb of segmental duplication sequence and 96% of 11 Mb of transcribed sequence, that are currently hidden from short-read technologies.
Although next-generation sequencing (NGS) technologies have enabled whole-genome sequencing (WGS) of many individuals to identify variation, current large-scale and cost-effective resequencing platforms produce reads of limited length (Shendure and Ji 2008; Metzker 2010); as a result, variant identification within repeated sequences remains challenging.
The 1000 Genomes Project Consortium has reported that nearly 6% of the GRCh37 human genome reference is inaccessible by short-read technologies (The 1000 Genomes Project Consortium 2012). Further studies have shown that as much as 10% of GRCh37 cannot be aligned to with sufficient confidence for accurate variant discovery (Lee and Schatz 2012).
The portion of the human genome that is currently dark to short-read technologies is significant in both its size and phenotypic effect. Recent segmental duplications (also referred to as low copy repeats), consisting of regions >5 kbp in size with >94% sequence identity, have been identified as making up 130.5 Mb, or ∼4.35% of the human genome (Bailey et al. 2002). These regions tend to be hotspots of structural and copy number variants (CNVs) (Coe et al. 2014; Chaisson et al. 2015), which in aggregate affect a larger fraction of the genome than single nucleotide polymorphisms (SNPs) do (Conrad et al. 2010). CNVs have been associated with diseases such as autism (Sebat et al. 2007; Pinto et al. 2010), Crohn's disease (Wellcome Trust Case Control Consortium et al. 2010), schizophrenia (Stefansson et al. 2008; McCarthy et al. 2009), and neurocognitive disorders (Coe et al. 2014). However, current short-read technologies are unable to identify precise nucleotide variation in these regions.
In principle, longer sequencing reads provide an opportunity to disambiguate repeated sequences. Technologies such as Pacific Biosciences (PacBio) (McCarthy 2010) and Oxford Nanopore (Ashton et al. 2014) produce long reads, but at a much higher per-base error rate. PacBio has been leveraged for improved bacterial reference genome assemblies (Koren et al. 2013) and for targeted de novo assembly of the complex 1.3-Mb region of 17q21.31 (Huddleston et al. 2014).
However, these technologies are currently substantially lower in throughput and higher in cost than short-read technologies, so they cannot yet be used to cost-effectively uncover variation in repeated regions of the genome.
An alternative approach used in LFR (Peters et al. 2012), CPT-seq (Amini et al. 2014), and Illumina TruSeq Synthetic Long-Reads (previously known as Moleculo) (Kuleshov et al. 2014) uses accurate short-read sequencing of long DNA fragments to obtain long-range information at high nucleotide accuracy. The Illumina TruSeq protocol is able to produce 10-kbp long reads, retaining the benefits of the highly accurate and cost-effective Illumina technology (Kuleshov et al. 2014) and enabling human genome phasing (Kuleshov et al. 2014) and de novo assembly of complex genomes (Voskoboynik et al. 2013; McCoy et al. 2014).
Under Illumina's synthetic long-read protocol, DNA sequencing libraries are prepared as follows: First, the genomic DNA is sheared into long (≥10 kbp) fragments and ligated with amplification adapters at both ends; second, these molecules are diluted into wells so that each well receives only a small fraction (1%–2%) of the genome; third, molecules are amplified, sheared into short fragments, and uniquely barcoded within each well (Kuleshov et al. 2014). The individual wells are then pooled and sequenced together. Demultiplexing the resulting reads by well barcode and aligning them to the reference genome yields clusters of short reads, which we call read clouds, each of which originated from a single long DNA molecule (Fig. 1A). Additionally, short reads that originate from the endpoints of a read cloud will overlap the original adapters ligated to the long molecules and serve as end-markers of the original long molecule.
Figure 1. Read clouds (RC) and synthetic long reads (SLR) obtained by Illumina TruSeq Synthetic Long-Read sequencing.
Each well initially contains long molecules that represent a small fraction of the target genome; reads from each long molecule are separated in genomic coordinates within the target genome, and therefore clusters of such reads (read clouds) are formed, each originating from one source fragment. Blue reads denote end-markers of the source fragments and may not always be present as sequenced short reads. (A) In the RC approach, long fragments from several wells w_n are sequenced to a shallow depth and aligned to the reference to obtain read clouds. Pooling reads across several read clouds allows inference of the variation in the underlying long fragments. (B) In the SLR approach, long fragments are sequenced to a much higher depth to enable de novo assembly of synthetic long reads. For the same total sequencing budget C, the RC approach covers proportionally more target genome space than the SLR approach.
A read cloud approach has two key parameters for genome coverage (Fig. 1): coverage of the genome with long DNA fragments, CF, and coverage of each long fragment with short reads, CR. The total sequencing depth is then C = CF × CR. The choice of CF and CR for a given short-read sequencing budget C heavily influences the ability of the read cloud approach to accurately discover variation within a target genome. Both CF and C have to be sufficiently high; in particular, CF has to be high enough that both haplotypes of a diploid genome are covered by a sufficient number of long fragments (Lander and Waterman 1988). The original protocol (McCoy et al. 2014) required each well to be sequenced at a high depth (CR = 50×) in order to first de novo assemble synthetic long reads (SLR) of the original source long fragments (Fig. 1B; Voskoboynik et al. 2013). However, performing WGS with this approach requires an exorbitant amount of total sequencing in order to obtain a sufficiently high CF. For example, if CR = 50× (Voskoboynik et al.
2013) and CF = 20×, then C = 50 × 20 = 1000×, the equivalent of 33 whole human genomes sequenced at the currently standard 30× coverage.
The alternative to true SLR approaches is to bypass the actual assembly of the original long fragments and to minimize short-read coverage (CR ≤ 2×). This strategy allows a sufficiently high CF to cover a genome at a reasonable coverage budget C. Choosing CR = 1.5× and CF = 20× gives C = 1.5 × 20 = 30×, which would yield valuable long-range information for the same total sequencing cost as the currently standard short-read WGS approach.
In this work, we present RFA (Random Field Aligner), a novel methodology that utilizes the high-CF, low-CR read cloud approach to confidently map short reads within repetitive regions. In RFA, we directly model the short-read generative process from source long molecules in order to capture the dependencies of short reads through the hidden source long molecules. Using this probabilistic approach, we reduce the problem of finding optimal short-read alignments to optimizing a Markov Random Field (MRF). The resulting alignments tend to cluster the mapped reads into read clouds that fit the properties of the synthetic long-read sequencing protocol. The model naturally favors alignment of a read cloud to the specific copy of a repeated sequence that minimizes the sequence variation of the read cloud relative to that copy.
To our knowledge, RFA is the first attempt to take advantage of the long-range information present in shallow read cloud sequencing to improve the resulting short-read alignments and also to use read clouds to directly genotype an individual. Prior implementations of read clouds to provide molecular-phased genotypes for a single individual require known genotypes as input (Kitzman et al. 2011; Amini et al. 2014; Kuleshov et al.
2014) and typically align the resulting read clouds with standard short-read aligners in order to observe the allele at a known SNV within each read cloud. Because genotypes are typically determined by standard whole-genome 30× shotgun sequencing with a short-read workflow, variants in complex regions would remain unresolved.
We demonstrate the utility of our approach using shallow-sequenced read clouds (CR = 1.5×) obtained from the Illumina TruSeq synthetic long-read protocol (henceforth referred to as TruSeq read clouds to avoid confusion with the Illumina product that uses deep sequencing to assemble synthetic long reads). We tested our approach on simulated read cloud wells, on TruSeq read cloud libraries for the cell line GM12878, for which assembled synthetic long reads are also available for direct validation (Genomes Moleculo NA12878, 2014, ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/integrated_sv_map/supporting/NA12878/moleculo/), and on a high-coverage cancer sample that we sequenced. Evaluation of the results confirmed that our method accurately recovers precise nucleotide variation within a significant fraction of the human genome that was previously dark to current short-read technologies. We are able to leverage the read cloud strategy to recover this variation at a fraction of the cost of the original protocol and eliminate the need for first assembling synthetic long reads.
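The read cloud bookkeeping described above can be sketched in Python under simplifying assumptions: aligned reads are grouped by well barcode and chromosome, then split into clouds wherever consecutive alignment positions are more than a chosen gap apart, and the total depth follows C = CF × CR. The 5-kbp gap threshold, the tuple layout, and all function names are hypothetical; this is not RFA, which instead infers alignments jointly through a Markov Random Field.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical read record: (well_barcode, chromosome, aligned_position).
Read = Tuple[str, str, int]


def form_read_clouds(reads: List[Read], max_gap: int = 5_000) -> Dict[Tuple[str, str], List[List[int]]]:
    """Group reads by (barcode, chromosome) and split each group wherever
    consecutive positions are more than max_gap bp apart. Each resulting
    cluster approximates one read cloud (reads from one long molecule)."""
    groups: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for barcode, chrom, pos in reads:
        groups[(barcode, chrom)].append(pos)

    clouds: Dict[Tuple[str, str], List[List[int]]] = {}
    for key, positions in groups.items():
        positions.sort()
        current = [positions[0]]
        key_clouds: List[List[int]] = []
        for pos in positions[1:]:
            if pos - current[-1] > max_gap:
                key_clouds.append(current)  # gap too large: start a new cloud
                current = [pos]
            else:
                current.append(pos)
        key_clouds.append(current)
        clouds[key] = key_clouds
    return clouds


def total_depth(fragment_coverage: float, read_coverage: float) -> float:
    """Total short-read depth C = CF x CR."""
    return fragment_coverage * read_coverage


def genome_equivalents(depth: float, standard_depth: float = 30.0) -> float:
    """Number of standard 30x genomes that a given total depth corresponds to."""
    return depth / standard_depth


if __name__ == "__main__":
    reads = [("w1", "chr1", 100), ("w1", "chr1", 900),
             ("w1", "chr1", 60_000), ("w2", "chr1", 150)]
    print(form_read_clouds(reads))
    # Deep SLR-style sequencing versus shallow read clouds, both at CF = 20x.
    print(total_depth(20, 50), round(genome_equivalents(total_depth(20, 50)), 1))
    print(total_depth(20, 1.5), genome_equivalents(total_depth(20, 1.5)))
```

A fixed gap is the simplest possible clustering rule; it cannot separate two molecules from the same well that happen to overlap in the genome, which is one reason a probabilistic model of the source molecules is attractive.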

5.
6.
Obtaining high-quality sequence continuity of complex regions of recent segmental duplication remains one of the major challenges of finishing genome assemblies. In the human and mouse genomes, this was achieved by targeting large-insert clones using costly and laborious capillary-based sequencing approaches. Sanger shotgun sequencing of clone inserts, however, has now been largely abandoned, leaving most of these regions unresolved in newer genome assemblies generated primarily by next-generation sequencing hybrid approaches. Here we show that it is possible to resolve regions that are complex in a genome-wide context but simple in isolation for a fraction of the time and cost of traditional methods using long-read single-molecule, real-time (SMRT) sequencing and assembly technology from Pacific Biosciences (PacBio). We sequenced and assembled BAC clones corresponding to a 1.3-Mbp complex region of chromosome 17q21.31, demonstrating 99.994% identity to Sanger assemblies of the same clones. We targeted 44 differences using Illumina sequencing and found that PacBio and Sanger assemblies share a comparable number of validated variants, albeit with different sequence context biases. Finally, we targeted a poorly assembled 766-kbp duplicated region of the chimpanzee genome and resolved its structure and organization for a fraction of the cost and time of traditional finishing approaches. Our data suggest a straightforward path for upgrading genomes to a higher-quality finished state.
Complete high-quality sequence assembly remains a difficult problem for the de novo assembly of genomes (Alkan et al. 2011b; Church et al. 2011; Salzberg et al. 2012). Finishing of the human and mouse genomes involved selecting large-insert BAC clones and subjecting them to capillary-based shotgun sequencing and assembly (English et al. 2012).
Sanger-based assembly of large-insert clones has typically been a time-consuming and expensive operation requiring the infrastructure of large genome sequencing centers and specialists focused on particular problematic or repetitive regions (Zody et al. 2008; Dennis et al. 2012; Hughes et al. 2012). Such activities can significantly improve the quality of genomes, including the discovery of missing genes and gene families. A recent effort to upgrade the mouse genome assembly, for example, resulted in the correction or addition of 2185 genes, 61% of which corresponded to lineage-specific segmental duplications (Church et al. 2009). Within the human genome, there are >900 annotated genes mapping to large segmental duplications. About half of these map to particularly problematic regions of the genome where annotation and genetic variation are poorly understood (Sudmant et al. 2010). Such genes are typically missing or misassembled in working draft assemblies of genomes. These include genes such as the SRGAP2 family, which evolved specifically in the human lineage and is thought to be important in the development of the human brain (Charrier et al. 2012; Dennis et al. 2012). Other regions (e.g., the 17q21.31 inversion) show extensive structural diversity, predispose specific populations to disease, and have been the target of remarkable selection in the human lineage (Stefansson et al. 2005; Zody et al. 2008; Steinberg et al. 2012). Such structurally complex regions were not resolved within the human reference sequence until large-insert clones were recovered and completely sequenced.
The widespread adoption of next-generation sequencing methods for de novo genome assemblies has complicated the assembly of repetitive sequences and their organization. Although we can generate much more sequence, short read lengths and the inability to scaffold across repetitive structures translate into more gaps, missing data, and more incomplete reference assemblies (Alkan et al.
2011a; Salzberg et al. 2012). Due to budgetary constraints, traditional capillary-based sequencing capacity and genome finishing efforts have dwindled in sequencing centers, leaving most of the complex regions of working draft genomes unresolved. Clone-based hierarchical approaches remain important for reducing the complexity of genomes, but even targeted sequencing of these clones using short-read data fails to completely resolve and assemble these regions due to the presence of highly identical repeat sequences common in mammalian genomes. Here, we tested the efficacy of a method developed for finishing microbial genomes (Chin et al. 2013) by applying it to a 1.3-Mbp complex region of human chromosome 17q21.31 previously sequenced and assembled using traditional Sanger-based approaches. We directly compared sequenced and assembled clones and validated differences to highlight advantages and limitations of the different technologies. We then applied the approach to a previously uncharacterized, highly duplicated region of the chimpanzee genome and showed that it can rapidly resolve the structure and organization of the region.
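To make the 99.994% identity figure concrete: over 1.3 Mbp, 0.006% divergence corresponds to roughly 78 differing bases. The Python sketch below computes percent identity from a pair of aligned sequences using one common simplified definition (matches over gap-free aligned columns) and converts an identity value into an implied number of differences. The exact identity definition used in the study is not stated here, so this definition is an assumption.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Matches divided by aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # indel columns are skipped under this simplified definition
        compared += 1
        matches += a == b
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


def expected_differences(length_bp: int, identity_pct: float) -> float:
    """Number of differing bases implied by a percent identity over length_bp."""
    return length_bp * (1.0 - identity_pct / 100.0)


if __name__ == "__main__":
    print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))
    print(round(expected_differences(1_300_000, 99.994)))
```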

7.
The persistence of large blocks of homologous synteny and a high frequency of breakpoint reuse are distinctive features of mammalian chromosomes that are not well understood in evolutionary terms. To gain a better understanding of the evolutionary forces that affect genome architecture, synteny relationships among 10 amniotes (human, chimp, macaque, rat, mouse, pig, cattle, dog, opossum, and chicken) were compared at <1 human-Mbp resolution. Homologous synteny blocks (HSBs; N = 2233) and chromosome evolutionary breakpoint regions (EBRs; N = 1064) were identified from pairwise comparisons of all genomes. Analysis of the size distribution of HSBs shared in all 10 species' chromosomes (msHSBs) identified three (>20 Mbp) that are larger than expected by chance. Gene network analysis of msHSBs >3 human-Mbp and EBRs <1 Mbp demonstrated that msHSBs are significantly enriched for genes involved in development of the central nervous and other organ systems, whereas EBRs are enriched for genes associated with adaptive functions. In addition, we found that EBRs are significantly enriched for structural variations (segmental duplications, copy number variants, and indels), retrotransposed and zinc finger genes, and single nucleotide polymorphisms. These results demonstrate that chromosome breakage in evolution is nonrandom and that HSBs and EBRs are evolving in distinctly different ways. We suggest that natural selection acts on the genome to maintain combinations of genes and their regulatory elements that are essential to fundamental processes of amniote development and biological organization.
Furthermore, EBRs may be used extensively to generate new genetic variation and novel combinations of genes and regulatory elements that contribute to adaptive phenotypes.
The modern evolutionary synthesis attempts to explain Darwinian concepts of “descent with modification” and “natural selection” by applying quantitative methods to describe the behavior of chromosomes, genes, and their variants in populations of organisms. However, such methods, which focus primarily on variations in nucleic acids and proteins, have failed to adequately explain phenotypes found in nature. By largely overlooking the importance of chromosomes, a dynamic and pervasive feature of biology and the ultimate purveyor of genetic information, evolutionary science may have missed a key component of the mechanism for generating phenotypic variation used by natural selection. As such, an unresolved issue in evolutionary biology is whether chromosome rearrangements associated with speciation have adaptive value or are evolutionarily neutral (Ohno 1973; Ayala and Coluzzi 2005). The “chromosomal speciation” model posits that chromosome rearrangements contribute to reproductive isolation between geographically separated populations and promote speciation (Ayala and Coluzzi 2005). For example, a reciprocal translocation in yeast that is associated with resistance to sulfite was shown to be adaptive (Pérez-Ortín et al. 2002), whereas in insects (for review, see Ayala and Coluzzi 2005), chromosome inversions lead to reproductive isolation and thus contribute to speciation (Noor et al. 2001). However, reports supporting the chromosomal speciation model in higher taxa have been controversial (Lu et al. 2003; Navarro and Barton 2003). It is now possible to address this problem in vertebrate genomes from a different perspective because of recent advances in comparative genomics, data visualization, and DNA sequence availability (Murphy et al. 2005; Ma et al.
2006).
An important theoretical insight into how chromosomes evolve was made by Nadeau and Taylor (1984), who proposed that chromosome breakage in evolution is random. This model of genome evolution was supported by the size distribution of synteny blocks shared by the human and mouse genomes (Nadeau and Taylor 1984). However, like meiotic recombination, the random breakage model turned out to be a generalization that did not hold up when comparative genome organization was examined in finer detail by using direct DNA sequence comparisons (Pevzner and Tesler 2003) and high-resolution chromosome (Larkin et al. 2003) or whole-genome (Murphy et al. 2005) maps. These studies revealed that many sites where interchromosomal and intrachromosomal breakages occur in evolution are “reused,” which led to a new “fragile site” breakage model of chromosome evolution. To identify breakpoint reuse, Larkin et al. (2003) and later Murphy and coworkers (2005) used empirical evidence, i.e., direct identification and counting of overlapping breakpoint regions in multigenome synteny-based comparisons, whereas Pevzner and Tesler (2003) used an algorithmic approach that identified an excess of small synteny blocks that could be explained by breakpoint reuse. Although there has been debate in the literature concerning the algorithmic approach, and whether reuse is nonrandom (Sankoff and Trinh 2005; Peng et al. 2006), the verification of breakpoint reuse by direct observation leaves no doubt as to its validity. Whether breakpoint reuse is nonrandom or due to chance is more controversial because the resolution of the comparisons and how the data are analyzed will affect the results. For example, resolution of breakpoints at the nucleotide level will produce a very different reuse frequency than resolution at the megabase level as defined by either synteny or sequence-only approaches.
Furthermore, relatively high-resolution maps are necessary to avoid the problem of “breakpoint chaining,” which can produce more overlaps and thus apparent reuse in multigenome comparisons (Murphy et al. 2005). However, with either low- or high-resolution comparisons, there is no question that the organization of chromosomes in extant species is due at least in part to the independent occurrence of breakpoints at the same chromosomal sites in different vertebrate lineages.
This leads to an obvious question: Are there defining DNA sequence or chromosome features that might account for breakpoint use and reuse in chromosome evolution? It has been shown that evolutionary breakpoint regions (EBRs) in chromosomes are gene-rich (Everts-van der Wind et al. 2004, 2005; Ma et al. 2006), are associated with the repositioning of centromeres and telomeres, and contain a higher than expected frequency of segmental duplications, among other features (Murphy et al. 2005; Bulazel et al. 2007). Evolutionary breakpoint regions are also frequently associated with chromosome fragile sites (Ruiz-Herrera et al. 2006) and with chromosome rearrangements frequently found in certain cancers (Murphy et al. 2005; Darai-Ramqvist et al. 2008). The high frequency of segmental duplications and/or repetitive elements in EBRs (Bailey et al. 2004; Murphy et al. 2005; Schibler et al. 2006) specific to different lineages of mammals led to the hypothesis that EBRs are evolutionarily unstable regions that promote chromosome rearrangements by nonallelic homologous recombination (Murphy et al. 2005). These studies provided the first evidence for the distinguishing features of EBRs while suggesting a mechanism for use and reuse of specific sites in chromosome evolution. However, a comprehensive analysis of sequence features and functions of genes in EBRs compared with homologous synteny blocks (HSBs), i.e., regions of shared synteny between two or more genomes, is lacking.
A better understanding of these features can help to explain not only processes related to chromosome evolution, e.g., whether breakpoint reuse is random or nonrandom, but also factors that are necessary for or predispose to many human and animal diseases.
The relationship of EBRs to various sequence features associated with evolutionary processes, as well as the evidence cited above for the chromosomal speciation model, has stimulated a growing interest in chromosomal evolution and its relationship to phenotypic adaptation and disease. In the present study, genomic resources, data visualization, and annotation tools were used to identify, taxonomically classify, and compare the functional gene content of HSBs and EBRs in genomes of 10 amniote species separated by more than 300 Myr of evolution. These comparisons permitted a first examination of the relationship between chromosome organization, genome rearrangements, and natural selection.
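The notions of HSBs and EBRs in a pairwise comparison can be made concrete with a simplified Python sketch: walk orthologous markers in order along one reference chromosome, extend a block while the orthologs stay on the same chromosome in the other genome with consistent orientation, and treat the interval between adjacent blocks as a breakpoint region. This is not the study's pipeline; real analyses add minimum block sizes and resolution thresholds (the study worked at <1 human-Mbp resolution), and the marker fields and coordinates below are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Marker(NamedTuple):
    """An orthologous marker (hypothetical fields): position on one reference
    chromosome and its location in the compared genome."""
    ref_pos: int
    other_chr: str
    other_pos: int


def synteny_blocks(markers: List[Marker]) -> List[List[Marker]]:
    """Split markers from one reference chromosome into blocks. A block continues
    while consecutive markers stay on the same chromosome in the other genome and
    keep a consistent orientation (all ascending or all descending)."""
    blocks: List[List[Marker]] = []
    current: List[Marker] = []
    direction = 0  # +1 ascending, -1 descending, 0 not yet known
    for m in sorted(markers, key=lambda marker: marker.ref_pos):
        if current:
            prev = current[-1]
            if m.other_chr == prev.other_chr:
                step = 1 if m.other_pos > prev.other_pos else -1
                if direction in (0, step):
                    direction = step
                    current.append(m)
                    continue
            blocks.append(current)  # chromosome change or orientation flip
            current, direction = [], 0
        current.append(m)
    if current:
        blocks.append(current)
    return blocks


def breakpoint_regions(blocks: List[List[Marker]]) -> List[Tuple[int, int]]:
    """Reference-coordinate intervals between adjacent blocks."""
    return [(left[-1].ref_pos, right[0].ref_pos) for left, right in zip(blocks, blocks[1:])]


if __name__ == "__main__":
    markers = [Marker(100, "X", 10), Marker(200, "X", 20), Marker(300, "X", 30),
               Marker(400, "7", 500), Marker(500, "7", 400), Marker(600, "X", 40)]
    blocks = synteny_blocks(markers)
    print([len(b) for b in blocks], breakpoint_regions(blocks))
```

Breakpoint reuse, in this framing, would show up as breakpoint intervals from different pairwise comparisons overlapping at the same reference coordinates.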

8.
Summary: Bovine rotavirus (brv) possessing a rearranged genome [Hundley et al. (1985), Virology 143: 88–103] was found to reassort with tissue-culture-adapted group A human rotavirus carrying a standard genome. In some of the reassortants, the rearranged part of the brv genome containing segment 5-specific sequences was exchanged with the normal RNA segment 5 of the human rotavirus.

9.
10.
11.
Recombination enables reciprocal exchange of genomic information between parental chromosomes and successful segregation of homologous chromosomes during meiosis. Errors in this process lead to negative health outcomes, whereas variability in recombination rate affects genome evolution. In mammals, most crossovers occur in hotspots defined by PRDM9 motifs, although PRDM9 binding peaks are not all equally hot. We hypothesize that dynamic patterns of meiotic genome folding are linked to recombination activity. We apply an integrative bioinformatics approach to analyze how three-dimensional (3D) chromosomal organization during meiosis relates to rates of double-strand-break (DSB) and crossover (CO) formation at PRDM9 binding peaks. We show that active, spatially accessible genomic regions during meiotic prophase are associated with DSB-favored loci, which further adopt a transient locally active configuration in early prophase. Conversely, crossover formation is depleted among DSBs in spatially accessible regions during meiotic prophase, particularly within gene bodies. We also find evidence that active chromatin regions have smaller average loop sizes in mammalian meiosis. Collectively, these findings establish that differences in chromatin architecture along chromosomal axes are associated with variable recombination activity. We propose an updated framework describing how 3D organization of brush-loop chromosomes during meiosis may modulate recombination.

The formation of crossovers during meiotic recombination is a highly orchestrated process, enhancing genetic diversity by allowing reciprocal exchange of genomic information between parental chromosomes. Crossover formation also promotes proper segregation of homologous chromosomes (Baker et al. 1976), and errors in this process lead to chromosomal abnormalities such as aneuploidy, which are associated with negative health outcomes (Petronis 1999; Potapova and Gorbsky 2017). In mammals, crossovers are highly enriched (100-fold) in discrete ∼1- to 2-kb stretches along the genome, termed recombination hotspots (Paigen and Petkov 2010). These hotspots are in large part determined by the binding of PRDM9, a meiosis-specific zinc-finger protein that marks loci for potential recombination (Baudat et al. 2010; Myers et al. 2010; Parvanov et al. 2010).
Although hotspot initiation is dependent on PRDM9, subsequent DSB and crossover formation are highly stochastic. Exact numbers vary by species, but a mammalian chromosome may harbor hundreds of PRDM9 binding loci, whereas only 10–20 double-strand breaks (DSBs) occur per chromosome during a typical meiotic cycle (Diagouraga et al. 2018). Most of these DSBs are repaired as noncrossover conversion events, and only one or two per chromosome are chosen for crossover formation in mice (Baudat and de Massy 2007; Li et al. 2019). Local chromatin features such as GC content, histone modification, and cofactor binding are known to impact DSB formation at hotspots (Walker et al. 2015; Yamada et al. 2017), whereas nucleosome occupancy, GC content, and chromosomal position are associated with crossover formation (Hinch et al. 2019). Still, why certain hotspots are favored to form DSBs and crossovers remains incompletely understood.
Meiotic chromosomes adopt a brush-loop conformation characterized by chromatin loops attached to a central axis (Møens and Pearlman 1988).
Although recombination hotspots are found within loops, DSB machinery, such as DNA-repair proteins, resides on the axis (Blat et al. 2002; Grey et al. 2018; Tock and Henderson 2018; Slotman et al. 2020). This “tethered-loop/axis complex” model of recombination suggests that 3D genome folding could place constraints on the recombination process. Here we apply computational analyses to investigate how 3D chromatin organization relates to PRDM9 binding, DSBs, and crossover formation in male mammalian meiosis. Our analyses aim to integrate observations from multiple recent interphase and meiosis data sets measuring recombination activity and chromatin organization, including Hi-C, leading to an updated framework of how meiotic events related to recombination are associated with brush-loop chromosomal architecture.
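One step in this kind of integrative analysis is to ask whether PRDM9 binding peaks that fall inside active, spatially accessible regions differ in DSB activity from peaks outside them. The Python sketch below shows only that stratify-and-compare step, assuming sorted, non-overlapping active intervals (for example, derived from Hi-C) and a per-peak DSB signal. The positions, intervals, signal values, and function names are hypothetical, and the study's actual pipeline is not reproduced here.

```python
import bisect
from statistics import mean
from typing import Dict, List, Tuple


def in_active_region(peaks: List[int], active: List[Tuple[int, int]]) -> List[bool]:
    """For each peak position, report whether it lies in any half-open interval
    [start, end). Intervals must be sorted and non-overlapping."""
    starts = [start for start, _ in active]
    labels = []
    for pos in peaks:
        i = bisect.bisect_right(starts, pos) - 1  # last interval starting at or before pos
        labels.append(i >= 0 and pos < active[i][1])
    return labels


def mean_signal_by_state(signal: List[float], labels: List[bool]) -> Dict[str, float]:
    """Mean per-peak signal for peaks inside versus outside active regions."""
    groups: Dict[str, List[float]] = {"active": [], "inactive": []}
    for value, is_active in zip(signal, labels):
        groups["active" if is_active else "inactive"].append(value)
    return {state: mean(values) if values else float("nan") for state, values in groups.items()}


if __name__ == "__main__":
    peaks = [150, 450, 900, 1_200]          # hypothetical PRDM9 peak positions
    active = [(100, 500), (1_100, 1_300)]   # hypothetical active intervals
    dsb_signal = [8.0, 6.0, 2.0, 4.0]       # hypothetical per-peak DSB signal
    labels = in_active_region(peaks, active)
    print(labels, mean_signal_by_state(dsb_signal, labels))
```

Binary search keeps labeling at O(log k) per peak for k intervals, which matters when annotating genome-wide peak sets.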

12.
In diploid populations of size N, there will be $2N\mu$ mutations per nucleotide (nt) site (or per locus) per generation (μ stands for the mutation rate). If either the population or the coding genome doubles in size, one expects $4N\mu$ mutations. What matters is not the population size per se but the number of genes (coding sites), the two often being interconverted. Here we compared the total physical length of protein-coding genomes (n) with the corresponding absolute rates of synonymous substitution ($K_S$), an empirical neutral reference. Using the classical occupancy problem and the coupon collector's (CC) problem, n was expressed as a mean rate of change ($K_{CC}$). Despite the inherently low power of approaches that average rates, the mode of molecular evolution of the total-size phenotype of the coding genome could be detected through differences between the genomic estimates of $K_{CC} = 1/[n(\ln n + 0.57721)]$ (0.57721 being the Euler–Mascheroni constant γ) and the rate of molecular evolution, $K_S$. We found that (1) the estimates of n and $K_S$ are reciprocally correlated across taxa (r = 0.812; P < 0.001); (2) the gamete-cell division hypothesis (Chang et al. Proc Natl Acad Sci USA 91:827–831, 1994) can be confirmed independently in terms of $K_{CC}/K_S$ ratios; (3) the time scale of molecular evolution changes with the mutation rate, as previously shown by Takahata (Proc Natl Acad Sci USA 87:2419–2423, 1990), Takahata et al. (Genetics 130:925–938, 1992), and Vekemans and Slatkin (Genetics 137:1157–1165, 1994); and (4) the effects of generation time and population size (Lynch and Conery, Science 302:1401–1404, 2003) have left their signatures on the size phenotype of the protein-coding genome.
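The $K_{CC}$ formula in the abstract builds on the coupon collector's result: drawing uniformly at random from n items takes on average $nH_n \approx n(\ln n + \gamma)$ draws to see every item at least once, where $H_n$ is the n-th harmonic number and γ ≈ 0.57721. The Python sketch below checks that approximation against the exact form and computes $K_{CC}$ as its reciprocal, which is our reading of the abstract's formula rather than the author's code; it also evaluates the $2N\mu$ count of new mutations per site from the first sentence. The function names and the N and μ values are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler–Mascheroni constant (≈ 0.57721)


def expected_draws_exact(n: int) -> float:
    """Exact coupon-collector expectation n * H_n, with H_n the n-th harmonic number."""
    return n * math.fsum(1.0 / k for k in range(1, n + 1))


def expected_draws_approx(n: float) -> float:
    """Asymptotic form n * (ln n + gamma) that appears in the K_CC formula."""
    return n * (math.log(n) + EULER_GAMMA)


def k_cc(n: float) -> float:
    """K_CC = 1 / [n (ln n + 0.57721)], read here as the reciprocal of the expected draws."""
    return 1.0 / expected_draws_approx(n)


def new_mutations_per_site(pop_size: float, mu: float) -> float:
    """Expected new mutations per site per generation in a diploid population: 2 N mu."""
    return 2.0 * pop_size * mu


if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        exact, approx = expected_draws_exact(n), expected_draws_approx(n)
        print(f"n={n:>7}: exact={exact:.1f}  approx={approx:.1f}  K_CC={k_cc(n):.3e}")
    # Illustrative values only: N = 10,000 diploid individuals, mu = 1e-8 per site.
    print(new_mutations_per_site(10_000, 1e-8))
```

The exact and approximate expectations differ by just under 1/2 for every n, so the approximation error is negligible for coding-genome lengths of any realistic size, and doubling N doubles the expected number of new mutations, as the first two sentences state.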

14.
Detection of DNA copy number aberrations by shallow whole-genome sequencing (WGS) faces many challenges, including incompleteness of and errors in the human reference genome, repetitive sequences, polymorphisms, variable sample quality, and biases in the sequencing procedures. Formalin-fixed paraffin-embedded (FFPE) archival material, the analysis of which is important for cancer studies, presents particular analytical difficulties due to DNA degradation and the frequent lack of matched reference samples. We present a robust, cost-effective WGS method for DNA copy number analysis that addresses these challenges more successfully than currently available procedures. In practice, very useful profiles can be obtained with ∼0.1× genome coverage. We improve on previous methods in two ways: first, by implementing a combined correction for sequence mappability and GC content, and second, by applying this procedure to sequence data from the 1000 Genomes Project to develop a blacklist of problematic genome regions. A small subset of these blacklisted regions was previously identified by ENCODE, but the vast majority are novel, previously unappreciated problematic regions. Our procedures are implemented in a pipeline called QDNAseq. We have analyzed over 1000 samples, most of which were obtained from the fixed-tissue archives of more than 25 institutions. We demonstrate that for most samples our sequencing and analysis procedures yield genome profiles with noise levels near the statistical limit imposed by read counting. The described procedures also correct artifacts introduced by low DNA quality better than prior approaches and yield better copy number data than high-resolution microarrays at substantially lower cost.
Alteration in chromosomal copy number is one of the main mechanisms by which cancer cells acquire their hallmark characteristics (Pinkel et al. 1998; Hanahan and Weinberg 2011).
For more than 20 years, these alterations have been routinely detected, first by genome-wide comparative genomic hybridization (CGH) (Kallioniemi et al. 1992) and subsequently by array-based CGH (Snijders et al. 2001) or single nucleotide polymorphism (SNP) arrays (Ylstra et al. 2006). Whole-genome sequencing now offers an alternative to microarrays for many genome analysis applications, including copy number detection.
Several methods have been developed to estimate DNA copy number from WGS data. They fall into four categories, each with its own requirements, strengths, and weaknesses (Teo et al. 2012): (1) assembly-based methods construct the genome piece by piece from the sequence reads instead of aligning them to a known reference; these methods are the most sensitive to deviations from the reference genome, including copy number changes and genome rearrangements, but require high sequence coverage (typically 40×) (Li et al. 2010) and therefore incur high cost; (2) split-read and (3) read-pair methods map sequence reads from both ends of size-fractionated genomic DNA molecules onto the reference genome; these methods can provide information on copy number and genome rearrangements, but they impose requirements on molecule sizes and are therefore highly sensitive to DNA integrity; and (4) depth of coverage (DOC) methods infer copy number from the observed sequence depth across the genome and do not require both ends of each molecule to be sequenced.
Archival tissue is an invaluable resource for biomarker detection studies (Casparie et al. 2007). Projects investigating cancers with long survival, such as diffuse low-grade gliomas (LGGs), for which a subset of patients survive more than 25 years after diagnosis (van Thuijl et al. 2012), require long-term clinical follow-up. Archival FFPE tissue is often the only source of material for such studies (Blow 2007).
The use of such samples has been challenging because of poor DNA quality; array CGH results, for example, have been variable (Mc Sherry et al. 2007; Hostetter et al. 2010; Krijgsman et al. 2012; Warren et al. 2012). To make large archival sample series accessible for genome research, a robust technique is required that performs well on diverse sample types with high resolution, quality, and reproducibility, at low cost, and without the need for a (matched) normal sample. Here we focus exclusively on DOC methods, because they are theoretically the most compatible with DNA isolated from FFPE material.
Typically, DOC methods for copy number divide the reference genome into bins and count the number of reads in each, although there are also bin-free, intensity-based implementations (Shen and Zhang 2012). Copy number is then inferred from the observed read counts across the genome. To compensate for technological bias, many DOC algorithms, such as CNV-seq (Xie and Tammi 2009), SegSeq (Chiang et al. 2009), BIC-seq (Xi et al. 2011), and CNAnorm (Gusnanto et al. 2012), compare the tumor signal to a normal reference signal, similar to array CGH. A pool of DNA from different individuals is commonly used as the normal reference. In many applications, including cancer genome analysis, matched normal DNA from the same patient is preferable because it avoids detection of germline copy number variants (Feuk et al. 2006) and allows a focus solely on somatic aberrations (Perry et al. 2008).
Two DOC methods, readDepth (Miller et al. 2011) and FREEC (Boeva et al. 2011), do not require a reference signal. This has three principal advantages: the cost is reduced by half, archival material for which matched normal reference tissue is unavailable (most cases) can be analyzed, and measurement noise from the reference sample is avoided.
Achieving these benefits requires accurate computational correction for biases in the DOC sequence data, since these biases are no longer normalized out by comparison with data from a matched reference specimen.
Here we describe a multiplexed, single-read (SR), shallow WGS procedure based on the Illumina platform that produces improved DOC copy number profiles. Because DOC profiles are fundamentally based on counting sequence reads, the minimum achievable noise can be easily calculated. We show that, with our procedures, a larger proportion of samples (most of those we analyzed) reach noise levels at the theoretical minimum than with other analysis methods. We achieve this improved performance by correcting primary read counts for sequence mappability and GC content simultaneously (rather than sequentially) and by using a comprehensive empirical approach to recognize and filter problematic genome regions. We also show that, compared with previous shallow WGS analysis procedures, our approach better corrects spurious localized profile variations, which are presumably due to sample quality problems, and that microarray analysis costs more and yields a poorer signal-to-noise ratio than shallow WGS. Thus our DOC profiles represent the genome copy number structure more accurately than other approaches and should allow segmentation and calling algorithms to recognize true aberrations more sensitively.
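To a first approximation, the read count in a bin with λ expected reads carries Poisson counting noise, so its relative standard deviation cannot fall below 1/√λ; this is the sense in which the minimum achievable noise is easy to calculate. The Python sketch below illustrates that floor together with DOC binning and a joint (rather than sequential) GC/mappability correction. It is a simplified stand-in, not QDNAseq's implementation: the correction simply divides each bin by the median count of its (GC, mappability) cell, and the bias model, bin numbers, grid steps, and function names are all hypothetical.

```python
import math
import random
import statistics
from collections import defaultdict


def bin_read_counts(read_starts, chrom_length, bin_size):
    """Depth-of-coverage binning: count reads whose start falls in each fixed-width bin."""
    counts = [0] * math.ceil(chrom_length / bin_size)
    for pos in read_starts:
        counts[pos // bin_size] += 1
    return counts


def correct_gc_mappability(counts, gc, mappability, gc_step=0.02, map_step=0.1):
    """Correct counts for GC content and mappability jointly, in a single pass.

    Bins are grouped into (GC, mappability) cells and each count is divided by the
    median count of its cell. Simplified stand-in, not QDNAseq's code.
    """
    cells, keys = defaultdict(list), []
    for c, g, m in zip(counts, gc, mappability):
        key = (round(g / gc_step), round(m / map_step))
        keys.append(key)
        cells[key].append(c)
    medians = {k: statistics.median(v) for k, v in cells.items()}
    # Cells whose median is zero carry no usable signal; mark them as missing.
    return [c / medians[k] if medians[k] > 0 else math.nan for c, k in zip(counts, keys)]


def poisson_noise_floor(mean_reads_per_bin):
    """Minimum relative SD of a bin count when the only noise is Poisson read counting."""
    return 1.0 / math.sqrt(mean_reads_per_bin)


def poisson_sample(rng, lam):
    """Knuth's multiplication method; adequate for the small means used in this demo."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def demo(seed=0, n_bins=5000, reads_per_bin=30.0):
    """Simulate biased counts for a copy-number-neutral genome and compare noise levels."""
    rng = random.Random(seed)
    gc = [rng.uniform(0.35, 0.60) for _ in range(n_bins)]
    mappability = [rng.choice([0.5, 0.8, 1.0]) for _ in range(n_bins)]
    # Hypothetical bias model: depth scales with mappability and rises linearly with GC.
    expected = [reads_per_bin * m * (1.0 + 2.0 * (g - 0.45)) for g, m in zip(gc, mappability)]
    counts = [poisson_sample(rng, e) for e in expected]
    raw_cv = statistics.pstdev(counts) / statistics.fmean(counts)
    corrected_sd = statistics.pstdev(correct_gc_mappability(counts, gc, mappability))
    # Noise floor: square root of the mean per-bin Poisson variance of count / expected.
    floor = math.sqrt(statistics.fmean(poisson_noise_floor(e) ** 2 for e in expected))
    return raw_cv, corrected_sd, floor


if __name__ == "__main__":
    raw_cv, corrected_sd, floor = demo()
    print(f"raw CV {raw_cv:.3f}  corrected SD {corrected_sd:.3f}  Poisson floor {floor:.3f}")
```

One design choice is worth noting: because a single grid spans both covariates, any interaction between GC content and mappability is handled in one step, whereas a sequential correction adjusts for one covariate without regard to the other. In the simulated copy-number-neutral data, the corrected profile's spread should land close to the Poisson floor, while the raw counts remain dominated by the injected bias.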

18.
Hybrid zones can be valuable tools for studying evolution and identifying genomic regions responsible for adaptive divergence and underlying phenotypic variation. Hybrid zones between subspecies of Heliconius butterflies can be very narrow and are maintained by strong selection acting on color pattern. The comimetic species H. erato and H. melpomene have parallel hybrid zones in which both species undergo a change from one color pattern form to another. We use restriction-associated DNA sequencing to obtain several thousand genome-wide sequence markers and use these to analyze patterns of population divergence across two pairs of parallel hybrid zones in Peru and Ecuador. We compare two approaches for analyzing this type of data, alignment to a reference genome and de novo assembly, and find that alignment gives the best results for species both closely related (H. melpomene) and distantly related (H. erato, ∼15% divergent) to the reference sequence. Our results confirm that the loci controlling color pattern account for the majority of divergent regions across the genome, but we also detect other divergent regions apparently unlinked to color pattern differences. We also use association mapping to identify previously unmapped color pattern loci, in particular the Ro locus. Finally, we identify a new cryptic population of H. timareta in Ecuador, which occurs at relatively low altitude and is mimetic with H. melpomene malleti.
Natural hybrid zones occur where divergent forms meet, mate, and hybridize. Narrow hybrid zones can be maintained by strong selection that prevents mixing or favors particular forms in particular areas (Barton and Hewitt 1985). Studies of hybrid zones have provided many insights into the origins of diversity and the process of speciation (Mallet et al. 1990; Harrison 1993; Kawakami and Butlin 2001).
High-throughput sequencing technologies now provide the opportunity for hybrid zones to fully meet their potential as windows into the evolutionary process by allowing us to move beyond studies of neutral variation at a handful of loci and to identify the genetic loci under selection (Rieseberg and Buerkle 2002; Gompert et al. 2012; Crawford and Nielsen 2013).
Butterflies of the Neotropical genus Heliconius are extremely diverse in their wing color patterns and combine within-species diversity with among-species convergence in wing phenotypes. Their bright wing patterns serve as aposematic warnings to predators and are under positive frequency-dependent selection favoring common color patterns that predators learn to avoid. This strong selection also maintains narrow hybrid zones between subspecies with different patterns (Benson 1972; Mallet and Barton 1989a; Kapan 2001; Langham 2004). In addition, frequency-dependent selection leads to Müllerian mimicry among many distinct species (Müller 1879). For instance, H. erato and H. melpomene are two distantly related species that diverged ∼15–20 million years ago but have converged on common color patterns across most of the Neotropics. Divergent races of both species meet in parallel hybrid zones (Fig. 1). Evidence suggests that the convergent color patterns in these two species evolved independently (Hines et al. 2011; Supple et al. 2013). It has also been suggested that H. erato is older and that H. melpomene diversified more recently to mimic the H. erato forms (Brower 1996; Flanagan et al. 2004; Quek et al. 2010). Nevertheless, the same handful of genetic loci appears to be responsible for most of the color pattern variation in both species (Joron et al. 2006; Baxter et al. 2008; Reed et al. 2011; Martin et al. 2012).
This pattern of parallel adaptive radiation makes Heliconius an excellent system in which to address the predictability of the evolutionary process and the extent to which particular genes are reused when the same phenotypes evolve (Papa et al. 2008a; Nadeau and Jiggins 2010).
Figure 1. (A) Distribution in South America of the subspecies included in this study. (B) Maximum likelihood phylogenies with approximate likelihood branch supports. Co-mimics from outside the focal hybrid zones are connected with dotted lines. Focal hybrid zone individuals are shown in color: (blue) H. m. plesseni and H. e. notabilis; (purple) Ecuador hybrids; (dark red) H. m. malleti and H. e. lativitta; (red) H. m. aglaope and H. e. emma; (orange) Peru hybrids; (yellow) H. m. amaryllis and H. e. favorinus. Additional populations are in black. Country abbreviations: (Ec) Ecuador; (FG) French Guiana; (Co) Colombia; (Pa) Panama.
In this study, we use high-resolution genome scans to investigate patterns of divergence across two pairs of parallel hybrid zones in Peru and Ecuador. These occur between subspecies with different wing color patterns in both H. erato and H. melpomene (Fig. 1). In both regions, the clines in color pattern alleles are highly coincident between species (Mallet et al. 1990; Salazar 2012). The two hybrid zones in Peru have been the focus of several previous studies, whereas those in Ecuador have been less well studied. In Peru, strong natural selection has been shown to maintain color pattern differences (Mallet and Barton 1989a), and loci controlling color pattern show enhanced divergence (Baxter et al. 2010; Counterman et al. 2010; Nadeau et al. 2012; Martin et al. 2013; Supple et al. 2013).
However, we still lack a complete picture of how many loci are divergent between subspecies and of the extent to which the genomic architecture of divergence is the same between mimetic species.
Extensive genetic mapping using experimental crosses between different color pattern forms has identified the chromosomal regions responsible for color pattern variation (Sheppard et al. 1985; Joron et al. 2006; Baxter et al. 2008; Papa et al. 2013). Three major clusters of loci control most of the color pattern variation observed in both species. The tightly linked B and D loci on chromosome 18 in H. melpomene control, respectively, the red forewing band and the red/orange hindwing rays and proximal “dennis” patches on both wings. These loci are homologous to the D locus in H. erato (Baxter et al. 2008) and appear to be cis-regulatory elements of the optix gene (Reed et al. 2011; Supple et al. 2013). The Ac and Sd loci, in H. melpomene and H. erato respectively, control the shape of the forewing band via regulation of the WntA gene on chromosome 10 (Martin et al. 2012). The presence of most yellow and white elements on the wing is controlled largely by three tightly linked loci, Yb, Sb, and N, on chromosome 15 in H. melpomene (Ferguson et al. 2010), which are homologous to the Cr locus in H. erato (Joron et al. 2006). Quantitative trait locus (QTL) mapping has identified other loci of minor effect, including at least seven additional QTL in H. erato (Papa et al. 2013) and QTL on chromosomes 2, 7, and 13 in H. melpomene that affect forewing band shape (Baxter et al. 2008). In some cases, mapping studies have been followed by population genetic studies of the mapped intervals across natural hybrid zones, where many generations of backcrossing have produced narrow regions of association that permit fine-scale mapping (Baxter et al. 2010; Counterman et al. 2010; Nadeau et al. 2012; Supple et al. 2013).
High-throughput sequencing technologies now make it feasible to generate a high density of genomic markers and to identify the narrow QTL present in these hybrid zones without the need for controlled laboratory crosses (Crawford and Nielsen 2013). Here we test this approach using a system in which some of the loci responsible for phenotypic differences are known.
The Peru and Ecuador hybrid zones occur across altitudinal gradients (Fig. 2A). It is therefore possible that traits other than color pattern, for example traits related to temperature or to changes in larval host plants, are also differentiated by altitudinal selection. Such selection on additional regions of the genome could help stabilize the geographic location of the hybrid zone (Barton and Hewitt 1985; Mallet and Barton 1989b; Mallet 2010; Bierne et al. 2011). Another important question we address is therefore whether there are divergent regions of the genome that do not control color pattern. These might be candidates for loci controlling other aspects of ecological adaptation.
Figure 2. Population structure at each of the hybrid zones using the reference-aligned data. (A) Sampling locations with altitude in meters, sample size in parentheses, and pie charts of the proportion of individuals of each type sampled from each site. Colors are the same as in Figure 1, except that black indicates H. timareta in Ecuador. (B) Structure analysis with k = 2 (H. timareta individuals excluded). Each individual is shown as a horizontal bar with the allelic contribution from population 1 (gray) and population 2 (black). (C) Principal components analysis. (D) Distribution of FST values from BayeScan.
In this study we use restriction-associated DNA (RAD) sequencing (Baird et al. 2008) to determine, for the first time:
  1. Whether association mapping in these hybrid zones can identify known and novel loci underlying phenotypic variation;
  2. How much of the genome is differentiated and under divergent selection between subspecies;
  3. How much of this differentiation is due to loci controlling color pattern variation;
  4. Whether the same regions are divergent between co-mimetic species.
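Question 1 relies on the fact that, in hybrid zones, many generations of backcrossing leave only markers close to a causal locus associated with the phenotype. The Python sketch below shows the simplest form of such a test: an allelic 2×2 chi-square test per SNP against a binary color pattern phenotype, with a Bonferroni threshold across SNPs. This is a generic approach assumed for illustration, not necessarily the statistical model used in the study; the function names and toy data are hypothetical.

```python
import math


def allelic_chi_square(genotypes, phenotypes):
    """Allelic 2x2 chi-square test of one biallelic SNP against a binary phenotype.

    genotypes: alternate-allele counts per diploid individual (0, 1, or 2).
    phenotypes: True/False per individual (e.g., presence of a color pattern element).
    Returns (chi2, p) with 1 degree of freedom.
    """
    # 2x2 table of allele counts: rows = phenotype (True, False), columns = (alt, ref).
    table = [[0, 0], [0, 0]]
    for g, has_trait in zip(genotypes, phenotypes):
        row = 0 if has_trait else 1
        table[row][0] += g
        table[row][1] += 2 - g
    total = sum(map(sum, table))
    row_sums = [sum(r) for r in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            if expected > 0:  # monomorphic SNPs contribute nothing
                chi2 += (table[i][j] - expected) ** 2 / expected
    # Chi-square survival function with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


def association_scan(snp_genotypes, phenotypes, alpha=0.05):
    """Test every SNP and flag those passing a Bonferroni-corrected threshold."""
    threshold = alpha / len(snp_genotypes)
    results = {}
    for snp_id, genos in snp_genotypes.items():
        chi2, p = allelic_chi_square(genos, phenotypes)
        results[snp_id] = (chi2, p, p < threshold)
    return results


if __name__ == "__main__":
    # Toy data (hypothetical): 10 individuals with the trait, 10 without.
    phenotypes = [True] * 10 + [False] * 10
    snps = {"linked": [2] * 10 + [0] * 10, "unlinked": [1] * 20}
    for snp_id, (chi2, p, hit) in association_scan(snps, phenotypes).items():
        print(snp_id, round(chi2, 2), f"{p:.2e}", hit)
```

In practice a genotype-level or regression model that accounts for population structure would usually be preferred; the allelic test is shown only because it makes the logic of association mapping explicit.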
Although previous studies have touched on questions 2 and 3 (Kronforst 2013; Martin et al. 2013), here we focus on divergence at the subspecies level, where hybridization is frequent, rather than between occasionally hybridizing species. Compared with the study by Martin et al. (2013), we explored additional hybrid zones (Ecuador) and species (H. erato) using larger sample sizes, which allowed more robust tests to identify genomic regions under divergent selection. We also investigate the advantages and limitations of alignment and assembly methods when only a single reference genome is available, comparing two widely used approaches: de novo assembly of only the restriction-associated reads using the program Stacks (Catchen et al. 2011), versus alignment of paired-end reads to the H. melpomene reference genome.
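Identifying genomic regions under divergent selection starts from a per-SNP measure of differentiation between subspecies. The study reports FST distributions from BayeScan, a Bayesian outlier method; the Python sketch below swaps in a much simpler approach, Hudson's FST estimator per SNP plus an empirical upper-quantile cutoff, purely to show how divergent SNPs can be flagged. The cutoff, function names, and toy data are hypothetical, and this is not a substitute for a model-based outlier test.

```python
def hudson_fst_components(p1, n1, p2, n2):
    """Numerator and denominator of Hudson's FST estimator for one biallelic SNP.

    p1, p2: sample alternate-allele frequencies in the two subspecies.
    n1, n2: number of sampled alleles (twice the number of diploid individuals).
    """
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den


def fst_scan(snps, quantile=0.99):
    """Return per-SNP FST, genome-wide FST (ratio of averages), and upper-tail outliers.

    snps maps a SNP id to (p1, n1, p2, n2). Outliers are SNPs at or above the given
    empirical quantile of per-SNP FST; this cutoff is an arbitrary illustrative choice.
    """
    per_snp, total_num, total_den = {}, 0.0, 0.0
    for snp_id, (p1, n1, p2, n2) in snps.items():
        num, den = hudson_fst_components(p1, n1, p2, n2)
        total_num += num
        total_den += den
        # Monomorphic sites (den == 0) carry no information about divergence.
        per_snp[snp_id] = num / den if den > 0 else 0.0
    genome_wide = total_num / total_den if total_den > 0 else 0.0
    ranked = sorted(per_snp.values())
    cutoff = ranked[min(len(ranked) - 1, int(quantile * len(ranked)))]
    outliers = [snp_id for snp_id, fst in per_snp.items() if fst >= cutoff]
    return per_snp, genome_wide, outliers


if __name__ == "__main__":
    # Toy data (hypothetical): 99 SNPs with equal frequencies and one fixed difference.
    snps = {f"neutral_{i}": (0.5, 40, 0.5, 40) for i in range(99)}
    snps["color_locus"] = (1.0, 40, 0.0, 40)
    per_snp, genome_wide, outliers = fst_scan(snps)
    print(outliers)  # ['color_locus']
```

Combining numerators and denominators across SNPs before dividing (a ratio of averages) keeps low-frequency SNPs from dominating the genome-wide estimate; the small negative per-SNP values for the neutral SNPs are expected from the sample-size correction in the numerator.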


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417