Similar articles
20 similar articles found (search time: 31 ms)
1.
The Atlas genome assembly system   Total citations: 11, self-citations: 0, citations by others: 11
Atlas is a suite of programs developed for assembly of genomes by a "combined approach" that uses DNA sequence reads from both BACs and whole-genome shotgun (WGS) libraries. The BAC clones afford advantages of localized assembly with reduced computational load, and provide a robust method for dealing with repeated sequences. Inclusion of WGS sequences facilitates use of different clone insert sizes and reduces data production costs. A core function of Atlas software is recruitment of WGS sequences into appropriate BACs based on sequence overlaps. Because construction of consensus sequences is from local assembly of these reads, only small (<0.1%) units of the genome are assembled at a time. Once assembled, each BAC is used to derive a genomic layout. This "sequence-based" growth of the genome map has greater precision than with non-sequence-based methods. Use of BACs allows correction of artifacts due to repeats at each stage of the process. This is aided by ancillary data such as BAC fingerprint, other genomic maps, and syntenic relations with other genomes. Atlas was used to assemble a draft DNA sequence of the rat genome; its major components including overlapper and split-scaffold are also being used in pure WGS projects.

2.
Physical map-assisted whole-genome shotgun sequence assemblies   Total citations: 2, self-citations: 0, citations by others: 2
We describe a targeted approach to improve the contiguity of whole-genome shotgun sequence (WGS) assemblies at run-time, using information from Bacterial Artificial Chromosome (BAC)-based physical maps. Clone sizes and overlaps derived from clone fingerprints are used for the calculation of length constraints between any two BAC neighbors sharing 40% of their size. These constraints are used to promote the linkage and guide the arrangement of sequence contigs within a sequence scaffold at the layout phase of WGS assemblies. This process is facilitated by FASSI, a stand-alone application that calculates BAC end and BAC overlap length constraints from clone fingerprint map contigs created by the FPC package. FASSI is designed to work with the assembly tool PCAP, but its output can be formatted to work with other WGS assembly algorithms able to use length constraints for individual clones. The FASSI method is simple to implement, potentially cost-effective, and has resulted in the increase of scaffold contiguity for both the Drosophila melanogaster and Cryptococcus gattii genomes when compared to a control assembly without map-derived constraints. A 6.5-fold coverage draft DNA sequence of the Pan troglodytes (chimpanzee) genome was assembled using map-derived constraints and resulted in a 26.1% increase in scaffold contiguity.

3.
Second-generation sequencing technology can now be used to sequence an entire human genome in a matter of days and at low cost. Sequence read lengths, initially very short, have rapidly increased since the technology first appeared, and we now are seeing a growing number of efforts to sequence large genomes de novo from these short reads. In this Perspective, we describe the issues associated with short-read assembly, the different types of data produced by second-gen sequencers, and the latest assembly algorithms designed for these data. We also review the genomes that have been assembled recently from short reads and make recommendations for sequencing strategies that will yield a high-quality assembly.
As genome sequencing technology has evolved, methods for assembling genomes have changed with it. Genome sequencers have never been able to “read” more than a relatively short stretch of DNA at once, with read lengths gradually increasing over time. Reconstructing a complete genome from a set of reads requires an assembly program, and a variety of genome assemblers have been used for this task. In 1995, when the first bacterial genome was published (Haemophilus influenzae), read lengths were ∼460 base pairs (bp), and that whole-genome shotgun (WGS) sequencing project generated 24,304 reads (Fleischmann et al. 1995). The human genome project required ∼30 million reads, with lengths up to 800 bp, using Sanger sequencing technology and automated capillary sequencers (International Human Genome Sequencing Consortium 2001; Venter et al. 2001). This corresponded to 24 billion bases (Gb), or approximately eightfold coverage of the 3-Gb human genome. Redundant coverage, in which on average every nucleotide is sequenced many times over, is required to produce a high-quality assembly. 
Another benefit of redundancy is greatly increased accuracy compared with a single read: Where a single read might have an error rate of 1%, eightfold coverage has an error rate as low as 10^-16 when eight high-quality reads agree with one another. High coverage is also necessary to sequence polymorphic alleles within diploid or polyploid genomes.
Current second-generation sequencing (SGS) technologies produce read lengths ranging from 35 to 400 bp, at far greater speed and much lower cost than Sanger sequencing. However, as reads get shorter, coverage needs to increase to compensate for the decreased connectivity and produce a comparable assembly. Certain problems cannot be overcome by deeper coverage: If a repetitive sequence is longer than a read, then coverage alone will never compensate, and all copies of that sequence will produce gaps in the assembly. These gaps can be spanned by paired reads—consisting of two reads generated from a single fragment of DNA and separated by a known distance—as long as the pair separation distance is longer than the repeat. Paired-end sequencing is available from most of the SGS machines, although it is not yet as flexible or as reliable as paired-end sequencing using traditional methods.
After the successful assembly of the human (International Human Genome Sequencing Consortium 2001; Venter et al. 2001) and mouse (Waterston et al. 2002) genomes by whole-genome shotgun sequencing, most large-scale genome projects quickly moved to adopt the WGS approach, which has subsequently been used for dozens of eukaryotic genomes. Today, thanks to changes in sequencing technology, a major question confronting genome projects is, can we sequence a large genome (>100 Mbp) using short reads? If so, what are the limitations on read length, coverage, and error rates? How much paired-end sequencing is necessary? And what will the assembly look like? 
In this perspective we take a look at each of these questions and describe the solutions available today. Although we provide some answers, we have no doubt that the solutions will change rapidly over the next few years, as both the sequencing methods and the computational solutions improve.
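The coverage and accuracy figures quoted in this abstract can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the consensus error rate assumes fully independent read errors, which is an idealisation rather than a claim made by the authors.

```python
def fold_coverage(num_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Average number of times each base is sequenced (total bases / genome size)."""
    return num_reads * read_length_bp / genome_size_bp


def agreeing_reads_error(per_read_error: float, depth: int) -> float:
    """Chance that `depth` reads are all wrong at one position.

    Assumption (not from the source): errors in different reads are independent,
    so the probabilities simply multiply.
    """
    return per_read_error ** depth


# Human genome project figures quoted above: ~30 million reads of up to 800 bp
# over a 3-Gb genome.
total_bases = 30_000_000 * 800
print(f"total bases sequenced: {total_bases:,}")
print(f"fold coverage: {fold_coverage(30_000_000, 800, 3_000_000_000):.1f}")

# A 1% per-read error rate with eight agreeing reads.
print(f"consensus error: {agreeing_reads_error(0.01, 8):.1e}")
```

Under these assumptions the numbers match the 24 Gb, eightfold coverage, and 10^-16 figures in the text. Real consensus error rates are higher, because sequencing errors are correlated with sequence context.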

4.
Genome Research, 2015, 25(3): 445-458
Drosophila melanogaster plays an important role in molecular, genetic, and genomic studies of heredity, development, metabolism, behavior, and human disease. The initial reference genome sequence reported more than a decade ago had a profound impact on progress in Drosophila research, and improving the accuracy and completeness of this sequence continues to be important to further progress. We previously described improvement of the 117-Mb sequence in the euchromatic portion of the genome and 21 Mb in the heterochromatic portion, using a whole-genome shotgun assembly, BAC physical mapping, and clone-based finishing. Here, we report an improved reference sequence of the single-copy and middle-repetitive regions of the genome, produced using cytogenetic mapping to mitotic and polytene chromosomes, clone-based finishing and BAC fingerprint verification, ordering of scaffolds by alignment to cDNA sequences, incorporation of other map and sequence data, and validation by whole-genome optical restriction mapping. These data substantially improve the accuracy and completeness of the reference sequence and the order and orientation of sequence scaffolds into chromosome arm assemblies. Representation of the Y chromosome and other heterochromatic regions is particularly improved. The new 143.9-Mb reference sequence, designated Release 6, effectively exhausts clone-based technologies for mapping and sequencing. Highly repeat-rich regions, including large satellite blocks and functional elements such as the ribosomal RNA genes and the centromeres, are largely inaccessible to current sequencing and assembly methods and remain poorly represented. Further significant improvements will require sequencing technologies that do not depend on molecular cloning and that produce very long reads.
The genome sequence of the fruit fly Drosophila melanogaster was first reported in 2000 (Adams et al. 2000). 
This sequence assembly, designated Release 1, represented the single-copy fraction of the genome in 116.2 megabases (Mb) of sequence in 134 large mapped scaffolds containing 1299 sequence gaps and an additional 3.8 Mb in 704 small (<64 kb) unmapped scaffolds. Release 1 was produced by combining a de novo whole-genome shotgun (WGS) sequence assembly, designated WGS1 (Myers et al. 2000), with sequences of mapped BAC and P1 genomic clones, including 29.7 Mb of finished sequences and draft sequences of a tiling path of BAC and P1 clones spanning the euchromatic portion of the genome (Adams et al. 2000). WGS1 and Release 1 were validated by comparison to the available finished genomic sequences and to a BAC-based physical map of the major autosomes (Hoskins et al. 2000).
WGS1 was the first shotgun assembly of a eukaryotic genome and served as a model for sequencing mammalian genomes (Venter et al. 2001; Stark et al. 2007). WGS remains the method of choice in genome sequencing because it is rapid and efficient. However, because eukaryotic genomes typically contain a large fraction of repetitive sequences with complex structures, current WGS sequencing strategies produce fragmented assemblies in which the location, order, and orientation of sequence scaffolds along the chromosomes are poorly determined. Furthermore, tandem and dispersed repetitive sequences including gene families, pseudogenes, transposable elements (TEs), segmental duplications, and simple sequence repeats are poorly represented. This leads to misassembled regions, unmapped regions, and numerous gaps, particularly in heterochromatic regions which often span many megabases of the genome and include vital protein-coding genes and other essential loci. 
Therefore, physical mapping, cytogenetic mapping, and sequence finishing to improve genome sequence assemblies remain a priority, especially for human (International Human Genome Sequencing Consortium 2004) and model organisms of particular importance in biomedical research.
Because D. melanogaster is a widely used research organism, we have continued to improve the reference genome sequence. Late in 2000, the Release 2 sequence corrected the order and orientation of a few small sequence scaffolds and filled a few hundred small sequence gaps. In 2002, we reported BAC-based finishing of 116.9 Mb of genome sequence in 13 scaffolds spanning the euchromatic portions of the six chromosome arms (Celniker et al. 2002) and an improved WGS assembly (WGS3) including 20.7 Mb of draft-quality sequence in larger scaffolds in the heterochromatic portion of the genome (Celniker et al. 2002; Hoskins et al. 2002). This Release 3 assembly had high sequence accuracy (estimated error rate < 1 in 100,000) and contiguity (37 sequence gaps; seven physical map gaps) in the euchromatic portion of the assembly, and the order and orientation of sequences within the assembly was confirmed by in situ hybridization of 915 BACs to salivary gland polytene chromosomes, representing 96% of the BACs in a tiling path spanning the euchromatic portion of the assembly (Hoskins et al. 2000; Celniker et al. 2002). The euchromatic sequence went through two unpublished revisions in 2004 and 2006 (Releases 4 and 5; http://www.fruitfly.org) to further improve accuracy and completeness. In 2007, we reported on further physical and cytogenetic mapping, and sequence finishing of 15 Mb in the heterochromatic portion of the genome, including essentially all single-copy regions (Hoskins et al. 2007). However, gaps and assembly errors remained due to the difficulties of mapping and finishing in repeat-rich regions. 
The remaining physical map gaps resulted from the absence of genomic regions from BAC libraries, likely due to incompatibility with molecular cloning or clone instability in E. coli. Sequence gaps within clone-based assemblies resulted from failure of assembly in complex nested repetitive regions. The remaining sequence assembly errors were due to incorrect but self-consistent clone-based sequence assemblies or clone rearrangements. Particularly in heterochromatin, errors in the physical and cytogenetic maps existed due to the presence of repeat-rich sequences.
Despite impressive developments in high-throughput sequencing technology, the production of high-quality finished genome sequences has remained laborious and inefficient. Furthermore, highly repeat-rich genomic regions such as those in centric heterochromatin have remained inaccessible to mapping, sequencing, and assembly. We define the “centric heterochromatin” as the repeat-rich sequences found at the functional centromeres (Sun et al. 2003). “Pericentric heterochromatin” refers to the Mb-scale regions that flank the centromeres and contain large blocks of satellite DNA and other simple-sequence repeats (Supplemental Fig. S1) interspersed with large regions of transposable-element and other middle-repetitive sequences and including essential protein-coding genes. “Telomeric heterochromatin” refers to the subtelomeric regions composed of tandem repeats (Mason and Villasante 2014) and the arrays of telomeric retrotransposons at the most distal chromosome ends (Abad et al. 2004b). By these definitions, the Y chromosome is composed entirely of centric, pericentric, and telomeric heterochromatin.
Here, we report the Release 6 assembly of the D. melanogaster reference genome sequence. Much of the improvement in the sequence is in the mapping, finishing, and assembly of repeat-rich regions in the heterochromatic portions of the genome. 
Release 6 incorporates (1) additional BAC-based cytogenetic mapping of previously unmapped, unordered, and unoriented sequence scaffolds by fluorescent in situ hybridization (FISH) to mitotic and polytene chromosomes, (2) BAC-based sequence finishing of clones spanning the remainder of the genome physical map guided by comparison to high-resolution BAC restriction fingerprints, and sequence finishing of 10-kb genomic plasmid clones spanning the remainder of the WGS3 assembly, (3) use of cDNA sequences to order and orient scaffolds, (4) incorporation of map and sequence data from other sources, and (5) validation of the sequence assembly by comparison to a whole-genome optical restriction map (Zhou et al. 2007). The resulting genome sequence assembly is a substantially improved reference that spans 143.9 Mb and represents the practical limit of established technologies. Relative to Release 5, Release 6 closes 628 gaps, extends the chromosome arm assemblies into telomeric and pericentric heterochromatin by 5.4 Mb, and increases the Y chromosome assembly 10-fold from ∼242 kb to 3.4 Mb. Further substantial improvement to the reference genome sequence will require new technologies that do not depend on standard molecular cloning. Emerging very-long-read WGS sequencing and assembly technologies will permit efficient production of more complete genome sequences for D. melanogaster and other species.

5.
CLONEPICKER is a software pipeline that integrates sequence data with BAC clone fingerprints to dynamically select a minimal overlapping clone set covering the whole genome. In the Rat Genome Sequencing Project (RGSP), a hybrid strategy of "clone by clone" and "whole genome shotgun" approaches was used to maximize the merits of both approaches. Like the "clone by clone" method, one key challenge for this strategy was to select a low-redundancy clone set that covered the whole genome while the sequencing is in progress. The CLONEPICKER pipeline met this challenge using restriction enzyme fingerprint data, BAC end sequence data, and sequences generated from individual BAC clones as well as WGS reads. In the RGSP, an average of 7.5 clones was identified from each side of a seed clone, and the minimal overlapping clones were reliably selected. Combined with the assembled BAC fingerprint map, a set of BAC clones that covered >97% of the genome was identified and used in the RGSP.
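The idea of a "minimal overlapping clone set" can be illustrated with the textbook greedy interval-cover algorithm. This is not CLONEPICKER's own method, which works from fingerprints and sequence overlaps rather than known coordinates. The sketch assumes each clone already has start and end positions on a common coordinate system, and the clone names and function name are hypothetical.

```python
from typing import List, Tuple

Clone = Tuple[str, int, int]  # (name, start, end) on an assumed common coordinate


def minimal_tiling_path(clones: List[Clone], region_start: int, region_end: int) -> List[str]:
    """Greedy interval cover: repeatedly take the clone that starts inside the
    covered region and reaches furthest to the right."""
    ordered = sorted(clones, key=lambda c: c[1])
    chosen: List[str] = []
    covered_to = region_start
    i = 0
    while covered_to < region_end:
        best = None
        # Scan every not-yet-seen clone that begins at or before the current frontier.
        while i < len(ordered) and ordered[i][1] <= covered_to:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"gap in clone coverage at position {covered_to}")
        chosen.append(best[0])
        covered_to = best[2]
    return chosen


if __name__ == "__main__":
    clones = [
        ("A", 0, 150), ("B", 50, 180), ("C", 100, 300),
        ("D", 170, 320), ("E", 280, 450), ("F", 300, 400),
    ]
    print(minimal_tiling_path(clones, 0, 450))
```

With known coordinates the greedy choice gives a cover with the fewest clones. A real pipeline must also cope with uncertain overlaps, which is why fingerprint and end-sequence evidence are combined in the abstract above.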

6.
Analysis of segmental duplications and genome assembly in the mouse   Total citations: 7, self-citations: 2, citations by others: 7
Limited comparative studies suggest that the human genome is particularly enriched for recent segmental duplications. The extent of segmental duplications in other mammalian genomes is unknown and confounded by methodological differences in genome assembly. Here, we present a detailed analysis of recent duplication content within the mouse genome using a whole-genome assembly comparison method and a novel assembly-independent method, designed to take advantage of the reduced allelic variation of the C57BL/6J strain. We conservatively estimate that approximately 57% of all highly identical segmental duplications (≥90%) were misassembled or collapsed within the working draft WGS assembly. The WGS approach often leaves duplications fragmented and unassigned to a chromosome when compared with the clone-ordered-based approach. Our preliminary analysis suggests that 1.7%-2.0% of the mouse genome is part of recent large segmental duplications (about half of what is observed for the human genome). We have constructed a mouse segmental duplication database to aid in the characterization of these regions and their integration into the final mouse genome assembly. This work suggests significant biological differences in the architecture of recent segmental duplications between human and mouse. In addition, our unique method provides the means for improving whole-genome shotgun sequence assembly of mouse and future mammalian genomes.

7.
Copy Number Variants (CNVs) are deletions, duplications or insertions larger than 50 base pairs. They account for a large percentage of the normal genome variation and play major roles in human pathology. While array-based approaches have long been used to detect them in clinical practice, whole-genome sequencing (WGS) bears the promise to allow concomitant exploration of CNVs and smaller variants. However, accurately calling CNVs from WGS remains a difficult computational task, for which a consensus is still lacking. In this paper, we explore practical calling options to reach the best compromise between sensitivity and sensibility. We show that callers based on different signals (paired-end reads, split reads, coverage depth) yield complementary results. We suggest approaches combining four selected callers (Manta, Delly, ERDS, CNVnator) and a regenotyping tool (SV2), and show that this is applicable in everyday practice in terms of computation time and further interpretation. We demonstrate the superiority of these approaches over array-based Comparative Genomic Hybridization (aCGH), specifically regarding the lack of resolution in breakpoint definition and the detection of potentially relevant CNVs. Finally, we confirm our results on the NA12878 benchmark genome, as well as one clinically validated sample. In conclusion, we suggest that WGS constitutes a timely and economically valid alternative to the combination of aCGH and whole-exome sequencing.
Subject terms: DNA sequencing, Genome informatics

8.
9.
The Phusion assembler   Total citations: 14, self-citations: 2, citations by others: 14
The Phusion assembler has assembled the mouse genome from the whole-genome shotgun (WGS) dataset collected by the Mouse Genome Sequencing Consortium, at ~7.5x sequence coverage, producing a high-quality draft assembly 2.6 gigabases in size, of which 90% of these bases are in 479 scaffolds. For the mouse genome, which is a large and repeat-rich genome, the input dataset was designed to include a high proportion of paired end sequences of various size selected inserts, from 2-200 kbp lengths, into various host vector templates. Phusion uses sequence data, called reads, and information about reads that share common templates, called read pairs, to drive the assembly of this large genome to highly accurate results. The preassembly stage, which clusters the reads into sensible groups, is a key element of the entire assembler, because it permits a simple approach to parallelization of the assembly stage, as each cluster can be treated independently of the others. In addition to the application of Phusion to the mouse genome, we will also present results from the WGS assembly of Caenorhabditis briggsae sequenced to about 11x coverage. The C. briggsae assembly was accessioned through EMBL, http://www.ebi.ac.uk/services/index.html, using the series CAAC01000001-CAAC01000578; however, the Phusion mouse assembly described here was not accessioned. The mouse data was generated by the Mouse Genome Sequencing Consortium. The C. briggsae sequence was generated at The Wellcome Trust Sanger Institute and the Genome Sequencing Center, Washington University School of Medicine.

10.
MEGAN analysis of metagenomic data   Total citations: 16, self-citations: 1, citations by others: 15
Metagenomics is the study of the genomic content of a sample of organisms obtained from a common habitat using targeted or random sequencing. Goals include understanding the extent and role of microbial diversity. The taxonomical content of such a sample is usually estimated by comparison against sequence databases of known sequences. Most published studies use the analysis of paired-end reads, complete sequences of environmental fosmid and BAC clones, or environmental assemblies. Emerging sequencing-by-synthesis technologies with very high throughput are paving the way to low-cost random "shotgun" approaches. This paper introduces MEGAN, a new computer program that allows laptop analysis of large metagenomic data sets. In a preprocessing step, the set of DNA sequences is compared against databases of known sequences using BLAST or another comparison tool. MEGAN is then used to compute and explore the taxonomical content of the data set, employing the NCBI taxonomy to summarize and order the results. A simple lowest common ancestor algorithm assigns reads to taxa such that the taxonomical level of the assigned taxon reflects the level of conservation of the sequence. The software allows large data sets to be dissected without the need for assembly or the targeting of specific phylogenetic markers. It provides graphical and statistical output for comparing different data sets. The approach is applied to several data sets, including the Sargasso Sea data set, a recently published metagenomic data set sampled from a mammoth bone, and several complete microbial genomes. Also, simulations that evaluate the performance of the approach for different read lengths are presented.
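The lowest common ancestor (LCA) assignment mentioned in this abstract can be sketched on a toy taxonomy. The code below is a minimal illustration, not MEGAN's implementation: the `PARENT` table is a hand-made, heavily abbreviated stand-in for the NCBI taxonomy, the function names are hypothetical, and thresholds on hit scores are omitted.

```python
from typing import Dict, List, Optional

# Toy child -> parent table (an assumption for illustration; ranks are skipped).
PARENT: Dict[str, str] = {
    "Bacteria": "root",
    "Archaea": "root",
    "Proteobacteria": "Bacteria",
    "Firmicutes": "Bacteria",
    "Escherichia coli": "Proteobacteria",
    "Salmonella enterica": "Proteobacteria",
    "Bacillus subtilis": "Firmicutes",
}


def lineage(taxon: str, parent: Dict[str, str]) -> List[str]:
    """Path from the root down to `taxon`."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path[::-1]


def lowest_common_ancestor(taxa: List[str], parent: Dict[str, str]) -> Optional[str]:
    """Deepest node shared by the lineages of all taxa a read matched."""
    lineages = [lineage(t, parent) for t in taxa]
    lca = None
    for level in zip(*lineages):
        if all(node == level[0] for node in level):
            lca = level[0]
        else:
            break
    return lca


if __name__ == "__main__":
    # A read hitting two closely related species is placed at their shared phylum.
    print(lowest_common_ancestor(["Escherichia coli", "Salmonella enterica"], PARENT))
    # A conserved read hitting distant species is placed higher in the tree.
    print(lowest_common_ancestor(["Escherichia coli", "Bacillus subtilis"], PARENT))
```

This mirrors the property described in the abstract: the more widely conserved a sequence is, the higher the taxonomic level at which the read is assigned.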

11.
A chemiluminescent DNA probe (Accuprobe) assay developed by Gen Probe, Inc., for the rapid identification of Histoplasma capsulatum was evaluated and compared with the exoantigen test by using 162 coded cultures including Histoplasma capsulatum var. capsulatum, Histoplasma capsulatum var. duboisii, Histoplasma capsulatum var. farciminosum, Blastomyces dermatitidis, Coccidioides immitis, Paracoccidioides brasiliensis, and morphologically related saprobic fungi. Each test uses a chemiluminescent, acridinium ester-labeled, single-stranded DNA probe that is complementary to the rRNA of the target organism. Lysates of the test cultures were prepared by sonication with glass beads and heat treated. After the rRNA was released from the target organism, the labeled DNA probe combined with the target H. capsulatum rRNA to form a stable DNA-RNA hybrid. A hybridization protection assay was used, and the chemiluminescence of hybrids was measured initially with a Leader 1 luminometer as relative light units and later during the investigation with a probe assay luminometer as probe light units. Of the 162 coded mycelial cultures tested by the Accuprobe assay, 105 were identified as H. capsulatum. The test could be performed with an inoculum of a few square millimeters (1 to 2 mm²) of growth. In the primary evaluation, the Accuprobe identified 103 of the 105 cultures as H. capsulatum within 2 h. The remaining two cultures, contaminated with bacteria, had to be purified before the Accuprobe assay identified them correctly as H. capsulatum. Since each coded culture was concurrently tested for H. capsulatum, B. dermatitidis, and C. immitis exoantigens, the identification of all three dimorphic pathogens was provided simultaneously. Of the 162 coded cultures tested, 105 were identified by the exoantigen test as H. capsulatum, 12 were identified as B. dermatitidis, 13 were identified as C. immitis, and 32 were negative for H. capsulatum, B. dermatitidis, and C. immitis. 
The bacterial contamination in two isolates did not interfere with the exoantigen testing. The exoantigen test required 7- to 10-day-old colonies and required 48 to 72 h of incubation before definitive identification was obtained.

12.
D Oddo, M Etchart, L Thompson. Pathology, Research and Practice, 1990, 186(4): 514-7; discussion 518
A case of histoplasmosis duboisii in a 30-year-old engineer is presented. The diagnosis was made with the help of light microscopy, electron microscopy and cultures. Although diagnosed in Chile, the patient probably acquired the disease in the endemic African area, more precisely in the Ivory Coast. Differential diagnosis between Histoplasma capsulatum var. capsulatum and Histoplasma capsulatum var. duboisii is based primarily on the larger in-vivo yeast form size of the latter. Electron microscopy study, the first done on the duboisii variety of Histoplasma capsulatum in human material to our knowledge, was not essential for this purpose. Differential diagnosis between Histoplasma capsulatum var. duboisii and Blastomyces dermatitidis is based on morphological tissue changes and mycologic characteristics. Once more, a case of an "exotic" or geographically restricted disease is detected far from its endemic area, thanks to easier means of transportation. Earth is a shrinking planet.

13.
Li X, Waterman MS. Genome Research, 2003, 13(8): 1916-1922
In shotgun sequencing projects, the genome or BAC length is not always known. We approach estimating genome length by first estimating the repeat structure of the genome or BAC, sometimes of interest in its own right, on the basis of a set of random reads from a genome project. Moreover, we can find the consensus for repeat families before assembly. Our methods are based on the l-tuple content of the reads.
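To give a feel for how l-tuple content can carry length information, here is a deliberately simplified sketch. It is not the estimator of Li and Waterman, which models repeat structure explicitly. It uses a common heuristic instead: divide the total number of l-tuples in the reads by the most frequent l-tuple multiplicity, taken as the typical depth of single-copy sequence. The function names are hypothetical, and sequencing errors and reverse-complement strands are ignored.

```python
from collections import Counter
from typing import Iterable


def ltuple_counts(reads: Iterable[str], l: int) -> Counter:
    """Count every length-l substring (l-tuple) across all reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - l + 1):
            counts[read[i:i + l]] += 1
    return counts


def estimate_genome_length(reads: Iterable[str], l: int) -> float:
    """Heuristic estimate: total l-tuples / modal l-tuple multiplicity.

    Assumption (not from the source): most of the genome is single-copy, so the
    most common multiplicity approximates the depth at which unique sequence
    was sampled. Repeats show up as l-tuples at multiples of that depth.
    """
    counts = ltuple_counts(reads, l)
    if not counts:
        raise ValueError("no l-tuples found; reads are shorter than l")
    total = sum(counts.values())
    multiplicity_histogram = Counter(counts.values())
    modal_depth = multiplicity_histogram.most_common(1)[0][0]
    return total / modal_depth


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    genome = "".join(rng.choice("ACGT") for _ in range(2000))
    # Idealised sampling: one 50-bp read from every start position of a circular genome.
    doubled = genome + genome
    reads = [doubled[i:i + 50] for i in range(len(genome))]
    print(round(estimate_genome_length(reads, l=15)))
```

The same multiplicity histogram also hints at repeat structure, since an l-tuple from a two-copy repeat appears at roughly twice the modal depth. That is the kind of signal the abstract describes exploiting before assembly.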

14.
Individuals carrying balanced translocations have a high risk of birth defects, recurrent spontaneous abortions and infertility. Thus, the detection and characterization of balanced translocations is important to reveal the genetic background of the carriers and to provide proper genetic counseling. Next-generation sequencing (NGS), which has great advantages over other methods such as karyotyping and fluorescence in situ hybridization (FISH), has been used to detect disease-associated breakpoints. Herein, to evaluate the application of this technology to detect balanced translocations in the clinic, we performed a parental study for prenatal cases with unbalanced translocations. Eight candidate families with potential balanced translocations were investigated using two strategies in parallel: low-coverage whole-genome sequencing (WGS) followed by Sanger sequencing, and G-banding karyotype coupled with FISH. G-banding analysis revealed three balanced translocations, and FISH detected two cryptic submicroscopic balanced translocations. Consistently, WGS detected five balanced translocations and mapped all the breakpoints by Sanger sequencing. Analysis of the breakpoints revealed that six genes were disrupted in the four apparently healthy carriers. In summary, our results suggest that low-coverage WGS can detect balanced translocations reliably and can map breakpoints precisely compared with conventional procedures. WGS may replace cytogenetic methods in the diagnosis of balanced translocation carriers in the clinic.

15.
We report a case of Histoplasma capsulatum endocarditis in which Histoplasma antigen assay and fungal blood cultures were negative. The diagnosis was made by microscopic examination and culture of the excised valve. Histoplasma capsulatum should be considered in the differential diagnosis of culture-negative endocarditis in regions where it is endemic and in travelers.

16.
Structural changes (deletions, insertions, and inversions) between human and chimpanzee genomes have likely had a significant impact on lineage-specific evolution because of their potential for dramatic and irreversible mutation. The low-quality nature of the current chimpanzee genome assembly precludes the reliable identification of many of these differences. To circumvent this, we applied a method to optimally map chimpanzee fosmid paired-end sequences against the human genome to systematically identify sites of structural variation ≥12 kb between the two species. Our analysis yielded a total of 651 putative sites of chimpanzee deletions (n = 293), insertions (n = 184), and rearrangements consistent with local inversions between the two genomes (n = 174). We validated a subset (19/23) of insertions and deletions using PCR and Southern blot assays, confirming the accuracy of our method. The events are distributed throughout the genome on all chromosomes but are highly correlated with sites of segmental duplication in human and chimpanzee. These structural variants encompass at least 24 Mb of DNA and overlap with >245 genes. Seventeen of these genes contain exons missing in the chimpanzee genomic sequence and also show a significant reduction in gene expression in chimpanzee. Compared with the pioneering work of Yunis, Prakash, Dutrillaux, and Lejeune, this analysis expands the number of potential rearrangements between chimpanzees and humans 50-fold. Furthermore, this work prioritizes regions for further finishing in the chimpanzee genome and provides a resource for interrogating functional differences between humans and chimpanzees.

17.
We designed and tested a real-time LightCycler PCR assay for Histoplasma capsulatum that correctly identified the 34 H. capsulatum isolates in a battery of 107 fungal isolates tested and also detected H. capsulatum in clinical specimens from three patients that were culture positive for this organism.

18.
The basidiomycete fungus Cryptococcus neoformans is an important opportunistic pathogen of humans that poses a significant threat to immunocompromised individuals. Isolates of C. neoformans are classified into serotypes (A, B, C, D, and AD) based on antigenic differences in the polysaccharide capsule that surrounds the fungal cells. Genomic and EST sequencing projects are underway for the serotype D strain JEC21 and the serotype A strain H99. As part of a genomics program for C. neoformans, we have constructed fingerprinted bacterial artificial chromosome (BAC) clone physical maps for strains H99 and JEC21 to support the genomic sequencing efforts and to provide an initial comparison of the two genomes. The BAC clones represented an estimated 10-fold redundant coverage of the genomes of each serotype and allowed the assembly of 20 contigs each for H99 and JEC21. We found that the genomes of the two strains are sufficiently distinct to prevent coassembly of the two maps when combined fingerprint data are used to construct contigs. Hybridization experiments placed 82 markers on the JEC21 map and 102 markers on the H99 map, enabling contigs to be linked with specific chromosomes identified by electrophoretic karyotyping. These markers revealed both extensive similarity in gene order (conservation of synteny) between JEC21 and H99 as well as examples of chromosomal rearrangements including inversions and translocations. Sequencing reads were generated from the ends of the BAC clones to allow correlation of genomic shotgun sequence data with physical map contigs. The BAC maps therefore represent a valuable resource for the generation, assembly, and finishing of the genomic sequence of both JEC21 and H99. The physical maps also serve as a link between map-based and sequence-based data, providing a powerful resource for continued genomic studies.

19.
ARACHNE: A Whole-Genome Shotgun Assembler   Total citations: 27, self-citations: 4, citations by others: 23
We describe a new computer system, called ARACHNE, for assembling genome sequence using paired-end whole-genome shotgun reads. ARACHNE has several key features, including an efficient and sensitive procedure for finding read overlaps, a procedure for scoring overlaps that achieves high accuracy by correcting errors before assembly, read merger based on forward-reverse links, and detection of repeat contigs by forward-reverse link inconsistency. To test ARACHNE, we created simulated reads providing approximately 10-fold coverage of the genomes of H. influenzae, S. cerevisiae, and D. melanogaster, as well as human chromosomes 21 and 22. The assemblies of these simulated reads yielded nearly complete coverage of the respective genomes, with a small number of contigs joined into a smaller number of supercontigs (or scaffolds). For example, analysis of the D. melanogaster genome yielded approximately 98% coverage with an N50 contig length of 324 kb and an N50 supercontig length of 5143 kb. The assembly accuracy was high, although not perfect: small errors occurred at a frequency of roughly 1 per 1 Mb (typically, deletion of approximately 1 kb in size), with a very small number of other misassemblies. The assembly was rapid: the Drosophila assembly required only 21 hours on a single 667 MHz processor and used 8.4 Gb of memory.
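The N50 statistic quoted for contigs and supercontigs has a simple definition: it is the length L such that contigs of length at least L together contain at least half of all assembled bases. A minimal sketch follows; the function name and the example lengths are made up for illustration and are not ARACHNE output.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that pieces of length >= L hold at least half the total bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one contig length")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if 2 * running >= total:
            return length
    return ordered[-1]  # not reached for positive lengths; kept for completeness


if __name__ == "__main__":
    contig_lengths_kb = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical contigs
    print(n50(contig_lengths_kb))
```

Because N50 is weighted by length, a few long contigs dominate it. This is why the abstract reports a much larger N50 for supercontigs than for contigs, even though scaffolding only links existing contigs.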

20.
Multiple tools have been developed to identify copy number variants (CNVs) from whole exome (WES) and whole genome sequencing (WGS) data. Current tools such as XHMM for WES and CNVnator for WGS identify CNVs based on changes in read depth. For WGS, other methods to identify CNVs include utilizing discordant read pairs and split reads and genome‐wide local assembly with tools such as Lumpy and SvABA, respectively. Here, we introduce a new method to identify deletion CNVs from WES and WGS trio data based on the clustering of Mendelian errors (MEs). Using our Mendelian Error Method (MEM), we identified 127 deletions (inherited and de novo) in 2,601 WES trios from the Pediatric Cardiac Genomics Consortium, with a validation rate of 88% by digital droplet PCR. MEM identified additional de novo deletions compared with XHMM, and a significant enrichment of 15q11.2 deletions compared with controls. In addition, MEM identified eight cases of uniparental disomy, sample switches, and DNA contamination. We applied MEM to WGS data from the Genome In A Bottle Ashkenazi trio and identified deletions with 97% specificity. MEM provides a robust, computationally inexpensive method for identifying deletions, and an orthogonal approach for verifying deletions called by other tools.
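The principle behind clustering Mendelian errors can be sketched as follows. A hemizygous deletion makes a carrier look homozygous at every variant inside it, so a trio shows runs of genotypes that cannot be explained by inheritance. The code is a rough illustration under stated assumptions, not the published MEM algorithm: genotypes are encoded as alternate-allele counts (0, 1, 2), and the `max_gap` and `min_errors` parameters and all function names are hypothetical.

```python
from typing import List, Set, Tuple

Site = Tuple[int, int, int, int]  # (position, father_gt, mother_gt, child_gt)


def transmissible(genotype: int) -> Set[int]:
    """Alternate-allele counts (0 or 1) a parent with this genotype can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def is_mendelian_error(father: int, mother: int, child: int) -> bool:
    """True when no combination of one paternal and one maternal allele gives the child genotype."""
    possible = {a + b for a in transmissible(father) for b in transmissible(mother)}
    return child not in possible


def cluster_errors(sites: List[Site], max_gap: int, min_errors: int) -> List[Tuple[int, int]]:
    """Group Mendelian errors lying within `max_gap` bp of each other on one chromosome
    and report (first, last) positions of groups with at least `min_errors` errors."""
    error_positions = sorted(pos for pos, f, m, c in sites if is_mendelian_error(f, m, c))
    clusters: List[Tuple[int, int]] = []
    current: List[int] = []
    for pos in error_positions:
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_errors:
                clusters.append((current[0], current[-1]))
            current = []
        current.append(pos)
    if len(current) >= min_errors:
        clusters.append((current[0], current[-1]))
    return clusters


if __name__ == "__main__":
    sites = [
        (100, 0, 2, 0),    # child should be heterozygous: error
        (200, 2, 0, 2),    # error
        (300, 0, 2, 2),    # error
        (5000, 1, 1, 1),   # consistent with inheritance
        (90000, 0, 0, 1),  # isolated error, e.g. a genotyping mistake
    ]
    print(cluster_errors(sites, max_gap=1000, min_errors=3))
```

Requiring several nearby errors is what separates a candidate deletion from isolated genotyping mistakes. A practical method would also need to handle sex chromosomes, missing calls, and the other ME sources the abstract lists, such as uniparental disomy and sample switches.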
