Found 20 similar articles (search time: 31 ms)
1.
Havlak P Chen R Durbin KJ Egan A Ren Y Song XZ Weinstock GM Gibbs RA 《Genome research》2004,14(4):721-732
Atlas is a suite of programs developed for assembly of genomes by a "combined approach" that uses DNA sequence reads from both BACs and whole-genome shotgun (WGS) libraries. The BAC clones afford advantages of localized assembly with reduced computational load, and provide a robust method for dealing with repeated sequences. Inclusion of WGS sequences facilitates use of different clone insert sizes and reduces data production costs. A core function of Atlas software is recruitment of WGS sequences into appropriate BACs based on sequence overlaps. Because construction of consensus sequences is from local assembly of these reads, only small (<0.1%) units of the genome are assembled at a time. Once assembled, each BAC is used to derive a genomic layout. This "sequence-based" growth of the genome map has greater precision than with non-sequence-based methods. Use of BACs allows correction of artifacts due to repeats at each stage of the process. This is aided by ancillary data such as BAC fingerprint, other genomic maps, and syntenic relations with other genomes. Atlas was used to assemble a draft DNA sequence of the rat genome; its major components including overlapper and split-scaffold are also being used in pure WGS projects.
2.
Warren RL Varabei D Platt D Huang X Messina D Yang SP Kronstad JW Krzywinski M Warren WC Wallis JW Hillier LW Chinwalla AT Schein JE Siddiqui AS Marra MA Wilson RK Jones SJ 《Genome research》2006,16(6):768-775
We describe a targeted approach to improve the contiguity of whole-genome shotgun sequence (WGS) assemblies at run-time, using information from Bacterial Artificial Chromosome (BAC)-based physical maps. Clone sizes and overlaps derived from clone fingerprints are used for the calculation of length constraints between any two BAC neighbors sharing 40% of their size. These constraints are used to promote the linkage and guide the arrangement of sequence contigs within a sequence scaffold at the layout phase of WGS assemblies. This process is facilitated by FASSI, a stand-alone application that calculates BAC end and BAC overlap length constraints from clone fingerprint map contigs created by the FPC package. FASSI is designed to work with the assembly tool PCAP, but its output can be formatted to work with other WGS assembly algorithms able to use length constraints for individual clones. The FASSI method is simple to implement, potentially cost-effective, and has resulted in the increase of scaffold contiguity for both the Drosophila melanogaster and Cryptococcus gattii genomes when compared to a control assembly without map-derived constraints. A 6.5-fold coverage draft DNA sequence of the Pan troglodytes (chimpanzee) genome was assembled using map-derived constraints and resulted in a 26.1% increase in scaffold contiguity.
3.
Second-generation sequencing technology can now be used to sequence an entire human genome in a matter of days and at low cost. Sequence read lengths, initially very short, have rapidly increased since the technology first appeared, and we now are seeing a growing number of efforts to sequence large genomes de novo from these short reads. In this Perspective, we describe the issues associated with short-read assembly, the different types of data produced by second-gen sequencers, and the latest assembly algorithms designed for these data. We also review the genomes that have been assembled recently from short reads and make recommendations for sequencing strategies that will yield a high-quality assembly. As genome sequencing technology has evolved, methods for assembling genomes have changed with it. Genome sequencers have never been able to “read” more than a relatively short stretch of DNA at once, with read lengths gradually increasing over time. Reconstructing a complete genome from a set of reads requires an assembly program, and a variety of genome assemblers have been used for this task. In 1995, when the first bacterial genome was published (Haemophilus influenzae), read lengths were ∼460 base pairs (bp), and that whole-genome shotgun (WGS) sequencing project generated 24,304 reads (Fleischmann et al. 1995). The human genome project required ∼30 million reads, with lengths up to 800 bp, using Sanger sequencing technology and automated capillary sequencers (International Human Genome Sequencing Consortium 2001; Venter et al. 2001). This corresponded to 24 billion bases (Gb), or approximately eightfold coverage of the 3-Gb human genome. Redundant coverage, in which on average every nucleotide is sequenced many times over, is required to produce a high-quality assembly.
Another benefit of redundancy is greatly increased accuracy compared with a single read: Where a single read might have an error rate of 1%, eightfold coverage has an error rate as low as 10^-16 when eight high-quality reads agree with one another. High coverage is also necessary to sequence polymorphic alleles within diploid or polyploid genomes. Current second-generation sequencing (SGS) technologies produce read lengths ranging from 35 to 400 bp, at far greater speed and much lower cost than Sanger sequencing. However, as reads get shorter, coverage needs to increase to compensate for the decreased connectivity and produce a comparable assembly. Certain problems cannot be overcome by deeper coverage: If a repetitive sequence is longer than a read, then coverage alone will never compensate, and all copies of that sequence will produce gaps in the assembly. These gaps can be spanned by paired reads—consisting of two reads generated from a single fragment of DNA and separated by a known distance—as long as the pair separation distance is longer than the repeat. Paired-end sequencing is available from most of the SGS machines, although it is not yet as flexible or as reliable as paired-end sequencing using traditional methods. After the successful assembly of the human (International Human Genome Sequencing Consortium 2001; Venter et al. 2001) and mouse (Waterston et al. 2002) genomes by whole-genome shotgun sequencing, most large-scale genome projects quickly moved to adopt the WGS approach, which has subsequently been used for dozens of eukaryotic genomes. Today, thanks to changes in sequencing technology, a major question confronting genome projects is, can we sequence a large genome (>100 Mbp) using short reads? If so, what are the limitations on read length, coverage, and error rates? How much paired-end sequencing is necessary? And what will the assembly look like?
In this perspective we take a look at each of these questions and describe the solutions available today. Although we provide some answers, we have no doubt that the solutions will change rapidly over the next few years, as both the sequencing methods and the computational solutions improve.
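The coverage and accuracy figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the error calculation assumes that read errors are independent, which is a simplification rather than anything the abstract states.

```python
# Worked check of the coverage and consensus-error figures in the abstract.
# Function names are illustrative, not taken from the paper.

def fold_coverage(num_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Average number of times each base is sequenced: (N * L) / G."""
    return num_reads * read_length_bp / genome_size_bp


def all_reads_wrong(per_read_error: float, depth: int) -> float:
    """Probability that `depth` reads are all in error at one position.

    Assumes errors are independent between reads (a simplification).
    """
    return per_read_error ** depth


# ~30 million reads of up to 800 bp over a 3-Gb genome.
print(fold_coverage(30_000_000, 800, 3_000_000_000))

# 1% per-read error at eightfold depth.
print(all_reads_wrong(0.01, 8))
```

Under these assumptions the first call reproduces the "approximately eightfold" coverage and the second gives a value on the order of 10^-16.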
4.
《Genome research》2015,25(3):445-458
Drosophila melanogaster plays an important role in molecular,
genetic, and genomic studies of heredity, development, metabolism, behavior, and
human disease. The initial reference genome sequence reported more than a decade ago
had a profound impact on progress in Drosophila research, and
improving the accuracy and completeness of this sequence continues to be important to
further progress. We previously described improvement of the 117-Mb sequence in the
euchromatic portion of the genome and 21 Mb in the heterochromatic portion, using a
whole-genome shotgun assembly, BAC physical mapping, and clone-based finishing. Here,
we report an improved reference sequence of the single-copy and middle-repetitive
regions of the genome, produced using cytogenetic mapping to mitotic and polytene
chromosomes, clone-based finishing and BAC fingerprint verification, ordering of
scaffolds by alignment to cDNA sequences, incorporation of other map and sequence
data, and validation by whole-genome optical restriction mapping. These data
substantially improve the accuracy and completeness of the reference sequence and the
order and orientation of sequence scaffolds into chromosome arm assemblies.
Representation of the Y chromosome and other heterochromatic regions
is particularly improved. The new 143.9-Mb reference sequence, designated Release 6,
effectively exhausts clone-based technologies for mapping and sequencing. Highly
repeat-rich regions, including large satellite blocks and functional elements such as
the ribosomal RNA genes and the centromeres, are largely inaccessible to current
sequencing and assembly methods and remain poorly represented. Further significant
improvements will require sequencing technologies that do not depend on molecular
cloning and that produce very long reads. The genome sequence of the fruit fly Drosophila melanogaster was first
reported in 2000 (Adams et al. 2000). This sequence
assembly, designated Release 1, represented the single-copy fraction of the genome in 116.2
megabases (Mb) of sequence in 134 large mapped scaffolds containing 1299 sequence gaps and
an additional 3.8 Mb in 704 small (<64 kb) unmapped scaffolds. Release 1 was produced
by combining a de novo whole-genome shotgun (WGS) sequence assembly, designated WGS1 (Myers et al. 2000), with sequences of mapped BAC and P1
genomic clones, including 29.7 Mb of finished sequences and draft sequences of a tiling
path of BAC and P1 clones spanning the euchromatic portion of the genome (Adams et al. 2000). WGS1 and Release 1 were validated by
comparison to the available finished genomic sequences and to a BAC-based physical map of
the major autosomes (Hoskins et al. 2000). WGS1 was the first shotgun assembly of a eukaryotic genome and served as a model for
sequencing mammalian genomes (Venter et al. 2001;
Stark et al. 2007). WGS remains the method of
choice in genome sequencing because it is rapid and efficient. However, because eukaryotic
genomes typically contain a large fraction of repetitive sequences with complex structures,
current WGS sequencing strategies produce fragmented assemblies in which the location,
order, and orientation of sequence scaffolds along the chromosomes are poorly determined.
Furthermore, tandem and dispersed repetitive sequences including gene families,
pseudogenes, transposable elements (TEs), segmental duplications, and simple sequence
repeats are poorly represented. This leads to misassembled regions, unmapped regions, and
numerous gaps, particularly in heterochromatic regions which often span many megabases of
the genome and include vital protein-coding genes and other essential loci. Therefore,
physical mapping, cytogenetic mapping, and sequence finishing to improve genome sequence
assemblies remain a priority, especially for human (International Human Genome Sequencing Consortium 2004) and model organisms of
particular importance in biomedical research. Because D. melanogaster is a widely used research organism, we have
continued to improve the reference genome sequence. Late in 2000, the Release 2 sequence
corrected the order and orientation of a few small sequence scaffolds and filled a few
hundred small sequence gaps. In 2002, we reported BAC-based finishing of 116.9 Mb of genome
sequence in 13 scaffolds spanning the euchromatic portions of the six chromosome arms
(Celniker et al. 2002) and an improved WGS
assembly (WGS3) including 20.7 Mb of draft-quality sequence in larger scaffolds in the
heterochromatic portion of the genome (Celniker et al.
2002; Hoskins et al. 2002). This Release 3
assembly had high sequence accuracy (estimated error rate < 1 in 100,000) and
contiguity (37 sequence gaps; seven physical map gaps) in the euchromatic portion of the
assembly, and the order and orientation of sequences within the assembly was confirmed by
in situ hybridization of 915 BACs to salivary gland polytene chromosomes, representing 96%
of the BACs in a tiling path spanning the euchromatic portion of the assembly (Hoskins et al. 2000; Celniker et al. 2002). The euchromatic sequence went through two unpublished
revisions in 2004 and 2006 (Releases 4 and 5; http://www.fruitfly.org) to further
improve accuracy and completeness. In 2007, we reported on further physical and cytogenetic
mapping, and sequence finishing of 15 Mb in the heterochromatic portion of the genome,
including essentially all single-copy regions (Hoskins et
al. 2007). However, gaps and assembly errors remained due to the difficulties of
mapping and finishing in repeat-rich regions. The remaining physical map gaps resulted from
the absence of genomic regions from BAC libraries, likely due to incompatibility with
molecular cloning or clone instability in E. coli. Sequence gaps within
clone-based assemblies resulted from failure of assembly in complex nested repetitive
regions. The remaining sequence assembly errors were due to incorrect but self-consistent
clone-based sequence assemblies or clone rearrangements. Particularly in heterochromatin,
errors in the physical and cytogenetic maps existed due to the presence of repeat-rich
sequences. Despite impressive developments in high-throughput sequencing technology, the production of
high-quality finished genome sequences has remained laborious and inefficient. Furthermore,
highly repeat-rich genomic regions such as those in centric heterochromatin have remained
inaccessible to mapping, sequencing, and assembly. We define the “centric
heterochromatin” as the repeat-rich sequences found at the functional centromeres
(Sun et al. 2003). “Pericentric
heterochromatin” refers to the Mb-scale regions that flank the centromeres and
contain large blocks of satellite DNA and other simple-sequence repeats (Supplemental Fig.
S1) interspersed with large regions of transposable-element and other middle-repetitive
sequences and including essential protein-coding genes. “Telomeric
heterochromatin” refers to the subtelomeric regions composed of tandem repeats
(Mason and Villasante 2014) and the arrays of
telomeric retrotransposons at the most distal chromosome ends (Abad et al. 2004b). By these definitions, the Y
chromosome is composed entirely of centric, pericentric, and telomeric heterochromatin. Here, we report the Release 6 assembly of the D. melanogaster reference
genome sequence. Much of the improvement in the sequence is in the mapping, finishing, and
assembly of repeat-rich regions in the heterochromatic portions of the genome. Release 6
incorporates (1) additional BAC-based cytogenetic mapping of previously unmapped,
unordered, and unoriented sequence scaffolds by fluorescent in situ hybridization (FISH) to
mitotic and polytene chromosomes, (2) BAC-based sequence finishing of clones spanning the
remainder of the genome physical map guided by comparison to high-resolution BAC
restriction fingerprints, and sequence finishing of 10-kb genomic plasmid clones spanning
the remainder of the WGS3 assembly, (3) use of cDNA sequences to order and orient
scaffolds, (4) incorporation of map and sequence data from other sources, and (5)
validation of the sequence assembly by comparison to a whole-genome optical restriction map
(Zhou et al. 2007). The resulting genome sequence
assembly is a substantially improved reference that spans 143.9 Mb and represents the
practical limit of established technologies. Relative to Release 5, Release 6 closes 628
gaps, extends the chromosome arm assemblies into telomeric and pericentric heterochromatin
by 5.4 Mb, and increases the Y chromosome assembly 10-fold from
∼242 kb to 3.4 Mb. Further substantial improvement to the reference genome sequence
will require new technologies that do not depend on standard molecular cloning. Emerging
very-long-read WGS sequencing and assembly technologies will permit efficient production of
more complete genome sequences for D. melanogaster and other species.
5.
Dynamic building of a BAC clone tiling path for the Rat Genome Sequencing Project
CLONEPICKER is a software pipeline that integrates sequence data with BAC clone fingerprints to dynamically select a minimal overlapping clone set covering the whole genome. In the Rat Genome Sequencing Project (RGSP), a hybrid strategy of "clone by clone" and "whole genome shotgun" approaches was used to maximize the merits of both approaches. Like the "clone by clone" method, one key challenge for this strategy was to select a low-redundancy clone set that covered the whole genome while the sequencing is in progress. The CLONEPICKER pipeline met this challenge using restriction enzyme fingerprint data, BAC end sequence data, and sequences generated from individual BAC clones as well as WGS reads. In the RGSP, an average of 7.5 clones was identified from each side of a seed clone, and the minimal overlapping clones were reliably selected. Combined with the assembled BAC fingerprint map, a set of BAC clones that covered >97% of the genome was identified and used in the RGSP.
6.
Limited comparative studies suggest that the human genome is particularly enriched for recent segmental duplications. The extent of segmental duplications in other mammalian genomes is unknown and confounded by methodological differences in genome assembly. Here, we present a detailed analysis of recent duplication content within the mouse genome using a whole-genome assembly comparison method and a novel assembly independent method, designed to take advantage of the reduced allelic variation of the C57BL/6J strain. We conservatively estimate that approximately 57% of all highly identical segmental duplications (≥90%) were misassembled or collapsed within the working draft WGS assembly. The WGS approach often leaves duplications fragmented and unassigned to a chromosome when compared with the clone-ordered-based approach. Our preliminary analysis suggests that 1.7%-2.0% of the mouse genome is part of recent large segmental duplications (about half of what is observed for the human genome). We have constructed a mouse segmental duplication database to aid in the characterization of these regions and their integration into the final mouse genome assembly. This work suggests significant biological differences in the architecture of recent segmental duplications between human and mouse. In addition, our unique method provides the means for improving whole-genome shotgun sequence assembly of mouse and future mammalian genomes.
7.
Marie Coutelier Manuel Holtgrewe Marten Jäger Ricarda Flöttman Martin A. Mensah Malte Spielmann Peter Krawitz Denise Horn Dieter Beule Stefan Mundlos 《European journal of human genetics : EJHG》2022,30(2):178
Copy Number Variants (CNVs) are deletions, duplications or insertions larger than 50 base pairs. They account for a large percentage of the normal genome variation and play major roles in human pathology. While array-based approaches have long been used to detect them in clinical practice, whole-genome sequencing (WGS) bears the promise to allow concomitant exploration of CNVs and smaller variants. However, accurately calling CNVs from WGS remains a difficult computational task, for which a consensus is still lacking. In this paper, we explore practical calling options to reach the best compromise between sensitivity and sensibility. We show that callers based on different signal (paired-end reads, split reads, coverage depth) yield complementary results. We suggest approaches combining four selected callers (Manta, Delly, ERDS, CNVnator) and a regenotyping tool (SV2), and show that this is applicable in everyday practice in terms of computation time and further interpretation. We demonstrate the superiority of these approaches over array-based Comparative Genomic Hybridization (aCGH), specifically regarding the lack of resolution in breakpoint definition and the detection of potentially relevant CNVs. Finally, we confirm our results on the NA12878 benchmark genome, as well as one clinically validated sample. In conclusion, we suggest that WGS constitutes a timely and economically valid alternative to the combination of aCGH and whole-exome sequencing. Subject terms: DNA sequencing, Genome informatics
8.
9.
The Phusion assembler has assembled the mouse genome from the whole-genome shotgun (WGS) dataset collected by the Mouse Genome Sequencing Consortium, at ~7.5x sequence coverage, producing a high-quality draft assembly 2.6 gigabases in size, of which 90% of these bases are in 479 scaffolds. For the mouse genome, which is a large and repeat-rich genome, the input dataset was designed to include a high proportion of paired end sequences of various size selected inserts, from 2-200 kbp lengths, into various host vector templates. Phusion uses sequence data, called reads, and information about reads that share common templates, called read pairs, to drive the assembly of this large genome to highly accurate results. The preassembly stage, which clusters the reads into sensible groups, is a key element of the entire assembler, because it permits a simple approach to parallelization of the assembly stage, as each cluster can be treated independent of the others. In addition to the application of Phusion to the mouse genome, we will also present results from the WGS assembly of Caenorhabditis briggsae sequenced to about 11x coverage. The C. briggsae assembly was accessioned through EMBL, http://www.ebi.ac.uk/services/index.html, using the series CAAC01000001-CAAC01000578; however, the Phusion mouse assembly described here was not accessioned. The mouse data was generated by the Mouse Genome Sequencing Consortium. The C. briggsae sequence was generated at The Wellcome Trust Sanger Institute and the Genome Sequencing Center, Washington University School of Medicine.
10.
Metagenomics is the study of the genomic content of a sample of organisms obtained from a common habitat using targeted or random sequencing. Goals include understanding the extent and role of microbial diversity. The taxonomical content of such a sample is usually estimated by comparison against sequence databases of known sequences. Most published studies use the analysis of paired-end reads, complete sequences of environmental fosmid and BAC clones, or environmental assemblies. Emerging sequencing-by-synthesis technologies with very high throughput are paving the way to low-cost random "shotgun" approaches. This paper introduces MEGAN, a new computer program that allows laptop analysis of large metagenomic data sets. In a preprocessing step, the set of DNA sequences is compared against databases of known sequences using BLAST or another comparison tool. MEGAN is then used to compute and explore the taxonomical content of the data set, employing the NCBI taxonomy to summarize and order the results. A simple lowest common ancestor algorithm assigns reads to taxa such that the taxonomical level of the assigned taxon reflects the level of conservation of the sequence. The software allows large data sets to be dissected without the need for assembly or the targeting of specific phylogenetic markers. It provides graphical and statistical output for comparing different data sets. The approach is applied to several data sets, including the Sargasso Sea data set, a recently published metagenomic data set sampled from a mammoth bone, and several complete microbial genomes. Also, simulations that evaluate the performance of the approach for different read lengths are presented.
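The lowest common ancestor idea mentioned in this abstract can be sketched in a few lines. This is a minimal illustration, not MEGAN's implementation: the `PARENT` table is a hand-made, heavily simplified toy taxonomy (a real analysis would use the NCBI taxonomy), and the function names are hypothetical.

```python
# Toy taxonomy as child -> parent links. Assumption: simplified for
# illustration; intermediate ranks are omitted.
PARENT = {
    "root": None,
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Escherichia coli": "Proteobacteria",
    "Salmonella enterica": "Proteobacteria",
    "Firmicutes": "Bacteria",
    "Bacillus subtilis": "Firmicutes",
}


def lineage(taxon):
    """Return the path from the root down to `taxon`."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path[::-1]


def lowest_common_ancestor(hit_taxa):
    """Deepest node shared by the lineages of every taxon a read matched."""
    lineages = [lineage(t) for t in hit_taxa]
    lca = "root"
    # Walk all lineages level by level until they disagree.
    for level in zip(*lineages):
        if len(set(level)) != 1:
            break
        lca = level[0]
    return lca


# A read whose hits are confined to one species stays at species level;
# a read with hits across phyla is pushed up to a higher rank.
print(lowest_common_ancestor(["Escherichia coli"]))
print(lowest_common_ancestor(["Escherichia coli", "Salmonella enterica"]))
print(lowest_common_ancestor(["Escherichia coli", "Bacillus subtilis"]))
```

A read matching only one species is assigned to that species, while a read from a conserved sequence that matches many taxa is assigned to a higher rank, which is the behavior the abstract describes.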
11.
A A Padhye G Smith D McLaughlin P G Standard L Kaufman 《Journal of clinical microbiology》1992,30(12):3108-3111
A chemiluminescent DNA probe (Accuprobe) assay developed by Gen Probe, Inc., for the rapid identification of Histoplasma capsulatum was evaluated and compared with the exoantigen test by using 162 coded cultures including Histoplasma capsulatum var. capsulatum, Histoplasma capsulatum var. duboisii, Histoplasma capsulatum var. farciminosum, Blastomyces dermatitidis, Coccidioides immitis, Paracoccidioides brasiliensis, and morphologically related saprobic fungi. Each test uses a chemiluminescent, acridinium ester-labeled, single-stranded DNA probe that is complementary to the rRNA of the target organism. Lysates of the test cultures were prepared by sonication with glass beads and heat treated. After the rRNA was released from the target organism, the labeled DNA probe combined with the target H. capsulatum rRNA to form a stable DNA-RNA hybrid. A hybridization protection assay was used, and the chemiluminescence of hybrids was measured initially with a Leader 1 luminometer as relative light units and later during the investigation with a probe assay luminometer as probe light units. Of the 162 coded mycelial cultures tested by the Accuprobe assay, 105 were identified as H. capsulatum. The test could be performed with an inoculum of a few square millimeters (1 to 2 mm^2) of growth. In the primary evaluation, the Accuprobe identified 103 of the 105 cultures as H. capsulatum within 2 h. The remaining two cultures, contaminated with bacteria, had to be purified before the Accuprobe assay identified them correctly as H. capsulatum. Since each coded culture was concurrently tested for H. capsulatum, B. dermatitidis, and C. immitis exoantigens, the identification of all three dimorphic pathogens was provided simultaneously. Of the 162 coded cultures tested, 105 were identified by the exoantigen test as H. capsulatum, 12 were identified as B. dermatitidis, 13 were identified as C. immitis, and 32 were negative for H. capsulatum, B. dermatitidis, and C. immitis.
The bacterial contamination in two isolates did not interfere with the exoantigen testing. The exoantigen test required 7- to 10-day-old colonies and required 48 to 72 h of incubation before definitive identification was obtained.
12.
A case of histoplasmosis duboisii in a 30 year-old engineer is presented. The diagnosis was made with the help of light microscopy, electron microscopy and cultures. Although diagnosed in Chile, the patient probably acquired the disease in the endemic African area, more precisely in the Ivory Coast. Differential diagnosis between Histoplasma capsulatum var. capsulatum and Histoplasma capsulatum var. duboisii is based primarily on the larger in-vivo yeast form size of the latter. Electron microscopy study, the first done on the duboisii variety of Histoplasma capsulatum in human material to our knowledge, was not essential for this purpose. Differential diagnosis between Histoplasma capsulatum var. duboisii and Blastomyces dermatitidis is based on morphological tissue changes and mycologic characteristics. Once more, a case of an "exotic" or geographically restricted disease is detected far from its endemic area, thanks to easier means of transportation. Earth is a shrinking planet.
13.
In shotgun sequencing projects, the genome or BAC length is not always known. We approach estimating genome length by first estimating the repeat structure of the genome or BAC, sometimes of interest in its own right, on the basis of a set of random reads from a genome project. Moreover, we can find the consensus for repeat families before assembly. Our methods are based on the l-tuple content of the reads.
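To make the l-tuple idea concrete, here is a deliberately simplified sketch. It is not the estimator from this paper: it assumes a repeat-free genome and error-free reads, and simply divides the total number of l-tuple occurrences by the typical depth at which each l-tuple is seen. The function name, the simulated data, and all parameter values are hypothetical.

```python
import random
from collections import Counter
from statistics import median


def estimate_genome_length(reads, k):
    """Rough length estimate: total k-tuple occurrences / typical k-tuple depth.

    Assumes no repeats and no sequencing errors (a strong simplification).
    """
    counts = Counter(
        read[i:i + k] for read in reads for i in range(len(read) - k + 1)
    )
    total_occurrences = sum(counts.values())
    typical_depth = median(counts.values())
    return total_occurrences / typical_depth


# Simulate random shotgun reads from a made-up 5,000-bp genome.
rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(5000))
read_length = 100
reads = []
for _ in range(2000):
    start = rng.randrange(len(genome) - read_length + 1)
    reads.append(genome[start:start + read_length])

estimate = estimate_genome_length(reads, k=15)
print(round(estimate))
```

On simulated data like this the estimate should land near the true length. Real genomes would need the repeat structure handled explicitly, since repeated l-tuples inflate the counts, which is the problem the abstract says the authors address first.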
14.
Clinical application of whole‐genome low‐coverage next‐generation sequencing to detect and characterize balanced chromosomal translocations
D. Liang Y. Wang X. Ji H. Hu J. Zhang L. Meng Y. Lin D. Ma T. Jiang H. Jiang Asan L. Song J. Guo P. Hu Z. Xu 《Clinical genetics》2017,91(4):605-610
Individuals carrying balanced translocations have a high risk of birth defects, recurrent spontaneous abortions and infertility. Thus, the detection and characterization of balanced translocations is important to reveal the genetic background of the carriers and to provide proper genetic counseling. Next‐generation sequencing (NGS), which has great advantages over other methods such as karyotyping and fluorescence in situ hybridization (FISH), has been used to detect disease‐associated breakpoints. Herein, to evaluate the application of this technology to detect balanced translocations in the clinic, we performed a parental study for prenatal cases with unbalanced translocations. Eight candidate families with potential balanced translocations were investigated using two strategies in parallel, low‐coverage whole‐genome sequencing (WGS) followed‐up by Sanger sequencing and G‐banding karyotype coupled with FISH. G‐banding analysis revealed three balanced translocations, and FISH detected two cryptic submicroscopic balanced translocations. Consistently, WGS detected five balanced translocations and mapped all the breakpoints by Sanger sequencing. Analysis of the breakpoints revealed that six genes were disrupted in the four apparently healthy carriers. In summary, our result suggested low‐coverage WGS can detect balanced translocations reliably and can map breakpoints precisely compared with conventional procedures. WGS may replace cytogenetic methods in the diagnosis of balanced translocation carriers in the clinic.
15.
Jinno S Gripshover BM Lemonovich TL Anderson JM Jacobs MR 《Journal of clinical microbiology》2010,48(12):4664-4666
We report a case of Histoplasma capsulatum endocarditis in which Histoplasma antigen assay and fungal blood cultures were negative. The diagnosis was made by microscopic examination and culture of the excised valve. Histoplasma capsulatum should be considered in the differential diagnosis of culture-negative endocarditis in regions where it is endemic and in travelers.
16.
A genome-wide survey of structural variation between human and chimpanzee
Newman TL Tuzun E Morrison VA Hayden KE Ventura M McGrath SD Rocchi M Eichler EE 《Genome research》2005,15(10):1344-1356
Structural changes (deletions, insertions, and inversions) between human and chimpanzee genomes have likely had a significant impact on lineage-specific evolution because of their potential for dramatic and irreversible mutation. The low-quality nature of the current chimpanzee genome assembly precludes the reliable identification of many of these differences. To circumvent this, we applied a method to optimally map chimpanzee fosmid paired-end sequences against the human genome to systematically identify sites of structural variation ≥ 12 kb between the two species. Our analysis yielded a total of 651 putative sites of chimpanzee deletion (n = 293), insertions (n = 184), and rearrangements consistent with local inversions between the two genomes (n = 174). We validated a subset (19/23) of insertion and deletions using PCR and Southern blot assays, confirming the accuracy of our method. The events are distributed throughout the genome on all chromosomes but are highly correlated with sites of segmental duplication in human and chimpanzee. These structural variants encompass at least 24 Mb of DNA and overlap with > 245 genes. Seventeen of these genes contain exons missing in the chimpanzee genomic sequence and also show a significant reduction in gene expression in chimpanzee. Compared with the pioneering work of Yunis, Prakash, Dutrillaux, and Lejeune, this analysis expands the number of potential rearrangements between chimpanzees and humans 50-fold. Furthermore, this work prioritizes regions for further finishing in the chimpanzee genome and provides a resource for interrogating functional differences between humans and chimpanzees.
17.
Martagon-Villamil J Shrestha N Sholtis M Isada CM Hall GS Bryne T Lodge BA Reller LB Procop GW 《Journal of clinical microbiology》2003,41(3):1295-1298
We designed and tested a real-time LightCycler PCR assay for Histoplasma capsulatum that correctly identified the 34 H. capsulatum isolates in a battery of 107 fungal isolates tested and also detected H. capsulatum in clinical specimens from three patients that were culture positive for this organism.
18.
Schein JE Tangen KL Chiu R Shin H Lengeler KB MacDonald WK Bosdet I Heitman J Jones SJ Marra MA Kronstad JW 《Genome research》2002,12(9):1445-1453
The basidiomycete fungus Cryptococcus neoformans is an important opportunistic pathogen of humans that poses a significant threat to immunocompromised individuals. Isolates of C. neoformans are classified into serotypes (A, B, C, D, and AD) based on antigenic differences in the polysaccharide capsule that surrounds the fungal cells. Genomic and EST sequencing projects are underway for the serotype D strain JEC21 and the serotype A strain H99. As part of a genomics program for C. neoformans, we have constructed fingerprinted bacterial artificial chromosome (BAC) clone physical maps for strains H99 and JEC21 to support the genomic sequencing efforts and to provide an initial comparison of the two genomes. The BAC clones represented an estimated 10-fold redundant coverage of the genomes of each serotype and allowed the assembly of 20 contigs each for H99 and JEC21. We found that the genomes of the two strains are sufficiently distinct to prevent coassembly of the two maps when combined fingerprint data are used to construct contigs. Hybridization experiments placed 82 markers on the JEC21 map and 102 markers on the H99 map, enabling contigs to be linked with specific chromosomes identified by electrophoretic karyotyping. These markers revealed both extensive similarity in gene order (conservation of synteny) between JEC21 and H99 as well as examples of chromosomal rearrangements including inversions and translocations. Sequencing reads were generated from the ends of the BAC clones to allow correlation of genomic shotgun sequence data with physical map contigs. The BAC maps therefore represent a valuable resource for the generation, assembly, and finishing of the genomic sequence of both JEC21 and H99. The physical maps also serve as a link between map-based and sequence-based data, providing a powerful resource for continued genomic studies.
19.
Serafim Batzoglou David B. Jaffe Ken Stanley Jonathan Butler Sante Gnerre Evan Mauceli Bonnie Berger Jill P. Mesirov Eric S. Lander 《Genome research》2002,12(1):177-189
We describe a new computer system, called ARACHNE, for assembling genome sequence using paired-end whole-genome shotgun reads. ARACHNE has several key features, including an efficient and sensitive procedure for finding read overlaps, a procedure for scoring overlaps that achieves high accuracy by correcting errors before assembly, read merger based on forward-reverse links, and detection of repeat contigs by forward-reverse link inconsistency. To test ARACHNE, we created simulated reads providing approximately 10-fold coverage of the genomes of H. influenzae, S. cerevisiae, and D. melanogaster, as well as human chromosomes 21 and 22. The assemblies of these simulated reads yielded nearly complete coverage of the respective genomes, with a small number of contigs joined into a smaller number of supercontigs (or scaffolds). For example, analysis of the D. melanogaster genome yielded approximately 98% coverage with an N50 contig length of 324 kb and an N50 supercontig length of 5143 kb. The assembly accuracy was high, although not perfect: small errors occurred at a frequency of roughly 1 per 1 Mb (typically, deletion of approximately 1 kb in size), with a very small number of other misassemblies. The assembly was rapid: the Drosophila assembly required only 21 hours on a single 667 MHz processor and used 8.4 Gb of memory.
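The N50 contig and supercontig lengths quoted in this abstract are a standard contiguity statistic. As an illustration only, the sketch below computes N50 under the common definition: the largest length L such that sequences of length at least L together cover at least half of the total assembled bases. The function name `n50` and the example lengths are hypothetical and are not taken from ARACHNE.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    Assumes the common definition: walk the lengths from longest to
    shortest and return the length at which the running sum first
    reaches half of the total.
    """
    if not lengths:
        raise ValueError("n50() requires at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point halves.
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Hypothetical contig lengths in kb, for illustration only.
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Because N50 depends only on the length distribution, it measures contiguity and not correctness, which is why the abstract reports misassembly rates separately.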
20.
Robust identification of deletions in exome and genome sequence data based on clustering of Mendelian errors
Kathryn B. Manheimer Nihir Patel Felix Richter Joshua Gorham Angela C. Tai Jason Homsy Marko T. Boskovski Michael Parfenov Elizabeth Goldmuntz Wendy K. Chung Martina Brueckner Martin Tristani‐Firouzi Deepak Srivastava Jonathan G. Seidman Christine E. Seidman Bruce D. Gelb Andrew J. Sharp 《Human mutation》2018,39(6):870-881
Multiple tools have been developed to identify copy number variants (CNVs) from whole exome (WES) and whole genome sequencing (WGS) data. Current tools such as XHMM for WES and CNVnator for WGS identify CNVs based on changes in read depth. For WGS, other methods to identify CNVs include utilizing discordant read pairs and split reads and genome‐wide local assembly with tools such as Lumpy and SvABA, respectively. Here, we introduce a new method to identify deletion CNVs from WES and WGS trio data based on the clustering of Mendelian errors (MEs). Using our Mendelian Error Method (MEM), we identified 127 deletions (inherited and de novo) in 2,601 WES trios from the Pediatric Cardiac Genomics Consortium, with a validation rate of 88% by digital droplet PCR. MEM identified additional de novo deletions compared with XHMM, and a significant enrichment of 15q11.2 deletions compared with controls. In addition, MEM identified eight cases of uniparental disomy, sample switches, and DNA contamination. We applied MEM to WGS data from the Genome In A Bottle Ashkenazi trio and identified deletions with 97% specificity. MEM provides a robust, computationally inexpensive method for identifying deletions, and an orthogonal approach for verifying deletions called by other tools.
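The general idea of flagging deletions from clustered Mendelian errors can be sketched as follows. This is a simplified illustration, not the published MEM algorithm. It assumes biallelic sites with genotypes coded as alternate-allele counts (0, 1, 2), and the names `is_mendelian_error` and `cluster_errors` and the thresholds `max_gap` and `min_errors` are hypothetical. A hemizygous deletion makes the remaining allele look homozygous, so a child can appear to carry a genotype that the parents could not jointly transmit.

```python
def transmittable(genotype):
    """Alleles (0 = ref, 1 = alt) a parent with this alt-allele count can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def is_mendelian_error(father, mother, child):
    """True if the child's genotype cannot arise from one allele per parent."""
    possible = {a + b for a in transmittable(father) for b in transmittable(mother)}
    return child not in possible


def cluster_errors(positions, max_gap, min_errors):
    """Group Mendelian-error positions on one chromosome into candidate regions.

    Consecutive positions at most max_gap apart join the same cluster;
    clusters with fewer than min_errors sites are discarded.
    Returns a list of (start, end, n_errors) tuples.
    """
    clusters, current = [], []
    for pos in sorted(positions):
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_errors:
                clusters.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    if len(current) >= min_errors:
        clusters.append((current[0], current[-1], len(current)))
    return clusters


if __name__ == "__main__":
    # Father 0/0, mother 1/1, child called 1/1: the paternal allele seems missing.
    print(is_mendelian_error(0, 2, 2))
    # Hypothetical error positions and thresholds, for illustration only.
    print(cluster_errors([100, 150, 180, 5000, 9000, 9050], max_gap=100, min_errors=3))
```

Isolated Mendelian errors are more likely to be genotyping noise, which is why a sketch like this keeps only runs of nearby errors.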