Found 20 similar articles (search time: 15 ms)
1.
The parking strategy is an iterative approach to DNA sequencing. Each iteration consists of sequencing a novel portion of target DNA that does not overlap any previously sequenced region. Subject to the constraint of no overlap, each new region is chosen randomly. A parking strategy is often ideal in the early stages of a project for rapidly generating unique data. As a project progresses, parking becomes progressively more expensive and eventually prohibitive. We present a mathematical model with a generalization to allow for overlaps. This model predicts multiple parameters, including progress, costs, and the distribution of gap sizes left by a parking strategy. The highly fragmented nature of the gaps left after an initial parking strategy may make it difficult to finish a project efficiently. Therefore, in addition to our parking model, we model gap closing by walking. Our gap-closing model is generalizable to many other strategies. Our discussion includes modified parking strategies and hybrids with other strategies. A hybrid parking strategy has been employed for portions of the Human Genome Project.
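The parking process described in this abstract can be illustrated with a small Monte Carlo sketch. It is not the authors' mathematical model. It assumes a linear target of integer length, a fixed read length, uniformly random candidate positions, and rejection of any candidate that overlaps an earlier read. The names `simulate_parking`, `gap_sizes`, `target_len`, `read_len`, and `attempts` are hypothetical.

```python
import bisect
import random


def simulate_parking(target_len, read_len, attempts, seed=0):
    """Place reads at random; keep a read only if it overlaps no earlier read.

    Returns the sorted start positions of accepted reads and the number of
    rejected attempts (a rough proxy for wasted effort).
    """
    rng = random.Random(seed)
    starts = []      # sorted start positions of accepted reads
    rejected = 0
    for _ in range(attempts):
        s = rng.randint(0, target_len - read_len)  # candidate start, inclusive bounds
        i = bisect.bisect_left(starts, s)
        # The candidate must clear its nearest accepted neighbours on both sides.
        left_ok = i == 0 or starts[i - 1] + read_len <= s
        right_ok = i == len(starts) or s + read_len <= starts[i]
        if left_ok and right_ok:
            starts.insert(i, s)
        else:
            rejected += 1
    return starts, rejected


def gap_sizes(starts, target_len, read_len):
    """Lengths of the unsequenced gaps left between accepted reads."""
    gaps = []
    prev_end = 0
    for s in starts:
        gaps.append(s - prev_end)
        prev_end = s + read_len
    gaps.append(target_len - prev_end)
    return [g for g in gaps if g > 0]
```

Running `simulate_parking` with a growing number of attempts and comparing the rejected count against the accepted count gives a feel for why parking becomes expensive late in a project, and `gap_sizes` shows how fragmented the remaining gaps are.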
2.
Gilissen C Hoischen A Brunner HG Veltman JA 《European journal of human genetics : EJHG》2012,20(5):490-497
Next generation sequencing can be used to search for Mendelian disease genes in an unbiased manner by sequencing the entire protein-coding sequence, known as the exome, or even the entire human genome. Identifying the pathogenic mutation amongst thousands to millions of genomic variants is a major challenge, and novel variant prioritization strategies are required. The choice of these strategies depends on the availability of well-phenotyped patients and family members, the mode of inheritance, the severity of the disease and its population frequency. In this review, we discuss the current strategies for Mendelian disease gene identification by exome resequencing. We conclude that exome strategies are successful and identify new Mendelian disease genes in approximately 60% of the projects. Improvements in bioinformatics as well as in sequencing technology will likely increase the success rate even further. Exome sequencing is likely to become the most commonly used tool for Mendelian disease gene identification for the coming years.
3.
4.
Adams DR Sincan M Fuentes Fajardo K Mullikin JC Pierson TM Toro C Boerkoel CF Tifft CJ Gahl WA Markello TC 《Human mutation》2012,33(4):599-608
The Undiagnosed Diseases Program at the National Institutes of Health uses high-throughput sequencing (HTS) to diagnose rare and novel diseases. HTS techniques generate large numbers of DNA sequence variants, which must be analyzed and filtered to find candidates for disease causation. Despite the publication of an increasing number of successful exome-based projects, there has been little formal discussion of the analytic steps applied to HTS variant lists. We present the results of our experience with over 30 families for whom HTS sequencing was used in an attempt to find clinical diagnoses. For each family, exome sequence was augmented with high-density SNP-array data. We present a discussion of the theory and practical application of each analytic step and provide example data to illustrate our approach. The article is designed to provide an analytic roadmap for variant analysis, thereby enabling a wide range of researchers and clinical genetics practitioners to perform direct analysis of HTS data for their patients and projects.
5.
A method for fast nucleotide sequencing is described. It is based on the selection of well-known small oligomers that can hybridize with the unknown target. The selected oligomers are then ordered using a simple statistical approach. The use of capillary electrophoresis for the analysis is emphasized.
6.
The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data (total citations: 10; self-citations: 0; citations by others: 10)
Aaron McKenna Matthew Hanna Eric Banks Andrey Sivachenko Kristian Cibulskis Andrew Kernytsky Kiran Garimella David Altshuler Stacey Gabriel Mark Daly Mark A. DePristo 《Genome research》2010,20(9):1297-1303
Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS (the 1000 Genome pilot alone includes nearly five terabases) make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.
In recent years, there has been a rapid expansion in the number of next-generation sequencing platforms, including Illumina (Bentley et al. 2008), the Applied Biosystems SOLiD System (McKernan et al. 2009), 454 Life Sciences (Roche) (Margulies et al. 2005), Helicos HeliScope (Shendure and Ji 2008), and most recently Complete Genomics (Drmanac et al. 2010). Many tools have been created to work with next-generation sequencer data, from read based aligners like MAQ (Li et al. 2008a), BWA (Li and Durbin 2009), and SOAP (Li et al. 2008b), to single nucleotide polymorphism and structural variation detection tools like BreakDancer (Chen et al. 2009), VarScan (Koboldt et al. 2009), and MAQ. Although these tools are highly effective in their problem domains, there still exists a large development gap between sequencing output and analysis results, in part because tailoring these analysis tools to answer specific scientific questions can be laborious and difficult. General frameworks are available for processing next-generation sequencing data but tend to focus on specific classes of analysis problems, like quality assessment of sequencing data, as in PIQA (Martinez-Alcantara et al. 2009), or require specialized knowledge of an existing framework, as in BioConductor in the ShortRead toolset (Morgan et al. 2009). The lack of sophisticated and flexible programming frameworks that enable downstream analysts to access and manipulate the massive sequencing data sets in a programmatic way has been a hindrance to the rapid development of new tools and methods.
With the emergence of the SAM file specification (Li et al. 2009) as the standard format for storage of platform-independent next-generation sequencing data, we saw the opportunity to implement an analysis programming framework which takes advantage of this common input format to simplify the up-front coding costs for end users. Here, we present the Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce (Dean and Ghemawat 2008). By separating specific analysis calculations from common data management infrastructure, tools are easy to write while benefiting from ongoing improvements to the core GATK. The GATK engine is constantly being refined and optimized for correctness, stability, and CPU and memory efficiency; this well-structured software core allows the GATK to support advanced features such as distributed and automatic shared-memory parallelization. Here, we highlight the capabilities of the GATK, which has been used to implement a range of analysis methods for projects like The Cancer Genome Atlas (http://cancergenome.nih.gov) and the 1000 Genomes Project (http://www.1000genomes.org), by describing the implementation of depth of coverage analysis tools and a Bayesian single nucleotide polymorphism (SNP) genotyper, and show the application of these tools to the 1000 Genomes Project pilot data.
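The separation this abstract describes, between an analysis calculation and the machinery that feeds it data, can be sketched in plain Python as a depth-of-coverage calculation in map/reduce form. This is not the GATK API. Reads are assumed to be half-open `(start, end)` intervals on a single contig, and the names `reads_at`, `map_depth`, `reduce_sum`, and `mean_coverage` are hypothetical.

```python
from functools import reduce


def reads_at(locus, reads):
    """'Framework' side: collect the reads that overlap one locus."""
    return [r for r in reads if r[0] <= locus < r[1]]


def map_depth(locus, pileup):
    """'Analysis' side, map step: depth at a single locus."""
    return len(pileup)


def reduce_sum(total, depth):
    """'Analysis' side, reduce step: accumulate depth over loci."""
    return total + depth


def mean_coverage(reads, region_start, region_end):
    """Traverse a region, apply map at each locus, and reduce the results."""
    depths = (
        map_depth(pos, reads_at(pos, reads))
        for pos in range(region_start, region_end)
    )
    total = reduce(reduce_sum, depths, 0)
    return total / (region_end - region_start)
```

Only `map_depth` and `reduce_sum` are specific to the coverage question. Swapping in a different pair of functions would reuse the same traversal, which is the kind of reuse the abstract attributes to separating analysis code from data management.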
7.
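Background: The quality of the DNA template plays a crucial role in DNA sequencing.
Objective: To find an economical and simple method for sequencing genomic DNA or methylated DNA.
Methods: Plasmids were extracted using 96-tube cluster plates and 96-well plates, respectively. In addition, a pair of primers spanning the target fragment was designed against the plasmid, and the PCR product was purified after amplification. DNA sequencing templates prepared by these three methods were then sequenced.
Results and conclusion: The three methods showed no difference in sequencing performance for genomic DNA (P > 0.05). For methylated DNA, the 96-tube cluster plate method outperformed the other two methods (P < 0.05). All three methods are therefore suitable for sequencing genomic DNA, whereas the 96-tube cluster plate method is better suited to sequencing methylated DNA.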
8.
Aim
To apply massively parallel and clonal sequencing (next generation sequencing or NGS) to the analysis of forensic mixed samples.
Methods
A duplex polymerase chain reaction (PCR) assay targeting the mitochondrial DNA (mtDNA) hypervariable regions I/II (HVI/HVII) was developed for NGS analysis on the Roche 454 GS Junior instrument. Eight sets of multiplex identifier-tagged 454 fusion primers were used in a combinatorial approach for amplification and deep sequencing of up to 64 samples in parallel.
Results
This assay was shown to be highly sensitive for sequencing limited DNA amounts (~100 mtDNA copies) and analyzing contrived and biological mixtures with low level variants (~1%) as well as “complex” mixtures (≥3 contributors). PCR artifact “hybrid” sequences generated by jumping PCR or template switching were observed at a low level (<2%) in the analysis of mixed samples but could be eliminated by reducing the PCR cycle number.
Conclusion
This study demonstrates the power of NGS technologies targeting the mtDNA HVI/HVII regions for analysis of challenging forensic samples, such as mixtures and specimens with limited DNA.
Limited and mixed DNA samples are often encountered in forensic cases and pose both technical and interpretation challenges. The highly polymorphic hypervariable regions I/II (HVI/II) of the mitochondrial genome are often successfully used to analyze limited and/or degraded DNA samples (1). However, there are some limitations to the current standard approaches used for mitochondrial DNA (mtDNA) sequence analysis when mixtures are encountered. In a five year retrospective study of mtDNA analysis of 691 casework hair samples, a mixture of mtDNA sequences attributed to a secondary source was observed in 8.7% of the hairs and sequence heteroplasmy was observed in 11.7% of the cases (2). While approaches that use capillary electrophoresis technologies for Sanger sequencing of mtDNA polymorphic regions allow for detection of mixtures, they do not allow for resolving individual sequences in a mixture (3-10). Mitochondrial DNA markers are ideal targets for detecting mixtures since, with few exceptions, a single sequence per contributor is the expected result due to its haploid nature. However, unlike short tandem repeats (STRs), peak areas or heights in sequence electropherograms are not necessarily indicative of the amount of DNA contributed to a mixture (9,11,12). As a result, peak height ratios for two bases cannot be used to determine the relative proportions of components of a mixture for mtDNA Sanger sequencing analysis. For this reason, Sanger sequencing does not allow for determining the individual mtDNA sequence haplotypes of mixed samples. Therefore, when mixed base calls are encountered during mtDNA Sanger sequence analysis of forensic specimens, most forensic laboratories choose not to interpret the result and categorize mtDNA mixture results as inconclusive for reporting purposes (13). Furthermore, Sanger sequencing cannot detect minor components present at less than 10% in a DNA mixture (9,12).
The 454 genome sequencing technology is a scalable, clonal, and highly parallel pyrosequencing system that can be used for de novo sequencing of small whole genomes or direct sequencing of DNA products generated by polymerase chain reaction (PCR). The technology uses emulsion PCR (emPCR) to amplify a single DNA sequence to 10 million identical copies. The “clonal sequencing” aspect of the technology enables separation of individual components of a mixture as well as analysis of highly degraded DNA. The clonal sequencing approach used with the 454 GS technology and other next-generation sequencing (NGS) technologies provides a digital readout of the number of reads or individual sequences allowing for a quantitative determination of the components in a mixture (14). Recently, the potential value of using NGS technologies for forensic applications has been demonstrated (15-18). This article aims to describe a highly sensitive NGS method that uses PCR for targeted enrichment of the HVI/HVII regions of mtDNA for resolving simple and complex mixtures as well as detecting low levels of heteroplasmy.
9.
Emerging technologies in DNA sequencing (total citations: 12; self-citations: 1; citations by others: 11)
Metzker ML 《Genome research》2005,15(12):1767-1776
Demand for DNA sequence information has never been greater, yet current Sanger technology is too costly, time consuming, and labor intensive to meet this ongoing demand. Applications span numerous research interests, including sequence variation studies, comparative genomics and evolution, forensics, and diagnostic and applied therapeutics. Several emerging technologies show promise of delivering next-generation solutions for fast and affordable genome sequencing. In this review article, the DNA polymerase-dependent strategies of Sanger sequencing, single nucleotide addition, and cyclic reversible termination are discussed to highlight recent advances and potential challenges these technologies face in their development for ultrafast DNA sequencing.
10.
11.
12.
We have developed high-throughput DNA sequencing methods that generate high quality data from reactions as small as 400 nL, providing an approximate order of magnitude reduction in reagent use relative to standard protocols. Sequencing of clones from plasmid, fosmid, and BAC libraries yielded read lengths (PHRED20 bases) of 765 ± 172 (n = 10,272), 621 ± 201 (n = 1824), and 647 ± 189 (n = 568), respectively. Implementation of these procedures at high-throughput genome centers could have a substantial impact on the amount of data that can be generated per unit cost.
13.
Contamination by present-day human and microbial DNA is one of the major hindrances for large-scale genomic studies using ancient biological material. We describe a new molecular method, U selection, which exploits one of the most distinctive features of ancient DNA, the presence of deoxyuracils, for selective enrichment of endogenous DNA against a complex background of contamination during DNA library preparation. By applying the method to Neanderthal DNA extracts that are heavily contaminated with present-day human DNA, we show that the fraction of useful sequence information increases ∼10-fold and that the resulting sequences are more efficiently depleted of human contamination than when using purely computational approaches. Furthermore, we show that U selection can lead to a four- to fivefold increase in the proportion of endogenous DNA sequences relative to those of microbial contaminants in some samples. U selection may thus help to lower the costs for ancient genome sequencing of nonhuman samples also.
High-throughput DNA sequencing and the shift to library-based sample preparation techniques have greatly facilitated genetic research on human evolution in recent years. Increasing amounts of sequence data are becoming available not only from present-day humans but also from ancient human remains, helping to uncover the evolutionary histories of present-day human populations as well as their relationship to extinct archaic groups (Stoneking and Krause 2011). In ancient DNA studies, the most accessible target is mitochondrial (mt) DNA, which is present in several hundreds of copies per cell as opposed to the diploid-only nuclear genome. Consequently, complete mtDNA genomes have been sequenced from more than a dozen remains of ancient modern humans (Ermini et al. 2008; Gilbert et al. 2008; Green et al. 2010; Fu et al. 2013b; Raghavan et al. 2014) as well as representatives of two archaic hominin groups that went extinct during the Late Pleistocene, the Neanderthals (Green et al. 2008; Briggs et al. 2009; Prüfer et al. 2014), and Denisovans (Krause et al. 2010b; Reich et al. 2010). The recent recovery of mitochondrial sequences from an ∼400,000 yr-old hominin from Sima de los Huesos in Spain indicates that more comprehensive sequence data may soon become available even from Middle Pleistocene remains (Meyer et al. 2014). Retrieving nuclear sequences from ancient human material is generally more challenging, but such data have also been generated at various scales, ranging from a few megabases of sequences to full genome sequences determined with high accuracy (see Shapiro and Hofreiter 2014 for a recent summary).
Despite these advances, sequencing ancient human DNA continues to be challenging for several reasons. First, only trace amounts of highly fragmented DNA are usually preserved in ancient bones and teeth, imposing limits on the number of sequences that can be recovered from ancient specimens. Second, DNA extracted from ancient material is in many cases dominated by microbial DNA, which often contributes to 99% or more of the sequences, making direct shotgun sequencing economically infeasible. This problem can be partially overcome by hybridization enrichment of hominin sequences, such as those of a small chromosome (Fu et al. 2013a) or, as recently proposed, of sequences from throughout the whole genome (Carpenter et al. 2013). Another approach is restriction digestion of GC-rich sequence motifs, which was performed to change the ratio between endogenous and microbial library molecules in the first study of the Neanderthal genome (Green et al. 2010). A third and particularly severe problem for working with ancient human samples is present-day human contamination, which is inevitably introduced during excavation and laboratory work. Fortunately, a solid framework for validating the authenticity of ancient human sequences can be established using the distinct pattern of substitutions caused by cytosine deamination in ancient DNA sequences (Hofreiter et al. 2001; Briggs et al. 2007). The deamination product of cytosine is uracil, which is read as thymine by most DNA polymerases. Resulting C to T substitutions (or G to A substitutions, depending on the orientation in which DNA strands are sequenced and the method used to prepare DNA libraries) are particularly frequent at the 5′ and 3′ ends of sequences due to the higher rate of cytosine deamination in single-stranded overhangs (Lindahl 1993). Importantly, the frequency of deamination-induced substitutions correlates with sample age (Sawyer et al. 2012) and is low in present-day human contamination (Krause et al. 2010a). These substitutions can thus be taken as evidence for the presence of authentic ancient human sequences. Deamination-induced substitutions have also been exploited for separating ancient sequences from present-day human contamination in silico (Skoglund et al. 2012, 2014a,b; Meyer et al. 2014; Raghavan et al. 2014). Although effective in principle, this approach is costly, because a large proportion of sequence data is excluded from downstream analysis. Furthermore, ancient DNA base damage is not determined directly and may occasionally be confounded with evolutionary sequence differences.
Here we describe a novel laboratory technique, uracil selection (“U selection”), which enables physical separation of uracil-containing DNA strands from nondeaminated strands at the stage of DNA library preparation. Our method builds on a single-stranded library preparation method, which has been shown to be particularly efficient for retrieving sequences from highly degraded ancient DNA (Meyer et al. 2012; Gansauge and Meyer 2013). We apply U selection to several Neanderthal DNA extracts and show that it is a powerful tool for enriching Neanderthal DNA sequences against a background of present-day human contamination. We also report cases where U selection drastically increases the proportion of Neanderthal DNA relative to microbial DNA.
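The deamination signal discussed in this entry, C to T substitutions concentrated near sequence ends, is commonly summarized as a per-position substitution frequency. The sketch below illustrates that kind of in silico summary. It is not the U selection laboratory method and not any published tool. It assumes ungapped, equal-orientation `(reference, read)` string pairs aligned from the 5′ end, and the name `c_to_t_by_position` and the `window` parameter are hypothetical.

```python
def c_to_t_by_position(pairs, window=5):
    """Fraction of reference C sites read as T, per position from the 5' end.

    `pairs` is an iterable of (reference, read) strings aligned without gaps.
    Positions with no reference C in any pair yield None.
    """
    c_sites = [0] * window   # reference C count at each position
    c_to_t = [0] * window    # how many of those were read as T
    for ref, read in pairs:
        for i in range(min(window, len(ref), len(read))):
            if ref[i] == "C":
                c_sites[i] += 1
                if read[i] == "T":
                    c_to_t[i] += 1
    return [t / n if n else None for t, n in zip(c_to_t, c_sites)]
```

A frequency that is elevated at the first positions and falls toward the interior would be consistent with the damage pattern the text describes; a flat, low profile would be what the text leads one to expect from present-day contamination.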
14.
15.
Djoulah S Le Monnier de Gouville I Martuchou-Dehay C Busson M Charron D Hors J 《Pathologie-biologie》1999,47(8):812-818
The HLA-C locus was sequenced in 106 normal unrelated members of the French CEPH families. Following generic PCR amplification, exons 2 and 3 were amplified separately then sequenced using the ALF Expres sequencer. The Sequi Typer program was used for data analysis. Of the 72 alleles identified to date, 20 were recognized in the panel studied. Results were compared to those provided by the lymphocytotoxicity test, which had a 13.5% error rate and failed to reach the level of specific recognition. Sequencing preceded by amplification allowed immediate unambiguous allele assignment in 96% of cases. In four cases, a complementary method was required to resolve ambiguities. Reproducibility was high. The sequencing strategy described herein is a significant advance and may be particularly valuable for achieving perfect donor/recipient matching for allogeneic stem cell transplants.
16.
Pyrosequencing sheds light on DNA sequencing (total citations: 23; self-citations: 0; citations by others: 23)
Ronaghi M 《Genome research》2001,11(1):3-11
DNA sequencing is one of the most important platforms for the study of biological systems today. Sequence determination is most commonly performed using dideoxy chain termination technology. Recently, pyrosequencing has emerged as a new sequencing methodology. This technique is a widely applicable, alternative technology for the detailed characterization of nucleic acids. Pyrosequencing has the potential advantages of accuracy, flexibility, and parallel processing, and it can be easily automated. Furthermore, the technique dispenses with the need for labeled primers, labeled nucleotides, and gel-electrophoresis. This article considers key features regarding different aspects of pyrosequencing technology, including the general principles, enzyme properties, sequencing modes, instrumentation, and potential applications.
17.
18.
Improvements in technology have reduced the cost of DNA sequencing to the point that the limiting factor for many experiments is the time and reagent cost of sample preparation. We present an approach in which 192 sequencing libraries can be produced in a single day of technician time at a cost of about $15 per sample. These libraries are effective not only for low-pass whole-genome sequencing, but also for simultaneously enriching them in pools of approximately 100 individually barcoded samples for a subset of the genome without substantial loss in efficiency of target capture. We illustrate the power and effectiveness of this approach on about 2000 samples from a prostate cancer study.
19.
Recent advances in DNA sequencing technologies, both in the form of high lane-density gels and automated capillary systems, will lead to an increased requirement for sample preparation systems that operate at low cost and high throughput. As part of the development of a fully automated sequencing system, we have developed an automated subsystem capable of producing 10,000 sequence-ready ssDNA templates per day from libraries of M13 plaques at a cost of $0.29 per sample. This Front End has been in high throughput operation since June, 1997 and has produced > 400,000 high-quality DNA templates.
20.
《Human immunology》2015,76(12):923-927
This communication describes our experience in large-scale G group-level high resolution HLA typing using three different DNA sequencing platforms: ABI 3730 xl, Illumina MiSeq and PacBio RS II. Recent advances in DNA sequencing technologies, so-called next generation sequencing (NGS), have brought breakthroughs in deciphering the genetic information in all living species at a large scale and at an affordable level. The NGS DNA indexing system allows sequencing multiple genes for a large number of individuals in a single run. Our laboratory has adopted and used these technologies for HLA molecular testing services. We found that each sequencing technology has its own strengths and weaknesses, and their sequencing performances complement each other. HLA genes are highly complex and genotyping them is quite challenging. Using these three sequencing platforms, we were able to meet all requirements for G group-level high resolution and high volume HLA typing.