Similar articles
 20 similar articles found (search time: 15 ms)
1.
Krogh A. Genome Research 2000, 10(4): 523-528
The application of the gene finder HMMGene to the Adh region of Drosophila melanogaster is described, and the prediction results are analyzed. HMMGene is based on a probabilistic model called a hidden Markov model, and the probabilistic framework facilitates the inclusion of database matches of varying degrees of certainty. It is shown that database matches clearly improve the performance of the gene finder. For instance, the sensitivity for coding exons predicted with both ends correct grows from 62% to 70% on a high-quality test set when matches to proteins, cDNAs, repeats, and transposons are included. When ESTs are used, the specificity drops more than the sensitivity increases. This is due to the high noise level in EST matches; the reasons for this noise and possible ways to reduce it are discussed in more detail.

2.
3.
Recent studies show that along with single nucleotide polymorphisms and small indels, larger structural variants among human individuals are common. The Human Genome Structural Variation Project aims to identify and classify deletions, insertions, and inversions (>5 Kbp) in a small number of normal individuals with a fosmid-based paired-end sequencing approach using traditional sequencing technologies. The realization of new ultra-high-throughput sequencing platforms now makes it feasible to detect the full spectrum of genomic variation among many individual genomes, including cancer patients and others suffering from diseases of genomic origin. Unfortunately, existing algorithms for identifying structural variation (SV) among individuals have not been designed to handle the short read lengths and the errors implied by the “next-gen” sequencing (NGS) technologies. In this paper, we give combinatorial formulations for the SV detection between a reference genome sequence and a next-gen-based, paired-end, whole genome shotgun-sequenced individual. We describe efficient algorithms for each of the formulations we give, which all turn out to be fast and quite reliable; they are also applicable to all next-gen sequencing methods (Illumina, 454 Life Sciences [Roche], ABI SOLiD, etc.) and traditional capillary sequencing technology. We apply our algorithms to identify SV among individual genomes very recently sequenced by Illumina technology.
Recent introduction of the next-generation sequencing technologies has significantly changed how genomics research is conducted (Mardis 2008). High-throughput, low-cost sequencing technologies such as pyrosequencing (454 Life Sciences [Roche]), sequencing-by-synthesis (Illumina and Helicos), and sequencing-by-ligation (ABI SOLiD) methods produce shorter reads than the traditional capillary sequencing, but they also increase the redundancy by 10- to 100-fold or more (Shendure et al. 2004; Mardis 2008).
With the arrival of these new sequencing technologies, along with the capability of sequencing paired-ends (or “matepairs”) of a clone insert that follows a tight length distribution (Raphael et al. 2003; Volik et al. 2003; Dew et al. 2005; Tuzun et al. 2005; Korbel et al. 2007; Bashir et al. 2008; Kidd et al. 2008; Lee et al. 2008), it is becoming feasible to perform detailed and comprehensive genome variation and rearrangement studies.
The genetic variation among human individuals has been traditionally analyzed at the single nucleotide polymorphism (SNP) level as demonstrated by the HapMap Project (International HapMap Consortium 2003, 2005), where the genomes of 270 individuals were systematically genotyped for 3.1 million SNPs. However, human genetic variation extends beyond SNPs. The Human Genome Structural Variation Project (Eichler et al. 2007) has been initiated to identify and catalog structural variation (SV). In the broadest sense, SV can be defined as the genomic changes among individuals that are not single nucleotide variants (Tuzun et al. 2005; Eichler et al. 2007). These include insertions, deletions, duplications, inversions, and translocations (Feuk et al. 2006; Sharp et al. 2006) (see Supplemental material for details on types of SV).
End-sequence profiling (ESP) was first presented by Volik et al. (2003) and Raphael et al. (2003) to discover SV events using bacterial artificial chromosome (BAC) end sequences to map structural rearrangements in cancer cell line genomes, and it was used by Tuzun et al. (2005) to systematically discover structural variants in the genome of a human individual. Several other genome-wide studies (Iafrate et al. 2004; Sebat et al. 2004; Redon et al. 2006; Cooper et al. 2007; Korbel et al. 2007) demonstrated that SV among normal individuals is common and ubiquitous. More recently, Kidd et al. (2008) detected, experimentally validated, and sequenced SV from eight different individuals.
The ESP method was also utilized by Dew et al. (2005) to evaluate and compare assemblies and detect assembly breakpoints.
As the promise of these next-generation sequencing (NGS) technologies became reality with the publication of the first three human genomes sequenced with NGS platforms (Bentley et al. 2008; Wang et al. 2008; Wheeler et al. 2008) and the sequencing of more than 1000 individuals (http://www.1000genomes.org), computational methods for analyzing and managing the massive numbers of the short-read pairs produced by these platforms are urgently needed to effectively detect SNPs, SVs, and copy-number variants (Pop and Salzberg 2008). Since most SV events are found in the duplicated regions (Eichler et al. 2007; Kidd et al. 2008), the algorithms must also be able to discover variation in the repetitive regions of the human genome.
Detection of SVs in the human genome using NGS technologies was first presented by Korbel et al. (2007). In this study, paired-end sequences generated with the 454 Life Sciences (Roche) platform were employed to detect SVs in two human individuals; however, the same algorithms and heuristics designed for capillary-based sequencing presented by Tuzun et al. (2005) were used, and no further optimizations for NGS were introduced. Campbell et al. (2008) employed Illumina sequencing to discover genome rearrangements in cancer cell lines; however, they considered one “best” paired map location per insert, by the use of the alignment tool MAQ (Li et al. 2008), and thus did not utilize the full information produced by high-throughput sequencing methods. In the first study on the genome sequenced with a NGS platform (Illumina) that produced paired-end sequences, Bentley et al. (2008) also detected SVs using the same methods and unique map locations of the sequenced reads.
More recently, Lee et al. (2008) presented a probabilistic method for detecting SV.
In this work, a scoring function for each SV was defined as a weighted sum of (1) sequence similarity, (2) length of SV, and (3) the square of the number of paired-end reads supporting the SV. The scoring function was computed via a hill-climbing strategy to assign paired-end reads to SVs. In theory, the method of Lee et al. (2008) can be applied to data generated by new sequencing technologies; however, the experiments presented in this work were based on capillary sequencing (Levy et al. 2007). In another study, Bashir et al. (2008) presented a computational framework to evaluate the use of paired-end sequences to detect genome rearrangements and fusion genes in cancer; note that no NGS data were utilized in this study due to lack of availability of sequences at the time of publication.
In this paper, we present novel combinatorial algorithms for SV detection using the paired-end, NGS methods. In comparison to “naïve” methods for SV detection, our algorithms evaluate all potential mapping locations of each paired-end read and decide on the final mapping and the SVs they imply interdependently. We define two alternative formulations for the problem of computationally predicting the SV between a reference genome sequence (i.e., human genome assembly) and a set of paired-end reads from a whole genome shotgun (WGS) sequence library obtained via an NGS method from an individual genome sequence. The first formulation, which aims to obtain the most parsimonious mapping of paired-end reads to the potential structural variants, is called the Maximum Parsimony Structural Variation Problem (MPSV). The MPSV problem turns out to be NP-hard; we give a simple O(log n) approximation algorithm to solve this problem in polynomial time. This algorithm is based on the classical approximation algorithm to solve the “Set-Cover” problem from the combinatorial algorithms literature and thus is called the VariationHunter-Set Cover method (abbreviated VariationHunter-SC).
The second formulation aims to calculate the probability of each SV. For this variant we give expressions for (1) the probability of each possible SV conditioned on other SVs and the paired-end reads that “support them,” and (2) the probability of mapping each paired-end read to a particular location, conditioned on the set of SVs that are “realized.” We show how to obtain a consistent set of solutions to these expressions iteratively. The resulting algorithm is called VariationHunter-Probabilistic (VariationHunter-Pr). We test our algorithms (VariationHunter-SC and VariationHunter-Pr) on a paired-end WGS library generated with Illumina technology and compare the results with the validated SV set from the genome of the same individual, obtained via fosmid-based capillary end-sequencing (Kidd et al. 2008). We compare our results with the SV calls reported earlier on the same data set (Bentley et al. 2008), which was based on mapping each paired-end read to a single location (with the minimum number of mismatches) and clustering the mappings greedily to obtain the SVs.
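The set-cover idea behind VariationHunter-SC can be illustrated with the classical greedy approximation for Set-Cover. The sketch below is not the authors' implementation. It assumes a hypothetical input in which each candidate SV cluster is mapped to the set of paired-end read IDs that could support it, and it repeatedly picks the cluster that explains the most reads not yet assigned. The function name `greedy_sv_cover`, the cluster names, and the read IDs are all illustrative assumptions.

```python
def greedy_sv_cover(clusters):
    """Greedy set cover: pick few clusters that together explain all reads.

    clusters: dict mapping a (hypothetical) SV cluster name to the set of
    paired-end read IDs that could support it.
    Returns a list of (cluster, newly_assigned_reads) in pick order.
    """
    uncovered = set().union(*clusters.values())
    chosen = []
    while uncovered:
        # Choose the cluster that explains the most still-unassigned reads;
        # iterating over sorted names makes tie-breaking deterministic.
        best = max(sorted(clusters), key=lambda c: len(clusters[c] & uncovered))
        gained = clusters[best] & uncovered
        chosen.append((best, gained))
        uncovered -= gained
    return chosen


# Made-up example: four candidate SV clusters and five reads.
example = {
    "del_A": {"r1", "r2", "r3"},
    "del_B": {"r3", "r4"},
    "inv_C": {"r4", "r5"},
    "del_D": {"r1", "r5"},
}
picked = greedy_sv_cover(example)
print([name for name, _ in picked])  # ['del_A', 'inv_C']
```

In this toy input two clusters suffice to explain all five reads, which is the parsimony the MPSV formulation asks for; the greedy rule only guarantees a logarithmic approximation in general.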

4.
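Objective: To establish a molecular assay for detecting female gametocytes of Plasmodium. Methods: Specific capture probes and ligation probes were designed against a female-gametocyte-specific mRNA target (the target sequence under test), namely the transcript of the ookinete surface protein gene (s25). mRNAs released by lysing blood samples were captured onto the surface of a 96-well plate by "sandwich" hybridization, with no nucleic acid extraction required. After unbound probes were washed away, the ligation probes bound to the mRNA target were ligated, yielding a single-stranded amplification template flanked by specially designed sequences. The template was then amplified by dye-based qPCR with universal primers or, with a TaqMan probe sequence designed into the end, by probe-based qPCR with universal primers and a universal TaqMan probe. The sensitivity, specificity, and reproducibility of this capture-and-ligation-based amplification method (CLIP-PCR) were evaluated and compared with conventional RT-qPCR, and the method was applied to clinical samples. Results: CLIP-PCR showed high sensitivity and specificity. Like conventional RT-qPCR, it detected as few as 11 copies of the s25 mRNA target, and it was simpler to perform. CLIP-PCR accurately detected female gametocytes in the blood of malaria patients and reduced the time needed to test 96 samples to 3 h. Conclusion: Sensitive and efficient dye-based and universal-TaqMan-probe-based CLIP-PCR assays were established for detecting Plasmodium female gametocytes, laying the groundwork for malaria transmission control and large-scale gametocyte screening...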

5.
Quality scores and SNP detection in sequencing-by-synthesis systems   Cited by: 3 (self-citations: 1; citations by others: 2)
Promising new sequencing technologies, based on sequencing-by-synthesis (SBS), are starting to deliver large amounts of DNA sequence at very low cost. Polymorphism detection is a key application. We describe general methods for improved quality scores and accurate automated polymorphism detection, and apply them to data from the Roche (454) Genome Sequencer 20. We assess our methods using known-truth data sets, which is critical to the validity of the assessments. We developed informative, base-by-base error predictors for this sequencer and used a variant of the phred binning algorithm to combine them into a single empirically derived quality score. These quality scores are more useful than those produced by the system software: They both better predict actual error rates and identify many more high-quality bases. We developed a SNP detection method, with variants for low coverage, high coverage, and PCR amplicon applications, and evaluated it on known-truth data sets. We demonstrate good specificity in single reads, and excellent specificity (no false positives in 215 kb of genome) in high-coverage data.
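An empirically derived quality score of the kind mentioned above rests on the standard phred relation Q = -10 log10(p), where p is the probability that a base call is wrong. The sketch below shows how a score could be assigned to a bin of bases from known-truth error counts. It is not the authors' binning procedure: the bin names, the counts, and the add-one smoothing are assumptions made only for illustration.

```python
import math


def phred_quality(error_prob):
    """Standard phred scale: Q = -10 * log10(p_error)."""
    return -10.0 * math.log10(error_prob)


def empirical_bin_quality(errors, total):
    """Quality for one bin of bases from known-truth counts.

    Adds one pseudo-error and one pseudo-base (an assumption of this
    sketch) so that a bin with zero observed errors gets a finite score.
    """
    return phred_quality((errors + 1) / (total + 1))


# Hypothetical bins: predictor bin -> (observed errors, bases in bin).
bins = {"low": (99, 999), "medium": (9, 9999), "high": (0, 99999)}
for name, (errors, total) in bins.items():
    print(name, round(empirical_bin_quality(errors, total)))
```

With these made-up counts the three bins receive scores of 10, 30, and 50, corresponding to error rates of 1 in 10, 1 in 1,000, and 1 in 100,000.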

6.
We evaluated a two-step algorithm for detection of Clostridium difficile in 1,468 stool specimens. First, specimens were screened by an immunoassay for C. difficile glutamate dehydrogenase antigen (C.DIFF CHEK-60). Second, screen-positive specimens underwent toxin testing by a rapid toxin A/B assay (TOX A/B QUIK CHEK); toxin-negative specimens were subjected to stool culture. This algorithm allowed final results for 92% of specimens with a turnaround time of 4 h.
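The two-step workflow described above can be written as a small decision function. This is a sketch of the logic as the abstract states it, with one assumption labeled in the code: that a negative screen is reported as a final negative. The function name and the result strings are hypothetical.

```python
def two_step_result(gdh_positive, toxin_positive=None):
    """Return the reporting decision for one stool specimen.

    gdh_positive: result of the glutamate dehydrogenase antigen screen.
    toxin_positive: result of the toxin A/B assay; only needed when the
    screen is positive.
    """
    if not gdh_positive:
        # Assumption of this sketch: a negative screen is reported as final.
        return "negative (final)"
    if toxin_positive is None:
        raise ValueError("screen-positive specimens need a toxin A/B result")
    if toxin_positive:
        return "positive (final)"
    # Screen-positive but toxin-negative: resolved by stool culture.
    return "pending: refer to stool culture"


print(two_step_result(False))
print(two_step_result(True, True))
print(two_step_result(True, False))
```

Only the last branch needs culture, which is consistent with most specimens receiving a final result within the short turnaround time reported.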

7.
We propose a bioinformatics pipeline in which we use an EST database to predict and validate single-nucleotide polymorphisms (SNPs) directly linked to gene-coding regions of the HLA class I genes (HLA-A, HLA-B and HLA-C). Annotation derived from our analysis revealed various classes of possible new variations that may indicate new alleles. Thus, bioinformatics pipelines appear to be useful for screening for novel genetic variation in the HLA panel, and further analysis will advance this aim by speeding up the analysis of the massive amounts of data currently generated in large-scale high-throughput experiments.

8.
9.
An automated technique is described which is capable of detecting sickle-cell haemoglobin and differentiating the sickle-cell trait from sickle-cell anaemia. The method is based upon the Itano solubility test and utilizes Technicon equipment.

10.
Goldrick MM. Human Mutation 2001, 18(3): 190-204
Mutation detection based on ribonuclease cleavage of basepair mismatches in single-stranded RNA probes hybridized to DNA targets was first described over 15 years ago. The original methods relied on RNase A for mismatch cleavage; however, this enzyme fails to cleave many mismatches and has other drawbacks. More recently, a new method for RNase-cleavage-based mutation scanning has been developed, which takes advantage of the ability of RNase 1 and RNase T1 to cleave mismatches in duplex RNA targets, when these enzymes are used in conjunction with nucleic acid intercalating dyes. The method, called NIRCA, is relatively low-cost in terms of materials and equipment required. It is being used to detect mutations and SNPs in a wide variety of genes involved in human genetic disease and cancer, as well as in disease-related viral and bacterial genes. This review describes historical and recently developed RNase cleavage-based methods for mutation/SNP scanning.

11.
A rapid, sensitive and automated assay procedure was developed for the in vitro evaluation of anti-HIV agents. An HTLV-I transformed T4-cell line, MT-4, which was previously shown by Koyanagi et al. (1985) to be highly susceptible to, and permissive for, HIV infection, served as the target cell line. Inhibition of the HIV-induced cytopathic effect was used as the end point. The viability of both HIV- and mock-infected cells was assessed spectrophotometrically via the in situ reduction of 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide (MTT). The procedure was optimized to make the best use of multichannel pipettes, microprocessor-controlled dispensing and optical density reading. The absorbance ratio of the mock-infected control to the HIV-infected samples was about 20. This allowed an accurate determination of the 50% effective doses, as demonstrated for 3'-azido-2',3'-dideoxythymidine (AZT), 2',3'-dideoxycytidine (ddCyd), dextran sulfate and heparin. The technique significantly reduced labor time compared with the trypan blue exclusion method, and it permits the evaluation of large numbers of compounds for their anti-HIV activity.

12.
13.
Mathematical ability and disability are as heritable as other cognitive abilities and disabilities; however, their genetic etiology has received relatively little attention. In our recent genome-wide association study of mathematical ability in 10-year-old children, 10 SNP associations were nominated from scans of pooled DNA and validated in an individually genotyped sample. In this paper, we use a ‘SNP set’ composite of these 10 SNPs to investigate gene-environment (GE) interaction, examining whether the association between the 10-SNP set and mathematical ability differs as a function of ten environmental measures in the home and school in a sample of 1888 children with complete data. We found two significant GE interactions for environmental measures in the home and the school, both in the direction of the diathesis-stress type of GE interaction: The 10-SNP set was more strongly associated with mathematical ability in chaotic homes and when parents are negative.

14.
15.
This study describes a new tool for accurate and reliable high-throughput detection of copy number variation in the human genome. We have constructed a large-insert clone DNA microarray covering the entire human genome in tiling path resolution that we have used to identify copy number variation in human populations. Crucial to this study has been the development of a robust array platform and analytic process for the automated identification of copy number variants (CNVs). The array consists of 26,574 clones covering 93.7% of euchromatic regions. Clones were selected primarily from the published "Golden Path," and mapping was confirmed by fingerprinting and BAC-end sequencing. Array performance was extensively tested by a series of validation assays. These included determining the hybridization characteristics of each individual clone on the array by chromosome-specific add-in experiments. Estimation of data reproducibility and false-positive/negative rates was carried out using self-self hybridizations, replicate experiments, and independent validations of CNVs. Based on these studies, we developed a variance-based automatic copy number detection analysis process (CNVfinder) and have demonstrated its robustness by comparison with the SW-ARRAY method.

16.
Kao EF, Lee C, Hsu JS, Jaw TS, Liu GC. Medical Physics 2006, 33(1): 118-123
Abnormalities in chest images often present as abnormal opacity or abnormal asymmetry. We have developed a novel method for automated detection of abnormalities in chest radiographs by use of these features. Our method is based on an analysis of the projection profile obtained by projecting the pixel data of a frontal chest image onto the mediolateral axis. Two indices, lung opacity index and lung symmetry index, are computed from the projection profile. Lung opacity index and lung symmetry index are then combined to detect gross abnormalities in chest radiographs. The values of lung opacity index are found to be 0.38 +/- 0.05 and 0.37 +/- 0.06 for normal right and left lung, respectively. The values of lung symmetry index are found to be 0.018 +/- 0.014 for normal chest images. The discrimination for the combination of the two indices is evaluated by linear discriminant analysis and receiver operating characteristic (ROC) analysis. Area Az under the ROC curve with the combination of the two indices in the classification of normal and abnormal chest images is 0.963.

17.
Neisseria gonorrhoeae is the most common sexually transmitted disease-causing bacterium worldwide. An in-house PCR assay targeting the carbamoyl-phosphate synthase subunit A (carA) gene was developed for the specific detection of N. gonorrhoeae in clinical specimens. Samples from 605 patients were cultured on selective medium and assayed by PCR in a double-blind fashion. Of 605 urethral/cervical samples analysed, 13 were PCR-positive, of which 11 were culture-positive. The PCR showed a sensitivity and specificity of 100% and 99.7% with these samples. PCR targeting the carA gene appears to be a reliable method for the detection of N. gonorrhoeae in clinical specimens.
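The reported sensitivity and specificity can be checked against the counts in the abstract by rebuilding the 2x2 table with culture as the reference. The sketch below does this arithmetic. The number of false negatives is not stated directly; it is set to zero here because that is what the reported 100% sensitivity implies, and this is flagged as an inference in the code.

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


total = 605      # samples analysed
pcr_pos = 13     # PCR-positive samples
both_pos = 11    # PCR-positive and culture-positive

tp = both_pos
fp = pcr_pos - both_pos        # PCR-positive, culture-negative
fn = 0                         # inferred from the reported 100% sensitivity
tn = total - tp - fp - fn      # remaining samples

sens, spec = sensitivity_specificity(tp, fp, fn, tn)
print(f"sensitivity {sens:.1%}, specificity {spec:.1%}")
```

Under these assumptions there are 592 true negatives and 2 false positives, giving a specificity of 592/594, which rounds to the 99.7% stated in the abstract.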

18.
We present the nature of pathogenic SNP array findings in pregnancies without ultrasound (US) abnormalities and show the additional diagnostic value of SNP array as compared with rapid aneuploidy detection and karyotyping. 1,330 prenatal samples were investigated with a 0.5-Mb SNP array after the exclusion of the most common aneuploidies. In 2.7% (36/1,330) of the cases, pathogenic chromosome aberrations were found; a microscopically detectable abnormality in 0.7% and a submicroscopic aberration in 2%. Our results show that in addition to the age- or screening-related aneuploidy risk, in pregnancies without US abnormalities, there is a risk of 1:148 (9/1,330) for a (sub)microscopic abnormality associated with an early-onset often severe disease, 1:222 (6/1,330) for a submicroscopic aberration causing an early-onset disease, 1:74 (18/1,330) for carrying a susceptibility locus for a neurodevelopmental disorder, and 1:443 (3/1,330) for a late-onset disorder (hereditary neuropathy with liability to pressure palsies in all three cases). These risk figures are important for adequate pretest counseling so that prospective parents can make informed individualized choices between targeted prenatal testing and broad testing with SNP array. Based on our results, we believe that if invasive testing is performed, SNP array should be the preferred cytogenetic technique irrespective of the indication.
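The "1:N" risk figures quoted above follow directly from the case counts. The short sketch below reproduces them by dividing the cohort size by the number of cases and rounding to the nearest integer. The dictionary keys are shorthand labels chosen for this example, not terms from the study.

```python
def one_in(cases, total):
    """Express cases/total as '1:N', rounding N to the nearest integer."""
    return f"1:{round(total / cases)}"


TOTAL = 1330  # prenatal samples investigated

# Case counts as given in the abstract; labels are illustrative shorthand.
findings = {
    "early-onset, often severe (sub)microscopic": 9,
    "early-onset, submicroscopic": 6,
    "neurodevelopmental susceptibility locus": 18,
    "late-onset disorder": 3,
}

for label, cases in findings.items():
    print(f"{label}: {one_in(cases, TOTAL)}")

all_cases = sum(findings.values())
print(f"all pathogenic findings: {all_cases}/{TOTAL} = {all_cases / TOTAL:.1%}")
```

The four categories add up to the 36 pathogenic findings reported, and 36/1,330 rounds to the stated 2.7%.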

19.
Bioluminescent indicators have found many uses in, for example, the detection of ATP and free calcium levels. However, such probes often emit only very low levels of light and it is important to optimize the efficiency of the system used to detect this light. We describe some of the problems encountered in using photomultiplier tubes for detecting low levels of light, and some ways of overcoming these problems. We have developed a versatile photomultiplier light detection system which is both efficient and physically small. This system is described and details of its fabrication are given.

20.