Similar articles
Found 20 similar articles (search time: 31 ms)
1.
Cryptosporidium parvum is a protozoan enteropathogen that infects humans and animals and causes a pronounced diarrheal disease that can be life-threatening in immunocompromised hosts. No specific chemo- or immunotherapies exist to treat cryptosporidiosis, and little molecular information is available to guide development of such therapies. To accelerate gene discovery and identify genes encoding potential drug and vaccine targets, we constructed sporozoite cDNA and genomic DNA sequencing libraries from the Iowa isolate of C. parvum and determined approximately 2000 sequence tags by single-pass sequencing of random clones. Together, the 567 expressed sequence tags (ESTs) and 1507 genome survey sequences (GSSs) totaled one megabase (1 Mb) of unique genomic sequence, indicating that approximately 10% of the 10.4 Mb C. parvum genome has been sequence tagged in this gene discovery expedition. The tags were used to search the public nucleic acid and protein databases via BLAST analyses, and 180 ESTs (32%) and 277 GSSs (18%) exhibited similarity with database sequences at smallest sum probabilities P(N) ≤ 10^-8. Some tags encoded proteins with clear therapeutic potential, including S-adenosylhomocysteine hydrolase, histone deacetylase, polyketide/fatty-acid synthases, various cyclophilins, thrombospondin-related cysteine-rich protein and ATP-binding-cassette transporters. Several anonymous ESTs encoded proteins predicted to contain signal peptides or multiple transmembrane spanning segments, suggesting they were destined for membrane-bound compartments, the cell surface or extracellular secretion. One hundred four simple sequence repeats were identified within the nonredundant sequence tag collection, with (TAA)n/(TTA)n (n ≥ 6) and (TA)n/(AT)n (n ≥ 10) being the most prevalent, occurring 40 and 15 times, respectively. Various cellular RNAs and their genes were also identified, including the small and large ribosomal RNAs, five tRNAs, the U2 small nuclear RNA, and the small and large virus-like, double-stranded RNAs. This investigation has demonstrated that survey sequencing is an efficient procedure for gene discovery and genome characterization and has identified and sequence tagged many C. parvum genes encoding potential therapeutic targets.
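
The simple sequence repeat tally described in this abstract can be illustrated with a small sketch. It is not the authors' method: the function name `find_ssrs` and the demo sequence are hypothetical, and it assumes an SSR is an uninterrupted run of a single motif repeated at least a minimum number of times.

```python
import re


def find_ssrs(seq, motif, min_repeats):
    """Return (start, repeat_count) for each uninterrupted run of `motif`
    that is repeated at least `min_repeats` times in `seq`."""
    pattern = re.compile(r"(?:%s){%d,}" % (re.escape(motif), min_repeats))
    return [
        (m.start(), len(m.group()) // len(motif))
        for m in pattern.finditer(seq.upper())
    ]


# Hypothetical demo sequence: seven TAA repeats, then ten TA repeats.
demo = "GGC" + "TAA" * 7 + "CCGG" + "TA" * 10 + "G"
taa_hits = find_ssrs(demo, "TAA", 6)   # thresholds taken from the abstract
ta_hits = find_ssrs(demo, "TA", 10)
```

Searching the reverse-complement motifs (TTA, AT) in the same way would cover both strands, which is presumably why the abstract reports the motifs in pairs.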

2.
Efficient sequencing of animal and plant genomes by next-generation technology should allow many neglected organisms of biological and medical importance to be better understood. As a test case, we have assembled a draft genome of Caenorhabditis sp. 3 PS1010 through a combination of direct sequencing and scaffolding with RNA-seq. We first sequenced genomic DNA and mixed-stage cDNA using paired 75-nt reads from an Illumina GAII. A set of 230 million genomic reads yielded an 80-Mb assembly, with a supercontig N50 of 5.0 kb, covering 90% of 429 kb from previously published genomic contigs. Mixed-stage poly(A)(+) cDNA gave 47.3 million mappable 75-mers (including 5.1 million spliced reads), which separately assembled into 17.8 Mb of cDNA, with an N50 of 1.06 kb. By further scaffolding our genomic supercontigs with cDNA, we increased their N50 to 9.4 kb, nearly double the average gene size in C. elegans. We predicted 22,851 protein-coding genes, and detected expression in 78% of them. Multigenome alignment and data filtering identified 2672 DNA elements conserved between PS1010 and C. elegans that are likely to encode regulatory sequences or previously unknown ncRNAs. Genomic and cDNA sequencing followed by joint assembly is a rapid and useful strategy for biological analysis.
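
The N50 values quoted in this abstract summarize assembly contiguity. As a minimal sketch under the usual definition (the length of the contig at which the running total of lengths, sorted longest first, reaches half of the assembly), and with a hypothetical function name and made-up lengths:

```python
def n50(lengths):
    """Return the N50 of a collection of contig or supercontig lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contig lengths given")


# Made-up lengths (arbitrary units), total 54: 10 + 9 + 8 reaches half (27).
demo_n50 = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])
```

Scaffolding raises N50 because it merges several shorter supercontigs into fewer longer ones without changing the total length much.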

3.
Brucella abortus is the etiological agent of brucellosis, a disease that affects cattle and humans. We generated random DNA sequences from the genome of B. abortus strain 2308 in order to characterize molecular targets that might be useful for developing immunological or chemotherapeutic strategies against this pathogen. The partial sequencing of 1,899 clones allowed the identification of 1,199 genome survey sequences (GSSs) with high homology (BLAST expect value < 10^-5) to sequences deposited in the GenBank databases. Among them, 925 represent putative novel genes for the Brucella genus. Out of 925 nonredundant GSSs, 470 were classified in 15 categories based on cellular function. Seven hundred GSSs showed no significant database matches and remain available for further studies in order to identify their function. A high number of GSSs with homology to Agrobacterium tumefaciens and Rhizobium meliloti proteins were observed, thus confirming their close phylogenetic relationship. Among them, several GSSs showed high similarity with genes related to nodule nitrogen fixation, synthesis of nod factors, nodulation protein symbiotic plasmid, and nodule bacteroid differentiation. We have also identified several B. abortus homologs of virulence and pathogenesis genes from other pathogens, including a homolog to both the ShdA gene from Salmonella enterica serovar Typhimurium and the AidA-1 gene from Escherichia coli. Other GSSs displayed significant homologies to genes encoding components of the type III and type IV secretion machineries, suggesting that Brucella might also have an active type III secretion machinery.

4.
Cot-based sequence discovery represents a powerful means by which both low-copy and repetitive sequences can be selectively and efficiently fractionated, cloned, and characterized. Based upon the results of a Cot analysis, hydroxyapatite chromatography was used to fractionate sorghum (Sorghum bicolor) genomic DNA into highly repetitive (HR), moderately repetitive (MR), and single/low-copy (SL) sequence components that were consequently cloned to produce HRCot, MRCot, and SLCot genomic libraries. Filter hybridization (blotting) and sequence analysis both show that the HRCot library is enriched in sequences traditionally found in high-copy number (e.g., retroelements, rDNA, centromeric repeats), the SLCot library is enriched in low-copy sequences (e.g., genes and "nonrepetitive ESTs"), and the MRCot library contains sequences of moderate redundancy. The Cot analysis suggests that the sorghum genome is approximately 700 Mb (in agreement with previous estimates) and that HR, MR, and SL components comprise 15%, 41%, and 24% of sorghum DNA, respectively. Unlike previously described techniques to sequence the low-copy components of genomes, sequencing of Cot components is independent of expression and methylation patterns that vary widely among DNA elements, developmental stages, and taxa. High-throughput sequencing of Cot clones may be a means of "capturing" the sequence complexity of eukaryotic genomes at unprecedented efficiency.

5.
Repetitive DNA sequences of the Leishmania donovani genome have been identified by screening a recombinant DNA library made by cloning sheared genomic DNA into the vector pAT153. Bacterial clones containing a highly repetitive DNA sequence have been isolated. DNA sequencing has shown that this sequence is composed of tandem repeats of the sequence 5'-CCCTAA-3'. This sequence is identical to the telomeric repeats found in Trypanosoma brucei and hybridizes to all Leishmania chromosomes. In this study we show that there is considerable heterogeneity in the distribution and copy number of this repeat and associated hybridising sequences throughout the genomes of different Leishmania species.

6.
Segmental duplications play fundamental roles in both genomic disease and gene evolution. To understand their organization within the human genome, we have developed the computational tools and methods necessary to detect identity between long stretches of genomic sequence despite the presence of high copy repeats and large insertion-deletions. Here we present our analysis of the most recent genome assembly (January 2001) in which we focus on the global organization of these segments and the role they play in the whole-genome assembly process. Initially, we considered only large recent duplication events that fell well below levels of draft sequencing error (alignments 90%-98% similar and ≥1 kb in length). Duplications (90%-98%; ≥1 kb) comprise 3.6% of all human sequence. These duplications show clustering and up to 10-fold enrichment within pericentromeric and subtelomeric regions. In terms of assembly, duplicated sequences were found to be over-represented in unordered and unassigned contigs, indicating that duplicated sequences are difficult to assign to their proper position. To assess coverage of these regions within the genome, we selected BACs containing interchromosomal duplications and characterized their duplication pattern by FISH. Only 47% (106/224) of chromosomes positive by FISH had a corresponding chromosomal position by comparison. We present data that indicate that this is attributable to misassembly, misassignment, and/or decreased sequencing coverage within duplicated regions. Surprisingly, if we consider putative duplications >98% identity, we identify 10.6% (286 Mb) of the current assembly as paralogous. The majority of these alignments, we believe, represent unmerged overlaps within unique regions. Taken together, the above data indicate that segmental duplications represent a significant impediment to accurate human genome assembly, requiring the development of specialized techniques to finish these exceptional regions of the genome. The identification and characterization of these highly duplicated regions represents an important step in the complete sequencing of a human reference genome.
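
The selection criteria stated in this abstract (alignments 90%-98% similar and at least 1 kb long) amount to a simple filter. The sketch below only illustrates that filter and is not the authors' detection pipeline: the `Alignment` record, the function name, the inclusive bounds and the example records are all assumptions.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    name: str            # hypothetical label for a pairwise alignment
    length_bp: int       # aligned length in base pairs
    identity_pct: float  # percent identity over the aligned length


def is_recent_duplication(aln, min_len=1000, min_id=90.0, max_id=98.0):
    """Apply the abstract's thresholds; treating both bounds as inclusive
    is an assumption, since the abstract does not say."""
    return aln.length_bp >= min_len and min_id <= aln.identity_pct <= max_id


# Made-up example records.
alignments = [
    Alignment("a1", 5200, 94.5),   # passes both thresholds
    Alignment("a2", 800, 95.0),    # too short
    Alignment("a3", 12000, 99.6),  # above 98%: possible unmerged overlap
]
kept = [a.name for a in alignments if is_recent_duplication(a)]
```

The upper identity bound matters because, as the abstract notes, most alignments above 98% identity are believed to be unmerged overlaps rather than true paralogs.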

7.
It is critical to avoid delays in detecting strain manipulations, such as the addition/deletion of a gene or modification of genes for increased virulence or antibiotic resistance, using genome analysis during an epidemic outbreak or a bioterrorist attack. Our objective was to evaluate the efficiency of genome analysis in such an emergency context by using contigs produced by pyrosequencing without time-consuming finishing processes and comparing them to available genomes for the same species. For this purpose, we analyzed a clinical isolate of Francisella tularensis subspecies holarctica (strain URFT1), a potential biological weapon, and compared the data obtained with available genomic sequences of other strains. The technique provided 1,800,530 bp of assembled sequences, resulting in 480 contigs. We found by comparative analysis with other strains that all the gaps but one in the genome sequence were caused by repeats. No new genes were found, but a deletion was detected that included three putative genes and part of a fourth gene. The set of 35 candidate LVS virulence attenuation genes was identified, as well as a DNA gyrase mutation associated with quinolone resistance. Selection for variable sequences in URFT1 allowed the design of a strain-specific, highly effective typing system that was applied to 74 strains and six clinical specimens. The analysis presented herein may be completed within approximately 6 wk, a duration compatible with that required by an urgent context. In the bioterrorism context, it allows the rapid detection of strain manipulation, including intentionally added virulence genes and genes that support antibiotic resistance.

8.
9.
Detailed restriction maps of microbial genomes are a valuable resource in genome sequencing studies but are toilsome to construct by contig construction of maps derived from cloned DNA. Analysis of genomic DNA enables large stretches of the genome to be mapped and circumvents library construction and associated cloning artifacts. We used pulsed-field gel electrophoresis purified Plasmodium falciparum chromosome 2 DNA as the starting material for optical mapping, a system for making ordered restriction maps from ensembles of individual DNA molecules. DNA molecules were bound to derivatized glass surfaces, cleaved with NheI or BamHI, and imaged by digital fluorescence microscopy. Large pieces of the chromosome containing ordered DNA restriction fragments were mapped. Maps were assembled from 50 molecules producing an average contig depth of 15 molecules and high-resolution restriction maps covering the entire chromosome. Chromosome 2 was found to be 976 kb by optical mapping with NheI, and 946 kb with BamHI, which compares closely to the published size of 947 kb from large-scale sequencing. The maps were used to further verify assemblies from the plasmid library used for sequencing. Maps generated in silico from the sequence data were compared to the optical mapping data, and good correspondence was found. Such high-resolution restriction maps may become an indispensable resource for large-scale genome sequencing projects.
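
The in silico maps mentioned in this abstract can be approximated by cutting a sequence at each recognition site and listing the ordered fragment lengths. The sketch below is an illustration only: the function name and the toy sequence are hypothetical, the molecule is assumed to be linear, and the recognition sites (NheI G^CTAGC, BamHI G^GATCC) are the standard ones for these enzymes rather than details given in the abstract.

```python
# Recognition site and cut offset within the site (both enzymes cut after
# the first G of their six-base site).
SITES = {"NheI": ("GCTAGC", 1), "BamHI": ("GGATCC", 1)}


def digest(seq, enzyme):
    """Return ordered fragment lengths from a complete digest of a linear
    sequence with the named enzyme."""
    site, offset = SITES[enzyme]
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Toy 23-base sequence with one NheI site and one BamHI site.
toy = "AAAA" + "GCTAGC" + "TTTTT" + "GGATCC" + "CC"
nhe_fragments = digest(toy, "NheI")
bam_fragments = digest(toy, "BamHI")
```

Comparing such an ordered fragment list with the optically measured fragment sizes is the kind of check the abstract describes, although real comparisons must tolerate sizing error and missed cuts.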

10.
Pedersen C, Wu B, Giese H. Current Genetics, 2002, 42(2): 103-113
A bacterial artificial chromosome (BAC) library of Blumeria graminis f.sp. hordei, containing 12,000 clones with an average insert size of 41 kb, was constructed. The library represents about three genome equivalents and BAC-end sequencing showed a high content of repetitive sequences, making contig-building difficult. To identify overlapping clones, several strategies were used: colony hybridisation, PCR screening, fingerprinting techniques and the use of single-copy expressed sequence tags. The latter proved to be the most efficient method for identification of overlapping clones. Two contigs, at or close to avirulence loci, were constructed. Single nucleotide polymorphism (SNP) markers were developed from BAC-end sequences to link the contigs to the genetic maps. Two other BAC contigs were used to study microsynteny between B. graminis and two other ascomycetes, Neurospora crassa and Aspergillus fumigatus. The library provides an invaluable tool for the isolation of avirulence genes from B. graminis and for the study of gene synteny between this fungus and other fungi.
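
The library statistics in this abstract are linked by simple arithmetic: total cloned DNA is the clone count times the mean insert size, and dividing by the number of genome equivalents gives the genome size the authors must have assumed. The sketch below uses hypothetical function names, and because the abstract says "about three" equivalents, the implied genome size is only a rough figure.

```python
def library_size_mb(n_clones, mean_insert_kb):
    """Total cloned DNA in megabases."""
    return n_clones * mean_insert_kb / 1000


def implied_genome_size_mb(n_clones, mean_insert_kb, genome_equivalents):
    """Genome size implied by a stated coverage in genome equivalents."""
    return library_size_mb(n_clones, mean_insert_kb) / genome_equivalents


# Figures from the abstract: 12,000 clones, 41 kb inserts, ~3 equivalents.
total_mb = library_size_mb(12_000, 41)
implied_mb = implied_genome_size_mb(12_000, 41, 3)
```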

11.
Actin is a ubiquitous and highly conserved microfilament protein which is hypothesized to play a mechanical, force-generating role in the unusual gliding motility of sporozoan zoites and in their active penetration of host cells. We have identified and isolated an actin gene from a Cryptosporidium parvum genomic DNA library using a chicken beta-actin cDNA as a hybridization probe. The nucleotide sequences of two overlapping recombinant clones were identical and the amino acid sequence deduced from the single open reading frame was 85% identical to the P. falciparum actin I and human gamma-actin proteins. The predicted 42,106-Da Cryptosporidium actin contains 376 amino acids and is encoded by a single-copy gene which contains no introns. The nucleic acid coding sequence is 72% biased to the use of A or T in the third position of codons. Chromosome-sized DNA released from intact C. parvum oocysts was resolved by OFAGE into 5 discrete ethidium bromide-staining DNAs ranging in size from 900 to 1400 kb; the cloned C. parvum actin gene hybridized to a single chromosomal DNA of approximately 1200 kb.
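
The third-position A/T bias reported in this abstract is a straightforward count over codons. A minimal sketch, assuming the input is an in-frame coding sequence that starts at the first base of a codon (the function name and the toy sequence are hypothetical):

```python
def third_position_at_fraction(cds):
    """Fraction of codons whose third base is A or T.

    Trailing bases that do not complete a codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    thirds = cds[2:usable:3]  # every third base: indices 2, 5, 8, ...
    if not thirds:
        raise ValueError("sequence shorter than one codon")
    return sum(base in "AT" for base in thirds) / len(thirds)


# Toy sequence: codons ATG GCA GCT TAA have third bases G, A, T, A.
toy_fraction = third_position_at_fraction("ATGGCAGCTTAA")
```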

12.
The success of the ongoing Human Genome Project has resulted in accelerated plans for completing the human genome sequence and the earlier-than-anticipated initiation of efforts to sequence the mouse genome. As a complement to these efforts, we are utilizing the available human sequence to refine human-mouse comparative maps and to assemble sequence-ready mouse physical maps. Here we describe how the first glimpses of genomic sequence from human chromosome 7 are directly facilitating these activities. Specifically, we are actively enhancing the available human-mouse comparative map by analyzing human chromosome 7 sequence for the presence of orthologs of mapped mouse genes. Such orthologs can then be precisely positioned relative to mapped human STSs and other genes. The chromosome 7 sequence generated to date has allowed us to more than double the number of genes that can be placed on the comparative map. The latter effort reveals that human chromosome 7 is represented by at least 20 orthologous segments of DNA in the mouse genome. A second component of our program involves systematically analyzing the evolving human chromosome 7 sequence for the presence of matching mouse genes and expressed-sequence tags (ESTs). Mouse-specific hybridization probes are designed from such sequences and used to screen a mouse bacterial artificial chromosome (BAC) library, with the resulting data used to assemble BAC contigs based on probe-content data. Nascent contigs are then expanded using probes derived from newly generated BAC-end sequences. This approach produces BAC-based sequence-ready maps that are known to contain a gene(s) and are homologous to segments of the human genome for which sequence is already available. Our ongoing efforts have thus far resulted in the isolation and mapping of >3,800 mouse BACs, which have been assembled into >100 contigs. 
These contigs include >250 genes and represent approximately 40% of the mouse genome that is homologous to human chromosome 7. Together, these approaches illustrate how the availability of genomic sequence directly facilitates studies in comparative genomics and genome evolution.

13.
The genetic similarity between Mycobacterium avium subsp. paratuberculosis and other mycobacterial species has confounded the development of M. avium subsp. paratuberculosis-specific diagnostic reagents. Random shotgun sequencing of the M. avium subsp. paratuberculosis genome in our laboratories has shown >98% sequence identity with Mycobacterium avium subsp. avium in some regions. However, an in silico comparison of the largest annotated M. avium subsp. paratuberculosis contigs, totaling 2,658,271 bp, with the unfinished M. avium subsp. avium genome has revealed 27 predicted M. avium subsp. paratuberculosis coding sequences that do not align with M. avium subsp. avium sequences. BLASTP analysis of the 27 predicted coding sequences (genes) shows that 24 do not match sequences in public sequence databases, such as GenBank. These novel sequences were examined by PCR amplification with genomic DNA from eight mycobacterial species and ten independent isolates of M. avium subsp. paratuberculosis. From these analyses, 21 genes were found to be present in all M. avium subsp. paratuberculosis isolates and absent from all other mycobacterial species tested. One region of the M. avium subsp. paratuberculosis genome contains a cluster of eight genes, arranged in tandem, that is absent in other mycobacterial species. This region spans 4.4 kb and is separated from other predicted coding regions by 1,408 bp upstream and 1,092 bp downstream. The gene upstream of this eight-gene cluster has strong similarity to mycobacteriophage integrase sequences. The GC content of this 4.4-kb region is 66%, which is similar to the rest of the genome, indicating that this region was not horizontally acquired recently. Southern hybridization analysis confirmed that this gene cluster is present only in M. avium subsp. paratuberculosis. Collectively, these studies suggest that a genomics approach will help in identifying novel M. avium subsp. 
paratuberculosis genes as candidate diagnostic sequences.
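
The argument about horizontal acquisition in this abstract rests on comparing the GC content of one region with that of the rest of the genome. A minimal sketch of the underlying calculation, with a hypothetical function name and toy inputs, and assuming ambiguous bases such as N are simply skipped:

```python
def gc_percent(seq):
    """Percentage of G or C among the unambiguous bases (A, C, G, T)."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("no unambiguous bases in sequence")
    return 100.0 * sum(base in "GC" for base in counted) / len(counted)


# Toy example: two of the four unambiguous bases are G or C.
toy_gc = gc_percent("GCNNAT")
```

A recently acquired region would often show a GC content clearly different from the genome-wide value, which is the reasoning the abstract applies to its 66% figure.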

14.
A random sequence survey of the genome of Trypanosoma cruzi, the agent of Chagas disease, was performed and 11,459 genomic sequences were obtained, resulting in approximately 4.3 Mb of readable sequences or approximately 10% of the parasite haploid genome. The estimated total GC content was 50.9%, with a high representation of A and T di- and trinucleotide repeats. Out of the estimated 5000 parasite genes, 947 putative new genes were identified. Another 1723 sequences corresponded to genes detected previously in T. cruzi through expressed sequence tag analysis. A further 7735 sequences had no matches in the database, but the presence of open reading frames that passed Fickett's test suggests that some might contain coding DNA. The survey was highly redundant, with approximately 35% of the sequences included in a few large sequence families. Some of them code for protein families present in dozens of copies, including proteins essential for parasite survival and retrotransposons. Other sequence families include repetitive DNA present in thousands of copies per haploid genome. Some families in the latter group are new, parasite-specific, repetitive DNAs. These results suggest that T. cruzi could constitute an interesting model to analyze gene and genome evolution due to its plasticity in terms of sequence amplification and divergence. Additional information can be found at http://www.iib.unsam.edu.ar/tcruzi.gss.html.

15.
16.
Deletions and duplications of genomic segments commonly cause developmental disorders. The resolution and efficiency in diagnosing such gene-dosage alterations can be drastically increased using microarray-based comparative genomic hybridization (array-CGH). However, array-CGH currently relies on spotting genomic clones as targets, which confers severe limitations to the approach, including resolution of analysis and reliable gene-dosage assessment of regions with high content of redundant sequences. To improve the methodology for analysis, we compared the use of genomic clones, repeat-free pools of amplified genomic DNA and cDNAs (single and pooled) as targets on the array. For this purpose, we chose the q11.2 locus on chromosome 22 as a testing ground. Microdeletions at 22q11 cause birth defects collectively described as the DiGeorge/velocardiofacial syndrome. The majority of patients present with a typical 3 Mb deletion. Here, we report the construction of a gene-dosage array, covering 6 Mb of 22q11 and including the typically deleted region. We hybridized DNA from six DiGeorge syndrome patients to the array, and show that as little as 11.5 kb of non-redundant, repeat-free PCR-generated sequence can be used for reliable detection of hemizygous deletions. By extrapolation, this would allow analysis of the genome with an average resolution of 25 kb. In the case of cDNAs, our results indicate that 3.5 kb of sequence is necessary for accurate identification of haploid/diploid dosage alterations. Thus, for regions rich in redundant sequences and repeats, such as 22q11, a specifically tailored array-CGH approach is well suited to gene copy number profiling.

17.
Cloning the shared components of complex DNA resources   (Total citations: 1; self-citations: 0; citations by others: 1)
The complex and repetitive nature of mammalian genomes limits the ability of conventional molecular techniques to recover sequences of interest. Here we describe a rapid and simple procedure for the direct cloning of sequences which are coincident between DNA mixtures of whole genome complexity. The system, called end ligation coincident sequence cloning (EL-CSC), can enrich coincident DNA by greater than 10^6-fold and overcomes problems associated with repetitive elements. Applying EL-CSC to various paired DNA resources enables the facile cloning of both genomic markers and novel genes. To demonstrate the power of the method we have (I) selectively purified single copy sequences from a complete genome, and (II) isolated gene fragments from 260 kb of cloned genomic DNA.

18.
Goto Chie, Hayakawa Tohru, Maeda Susumu. Virus Genes, 1998, 16(2): 199-210
In order to characterize the genome organization of Xestia c-nigrum granulovirus (XcGV), mapping of putative XcGV genes was performed by construction of lambda and M13 phage libraries followed by Southern blot and nucleotide sequencing analyses. Mapping of the lambda (32 clones covering the entire XcGV genome) and M13 (133 clones made by random cloning) phage library clones was carried out by hybridization of the labeled lambda phage clone DNAs to 1) Southern blotted XcGV genomic DNA fragments cleaved with EcoRI, BamHI, or HindIII, and 2) dot blotted M13 clone DNAs. All 133 M13 clone DNAs were sequenced, and coding possibilities were investigated by computer-assisted homology search; in total, about 43 kb of the genome was sequenced. Amino acid sequence homology searches of 67 M13 clones suggested that these GV DNAs coded for previously characterized genes identified in nucleopolyhedroviruses (NPVs) and GVs. These 67 M13 clones were classified into 25 gene homolog groups (including 29 putative genes) based on their homologies to NPV and GV genes. The remaining M13 clones, except one that encoded a putative metalloproteinase, did not possess deduced amino acid sequences with significant homology to proteins in gene databases. Complete nucleotide sequences of the putative XcGV DNA polymerase and Ac144 homolog genes confirmed the reliability of our speculation of putative genes based on the M13 clones sequencing analysis. In a comparison of relative locations of putative XcGV genes with locations of their homologs in NPVs, most XcGV genes were mapped close to the corresponding locations in NPV genomes. These results suggested that XcGV, compared to NPVs, had relatively conserved gene arrangements, although about 22 kb of 43 kb of DNA sequenced randomly in the XcGV genome consisted of sequences/genes non-homologous to those of previously characterized NPVs. This revised version was published online in August 2006 with corrections to the Cover Date.

19.
Highly abundant DNA fragments obtained after restriction enzyme digests of nuclear DNA of Entamoeba histolytica strain HM-1:IMSS have been cloned and characterized. Northern blot hybridization to E. histolytica rRNA and sequence analysis identified the abundant DNAs as ribosomal DNA-containing species. Several overlapping clones containing these abundant DNAs were isolated from 4 different genomic libraries of E. histolytica. Alignment of the restriction maps was consistent with a circular molecule, about 24.6 kilobase pairs (kb) in size. Nuclease Bal31 digestion provided additional evidence for the circular nature of this DNA. The ribosomal DNA molecule contains two large inverted repeat regions, each at least 5.2 kb in length. Sequence analysis of clone R715 revealed homology to the large rRNA units of various eukaryotic organisms. This clone was located in both inverted repeats, suggesting two rRNA cistrons per molecule. The inverted repeats are flanked by stretches of DNA which contain tandemly reiterated sequences. Southern blot analysis of E. histolytica nuclear DNA revealed the presence of two populations of molecules. These molecules have identical arrangements of restriction sites, but differ in size (0.7 kb) in a fragment containing tandemly reiterated sequences. Analysis of E. histolytica nuclear DNA by electron microscopy also revealed circular molecules. These molecules are about 26.6 kb +/- 0.5 kb in size and contain structural features predicted by the restriction map of the extrachromosomal ribosomal DNA of E. histolytica.

20.
Romanomermis culicivorax, an obligate parasitic nematode of mosquitos, possesses an unusually large mitochondrial genome. Individuals are monomorphic for one of several mitochondrial DNA (mtDNA) size variants ranging from 26–32 kb. In this report, we demonstrate that the mitochondrial genome size differential in three isofemale lineages is due to the presence of mtDNA sequences amplified to different copy numbers within each mtDNA molecule. Restriction enzyme analysis and DNA sequencing studies reveal that each mitochondrial genome contains one of two 3.0 kb repeat types that differ by approximately 30 bp. This difference is primarily due to a short (23 bp) imperfect tandem duplication present within the larger of two polymorphic repeating units. The 3.0 kb reiterated DNA sequences are present as direct, tandem repeats and as inverted portions of the same sequence located elsewhere in the genome. Based on mtDNA analysis of an independently reared R. culicivorax culture, we conclude that events resulting in mitochondrial genome rearrangement occurred in natural field populations prior to propagation within the laboratory.
