Similar articles
 Found 20 similar articles (search time: 46 ms)
1.
2.
3.
4.
Cryptosporidium parvum is an obligate intracellular pathogen responsible for widespread infections in humans and animals. The inability to obtain purified samples of this organism's various developmental stages has limited the understanding of the biochemical mechanisms important for C. parvum development or host-parasite interaction. To identify C. parvum genes independent of their developmental expression, a random sequence analysis of the 10.4-megabase genome of C. parvum was undertaken. Total genomic DNA was sheared by nebulization, and fragments between 800 and 1,500 bp were gel purified and cloned into a plasmid vector. A total of 442 clones were randomly selected and subjected to automated sequencing by using one or two primers flanking the cloning site. In this way, 654 genomic survey sequences (GSSs) were generated, corresponding to >320 kb of genomic sequence. These sequences were assembled into 408 contigs containing >250 kb of unique sequence, representing approximately 2.5% of the C. parvum genome. Comparison of the GSSs with sequences in the public DNA and protein databases revealed that 107 contigs (26%) displayed similarity to previously identified proteins and rRNA and tRNA genes. These included putative genes involved in the glycolytic pathway, DNA, RNA, and protein metabolism, and signal transduction pathways. The repetitive sequence elements identified included a telomere-like sequence containing hexamer repeats, 57 microsatellite-like elements composed of dinucleotide or trinucleotide repeats, and a direct repeat sequence. This study demonstrates that large-scale genomic sequencing is an efficient approach to analyze the organizational characteristics and information content of the C. parvum genome.

5.
Second-generation sequencing technology can now be used to sequence an entire human genome in a matter of days and at low cost. Sequence read lengths, initially very short, have rapidly increased since the technology first appeared, and we now are seeing a growing number of efforts to sequence large genomes de novo from these short reads. In this Perspective, we describe the issues associated with short-read assembly, the different types of data produced by second-gen sequencers, and the latest assembly algorithms designed for these data. We also review the genomes that have been assembled recently from short reads and make recommendations for sequencing strategies that will yield a high-quality assembly.
As genome sequencing technology has evolved, methods for assembling genomes have changed with it. Genome sequencers have never been able to "read" more than a relatively short stretch of DNA at once, with read lengths gradually increasing over time. Reconstructing a complete genome from a set of reads requires an assembly program, and a variety of genome assemblers have been used for this task. In 1995, when the first bacterial genome was published (Haemophilus influenzae), read lengths were ~460 base pairs (bp), and that whole-genome shotgun (WGS) sequencing project generated 24,304 reads (Fleischmann et al. 1995). The human genome project required ~30 million reads, with lengths up to 800 bp, using Sanger sequencing technology and automated capillary sequencers (International Human Genome Sequencing Consortium 2001; Venter et al. 2001). This corresponded to 24 billion bases (Gb), or approximately eightfold coverage of the 3-Gb human genome. Redundant coverage, in which on average every nucleotide is sequenced many times over, is required to produce a high-quality assembly. Another benefit of redundancy is greatly increased accuracy compared with a single read: Where a single read might have an error rate of 1%, eightfold coverage has an error rate as low as 10⁻¹⁶ when eight high-quality reads agree with one another. High coverage is also necessary to sequence polymorphic alleles within diploid or polyploid genomes.
Current second-generation sequencing (SGS) technologies produce read lengths ranging from 35 to 400 bp, at far greater speed and much lower cost than Sanger sequencing. However, as reads get shorter, coverage needs to increase to compensate for the decreased connectivity and produce a comparable assembly. Certain problems cannot be overcome by deeper coverage: If a repetitive sequence is longer than a read, then coverage alone will never compensate, and all copies of that sequence will produce gaps in the assembly. These gaps can be spanned by paired reads (two reads generated from a single fragment of DNA and separated by a known distance), as long as the pair separation distance is longer than the repeat. Paired-end sequencing is available from most of the SGS machines, although it is not yet as flexible or as reliable as paired-end sequencing using traditional methods.
After the successful assembly of the human (International Human Genome Sequencing Consortium 2001; Venter et al. 2001) and mouse (Waterston et al. 2002) genomes by whole-genome shotgun sequencing, most large-scale genome projects quickly moved to adopt the WGS approach, which has subsequently been used for dozens of eukaryotic genomes. Today, thanks to changes in sequencing technology, a major question confronting genome projects is, can we sequence a large genome (>100 Mbp) using short reads? If so, what are the limitations on read length, coverage, and error rates? How much paired-end sequencing is necessary? And what will the assembly look like? In this Perspective we take a look at each of these questions and describe the solutions available today. Although we provide some answers, we have no doubt that the solutions will change rapidly over the next few years, as both the sequencing methods and the computational solutions improve.
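The coverage and accuracy figures quoted in this Perspective can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the read length is taken at the 800 bp upper bound the text uses, and the error calculation assumes that errors in different reads are independent, which is an idealization.

```python
def fold_coverage(num_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Average number of times each base is sequenced: total read bases / genome size."""
    return num_reads * read_length_bp / genome_size_bp


def all_reads_wrong_probability(per_read_error: float, depth: int) -> float:
    """Chance that all `depth` reads are wrong at one position.

    Assumption (not from the source): read errors are independent.
    """
    return per_read_error ** depth


if __name__ == "__main__":
    # Human genome project figures from the text: ~30 million reads of up to 800 bp, 3-Gb genome.
    coverage = fold_coverage(30_000_000, 800, 3_000_000_000)
    print(f"fold coverage: {coverage:.1f}x")

    # A 1% per-read error rate at eightfold coverage.
    print(f"all-reads-wrong probability: {all_reads_wrong_probability(0.01, 8):.1e}")
```

Under these assumptions the first value matches the "approximately eightfold" coverage in the text, and the second matches the quoted order of magnitude for eight agreeing reads.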

6.
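Expression of PRLr mRNA in the human immune system   Total citations: 3 (self-citations: 0, citations by others: 3)
Chen Zhen, Lin Ling. Chinese Journal of Immunology (《中国免疫学杂志》), 2005, 21(12): 915-917
Objective: To investigate the expression of the prolactin receptor (PRLr) in the human immune system. Methods: Clinical specimens were obtained from central immune organs (thymoma and bone marrow) and peripheral immune tissues (lymph node and peripheral blood mononuclear cells). RNA was extracted, a specific fragment of PRLr mRNA was amplified by RT-PCR, and the product was sequenced. Results: PRLr mRNA was amplified from thymoma, bone marrow, lymph node, and peripheral blood mononuclear cells. In every case the fragment length matched the expected length of 276 bp, and sequencing confirmed that it was the intended fragment. Conclusion: PRLr is expressed in human central immune organs (thymoma and bone marrow) and in peripheral immune tissues (lymph node and peripheral blood mononuclear cells). At the receptor level, this directly confirms a biological structural basis for the immunomodulatory action of the endocrine hormone PRL.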

7.
8.
The maize-infecting nucleorhabdovirus, Maize mosaic virus (MMV), was sequenced to near completion using the random shotgun approach. Sequences of 102 clones from a cDNA library constructed from randomly-primed viral RNA were compiled into a 12,133 nucleotide (nt) contig containing six open reading frames. The contig consisted of 97 sequences averaging 660 bp in length. The average sequence coverage was six-fold, and 93% of the contig had sequence reads covering both strands. The remaining sequence was derived from single (5%) or multiple (2%) reads on the same strand. Three of the six ORFs showed significant similarities to the deduced protein sequences of the nucleocapsid, glycoprotein and polymerase sequences of other rhabdoviruses. The predicted gene order of the MMV genome was 3'-N-P-3-M-G-L-5'. Shotgun sequencing of the MMV genome took approximately 127 h and cost 0.38 dollars per nt (including labor), whereas the primer walking approach for sequencing the 13,782-nt MFSV genome [Tsai, C.-W., Redinbaugh, M.G., Willie, K.J., Reed, S., Goodin, M., Hogenhout, S. A., 2005. Complete genome sequence and in planta subcellular localization of maize fine streak virus proteins. J. Virol. 79, 5304-5314] took about 217 h and cost 0.50 dollars per nt. Thus, the shotgun approach gave good depth of coverage for the viral genome sequence while being significantly faster and less expensive than the primer walking method. This technique will facilitate the sequencing of multiple rhabdovirus genomes.

9.
Efficient sequencing of animal and plant genomes by next-generation technology should allow many neglected organisms of biological and medical importance to be better understood. As a test case, we have assembled a draft genome of Caenorhabditis sp. 3 PS1010 through a combination of direct sequencing and scaffolding with RNA-seq. We first sequenced genomic DNA and mixed-stage cDNA using paired 75-nt reads from an Illumina GAII. A set of 230 million genomic reads yielded an 80-Mb assembly, with a supercontig N50 of 5.0 kb, covering 90% of 429 kb from previously published genomic contigs. Mixed-stage poly(A)+ cDNA gave 47.3 million mappable 75-mers (including 5.1 million spliced reads), which separately assembled into 17.8 Mb of cDNA, with an N50 of 1.06 kb. By further scaffolding our genomic supercontigs with cDNA, we increased their N50 to 9.4 kb, nearly double the average gene size in C. elegans. We predicted 22,851 protein-coding genes, and detected expression in 78% of them. Multigenome alignment and data filtering identified 2672 DNA elements conserved between PS1010 and C. elegans that are likely to encode regulatory sequences or previously unknown ncRNAs. Genomic and cDNA sequencing followed by joint assembly is a rapid and useful strategy for biological analysis.
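The abstract above reports assembly quality as N50 values. As a reminder of what that statistic measures, here is a minimal sketch using the common definition: the length L such that contigs of length at least L contain at least half of the assembled bases. The function name and the toy contig lengths are illustrative assumptions, not data from the study, and the authors' exact procedure is not stated in the abstract.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first contig where the cumulative length reaches half the total.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the cumulative sum always reaches the total")


if __name__ == "__main__":
    toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]  # hypothetical lengths, in kb
    print(n50(toy_contigs))
```

Because N50 weights contigs by length, joining contigs into longer scaffolds (as the cDNA scaffolding step does) raises N50 even when the total assembled sequence is unchanged.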

10.
A primer pair which was expected to generate an amplicon of the estimated size (approximately 1700 base pairs (bp)) of the flaA gene for Campylobacter jejuni amplified products of approximately 1450 bp for 33 of the 44 isolates of urease-positive thermophilic Campylobacter (UPTC). The primer pair, however, failed to amplify fragments for 11 isolates of UPTC, for all of the 12 isolates of urease-negative C. lari and for one isolate of C. coli. Nevertheless, it successfully amplified fragments of approximately 1700 bp for five isolates of C. jejuni and for nine isolates of C. coli. Thus, the fragments of the flaA gene of UPTC were shorter than those of C. jejuni and C. coli. After PCR amplification and nucleotide sequencing of the flaA genes from five UPTC NCTC isolates, the putative open reading frames (ORFs) were found to range from 1461 to 1479 bp. The amino acid and nucleotide sequence alignments demonstrated that the PCR clones contained the flaA gene; however, our data indicated that this locus was markedly shorter in the UPTC organisms examined, as they were approximately 85 amino acid residues shorter, mainly corresponding to approximate residue numbers 390-470 of the large variable region of C. jejuni 81116. Heterogeneity was indicated in the molecular mass of the flagellin purified from the isolates examined. Flagellin of UPTC was demonstrated to be genotypically and phenotypically smaller than that of C. jejuni.

11.
Next-generation sequencing (NGS) technologies can be a boon to human mutation detection given their high throughput: consequently, many genes and samples may be simultaneously studied with high coverage for accurate detection of heterozygotes. In circumstances requiring the intensive study of a few genes, particularly in clinical applications, a rapid turnaround is another desirable goal. To this end, we assessed the performance of the bench-top 454 GS Junior platform as an optimized solution for mutation detection by amplicon sequencing of three type 3 semaphorin genes, SEMA3A, SEMA3C, and SEMA3D, implicated in Hirschsprung disease (HSCR). We performed mutation detection on 39 PCR amplicons totaling 14,014 bp in 47 samples studied in pools of 12 samples. Each 10-hr run was able to generate ~75,000 reads and ~28 million high-quality bases at an average read length of 371 bp. The overall sequencing error was 0.26 changes per kb at a coverage depth of ≥20 reads. Altogether, 37 sequence variants were found in this study, of which 10 were unique to HSCR patients. We identified five missense mutations in these three genes that may potentially be involved in the pathogenesis of HSCR and need to be studied in larger patient samples.

12.
13.
14.
15.
Adaptation of avian influenza viruses (AIVs) from waterfowl to domestic poultry with a deletion in the neuraminidase (NA) stalk has already been reported. The way the virus undergoes this evolution, however, is thus far unclear. We address this question using pyrosequencing of duck and turkey low-pathogenicity AIVs. Ducks and turkeys were sampled at the very beginning of an H6N1 outbreak, and turkeys were swabbed again 8 days later. NA stalk deletions were evidenced in turkeys by Sanger sequencing. To further investigate viral evolution, 454 pyrosequencing was performed: for each set of samples, up to 41,500 reads of ca. 400 bp were generated and aligned. Genetic polymorphisms between duck and turkey viruses were tracked on the whole genome. NA deletion was detected in less than 2% of reads in duck feces but in 100% of reads in turkey tracheal specimens collected at the same time. Further variations in length were observed in NA from turkeys 8 days later. Similarly, minority mutants emerged on the hemagglutinin (HA) gene, with substitutions mostly in the receptor binding site on the globular head. These critical changes suggest a strong evolutionary pressure in turkeys. The increasing performance of next-generation sequencing technologies should enable us to monitor the genomic diversity of avian influenza viruses and early emergence of potentially pathogenic variants within bird flocks. The present study, based on 454 pyrosequencing, suggests that NA deletion, an example of AIV adaptation from waterfowl to domestic poultry, occurs by selection rather than de novo emergence of viral mutants.

16.
A large-scale BAC end-sequencing project at The Institute for Genomic Research (TIGR) has generated one of the most extensive sets of sequence markers for the mouse genome to date. With a sequencing success rate of >80%, an average read length of 485 bp, and ABI3700 capillary sequencers, we have generated 449,234 nonredundant mouse BAC end sequences (mBESs) with 218 Mb total from 257,318 clones from libraries RPCI-23 and RPCI-24, representing 15x clone coverage, 7% sequence coverage, and a marker every 7 kb across the genome. A total of 191,916 BACs have sequences from both ends providing 12x genome coverage. The average Q20 length is 406 bp and 84% of the bases have phred quality scores ≥20. RPCI-24 mBESs have more Q20 bases and longer reads on average than RPCI-23 sequences. ABI3700 sequencers and the sample tracking system ensure that >95% of mBESs are associated with the right clone identifiers. We have found that a significant fraction of mBESs contains L1 repeats and approximately 48% of the clones have both ends with ≥100 bp contiguous unique Q20 bases. About 3% of mBESs match ESTs and >70% of matches were conserved between the mouse and the human or the rat. Approximately 0.1% of mBESs contain STSs. About 0.2% of mBESs match human finished sequences and >70% of these sequences have EST hits. The analyses indicate that our high-quality mouse BAC end sequences will be a valuable resource to the community.
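The quality figures in the abstract above are expressed as phred scores. As a brief sketch, the standard phred definition Q = -10 log10(P) can be used to read "Q20" as a 1% per-base error probability, and the reported 218 Mb total can be checked against the read count and average read length. The helper names and the sample quality string are hypothetical, and the study's own pipeline is not described in the abstract.

```python
from typing import Sequence


def phred_to_error_probability(q: float) -> float:
    """Convert a phred quality score to a per-base error probability (P = 10^(-Q/10))."""
    return 10 ** (-q / 10)


def fraction_at_or_above(qualities: Sequence[int], threshold: int = 20) -> float:
    """Fraction of bases whose phred score is at least `threshold`."""
    if not qualities:
        raise ValueError("at least one quality score is required")
    return sum(q >= threshold for q in qualities) / len(qualities)


if __name__ == "__main__":
    print(phred_to_error_probability(20))  # Q20 corresponds to about a 1% error chance

    # Figures from the abstract: 449,234 end sequences averaging 485 bp.
    total_mb = 449_234 * 485 / 1e6
    print(f"total sequence: {total_mb:.0f} Mb")

    toy_read = [35, 30, 22, 20, 19, 12, 40, 25]  # hypothetical per-base scores
    print(fraction_at_or_above(toy_read))
```

The product of read count and mean length comes to roughly 218 Mb, consistent with the total stated in the abstract.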

17.
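Background: Brain-derived neurotrophic factor (BDNF) has wide-ranging effects, but it is a biological macromolecule and cannot cross the blood-brain barrier. Gene therapy is currently the most promising way to solve the problem of delivering BDNF. Objective: To construct a eukaryotic expression vector carrying the rat BDNF gene. Methods: Total RNA was extracted from the brain tissue of SD rats, and the cDNA sequence of the BDNF gene was amplified by reverse transcription polymerase chain reaction (RT-PCR) and cloned into the eukaryotic expression vector pcDNA3. Ten μg each of plasmid pcDNA3 and the purified target gene were double-digested with EcoR I and Xho I. The target gene fragment was ligated to the pcDNA3 vector and transformed into competent DH5α cells. After verification by restriction digestion, the construct was sent to Shanghai Boya Biotechnology Co., Ltd. (上海博亚生物技术有限公司) for sequencing. Results and conclusion: The RT-PCR product was a specific fragment of 749 bp. Digestion of the recombinant plasmid pcDNA3/BDNF produced fragments of 749 bp and 5,446 bp, and DNA sequencing confirmed that the base sequence of the 749 bp fragment was identical to the rat BDNF gene sequence. The recombinant plasmid pcDNA3/BDNF was successfully constructed.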

18.
Here we present the cloning of three novel mouse mast cell-specific serine proteases, MMCP-1, MMCP-4 and MMCP-5. A region of approximately 4 kb covering the five exons and 930 bp 5' and 280 bp 3' flanking sequences of the gene for MMCP-1 was characterized by nucleotide sequence analysis. A comparison with the corresponding region of the rat mucosal mast cell-specific protease RMCP-II is presented. cDNA clones for the mast cell proteases MMCP-4 (950 bp) and MMCP-5 (1098 bp) were isolated from a cDNA library of a connective tissue mast cell-like mouse mastocytoma cell line. All three proteases were found to belong to the family of chymotryptic serine proteases as deduced from the absence of the Asp 189 which is characteristic for all serine proteases having cleavage specificities similar to pancreatic trypsin. The active polypeptides, excluding possible post-translational glycosylations, have an Mr of 25-26 kDa. Analysis of the amino acid composition reveals a positive net charge for all three proteases (MMCP-1 +3, MMCP-4 +18 and MMCP-5 +12). Based on their high sequence identity (88%) and high positive net charges (+18 and +18, respectively) we assume that the MMCP-4 is the mouse homolog to rat RMCP-I. Probes specific for each of these three highly homologous protease genes have been generated by subcloning of fragments of approximately 100 bp in length, originating from the 3' ends of the mRNA into plasmid vectors. Northern blot analysis of mRNA from a number of murine cell lines shows gene expression of these proteases to be specific for the differentiation stage of the mast cell. The MMCP-1 is expressed only at the mucosal mast cell stage and 5 only in mast cells of the connective tissue mast cell stage. These serine proteases may serve as highly specific markers in the analysis of mast cell heterogeneity, differentiation and function.

19.
20.

Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号