Similar articles
 Found 20 similar articles (search time: 78 ms)
1.
Ciliates are the only unicellular eukaryotes known to separate germinal and somatic functions. Diploid but silent micronuclei transmit the genetic information to the next sexual generation. Polyploid macronuclei express the genetic information from a streamlined version of the genome but are replaced at each sexual generation. The macronuclear genome of Paramecium tetraurelia was recently sequenced by a shotgun approach, providing access to the gene repertoire. The 72-Mb assembly represents a consensus sequence for the somatic DNA, which is produced after sexual events by reproducible rearrangements of the zygotic genome involving elimination of repeated sequences, precise excision of unique-copy internal eliminated sequences (IES), and amplification of the cellular genes to high copy number. We report use of the shotgun sequencing data (>10^6 reads representing 13× coverage of a completely homozygous clone) to evaluate variability in the somatic DNA produced by these developmental genome rearrangements. Although DNA amplification appears uniform, both of the DNA elimination processes produce sequence heterogeneity. The variability that arises from IES excision allowed identification of hundreds of putative new IESs, compared to 42 that were previously known, and revealed cases of erroneous excision of segments of coding sequences. We demonstrate that IESs in coding regions are under selective pressure to introduce premature termination of translation in case of excision failure.

2.
Stickler syndrome is one of the milder phenotypes resulting from mutations in the gene that encodes type-II collagen, COL2A1. All COL2A1 mutations known to cause Stickler syndrome result in the formation of a premature termination codon within the type-II collagen gene. COL2A1 has 10 in-frame CGA codons, which can mutate to TGA STOP codons via a methylation-deamination mechanism. We have analyzed these sites in genomic DNA from a panel of 40 Stickler syndrome patients to test the hypothesis that mutations that cause Stickler syndrome preferentially occur at these bases. Polymerase chain reaction (PCR) amplification of genomic DNA containing each of the in-frame CGA codons was done by one of two methods: either using primers that amplify DNA that includes the CGA codon, or using allele-specific primers that either amplify normal sequence containing a CGA codon or amplify a mutant sequence containing a TGA codon. Analysis of PCR products by restriction endonuclease digestion or sequencing demonstrated the presence of a normal or mutated codon. TGA mutations were identified in eight patients, at five of the 10 in-frame CGA codons. The identification of these mutations in eight of 40 patients demonstrates that these sites are common sites for mutations in individuals with Stickler syndrome and, we propose, should be analyzed as a first step in the search for mutations that result in this disorder.
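The idea of screening in-frame CGA codons can be illustrated with a small sketch. The code below is not the authors' procedure (they used PCR and restriction digestion or sequencing); it only shows, on a made-up toy sequence, how in-frame CGA codons might be located in a coding sequence and what the C-to-T change (CGA to TGA) looks like. The function names and the toy sequence are hypothetical.

```python
def in_frame_cga_codons(cds: str) -> list[int]:
    """Return 0-based codon indices of in-frame CGA codons in a coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return [i // 3 for i in range(0, usable, 3) if cds[i:i + 3] == "CGA"]


def mutate_cga_to_tga(cds: str, codon_index: int) -> str:
    """Return the sequence with the chosen in-frame CGA codon changed to TGA."""
    cds = cds.upper()
    start = codon_index * 3
    if cds[start:start + 3] != "CGA":
        raise ValueError(f"codon {codon_index} is not CGA")
    return cds[:start] + "T" + cds[start + 1:]


# Toy sequence (hypothetical, not COL2A1): ATG CGA GGT CCG ACG TAA.
# It also contains an out-of-frame "CGA" (spanning CCG|ACG) that is ignored.
toy_cds = "ATGCGAGGTCCGACGTAA"
sites = in_frame_cga_codons(toy_cds)
mutant = mutate_cga_to_tga(toy_cds, sites[0])
print(sites, mutant)
```

Working codon by codon rather than searching for the substring "CGA" keeps out-of-frame matches from being reported as candidate stop-codon sites.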

3.
4.
Gatto GJ  Berg JM 《Genome research》2003,13(4):617-623
The availability of complete genome sequences enables the statistical analysis of sequence features without significant database-imposed bias. The carboxyl termini of proteins often contain regions associated with protein targeting and enhanced translational termination. We analyzed the frequency of occurrence of C-terminal tripeptides in representative archaeal, bacterial, and eukaryotic genomes. The sequence distribution in prokaryotic genomes nearly matches that generated by the randomization of the observed tripeptide set. In contrast, eukaryotic genomes contain large numbers of overrepresented sequences. Some of these correspond to highly repeated sequences from either duplicated endogenous genes or transposon open reading frames. Gratifyingly, others represent previously known targeting signals or sequences associated with an increase in translational termination efficiency. However, a number of overrepresented tripeptides have not been previously noted and may represent novel functional sequences. For example, the sequence XSS may enhance translational termination efficiency in plants, whereas FWC may be a targeting or processing signal for certain amino acid permeases in yeast.
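The counting step behind such an analysis can be sketched as follows. This is a minimal illustration, assuming protein sequences are available as plain one-letter strings; it tallies C-terminal tripeptides only and does not reproduce the randomization test described in the abstract. The function name and the example sequences are hypothetical.

```python
from collections import Counter
from typing import Iterable


def c_terminal_tripeptides(proteins: Iterable[str]) -> Counter:
    """Count the last three residues of each protein sequence."""
    counts: Counter = Counter()
    for seq in proteins:
        # Drop whitespace and a trailing '*' that some files use for the stop codon.
        seq = seq.strip().rstrip("*").upper()
        if len(seq) >= 3:  # skip fragments too short to have a tripeptide
            counts[seq[-3:]] += 1
    return counts


# Hypothetical example sequences, not taken from any genome.
example = ["MKTAYSKL", "MGGSKL*", "MAFWC", "MK"]
counts = c_terminal_tripeptides(example)
print(counts.most_common())
```

Overrepresentation would then be judged by comparing these counts against a null model, for instance one built from the residue frequencies at each of the three positions.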

5.
Distinguishing regulatory DNA from neutral sites   Cited by: 14 total, 2 self-citations, 14 by others
We explore several computational approaches to analyzing interspecies genomic sequence alignments, aiming to distinguish regulatory regions from neutrally evolving DNA. Human-mouse genomic alignments were collected for three sets of human regions: (1) experimentally defined gene regulatory regions, (2) well-characterized exons (coding sequences, as a positive control), and (3) interspersed repeats thought to have inserted before the human-mouse split (a good model for neutrally evolving DNA). Models that potentially could distinguish functional noncoding sequences from neutral DNA were evaluated on these three data sets, as well as bulk genome alignments. Our analyses show that discrimination based on frequencies of individual nucleotide pairs or gaps (i.e., of possible alignment columns) is only partially successful. In contrast, scoring procedures that include the alignment context, based on frequencies of short runs of alignment columns, dramatically improve separation between regulatory and neutral features. Such scoring functions should aid in the identification of putative regulatory regions throughout the human genome.

6.
7.
BACKGROUND: Spontaneous read-through of a premature termination codon (PTC) has so far not been observed in patients carrying nonsense mutations. This report describes a patient with junctional epidermolysis bullosa who was expected to die because of compound heterozygous nonsense mutations in the gene LAMA3 (R943X/R1159X), but was rescued by spontaneous read-through of the R943X allele. RESULTS AND CONCLUSION: FACS analysis of cells carrying various PTCs surrounded by their natural neighbouring codons revealed significant reporter gene expression despite the PTC only for this patient's genetic context. Gene expression could be abolished by replacing the first or third nucleotide before, or one of the two nucleotides following the PTC. Site-directed mutagenesis was used to identify genotypes allowing PTC read-through. The genetic context of the LAMA3 mutation R943X is close to a hypothetical consensus sequence for maximum PTC read-through. Bioinformatic analysis showed that this consensus sequence is present in four sequences from the NCBI reference database, each of which contains another in-frame termination codon three or four codons apart. This indicates strong selective pressure against leaky termination codons in the human genome. This patient's mutated full length mRNA escaped nonsense-mediated decay, leading to LAMA3 mRNA levels similar to those of a healthy control, and full length laminin α3 could be detected in culture supernatant of the patient's keratinocytes. Immunofluorescence analyses of skin biopsies and continuous clinical improvement of the patient's condition suggested accumulation of intact laminin-332 in the epidermal basement membrane. These findings provide important clues for the prediction of PTC read-through in human genetic disease.

8.
DNA sequencing reveals that the genomes of the human, gorilla and chimpanzee share more than 98% homology. Comparative chromosome painting and gene mapping have demonstrated that only a few rearrangements of a putative ancestral mammalian genome occurred during great ape and human evolution. However, interspecies representational difference analysis (RDA) between human and gorilla revealed gorilla-specific DNA sequences. Cloning and sequencing of gorilla-specific DNA sequences indicate that they are repetitive elements. Gorilla-specific DNA sequences were mapped by fluorescence in-situ hybridization (FISH) to the subcentromeric/centromeric regions of three pairs of gorilla submetacentric chromosomes. These sequences could represent either ancient sequences that were lost in other species, such as human and orang-utan, or, more likely, recent sequences which evolved or originated specifically in the gorilla genome.

9.
Massively parallel (“next generation”) DNA sequencing (NGS) has quickly become the method of choice for seeking pathogenic mutations in rare uncharacterized monogenic diseases. Typically, before DNA sequencing, protein‐coding regions are enriched from patient genomic DNA, representing either the entire genome (“exome sequencing”) or selected mapped candidate loci. Sequence variants, identified as differences between the patient's and the human genome reference sequences, are then filtered according to various quality parameters. Changes are screened against datasets of known polymorphisms, such as dbSNP and the 1000 Genomes Project, in the effort to narrow the list of candidate causative variants. An increasing number of commercial services now offer to both generate and align NGS data to a reference genome. This potentially allows small groups with limited computing infrastructure and informatics skills to utilize this technology. However, the capability to effectively filter and assess sequence variants is still an important bottleneck in the identification of deleterious sequence variants in both research and diagnostic settings. We have developed an approach to this problem comprising a user‐friendly suite of programs that can interactively analyze, filter and screen data from enrichment‐capture NGS data. These programs (“Agile Suite”) are particularly suitable for small‐scale gene discovery or for diagnostic analysis.

10.
11.
The currently favored approach for sequencing the human genome involves selecting representative large-insert clones (100–200 kb), randomly shearing this DNA to construct shotgun libraries, and then sequencing many different isolates from the library. This method, entitled directed random shotgun sequencing, requires highly redundant sequencing to obtain a complete and accurate finished consensus sequence. Recently it has been suggested that a rapidly generated lower redundancy sequence might be of use to the scientific community. Low-redundancy sequencing has been examined previously using simulated data sets. Here we utilize trace data from a number of projects submitted to GenBank to perform reconstruction experiments that mimic low-redundancy sequencing. These low-redundancy sequences have been examined for the completeness and quality of the consensus product, information content, and usefulness for interspecies comparisons. The data presented here suggest three different sequencing strategies, each with different utilities. (1) Nearly complete sequence data can be obtained by sequencing a random shotgun library at sixfold redundancy. This may therefore represent a good point to switch from a random to directed approach. (2) Sequencing can be performed with as little as twofold redundancy to find most of the information about exons, EST hits, and putative exon similarity matches. (3) To obtain contiguity of coding regions, sequencing at three- to fourfold redundancy would be appropriate. From these results, we suggest that a useful intermediate product for genome sequencing might be obtained by three- to fourfold redundancy. Such a product would allow a large amount of biologically useful data to be extracted while postponing the majority of work involved in producing a high quality consensus sequence.
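For intuition about the two-, three- to four-, and sixfold redundancy levels mentioned above, a common idealized reference point is the Poisson (Lander-Waterman style) model, under which a base sequenced at redundancy c is left uncovered with probability e^(-c). The sketch below applies that textbook approximation; it is an assumption introduced here for illustration, not the empirical reconstruction analysis the abstract describes, and real shotgun data deviate from it because of cloning bias and repeats. The function name is hypothetical.

```python
import math


def expected_fraction_covered(redundancy: float) -> float:
    """Expected fraction of bases covered at least once under a Poisson model."""
    if redundancy < 0:
        raise ValueError("redundancy must be non-negative")
    # P(base uncovered) = exp(-c), so covered fraction = 1 - exp(-c).
    return 1.0 - math.exp(-redundancy)


# Redundancy levels discussed in the abstract.
for c in (2, 3, 4, 6):
    print(f"{c}x: {expected_fraction_covered(c):.4f}")
```

Under this idealization the covered fraction rises steeply at low redundancy and flattens by sixfold, which is at least consistent in spirit with the abstract's suggestion that much useful information is available well before finishing-level redundancy.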

12.
Goto  Chie  Hayakawa  Tohru  Maeda  Susumu 《Virus genes》1998,16(2):199-210
In order to characterize the genome organization of Xestia c-nigrum granulovirus (XcGV), mapping of putative XcGV genes was performed by construction of lambda and M13 phage libraries followed by Southern blot and nucleotide sequencing analyses. Mapping of the lambda (32 clones covering the entire XcGV genome) and M13 (133 clones made by random cloning) phage library clones was carried out by hybridization of the labeled lambda phage clone DNAs to 1) Southern blotted XcGV genomic DNA fragments cleaved with EcoRI, BamHI, or HindIII, and 2) dot blotted M13 clone DNAs. All 133 M13 clone DNAs were sequenced, and coding possibilities were investigated by computer-assisted homology search; in total, about 43 kb of the genome was sequenced. Amino acid sequence homology searches of 67 M13 clones suggested that these GV DNAs coded for previously characterized genes identified in nucleopolyhedroviruses (NPVs) and GVs. These 67 M13 clones were classified into 25 gene homolog groups (including 29 putative genes) based on their homologies to NPV and GV genes. The remaining M13 clones, except one that encoded a putative metalloproteinase, did not possess deduced amino acid sequences with significant homology to proteins in gene databases. Complete nucleotide sequences of the putative XcGV DNA polymerase and Ac144 homolog genes confirmed the reliability of our speculation of putative genes based on the M13 clones sequencing analysis. In a comparison of relative locations of putative XcGV genes with locations of their homologs in NPVs, most XcGV genes were mapped close to the corresponding locations in NPV genomes. These results suggested that XcGV, compared to NPVs, had relatively conserved gene arrangements, although about 22 kb of 43 kb of DNA sequenced randomly in the XcGV genome consisted of sequences/genes non-homologous to those of previously characterized NPVs. This revised version was published online in August 2006 with corrections to the Cover Date.

13.
This article describes a set of alignments of 28 vertebrate genome sequences that is provided by the UCSC Genome Browser. The alignments can be viewed on the Human Genome Browser (March 2006 assembly) at http://genome.ucsc.edu, downloaded in bulk by anonymous FTP from http://hgdownload.cse.ucsc.edu/goldenPath/hg18/multiz28way, or analyzed with the Galaxy server at http://g2.bx.psu.edu. This article illustrates the power of this resource for exploring vertebrate and mammalian evolution, using three examples. First, we present several vignettes involving insertions and deletions within protein-coding regions, including a look at some human-specific indels. Then we study the extent to which start codons and stop codons in the human sequence are conserved in other species, showing that start codons are in general more poorly conserved than stop codons. Finally, an investigation of the phylogenetic depth of conservation for several classes of functional elements in the human genome reveals striking differences in the rates and modes of decay in alignability. Each functional class has a distinctive period of stringent constraint, followed by decays that allow (for the case of regulatory regions) or reject (for coding regions and ultraconserved elements) insertions and deletions.

14.
Massive sequence comparisons as a help in annotating genomic sequences.   Cited by: 1 total, 0 self-citations, 1 by others
An all-by-all comparison of all the publicly available protein sequences from plants has been performed, followed by a clustering process. Within each of the 1064 resulting clusters (containing sequences that are orthologous as well as paralogous), the sequences have been submitted to a pyramidal classification and their domains delineated by an automated procedure à la. This process provides a means for easily checking for any apparent inconsistency in a cluster, for example, whether one sequence is shorter or longer than the others, one domain is missing, etc. In such cases, the alignment of the DNA sequence of the gene with that of a close homologous protein often reveals (in 10% of the clusters) probable sequencing errors (leading to frameshifts) or probable wrong intron/exon predictions. The composition of the clusters, their pyramidal classifications, and domain decomposition, as well as our comments when appropriate, are available from http://chlora.infobiogen.fr:1234/PHYTOPROT.

15.
Isolation of an SSAV-related endogenous sequence from human DNA   Cited by: 5 total, 0 self-citations, 5 by others

16.
A bacteriophage infecting the secondary endosymbiont of the pea aphid Acyrthosiphon pisum was isolated and characterized. The phage was tentatively named bacteriophage APSE-1, for bacteriophage 1 of the A. pisum secondary endosymbiont. The APSE-1 phage particles morphologically resembled those of species of the Podoviridae. The complete nucleotide sequence of the bacteriophage APSE-1 genome was elucidated, and its genomic organization was deduced. The genome consists of a circularly permuted and terminally redundant double-stranded DNA molecule of 36524 bp. Fifty-four open reading frames, putatively encoding proteins with molecular masses of more than 8 kDa, were distinguished. ORF24 was identified as the gene coding for the major head protein by N-terminal amino acid sequencing of the protein. Comparison of APSE-1 sequences with bacteriophage-derived sequences present in databases revealed the putative function of 24 products, including the lysis proteins, scaffolding protein, transfer proteins, and DNA polymerase. This is the first report of a phage infecting an endosymbiont of an arthropod.

17.
The linkage of disease gene mapping with DNA sequencing is an essential strategy for defining the genetic basis of a disease. New massively parallel sequencing procedures will greatly facilitate this process, although enrichment for the target region before sequencing remains necessary. For this step, various DNA capture approaches have been described that rely on sequence-defined probe sets. To avoid making assumptions on the sequences present in the targeted region, we accessed specific cytogenetic regions in preparation for next-generation sequencing. We directly microdissected the target region in metaphase chromosomes, amplified it by degenerate oligonucleotide-primed PCR, and obtained sufficient material of high quality for high-throughput sequencing. Sequence reads could be obtained from as few as six chromosomal fragments. The power of cytogenetic enrichment followed by next-generation sequencing is that it does not depend on earlier knowledge of sequences in the region being studied. Accordingly, this method is uniquely suited for situations in which the sequence of a reference region of the genome is not available, including population-specific or tumor rearrangements, as well as previously unsequenced genomic regions such as centromeres.

18.
Recently a new group of circoviruses has been detected in tissues of Barbel fish and European catfish in Hungary. In our study, eight additional fish species were screened for circovirus genomes in order to detect and characterize circoviruses. Two of these species bore circoviral sequences based on a conventional PCR assay targeting fragments of the replication-associated protein coding gene. Interestingly, the methods successfully used before failed to amplify other parts of the circular viral genome, suggesting the presence of partial, integrated genetic elements in the genome of the host. The successfully sequenced fragments of the Indian rohu (Labeo rohita) encoded mutations which may cause frameshifts or termination in the coding region described previously in other vertebrates. Phylogenetic analyses suggested that integration of the viral genetic elements might have progressed concurrently with or following the diversification of cyprinid fish. Further studies on the nature of whole circovirus genomes and integrated elements may help to understand their potential role and evolution in different fish species.

19.
20.
A random sequence survey of the genome of Trypanosoma cruzi, the agent of Chagas disease, was performed and 11,459 genomic sequences were obtained, resulting in approximately 4.3 Mb of readable sequences or approximately 10% of the parasite haploid genome. The estimated total GC content was 50.9%, with a high representation of A and T di- and trinucleotide repeats. Out of the estimated 5000 parasite genes, 947 putative new genes were identified. Another 1723 sequences corresponded to genes detected previously in T. cruzi through expressed sequence tag analysis. 7735 sequences had no matches in the database, but the presence of open reading frames that passed Fickett's test suggests that some might contain coding DNA. The survey was highly redundant, with approximately 35% of the sequences included in a few large sequence families. Some of them code for protein families present in dozens of copies, including proteins essential for parasite survival and retrotransposons. Other sequence families include repetitive DNA present in thousands of copies per haploid genome. Some families in the latter group are new, parasite-specific, repetitive DNAs. These results suggest that T. cruzi could constitute an interesting model to analyze gene and genome evolution due to its plasticity in terms of sequence amplification and divergence. Additional information can be found at http://www.iib.unsam.edu.ar/tcruzi.gss.html.
