Similar literature
 20 similar articles found (search time: 15 ms)
1.
Multiple sequence alignment: in pursuit of homologous DNA positions   Total citations: 2 (self-citations: 0, citations by others: 2)
DNA sequence alignment is a prerequisite to virtually all comparative genomic analyses, including the identification of conserved sequence motifs, estimation of evolutionary divergence between sequences, and inference of historical relationships among genes and species. While it is mere common sense that inaccuracies in multiple sequence alignments can have detrimental effects on downstream analyses, it is important to know the extent to which the inferences drawn from these alignments are robust to errors and biases inherent in all sequence alignments. A survey of investigations into strengths and weaknesses of sequence alignments reveals, as expected, that alignment quality is generally poor for two distantly related sequences and can often be improved by adding further sequences as stepping stones between distantly related species. Errors in sequence alignment are also found to have a significant negative effect on subsequent inference of sequence divergence, phylogenetic trees, and conserved motifs. However, our understanding of alignment biases remains rudimentary, and sequence alignment procedures continue to be used somewhat like benign formatting operations to make sequences equal in length. Because of the central role these alignments now play in our endeavors to establish the tree of life and to identify important parts of genomes through evolutionary functional genomics, we see a need for increased community effort to investigate influences of alignment bias on the accuracy of large-scale comparative genomics.

2.
Knowledge of the origin and evolution of viruses could provide a better understanding of a number of phenomena in the field of evolution, such as the origin and development of multi-cellular organisms, the rapid diversification of species over the last 600-700 million years, and the lack of transitional forms in the evolution of species ("missing links"). One of the possible effects of escaped DNA/RNA sequences or viruses on the evolution of multi-cellular organisms, especially vertebrates, could be the horizontal transmission and dissemination of genes. If so, this effect could be considered a model of primeval, natural genetic engineering. Other possible links between the evolution of multi-cellular organisms and viruses stem from the fact that viruses are a source of different forms of selective pressure, such as epidemics of infectious diseases, autoimmunity, malignant alteration, and reproductive efficiency. At the same time, these two models of "long-term evolutionary relations" could represent "key factors" in the evolutionary relationship between viruses and multi-cellular organisms. The capability of a genome to produce and emit DNA/RNA sequences or de novo created viruses, which can act as vectors of horizontal gene transmission and/or exert selective pressure on competing or predator species, gives viruses a new characteristic: the possibility of acting as natural biological weapons. Finally, the possible evolutionary advantages of this genome capability could be one explanation for phenomena such as genome instability and the genome's ability to emit DNA/RNA sequences and/or de novo created viruses, as well as for the evolutionary conservation of this unique phenomenon.

3.
Cellular automata are introduced as a model for DNA structure, function and evolution. DNA is modeled as a one-dimensional cellular automaton with four states per cell. These states are the four DNA bases A, C, T and G. The four states are represented by numbers of the quaternary number system. Linear evolution rules, represented by square matrices, are considered. Based on this model a simulator of DNA evolution is developed and simulation results are presented. This simulator has a user-friendly input interface and can be used for the study of DNA evolution.

4.
The family Sarcocystidae contains a wide variety of parasitic protozoa, some of which are important pathogens of livestock and humans. The taxonomic relationships between two of the genera in this family (Toxoplasma and Sarcocystis) have been debated for a number of years and remain controversial. Recent studies, from comparisons of 18S rDNA-sequence data, have suggested that Sarcocystis is paraphyletic, although a hypothesis supporting monophyly of Sarcocystis could not be rejected. The present study shows that the phylogenetically informative nucleotide positions within the 18S rDNA are primarily located in the regions that make up the helices in the secondary structure of the 18S rRNA. A phylogenetic analysis of 18S rDNA-sequence data aligned by secondary structure constraints, or of a subset of the data corresponding to all nucleotides found in the helices, provides unambiguous evidence supporting monophyly of Sarcocystis.

5.
Thousands of species will be sequenced in the next few years; however, understanding how their genomes work, without an unlimited budget, requires both molecular and novel evolutionary approaches. We developed a sensitive sequence alignment pipeline to identify conserved noncoding sequences (CNSs) in the Andropogoneae tribe (multiple crop species descended from a common ancestor ∼18 million years ago). The Andropogoneae share similar physiology while being tremendously genomically diverse, harboring a broad range of ploidy levels, structural variation, and transposons. These contribute to the potential of Andropogoneae as a powerful system for studying CNSs and are factors we leverage to understand the function of maize CNSs. We found that 86% of CNSs were comprised of annotated features, including introns, UTRs, putative cis-regulatory elements, chromatin loop anchors, noncoding RNA (ncRNA) genes, and several transposable element superfamilies. CNSs were enriched in active regions of DNA replication in the early S phase of the mitotic cell cycle and showed different DNA methylation ratios compared to the genome-wide background. More than half of putative cis-regulatory sequences (identified via other methods) overlapped with CNSs detected in this study. Variants in CNSs were associated with gene expression levels, and CNS absence contributed to loss of gene expression. Furthermore, the evolution of CNSs was associated with the functional diversification of duplicated genes in the context of maize subgenomes. Our results provide a quantitative understanding of the molecular processes governing the evolution of CNSs in maize.

The genomes of a million eukaryote species will likely be sequenced within the next decade (Lewin et al. 2018), but understanding how these genomes work without ENCODE-scale projects and data (The ENCODE Project Consortium 2012) for each species will require that we also use evolutionary approaches to identify key functional regions. In general, noncoding sequences occupy a larger portion of the genome than coding regions. Most genome-wide association hits have been reported to be located in the noncoding regions in, for example, maize and humans, and are enriched in putative gene expression regulatory sequences (Wallace et al. 2014; Zhang and Lupski 2015; Nishizaki and Boyle 2017; Giral et al. 2018). Comparison of noncoding sequences across species can identify regions under purifying selection to reveal functional constraint (Guo and Moose 2003; Vandepoele et al. 2006; Haudry et al. 2013; Finucane et al. 2015; Polychronopoulos et al. 2017; Xiang et al. 2019). However, detection of conserved noncoding sequences in plants is an ongoing challenge (Van de Velde et al. 2016), receiving extensive recent attention in a broad range of species (Inada et al. 2003; Freeling and Subramaniam 2009; Algama et al. 2017; Polychronopoulos et al. 2017; Xie et al. 2018). A genome-wide comparison of features of putative functional elements (Zhang et al. 2012; Rodgers-Melnick et al. 2016; Oka et al. 2017; Wang et al. 2017b; Li et al. 2019; Lu et al. 2019; Ricci et al. 2019; Tu et al. 2020) with CNSs could provide new insight into understudied noncoding fractions of the genome.

Genomes of the grass tribe Andropogoneae provide a valuable and powerful system for the study of conserved sequences. Species of the Andropogoneae tribe have diverged in a relatively short time frame, sharing a common ancestor ∼16–20 million years ago (Vicentini et al. 2008). Andropogoneae species include maize, sorghum, sugarcane, and silvergrass, some of the most productive grain, sugar, and biofuel crops worldwide (Manners 2011; Brosse et al. 2012). Andropogoneae species share the NADP-ME C4 photosynthesis system (Black et al. 1969; Sage and Zhu 2011) and similar development patterns, whereas their genomes are highly diverse with frequent polyploidization (Estep et al. 2014) and extremely active transposable elements (TEs) (Ramachandran et al. 2020). Nevertheless, despite rapid sequence turnover elsewhere in Andropogoneae genomes, functional sequences are expected to be under purifying selection, making the tribe an ideal system in which to identify and understand the role of CNSs.

6.
DNA sequencing reveals that the genomes of the human, gorilla and chimpanzee share more than 98% homology. Comparative chromosome painting and gene mapping have demonstrated that only a few rearrangements of a putative ancestral mammalian genome occurred during great ape and human evolution. However, interspecies representational difference analysis (RDA) between human and gorilla revealed gorilla-specific DNA sequences. Cloning and sequencing of these gorilla-specific DNA sequences indicate that they are repetitive elements. The gorilla-specific DNA sequences were mapped by fluorescence in-situ hybridization (FISH) to the subcentromeric/centromeric regions of three pairs of gorilla submetacentric chromosomes. These sequences could represent either ancient sequences that were lost in other species, such as human and orang-utan, or, more likely, recent sequences which evolved or originated specifically in the gorilla genome.

7.
MAVID: constrained ancestral alignment of multiple sequences   Total citations: 11 (self-citations: 7, citations by others: 11)
Bray N, Pachter L. Genome Research, 2004, 14(4): 693-699
We describe a new global multiple-alignment program capable of aligning a large number of genomic regions. Our progressive-alignment approach incorporates the following ideas: maximum-likelihood inference of ancestral sequences, automatic guide-tree construction, protein-based anchoring of ab-initio gene predictions, and constraints derived from a global homology map of the sequences. We have implemented these ideas in the MAVID program, which is able to accurately align multiple genomic regions up to megabases long. MAVID is able to effectively align divergent sequences, as well as incomplete unfinished sequences. We demonstrate the capabilities of the program on the benchmark CFTR region, which consists of 1.8 Mb of human sequence and 20 orthologous regions in marsupials, birds, fish, and mammals. Finally, we describe two large MAVID alignments, an alignment of all the available HIV genomes and a multiple alignment of the entire human, mouse, and rat genomes.

8.
Hepatitis C virus (HCV) genotyping of samples from 184 patients with chronic HCV infection by the Trugene 5'NC genotyping kit, based on sequence analysis of the 5' noncoding region (5' NCR), and the InnoLiPA assay was evaluated. In addition to these methods, the 184 samples were also analyzed by sequencing of part of the NS5B of the HCV genome after in-house PCR amplification, as a means of validating results obtained with the 5' NCR. The distribution of the genotypes typed by NS5B sequence analysis was as follows: 1a, 41 samples; 1b, 58 samples; 1d, 1 sample; 2a, 5 samples; 2b, 2 samples; 2c, 7 samples; 3a, 46 samples; 4a, 7 samples; 4c, 1 sample; 4e, 9 samples; 5a, 6 samples; 6a, 1 sample. The Trugene and InnoLiPA assays gave concordant results within HCV types in 100% of cases. The ability to discriminate at the subtype level was 76 and 74% for the Trugene and the InnoLiPA assays, respectively.
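As a quick consistency check on the subtype distribution reported above, the counts can be tallied and grouped by HCV type. This is only an arithmetic sketch over the figures quoted in the abstract; the variable and function names are illustrative, and grouping by the leading digit of the subtype label is an assumption about how types relate to subtypes.

```python
# Subtype counts as listed in the abstract (NS5B sequence analysis).
ns5b_counts = {
    "1a": 41, "1b": 58, "1d": 1,
    "2a": 5, "2b": 2, "2c": 7,
    "3a": 46,
    "4a": 7, "4c": 1, "4e": 9,
    "5a": 6,
    "6a": 1,
}


def total_samples(counts):
    """Sum the per-subtype sample counts."""
    return sum(counts.values())


def counts_by_type(counts):
    """Group subtype counts by HCV type (the leading digit of the label)."""
    grouped = {}
    for subtype, n in counts.items():
        hcv_type = subtype[0]
        grouped[hcv_type] = grouped.get(hcv_type, 0) + n
    return grouped


if __name__ == "__main__":
    print("total:", total_samples(ns5b_counts))
    for hcv_type, n in sorted(counts_by_type(ns5b_counts).items()):
        print(f"type {hcv_type}: {n} samples")
```

The subtype counts add up to the 184 samples stated at the start of the abstract.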

9.
10.
Endothelial and smooth muscle dysfunctions are widely implicated in the pathogenesis of atherosclerosis. Modern mechanical and pharmacologic treatments aim to remodel abnormalities of the vessel intima and media. We hypothesize that adventitial dysfunction comprises the dominant source of atherosclerosis by originating many endothelial and smooth muscle abnormalities. The autonomic nervous system innervates the adventitia, and autonomic dysfunction induces many end-organ dysfunctions including inflammation and thrombosis. The link between diabetes and atherosclerosis may operate through adventitial autonomic neuropathy. Smoking may promote atherosclerosis by inducing adventitial autonomic dysfunction related to nicotine-mediated compensatory upregulation of sympathetic bias independent of endothelial injury induced by purported tobacco toxins. While hypertension is thought to cause atherosclerosis, the two conditions may instead represent independent consequences of autonomic dysfunction. The link between aging and atherosclerosis may operate through adventitial dysfunction induced by autonomic dysregulations. Exercise may ameliorate atherosclerosis by restoring adventitial autonomic function, thereby normalizing adventitial regulation of medial and intimal biology. Feed-forward adventitial vascular baroreceptor and chemoreceptor dysregulation may further exacerbate atherosclerosis as intimal plaque interferes with these sensors. Since penetrating external physical injury likely represented a dominant selective force during evolution, the adventitia may be preferentially equipped with sensors and response systems for vessel trauma. The convergent response of adrenergia, inflammation, and coagulation, which is adaptive for physical trauma, may be maladaptive today when different stressors trigger the cascade. Endoluminal therapies including atherectomy, angioplasty, and stent deployment involve balloon expansion that traumatizes all layers of the vessel wall. 
These interventions may paradoxically reinitiate the cascade of atherogenesis that begins with adventitial dysfunction and leads to restenosis. Methods to reduce adventitial trauma, a maladaptive trigger of adventitial dysfunction, may reduce the risk of restenosis. We envision novel mechanical and biopharmaceutical solutions that target the adventitia to prevent or treat atherosclerosis including novel drug delivery strategies, exo-stents that wrap vessels, and neuromodulation of vessels.

11.
Summary. Cladograms of iridoviruses were inferred from bootstrap analysis of molecular data sets comprising all published protein and DNA sequences of the major capsid protein, ATPase and DNA polymerase genes of members of the family Iridoviridae. All data sets yielded cladograms supporting the separation of the Iridovirus, Ranavirus and Lymphocystivirus genera, and the cladogram based on data derived from major capsid proteins further divided both the Iridovirus and Ranavirus genera into two groups. Tests of alternative hypotheses of topological constraints were also performed to further investigate relationships between infectious spleen and kidney necrosis virus (ISKNV), an unclassified fish iridovirus for which complete genome sequence data are available, and other iridoviruses. The inferred cladograms and the results of Shimodaira–Hasegawa tests indicated that ISKNV is more closely related to the Ranavirus genus than it is to the other genera of the family.
Received November 21, 2002; accepted June 9, 2003; published online August 18, 2003

12.
Based on a digital signal method, we propose a new representation of the DNA primary sequence. The representation completely avoids loss of information in the transfer of data from a DNA sequence to its mathematical representation. We then suggest an approach to quantifying similarities based on digital signal similarity theory. An examination of the similarities/dissimilarities among the coding sequences of the first exon of the β-globin gene of 11 species shows the utility of the scheme.

13.
In this work, we deal with temporal abstraction of clinical data. Abstractions are, for example, blood pressure state (e.g. normal, high, low) and trend (e.g. increasing, decreasing and stationary) over time intervals. The goal of our work is to provide clinicians with automatic tools to extract high-level, concise, important features of available collections of time-stamped clinical data. This capability is especially important when the available collections constantly increase in size, as in long-term clinical follow-up, leading to information overload. The approach we propose exploits the integration of the deductive and object-oriented approaches in clinical databases. The main result of this work is an object-oriented data model based on the event calculus to support temporal abstraction. The proposed approach has been validated by building the CARDIOTABS system for the abstraction of clinical data collected during echocardiographic tests.
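The state and trend abstractions mentioned in this abstract can be sketched in a few lines: time-stamped values are labelled, and consecutive samples with the same label are merged into intervals. This is a deliberately simple procedural illustration, not the event-calculus data model or the CARDIOTABS system; the thresholds (90 and 140) and the trend tolerance (5) are hypothetical values chosen only for the example.

```python
def state_of(value, low=90, high=140):
    """Label a single measurement; the thresholds are hypothetical."""
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "normal"


def abstract_states(samples, low=90, high=140):
    """Merge time-ordered (time, value) samples into (start, end, state) intervals."""
    intervals = []
    for time, value in samples:
        state = state_of(value, low, high)
        if intervals and intervals[-1][2] == state:
            # Same state as the previous sample: extend the open interval.
            intervals[-1] = (intervals[-1][0], time, state)
        else:
            intervals.append((time, time, state))
    return intervals


def abstract_trends(samples, tolerance=5):
    """Merge consecutive sample pairs into (start, end, trend) intervals.

    A change larger than `tolerance` (hypothetical) counts as a trend.
    """
    intervals = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        delta = v1 - v0
        if delta > tolerance:
            trend = "increasing"
        elif delta < -tolerance:
            trend = "decreasing"
        else:
            trend = "stationary"
        if intervals and intervals[-1][2] == trend:
            intervals[-1] = (intervals[-1][0], t1, trend)
        else:
            intervals.append((t0, t1, trend))
    return intervals


if __name__ == "__main__":
    pressure = [(0, 120), (1, 125), (2, 150), (3, 160), (4, 130)]
    print(abstract_states(pressure))
    print(abstract_trends(pressure))
```

Keeping states and trends as separate interval lists lets a caller query either abstraction independently of the raw samples.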

14.
As genomes evolve, they undergo large-scale evolutionary processes that present a challenge to sequence comparison not posed by short sequences. Recombination causes frequent genome rearrangements, horizontal transfer introduces new sequences into bacterial chromosomes, and deletions remove segments of the genome. Consequently, each genome is a mosaic of unique lineage-specific segments, regions shared with a subset of other genomes and segments conserved among all the genomes under consideration. Furthermore, the linear order of these segments may be shuffled among genomes. We present methods for identification and alignment of conserved genomic DNA in the presence of rearrangements and horizontal transfer. Our methods have been implemented in a software package called Mauve. Mauve has been applied to align nine enterobacterial genomes and to determine global rearrangement structure in three mammalian genomes. We have evaluated the quality of Mauve alignments and drawn comparison to other methods through extensive simulations of genome evolution.

15.
Active Motif Finder (AMF) is a novel algorithmic tool designed around mutations in DNA sequences. Currently available motif-finding tools are based on matching a given motif in the query sequence. AMF implements a new algorithm that identifies occurrences of patterns carrying all kinds of mutations, such as insertions, deletions and mismatches. The algorithm is mainly based on computing an Alignment Score Matrix (ASM) by comparing the input motif with the full-length sequence. Much of the effort in bioinformatics is directed at identifying these motifs in the sequences of newly discovered genes. The proposed tool serves as an open resource for analysis and is useful for studying polymorphisms in DNA sequences. AMF can be searched via a user-friendly interface. This tool is intended to serve the scientific community working in the areas of chemical and structural biology, and is freely available to all users at http://www.sastra.edu/scbt/amf/.
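The general idea of locating a motif while tolerating insertions, deletions and mismatches can be illustrated with a standard dynamic-programming approximate-matching routine (Sellers-style edit distance against every substring). The abstract does not describe how AMF computes its Alignment Score Matrix, so the sketch below is a swapped-in textbook technique with unit edit costs, not the AMF algorithm; the function name and the `max_edits` parameter are hypothetical.

```python
def find_motif(motif, sequence, max_edits=1):
    """Report approximate occurrences of `motif` in `sequence`.

    Returns a list of (end_position, edits) pairs, where end_position is the
    1-based index of the last sequence character of an occurrence and edits is
    the minimum number of mismatches, insertions and deletions needed.
    """
    m = len(motif)
    # prev[i]: fewest edits to align motif[:i] with some substring that ends
    # at the previous sequence position (initially the empty prefix).
    prev = list(range(m + 1))
    hits = []
    for position, base in enumerate(sequence, start=1):
        curr = [0]  # an occurrence may start at any position, at no cost
        for i in range(1, m + 1):
            cost = 0 if motif[i - 1] == base else 1
            curr.append(min(
                prev[i - 1] + cost,  # match or mismatch
                prev[i] + 1,         # extra base in the sequence (insertion)
                curr[i - 1] + 1,     # motif base missing from the sequence (deletion)
            ))
        if curr[m] <= max_edits:
            hits.append((position, curr[m]))
        prev = curr
    return hits


if __name__ == "__main__":
    print(find_motif("ACGT", "TTACGTTT", max_edits=0))
    print(find_motif("ACGT", "TTACGTTT", max_edits=1))
```

Only two columns of the score matrix are kept at a time, so memory grows with the motif length rather than with the length of the full sequence. Neighbouring end positions around a true occurrence are also reported when edits are allowed, so a caller would typically keep the lowest-cost hit in each cluster.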

16.
Differentiation of Candida albicans and the recently described C. dubliniensis has proven difficult due to the high degree of phenotypic similarity of these species. The present study examines sequence variations in the ribosomal DNA (rDNA) internal transcribed spacer (ITS) regions of C. albicans (n = 5) and C. dubliniensis (n = 7) strains, with a view to identifying sequence differences that would enable consistent differentiation of these two species by restriction fragment length polymorphism (RFLP) analysis. The ITS1 and ITS2 regions, together with the entire 5.8S rRNA gene of the strains, were amplified by the polymerase chain reaction (PCR), using primers ITS1 and ITS4. PCR products from both C. albicans and C. dubliniensis were of similar size (around 540 bp); however, sequence analysis revealed over 20 consistent base differences between the products of the two species. On the basis of sequence variation, the restriction enzyme MspA1 I was selected and used to differentiate the PCR products of C. albicans and C. dubliniensis by RFLP analysis. MspA1 I yielded two discernible fragments from C. albicans PCR products, whilst those from C. dubliniensis appeared undigested, thereby providing an approach to differentiate the two species.

17.
Bonner T I, Todaro G J. Virology, 1980, 103(1): 217-227
The cellular DNAs of species distantly related to the baboon have been tested for the presence of sequences related to baboon endogenous type C virus. Hybrids between viral cDNA and cellular DNA were detected using very low stringency hydroxyapatite conditions. The results confirm and extend the observation that the related sequences are more conserved in African primates than in non-African primates. Our data suggest that this result is due to unusual conservation of viral sequences in the African primates. Our assay is sufficiently sensitive to detect distantly related sequences in mouse type C viruses and in other primate endogenous viruses, including both the type C virus isolated from macaques (MAC-1) and the type D virus isolated from langurs (LAD-1). However, using baboon viral cDNA, we cannot distinguish human DNA from the DNA of several non-primate mammalian species.

18.
A complete Human polyomavirus 9 (HPyV9) genome, designated HPyV9 UF-1, was amplified by rolling circle DNA amplification from DNA extracted from the peripheral blood mononuclear cells (PBMC) of an AIDS patient. The noncoding control (enhancer/promoter) region (NCCR) of HPyV9 UF-1 has one fewer AML-1a binding site and three more potential Sp1/GC box binding sites than the NCCRs of two previously described HPyV9 genomes. Nucleotide polymorphisms within the coding regions result in two amino acid differences in the deduced VP2 and VP3 proteins of HPyV9 UF-1 relative to those of the two previously described HPyV9 genomes. Exhaustive attempts to detect HPyV9 in DNA samples extracted from the PBMC of 40 healthy humans and 9 other AIDS patients were unsuccessful, highlighting the need for improved search strategies and optimal specimens for the detection of HPyV9 in humans.

19.
The nuclear ribosomal DNA (rDNA) region spanning the first (ITS-1) and second (ITS-2) internal transcribed spacers was sequenced for 15 taxa of ascaridoid nematodes. The length of the ITS-1 and ITS-2 sequences in the 15 taxa ranged from 392–500 bp and 240–348 bp, respectively. While nucleotide variation of 0–2.9% in the ITS-1 and/or ITS-2 sequences was detected within taxa where multiple samples were sequenced, a significantly higher level of nucleotide difference (9.4–66.6%) was detected between the taxa, except for Ascaris suum and A. lumbricoides, whose taxonomic status remains uncertain. These interspecific differences were linked with the considerable size differences (0–108 bp) in the rDNA spacers. Phenograms based on the genetic differences among the 15 taxa showed some concordance with previous classification schemes derived from morphological data.
Received: 29 January 2000 / Accepted: 16 March 2000
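Percent nucleotide differences of the kind reported in this abstract are typically computed from a pairwise alignment. The sketch below shows one common convention, the proportion of differing sites among aligned columns with gap columns skipped. How the study itself treated gaps is not stated in the abstract, so the gap handling, the function name and the example sequences are all assumptions for illustration.

```python
def percent_difference(aligned_a, aligned_b, gap="-"):
    """Percentage of differing sites between two aligned sequences.

    Columns containing a gap in either sequence are skipped (an assumed
    convention; counting gaps as differences would give larger values).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    differing = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * differing / compared


if __name__ == "__main__":
    # Hypothetical aligned fragments, not data from the study.
    print(percent_difference("ACGTACGTAC", "ACGTTCGTAA"))
    print(percent_difference("AC-T", "ACGA"))
```

Because spacer lengths differ by up to 108 bp between taxa, the choice of gap convention could change the reported percentages noticeably, which is why it is exposed here as an explicit, documented assumption.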

20.
Rabies, an acute progressive encephalomyelitis caused by viruses in the genus Lyssavirus, is one of the oldest known infectious diseases. Although dogs and other carnivores represent the greatest threat to public health as rabies reservoirs, it is commonly accepted that bats are the primary evolutionary hosts of lyssaviruses. Despite early historical documentation of rabies, molecular clock analyses indicate a quite young age of lyssaviruses, which is confusing. For example, the results obtained for partial and complete nucleoprotein gene sequences of rabies viruses (RABV), or for a limited number of glycoprotein gene sequences, indicated that the time of the most recent common ancestor (TMRCA) for current bat RABV diversity in the Americas lies in the seventeenth to eighteenth centuries and might be directly or indirectly associated with the European colonization. Conversely, several other reports demonstrated high genetic similarity between lyssavirus isolates, including RABV, obtained within a time interval of 25–50 years. In the present study, we attempted to re-estimate the age of several North American bat RABV lineages based on the largest set of complete and partial glycoprotein gene sequences compiled to date (n = 201) employing a codon substitution model. Although our results overlap with previous estimates in marginal areas of the 95% highest posterior density (HPD), they suggest a longer evolutionary history of American bat RABV lineages (TMRCA at least 732 years, with a 95% HPD of 436–1107 years).


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号