Similar literature
1.
The aim of the present study was to generate hypotheses on the involvement of uncharacterized genes in biological processes. To this end, supervised learning was used to analyze microarray-derived time-series gene expression data. Our method was objectively evaluated on known genes using cross-validation and provided high-precision Gene Ontology biological process classifications for 211 of the 213 uncharacterized genes in the data set used. In addition, new biological process roles were hypothesized for known genes. Our method uses biological knowledge expressed by Gene Ontology and generates a rule model associating this knowledge with minimal characteristic features of temporal gene expression profiles. This model allows learning and classification of multiple biological process roles for each gene and can predict the participation of genes in a biological process even when the genes of that class exhibit a wide variety of gene expression profiles, including inverse coregulation. A considerable number of the hypothesized new roles for known genes were confirmed by literature search. In addition, many biological process roles hypothesized for uncharacterized genes were found to agree with assumptions based on homology information. To our knowledge, a gene classifier of similar scope and functionality has not been reported earlier.

2.
Gene expression data represent nonlinear interactions among genes and environmental factors. Computational analysis of these data is expected to yield knowledge of gene functions and disease mechanisms. Clustering is a classical exploratory technique for discovering similar expression patterns and functional modules. However, gene expression data are usually high-dimensional with relatively few samples, which is the main difficulty in applying clustering algorithms. Principal component analysis (PCA) is commonly used to reduce the data dimensions before clustering analysis. However, PCA estimates the similarity between expression profiles based on Euclidean distance, which cannot reveal nonlinear connections between genes. This paper uses nonlinear dimensionality reduction (NDR) as a preprocessing strategy for feature selection and visualization, and then applies clustering algorithms to the reduced feature spaces. To evaluate how effectively NDR captures biologically relevant structures, NDR and PCA are compared on five real cancer expression datasets. Results show that NDR can perform better than PCA in visualization and clustering analysis of complex gene expression data.
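The abstract does not say which NDR method was used. As a minimal sketch, assuming an Isomap-style method (one common NDR technique, chosen here only for illustration), the step that lets NDR see nonlinear structure is replacing straight-line Euclidean distance with shortest-path (geodesic) distance over a k-nearest-neighbour graph. The toy points lie on a semicircle, and the function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two profiles (what PCA implicitly preserves)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def geodesic_distances(points, k):
    """Isomap-style geodesic distances: all-pairs shortest paths over a symmetric k-NN graph."""
    n = len(points)
    inf = float("inf")
    dist = [[inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
        # Connect each point to its k nearest neighbours (excluding itself).
        nearest = sorted((euclidean(points[i], points[j]), j) for j in range(n) if j != i)[:k]
        for d, j in nearest:
            dist[i][j] = dist[j][i] = min(dist[i][j], d)
    # Floyd-Warshall: relax every path through each intermediate point m.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][m] + dist[m][j] < dist[i][j]:
                    dist[i][j] = dist[i][m] + dist[m][j]
    return dist


# Toy data: 21 points evenly spaced on a unit semicircle (a curved 1-D structure in 2-D).
n_points = 21
arc = [(math.cos(math.pi * t / (n_points - 1)), math.sin(math.pi * t / (n_points - 1)))
       for t in range(n_points)]
geo = geodesic_distances(arc, k=2)
straight = euclidean(arc[0], arc[-1])  # about 2: the endpoints look close in a straight line
along_curve = geo[0][-1]               # a little under pi: distance measured along the arc
```

A full Isomap would then apply classical multidimensional scaling to these geodesic distances and cluster in the resulting low-dimensional space; that step is left out to keep the sketch short.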

3.
Gene expression patterns vary dramatically in a tissue-specific and age-dependent manner. RNA-binding proteins that regulate mRNA turnover and/or translation (TTR-RBPs) critically affect the subsets of expressed proteins. Although many proteins implicated in age-related processes are encoded by mRNAs that are targets of TTR-RBPs, very little is known about the tissue- and age-dependent expression of TTR-RBPs in humans. Recent analysis of TTR-RBP expression using a human tissue microarray has provided interesting insight into their possible physiologic roles as a function of age. This analysis has also revealed striking discrepancies between the levels of TTR-RBPs in senescent human diploid fibroblasts (HDFs), widely used as an in vitro model of aging, and their levels in tissues from individuals of advancing age. In this article, we review our knowledge of human TTR-RBP expression in different tissues as a function of age.

4.
5.
Targeted discovery of novel human exons by comparative genomics
A complete and accurate set of human protein-coding gene annotations is perhaps the single most important resource for genomic research after the human-genome sequence itself, yet the major gene catalogs remain incomplete and imperfect. Here we describe a genome-wide effort, carried out as part of the Mammalian Gene Collection (MGC) project, to identify human genes not yet in the gene catalogs. Our approach was to produce gene predictions by algorithms that rely on comparative sequence data but do not require direct cDNA evidence, then to test predicted novel genes by RT-PCR. We have identified 734 novel gene fragments (NGFs) containing 2188 exons with, at most, weak prior cDNA support. These NGFs correspond to an estimated 563 distinct genes, of which >160 are completely absent from the major gene catalogs, while hundreds of others represent significant extensions of known genes. The NGFs appear to be predominantly protein-coding genes rather than noncoding RNAs, unlike novel transcribed sequences identified by technologies such as tiling arrays and CAGE. They tend to be expressed at low levels and in a tissue-specific manner, and they are enriched for roles in motor activity, cell adhesion, connective tissue, and central nervous system development. Our results demonstrate that many important genes and gene fragments have been missed by traditional approaches to gene discovery but can be identified by their evolutionary signatures using comparative sequence data. However, they also suggest that hundreds, not thousands, of protein-coding genes are completely missing from the current gene catalogs.

6.
The human major histocompatibility complex (MHC) genomic region at chromosomal position 6p21 encodes the six classical transplantation HLA genes and many other genes that have important roles in the regulation of the immune system as well as in some fundamental cellular processes. This small segment of the human genome has been associated with more than 100 diseases, including common diseases such as diabetes, rheumatoid arthritis, psoriasis, asthma, and various autoimmune disorders. The 3.6-Mb MHC genomic sequence was first reported in 1999 with the annotation of 224 gene loci. The locus and allelic information of the MHC continues to be updated by identifying newly mapped expressed genes and pseudogenes based on comparative genomics, SNP analysis, and cDNA projects. Since 1999, new innovations in bioinformatics, gene-specific functional databases, and studies on the MHC genes have resulted in numerous changes to gene names and in better ways to update and link the MHC gene symbols, names, and sequences together with function, variation, and disease associations. In this study, we present a brief overview of the MHC genomic structure, the recent information that we have gathered on the MHC gene loci via LocusLink at the National Center for Biotechnology Information (http://www.ncbi.nih.gov/), and the MHC genes' associations with various diseases taken from publications and records in public databases such as Online Mendelian Inheritance in Man and the Genetic Association Database.

7.
8.
Summary. Plant viruses containing a Triple Gene Block (TGB) movement protein gene cassette fall into two classes. We have shown previously that the third TGB protein (TGBp3) of beet necrotic yellow vein virus (BNYVV; Class 1) and peanut clump virus (Class 1) inhibit BNYVV intercellular movement when expressed from a co-inoculated BNYVV RNA 3-based replicon. Here we show that autonomous expression of TGBp3’s of four other Class 1 viruses of various genera also inhibits BNYVV movement. No such effect was observed for four Class 2 virus TGBp3’s, suggesting that the roles of Class 1 and 2 TGBp3’s in movement differ significantly.

9.
BACKGROUND: Text-mining has been used to link biomedical concepts, such as genes or biological processes, to each other for annotation purposes or the generation of new hypotheses. To relate two concepts to each other, several authors have used the vector space model, as vectors can be compared efficiently and transparently. Using this model, a concept is characterized by a list of associated concepts, together with weights that indicate the strength of the association. The associated concepts in the vectors and their weights are derived from a set of documents linked to the concept of interest. An important issue with this approach is the determination of the weights of the associated concepts. Various schemes have been proposed to determine these weights, but no comparative studies of the different approaches are available. Here we compare several weighting approaches in a large-scale classification experiment. METHODS: Three different techniques were evaluated: (1) weighting based on averaging, an empirical approach; (2) the log likelihood ratio, a test-based measure; (3) the uncertainty coefficient, an information-theory-based measure. The weighting schemes were applied in a system that annotates genes with Gene Ontology codes. As the gold standard for our study, we used the annotations provided by the Gene Ontology Annotation project. Classification performance was evaluated by means of the receiver operating characteristic (ROC) curve, using the area under the curve (AUC) as the measure of performance. RESULTS AND DISCUSSION: All methods performed well, with median AUC scores greater than 0.84, and scored considerably higher than a binary approach without any weighting. Performance was especially good for the more specific Gene Ontology codes. The differences between the methods were small when considering the whole experiment. However, the number of documents linked to a concept proved to be an important variable. When larger amounts of text were available for generating the concepts' vectors, the performance of the methods diverged considerably, with the uncertainty coefficient then outperforming the other two methods.
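The abstract names the weighting schemes but gives no formulas. As a minimal sketch, assuming each weight is computed from a 2x2 table of document counts (documents linked to the gene versus all others, crossed with documents that do or do not mention the associated concept), the textbook log-likelihood ratio G² and uncertainty coefficient can be written as below, with cosine similarity for comparing the resulting sparse concept vectors. The table layout and function names are illustrative assumptions, not the authors' implementation, and the averaging scheme is not sketched.

```python
import math


def entropy(counts):
    """Shannon entropy (natural log) of a list of non-negative counts."""
    total = sum(counts)
    return -sum(c / total * math.log(c / total) for c in counts if c > 0)


def uncertainty_coefficient(a, b, c, d):
    """U(X|Y) = I(X;Y) / H(X) for the 2x2 table [[a, b], [c, d]].

    Rows (X): document linked to the gene or not.
    Columns (Y): document mentions the associated concept or not.
    """
    h_x = entropy([a + b, c + d])
    h_y = entropy([a + c, b + d])
    h_xy = entropy([a, b, c, d])
    mutual_info = h_x + h_y - h_xy
    return mutual_info / h_x if h_x > 0 else 0.0


def log_likelihood_ratio(a, b, c, d):
    """G^2 = 2 * sum(O * ln(O / E)) over the four cells, E from the row/column margins."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    observed = ((a, b), (c, d))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute nothing (limit of O ln O as O -> 0)
                g2 += o * math.log(o / (rows[i] * cols[j] / n))
    return 2 * g2


def cosine(u, v):
    """Cosine similarity of two sparse concept vectors stored as {concept: weight}."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

Both association measures are zero when mentioning the concept is independent of being linked to the gene, and they grow as the association strengthens, which is what a weight in the concept vector is meant to express.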

10.
11.
Generation of a high-density rat EST map
We have developed a high-density EST map of the rat, consisting of >11,000 ESTs. These ESTs were placed on a radiation hybrid framework map of genetic markers spanning all 20 rat autosomes, plus the X chromosome. The framework maps have a total size of approximately 12,400 cR, giving an average correspondence of 240 kb/cR. The frameworks are all LOD 3 chromosomal maps consisting of 775 radiation-hybrid-mapped genetic markers and ESTs. To date, we have generated radiation-hybrid-mapping data for >14,000 novel ESTs identified by our Rat Gene Discovery and Mapping Project (http://ratEST.uiowa.edu), from which we have placed >11,000 on our framework maps. To minimize mapping errors, ESTs were mapped in duplicate and consensus RH vectors produced for use in the placement procedure. This EST map was then used to construct high-density comparative maps between rat and human and between rat and mouse. These maps will be a useful resource for positional cloning of genes for rat models of human diseases and in the creation and verification of a tiling set of map order for the upcoming rat-genome sequencing.
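As a quick arithmetic check (a multiplication of the two quoted figures, not a number stated in the abstract), the total framework-map size and the average correspondence together imply the approximate physical length spanned by the maps:

```latex
12{,}400\ \text{cR} \times 240\ \frac{\text{kb}}{\text{cR}}
  = 2{,}976{,}000\ \text{kb}
  \approx 2.98 \times 10^{6}\ \text{kb}
  \approx 2.98\ \text{Gb}
```

Because 240 kb/cR is an average over all chromosomes, this product is only an order-of-magnitude consistency check.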

12.
Constitutional obesity and mental retardation co-occur in several multiple congenital anomaly syndromes, including Prader–Willi syndrome, Bardet–Biedl syndrome, Cohen syndrome, Albright hereditary osteodystrophy, and Borjeson–Forssman–Lehmann syndrome, as well as some rarer disorders. Although hypothalamic–pituitary axis abnormalities are thought to be a possible causative mechanism in some of these disorders, current knowledge is insufficient to explain the pathophysiologic mechanism of obesity in most multiple congenital anomaly/mental retardation syndromes. The chromosomal location of many of these syndromes is known, and studies are ongoing to identify the causative genes. Further delineation of the functions of the underlying genes will likely be instructive regarding mechanisms of appetite, satiety, and obesity in the general population. This review details current knowledge of the clinical and molecular genetic findings of multiple congenital anomaly/mental retardation syndromes associated with intrinsic obesity in an effort to delineate causative mechanisms and genetic abnormalities contributing to obesity.

13.
Clustering is widely used in bioinformatics to find gene correlation patterns. Although many algorithms have been proposed, they usually have difficulty meeting the requirements of both automation and high quality. In this paper, we propose a novel algorithm for clustering genes from their expression profiles. The algorithm has two distinctive features: it takes into consideration global, rather than local, gene correlation information in the clustering process; and it incorporates clustering quality measurement into the clustering process to implement non-parametric, automatic, and globally optimal gene clustering. Evaluation on simulated and real gene data sets demonstrates the effectiveness of the algorithm.

14.
The Mediator complex functions as a control center, orchestrating diverse signaling, gene activities, and biological processes. However, how Mediator subunits determine distinct cell fates remains to be fully elucidated. Here, we show that Mediator MED23 controls the cell fate preference that directs differentiation into smooth muscle cells (SMCs) or adipocytes. Med23 deficiency facilitates SMC differentiation but represses adipocyte differentiation from multipotent mesenchymal stem cells. Gene profiling revealed that the presence or absence of Med23 oppositely regulates two sets of genes: the RhoA/MAL-targeted cytoskeleton/SMC genes and the Ras/ELK1-targeted growth/adipogenic genes. Mechanistically, MED23 favors ELK1–SRF binding to SMC gene promoters for repression, whereas the lack of MED23 favors MAL–SRF binding to SMC gene promoters for activation. Remarkably, the effect of MED23 on SMC differentiation can be recapitulated in zebrafish embryogenesis. Collectively, our data demonstrate dual, opposing roles for MED23 in regulating the cytoskeleton/SMC and growth/adipogenic gene programs, suggesting a “Yin-Yang” function in directing adipogenesis versus SMC differentiation.

15.
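Objective: To screen for genes associated with the development of ovarian serous cystadenocarcinoma using whole-genome expression microarrays, and to analyze the signal transduction pathways among the genes that may be involved in its development. Methods: Sixteen Affymetrix GeneChip Human Exon 1.0 ST Array datasets of ovarian serous cystadenocarcinoma were selected from The Cancer Genome Atlas (TCGA) database, 8 from the ovarian serous cystadenocarcinoma group and 8 from the normal group. Differentially expressed genes were screened and subjected to Gene Ontology (GO) analysis and pathway analysis; a signal transduction network among ovarian serous cystadenocarcinoma-related genes was constructed, and the genes playing key roles in the network were analyzed. Results: A total of 1,144 genes differentially expressed in ovarian cancer were identified, of which 747 were up-regulated and 397 were down-regulated. GO analysis yielded 362 significant functional terms for the up-regulated genes and 160 for the down-regulated genes (P<0.05). These included tumorigenesis-related functions such as cell cycle, DNA replication, cell proliferation, apoptosis, and cell adhesion. Pathway analysis identified 45 significantly up-regulated and 14 significantly down-regulated pathways (P<0.05). The tumorigenesis-related pathways mainly included cell cycle, the P53 signaling pathway, DNA replication, pathways in cancer, the PI3K-Akt signaling pathway, ECM-receptor interaction, cell adhesion molecules, and apoptosis. The 229 genes shared between the significant GO terms and the significant pathways were selected to construct a signal transduction network among the genes of the significant GO terms and pathways. Network analysis showed that four genes, CDK1, PLK1, MCM3, and PGK1, play important roles in the gene regulatory network of ovarian cancer. Conclusion: Ovarian serous cystadenocarcinoma has a large number of differentially expressed genes, and these genes play important regulatory roles in multiple signaling pathways closely related to tumorigenesis.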
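The abstract reports significant GO terms without naming the statistical test. A common choice for GO over-representation is the one-sided hypergeometric test; as a minimal sketch under that assumption, its upper-tail probability can be computed exactly with integer binomial coefficients. The function name is hypothetical, and every number in the usage line except the 1,144 differentially expressed genes is invented for illustration.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background (e.g. all genes measured on the array)
    K: background genes annotated to the GO term
    n: differentially expressed genes
    k: differentially expressed genes annotated to the term
    """
    upper = min(K, n)
    # Exact integer numerator; comb() returns 0 when n - i exceeds N - K.
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numerator / comb(N, n)


# Hypothetical term: 200 of 20,000 background genes annotated, 25 of them among 1,144 DE genes
# (about 11.4 would be expected by chance).
p = hypergeom_upper_tail(25, 20_000, 200, 1_144)
```

Python's arbitrary-precision integers keep the binomial sums exact even for genome-sized backgrounds; with many GO terms tested, the resulting P values would normally also be corrected for multiple testing.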

16.
The National Drug File – Reference Terminology (NDF-RT) is a large and complex drug terminology consisting of several classification hierarchies on top of an extensive collection of drug concepts. These hierarchies provide important information about clinical drugs, e.g., their chemical ingredients, mechanisms of action, dosage form, and physiological effects. Within NDF-RT, such information is represented using tens of thousands of roles connecting drugs to classifications. In previous studies, we have introduced various kinds of Abstraction Networks to summarize the content and structure of terminologies in order to facilitate their visual comprehension and support quality assurance of terminologies. However, these previous kinds of Abstraction Networks are not appropriate for summarizing the NDF-RT classification hierarchies, due to NDF-RT's unique structure. In this paper, we present the novel Ingredient Abstraction Network (IAbN) to summarize, visualize, and support the audit of NDF-RT’s Chemical Ingredients hierarchy and its associated drugs. A common theme in our quality assurance framework is to use characterizations of sets of concepts, revealed by the Abstraction Network structure, to identify concepts whose modeling is more complex than that of other concepts. For the IAbN, we characterize drug ingredient concepts as more complex if they belong to IAbN groups with multiple parent groups. We show that such concepts have a statistically significantly higher rate of errors than a control sample and identify two especially common patterns of errors.
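The complexity criterion above is simple to state in code. As a minimal sketch, assuming a hypothetical representation of an IAbN as two mappings (concept to group, group to parent groups), the first function flags the "complex" concepts. The abstract does not name the significance test used for the error-rate comparison; the second function swaps in a standard one-sided pooled two-proportion z-test purely as an illustration.

```python
import math


def complex_concepts(concept_group, group_parents):
    """Concepts whose IAbN group has more than one parent group.

    concept_group: concept -> group id; group_parents: group id -> set of parent group ids.
    Both mappings are a hypothetical stand-in for a real IAbN data structure.
    """
    return sorted(c for c, g in concept_group.items() if len(group_parents.get(g, ())) > 1)


def one_sided_two_proportion_z(errors_a, n_a, errors_b, n_b):
    """Pooled two-proportion z-test of H1: error rate of sample A > error rate of sample B."""
    pooled = (errors_a + errors_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("pooled error rate is 0 or 1; the z-test is undefined")
    z = (errors_a / n_a - errors_b / n_b) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper tail of the standard normal
    return z, p_value
```

In an audit, sample A would be a reviewed sample of the flagged concepts and sample B the control sample; the counts passed in would come from the expert review.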

17.
Channels and developmental genes are among the key molecular players in the human central nervous system (CNS). Mutations in these genes often cause monogenic neurological disease, and interspecies comparisons have shown reduced divergence. On the other hand, accelerated evolution of genes with roles in neurotransmission and development has indicated widespread positive selection in hominids. In the present study, we hypothesized that recombination hotspots could be enriched at genes with particularly important roles in the CNS, because at those loci beneficial mutations may occur on a highly constrained background, and increased recombination could consequently promote their fixation. To test this hypothesis, we retrieved CNS genes based on keyword search, expression data, and expert knowledge. Consistent with our hypothesis, we find an enrichment of hotspot predictions around genes that are retrieved by all three strategies. Moreover, when comparing human genes based on their Gene Ontology annotations, we find hotspot predictions preferentially located around channels and neurodevelopmental genes. Taken together with the distinct sequence evolution reported by comparative genomic studies, this finding indicates continued positive selection at many CNS gene loci. In support of this interpretation, we also find an enrichment of recombination hotspot predictions around conserved noncoding regions that were reported to display a signature of accelerated evolution in the human lineage. Widespread positive selection acting on CNS gene loci could relate to the high prevalence of human nervous system disorders with genetically complex inheritance, potentially under an ancestral susceptibility allele model.

18.
The Foundational Model of Anatomy (FMA) ontology is a domain reference ontology based on a disciplined modeling approach. Due to its large size, semantic complexity, and manual data entry process, errors and inconsistencies are unavoidable and might remain within the FMA structure without detection. In this paper, we present computable methods to highlight candidate concepts for various relationship assignment errors. The process starts by locating structures formed by transitive structural relationships (part_of, tributary_of, branch_of) and examining their assignments in the context of the IS-A hierarchy. The algorithms were designed to detect five major categories of possible incorrect relationship assignments: circular, mutually exclusive, redundant, inconsistent, and missed entries. A domain expert reviewed samples of these presumptive errors to confirm the findings. Seven thousand and fifty-two presumptive errors were detected, the largest proportion of them related to part_of relationship assignments. The results highlight the fact that errors are unavoidable in complex ontologies and that well-designed algorithms can help domain experts focus on concepts with a high likelihood of error and make the most of their effort to ensure consistency and reliability. In the future, similar methods might be integrated with data entry processes to offer real-time error detection.
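Two of the five error categories follow directly from transitivity. As a minimal sketch, assuming a relation such as part_of is given as (child, parent) pairs, a depth-first search reports circular assignments as back edges, and a direct edge is redundant when its target is still reachable without it. This is a generic graph-algorithm illustration, not the authors' algorithms; it ignores the IS-A context they use, and the example edges are hypothetical.

```python
from collections import defaultdict


def find_cycles(edges):
    """Circular assignments: return each cycle found in a (child, parent) relation."""
    graph = defaultdict(list)
    for child, parent in edges:
        graph[child].append(parent)
    WHITE, GREY, BLACK = 0, 1, 2           # unvisited / on current DFS path / finished
    color = defaultdict(int)
    path, cycles = [], []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GREY:         # back edge: nxt is already on the current path
                cycles.append(path[path.index(nxt):] + [nxt])
            elif color[nxt] == WHITE:
                visit(nxt)
        path.pop()
        color[node] = BLACK

    for node in list(graph):               # snapshot: visit() may add leaf keys to graph
        if color[node] == WHITE:
            visit(node)
    return cycles


def redundant_edges(edges):
    """Redundant assignments: direct edges already implied by transitivity."""
    graph = defaultdict(set)
    for child, parent in edges:
        graph[child].add(parent)

    def reachable_without(src, dst):
        # Depth-first search from src that ignores the direct edge src -> dst.
        seen, todo = set(), [src]
        while todo:
            node = todo.pop()
            for nxt in graph.get(node, ()):
                if (node, nxt) == (src, dst) or nxt in seen:
                    continue
                if nxt == dst:
                    return True
                seen.add(nxt)
                todo.append(nxt)
        return False

    return [(c, p) for c, p in edges if reachable_without(c, p)]


# Hypothetical part_of assertions, for illustration only.
part_of = [("cusp", "valve"), ("valve", "heart"), ("cusp", "heart"),
           ("atrium", "chamber"), ("chamber", "atrium")]
```

Here find_cycles(part_of) reports the atrium/chamber loop, and redundant_edges(part_of) reports ("cusp", "heart"), which is already implied by cusp part_of valve part_of heart.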

19.
Research in Microbiology, 2017, 168(6): 503-514
Measuring gene expression at the single cell and single molecule level has recently made possible the quantitative measurement of stochasticity of gene expression. This enables identification of the probable sources and roles of noise. Gene expression noise can result in bacterial population heterogeneity, offering specific advantages for fitness and survival in various environments. This trait is therefore selected during the evolution of the species, and is consequently regulated by a specific genetic network architecture. Examples exist in stress-response mechanisms, as well as in infection and pathogenicity strategies, pointing to advantages for multicellularity of bacterial populations.
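The abstract does not name the noise metrics. As a minimal sketch, assuming single-cell data arrive as per-cell molecule counts, two standard summaries of expression noise are the Fano factor (variance over mean, equal to 1 for Poisson expression) and the squared coefficient of variation (variance over mean squared). The function name and the counts in the usage note are hypothetical.

```python
def noise_stats(counts):
    """Return (mean, variance, Fano factor, CV^2) for per-cell molecule counts."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((x - mean) ** 2 for x in counts) / n  # population variance
    fano = var / mean       # 1 for a Poisson process; > 1 suggests bursty expression
    cv2 = var / mean ** 2   # squared coefficient of variation, often written eta^2
    return mean, var, fano, cv2
```

For hypothetical counts [2, 4, 4, 4, 5, 5, 7, 9], the mean is 5 and the variance 4, giving a Fano factor of 0.8 and a CV² of 0.16. The Fano factor is convenient for comparison against a Poisson baseline, while CV² is dimensionless and easier to compare across genes with different expression levels.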

20.
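Objective: To reveal, at the molecular level, the pathogenesis of Parkinson's disease (PD) caused by the G2019S mutation in the leucine-rich repeat kinase 2 (LRRK2) gene, and to provide new ideas for clinical diagnosis and treatment. Methods: Microarray data on LRRK2 G2019S-mutation PD (GSE22491), comprising 10 LRRK2 (G2019S) mutation PD samples and 8 normal control samples, were downloaded from the public Gene Expression Omnibus (GEO) database. The differentially expressed genes of LRRK2 G2019S-mutation PD were analyzed bioinformatically using Qlucore Omics Explorer (QOE) 3.0 software and online tools such as DAVID and STRING. Results: QOE 3.0 analysis identified 1,752 differentially expressed genes in LRRK2 G2019S-mutation PD, of which 191 were up-regulated and 1,561 were down-regulated. Bioinformatic analysis showed that genes such as SKP2, RBX1, SKP1, CUL1, and CUL4A, together with the ribosome, oxidative phosphorylation, proteasome, leukocyte transendothelial migration, pentose phosphate pathway, citrate cycle, and Fcγ receptor (FcγR)-mediated phagocytosis pathways, may play important roles in the onset and progression of LRRK2 G2019S-mutation PD. Conclusion: Bioinformatic analysis of microarray data on LRRK2 G2019S-mutation PD suggests that the disease results from the interaction of multiple genes and multiple molecular mechanisms; further analysis of these mechanisms should help reveal the pathogenesis of LRRK2 G2019S-mutation PD.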


Copyright © Beijing Qinyun Technology Development Co., Ltd. ICP registration no. 京ICP备09084417号