Similar literature
 20 similar articles found.
1.
Although microarray technology has emerged as a powerful tool to explore expression levels of thousands of genes or even complete genomes after exposure to toxicants, the functional interpretation of microarray data sets still represents a time-consuming and challenging task. Gene ontology (GO) and pathway mapping have both been shown to be powerful approaches to generate a global view of biological processes and cellular components impacted by toxicants. However, current methods only allow for comparisons across two experimental settings at one particular time point. In addition, the resulting annotations are presented in extensive gene lists with minimal or limited quantitative information, even though such information is crucial in the application of toxicogenomic data for risk assessment. To facilitate quantitative interpretation of dose- or time-dependent genomic data, we propose to use combined average raw gene expression values (e.g., intensity or ratio) of genes associated with specific functional categories derived from the GO database. We developed an extended program (GO-Quant) to extract quantitative gene expression values and to calculate the average intensity or ratio for the genes in each significantly altered functional category, based on MAPPFinder results. To demonstrate its application, we applied this approach to a previously published dose- and time-dependent toxicogenomic data set (J. F. Dillman et al., 2005, Chem. Res. Toxicol. 18, 28-34). Our results indicate that this systems approach can describe quantitatively the degree to which functional gene systems change across dose or time. Additionally, this approach provides a more robust measure than single-gene assessments and enables the user to calculate the corresponding ED50 for each specific functional GO term, which is important for risk assessment.
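
The averaging and ED50 steps described in this abstract can be sketched as follows. This is a minimal illustration, not the GO-Quant program: the function names, the input layout (one ratio per dose for each gene), and the use of linear interpolation at half of the largest effect are all assumptions made for the example.

```python
from statistics import mean


def average_by_category(ratios, categories):
    """Average expression ratios over the genes in each functional category.

    ratios:     {gene: [ratio at each dose]}       (hypothetical layout)
    categories: {category name: [gene, gene, ...]} (hypothetical layout)
    Genes without expression data are skipped.
    """
    averaged = {}
    for term, genes in categories.items():
        present = [ratios[g] for g in genes if g in ratios]
        if present:
            # zip(*present) walks the doses; mean() averages across genes.
            averaged[term] = [mean(column) for column in zip(*present)]
    return averaged


def interpolate_ed50(doses, responses):
    """Estimate the dose giving half of the largest effect.

    Assumes doses are ascending and the first dose is the control.
    Uses linear interpolation between neighbouring doses (an assumption;
    a fitted dose-response model could be used instead).
    """
    baseline = responses[0]
    effects = [r - baseline for r in responses]
    peak = max(effects, key=abs)
    if peak == 0:
        return None  # no change across doses, so no ED50
    target = peak / 2
    for i in range(1, len(doses)):
        low, high = effects[i - 1], effects[i]
        if low != high and (low - target) * (high - target) <= 0:
            fraction = (target - low) / (high - low)
            return doses[i - 1] + fraction * (doses[i] - doses[i - 1])
    return None


ratios = {"geneA": [1.0, 1.5, 2.0, 3.0], "geneB": [1.0, 2.5, 3.0, 3.0]}
categories = {"term_A": ["geneA", "geneB", "gene_without_data"]}
averaged = average_by_category(ratios, categories)
ed50 = interpolate_ed50([0, 1, 2, 4], averaged["term_A"])
```

Averaging over a category before estimating the ED50 gives one dose-response curve per functional term rather than one per gene, which is the quantitative summary the abstract argues for.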

2.
A knowledge-based clustering algorithm driven by Gene Ontology
We have developed an algorithm for inferring the degree of similarity between genes by using the graph-based structure of Gene Ontology (GO). We applied this knowledge-based similarity metric to a clique-finding algorithm for detecting sets of related genes with biological classifications. We also combined it with an expression-based distance metric to produce a co-cluster analysis, which accentuates genes with both similar expression profiles and similar biological characteristics and identifies gene clusters that are more stable and biologically meaningful. These algorithms are demonstrated in the analysis of MPRO cell differentiation time series experiments.
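
The idea of combining a knowledge-based similarity with an expression-based distance can be sketched as below. The abstract does not give the authors' metric, so this example swaps in stand-ins: Jaccard overlap of GO term sets for the knowledge-based part, a correlation distance for the expression part, and a hypothetical `weight` parameter to mix the two.

```python
import math


def jaccard_similarity(terms_a, terms_b):
    """Overlap of two sets of GO terms (stand-in for a graph-based metric)."""
    union = terms_a | terms_b
    return len(terms_a & terms_b) / len(union) if union else 0.0


def correlation_distance(x, y):
    """1 - Pearson correlation, so the result lies between 0 and 2."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 1.0  # a flat profile carries no correlation information
    return 1.0 - cov / (sd_x * sd_y)


def combined_distance(terms_a, terms_b, expr_a, expr_b, weight=0.5):
    """Weighted mix of annotation distance and expression distance (0 to 1)."""
    go_distance = 1.0 - jaccard_similarity(terms_a, terms_b)
    expression_distance = correlation_distance(expr_a, expr_b) / 2.0
    return weight * go_distance + (1.0 - weight) * expression_distance


# Two genes sharing annotations and expression trend are close;
# two genes sharing neither are far apart.
close = combined_distance({"t1", "t2"}, {"t1", "t2"}, [1, 2, 3], [2, 4, 6])
far = combined_distance({"t1"}, {"t2"}, [1, 2, 3], [3, 2, 1])
```

A pairwise distance of this kind can be passed to any standard clustering routine; the weighting controls how strongly annotation agreement pulls co-expressed genes together.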

4.
Identification and functional characterization of the genes in the human genome remain a major challenge. A principal source of publicly available information used for this purpose is the National Center for Biotechnology Information database of expressed sequence tags (dbEST), which contains over 4 million human ESTs. To extract the information buried in these data more effectively, we have developed a semiautomated method to mine dbEST for uncharacterized human genes. Starting with a single protein input sequence, a family of related proteins from all species is compiled. This entire family is then used to mine the human EST database for new gene candidates. Evaluation of putative new gene candidates in the context of a family of characterized proteins provides a framework for inference of the structure and function of the new genes. When applied to a test data set of 28 families within the major facilitator superfamily (MFS) of membrane transporters, our protocol found 73 previously characterized human MFS genes and 43 new MFS gene candidates. Development of this approach provided insights into the problems and pitfalls of automated data mining using public databases.

5.
Embryonic stem cell tests (EST) are considered promising alternative assays for developmental toxicity testing. Classical mouse-derived assays (mEST) are being replaced by human-derived assays (hEST), in view of their relevance for human hazard assessment. We compared the mouse and human neural EST (ESTn) assays for neurodevelopmental toxicity with respect to the regulation of gene expression during cell differentiation. Commonalities were observed in a range of neurodevelopmental genes and gene ontology (GO) terms. The mESTn showed a higher specificity in neurodevelopment than the hESTn, which may in part be caused by necessary differences in test protocols. Moreover, gene expression responses to the anticonvulsant and human teratogen valproic acid were compared. Both assays detected pharmacological and neurodevelopmental gene sets regulated by valproic acid. Common significant expression changes were observed in a subset of homologous neurodevelopmental genes. We suggest that these genes and related GO terms may provide good candidates for robust biomarkers of neurodevelopmental toxicity in hESTn.

6.
7.
In this study, the authors propose a new feature selection scheme, incremental forward feature selection, which is inspired by incremental reduced support vector machines. In their method, a new feature is added to the currently selected feature subset if it brings in the most extra information. This information is measured by the distance between the new feature vector and the column space spanned by the current feature subset. The incremental forward feature selection scheme can exclude highly linearly correlated features that provide redundant information and might degrade the efficiency of learning algorithms. The method is compared with the weight score approach and the 1-norm support vector machine on two well-known microarray gene expression data sets, the acute leukemia and colon cancer data sets. These two data sets have very few observations but a huge number of genes. The linear smooth support vector machine was applied to the feature subsets selected by each of these three schemes, and slightly better classification results were obtained with the 1-norm support vector machine and incremental forward feature selection. Finally, the authors claim that the remaining genes still contain some useful information. The previously selected features are iteratively removed from the data sets, and the feature selection and classification steps are repeated for four rounds. The results show that there are many distinct feature subsets that can provide enough information for classification tasks in these two microarray gene expression data sets.
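
The selection rule described above (add the feature farthest from the column space of the features already chosen) can be sketched as a greedy loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `tolerance` stopping rule, and the use of Gram-Schmidt orthogonalization to compute the distance are choices made for the example, and features are assumed to be on comparable scales.

```python
import math


def residual(vector, basis):
    """Part of `vector` orthogonal to the span of an orthonormal `basis`."""
    r = list(vector)
    for q in basis:
        coefficient = sum(a * b for a, b in zip(r, q))
        r = [a - coefficient * b for a, b in zip(r, q)]
    return r


def incremental_forward_selection(columns, max_features, tolerance=1e-8):
    """Greedily pick the feature farthest from the span of those selected.

    columns: list of feature vectors, one list of observations per feature.
    Returns the indices of the selected features in selection order.
    """
    selected, basis = [], []
    while len(selected) < max_features:
        best_index, best_norm, best_residual = None, tolerance, None
        for j, column in enumerate(columns):
            if j in selected:
                continue
            r = residual(column, basis)
            norm = math.sqrt(sum(a * a for a in r))  # distance to the span
            if norm > best_norm:
                best_index, best_norm, best_residual = j, norm, r
        if best_index is None:
            break  # every remaining feature is (nearly) redundant
        selected.append(best_index)
        basis.append([a / best_norm for a in best_residual])
    return selected


# Feature 1 is a multiple of feature 0, and feature 3 is the sum of
# directions already covered, so only two features are kept.
features = [[1, 0, 0], [2, 0, 0], [0, 3, 0], [1, 1, 0]]
chosen = incremental_forward_selection(features, max_features=3)
```

Because a linearly dependent feature has zero distance to the span, it is never selected, which is how the scheme excludes redundant, highly correlated features.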

8.
The use of genes for distinguishing classes of toxicity has become well established. In this paper we combine the reconstruction of a gene dysregulation network (GDN) with a classifier to assign unseen compounds to their appropriate class. Gene pairs in the GDN are dysregulated in the sense that they are linked by a common expression pattern in one class and differ in this pattern in another class. The classifier gives a quantitative measure of this difference by its prediction accuracy. As an in-depth example, gene pairs were selected that were dysregulated between skin cells treated with either sensitizers or irritants. Pairs with known and novel markers were found such as HMOX1 and ZFAND2A, ATF3 and PPP1R15A, OXSR1 and HSPA1B, ZFP36 and MAFF. The resulting GDN proved biologically valid as it was well-connected and enriched in known interactions, processes and common regulatory motifs for pairs. Classification accuracy was improved when compared with conventional classifiers. As the dysregulated patterns for heat shock responding genes proved to be distinct from those of other stress genes, we were able to formulate the hypothesis that heat shock genes play a specific role in sensitization, apart from other stress genes. In conclusion, our combined approach creates added value for classification-based toxicogenomics by obtaining novel, well-distinguishing and biologically interesting measures, suitable for the formulation of hypotheses on functional relationships between genes and their relevance for toxicity class differences. Copyright © 2012 John Wiley & Sons, Ltd.

9.
A common objective in microarray experiments is to select genes that are differentially expressed between two classes (two treatment groups). Selection of differentially expressed genes involves two steps. The first step is to calculate a discriminatory score that will rank the genes in order of evidence of differential expressions. The second step is to determine a cutoff for the ranked scores. Summary indices of the receiver operating characteristic (ROC) curve provide relative measures for a ranking of differential expressions. This article proposes using the hypothesis-testing approach to compute the raw p-values and/or adjusted p-values for three ROC discrimination measures. A cutoff p-value can be determined from the (ranked) p-values or the adjusted p-values to select differentially expressed genes. To quantify the degree of confidence in the selected top-ranked genes, the conditional false discovery rate (FDR) over the selected gene set and the "Type I" (false positive) error probability for each selected gene are estimated. The proposed approach is applied to a public colon tumor data set for illustration. The selected gene sets from three ROC summary indices and the commonly used two-sample t-statistic are applied to the sample classification to evaluate the predictability of the four discrimination measures.
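
The two-step procedure (score each gene, then choose a cutoff on p-values) can be illustrated with one ROC summary index, the area under the curve (AUC). The abstract does not specify which three indices or which tests the article uses, so this sketch makes its own assumptions: a normal approximation to the null distribution of the AUC (valid without ties and for moderately large samples) and the Benjamini-Hochberg adjustment as a common stand-in for "adjusted p-values".

```python
import math


def auc(class_a, class_b):
    """Probability that a class_b value exceeds a class_a value (ties = 1/2)."""
    wins = sum((b > a) + 0.5 * (b == a) for a in class_a for b in class_b)
    return wins / (len(class_a) * len(class_b))


def auc_p_value(class_a, class_b):
    """Two-sided p-value for AUC = 0.5 using a normal approximation."""
    n1, n2 = len(class_a), len(class_b)
    null_sd = math.sqrt((n1 + n2 + 1) / (12.0 * n1 * n2))
    z = (auc(class_a, class_b) - 0.5) / null_sd
    return math.erfc(abs(z) / math.sqrt(2.0))


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical expression values for one gene in two treatment groups.
separated = auc([1, 2, 3], [4, 5, 6])
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```

Genes whose adjusted p-value falls below the chosen cutoff would form the selected set; the cutoff itself remains a judgment about how many false positives are tolerable.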

10.
11.
The automatic integration of information resources in the life sciences is one of the most challenging goals facing biomedical informatics today. Controlled vocabularies have played an important role in realizing this goal, by making it possible to draw together information from heterogeneous sources secure in the knowledge that the same terms will also represent the same entities on all occasions of use. One of the most impressive achievements in this regard is the Gene Ontology (GO), which is rapidly acquiring the status of a de facto standard in the field of gene and gene product annotations, and whose methodology has been much imitated in attempts to develop controlled vocabularies for shared use in different domains of biology. The GO Consortium has recognized, however, that its controlled vocabulary as currently constituted is marked by several problematic features, which are characteristic of much recent work in bioinformatics and which are destined to raise increasingly serious obstacles to the automatic integration of biomedical information in the future. Here, we survey some of these problematic features, focusing especially on issues of compositionality and syntactic regimentation.

12.
13.
We previously reported that a 4.2 kb SacI-EcoRI DNA region from Streptomyces kasugaensis M338-M1, a kasugamycin (KSM) producer, included KSM transporter genes (kasKLM). As an extension of that study, a 3.7 kb PstI-SacI DNA region, located 1.5-5.2 kb upstream of kasK, was cloned and sequenced, revealing three complete open reading frames, designated kasT, kasU and kasJ. The kasJ gene encodes a protein (KasJ) with a conserved dinucleotide (FAD)-binding motif. A homology search for KasJ showed its similarity to NADH:N-amidino-scyllo-inosamine oxidoreductase (StsB), which is involved in biosynthesis of the streptidine moiety of streptomycin (SM) in S. griseus. The kasT gene encodes a DNA-binding protein (KasT), including a helix-turn-helix motif near the center of the sequence. This protein is similar in structure to a pathway-specific activator protein (StrR) that plays a role in regulating the SM biosynthesis gene cluster of S. griseus. A fusion protein (Trx-KasT) clearly showed DNA binding activity with the intergenic region of kasU-kasJ, suggesting that KasT is a pathway-specific regulator of the KSM biosynthesis gene cluster.

14.
The recent sequencing of mammalian genomes has driven the development of genomic technologies, including microarray-based gene expression profiling, that allow simultaneous measurement of the expression levels of thousands of genes. Gene expression profiling applied to toxicology (toxicogenomics) has the potential to reveal, holistically, the molecular pathways and cellular processes that mediate the adverse responses to a toxicant. However, the initial output of a toxicogenomics experiment consists of a list of genes whose expression is altered upon toxicant exposure. In order to interpret these data in a biological context, new bioinformatic methods must be developed to place gene expression changes in the context of the underlying pathways and processes affected. One emerging approach is the application of Gene Ontology (GO) mapping and pathway analysis to gene expression profiling data. The utility of this in mechanistic toxicology will be illustrated using examples in which GO mapping of toxicogenomic data has provided novel insights into the molecular mechanisms induced by exposure to xenoestrogens.

15.
BACKGROUND: Normalization and data quality control are two important aspects of microarray data analysis. Proper normalization and data quality control ensure that intensity ratios provide meaningful and accurate measurements of relative gene expression values. Control spots such as spikes and housekeeping genes with known concentrations in two channels are often used for calibrating experimental parameters. They provide valuable information about experimental variation which can be utilized for better normalization. They are also needed for proper normalization in cases where most of the spots tend to change in one direction. In addition, it is desirable to include information on spot quality. Such information is available in a typical microarray data set, but is not fully utilized by existing normalization methods. RESULTS: We propose two extensions of the two-way semi-linear model (TW-SLM) for appropriately combining control genes and spot quality information in normalization. The first extension (TW-SLMC) is designed to systematically incorporate control spots in a semi-parametric model to calibrate estimated normalization curves so that the relative fold changes of gene expressions are accurately estimated. Extrapolation is not required in this approach. The second extension (TW-SLMQ) is proposed to incorporate a spot quality measure into normalization. This approach down-weights spots with lower quality scores in normalization. These two extensions can be used simultaneously for normalizing a data set. Two microarray data sets are used to demonstrate the proposed methods. Availability: An R-based computing package has been developed for the proposed methods and is available from the corresponding authors.

16.
Brand name confusion is one of the most common causes of drug-related errors. The aim of this study was to develop quantitative measures of similarity among brand names of drugs. We modified the fragmentary pattern-based measure, a measure of similarity for character strings based on the string resemblance system, to develop three novel measures of similarity, i.e., the head and tail-weighted fragmentary pattern-based measure (htfrag), visually weighted htfrag (vwhtfrag), and auditorily weighted htfrag (awhtfrag). The 227 pairs of brand names for which confusion errors have been reported were used as a positive control group. Ten sets of 2270 random pairs of brand names were generated as negative controls. Then we evaluated the measures developed by using the geometric mean of sensitivity and selectivity as an objective function, in comparison with two conventional measures of similarity based on the vector space model (cos1 and htco). The measures developed, htfrag, vwhtfrag, and awhtfrag, provided better discrimination with mean objective function values of 0.953, 0.962, and 0.940, respectively, which were higher than those for the conventional measures cos1 and htco (0.922 and 0.892, respectively). The rates of false-positives and false-negatives were 3.3-10.7% and 5.3-11.9% for cos1, respectively, while the rates for vwhtfrag were 4.8-5.9% and 2.2%, respectively. The measures of similarity developed may provide significant information to avoid drug-related errors associated with brand name confusion.

17.
18.
This study investigated the deleterious effects of the synthetic non-steroidal estrogen diethylstilbestrol (DES) on testicular Leydig cells and compared these effects with those of the natural estrogen 17β-estradiol (E2). For that purpose, we performed microarray analysis of a mouse Leydig cell line (TTE1) treated with these estrogens, followed by Gene Ontology (GO) analysis and parametric analysis of gene set enrichment (PAGE). Most notably, GO analysis revealed a significant decrease in the biological processes of the GO categories "DNA repair" and "apoptotic program" in DES-exposed cells. PAGE showed that "cell death," which is a superior GO category including apoptosis in the GO tree structure, significantly decreased in DES-exposed cells but significantly increased in E2-exposed cells. Interestingly, only 2 genes (Tia1 and Gas1) with altered expression patterns in the "cell death" category were common between DES- and E2-treated cells. The downregulation of apoptotic cell death pathways and DNA repair capability of DES-exposed cells implies that DES promotes carcinogenic processes more strongly than E2 does. These findings suggest that molecular events that occur following DES and E2 treatments differ substantially in Leydig cells, and that the effects of synthetic estrogen and natural estrogen differ more substantially than previously suspected.

19.
The continued success of genome sequencing projects has resulted in a wealth of information, but 40-50% of identified genes correspond to hypothetical proteins or proteins of unknown function. The functional annotation screening technology by NMR (FAST-NMR) screen was developed to assign a biological function for these unannotated proteins with a structure solved by the protein structure initiative. FAST-NMR is based on the premise that a biological function can be described by a similarity in binding sites and ligand interactions with proteins of known function. The resulting co-structure and functional assignment may provide a starting point for a drug discovery effort.

20.
Confusion of drug names is one of the most common causes of drug-related medical errors. A similarity measure of drug names, "vwhtfrag", was developed to discriminate whether drug name pairs are likely to cause confusion errors, and to provide information that would be helpful to avoid errors. The aim of the present study was to evaluate and improve vwhtfrag. Firstly, we evaluated the correlation of vwhtfrag with subjective similarity or error rate of drug name pairs in psychological experiments. Vwhtfrag showed a higher correlation to subjective similarity (college students: r=0.84) or error rate than did other conventional similarity measures (htco, cos1, edit). Moreover, name pairs that showed coincidences of the initial character strings had a higher subjective similarity than those which had coincidences of the end character strings and had the same vwhtfrag. Therefore, we developed a new similarity measure (vwhtfrag+), in which coincidence of initial character strings in name pairs is weighted by 1.53 times over coincidence of end character strings. Vwhtfrag+ showed a higher correlation to subjective similarity than did unmodified vwhtfrag. Further studies appear warranted to examine in detail whether vwhtfrag+ has superior ability to discriminate drug name pairs likely to cause confusion errors.
