Similar literature
20 similar articles found (search time: 15 ms)
1.
Correspondence analysis is an explorative computational method for the study of associations between variables. Much like principal component analysis, it displays a low-dimensional projection of the data, e.g., into a plane. It does this, though, for two variables simultaneously, thus revealing associations between them. Here, we demonstrate that correspondence analysis is applicable to, and highly valuable for, the analysis of microarray data, displaying associations between genes and experiments. To introduce the method, we show its application to the well-known Saccharomyces cerevisiae cell-cycle synchronization data by Spellman et al. [Spellman, P. T., Sherlock, G., Zhang, M. Q., Iyer, V. R., Anders, K., Eisen, M. B., Brown, P. O., Botstein, D. & Futcher, B. (1998) Mol. Biol. Cell 9, 3273-3297], allowing for comparison with their visualization of this data set. Furthermore, we apply correspondence analysis to a non-time-series data set of our own, thus supporting its general applicability to microarray data of different complexity, underlying structure, and experimental strategy (both two-channel fluorescence-tag and radioactive labeling).

2.
Coupled two-way clustering analysis of gene microarray data   (Cited 27 times: 0 self-citations, 27 by others)
We present a coupled two-way clustering approach to gene microarray data analysis. The main idea is to identify subsets of the genes and samples, such that when one of these is used to cluster the other, stable and significant partitions emerge. The search for such subsets is a computationally complex task. We present an algorithm, based on iterative clustering, that performs such a search. This analysis is especially suitable for gene microarray data, where the contributions of a variety of biological mechanisms to the gene expression levels are entangled in a large body of experimental data. The method was applied to two gene microarray data sets, on colon cancer and leukemia. By identifying relevant subsets of the data and focusing on them, we were able to discover partitions and correlations that were masked when the full data set was used in the analysis. Some of these partitions have a clear biological interpretation; others can serve to identify possible directions for future research.

3.
In microarray data there are a number of biological samples, each assessed for the level of gene expression for a typically large number of genes. There is a need to examine these data with statistical techniques to help discern possible patterns in the data. Our technique applies a combination of mathematical and statistical methods to progressively take the data set apart so that different aspects can be examined for both general patterns and very specific effects. Unfortunately, these data tables are often corrupted with extreme values (outliers), missing values, and non-normal distributions that preclude standard analysis. We develop a robust analysis method to address these problems. The benefits of this robust analysis will be both the understanding of large-scale shifts in gene effects and the isolation of particular sample-by-gene effects that might be either unusual interactions or the result of experimental flaws. Our method requires a single pass and does not resort to complex "cleaning" or imputation of the data table before analysis. We illustrate the method with a commercial data set.

4.
We introduce a general technique for making statistical inference from clustering tools applied to gene expression microarray data. The approach utilizes an analysis of variance model to achieve normalization and estimate differential expression of genes across multiple conditions. Statistical inference is based on the application of a randomization technique, bootstrapping. Bootstrapping has previously been used to obtain confidence intervals for estimates of differential expression for individual genes. Here we apply bootstrapping to assess the stability of results from a cluster analysis. We illustrate the technique with a publicly available data set and draw conclusions about the reliability of clustering results in light of variation in the data. The bootstrapping procedure relies on experimental replication. We discuss the implications of replication and good design in microarray experiments.
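The simpler use of bootstrapping mentioned in this abstract, a confidence interval for one gene's differential expression, can be sketched as follows. This is a minimal illustration, not the authors' procedure: it uses a plain percentile bootstrap of the mean rather than resampling residuals from an analysis of variance model, and the function name `bootstrap_ci` and the replicate values are hypothetical.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample the replicates with replacement and record each resample's mean.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Hypothetical log2 expression ratios for one gene across eight replicate arrays.
log_ratios = [2.1, 1.9, 2.4, 2.0, 2.2, 1.8, 2.3, 2.1]
ci_low, ci_high = bootstrap_ci(log_ratios)
print(f"mean = {statistics.fmean(log_ratios):.2f}, "
      f"95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

The sketch depends on having replicate measurements to resample, which is the same reliance on experimental replication that the abstract points out.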

5.
DNA microarrays represent an important new method for determining the complete expression profile of a cell. In "spotted" microarrays, slides carrying spots of target DNA are hybridized to fluorescently labeled cDNA from experimental and control cells and the arrays are imaged at two or more wavelengths. In this paper, we perform statistical analysis on images of microarrays and show that quantitating the amount of fluorescent DNA bound to microarrays is subject to considerable uncertainty because of large- and small-scale intensity fluctuations within spots, nonadditive background, and fabrication artifacts. Pixel-by-pixel analysis of individual spots can be used to estimate these sources of error and establish the precision and accuracy with which gene expression ratios are determined. Simple weighting schemes based on these estimates are effective in significantly improving the quality of microarray data as it accumulates in a multiexperiment database. We propose that error estimates from image-based metrics should be one component in an explicitly probabilistic scheme for the analysis of DNA microarray data.

6.
Significance and statistical errors in the analysis of DNA microarray data   (Cited 1 time: 0 self-citations, 1 by others)
DNA microarrays are important devices for high-throughput measurements of gene expression, but no rational foundation has been established for understanding the sources of within-chip statistical error. We designed a specialized chip and protocol to investigate the distribution and magnitude of within-chip errors and discovered that, consistent with theoretical expectations, measurement errors follow a Lorentzian-like distribution, which explains the widely observed but unexplained poor reproducibility of microarray data. Using this specially designed chip, we examined a data set of repeated measurements to extract estimates of the distribution and magnitude of statistical errors in DNA microarray measurements. Using the common "ratio of medians" method, we find that the measurements follow a Lorentzian-like distribution, which is problematic for subsequent analysis. We show that a method of analysis dubbed "median of ratios" yields a more Gaussian-like distribution of errors. Finally, we show that the bootstrap algorithm can be used to extract the best estimates of the error in the measurement. Quantifying the statistical error in such measurements has important applications for estimating significance levels, clustering algorithms, and process optimization.
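The two estimators named in this abstract can be contrasted with a small sketch. The abstract does not define them, so the code assumes the usual reading: each spot yields paired pixel intensities in two channels, "ratio of medians" divides the two channel medians, and "median of ratios" takes the median of the per-pixel ratios. The function names and pixel values are hypothetical.

```python
from statistics import median


def ratio_of_medians(red, green):
    """Median red intensity divided by median green intensity."""
    return median(red) / median(green)


def median_of_ratios(red, green):
    """Median of the per-pixel red/green ratios."""
    return median([r / g for r, g in zip(red, green)])


# Hypothetical pixel intensities for one spot; the last red pixel is an outlier.
red = [10.0, 12.0, 11.0, 100.0]
green = [5.0, 6.0, 5.0, 5.0]

print(ratio_of_medians(red, green))   # 11.5 / 5.0
print(median_of_ratios(red, green))   # median of [2.0, 2.0, 2.2, 20.0]
```

The example only shows that the two estimators can give different values for the same spot; it does not reproduce the error distributions reported in the abstract.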

7.
Coexpression patterns of gene expression across many microarray data sets may reveal networks of genes involved in linked processes. To identify factors involved in cellulose biosynthesis, we used a regression method to analyze 408 publicly available Affymetrix Arabidopsis microarrays. Expression of genes previously implicated in cellulose synthesis, as well as several uncharacterized genes, was highly coregulated with expression of cellulose synthase (CESA) genes. Four candidate genes, which were coexpressed with CESA genes implicated in secondary cell wall synthesis, were investigated by mutant analysis. Two mutants exhibited irregular xylem phenotypes similar to those observed in mutants with defects in secondary cellulose synthesis and were designated irx8 and irx13. Thus, the general approach developed here is useful for identification of elements of multicomponent processes.

8.
We introduce a method of functionally classifying genes by using gene expression data from DNA microarray hybridization experiments. The method is based on the theory of support vector machines (SVMs). SVMs are considered a supervised computer learning method because they exploit prior knowledge of gene function to identify unknown genes of similar function from expression data. SVMs avoid several problems associated with unsupervised clustering methods, such as hierarchical clustering and self-organizing maps. SVMs have many mathematical features that make them attractive for gene expression analysis, including their flexibility in choosing a similarity function, sparseness of solution when dealing with large data sets, the ability to handle large feature spaces, and the ability to identify outliers. We test several SVMs that use different similarity metrics, as well as some other supervised learning methods, and find that the SVMs best identify sets of genes with a common function using expression data. Finally, we use SVMs to predict functional roles for uncharacterized yeast ORFs based on their expression data.

9.
We aimed to identify potential genes related to blood pressure regulation and to screen target genes for the treatment of high blood pressure (BPH) and low blood pressure (BPL). The GSE19817 microarray dataset, which included aorta, liver, heart, and kidney samples from BPH, BPL, and normotensive mice, was downloaded from the Gene Expression Omnibus. Principal component analysis (PCA) was performed based on the entire expression profile. Differentially expressed genes (DEGs) were screened, followed by pathway enrichment analysis. Finally, gene regulatory networks were constructed based on BPH-related and BPL-related DEGs in the aorta, liver, heart, and kidney samples. Because of the high heterogeneity among tissues, DEGs were screened within their respective tissues. In total, 2,726 BPH-related DEGs and 2,472 BPL-related DEGs were screened, which were mainly enriched in pathways such as immune response. The topologies of the gene regulatory networks constructed from DEGs in the heart, kidney, and liver were more similar to one another than to that of the aorta. Among BPH-related DEGs, Sept6 and Pigx were found in the top 10 differentially regulated DEGs when the BPH-related DEGs of the aorta were compared with the DEGs of the other 3 tissues in the regulatory network. Although no DEG was common to all tissues among the top 10 differentially regulated BPL-related DEGs, Wif1, Urb2, and Gtf2ird1 were found among the top 10 DEGs in the three tissues other than the kidney. Sept6 and Pigx might participate in the pathogenesis of BPH, whereas Gtf2ird1, Urb2, and Wif1 might be critical target genes for BPL treatment.

10.
DNA microarrays provide a great opportunity for discovery and development in predictive oncology, but also a great opportunity for developing false claims. A review by Dupuy and Simon of the literature on the use of DNA microarrays in studies of cancer outcome indicated that about 50 percent of studies contained at least one major flaw in the analysis serious enough to raise questions about the claims. Dupuy and Simon developed guidelines for the analysis of DNA microarray data in conjunction with outcomes of cancer patients, illustrated by a list of Do's and Don'ts [1]. BRB-ArrayTools software is a resource for improving the analysis of microarray expression data that can be useful for both biomedical investigators and statisticians. There are currently about 9000 registered users of this software in over 65 countries. It is freely available for non-commercial purposes from the National Cancer Institute at http://linus.nci.nih.gov/brb.

Conflict of interest statement

None declared.

11.
12.
Precise classification of tumors is critically important for cancer diagnosis and treatment. It is also a scientifically challenging task. Recently, efforts have been made to use gene expression profiles to improve the precision of classification, with limited success. Using a published data set for purposes of comparison, we introduce a methodology based on classification trees and demonstrate that it is significantly more accurate for discriminating among distinct colon cancer tissues than other statistical approaches used heretofore. In addition, competing classification trees are displayed, which suggest that different genes may coregulate colon cancers.

13.
14.
This study compared the use of the original metric effect size with the standardized effect size for clinical data in meta-analysis. The example data set included 17 controlled clinical trials dealing with the effects of progressive resistance exercise on resting diastolic blood pressure in adults. The original metric effect size showed a decrease in resting diastolic blood pressure of −2.07 mm Hg (95% confidence interval, −3.60 to −0.54). From a clinical standpoint, this is considered a "small" effect. The standardized approach showed an average effect of −0.21 (95% confidence interval, −0.39 to −0.02). This is also considered a "small" effect. When possible, use of the original metric is preferred because it can be more clinically meaningful and will enhance interpretation of blood pressure results for a wider range of readers.
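The difference between the two effect sizes compared in this abstract can be illustrated for a single hypothetical trial. This is a sketch under stated assumptions: the abstract does not say which standardized estimator the study used, so the code divides the raw difference by the pooled standard deviation (Cohen's d), and the data and function names are invented for illustration. It does not pool effects across trials as a meta-analysis would.

```python
from math import sqrt
from statistics import fmean, stdev


def raw_mean_difference(treatment, control):
    """Effect size in the original metric (here, mm Hg)."""
    return fmean(treatment) - fmean(control)


def standardized_mean_difference(treatment, control):
    """Raw difference divided by the pooled standard deviation (unitless)."""
    n_t, n_c = len(treatment), len(control)
    pooled_var = (
        (n_t - 1) * stdev(treatment) ** 2 + (n_c - 1) * stdev(control) ** 2
    ) / (n_t + n_c - 2)
    return raw_mean_difference(treatment, control) / sqrt(pooled_var)


# Hypothetical changes in resting diastolic blood pressure (mm Hg) in one trial.
exercise = [-4.0, -2.0, -3.0, -1.0, -5.0]
control = [-1.0, 0.0, -2.0, 1.0, -3.0]

print(raw_mean_difference(exercise, control))           # difference in mm Hg
print(standardized_mean_difference(exercise, control))  # difference in SD units
```

The raw difference keeps its clinical unit, which is the reason the abstract gives for preferring the original metric when it is available.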

15.
16.
17.
18.
This article presents the beginning of a metric functional analysis. A major notion is that of metric functionals, which extends that of horofunctions in metric geometry. Applications of the main tools are found in a wide variety of subjects such as random walks on groups, complex dynamics, surface topology, deep learning, evolution equations, and game theory, thus branching well outside of pure mathematics. In several cases, linear notions fail to describe linear phenomena that are naturally captured by metric concepts. An extension of the mean ergodic theorem testifies to this. A general metric fixed-point theorem is also proved.

Linearity is a fundamental notion in science, with concepts like derivatives and linear regression. It is also the main property in the foundational subject of functional analysis, which started developing with a shift in viewpoint from differential and integral equations, and their solutions, to linear operators and vector spaces of functions. The theory of Banach spaces is a further abstraction where the elements are thought of less as functions and more as points in a linear space. In this article, I would like to argue for a further step of generalization: forgetting the linearity of the space and instead focusing on merely the metric structure (coming from the norm in the Banach space case). This philosophy has been featured prominently in the Ribe program, initiated by J. Bourgain, J. Lindenstrauss, and others, with important applications (1, 2). This program, in particular, translates subtle geometric properties of the Banach spaces to metric spaces. What I describe here is, while philosophically related, quite different; it is more basic and involves the operators too. A significant list of metric analogs for linear notions is recorded below. While this, in itself, is somewhat striking, what is more promising is that there are general tools that are remarkably powerful. In particular, I point out several phenomena within the linear theory that the metric notions describe better than the linear notions can. More important still is the application to many nonlinear problems. Mathematics and its applications surprisingly abound with transformations preserving metrics. One instance that will be mentioned is found in deep learning (3), where maps are not linear, and this nonlinearity, imitating the functioning of a brain, is of decisive importance. Ball wrote already in ref. 4 that metric geometry has become a staple of mathematical computer science and the theory of algorithms.
A survey of all of the uses of metric geometry is impossible, so the focus in this article is necessarily relatively narrow. But, already, what is discussed here, I think, will have an increasing impact also in applied mathematics and other sciences.

19.
20.
DNA microarray analysis in malignant lymphomas   (Cited 1 time: 0 self-citations, 1 by others)
Recently, DNA microarray technology has opened new avenues for the understanding of lymphomas. By hybridization of cDNA to arrays containing >10,000 different DNA fragments, this approach allows the simultaneous evaluation of the mRNA expression of thousands of genes in a single experiment. Using sophisticated bioinformatic tools, the huge amount of raw data can be clustered, resulting in (1) tumor subclassification, (2) identification of pathogenetically relevant genes, or (3) biological predictors for the clinical course. This approach already has provided novel insights into different entities of B-cell non-Hodgkin's lymphomas. Genomic DNA chip hybridization (matrix-CGH) is a complementary approach focussing on genomic aberrations. In this review, we discuss the impact of this new technology with regard to both methodological aspects and novel findings influencing our understanding of lymphomas.
