Sparse multivariate factor analysis regression models and its applications to integrative genomics analysis
Authors: Yan Zhou, Pei Wang, Xianlong Wang, Ji Zhu, Peter X-K Song
Institution: 1. Merck & Co, North Wales, PA, USA; 2. Icahn School of Medicine at Mount Sinai, New York, NY, USA; 3. Fred Hutchinson Cancer Research Center, Seattle, WA, USA; 4. University of Michigan, Ann Arbor, MI, USA
Abstract: The multivariate regression model is a useful tool for exploring complex associations between two kinds of molecular markers, and thereby helps to understand the biological pathways underlying disease etiology. For a set of correlated response variables, accounting for this dependence can increase statistical power. Motivated by integrative genomic data analyses, we propose a new methodology, the sparse multivariate factor analysis regression model (smFARM), in which correlations of response variables are assumed to follow a factor analysis model with latent factors. The proposed method allows us not only to address the challenge that the number of association parameters is larger than the sample size, but also to adjust for unobserved genetic and/or nongenetic factors that potentially conceal the underlying response-predictor associations. The proposed smFARM is implemented using the EM algorithm and the blockwise coordinate descent algorithm. The methodology is evaluated and compared with existing methods through extensive simulation studies. Our results show that accounting for latent factors through smFARM can improve the sensitivity of signal detection and the accuracy of sparse association map estimation. We illustrate smFARM with two integrative genomics analyses, one of a breast cancer dataset and one of an ovarian cancer dataset, assessing the relationship between DNA copy numbers and gene expression arrays to understand genetic regulatory patterns relevant to the disease. We identify two trans-hub regions: one in cytoband 17q12, whose amplification influences the RNA expression levels of important breast cancer genes, and the other in cytoband 9q21.32-33, which is associated with chemoresistance in ovarian cancer.
Keywords: EM-blockwise coordinate descent; high-dimensional data; latent factors; regularization
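
The abstract describes the model only in words. One plausible way to write a regression of this kind is sketched below. The notation is an assumption made for illustration and is not taken from the paper: $Y$ is the $n \times q$ response matrix (for example, gene expression), $X$ the $n \times p$ predictor matrix (for example, DNA copy numbers), $B$ the sparse $p \times q$ association matrix, $Z$ the $n \times k$ matrix of latent factors, $\Lambda$ the $k \times q$ loading matrix, and $\Psi$ a diagonal noise covariance. The $\ell_1$ penalty is also an assumption, chosen as a common form of sparsity-inducing regularization; the paper's actual penalty may differ.

```latex
% Illustrative sketch only: symbols and the l1 penalty are assumed, not quoted from the paper.
\begin{align}
  Y &= X B + Z \Lambda + E,
      \qquad z_i \sim N(0, I_k), \quad e_i \sim N(0, \Psi), \\
  \operatorname{Cov}(y_i \mid x_i) &= \Lambda^{\top} \Lambda + \Psi, \\
  (\hat{B}, \hat{\Lambda}, \hat{\Psi}) &= \arg\min_{B, \Lambda, \Psi}
      \Bigl\{ -\ell(B, \Lambda, \Psi;\, Y, X)
      + \lambda \sum_{j=1}^{p} \sum_{m=1}^{q} \lvert b_{jm} \rvert \Bigr\}.
\end{align}
```

Under this sketch, the second line is what "correlations of response variables follow a factor analysis model" amounts to: the residual covariance is a low-rank part plus a diagonal part. Because $Z$ is unobserved, an EM scheme is a natural fit, with the E-step computing the conditional moments of $Z$ and the M-step updating $B$ by a penalized regression that coordinate descent can solve.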