Bionimbus: a cloud for managing, analyzing and sharing large genomics datasets
Authors: Allison P Heath, Matthew Greenway, Raymond Powell, Jonathan Spring, Rafael Suarez, David Hanley, Chai Bandlamudi, Megan E McNerney, Kevin P White, Robert L Grossman
Affiliations: 1. Institute for Genomics and Systems Biology, University of Chicago, Chicago, Illinois, USA; 2. Department of Pathology, University of Chicago, Chicago, Illinois, USA; 3. Computation Institute, University of Chicago, Chicago, Illinois, USA; 4. Department of Human Genetics, University of Chicago, Chicago, Illinois, USA; 5. Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, Illinois, USA
Abstract:

Background

As large genomics and phenotypic datasets become more common, it is increasingly difficult for most researchers to access, manage, and analyze them. One possible approach is to provide the research community with several petabyte-scale cloud-based computing platforms that contain these data, along with the tools and resources to analyze them.

Methods

Bionimbus is an open-source cloud-computing platform based primarily upon OpenStack, which manages on-demand virtual machines that provide the required computational resources, and GlusterFS, which is a high-performance clustered file system. Bionimbus also includes Tukey, a portal and associated middleware that provide a single entry point and single sign-on for the various Bionimbus resources, and Yates, which automates the installation, configuration, and maintenance of the required software infrastructure.

Results

Bionimbus is used by a variety of projects to process genomics and phenotypic data. For example, it is used by an acute myeloid leukemia resequencing project at the University of Chicago. The project requires several computational pipelines, including pipelines for quality control, alignment, variant calling, and annotation. For each sample, the alignment step requires eight CPUs for about 12 h. BAM file sizes ranged from 5 GB to 10 GB for each sample.
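As a rough illustration of what these per-sample figures imply, the sketch below multiplies them out for a cohort. Only the eight CPUs, the roughly 12 h alignment time, and the 5 GB to 10 GB BAM range come from the text above; the function names and the cohort size of 100 samples are hypothetical, and the estimate ignores the other pipeline steps, scheduling overhead, and intermediate files.

```python
def alignment_cpu_hours(n_samples, cpus_per_sample=8, hours_per_sample=12.0):
    """Approximate total CPU-hours for the alignment step alone.

    Defaults are the per-sample figures quoted in the abstract;
    n_samples is supplied by the caller.
    """
    return n_samples * cpus_per_sample * hours_per_sample


def bam_storage_range_gb(n_samples, min_gb=5.0, max_gb=10.0):
    """Lower and upper bound on total BAM storage, in GB."""
    return n_samples * min_gb, n_samples * max_gb


if __name__ == "__main__":
    n = 100  # hypothetical cohort size, not taken from the paper
    low, high = bam_storage_range_gb(n)
    print(f"{n} samples: about {alignment_cpu_hours(n):.0f} CPU-hours of alignment")
    print(f"{n} samples: between {low:.0f} GB and {high:.0f} GB of BAM files")
```

For a single sample this gives about 96 CPU-hours of alignment, which is why on-demand virtual machines rather than a desktop workstation are the natural fit for such a project.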

Conclusions

Most members of the research community have difficulty downloading large genomics datasets and obtaining sufficient storage and computer resources to manage and analyze the data. Cloud computing platforms, such as Bionimbus, with data commons that contain large genomics datasets, are one choice for broadening access to research data in genomics.
Keywords: cloud computing; biomedical clouds; genomic clouds