Bionimbus: a cloud for managing, analyzing and sharing large genomics datasets |
| |
Authors: | Allison P Heath, Matthew Greenway, Raymond Powell, Jonathan Spring, Rafael Suarez, David Hanley, Chai Bandlamudi, Megan E McNerney, Kevin P White, Robert L Grossman |
| |
Affiliation: | 1. Institute for Genomics and Systems Biology, University of Chicago, Chicago, Illinois, USA; 2. Department of Pathology, University of Chicago, Chicago, Illinois, USA; 3. Computation Institute, University of Chicago, Chicago, Illinois, USA; 4. Department of Human Genetics, University of Chicago, Chicago, Illinois, USA; 5. Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, Illinois, USA |
| |
Abstract: | Background: As large genomics and phenotypic datasets become more common, it is increasingly difficult for most researchers to access, manage, and analyze them. One possible approach is to provide the research community with several petabyte-scale cloud-based computing platforms containing these data, along with tools and resources to analyze them. Methods: Bionimbus is an open source cloud-computing platform based primarily upon OpenStack, which manages on-demand virtual machines that provide the required computational resources, and GlusterFS, which is a high-performance clustered file system. Bionimbus also includes Tukey, a portal and associated middleware that provide a single entry point and single sign-on for the various Bionimbus resources, and Yates, which automates the installation, configuration, and maintenance of the required software infrastructure. Results: Bionimbus is used by a variety of projects to process genomics and phenotypic data. For example, it is used by an acute myeloid leukemia resequencing project at the University of Chicago. The project requires several computational pipelines, including pipelines for quality control, alignment, variant calling, and annotation. For each sample, the alignment step requires eight CPUs for about 12 h. BAM file sizes ranged from 5 GB to 10 GB for each sample. Conclusions: Most members of the research community have difficulty downloading large genomics datasets and obtaining sufficient storage and computing resources to manage and analyze the data. Cloud computing platforms, such as Bionimbus, with data commons that contain large genomics datasets, are one choice for broadening access to research data in genomics. |
| |
Keywords: | cloud computing; biomedical clouds; genomic clouds |
|
|