Accurate and robust inference of microbial growth dynamics from metagenomic sequencing reveals personalized growth rates
Authors:Tyler A. Joseph  Philippe Chlenski  Aviya Litman  Tal Korem  Itsik Pe'er
Affiliation:1.Department of Computer Science, Columbia University, New York, New York 10027, USA;2.Department of Systems Biology, Columbia University Irving Medical Center, New York, New York 10032, USA;3.Department of Obstetrics and Gynecology, Columbia University Irving Medical Center, New York, New York 10032, USA;4.CIFAR Azrieli Global Scholars Program, CIFAR, Toronto, Ontario M5G 1M1, Canada;5.Data Science Institute, Columbia University, New York, New York 10027, USA
Abstract:Patterns of sequencing coverage along a bacterial genome—summarized by a peak-to-trough ratio (PTR)—have been shown to accurately reflect microbial growth rates, revealing a new facet of microbial dynamics and host–microbe interactions. Here, we introduce Compute PTR (CoPTR): a tool for computing PTRs from complete reference genomes and assemblies. Using simulations and data from growth experiments in simple and complex communities, we show that CoPTR is more accurate than the current state of the art while also providing more PTR estimates overall. We further develop a theory formalizing a biological interpretation for PTRs. Using a reference database of 2935 species, we applied CoPTR to a case-control study of 1304 metagenomic samples from 106 individuals with inflammatory bowel disease. We show that growth rates are personalized, are only loosely correlated with relative abundances, and are associated with disease status. We conclude by showing how PTRs can be combined with relative abundances and metabolomics to investigate their effect on the microbiome.

Dynamic changes in the human microbiome play a fundamental role in our health. Understanding how and why these changes occur can help uncover mechanisms of disease. In line with this goal, the Integrative Human Microbiome Project and others have generated longitudinal data sets from disease cohorts in which the microbiome has been observed to play a role (Buffie et al. 2015; DiGiulio et al. 2015; Lloyd-Price et al. 2019; Serrano et al. 2019; Zhou et al. 2019). Yet, investigating microbiome dynamics is challenging. On one hand, a promising line of investigation uses time-series or dynamical systems–based models to investigate community dynamics (Stein et al. 2013; Bucci et al. 2016; Gibbons et al. 2017; Gibson and Gerber 2018; Shenhav et al. 2019; Joseph et al. 2020). On the other hand, the resolution of such methods is limited by sampling frequency, which is often limited by physiological constraints on sample collection for DNA sequencing. Furthermore, although such methods accurately infer changes in abundance, they do not directly assess growth rates per sample.
Korem et al. (2015) introduced a complementary approach to investigate microbiome dynamics. They showed that sequencing coverage of a given species in a metagenomic sample reflects its growth rate. They summarized growth rates by a metric called the peak-to-trough ratio (PTR): the ratio of sequencing coverage near the replication origin and near the replication terminus. Thus, PTRs provide a snapshot of growth at the time of sampling, and their resolution is not limited by sampling frequency.
Their original method, PTRC, estimates PTRs using reads mapped to complete reference genomes. It has been used as a gold standard to evaluate other methods (Brown et al. 2016; Emiola and Oh 2018; Gao and Li 2018). However, most species lack complete reference genomes, reducing PTRC's utility to researchers in the field.
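To make the PTR definition above concrete, here is a minimal sketch that simulates binned coverage along a circular genome and takes the ratio of the smoothed maximum to the smoothed minimum. It is an illustration only, not the PTRC or CoPTR estimator. The function names, the log-linear coverage profile between origin and terminus, the Gaussian stand-in for counting noise, and all parameter values are assumptions made for this example.

```python
import math
import random


def simulate_bin_coverage(n_bins, ptr, origin_bin, depth=200.0, seed=0):
    """Simulate noisy per-bin coverage on a circular genome (hypothetical helper).

    Assumption: log coverage falls linearly from the origin to the terminus,
    so coverage is depth * ptr at the origin and depth at the terminus.
    """
    rng = random.Random(seed)
    coverage = []
    for i in range(n_bins):
        # Circular distance to the origin, scaled so 0 = origin, 1 = terminus.
        d = abs(i - origin_bin)
        d = min(d, n_bins - d) / (n_bins / 2)
        expected = depth * ptr ** (1.0 - d)
        # Gaussian approximation to read-count noise (an assumption).
        noisy = rng.gauss(expected, math.sqrt(expected))
        coverage.append(max(noisy, 1.0))
    return coverage


def circular_moving_average(values, half_window):
    """Average each bin with its neighbors, wrapping around the genome ends."""
    n = len(values)
    width = 2 * half_window + 1
    return [
        sum(values[(i + k) % n] for k in range(-half_window, half_window + 1)) / width
        for i in range(n)
    ]


def naive_ptr(coverage, half_window=25):
    """Smoothed peak coverage divided by smoothed trough coverage."""
    smooth = circular_moving_average(coverage, half_window)
    return max(smooth) / min(smooth)


coverage = simulate_bin_coverage(n_bins=1000, ptr=2.0, origin_bin=300)
print(f"naive PTR estimate: {naive_ptr(coverage):.2f}")
```

Smoothing reduces the effect of noisy bins but also flattens the peak and trough slightly, which is one reason practical estimators fit a model to the whole coverage profile rather than comparing two windows.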
Therefore, follow-up work has focused on estimating PTRs from draft assemblies: short sections of contiguous sequences (contigs) in which the order of contigs along the genome is unknown. These approaches rely on reordering binned read counts or contigs by estimating their distance to the replication origin. Although less accurate than PTRC, they allow PTRs to be estimated for a larger number of species. iRep (Brown et al. 2016) sorts binned read counts along a 5-kb sliding window and then fits a log-linear model to the sorted bins to estimate a PTR. GRiD (Emiola and Oh 2018) sorts the contigs themselves by sequencing coverage. It fits a curve to the log sequencing coverage of the sorted contigs using Tukey's biweight function. DEMIC (Gao and Li 2018) also sorts contigs. However, it uses sequencing coverage across multiple samples to infer a contig's distance from the replication origin. Specifically, DEMIC performs a principal component analysis on the log contig coverage across samples. The investigators show that the scores along the first principal component correlate with distance from the replication origin. Ma et al. (2021) provide theoretical criteria for when such an approach is optimal. Finally, other estimators have focused on PTR estimation for specific strains (Emiola et al. 2020) or on estimation using circular statistics (Suzuki and Yamada 2020).
Nonetheless, using PTRs has several limitations. From a theoretical perspective, it is not clear what PTRs estimate and how they should be interpreted. Bremer and Churchward (1977) showed that under exponential growth, PTRs measure the ratio of chromosome replication time to generation time, but this is not established under arbitrary models of dynamics. From a practical perspective, estimating PTRs at scale requires running multiple tools across multiple computational environments, a cumbersome task.
In the present work, we seek to address these issues. Our contributions are threefold. 
First, we provide theory that shows PTRs measure the rate of DNA synthesis and generation time, regardless of the underlying dynamic model. Second, we derive two estimators for PTRs—one for complete reference genomes and one for draft assemblies. Third, we combine our estimators in an easy-to-use tool called Compute PTR (CoPTR). CoPTR provides extensive documentation, a tutorial, and precomputed reference databases for its users. We show that CoPTR is more accurate than the current state of the art and conclude with a large-scale application to a data set of 1304 metagenomic samples from a study of inflammatory bowel disease (IBD).
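The sorting-based idea described earlier for draft assemblies (order bins by coverage, then fit a log-linear model) can be sketched as follows. This is a simplified illustration in the spirit of that approach, not a reimplementation of iRep, GRiD, DEMIC, or CoPTR. The function name, the trimming fraction, and the plain least-squares fit are assumptions chosen for this example.

```python
import math
import random


def sorted_loglinear_ptr(coverage, trim=0.05):
    """Estimate a PTR from unordered bin coverages (hypothetical helper).

    Steps: sort log2 coverage, drop a fraction of bins at each extreme,
    fit a straight line by least squares, and extrapolate the line to the
    lowest and highest ranks. The PTR is 2 ** (fitted high - fitted low).
    """
    log_cov = sorted(math.log2(c) for c in coverage if c > 0)
    n = len(log_cov)
    k = int(n * trim)
    xs = list(range(k, n - k))
    ys = log_cov[k:n - k]
    if len(xs) < 2:
        raise ValueError("need at least two bins after trimming")

    # Ordinary least-squares slope of log2 coverage against rank.
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx

    # Fitted difference between the last rank (n - 1) and the first rank (0).
    return 2.0 ** (slope * (n - 1))


# Usage: bins arrive in arbitrary order, as with unordered contigs.
rng = random.Random(1)
bins = [150.0 * 2.0 ** rng.random() for _ in range(2000)]
rng.shuffle(bins)
print(f"sorted log-linear PTR estimate: {sorted_loglinear_ptr(bins):.2f}")
```

Sorting removes the need to know where each bin sits on the genome, at the cost of letting noise inflate the extremes of the sorted profile. Trimming the tails before fitting is one simple way to limit that effect.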