Literature reading 2.1: Genomic structure predicts metabolite dynamics in microbial communities

Genomic structure predicts metabolite dynamics in microbial communities

Cell (66.850/Q1)

Highlights

Metabolite fluxes in microbial communities are predictable from individual genotypes

A diverse collection of 79 bacterial isolates was sequenced and phenotyped

Gene presence and absence predict metabolic phenotypes of isolates via regression

A consumer-resource model predicts community metabolite fluxes from phenotypes

In brief

The presence or absence of specific genes within communities of wild bacterial isolates is sufficient to predict community-level metabolite dynamics without detailed knowledge of pathway regulation or complex ecological processes.

The metabolic activities of microbial communities play a defining role in the evolution and persistence of life on Earth, driving redox reactions that give rise to global biogeochemical cycles. Community metabolism emerges from a hierarchy of processes, including gene expression, ecological interactions, and environmental factors. In wild communities, gene content is correlated with environmental context, but predicting metabolite dynamics from genomes remains elusive. Here, we show, for the process of denitrification, that metabolite dynamics of a community are predictable from the genes each member of the community possesses. A simple linear regression reveals a sparse and generalizable mapping from gene content to metabolite dynamics for genomically diverse bacteria. A consumer-resource model correctly predicts community metabolite dynamics from single-strain phenotypes. Our results demonstrate that the conserved impacts of metabolic genes can predict community metabolite dynamics, enabling the prediction of metabolite dynamics from metagenomes, designing denitrifying communities, and discovering how genome evolution impacts metabolism.

The metabolism of microbial communities plays an essential role in sustaining life on Earth, impacting global nutrient cycles, wastewater treatment, and human health. A challenge in microbial ecology is understanding how community metabolism is determined by the taxa present, their metabolic traits, and the genes they possess. Addressing this challenge requires mapping the genotypes of each community member to its metabolic traits and then deciphering how complex interactions between each member impact the flux of metabolites through the community. Complicating the prediction of metabolite fluxes from community composition, interactions can depend on extracellular metabolites, abiotic factors, cooperation, and higher-order effects. Despite these challenges, connecting genomic structure to the collective metabolism of a community is important for functionally interpreting community gene content, designing synthetic communities, and understanding how gene gain and loss impact community metabolism.

Recent work suggests that the genes present in a community may be more informative about metabolic activity than the identity of strains or species making up the community. Sequencing studies of environmental and host-associated communities show that, while the individual strains or species present are often highly variable, the genes or pathways present are often observed to be stable across communities in similar environments. For example, aquatic communities native to bromeliads contain prokaryotes from several functional groups (e.g., methanogens, fermenters, and photoautotrophs). The strain or species representing each functional group varies widely from one plant to the next, but the relative abundance of each functional group is remarkably stable across plants. Similarly, studies in oceans and soils that measure both gene content and nutrient levels have found that the relative abundances of specific metabolic genes are better predictors of nutrient levels than the abundances of specific taxa. These results suggest that the availability of nutrients, such as organic carbon, oxygen, nitrate, carbon dioxide, and light, constrains the composition of the community in terms of the abundances of specific metabolic capabilities more so than it constrains the taxa possessing those capabilities. One implication of this finding is that communities with similar genomic composition, in terms of the metabolic pathways they possess, might exhibit similar rates and productivity of the associated metabolic process, but any such correspondence is yet to be demonstrated.

Corroborating the idea that nutrient availability strongly determines community composition, experiments in fixed nutrient conditions have shown that the metabolic traits of bacterial strains in assembled communities can be highly reproducible. To show this, several groups have sampled complex communities from natural environments and grown them under defined nutrient conditions in the laboratory. Using this approach, Datta et al. showed that marine microbial communities degrading polysaccharide particles exhibit a succession of bacterial taxa. Succession on these particles arises from initial colonizers that cleave polysaccharides, followed by strains that compete for the resulting oligosaccharides or consume by-products of sugar metabolism. Similarly, bacterial communities sampled from leaf surfaces and enriched in glucose minimal medium reproducibly yield communities comprising both consumers of glucose and consumers of glucose metabolic by-products.

Here, we address the challenge of mapping gene content to metabolite dynamics by quantifying the flux of metabolites in an ensemble of genomically diverse communities composed of non-model organisms (see Figure 1 for a summary of the approach). We used bacterial denitrification, an essential metabolic process in the global nitrogen cycle that is performed by diverse and culturable bacterial taxa, as a model metabolic process. We isolated an ensemble of denitrifiers and measured the dynamics of metabolite consumption and production for each isolate under controlled conditions. We then parameterized metabolite dynamics using a consumer-resource model. The genomic diversity of the ensemble of isolates enabled a simple linear regression approach to mapping gene content to consumer-resource model parameters, which resulted in a sparse and generalizable mapping of gene presence and absence to metabolic phenotypes. Finally, the consumer-resource model captured interactions between strains mediated by resource competition, yielding predictions for community-level metabolite dynamics that we verified experimentally.

Figure 1

Denitrification as a model metabolic process

We used denitrification as a model metabolic process because it is performed by diverse bacterial taxa, it is well characterized at the molecular level, and the relevant metabolites are readily quantifiable. Because denitrifiers are easily isolated and cultured, we can capture substantial genomic diversity in an ensemble of natural isolates.

Denitrification is a form of anaerobic respiration whereby microbes use oxidized nitrogen compounds as electron acceptors, driving a cascade of four successive reduction reactions, NO3- → NO2- → NO → N2O → N2 (Figure 2A). As a biogeochemical process, denitrification is essential to nitrogen cycling at a global scale through activity in soils, freshwater systems, and marine environments. In addition, denitrification impacts human health through activity in wastewater treatment plants and in the human gut. The process is performed by taxonomically diverse bacteria that are typically facultative anaerobes. The denitrification pathway is known to be modular, with some strains performing all four steps in the cascade and others performing one or a nearly arbitrary subset of reduction reactions (Figure 2A). Denitrification in nature is, therefore, a collective process, wherein a given strain can produce electron acceptors that can be utilized by other strains.

Figure 2

Figure 2. Denitrification as a model metabolic process. (A) Denitrification is a form of anaerobic respiration whereby oxidized nitrogen compounds are used as electron acceptors. The process results in a cascade of reactions from nitrate (NO3-) to di-nitrogen (N2). Some bacteria perform all four steps in the cascade (purple, "Nar/Nir"), whereas others perform only a subset of reactions. Two examples of the latter are shown here: "Nar" strains (blue) perform only nitrate reduction, and "Nir" strains (red) perform nitrite (NO2-) reduction and potentially also subsequent steps (dashed lines). (B) A schematic representation of the molecular steps in the denitrification process. Denitrification serves as the terminal step in the electron transport chain (not shown) and, thereby, contributes to ATP generation. Reduction of nitrate to nitrite takes place either in the cytoplasm (via the enzyme Nar) or in the periplasm (Nap). Nitrate reduction in the cytoplasm via Nar requires nitrate and nitrite to be transported across the inner membrane (NarK1, NarK2, and NarK1K2). The subsequent three steps all occur in the periplasm and are carried out by the reductases Nir, Nor, and Nos as shown. There are two functionally equivalent types of Nir and Nor reductases: NirK/NirS and qNor/cNor, respectively.

Denitrification is well understood at the molecular level. The process couples the reduction of oxidized nitrogen compounds to the electron transport chain and, therefore, ATP production. The enzymes (reductases) that perform each step in the cascade are shown in Figure 2B. Reduction of nitrate to nitrite can occur either in the cytoplasm, by the Nar reductase, or the periplasm, using Nap. Inner membrane NarK transporters (NarK1, NarK2, and NarK1K2) facilitate the exchange of nitrate and nitrite between the cytoplasm and the periplasm. The remaining three reactions all occur exclusively in the periplasmic space (Figure 2B). The regulatory elements that control the expression of denitrification genes are also well characterized and include two-component systems that sense the oxidized nitrogen compounds and regulators that detect the loss of oxygen from the environment. Because most of these reactions occur in the periplasm, substrates can readily leak into the surrounding environment, enabling crossfeeding between denitrifiers.

We focused experimentally on the first two steps of denitrification: the conversion of nitrate (NO3-) to nitrite (NO2-) and subsequently nitric oxide (NO) (Figure 2A). Nitrate and nitrite are soluble, enabling high-throughput measurements of metabolite dynamics. To obtain a genomically diverse ensemble of non-model organisms, we isolated 78 bacterial strains spanning α-, β-, and γ-proteobacteria from local soils using established techniques (Tables S1–S3; STAR Methods). Each strain was obtained in axenic culture and was characterized as performing one or both of the first two steps of denitrification. Therefore, strains were classified into one of three possible phenotypes (Figures 2A and 3A): (1) Nar/Nir strains that perform both nitrate and nitrite reduction (NO3- → NO2- → NO), (2) Nar strains that perform only nitrate reduction (NO3- → NO2-), and (3) Nir strains that perform only nitrite reduction (NO2- → NO). In addition to these 78 isolates, our strain library also included the model denitrifier Paracoccus denitrificans (ATCC 19367).

Parameterizing metabolite dynamics

We first set out to quantify the metabolic phenotypes of each isolate in our diverse strain library (Step 1, Figure 1). We focused our efforts on quantifying the dynamics of the relevant metabolites, nitrate and nitrite. To accomplish this, strains were inoculated at low starting densities into 96-well plates containing a chemically defined, electron-acceptor-limited medium containing succinate as the sole non-fermentable carbon source (succinate-defined medium, SDM; Table S4; STAR Methods), with either nitrate or nitrite provided as the sole electron acceptor. Cultures were then incubated under anaerobic conditions (STAR Methods). Small samples (10 µL) were taken at logarithmically spaced time intervals over a period of 64 h and assayed for nitrate and nitrite concentrations (STAR Methods). At the end of the time course, optical density was assayed. The measurement resulted in a time series of nitrate and nitrite production/consumption dynamics in batch culture (points, Figure 3A). Contamination between wells using this culturing and sampling approach was assessed to be low (STAR Methods).

Figure 3

Figure 3. Quantifying nitrate and nitrite dynamics in an ensemble of denitrifiers to map genomic structure to community metabolism. (A) Example batch culture metabolite dynamics for Nar/Nir (purple), Nar (blue), and Nir (red) isolates. Nitrate (NO3-, blue points) and nitrite (NO2-, red points) dynamics are measured at logarithmically spaced intervals (circles) via sampling and colorimetric assay (STAR Methods), with ± 5% error bars shown. Biomass densities are only measured at the final time point. Curves show fits to a consumer-resource model shown in (B). (B) A consumer-resource model of nitrate and nitrite reduction by each strain describes the evolution of biomass density (x, OD), nitrate concentration (A, mM), and nitrite concentration (I, mM) with time. The model is parameterized by reduction rates rA and rI (mM/OD/h), and yields γA and γI (OD/mM), for growth on nitrate and nitrite, respectively. The affinity parameters KA and KI (mM) were not well constrained by the data and were fixed for all strains in the library (STAR Methods). (C) Phylogenetic tree and normalized consumer-resource parameters for 79 denitrifying strains (78 isolates and the model denitrifier Paracoccus denitrificans). The strain library comprised 51 Nar/Nir, 24 Nar, and 4 Nir strains. Consumer-resource parameters were measured in a succinate-defined medium (SDM). The phylogenetic tree was constructed using the 16S rRNA gene, and the scale bar represents the estimated number of substitutions per site. Darker colors indicate larger values of the normalized parameters. Nitrate and nitrite reduction parameters were not measured for Nir and Nar strains, respectively. Consumer-resource parameters measured across diverse isolates constituted a dataset for relating genomic diversity to metabolite dynamics. See also Figure S1 and Tables S1–S5.

To parameterize the metabolite dynamics of each strain within a common framework, we utilized a consumer-resource model, which explicitly relates the growth of each strain to the dynamics of metabolite production and consumption (Figure 3B; Equation 3; STAR Methods). The model contains up to six parameters: rates (r*, mM/OD/h), biomass yields (γ*, OD/mM), and affinities (K*, mM), for the substrates nitrate (A) and nitrite (I). For each strain in monoculture, we parameterized the consumer-resource model using measured denitrification dynamics across a range of initial biomass densities and nitrate/nitrite concentrations (Figures 3A and S1A–S1E; STAR Methods). These data allowed us to quantify rates (r*) and biomass yields (γ*) but not the affinity parameters (K*), which require measuring growth rates at very low substrate concentrations. Because the results of parameter fits were not sensitive to the values of K* across a broad range (Figures S1F–S1I; STAR Methods), we fixed the affinity parameter to a small constant value. Therefore, we captured the phenotype of each strain in the library using at most four parameters: rA, rI, γA, and γI (the models for Nar and Nir strains correspond to setting rI, γI = 0 or rA, γA = 0, respectively). Yields (γ*) were inferred using optical density measurements at t = 64 h, and rates (r*) were inferred by fitting the observed nitrate and nitrite dynamics to the consumer-resource model (Figure 3B). For the majority of strains in our library (62 out of 79), a single set of parameters quantitatively described metabolite dynamics across a range of initial biomass densities and nitrate/nitrite concentrations. The consumer-resource model captured metabolite dynamics over a restricted set of initial conditions for the remaining 17 strains (Figures S1J–S1L; Table S5; STAR Methods). Using a representative subset of four strains, we confirmed that biomass density dynamics were well predicted by the consumer-resource parameters, despite the fact that biomass density was not directly measured over time (Figure S1M; STAR Methods).
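To make the parameterization concrete, the following is a minimal sketch of a consumer-resource model for a single strain. Equation 3 is not reproduced in these notes, so the saturating (Monod) form of the uptake terms is an assumption here, based only on the description of rates, yields, and affinities above. The function name `simulate`, the affinity value of 0.01 mM, the step size, and the example parameter values are all hypothetical and chosen for illustration.

```python
def simulate(r_a, r_i, gamma_a, gamma_i, x0, a0, i0,
             k_a=0.01, k_i=0.01, t_end=64.0, dt=0.01):
    """Integrate an assumed Monod-type consumer-resource model with fixed-step RK4.

    State: biomass x (OD), nitrate A (mM), nitrite I (mM).
    Assumed equations (not copied from the paper):
        dA/dt = -r_a * x * A / (k_a + A)
        dI/dt = +r_a * x * A / (k_a + A) - r_i * x * I / (k_i + I)
        dx/dt = gamma_a * (nitrate flux) + gamma_i * (nitrite flux)
    Returns the state (x, A, I) at t_end (hours).
    """
    def deriv(state):
        x, a, i = state
        a = max(a, 0.0)  # guard against tiny negative concentrations
        i = max(i, 0.0)
        flux_a = r_a * x * a / (k_a + a)  # nitrate reduction, mM/h
        flux_i = r_i * x * i / (k_i + i)  # nitrite reduction, mM/h
        return (gamma_a * flux_a + gamma_i * flux_i, -flux_a, flux_a - flux_i)

    def shifted(state, slope, h):
        return tuple(s + h * k for s, k in zip(state, slope))

    state = (x0, a0, i0)
    for _ in range(int(round(t_end / dt))):
        k1 = deriv(state)
        k2 = deriv(shifted(state, k1, 0.5 * dt))
        k3 = deriv(shifted(state, k2, 0.5 * dt))
        k4 = deriv(shifted(state, k3, dt))
        state = tuple(
            s + dt / 6.0 * (p + 2.0 * q + 2.0 * u + v)
            for s, p, q, u, v in zip(state, k1, k2, k3, k4)
        )
    return state


if __name__ == "__main__":
    # Hypothetical Nar strain: reduces nitrate only (r_i = gamma_i = 0).
    x_end, a_end, i_end = simulate(r_a=2.0, r_i=0.0, gamma_a=0.02, gamma_i=0.0,
                                   x0=0.01, a0=2.0, i0=0.0)
    print(f"Nar strain after 64 h: OD={x_end:.4f}, NO3-={a_end:.4f} mM, NO2-={i_end:.4f} mM")
```

Setting `r_i = gamma_i = 0` or `r_a = gamma_a = 0` reproduces the Nar and Nir special cases described above. Under this assumed form, the final biomass depends only on the yields and the amount of substrate reduced, which is consistent with inferring yields from a single endpoint optical density measurement.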

Fitting our consumer-resource model to data for each strain yielded a quantitative description of the metabolic traits (i.e., denitrification rates and yields) of each strain in the library (Figure 3C). We observed large variability between taxa, with coefficients of variation for rate constants (rA, rI) around 70% and yields (γA, γI) around 100%. We also observed some patterns of phylogenetic conservation: for example, α-proteobacteria produced generally higher yields than β- or γ-proteobacteria did, and a clade of Pseudomonas sp. isolates showed consistently higher rates of nitrite reduction than most other strains (PDM17-23, Figure 3C). Despite these patterns, the prevalence of each of the three qualitative phenotypes is not strongly dependent on phylogeny, with each present across the tree (Figure 3C). The latter observation is consistent with pervasive horizontal gene transfer of denitrifying enzymes. Finally, we observed neither a correlation between rates and yields nor an obvious bound on these parameters, suggesting that they are not subject to a trade-off.

Predicting metabolite dynamics from genomes

Understanding how genomic variation impacts metabolite dynamics at the community level requires first learning how genomic variation impacts the metabolic traits of individual strains. Therefore, we sought to determine how genomic variation across the strains in our library is related to variation in denitrification rates and yields (Figure 3C). One common approach to the problem of relating genomes to metabolite dynamics is constraint-based modeling. Constraint-based models infer the set of all metabolic reactions performed by an organism from an annotated genome, and then predict growth rates and metabolite fluxes, assuming the metabolic network is in steady state and is subject to biologically motivated constraints. Constraint-based methods have found some success in predicting collective metabolism from genomes, but these methods require significant manual refinement, complicating the prospect of making predictions from the genomes of non-model organisms. As a result, successfully constructing constraint-based models of denitrification for all strains in our library is a daunting task.

We took an alternative approach to the problem of mapping genomes to metabolite dynamics. We asked whether the variation in metabolic phenotypes across strains in our library can be quantitatively predicted simply from knowledge of the genes possessed by each strain. Our conjecture was motivated by two observations. First, the metabolic traits of bacteria correlate strongly with environmental variables in marine microbial communities. For example, the relative abundance of taxa capable of nitrate reduction is strongly correlated with local temperature, phosphate, and nitrate levels, suggesting that the presence of genes responsible for those traits might also be predictable from nutrient levels and temperature. Second, the statistics of gene presence and absence across large numbers of sequenced genomes provide insights into the functional roles that genes play in pathways, such as the coupling between dihydrofolate reductase and thymidylate synthase activity in the folate metabolism pathway. Together, these observations suggest that the genes a strain possesses could allow for predictions of metabolic traits. Therefore, rather than building constraint-based metabolic models for all of our strains, each of which would require significant manual refinement, we took a simple regression approach.

We used linear regression to predict the consumer-resource model parameters (Figure 3C) of each strain from gene presence and absence (Step 2, Figure 1). To accomplish this, we performed whole genome sequencing on all 79 strains in the library. Then, we assembled and annotated each genome (STAR Methods) and determined the complement of 17 denitrification-related genes possessed by each strain (Table S6), exploiting the fact that the molecular and genetic basis of denitrification is well understood. We identified not only the reductases that perform the reduction of the oxidized nitrogen compounds but also the sensors/regulators and transporters known to be involved in denitrification (STAR Methods). We intentionally excluded genes encoding structural subunits and chaperones required for the functioning of any reductase (Table S7) because such genes have the same presence/absence pattern as the corresponding reductases and, therefore, would have identical predictive power. The presence and absence of the denitrification-related genes in each genome are presented in Figure 4A. Patterns of gene presence and absence agree well with known features of the denitrification pathway, including the mutual exclusion (Pearson correlation of −1.0 among nitrite reducers) of the two reductases performing nitrite reduction, NirS and NirK.

Next, we showed that the presence and absence of denitrification genes in each strain were sufficient to quantitatively predict metabolite dynamics in monoculture. Specifically, we constructed a linear regression where the measured phenotypic parameters of our consumer-resource model were predicted on the basis of gene presence and absence (Figure 4B). Consistent with the observation that bacterial genomes are streamlined, almost all strains possessing nitrate and/or nitrite reductase performed the associated reactions in culture (the only exception being the Nar strain Acidovorax sp. ACV01, which possesses both nitrate and nitrite reductase, Figure 4A). Therefore, we carried out independent regressions for each consumer-resource model parameter using only strains that performed the associated reaction (i.e., Nar and Nar/Nir strains for the rA and γA regressions, and Nir and Nar/Nir strains for the rI and γI regressions). The regression coefficients for each gene quantify the impact of the presence of the gene on a given phenotypic parameter. We used L1-regularized regression (least absolute shrinkage and selection operator, LASSO) to avoid overfitting, performing independent regressions for each of the phenotypic parameters in our consumer-resource model (Figures 4C–4J; STAR Methods). By design, LASSO searches for a level of sparsity that optimizes predictive power, often selecting a few variables to make predictions while forcing other coefficients (βj) to zero. The result can then be a sparse model that makes predictions using a handful of variables. It is important to note that LASSO does not first presume that a few variables are sufficient to make a prediction (in contrast to forward stepwise and best subset regression approaches). In the situation where strong predictive power does not exist, e.g., a phenotypic parameter cannot be predicted well from gene presence and absence, LASSO would effectively fail to identify a predictive model by returning βj = 0 for all genes.
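The sparsity behavior described above can be illustrated with a small L1-regularized regression on a binary gene presence/absence matrix. This is a minimal sketch, not the authors' pipeline: it uses plain coordinate descent on the objective (1/2n) Σi (yi − β0 − Σj βj gij)² + λ Σj |βj|, and the function names, the toy three-gene design, and the penalty values are assumptions made for illustration. In the study, the penalty strength would be chosen by cross-validation rather than fixed by hand.

```python
from itertools import product


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values with |z| <= lam become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_fit(genes, y, lam, n_sweeps=200):
    """Coordinate-descent LASSO with an unpenalized intercept.

    genes: list of rows of 0/1 presence flags (one row per strain).
    y:     phenotypic parameter for each strain (e.g., a rate or yield).
    Returns (beta0, beta), with beta[j] the coefficient of gene j.
    """
    n, p = len(genes), len(genes[0])
    col_mean = [sum(row[j] for row in genes) / n for j in range(p)]
    y_mean = sum(y) / n
    # Centering the columns and the response decouples the intercept.
    x = [[row[j] - col_mean[j] for j in range(p)] for row in genes]
    resid = [yi - y_mean for yi in y]  # residuals when all beta are zero
    col_sq = [sum(x[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # gene present in all or no strains: uninformative
            # Covariance of gene j with the residual, adding back its own term.
            rho = sum(x[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * x[i][j]
                beta[j] = new_beta
    beta0 = y_mean - sum(b * m for b, m in zip(beta, col_mean))
    return beta0, beta


if __name__ == "__main__":
    # Toy data: 8 hypothetical strains covering every combination of 3 genes;
    # only gene 0 affects the phenotype.
    genes = [list(g) for g in product([0, 1], repeat=3)]
    y = [1.0 + 2.0 * g[0] for g in genes]
    print(lasso_fit(genes, y, lam=0.05))  # sparse: only gene 0 is selected
    print(lasso_fit(genes, y, lam=1.0))   # penalty too strong: all beta are 0
```

With a modest penalty, the uninformative genes receive coefficients of exactly zero while the informative gene keeps a (slightly shrunken) coefficient. With a penalty larger than any gene's covariance with the phenotype, every βj is zero and only the intercept remains, which is the "no predictive model" outcome described above.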

Figure 4

Figure 4. A statistical mapping from gene presence and absence to metabolite dynamics of individual strains. (A) The presence and absence of genes in the denitrification pathway for the 79 denitrifying strains in our library. The color of each circle corresponds to the gene function as indicated in the legend further on. (B) Observed consumer-resource phenotypic parameters for each strain in SDM (e.g., nitrate reduction rate rA, Figure 3C) were linearly regressed against gene presence and absence via L1-regularized regression, resulting in regression coefficients βj for each gene j, an intercept β0, and a noise term εi for each observation i. Coefficient βj captures the impact of possessing gene j on the corresponding phenotypic parameter. Independent regressions were performed for each phenotypic parameter. (C–F) Predicted values of rA, γA, rI, and γI, respectively, plotted against measured values. The dashed line indicates perfect agreement between observations and predictions. The in-sample coefficients of determination for these data (R²_fit) and the out-of-sample coefficients of determination estimated via iterated 4-fold cross-validation (R²_CV) are shown. N indicates the number of strains in each regression. Strains that do not perform a particular reaction were omitted from the corresponding regression (e.g., Nir strains were excluded from the regression for rA). (G–J) Estimates of β for each gene and β0 for rA, γA, rI, and γI, respectively. Asterisks indicate the significance level for each β (*: p ≤ 0.05, **: p ≤ 10^-2, ***: p ≤ 10^-3, and ****: p ≤ 10^-4; STAR Methods). See also Figure S2 and Tables S6 and S7.

Performing LASSO regressions on our dataset revealed that the presence and absence of a small set of genes is highly predictive of the consumer-resource parameters for all strains in our library (Figures 4C–4J). The in-sample coefficients of determination (R²_fit) of our regressions were between 0.55 and 0.74 depending on the phenotypic parameter. Crucially, our regression approach generalized out-of-sample, as determined by iterated 4-fold cross-validation (1 × 10^4 iterations; STAR Methods), albeit with a slightly lower predictive power (R²_CV between 0.36 and 0.56). Therefore, across a diverse set of natural isolates, knowledge of the full complement of genes a denitrifying strain possesses is sufficient to accurately predict the rates and biomass yields of that strain on nitrate and/or nitrite.
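The out-of-sample estimate can be sketched as follows. This is an illustrative implementation of iterated k-fold cross-validation under stated assumptions: the exact definition of R²_CV used in the paper is given in its STAR Methods and is not reproduced here, so the choice to pool held-out predictions within each iteration and average R² across iterations is an assumption. The stand-in model (`fit_group_means`, which predicts the mean phenotype of strains with or without a single gene) is hypothetical and replaces the LASSO fit only to keep the example short.

```python
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def iterated_kfold_r2(x, y, fit, predict, k=4, n_iter=100, seed=0):
    """Average out-of-sample R^2 over n_iter random k-fold partitions.

    fit(x_train, y_train) returns a model; predict(model, row) returns a value.
    Every strain is predicted exactly once per iteration, by a model that
    never saw it during training.
    """
    rng = random.Random(seed)
    n = len(y)
    scores = []
    for _ in range(n_iter):
        order = list(range(n))
        rng.shuffle(order)
        folds = [order[f::k] for f in range(k)]
        y_pred = [0.0] * n
        for held_out in folds:
            held = set(held_out)
            train = [i for i in order if i not in held]
            model = fit([x[i] for i in train], [y[i] for i in train])
            for i in held_out:
                y_pred[i] = predict(model, x[i])
        scores.append(r_squared(y, y_pred))
    return sum(scores) / len(scores)


def fit_group_means(x_train, y_train):
    """Hypothetical stand-in model: mean phenotype without / with gene 0."""
    overall = sum(y_train) / len(y_train)
    absent = [t for row, t in zip(x_train, y_train) if row[0] == 0]
    present = [t for row, t in zip(x_train, y_train) if row[0] == 1]
    return (sum(absent) / len(absent) if absent else overall,
            sum(present) / len(present) if present else overall)


def predict_group_means(model, row):
    return model[row[0]]


if __name__ == "__main__":
    x = [[0], [1]] * 6                      # 12 hypothetical strains, one gene
    y = [1.0 + 2.0 * row[0] for row in x]   # phenotype fully set by the gene
    print(iterated_kfold_r2(x, y, fit_group_means, predict_group_means))
```

Because held-out strains are never used for fitting, R²_CV is typically lower than the in-sample R²_fit, as reported above, and it can even be negative when a model predicts worse than the overall mean.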

Validating regression approach to predicting traits from gene presence and absence

Our regression approach leveraged biological knowledge of the denitrification pathway to predict metabolite dynamics, in effect presuming that denitrification gene content is the only significant genomic feature for prediction. To investigate whether this assumption is correct, we asked whether other genomic properties could better predict metabolite dynamics and also examined the role that phylogenetic correlations played in our predictions.

First, we tested the predictive capability of sets of randomly selected genes. To do this, we chose sets of 17 random genes that were not strongly correlated with any denitrification genes but retained the same marginal frequency distribution in the population as the denitrification genes. We found that regressions using these randomly selected genes had, on average, much less predictive power than regressions using the denitrification genes (Figures S2A–S2C; STAR Methods). We also tested augmented sets of up to 2,048 predictors that were generated by adding varying numbers of randomly selected genes to the 17 denitrification genes. We found that the prediction quality changed remarkably little as more genes were added and that even sets of 2,048 predictors (representing approximately 30%–50% of genes in each genome) contained about as much predictive power as the regressions using the 17 denitrification genes alone (Figure S2D; STAR Methods). This result indicates that the 17 denitrification genes harbor the majority of gene presence and absence predictive power.

Second, we tested whether 16S rRNA copy number, genome size, or GC-content improves the predictive ability of denitrification gene presence/absence regressions. We tested these genomic features because: (1) 16S rRNA copy number has been observed to correlate positively with maximal growth rate in nutrient-rich conditions, (2) smaller genomes are associated with faster growth, and (3) GC-content has been investigated as a correlate for numerous bacterial phenotypes, such as optimal growth temperature, and can serve as a baseline for spurious phylogenetic correlations because it is a slowly evolving genomic property that exhibits a high degree of phylogenetic correlation. We found that including these additional predictors in our regressions alongside the 17 denitrification genes did not meaningfully improve predictive ability or alter the inferred coefficients (STAR Methods). Thus, denitrification gene presence and absence outperformed these coarse genomic features.

Third, we examined the role of correlations in consumer-resource parameters between closely related strains in the success of our regressions. We quantified the extent of phylogenetic correlation in our 79-strain library by computing the autocorrelation (Moran's I) for each consumer-resource parameter as a function of phylogenetic distance (STAR Methods). We observed that the rate parameter rA was correlated to a small degree (max(I) = 0.16) over a short phylogenetic distance (16S distance 0.01), whereas the parameters γA, rI, and γI showed a modest degree of correlation (max(I) = 0.33, 0.27, and 0.48, respectively) over relatively longer distances (16S distance 0.16, 0.06, and 0.12, respectively). Pruning clades of closely related strains (e.g., ENS01–08, PDM20–23, Figure 3C) from the dataset decreased the correlation of γA, rI, and γI (max(I) = 0.30, 0.21, and 0.39, respectively; 16S distance 0.05, 0.06, and 0.09, respectively) but had little impact on the correlation of rA. Thus, some of the phylogenetic correlation is attributable to the over-representation of close relatives. Finally, we showed that the presence of these close relatives in our dataset did not skew the results of our regressions. We performed regressions on the pruned dataset (comprising 64 strains) and found that the predictive power and regression coefficients were similar to those for the full dataset (STAR Methods). From this, we concluded that the over-representation of close relatives did not have a large impact on the results of our regressions on the consumer-resource parameters.
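As an illustration of the autocorrelation statistic, the sketch below computes Moran's I for a trait using binary weights that select strain pairs whose phylogenetic distance falls within a given bin. The binary weighting scheme, the function name `morans_i`, and the toy distances are assumptions for illustration; the weighting actually used in the paper is described in its STAR Methods.

```python
def morans_i(values, dist, d_min, d_max):
    """Moran's I for trait `values` with binary distance-bin weights.

    values: one trait value per strain (e.g., a rate or a yield).
    dist:   symmetric matrix of pairwise phylogenetic distances.
    Pairs (i, j), i != j, with d_min <= dist[i][j] < d_max get weight 1;
    all other pairs get weight 0.
    I = (n / W) * sum_ij w_ij (v_i - mean)(v_j - mean) / sum_i (v_i - mean)^2
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0
    weight_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and d_min <= dist[i][j] < d_max:
                cross += dev[i] * dev[j]
                weight_sum += 1.0
    total = sum(d * d for d in dev)
    if weight_sum == 0.0 or total == 0.0:
        return float("nan")  # no pairs in this bin, or no trait variation
    return (n / weight_sum) * cross / total


if __name__ == "__main__":
    # Toy example: two clades of two strains each; the trait is conserved
    # within a clade and differs between clades.
    values = [1.0, 1.0, 3.0, 3.0]
    dist = [
        [0.00, 0.01, 0.20, 0.20],
        [0.01, 0.00, 0.20, 0.20],
        [0.20, 0.20, 0.00, 0.01],
        [0.20, 0.20, 0.01, 0.00],
    ]
    print(morans_i(values, dist, 0.0, 0.05))  # close relatives: positive I
    print(morans_i(values, dist, 0.1, 0.3))   # distant pairs: negative I
```

Evaluating the statistic over a series of distance bins gives I as a function of phylogenetic distance, from which a maximum value and the distance at which it occurs can be read off, as in the values quoted above.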

Generalizing the regression approach to an alternative medium condition

Having mapped gene content to metabolite dynamics in a medium with succinate supplied as the carbon source, we next asked whether our regression approach would generalize to other media conditions. Of the 79 strains in our library, 64 grew on a defined medium with acetate supplied as the sole (non-fermentable) carbon source (acetate-defined medium, ADM; Table S1; STAR Methods). We assayed nitrate and nitrite dynamics for the 64 strains in this medium and inferred consumer-resource parameters. We observed that the consumer-resource parameters in the SDM and ADM conditions were strongly correlated (Pearson correlations 0.52–0.93, Figure 5A). Furthermore, LASSO regressions to predict consumer-resource model parameters measured in ADM from gene presence and absence achieved predictive power similar to what we observed in SDM (STAR Methods). The regression coefficients were correlated between nutrient conditions (Figure 5B), suggesting that the impacts of genes on phenotypes were conserved between conditions. We note, however, that rates and yields in ADM were systematically lower relative to SDM (Figure 5A), consistent with what has been observed previously for relative growth rates on these carbon sources. Consequently, the magnitudes of regression coefficients were generally smaller in ADM than in SDM (Figure 5B). This indicates that, while conserved genotype-to-phenotype relationships may generally underlie predictive power across different environments and media conditions, predictions for a particular environment will be more accurate when trained using data measured in that environment.
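The comparison between media rests on Pearson correlations whose significance is assessed by a permutation test. The sketch below shows one common way to do this, assuming a two-sided test in which one variable is shuffled relative to the other; the function names, the number of permutations, and the toy values are hypothetical, and the paper's exact procedure is described in its STAR Methods.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=10000, seed=0):
    """Two-sided permutation p value for the Pearson correlation.

    Shuffles y relative to x to build a null distribution in which any
    association between the two variables is broken. The +1 terms keep
    the estimate strictly above zero for a finite number of permutations.
    """
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical parameter values for the same strains in two media,
    # with the second medium giving systematically lower values.
    sdm = [0.5, 1.1, 1.8, 2.4, 3.0, 3.9, 4.5, 5.2]
    adm = [0.3, 0.5, 1.0, 1.1, 1.6, 1.9, 2.4, 2.5]
    print(pearson(sdm, adm), permutation_p_value(sdm, adm, n_perm=2000))
```

Note that a high correlation only indicates that parameters rank similarly across media; it does not require the values to agree, which is why systematically lower rates and yields in one medium are compatible with a strong correlation.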

Figure 5

Figure 5. Metabolite dynamics of individual strains are predictable from gene presence and absence in an alternate carbon source. All strains in the 79-strain library were screened for growth on an acetate-defined medium (ADM), and consumer-resource parameters were measured for the 64 strains that grew in this medium. (A) Observed consumer-resource parameters on succinate-defined medium (SDM) are plotted against observed parameters on ADM. The dashed line indicates perfect agreement between the values observed on SDM and ADM. The Pearson correlations between the observed values are shown, and p < 10^-4 for all correlations (permutation test). (B) The consumer-resource parameters on ADM were regressed against gene presence and absence via L1-regularized linear regression. The resulting regression coefficients, bADM, are plotted against the coefficients for regressions on parameters measured in SDM, bSDM (shown also in Figures 4G–4J). The dashed line indicates perfect agreement between each pair of regression coefficients. Pearson correlations are shown, and p = 0.008, 0.01, < 10^-4, and < 10^-4 for rA, gA, rI, and gI, respectively (permutation test). The color of each point corresponds to the gene function, as indicated in the legend. See also Table S1.

Mechanistic interpretation of regression coefficients

Why did gene presence and absence alone hold such strong predictive power for metabolite dynamics, and why did the regressions select specific genes in the denitrification pathway as informative predictors? We propose that by characterizing metabolic phenotypes in terms of rates and yields, we captured the salient features of the metabolic process for each strain and that this enabled the regressions to succeed by exploiting the conserved correlations between the presence of specific genes and metabolic phenotypes. In some cases, these correlations appear to be related to the functional roles of specific genes in the pathway. We found that, for some genes, the sign and magnitude of the regression coefficients agree qualitatively with known properties of the associated enzymes. For example, previous comparisons between membrane-bound and periplasmic nitrate reductases (encoded by narG and napA, respectively; Figure 2B) in multiple bacterial strains showed that the membrane-bound enzyme exhibits higher nitrate reduction activity in vitro than the periplasmic enzyme. This accords with the large positive coefficient for narG in the nitrate reduction rate regression (Figure 4G). Similarly, in the nitrite reduction rate regression, we observed a large positive coefficient for the gene encoding the copper-based nitrite reductase (nirK) (Figure 4I), which, in previous studies, showed markedly higher activity in vitro and in vivo compared with the alternate nitrite reductase enzyme encoded by nirS. Further, our regression coefficients showed larger contributions of narG versus napA to yield on nitrate (Figure 4H) and, similarly, cnor versus qnor to yield on nitrite (Figure 4J). Both of these observations are consistent with the fact that the enzymes encoded by narG and cnor contribute more to the proton motive force (and, therefore, to ATP generation) than their alternatives (napA and qnor, respectively) do. Finally, the transporter encoded by the gene narK1K2 (Figure 2B) is a fusion of the nitrate/H+ symporter NarK1 and the nitrate/nitrite antiporter NarK2, the latter of which is crucial for exchanging nitrate and nitrite between the cytoplasm and periplasm during denitrification when the membrane-bound nitrate reductase is utilized. In Paracoccus denitrificans, this fusion has been shown to have substantially higher affinity for nitrate than NarK2 alone, resulting in higher growth rates under denitrifying conditions. This agrees with what we found in the nitrate and nitrite reduction rate regressions, in which we observed large positive contributions of narK1K2 (Figures 4G and 4I).

Taken together, these observations suggest that the regressions exploited conserved correlations between gene presence and metabolic traits that reflect known mechanistic properties of the denitrification pathway. It is important to note, however, that for many nonzero coefficients in our regressions, notably those corresponding to regulators, there is no clear mechanistic interpretation. Further, given that our regressions were trained on genomes of wild isolates and not on phenotypes of deletion mutants, we do not expect that the regression can be reliably used to predict mutant phenotypes. Instead, we expect that the regressions exploited the tendency for strains possessing specific genes to have specific traits on average (e.g., strains with NarG tend to have high rA and gA). These correlations between the presence of specific genes and metabolic traits qualitatively agree with the mechanistic details of some genes in the pathway, but we do not expect the regression coefficients to make causal predictions about the loss of a single gene.

Implications of a statistical approach to mapping genomic structure to metabolic traits

Our statistical approach took two important steps toward mapping genomic structure to metabolic dynamics at the single-strain level. First, by making quantitative measurements in the laboratory, we removed the confounding environmental factors present in sequencing and metabolomic studies of natural communities to reveal that gene content has a conserved impact on dynamic metabolic phenotypes. Second, our results suggest that a statistical approach could be used to discover the key genomic features of pathways that determine other metabolic phenotypes, complementing direct genetic investigation of model organisms. Finally, our predictions of metabolic phenotypes from genomes apply across a range of conditions and generalize well out-of-sample, suggesting that this approach can predict metabolite dynamics for strains where only genome sequence data are available. These insights were made possible by parameterizing metabolic phenotypes across a genomically diverse strain library of non-model organisms, thereby exploiting genomic variation to learn the mapping from genotype to metabolic phenotypes.

Predicting metabolite dynamics in communities

Predicting community metabolite dynamics from genomic structure requires mapping single-strain phenotypes to collective behavior. Previous studies have found some success in predicting metabolite dynamics in consortia from knowledge of the monoculture metabolite consumption dynamics. These approaches used simple assumptions, such as a fixed rate of metabolite production or consumption for each strain, rather than a dynamic model of metabolites. To predict community metabolite dynamics, we used the consumer-resource modeling formalism that describes metabolite dynamics for each strain to make quantitative predictions for metabolite dynamics in communities of multiple strains (Step 3, Figure 1). Since the consumer-resource parameters were sparsely encoded by the genomes of each strain (Figure 4), predicting community metabolite dynamics from the consumer-resource model would provide a mapping from gene content to community metabolism.

Therefore, we extended our modeling formalism to N-strain communities by adding the rate contributions of each strain to the dynamics of nitrate and nitrite (Figure 6B; Equation 10; STAR Methods). This ‘‘additive’’ model assumes that strains interact only via cross-feeding and resource competition for electron acceptors. This model also assumes that the rates and yields on nitrate and nitrite for strains in pair culture are the same as in monoculture. As a result, the model provides predictions for N-strain community metabolite dynamics given the consumer-resource model parameters for individual strains without any free parameters.
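A minimal numerical sketch of such an additive model is given below. It assumes Monod-type (saturating) kinetics with a shared affinity constant K for nitrate and nitrite, which is consistent with the fixed K values mentioned in the Figure 6 caption, but the exact functional form is given by Equation 10 in the STAR Methods and is not reproduced here. The forward-Euler integrator, the function name, and all parameter values are hypothetical choices for illustration.

```python
def simulate_community(strains, A0, I0, t_end=64.0, dt=0.01, K=0.01):
    """Integrate an additive N-strain consumer-resource model (forward Euler).

    strains: list of dicts with keys
        x0 (initial biomass, OD), rA, rI (reduction rates, mM/OD/h),
        gA, gI (yields, OD/mM). A Nar strain has rI = 0; a Nir strain has rA = 0.
    A0, I0: initial nitrate and nitrite concentrations (mM).
    Assumed kinetics (illustrative): each strain reduces nitrate at
    rA * x * A / (K + A) and nitrite at rI * x * I / (K + I).
    Returns final (A, I, biomass list).
    """
    A, I = float(A0), float(I0)
    x = [float(s["x0"]) for s in strains]
    for _ in range(int(round(t_end / dt))):
        sat_A = A / (K + A)
        sat_I = I / (K + I)
        # Per-strain fluxes; community fluxes are simply their sums.
        nar = [s["rA"] * xi * sat_A for s, xi in zip(strains, x)]
        nir = [s["rI"] * xi * sat_I for s, xi in zip(strains, x)]
        # Biomass grows in proportion to the electron acceptor reduced.
        x = [xi + dt * (s["gA"] * a + s["gI"] * b)
             for s, xi, a, b in zip(strains, x, nar, nir)]
        # Nitrate is consumed; nitrite is produced from nitrate and consumed.
        A = max(A - dt * sum(nar), 0.0)
        I = max(I + dt * (sum(nar) - sum(nir)), 0.0)
    return A, I, x


# Hypothetical Nar + Nir pair; parameter values are made up for illustration.
nar_strain = {"x0": 0.01, "rA": 2.0, "gA": 0.02, "rI": 0.0, "gI": 0.0}
nir_strain = {"x0": 0.01, "rA": 0.0, "gA": 0.0, "rI": 2.0, "gI": 0.02}
A_end, I_end, x_end = simulate_community([nar_strain, nir_strain], A0=2.0, I0=0.0)
```

Because the community fluxes are plain sums of monoculture terms, the prediction for any community follows from the single-strain parameters alone, which is the sense in which the model has no free parameters.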

To evaluate the ability of our consumer-resource model to make predictions of metabolite dynamics in communities, we used measured consumer-resource parameter values (Figure 3C) and not the values predicted by gene presence and absence (Figures 4C–4F). This allowed us to disambiguate the errors associated with the failure of the model to predict metabolite dynamics from the errors associated with predicting phenotypic parameters from genomes. However, as we subsequently discuss, using consumer-resource model parameters predicted from genomes has, at most, a modest impact on errors in our predictions of community metabolite dynamics.

Predicting metabolite dynamics in two-strain communities

We tested the ability of this approach to predict metabolite dynamics in all pair combinations of 12 strains from our library (4 Nar/Nir, 4 Nar, and 4 Nir). We assembled communities in 96-well plates containing SDM, supplying either nitrate or nitrite initially in two separate experimental conditions and then sampled over a 64-h period to measure concentrations of nitrate and nitrite (STAR Methods). Remarkably, we found that the additive model accurately predicted the metabolic dynamics for most 2-strain communities (Figures 6, S3A, and S3B) using only the measured consumer-resource parameters for individual strains. Specifically, the third column of Figure 6A shows the zero-free-parameter predictions (curves) of denitrification dynamics in 2-strain communities, which agreed well with measurements (points). The 2-strain community predictions include non-trivial dynamics, such as a transient increase in nitrite for a Nar/Nir + Nar community. In addition, we observed that the additive model accurately predicted total endpoint optical densities and community compositions (Figure S4; STAR Methods) in most cases, indicating that the model generally captures strain abundance dynamics in communities.

We quantified the quality of the additive model predictions for metabolite dynamics by computing a normalized root-mean-square error (NRMSE; see caption of Figure 6; Equation 12; STAR Methods). We found that most 2-strain communities have NRMSE between 0 and 2, indicating that our model successfully predicted metabolite dynamics given only the measured consumer-resource parameters for each strain. Predictions of metabolite dynamics in pair cultures were also accurate when using consumer-resource parameters predicted from genomes via regression (Figures S5A and S5B; STAR Methods). Further, the success or failure of the model predictions depended on the phenotypes of the strains present. The model successfully predicted 2-strain metabolite dynamics for most combinations of phenotypes (e.g., Nar/Nir + Nar or Nar + Nar) but failed only in the case where Nar strains were cultured with Nir strains (Figures 6A, 6C, S3C, and S3D). The failure of our model predictions in Nar + Nir communities followed the common pattern that the rate of nitrate reduction was slower than expected (bottom row, Figures 6A and S6). We speculate that this failure of the model to predict metabolite dynamics in Nar + Nir communities was caused by excretion of nitric oxide by the Nir strain. Nitric oxide can be cytotoxic, which may explain slower rates of nitrate reduction for Nar strains. For further exploration of this phenomenon, see the discussion section.
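For concreteness, the pair-culture NRMSE can be computed as in the sketch below, which follows the definition given in the Figure 6 caption: the pair-culture RMSE divided by the root mean square of the two monoculture RMSEs. The function names are illustrative, and the generalization to larger communities (Equation 12, STAR Methods) is not shown.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between predicted and observed concentrations."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def pair_nrmse(rmse_ij, rmse_i, rmse_j):
    """Pair-culture RMSE normalized by the monoculture fit errors.

    A value of 1 means the pair-culture prediction is as accurate as the
    monoculture fits; values up to 2 are within 2-fold of those fits.
    """
    return rmse_ij / math.sqrt((rmse_i ** 2 + rmse_j ** 2) / 2.0)
```

For example, a pair-culture RMSE of 0.2 mM with monoculture RMSEs of 0.1 mM each gives an NRMSE of 2, the upper edge of the range the text treats as a successful prediction.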

Figure 6

Figure 6. Metabolite dynamics in two-strain communities are predictable from monocultures. (A) Examples of pair culture dynamics for all combinations of the three denitrification phenotypes (Nar/Nir, purple; Nar, blue; Nir, red). The first two columns show metabolite dynamics for each of two strains cultured individually. The third column shows the metabolite dynamics for pair cultures of the two strains (points) with zero-free-parameter predictions using the consumer-resource model (curves, see model in B). All cultures were performed in SDM, and predictions were based on measured monoculture consumer-resource parameters in SDM, not those inferred from genomes. Errors in pair culture predictions are shown in each panel in the third column as quantified by the normalized root-mean-square error (NRMSE). For pair cultures, we defined $\mathrm{NRMSE}_{ij} = \mathrm{RMSE}_{ij} / \left( (\mathrm{RMSE}_i^2 + \mathrm{RMSE}_j^2)/2 \right)^{1/2}$, where $\mathrm{RMSE}_{ij}$ is the root-mean-square error between model predictions and observed metabolite concentrations of strains i and j in pair culture, and $\mathrm{RMSE}_i$ and $\mathrm{RMSE}_j$ are the RMSEs of strains i and j in monoculture. NRMSE in the range 0–2 indicates errors in 2-strain communities that are within 2-fold of fits associated with their constituent monocultures. (B) An N-strain consumer-resource model (based on the model in Figure 3B) was used to predict pair culture metabolite dynamics (N = 2). A and I are nitrate and nitrite concentrations, respectively. $x_i$ denotes the biomass density of strain i with parameters $r_A^i$, $g_A^i$, $r_I^i$, and $g_I^i$, which were determined from monoculture experiments (Figure 3C). The K values were fixed at 0.01 mM for all strains. (C) A matrix of NRMSE values quantifying the quality of model predictions for all pairs of 12 strains: 4 Nar/Nir, 4 Nar, and 4 Nir. NRMSE values are shown for communities cultured in SDM with nitrate initially supplied, with the exception of Nir + Nir pairs for which nitrite was initially supplied. Only Nar + Nir communities are poorly predicted by the consumer-resource model (permutation test, p < 1 × 10^-5, Figures S3C and S3D). See also Figures S3–S6.

Predicting metabolite dynamics in larger communities

Next, we asked whether dynamical metabolic phenotypes measured from monocultures could be used to predict metabolite dynamics in 3–5-strain communities. We applied the additive model to predict the nitrate and nitrite dynamics in 81 combinations of 3 strains, 21 combinations of 4 strains, and 6 combinations of 5 strains from the 12-strain subset (STAR Methods). As with pair cultures, 3–5-strain communities were cultured in SDM with either nitrate or nitrite supplied initially in two separate experimental conditions. In communities that did not contain a Nar + Nir pair (e.g., Figure 7A), we found that prediction accuracy was high (gray symbols, Figures 7B and S3E). This again indicated that in most combinations of phenotypes, community dynamics were predictable from the consumer-resource parameters of each strain in the community. However, in communities that contained a Nar + Nir pair, predictions were relatively poor (yellow symbols, Figures 7B and S3E), suggesting that interactions between Nar and Nir phenotypes that were not captured in the additive model were again driving low prediction accuracy. Finally, we note that the additional error in community metabolite dynamics predictions associated with predicting phenotypes from genomes was typically modest (median increase in NRMSE ≈ 0.5–1.4) for 3–5-strain communities (Figure S5C; STAR Methods).

Correcting for interactions between Nar and Nir strains

To address the impact of interactions between Nar and Nir strains not accounted for by our additive model in 3–5-strain communities, we took a coarse-graining approach. We asked whether the metabolic contributions of Nar + Nir pairs could be treated as modules within larger communities. To do this, we re-fitted nitrate and nitrite reduction rates (rA, rI) to pair culture data (cultured in SDM with nitrate) for each Nar + Nir pair, leaving yields fixed (Figures 7C and S7A; STAR Methods). This resulted in effective nitrate and nitrite reduction rates (r̃A, r̃I) for each Nar + Nir pair. In every case, we observed that the re-fitted nitrate reduction rates r̃A were lower than the monoculture nitrate reduction rates (Figure S7B), demonstrating quantitatively that Nar strains were consistently slowed by the presence of Nir strains. This observation is consistent with the hypothesis of excretion of cytotoxic nitric oxide by the Nir strain.

We then used the re-fitted rates for Nar + Nir pairs to make predictions for communities (cultured in SDM with nitrate) that included such pairs (e.g., Figure 7D). For communities that included multiple Nar + Nir pairs, we developed a simple averaging rule for determining the effective rates from the rates for each Nar + Nir pair present (STAR Methods). For example, in a Nar + Nar + Nir community, there are two sets of Nar + Nir pair interactions, with a different effective nitrite reduction rate r̃I measured for the Nir strain in its interactions with the two Nar strains. In this example, we would take the mean of these two effective reduction rates as the value used for prediction. We found that the metabolite dynamics in 3–5-strain communities containing Nar + Nir pairs were quantitatively well predicted by this coarse-graining approach (yellow symbols, Figure 7B). We concluded that treating Nar + Nir pairs as effective modules within larger communities recovered the predictive power of the additive consumer-resource model.
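The averaging rule can be sketched as follows, based only on the Nar + Nar + Nir example given in the text: each strain's effective rate is taken as the mean of its re-fitted rates over all Nar + Nir pairings present in the community. The data layout, the function name, the strain labels, and the handling of Nar/Nir strains (left at their monoculture rates) are assumptions; the full rule is defined in the STAR Methods and may treat some cases differently.

```python
from statistics import mean


def effective_rates(community, refit):
    """Average re-fitted pair rates over the Nar + Nir pairs in a community.

    community: dict mapping strain name -> phenotype ("Nar", "Nir", "Nar/Nir")
    refit:     dict mapping (nar_strain, nir_strain) -> (rA_eff, rI_eff),
               the effective rates re-fitted to that pair culture
    Returns a dict mapping each Nar strain to its mean effective nitrate
    reduction rate and each Nir strain to its mean effective nitrite
    reduction rate. Strains with no Nar + Nir partner are omitted and would
    keep their monoculture rates.
    """
    nar = [s for s, p in community.items() if p == "Nar"]
    nir = [s for s, p in community.items() if p == "Nir"]
    rates = {}
    for a in nar:
        values = [refit[(a, b)][0] for b in nir]
        if values:
            rates[a] = mean(values)
    for b in nir:
        values = [refit[(a, b)][1] for a in nar]
        if values:
            rates[b] = mean(values)
    return rates


# Hypothetical Nar + Nar + Nir community with made-up re-fitted rates.
example_community = {"narX": "Nar", "narY": "Nar", "nirZ": "Nir"}
example_refit = {("narX", "nirZ"): (1.0, 2.0), ("narY", "nirZ"): (0.5, 3.0)}
example_rates = effective_rates(example_community, example_refit)
```

In this example the Nir strain receives the mean of its two pair-specific nitrite reduction rates, while each Nar strain has a single Nir partner and keeps the rate re-fitted for that pair.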

DISCUSSION

Quantifying the metabolic phenotypes of a diverse library of natural isolates using a consumer-resource model allowed us to take a statistical approach to connecting genotypes to dynamical metabolic phenotypes. The outcome was a sparse mapping from gene content to single-strain metabolite dynamics that exploited conserved correlations between metabolic traits and gene presence, some of which reflect the known mechanistic properties of enzymes in the denitrification pathway. The resource-based modeling formalism then permitted quantitative predictions of community-level metabolite dynamics. As a result, the approach yielded a mapping from genomic structure to metabolite dynamics at the community level for denitrifying bacterial communities.

A key contribution of this study is the demonstration of a quantitative mapping between gene content and metabolic traits for a model metabolic process. One might expect that gene presence and absence is too coarse a genomic feature to predict dynamic metabolic traits and that other genomic features, such as promoter sequences, synteny, or allelic variation, would be necessary to make predictions. We instead found that the association between gene presence/absence and metabolic traits is strong. This result suggests that selection for specific metabolic traits in bacteria may primarily favor genomes with specific complements of genes and that more granular details of the genome, such as promoter sequences or allelic variation, are less important.

At the community level, we found that interactions beyond those described by the additive consumer-resource model are not idiosyncratic but instead exhibit a general pattern (i.e., they occur only when Nar and Nir strains are both present). This suggests that interactions beyond resource competition may exhibit patterns that can be discovered in the laboratory. The fact that community-level metabolite dynamics departed from the additive model in Nar + Nir communities suggests that such interactions may be more likely to occur when specific metabolic processes, such as facilitation via the exchange of a metabolite, are at work.

Improving predictions of community metabolism from genomes

There are some important caveats that apply to our prediction of single-strain metabolic traits from genomes and community-level metabolism from monocultures. For one, by parameterizing metabolite dynamics using a consumer-resource model, we assumed that the model could approximate the metabolic phenotypes of wild isolates. For most of our library (62/79 strains), this approximation worked well, but in some cases (17/79 strains), the model failed for at least some initial conditions (Figures S1J–S1L; Table S5; STAR Methods). These failures may have occurred because the model does not capture phenomena such as the inhibition of reduction rates by reaction products. Going forward, the assumptions of the model could be relaxed by applying methods to learn the appropriate phenotypic parameters directly from the data.

Although we set out to obtain a diverse strain library for the purpose of mapping genomic variation to dynamic metabolic phenotypes, it is important to note that our library is composed solely of Proteobacteria and does not contain representatives from other phyla. This limitation means that it is unclear whether our regression approach can predict phenotypes of distantly related strains (e.g., gram-positive bacteria). In addition to the 79 strains described in this study, we attempted to assay the denitrification dynamics for three gram-positive Nar strains from the phylum Actinobacteria. We found their reduction rates to be slower than those of any strain in our library (~0.1 mM/OD/h), resulting in almost negligible nitrate reduction over 64 h. This observation suggests that denitrification phenotypes in clades distant from Proteobacteria may be distinct, with rates that are potentially much slower than what we observed for Proteobacteria. Supporting this idea, denitrification in gram-positive bacteria is poorly understood, and previous studies that collected phenotypic data similar to ours characterized only Proteobacteria. Therefore, extending our results to more diverse strains would require phenotyping a phylogenetically expanded library.

Considering the broader applicability of our statistical approach, there are some limitations to the types of metabolic processes and interactions that can be readily studied. Denitrification is a well-studied metabolic process, with the relevant enzymes known and easily annotated. Extending our method to less well-studied metabolic traits would require new approaches to learn the appropriate genomic features from data, since it may be challenging in those contexts to choose genes based on mechanistic knowledge. High-throughput mutant screens on wild isolates, for instance, via barcoded transposon mutant libraries, could be used to discover unannotated or poorly annotated genes that are important for metabolic traits and potentially useful as predictors for metabolic phenotypes.

Bridging the gap between the synthetic communities studied here and communities in the wild will require engaging with the chemical and spatial complexity of natural denitrifying communities. First, it is unclear whether the additive and non-additive interactions described here are relevant to wild communities. One way to determine the relevance of these interactions would be to measure co-occurrence between genotypes in natural contexts. Second, it remains to be seen how our approach generalizes to the complex nutrient environments, such as mixtures of organic carbon sources, that are characteristic of natural communities. One approach to this problem would be to quantify nitrate and nitrite dynamics directly in soils and ask whether gene content can predict metabolite dynamics in this context. Finally, denitrification in nature occurs in the presence of other metabolic processes, where it often depends on nitrate from nitrifiers and competes with dissimilatory nitrate reduction to ammonia for electron acceptors. Extending the approach taken here to a broader ecological context that includes other metabolic fluxes is an important avenue to pursue.

Applying predictions of community metabolism from genomes

At the single-strain level, the apparent mechanistic relevance of the regression coefficients in this study suggests that a statistical approach, coupled with large-scale culturing and phenotyping on libraries of isolates, can be exploited to discover the salient features of genomes that determine community metabolism. Higher-throughput measurements will enable a more detailed investigation of genomic features, allowing us to extend our statistical approach to variation in gene sequences and synteny.

Further, statistical predictions similar to those employed here could be used to help specify constraint-based metabolic models. Constraint-based metabolic models are refined using experimental measurements of metabolic traits, but measuring these traits is challenging, especially for unculturable taxa or strains that are difficult to isolate from complex communities. Since our approach enables the prediction of metabolic phenotypes from genomes, these predictions could be used to refine constraint-based models of metabolic networks using genomic data alone, thus circumventing the need to experimentally measure metabolic phenotypes.

At the community level, our approach could eventually enable the prediction of metabolite dynamics in communities where gene presence and absence for individual genomes is known. Soils and host-associated communities typically contain hundreds of bacterial taxa; therefore, it may be necessary to test the predictive power of the consumer-resource formalism in communities of many taxa. However, data from soils suggest that denitrification may occur locally, on 10–20 mm grains. At this small scale, it is possible that communities are composed of just a few strains. If this is indeed the case, our results for communities of 2–5 strains (Figures 6 and 7) might apply to denitrifying communities in soil.

Departures from model predictions in Nar + Nir communities

It is striking that communities containing both Nar and Nir phenotypes departed from the expectation of an additive consumer-resource model (Figures 6C and 7B). We proposed that the inhibition of nitrate reduction in Nar + Nir communities may be caused by nitric oxide produced by the Nir strains. Consistent with this hypothesis, the most strongly inhibited Nar strains (PDM12 and PNT03, Figure S7) lack nitric oxide reductase (Figure 4A); therefore, they likely cannot alleviate this toxicity. In addition, the strongly inhibited Nar strains possess the periplasmic nitrate reductase (Figure 2B), which is exposed to the toxic effects of extracellular nitric oxide, whereas the weakly inhibited Nar strain ACV02 possesses the membrane-bound nitrate reductase, which is shielded from nitric oxide in the cytoplasm. Although Nir strains possess nitric oxide reductase and, therefore, could alleviate toxicity by reducing nitric oxide to nitrous oxide, Nir strains often transiently accumulate nitric oxide. Consistent with this idea, when we measured relative abundances of Nar and Nir strains in co-culture, we observed smaller fractions of Nar strains relative to our model predictions in most cases (Figure S4B).

To describe metabolite dynamics in communities where both Nar and Nir strains were present, we chose not to expand our modeling formalism to include our hypothesized mechanism of Nar strain inhibition. Instead, we used measurements from Nar + Nir pair cultures to describe community-level metabolite dynamics (Figure 7). The advantage of this approach was to maintain a small number of model parameters, but it came at the expense of mechanistic interpretation. Another possible disadvantage of our approach was the challenge of modeling communities with multiple Nar and Nir pairs. However, we found that a simple averaging method (STAR Methods) succeeded in describing community metabolite dynamics, even when multiple Nar + Nir pairs were present in communities of 3–5 strains (Figure 7B).

We note that Nar + Nir pair cultures are metabolically distinct from Nar/Nir monocultures, in that the former splits the denitrification pathway across two genomes resulting in obligate cross-feeding. It is notable that our model fails only in the case where cross-feeding is required, suggesting that our formalism is most relevant for competitive interactions and that accurately predicting obligate cross-feeding from monoculture information alone may require additional parameters. The ecological context of denitrification pathway splitting at nitrite reduction is believed to be associated with environmental pH, with low pH favoring a split pathway. This hypothesis comes from a previous study showing that the transient accumulation of nitrite during denitrification can be reduced by segregating the processes of nitrate and nitrite reduction across genomes. Reducing transient nitrite accumulation is advantageous in low pH environments, where nitrite forms toxic intermediates. Because we observe Nar + Nir communities escaping the transient accumulation of nitrite (Figures 6A and S6), our results are consistent with splitting of the denitrification pathway at nitrite reduction as an adaptation to acidic environments.

CONCLUSION

We find it striking that a statistical approach can uncover a simple relationship between gene content and metabolite dynamics in communities of diverse wild isolates. It is our hope that future work can leverage this approach to understand and predict the metabolic activity of microbial communities in natural settings.

Limitations of the study

We assumed that metabolic phenotypes can be captured by a consumer-resource model, an assumption that breaks down for a fraction of our isolates and limits the direct applicability of our approach to strains and processes that can be modeled using a simple phenomenology. For example, our modeling formalism works well when the electron acceptor is limiting but may fail when the donor (organic carbon) is limiting.

Our regression approach exploits correlations between genotype and phenotype to make predictions. To some extent, these correlations reflect conserved phenotypic impact of certain genes, but phylogenetic correlation also plays a role. Therefore, we do not expect the regression to make causal predictions of the impact of single-gene knockout mutations on phenotypes.

Our library of isolates comprises strains from the phylum Proteobacteria. We do not expect our results to generalize to distantly related denitrifiers in other phyla, such as gram-positive bacteria. Expanding the library is likely necessary to predict phenotypes of distantly related strains.

Our approach has been demonstrated for comparatively simple nutrient conditions in well-mixed conditions. It remains to be seen how well this statistical approach will work in natural contexts, where spatial structure and complex chemical environments are present.

STAR★METHODS

Detailed methods are provided in the online version of this paper and include the following:

● KEY RESOURCES TABLE
● RESOURCE AVAILABILITY
  ○ Lead contact
  ○ Materials availability
  ○ Data and code availability
● EXPERIMENTAL MODEL AND SUBJECT DETAILS
  ○ Strains
  ○ Isolation of denitrifying bacteria from soils
  ○ Defined growth medium
  ○ Denitrifying conditions
● METHOD DETAILS
  ○ Assay of nitrate and nitrite
  ○ Denitrification dynamics experiments
  ○ Whole genome sequencing and annotation
  ○ Phylogenetic classification of strains
  ○ Measuring relative abundances and contamination
● QUANTIFICATION AND STATISTICAL ANALYSIS
  ○ Consumer-resource model for metabolite dynamics
  ○ Inferring phenotypic parameters from data
  ○ Regressing SDM phenotypes onto denitrification gene content
  ○ Characterizing phylogenetic correlation
  ○ Evaluating randomly-selected genes as predictors
  ○ Evaluating alternative genomic predictors
  ○ Regressing ADM phenotypes onto denitrification gene content
  ○ Predicting community metabolic dynamics
  ○ Correcting for Nar + Nir interactions

Figure S1

Figure S1. Fitting consumer-resource parameters, related to Figure 3. (A) Histogram showing the distribution of yield intercept (γ0) values for 79 strains cultured in SDM. Units are dimensionless absorbances at 600 nm, path length normalized to 1 cm. (B–D) Examples of yield parameter fits using data obtained in different growth conditions. Points show observed values of ΔOD600 = OD600(64 h) − OD600(0 h) from different conditions where different amounts of nitrate and nitrite are reduced (ΔA = A_0 − A_64 and ΔI = A_0 − A_64 + I_0 − I_64, respectively), with 4 replicates used in each condition. The plane/lines show the least-squares fits of the data to Equation 6 (STAR Methods). (B) shows the yield fit for the Nar/Nir strain P. denitrificans ATCC 19367 in SDM. (C) shows the yield fit for the Nar strain Raoultella sp. RLT01 in SDM. (D) shows the yield fit for the Nir strain Pseudomonas sp. PDM13 in SDM. (E) Example global fits of the consumer-resource model (Equation 3; STAR Methods) to nitrate and nitrite dynamics data for the Nar/Nir strain P. denitrificans ATCC 19367 cultured in SDM. Points show measured concentrations of nitrate and nitrite, and curves show optimal model fits. Four replicates were used in each experimental condition. (F and G) Distributions of fractional errors (%) for (F) the affinity parameters K_A and K_I and (G) the rate parameters r_A and r_I, fit using SDM monocultures. Distributions were computed via nonparametric bootstrap, with K_A and K_I constrained during fitting to lie between 0.001 and 10. Fractional error is defined as the ratio of the interquartile range obtained via bootstrapping to the value of the parameter obtained using a standard fit to all experimental data. (H) Example nitrate and nitrite dynamics for the Nar strain Raoultella sp. RLT01 cultured in SDM. Solid lines show the fit to Equation 3 (STAR Methods) holding K_A fixed at 0.001, while dashed lines show fits holding K_A = 0.1. (I) Comparison of model fit errors (RMSE) for N = 79 denitrifying strains cultured in SDM. Fits that hold K_A = K_I = 0.01 and fits that take K_A and K_I as free fitting parameters are compared. A two-sample Kolmogorov-Smirnov test does not reject the null hypothesis that the underlying distributions for the two samples are the same (p = 0.97). Boxplots indicating quartiles of each distribution are shown. (J) Example Nar/Nir strain PDM25 cultured in SDM for which reduction rates were fit using all experimental conditions (dashed lines, RMSE = 0.177) and using only conditions where nitrate was initially supplied (solid lines, RMSE = 0.138). (K) Example Nar strain ENT03 cultured in SDM for which the reduction rate r_A was fit using all experimental conditions (dashed lines, RMSE = 0.254) and using only conditions where OD600(0 h) = 0.01 (solid lines, RMSE = 0.095). (L) Example Nar strain PDM27 cultured in SDM for which nitrate concentrations appear to asymptotically approach a nonzero value. (M) The biomass densities of four Nar/Nir strains (Achromobacter sp. ACM01, Ensifer sp. ENS09, Paracoccus denitrificans ATCC 19367, and Pseudomonas sp. PDM21) grown in SDM were measured in denitrifying conditions over 64 h (points) to validate the predictions (curves) of the consumer-resource model (Equation 3; STAR Methods). The parameters for each strain were inferred in previous SDM monocultures (STAR Methods). 2 mM NO3− was initially supplied to each culture, and three experimental replicates were used for each time point. The median relative errors in biomass density predictions at t = 64 h were 5.2%, 7.5%, 9.4%, and 6.4% for ACM01, ENS09, PAR19367, and PDM21, respectively.
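The yield fit and the bootstrap "fractional error" described in this caption can be illustrated with a small sketch. It is not the authors' code. It assumes that, for a Nar strain, Equation 6 reduces to a straight line ΔOD600 = γ_A · ΔA + γ0 (the caption only says "plane/lines", so the exact form is an assumption), it uses synthetic data, and it applies the caption's fractional-error definition to the yield slope purely for illustration (the paper reports it for K and r). The names `fit_line` and `fractional_error` are hypothetical.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def fractional_error(x, y, n_boot=500, seed=0):
    """IQR of bootstrapped slopes divided by the full-data slope, in percent."""
    rng = random.Random(seed)
    slope_full, _ = fit_line(x, y)
    n = len(x)
    slopes = []
    while len(slopes) < n_boot:
        # Nonparametric bootstrap: resample data points with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        if len(set(xs)) < 2:
            continue  # a slope cannot be fit to a single x value
        ys = [y[i] for i in idx]
        slopes.append(fit_line(xs, ys)[0])
    q1, _, q3 = statistics.quantiles(slopes, n=4)
    return 100.0 * (q3 - q1) / abs(slope_full)


# Synthetic example (assumed values): 4 conditions x 4 replicates.
rng = random.Random(1)
delta_a = [c for c in (0.5, 1.0, 1.5, 2.0) for _ in range(4)]  # nitrate reduced, mM
delta_od = [0.02 * a + 0.01 + rng.gauss(0.0, 0.002) for a in delta_a]

gamma_a, gamma_0 = fit_line(delta_a, delta_od)
frac_err = fractional_error(delta_a, delta_od)
print(f"gamma_A = {gamma_a:.4f}, gamma_0 = {gamma_0:.4f}, fractional error = {frac_err:.1f}%")
```

A small fractional error means that the parameter estimate barely changes when the data are resampled, which is how panels (F) and (G) summarize the reliability of the fitted parameters.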

Figure S2

Figure S2. Randomly selected genes as alternative predictors for consumer-resource parameters, related to Figure 4. (A) Distributions of gene presence frequency (fraction of strains that possess a given gene) for denitrification-related genes (red) and the distribution of gene presence frequency for all other annotated genes (black), both in the ensemble of the 75 nitrate-reducing strains. Solid lines show empirical cumulative distribution functions (CDFs) of gene frequencies, and dashed lines show kernel density estimates of these distributions (bandwidth = 0.4). (B and C) Distributions of R²_CV values obtained by regressing each of the SDM consumer-resource parameters onto the presence and absence of sets of randomly selected genes (1 × 10^3 sets of random genes per consumer-resource parameter). (B) shows results for genes randomly selected from the set of all annotated genes across strains in our library, while (C) shows results for genes randomly selected from the set of all annotated genes excluding those that have large and significant correlation (|r| ≥ 0.5) with any denitrification genes. Dashed lines indicate R²_CV values obtained in regressions onto the presence and absence of denitrification-related genes (the same values are shown in both panels) with the corresponding quantile values (q). (D) Points show distributions of R²_CV obtained by regressing the SDM consumer-resource parameters onto sets of the 17 denitrification genes plus P − 17 additional randomly selected genes (10 predictor sets per consumer-resource parameter), and the solid lines pass through the median values of these distributions as a function of P. The values of R²_CV shown at P = 17 are the same as the values indicated by dashed lines in (B) and (C).
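The comparison in panels (B) and (C) can be sketched as follows: compute a cross-validated R² for the gene set of interest, repeat the same regression for many randomly chosen gene sets, and report the quantile q of the observed value within that null distribution. This sketch is built on assumptions rather than the paper's pipeline: it uses ridge regression with leave-one-out cross-validation (the paper's regression and cross-validation scheme are not described in this caption), a synthetic presence/absence matrix in which two "target" genes determine the phenotype, and hypothetical helper names (`solve`, `ridge_fit`, `r2_cv`).

```python
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def ridge_fit(X, y, lam):
    """Ridge regression; the intercept (first coefficient) is not penalized."""
    rows = [[1.0] + list(r) for r in X]
    d = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    for i in range(1, d):
        a[i][i] += lam
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(d)]
    return solve(a, b)


def r2_cv(X, y, lam=0.1):
    """Leave-one-out cross-validated R^2 (total variance taken about the overall mean)."""
    preds = []
    for i in range(len(y)):
        w = ridge_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], lam)
        preds.append(w[0] + sum(wj * xj for wj, xj in zip(w[1:], X[i])))
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Synthetic data (assumed): 40 strains x 30 genes; genes 0 and 1 drive the phenotype.
rng = random.Random(0)
n_strains, n_genes = 40, 30
genes = [[float(rng.random() < 0.5) for _ in range(n_genes)] for _ in range(n_strains)]
for i, row in enumerate(genes):
    row[0] = float(i % 2)
    row[1] = float((i // 2) % 2)
phenotype = [1.0 * row[0] + 0.5 * row[1] + rng.gauss(0.0, 0.1) for row in genes]


def subset(cols):
    """Presence/absence matrix restricted to the given gene columns."""
    return [[row[c] for c in cols] for row in genes]


observed = r2_cv(subset([0, 1]), phenotype)
null = [r2_cv(subset(rng.sample(range(n_genes), 2)), phenotype) for _ in range(200)]
q = sum(v <= observed for v in null) / len(null)
print(f"observed R2_CV = {observed:.3f}, quantile in random-gene null = {q:.3f}")
```

A quantile close to 1 means that few random gene sets predict the parameter as well as the chosen set, which is the reading suggested by the dashed lines in panels (B) and (C).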

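There is much more after this, but I can't get through it. A top-tier journal and a top-tier paper; honestly, I don't really understand it... I could cry...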
