The growing collection of complete genome sequences presents a major opportunity to understand how genetic variation (and conservation) maps to phenotypic diversity. That is, how do changes at the genome level influence how an organism appears and behaves? By studying evolutionary statistics across many genomes we hope to decipher the global pattern of functional constraints between genes, and distinguish interactions critical to core cellular function from those that occur idiosyncratically within particular species.

Evolutionary conservation and co-evolution can be quantified for many properties, and at varying spatial scales or resolutions. For example, correlations in gene presence or absence (co-occurrence), genomic location (synteny), and protein sequence (amino acid frequency) provide three separate but potentially complementary mappings of genetic interaction. In all cases, the underlying goal is to understand the organization and function of cellular systems quantitatively and at a depth that allows rational manipulation and de novo design.

We currently pursue this objective through three main projects in our lab:

I. Allosteric regulation and communication between proteins
A network of co-evolving amino acid positions links regulatory sites on the protein surface to the active site.

Regulation and communication between individual protein domains are basic building blocks for the assembly of larger cellular systems. Analysis of amino acid co-evolution indicates a general architecture for natural proteins in which sparse networks of amino acids underlie basic aspects of structure and function. These networks, termed sectors, are spatially organized such that active sites are physically linked to particular surface sites distributed throughout the protein structure. We showed that perturbations at specific sector-connected surface positions can rapidly initiate conformational control over protein function [Reynolds et al, 2011, Cell]. This finding suggests practical strategies for engineering synthetic regulation [Pincus et al, 2017, Phys Biology], and provides an explanation for the evolution of regulatory diversity in the eukaryotic protein kinases [Pincus et al, 2017, BioRxiv]. Current projects in the lab include deep mutational scanning studies to understand the structural basis of allostery, computational analyses of the relationship between natural allosteric sites and sectors, and forward evolution experiments to “watch” the evolution of new allosteric systems.

II. Modularity in metabolism
A schematic of central folate metabolism, a model metabolic pathway frequently studied in the lab.

Central metabolism is a universal biological process in which the collective action of many genes provides the energy and raw material for cell growth and division. While we know the parts list of metabolic enzymes and connectivity of chemical reactions, coupling (also known as epistasis) between genes makes it non-trivial to predict metabolic behaviors based on knowledge of the activity of the parts taken independently.

For example, if one metabolic enzyme becomes substantially less active (through mutation or inhibition), how does this affect the remainder of the pathway? Is the effect on fitness negligible, does the pathway adapt locally, or are global compensations in gene expression level or activity necessary?

To address this question, we use co-evolution across species to infer the pattern of functional coupling between genes. In recent work, we used this approach to identify evolutionary modules embedded within larger cellular systems [Schober et al, 2017, BioRxiv]. These modules are small groups of proteins that co-evolve with each other, but are relatively independent of the remainder of the cell. Initial experiments indicate that these evolutionary modules may represent cooperative functional units within the cell. We are now testing the relationship between evolutionary modules and functional modules in vivo more comprehensively, through forward evolution experiments, quantitative measurements of epistasis (using CRISPR-i), and transplantation of rationally chosen groups of genes between bacterial species followed by assays for complementation of function or phenotype. If successful, this line of research will suggest new strategies for the engineering of synthetic pathways, and provide insight into how complex cellular systems evolve.
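
As a rough illustration of how co-evolving groups might be pulled out of pairwise correlations, the sketch below links any two genes whose correlation exceeds a cutoff and reports the connected groups. This is a deliberately simple stand-in, not the method used in the cited work: the function name `modules_from_correlations`, the gene labels, the correlation values, and the 0.5 threshold are all hypothetical.

```python
def modules_from_correlations(genes, corr, threshold=0.5):
    """Group genes into connected components of the graph whose edges are
    gene pairs with correlation >= threshold (a simplifying assumption)."""
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent pointers to the group representative,
        # shortening the path as we go.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (g1, g2), value in corr.items():
        if value >= threshold:
            parent[find(g1)] = find(g2)  # merge the two groups

    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical toy input: five genes and made-up pairwise correlations.
genes = ["gene1", "gene2", "gene3", "gene4", "gene5"]
corr = {
    ("gene1", "gene2"): 0.9,
    ("gene2", "gene3"): 0.7,
    ("gene1", "gene3"): 0.6,
    ("gene4", "gene5"): 0.8,
    ("gene1", "gene4"): 0.1,
    ("gene3", "gene5"): 0.05,
}
modules = modules_from_correlations(genes, corr)
print(modules)
```

With these toy values the first three genes form one group and the last two another. A single hard threshold is crude; in practice the choice of correlation measure and cutoff would need to be justified against the data.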

III. Tools for the analysis of evolutionary correlations

We are developing new methods to analyze correlations in amino acid sequence between proteins. This includes developing measures for the co-evolution of individual (and small groups of) amino acids, methods to appropriately correct for the effects of phylogeny, and tools for hierarchically analyzing the resulting correlation matrices. While these challenges will be addressed in the context of analyzing bacterial genome sequence data, we anticipate that the resulting conceptual and technical advances will allow our methodology to be applied to any genome-scale dataset collected over multiple species and/or conditions.
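
To make the idea of a pairwise co-evolution measure concrete, here is a minimal sketch that computes the mutual information between two columns of a sequence alignment. Mutual information is one common baseline, chosen here only for illustration; it is not presented as the lab's measure, and it applies no correction for phylogeny. The function name `column_mutual_information` and the four-sequence alignment are hypothetical.

```python
from collections import Counter
from math import log2


def column_mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns,
    estimated from raw frequencies with no phylogenetic correction."""
    if len(col_i) != len(col_j) or not col_i:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_i)
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in count_ij.items():
        p_ab = count / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy alignment: four sequences, four positions.
alignment = ["AKLE", "AKLD", "GRLE", "GRLD"]
columns = list(zip(*alignment))  # one tuple of residues per position

# Positions 0 and 1 vary together; position 2 is fully conserved;
# position 3 varies independently of position 0.
mi_01 = column_mutual_information(columns[0], columns[1])
mi_02 = column_mutual_information(columns[0], columns[2])
mi_03 = column_mutual_information(columns[0], columns[3])
```

In this toy case the perfectly covarying pair scores 1 bit and the other two pairs score 0. On real alignments, raw frequency estimates like this are inflated by shared ancestry and small sample size, which is why correcting for phylogeny matters.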