Gene duplication is a common and powerful mechanism by which cells create new signalling pathways1,2, but recently duplicated proteins typically must become insulated from each other and from other paralogues to prevent unwanted crosstalk3. A similar challenge arises when new sensors or synthetic signalling pathways are engineered within cells or transferred between genomes. How easily new pathways can be introduced into cells depends on the density and distribution of paralogous pathways in the sequence space that is defined by their specificity-determining residues4,5. Here we directly investigate how crowded this sequence space is, by generating novel two-component signalling proteins in Escherichia coli using cell sorting coupled to deep sequencing to analyse large libraries designed on the basis of coevolutionary patterns. We produce 58 insulated pathways comprising functional kinase–substrate pairs that have different specificities than their parent proteins, and demonstrate that several of these new pairs are orthogonal to all 27 paralogous pathways in E. coli. Additionally, from the kinase–substrate pairs generated, we identify sets consisting of six pairs that are mutually orthogonal to each other, which considerably increases the two-component signalling capacity of E. coli. These results indicate that sequence space is not densely occupied. The relative sparsity of paralogues in sequence space suggests that new insulated pathways can arise easily during evolution, or be designed de novo. We demonstrate the latter by engineering a signalling pathway in E. coli that responds to a plant cytokinin, without crosstalk to extant pathways. Our work also demonstrates how coevolution-guided mutagenesis and the mapping of sequence space can be used to design large sets of orthogonal protein–protein interactions.
This is a preview of subscription content
Subscription info for Chinese customers
We have a dedicated website for our Chinese customers. Please go to naturechina.com to subscribe to this journal.
Rent or Buy article
Get time limited or full article access on ReadCube.
All prices are NET prices.
Datasets generated during this study have been deposited in GEO. Raw reads and processed sort-seq analysis of each mutant can be found under accession numbers GSE120780 (degenerate PhoQ–PhoP library) and GSE120786 (combinatorial library of 79 PhoQ*–PhoP* variants). Raw reads and reads per kilobase of transcript per million mapped reads (RPKM) for all E. coli genes from RNA-seq are deposited with accession number GSE128611.
Python scripts for analysis are available at https://github.com/mcclune/nature2019.
Alm, E., Huang, K. & Arkin, A. The evolution of two-component systems in bacteria reveals different strategies for niche adaptation. PLOS Comput. Biol. 2, e143 (2006).
Capra, E. J. & Laub, M. T. Evolution of two-component signal transduction systems. Annu. Rev. Microbiol. 66, 325–347 (2012).
Capra, E. J., Perchuk, B. S., Skerker, J. M. & Laub, M. T. Adaptive mutations that prevent crosstalk enable the expansion of paralogous signaling protein families. Cell 150, 222–232 (2012).
Zarrinpar, A., Park, S. H. & Lim, W. A. Optimization of specificity in a cellular protein interaction network by negative selection. Nature 426, 676–680 (2003).
Stiffler, M. A. et al. PDZ domain binding selectivity is optimized across the mouse proteome. Science 317, 364–369 (2007).
Brentjens, R. J. et al. CD19-targeted T cells rapidly induce molecular remissions in adults with chemotherapy-refractory acute lymphoblastic leukemia. Sci. Transl. Med. 5, 177ra38 (2013).
Riglar, D. T. & Silver, P. A. Engineering bacteria for diagnostic and therapeutic applications. Nat. Rev. Microbiol. 16, 214–225 (2018).
Morsut, L. et al. Engineering customized cell sensing and response behaviors using synthetic Notch receptors. Cell 164, 780–791 (2016).
Bashor, C. J., Helman, N. C., Yan, S. & Lim, W. A. Using engineered scaffold interactions to reshape MAP kinase pathway signaling dynamics. Science 319, 1539–1543 (2008).
Sockolosky, J. T. et al. Selective targeting of engineered T cells using orthogonal IL-2 cytokine-receptor complexes. Science 359, 1037–1042 (2018).
Skerker, J. M., Prasol, M. S., Perchuk, B. S., Biondi, E. G. & Laub, M. T. Two-component signal transduction pathways regulating growth and cell cycle progression in a bacterium: a system-level analysis. PLoS Biol. 3, e334 (2005).
Creixell, P. et al. Unmasking determinants of specificity in the human kinome. Cell 163, 187–201 (2015).
Thompson, K. E., Bashor, C. J., Lim, W. A. & Keating, A. E. SYNZIP protein interaction toolbox: in vitro and in vivo specifications of heterospecific coiled-coil interaction domains. ACS Synth. Biol. 1, 118–129 (2012).
Boyken, S. E. et al. De novo design of protein homo-oligomers with modular hydrogen-bond network-mediated specificity. Science 352, 680–687 (2016).
Reinke, A. W., Grant, R. A. & Keating, A. E. A synthetic coiled-coil interactome provides heterospecific modules for molecular engineering. J. Am. Chem. Soc. 132, 6025–6031 (2010).
Stock, A. M., Robinson, V. L. & Goudreau, P. N. Two-component signal transduction. Annu. Rev. Biochem. 69, 183–215 (2000).
Groban, E. S., Clarke, E. J., Salis, H. M., Miller, S. M. & Voigt, C. A. Kinetic buffering of cross talk between bacterial two-component sensors. J. Mol. Biol. 390, 380–393 (2009).
Siryaporn, A. & Goulian, M. Cross-talk suppression between the CpxA–CpxR and EnvZ–OmpR two-component systems in E. coli. Mol. Microbiol. 70, 494–506 (2008).
Capra, E. J. et al. Systematic dissection and trajectory-scanning mutagenesis of the molecular interface that ensures specificity of two-component signaling pathways. PLoS Genet. 6, e1001220 (2010).
Skerker, J. M. et al. Rewiring the specificity of two-component signal transduction systems. Cell 133, 1043–1054 (2008).
Podgornaia, A. I. & Laub, M. T. Pervasive degeneracy and epistasis in a protein–protein interface. Science 347, 673–677 (2015).
Casino, P., Rubio, V. & Marina, A. Structural insight into partner specificity and phosphoryl transfer in two-component signal transduction. Cell 139, 325–336 (2009).
Yamada, H. et al. The Arabidopsis AHK4 histidine kinase is a cytokinin-binding receptor that transduces cytokinin signals across the membrane. Plant Cell Physiol. 42, 1017–1023 (2001).
Nielsen, A. A. et al. Genetic circuit design automation. Science 352, aac7341 (2016).
Miyashiro, T. & Goulian, M. Stimulus-dependent differential regulation in the Escherichia coli PhoQ PhoP system. Proc. Natl Acad. Sci. USA 104, 16305–16310 (2007).
Mutalik, V. K. et al. Precise and reliable gene expression via standard transcription and translation initiation elements. Nat. Methods 10, 354–360 (2013).
Ashenberg, O., Keating, A. E. & Laub, M. T. Helix bundle loops determine whether histidine kinases autophosphorylate in cis or in trans. J. Mol. Biol. 425, 1198–1209 (2013).
Gibson, D. G. et al. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat. Methods 6, 343–345 (2009).
Fowler, D. M. & Fields, S. Deep mutational scanning: a new style of protein science. Nat. Methods 11, 801–807 (2014).
Starr, T. N., Picton, L. K. & Thornton, J. W. Alternative evolutionary histories in the sequence space of an ancient protein. Nature 549, 409–413 (2017).
Diss, G. & Lehner, B. The genetic landscape of a physical interaction. eLife 7, e32472 (2018).
Balakrishnan, S., Kamisetty, H., Carbonell, J. G., Lee, S. I. & Langmead, C. J. Learning generative models for protein fold families. Proteins 79, 1061–1078 (2011).
Culviner, P. H. & Laub, M. T. Global analysis of the E. coli toxin MazF reveals widespread cleavage of mRNA and the inhibition of rRNA maturation and ribosome biogenesis. Mol. Cell 70, 868–880 (2018).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Eddy, S. R. Accelerated profile HMM searches. PLOS Comput. Biol. 7, e1002195 (2011).
Jacomy, M., Venturini, T., Heymann, S. & Bastian, M. ForceAtlas2, a continuous graph layout algorithm for handy network visualization designed for the Gephi software. PLoS ONE 9, e98679 (2014).
Ohlendorf, R., Schumacher, C. H., Richter, F. & Möglich, A. Library-aided probing of linker determinants in hybrid photoreceptors. ACS Synth. Biol. 5, 1117–1126 (2016).
Wang, B., Zhao, A., Novick, R. P. & Muir, T. W. Activation and inhibition of the receptor histidine kinase AgrC occurs through opposite helical transduction motions. Mol. Cell 53, 929–940 (2014).
Marina, A., Mott, C., Auyzenberg, A., Hendrickson, W. A. & Waldburger, C. D. Structural and mutational analysis of the PhoQ histidine kinase catalytic domain. Insight into the reaction mechanism. J. Biol. Chem. 276, 41182–41190 (2001).
We thank I. Nocedal, P. Culviner, D. Ding and A. Podgornaia for helpful discussions. M.T.L. is an Investigator of the Howard Hughes Medical Institute. This work was also supported by a grant from the Office of Naval Research (N000141310074) to M.T.L. and C.A.V. and by the NIH pre-doctoral training grant T32GM007287.
The authors declare no competing interests.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Peer review information Nature thanks Tamar Friedlander and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Extended data figures and tables
a, Schematic illustrating the challenge of identifying HK*–RR* pairs that are orthogonal to all endogenous histidine kinases and response regulators. For both histidine kinases and response regulators, the specificity-determining residues define a finite sequence space. The specificity-determining residues of each histidine kinase determine the set of response regulators with which it can interact. These sets, or niches in sequence space, are depicted as ovals, and each cognate response regulator is represented by a black dot (bottom left). A similar representation is shown for each response regulator and the set of histidine kinases with which it can interact (top right). The two sequence spaces are connected, as depicted with coloured cones for a single histidine kinase–response regulator pair. The establishment of a new signalling pathway that is orthogonal to existing systems requires that the two new proteins are compatible with each other, but occupy regions of histidine-kinase and response-regulator specificity space that are incompatible with all of the paralogues that are already present. b, Schematic summarizing the endogenous two-component pathways in E. coli to which a new orthogonal pathway must avoid crosstalk. c, Diagram of the DHp domain of the histidine kinase TM0853 (blue) in complex with its cognate response regulator RR0468 (green). Residues that dictate specificity, and which were randomized in our libraries, are space-filled in orange (kinase) and red (substrate). d, Plot summarizing the number of histidine kinases and response regulators in bacterial genomes. e, Visualization of the GREMLIN model, representing the coevolutionary dependencies between the residues of cognate histidine kinases and response regulators. Blue nodes indicate PhoQ residues, green nodes indicate PhoP residues and the darker nodes are the 11 residues that were randomized in the dual PhoQ–PhoP library. Edge widths indicate the strength of the coevolutionary signal, and the node size of each residue represents the total coevolutionary signal to residues on the other protein.
a, Schematic summary of library design. b, Left, histogram of the read counts for the pre-selection PhoQ–PhoP library. The vast majority of reads are unique, which indicates that the library size is larger than Illumina sequencing coverage and no variants are overrepresented. Right, histogram of the read counts for one replicate of the PhoQ–PhoP library after overnight growth in low Mg2+ conditions. c, Counts of cells sorted into each bin, for each replicate and growth condition. d, The cells sorted into each bin were grown overnight, diluted back to mid-exponential phase, shifted to medium with 10 μM Mg2+ and their YFP levels were verified by flow cytometry. n = 2 independent biological replicates. e, As in d, but with cells retained in medium with 50 mM Mg2+. n = 2 independent biological replicates. f, Scatter plots displaying the correlations between the bin frequencies of individual variant pairs measured in independent replicates. Only 106 data points are shown for clarity. R2 values indicate the Pearson correlation coefficients, calculated using all data points. g, As in f, but displaying only the 10,595 variants with sufficient sequencing coverage and fit quality (Methods) to be included in the analysis. h, Scatter plot displaying the Pearson correlation between the YFP fold induction measured by sort-seq and that measured individually by flow cytometry for 32 individual variant pairs. i, Sequence logos summarizing the amino acid frequencies at each position varied in the pre-selected library (top), set of pairs with >20-fold induction (middle) and all native histidine kinase–response regulator pairs (bottom). The residues found at these positions in wild-type PhoQ and PhoP are listed below. j, The FACS gating strategy for isolating single live cells for quantification of YFP expression.
a, Histogram of the total read counts for the 10,595 variants with sufficient coverage and fit quality to be included in analysis. b, A schematic example of the downsampling method used to simulate low read coverage using variants with high read coverage. c, Fold induction of high-coverage functional PhoQ*–PhoP* variants (fold induction >20) after simulating lower read coverage using downsampling and refitting. n = 100 independent downsampling simulations. d, A quantification of how read coverage in c affects the calculated fold induction of functional variants. The fraction of functional variants that continue to display high fold induction at lower read coverage is plotted with respect to read coverage. e, As in c, but for nonfunctional (fold induction < 20) PhoQ*–PhoP* pairs with high read coverage. n = 100 independent downsampling simulations. f, As in d, but for nonfunctional variants. The fraction of nonfunctional variants that continue to display low fold induction at lower read coverage is plotted with respect to read coverage.
a, Phosphotransfer reactions for PhoQ*–PhoP* variants, as well as PhoQ* with wild-type PhoP (middle) and wild-type PhoQ with PhoP* (right). These experiments were repeated independently twice with similar results. b, Quantification of phosphatase activity for PhoQ1*,PhoQ5*,PhoQ12* and PhoQ14*, as in Fig. 2c. Lines indicate mean ± s.d. from n = 3 biological replicates. c, Phosphotransfer profiles of wild-type PhoQ (top) and three PhoQ* variants, as in Fig. 2d, but for 60 min and 5 min instead of just 5 min. In each case, the kinase was autophosphorylated and then incubated for 5 or 60 min individually with each of 27 response regulators from E. coli, and its selected PhoP* variant, followed by SDS–PAGE and autoradiography. The position of the autophosphorylated kinase and the approximate positions of any phosphorylated regulators are indicated by arrowheads on the left. These experiments were repeated independently twice with similar results.
a–e, Phosphotransfer profiles of five histidine kinases endogenous to E. coli: EnvZ (a), RstB (b), CreC (c), CpxA (d) and PhoR (e). In each case, the kinase was autophosphorylated, then incubated for 5 or 60 min. with its cognate response regulator, with wild-type PhoP, or with one of eleven PhoP* variants, and analysed as in Extended Data Fig. 4. n = 1 independent experiment.
Extended Data Fig. 6 Insulation of PhoQ* variants from E. coli response regulators, with respect to phosphatase activity.
a, Phosphatase activity of PhoQ was assessed by measuring the decay of phosphorylated response regulators. Twelve E. coli response regulators were selected for their ability to be stably phosphorylated in vitro by a cocktail of six E. coli histidine kinases (CreC, RstA, PhoR, PhoP, EnvZ and CpxA, each at 250 nM). After 2 h of pre-incubation with radiolabelled ATP and this kinase cocktail, each regulator was combined with 2 mM PhoQ and the phosphorylation state of the regulators was measured after 0, 60 and 120 min. n = 1 independent experiment. b, Phosphatase profiles conducted as in a for PhoP* variants. Quantification of wild-type PhoP and PhoP* variant phosphorylation (normalized to 0 h (t = 0) to display decay) is plotted on the right. n = 1 independent experiment. c, Phosphatase profiles conducted as in a for PhoQ* variants. n = 1 independent experiment. d, Ratio of response regulator phosphorylation between phosphatase profiles with PhoQ* variants (c) and wild-type PhoQ (a, b).
a, RNA-seq analysis of strains containing wild-type PhoQ–PhoP or the indicated variant pair, measured after 30 min with 10 μM or 50 mM Mg2+. Each strain displays a similar Mg2+-limitation induction of three genes (mgtA, mgtL and mgrB) in the PhoP regulon, as well as the PhoP-dependent reporter gene yfp. b, The expression change of each response-regulator and histidine-kinase gene in E. coli. Colours represent the fold change (expressed in log2) in the response to low extracellular Mg2+. rstA and rstB are part of the PhoP regulon, and so show changes in transcription after activation of PhoQ or of several PhoQ* variants. Otherwise, most two-component pathways show little induction by the wild-type PhoQ–PhoP and the PhoQ*–PhoP* pairs. c, The same data as in b, but with the fold change of each variant pair normalized to the fold change seen with wild-type PhoQ–PhoP, in which the latter is the geometric mean of two wild-type replicates. d, P values of the z-score calculated for each value in c. For each gene and each variant, z-scores represent the deviation of the variant/wild-type ratio of that gene, when compared to the distribution of the variant/wild-type ratio of every gene. Using all E. coli genes with reads across multiple samples (n = 3,477), P values were calculated with a one-tailed z-test to identify genes that induced more strongly with the variant pairs than with the wild-type pair. The statistical significance threshold after correcting for multiple hypothesis testing is indicated on the colour legend that encodes the P values. None of the other two-component signalling genes in E. coli is significantly induced by the variant PhoQ*–PhoP* pairs that we tested.
a, Reproducibility of replicates for the combinatorial library in Fig. 3a. Correlations between bin frequencies of individual variant pairs measured in independent replicates. R2 values indicate the Pearson correlation coefficients, calculated using all data points (n = 210,319 variant pairs). b, Counts of cells sorted into each bin, for each replicate and growth condition. c, Functional PhoQ*–PhoP* variants that are orthogonal to wild-type PhoQ and PhoP. The fold-induction values, taken from the matrix in Fig. 3a, measured by sort-seq for the variant pairs indicated in each row, either together (far right column of the heat map) or with a wild-type protein (middle two columns), compared to the wild-type pair (far left column). d, Number of unique sets of various sizes of orthogonal PhoQ*–PhoP* pairs, for various thresholds of activity and crosstalk. e, Frequency of each PhoQ* (top) or PhoP* (bottom) variant within the orthogonal sets of various sizes (fold induction >15, crosstalk <6). f, Phosphatase- and kinase-locked variants of PhoQ were identified by fusing the catalytic DHp–CA domains to the leucine zipper GCN4 at different fusion sites (Methods). g, Phosphatase-locked PhoQ15* is sufficient to suppress nonspecific phosphotransfer from wild-type PhoQ to PhoP15*. h, Heat map as in Fig. 3a, but restricted to variants that retain an interaction (fold induction > 10) with wild-type PhoQ and PhoP, which are shown in red. i, j, Orthogonal sets of PhoQ* and PhoP* variants that, similar to the set in Fig. 3h, comprise exclusively proteins that retain interactions with the parent PhoQ and PhoP. k, Number of unique sets of various sizes of orthogonal PhoQ*–PhoP* pairs in which all variants retain an interaction (fold induction > 10) with wild-type PhoP and PhoQ.
Extended Data Fig. 9 Specificity residues of novel PhoQ*–PhoP* pathways are distinct from all extant two-component signalling interfaces.
a, Hamming distance was calculated between the 11 specificity residues of each PhoQ*–PhoP* variant pair, and the 11 specificity residues of wild-type PhoQ–PhoP, the two-component signalling paralogues in E. coli or all extant two-component signalling proteins. When comparing two sets, all distances between all members of both sets are plotted. b, Force-directed graph representing the sequence space of histidine kinases. Each node represents a single histidine kinase, with the relative positions reflecting the similarity of their interface residues (Methods). Coloured nodes highlight specific sets of kinases, as indicated in the legend.
a, Phosphotransfer profile of PhoQ4*. PhoQ4* was autophosphorylated and then incubated for 5 and 60 min with each of 27 response regulators from E. coli and with PhoP4*, as in Extended Data Fig. 4c. This experiment was repeated independently twice with similar results. b, c, RNA-seq analysis, as in Extended Data Fig. 7b, c, of strains that express AQ4* and either wild-type PhoP or PhoP4* (measured after 30 min induction with 0 mM or 1 mM trans-zeatin). b, The PhoP-regulated genes yfp, mgtL and mgrB are induced by trans-zeatin only when AQ4* is paired with PhoP4*. c, The expression change of all response regulators and histidine kinases (fold change in response to 10 mM Mg2+ or 1 mM trans-zeatin).
About this article
Cite this article
McClune, C.J., Alvarez-Buylla, A., Voigt, C.A. et al. Engineering orthogonal signalling pathways reveals the sparse occupancy of sequence space. Nature 574, 702–706 (2019). https://doi.org/10.1038/s41586-019-1639-8