Biophysical prediction of protein–peptide interactions and signaling networks using machine learning


In mammalian cells, much of signal transduction is mediated by weak protein–protein interactions between globular peptide-binding domains (PBDs) and unstructured peptidic motifs in partner proteins. The number and diversity of these PBDs (over 1,800 are known), their low binding affinities and the sensitivity of binding properties to minor sequence variation represent a substantial challenge to experimental and computational analysis of PBD specificity and the networks PBDs create. Here, we introduce a bespoke machine-learning approach, hierarchical statistical mechanical modeling (HSM), capable of accurately predicting the affinities of PBD–peptide interactions across multiple protein families. By synthesizing biophysical priors within a modern machine-learning framework, HSM outperforms existing computational methods and high-throughput experimental assays. HSM models are interpretable in familiar biophysical terms at three spatial scales: the energetics of protein–peptide binding, the multidentate organization of protein–protein interactions and the global architecture of signaling networks.
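To make the phrase "statistical mechanical modeling" concrete, the following is a minimal illustrative sketch, not the HSM implementation. It assumes a purely additive pairwise energy between domain and peptide residues and a two-state Boltzmann occupancy for the bound state. The function names, the `weights` dictionary layout, the `bias` and `kT` parameters, and the toy numbers are all hypothetical and are not taken from the article.

```python
import math

def binding_energy(domain, peptide, weights, bias=0.0):
    """Sum pairwise residue-residue energy terms (hypothetical additive model).

    `weights` maps (i, a, j, b) -> energy contribution, where residue `a`
    sits at domain position `i` and residue `b` at peptide position `j`.
    Pairs that are absent from `weights` contribute zero.
    """
    energy = bias
    for i, a in enumerate(domain):
        for j, b in enumerate(peptide):
            energy += weights.get((i, a, j, b), 0.0)
    return energy

def binding_probability(energy, kT=1.0):
    """Two-state Boltzmann occupancy of the bound state: 1 / (1 + exp(E / kT))."""
    return 1.0 / (1.0 + math.exp(energy / kT))

# Toy, made-up parameters: negative values favor binding.
toy_weights = {
    (0, "W", 1, "P"): -2.0,
    (1, "Y", 0, "R"): -1.0,
    (1, "Y", 1, "P"): 0.5,
}

energy = binding_energy("WY", "RP", toy_weights)
probability = binding_probability(energy)
print(f"energy = {energy:.2f}, P(bound) = {probability:.3f}")
```

In a model of this general shape, each entry of `weights` can be read as an energy contribution of one residue pair, which is the sense in which such a model is interpretable in biophysical terms. In practice the weights would be fitted to measured binding data rather than set by hand.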


Fig. 1: PBDs and modeling frameworks.
Fig. 2: Model performance and newly predicted PPIs.
Fig. 3: Predicted mechanisms for newly predicted interactions.
Fig. 4: Mechanistic analysis of SH3 domain binding.
Fig. 5: Energy surface of SH3–peptide cocomplex.
Fig. 6: Hierarchical organization of the human PBD-mediated PPI network.

Data availability

The domain–peptide and PPI predictions are made available through a custom website ( The protein–peptide interaction data are also made available in figshare with the identifiers Data used in training the model are available as Supplementary Dataset 2.

Code availability

All code and data used for training and testing HSM are available in a public repository at



This work was funded by NIH grants (nos. U54-CA225088 and P50-GM107618) and by DARPA/DOD (grant no. W911NF-14-1-0397) to P.K.S.

Author information




J.M.C., P.K.S. and M.A. conceived and designed the model, analysis and computational experiments. J.M.C. implemented the model and carried out the analysis and experiments. G.K. collected and processed binding and structural data and contributed to the analysis. All authors wrote and reviewed the manuscript.

Corresponding authors

Correspondence to Peter K. Sorger or Mohammed AlQuraishi.

Ethics declarations

Competing interests

P.K.S. is a member of the SAB or Board of Directors of Merrimack Pharmaceutical, Glencoe Software, Applied Biomath and RareCyte Inc. and has equity in these companies. P.K.S. declares that none of these relationships are directly or indirectly related to the content of this manuscript.

Additional information

Peer review information: Rita Strack was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–7, Tables 1–5 and Notes 1 and 2.

Reporting Summary

Supplementary Dataset 1

Domain sequences and multiple sequence alignments used in both HSM/D and HSM/P.

Supplementary Dataset 2

Raw domain-peptide training data used for training HSM/D.

Supplementary Dataset 3

Potential peptidic sites used in predictions generated with HSM/P.

Supplementary Dataset 4

Assessment of HSM/D relative to other domain models. Contains source data for Fig. 2a and Supplementary Fig. 3.

Supplementary Dataset 5

PyMOL structural data associated with analysing HSM/D inferred energy profiles. Includes source data for Figs. 4 and 5 and Supplementary Figs. 5 and 6.

Supplementary Dataset 6

Assessment of HSM/P. Contains source data for Figs. 2b, 3 and 6 and Supplementary Fig. 7.

Source data

Source Data Fig. 2

About this article

Cite this article

Cunningham, J.M., Koytiger, G., Sorger, P.K. et al. Biophysical prediction of protein–peptide interactions and signaling networks using machine learning. Nat Methods 17, 175–183 (2020).
