T cells acquire a regulatory phenotype when their T cell antigen receptors (TCRs) experience an intermediate- to high-affinity interaction with a self-peptide presented via the major histocompatibility complex (MHC). Using TCRβ sequences from flow-sorted human cells, we identified TCR features that promote regulatory T cell (Treg) fate. From these results, we developed a scoring system to quantify TCR-intrinsic regulatory potential (TiRP). When applied to the tumor microenvironment, TiRP scoring helped to explain why only some T cell clones maintained the conventional T cell (Tconv) phenotype through expansion. To elucidate drivers of these predictive TCR features, we then examined the two elements of the Treg TCR ligand separately: the self-peptide and the human MHC class II molecule. These analyses revealed that hydrophobicity in the third complementarity-determining region (CDR3β) of the TCR promotes reactivity to self-peptides, while TCR variable gene (TRBV gene) usage shapes the TCR’s general propensity for human MHC class II-restricted activation.
Data analyzed in this study were previously deposited in the following locations: immuneACCESS: https://doi.org/10.21417/B73S3K, https://doi.org/10.21417/B7C88S, https://doi.org/10.21417/AMT2019EJI, https://doi.org/10.21417/CS2020CR and https://doi.org/10.21417/B7001Z; GEO: GSE158769, GSE123813 and GSE114724; GitHub: https://github.com/aleksobrad/humanized-mouse-data; Zenodo: https://doi.org/10.5281/zenodo.3711134; ArrayExpress: E-MTAB-8581; 10x Genomics: https://cf.10xgenomics.com/supp/cell-exp/refdata-gex-GRCh38-2020-A.tar.gz; McPAS-TCR: http://friedmanlab.weizmann.ac.il/McPAS-TCR and VDJdb: https://vdjdb.cdr3.net.
Custom analysis scripts are available on GitHub (https://github.com/immunogenomics/TiRP).
Jordan, M. S. et al. Thymic selection of CD4+CD25+ regulatory T cells induced by an agonist self-peptide. Nat. Immunol. 2, 301–306 (2001).
Yun, T. J. & Bevan, M. J. The Goldilocks conditions applied to T cell development. Nat. Immunol. 2, 13–14 (2001).
Sakaguchi, S., Yamaguchi, T., Nomura, T. & Ono, M. Regulatory T cells and immune tolerance. Cell 133, 775–787 (2008).
Klein, L., Hinterberger, M., Wirnsberger, G. & Kyewski, B. Antigen presentation in the thymus for positive selection and central tolerance induction. Nat. Rev. Immunol. 9, 833–844 (2009).
Romagnoli, P. & van Meerwijk, J. P. M. Thymic selection and lineage commitment of CD4+Foxp3+ regulatory T lymphocytes. Prog. Mol. Biol. Transl. Sci. 92, 251–277 (2010).
Moran, A. E. et al. T cell receptor signal strength in Treg and iNKT cell development demonstrated by a novel fluorescent reporter mouse. J. Exp. Med. 208, 1279–1289 (2011).
Ohkura, N. et al. T cell receptor stimulation-induced epigenetic changes and Foxp3 expression are independent and complementary events required for Treg cell development. Immunity 37, 785–799 (2012).
Li, M. O. & Rudensky, A. Y. T cell receptor signalling in the control of regulatory T cell differentiation and function. Nat. Rev. Immunol. 16, 220–233 (2016).
Sidwell, T. et al. Attenuation of TCR-induced transcription by Bach2 controls regulatory T cell differentiation and homeostasis. Nat. Commun. 11, 252 (2020).
Bolotin, D. A. et al. Antigen receptor repertoire profiling from RNA-seq data. Nat. Biotechnol. 35, 908–911 (2017).
Seay, H. R. et al. Tissue distribution and clonal diversity of the T and B cell repertoire in type 1 diabetes. JCI Insight 1, e88242 (2016).
Gomez-Tourino, I., Kamra, Y., Baptista, R., Lorenc, A. & Peakman, M. T cell receptor β-chains display abnormal shortening and repertoire sharing in type 1 diabetes. Nat. Commun. 8, 1792 (2017).
Park, J.-E. et al. A cell atlas of human thymic development defines T cell repertoire formation. Science 367, eaay3224 (2020).
Khosravi-Maharlooei, M. et al. Cross-reactive public TCR sequences undergo positive selection in the human thymic repertoire. J. Clin. Invest. 129, 2446–2462 (2019).
Joller, N. & Kuchroo, V. Good guys gone bad: exTreg cells promote autoimmune arthritis. Nat. Med. 20, 15–17 (2014).
Sharon, E. et al. Genetic variation in MHC proteins is associated with T cell receptor expression biases. Nat. Genet. 48, 995–1002 (2016).
Reche, P. A. & Reinherz, E. L. Sequence variability analysis of human class I and class II MHC molecules: functional and structural correlates of amino acid polymorphisms. J. Mol. Biol. 331, 623–641 (2003).
Stadinski, B. D. et al. Hydrophobic CDR3 residues promote the development of self-reactive T cells. Nat. Immunol. 17, 946–955 (2016).
Azizi, E. et al. Single-cell map of diverse immune phenotypes in the breast tumor microenvironment. Cell 174, 1293–1308 (2018).
Yost, K. E. et al. Clonal replacement of tumor-specific T cells following PD-1 blockade. Nat. Med. 25, 1251–1259 (2019).
Samstein, R. M., Josefowicz, S. Z., Arvey, A., Treuting, P. M. & Rudensky, A. Y. Extrathymic generation of regulatory T cells in placental mammals mitigates maternal-fetal conflict. Cell 150, 29–38 (2012).
Cebula, A. et al. Thymus-derived regulatory T cells contribute to tolerance to commensal microbiota. Nature 497, 258–262 (2013).
Zhou, X. et al. Instability of the transcription factor Foxp3 leads to the generation of pathogenic memory T cells in vivo. Nat. Immunol. 10, 1000–1007 (2009).
Setoguchi, R., Hori, S., Takahashi, T. & Sakaguchi, S. Homeostatic maintenance of natural Foxp3+CD25+CD4+ regulatory T cells by interleukin (IL)-2 and induction of autoimmune disease by IL-2 neutralization. J. Exp. Med. 201, 723–735 (2005).
Komatsu, N. et al. Pathogenic conversion of Foxp3+ T cells into TH17 cells in autoimmune arthritis. Nat. Med. 20, 62–68 (2014).
Zemmour, D. et al. Single-cell gene expression reveals a landscape of regulatory T cell phenotypes shaped by the TCR. Nat. Immunol. 19, 291–301 (2018).
Kang, J. B. et al. Efficient and precise single-cell reference atlas mapping with Symphony. Nat. Commun. 12, 5890 (2021).
Nathan, A. et al. Multimodally profiling memory T cells from a tuberculosis cohort identifies cell state associations with demographics, environment and disease. Nat. Immunol. 22, 781–793 (2021).
Jorgensen, J. L., Esser, U., Fazekas de St Groth, B., Reay, P. A. & Davis, M. M. Mapping T-cell receptor–peptide contacts by variant peptide immunization of single-chain transgenics. Nature 355, 224–230 (1992).
Garcia, K. C. et al. An αβ T cell receptor structure at 2.5 Å and its orientation in the TCR–MHC complex. Science 274, 209–219 (1996).
Thornton, A. M. et al. Helios+ and Helios– Treg subpopulations are phenotypically and functionally distinct and express dissimilar TCR repertoires. Eur. J. Immunol. 49, 398–412 (2019).
Soto, C. et al. High frequency of shared clonotypes in human T cell receptor repertoires. Cell Rep. 32, 107882 (2020).
Tickotsky, N., Sagiv, T., Prilusky, J., Shifrut, E. & Friedman, N. McPAS-TCR: a manually curated catalogue of pathology-associated T cell receptor sequences. Bioinformatics 33, 2924–2929 (2017).
Shugay, M. et al. VDJdb: a curated database of T-cell receptor sequences with known antigen specificity. Nucleic Acids Res. 46, D419–D427 (2018).
Lee, Y. K., Mukasa, R., Hatton, R. D. & Weaver, C. T. Developmental plasticity of TH17 and Treg cells. Curr. Opin. Immunol. 21, 274–280 (2009).
Daley, S. R. et al. Cysteine and hydrophobic residues in CDR3 serve as distinct T-cell self-reactivity indices. J. Allergy Clin. Immunol. 144, 333–336 (2019).
Košmrlj, A., Jha, A. K., Huseby, E. S., Kardar, M. & Chakraborty, A. K. How the thymus designs antigen-specific and self-tolerant T cell receptor sequences. Proc. Natl Acad. Sci. USA 105, 16671–16676 (2008).
Miyazawa, S. & Jernigan, R. L. Estimation of effective interresidue contact energies from protein crystal structures: quasi-chemical approximation. Macromolecules 18, 534–552 (1985).
Witten, I. H. & Frank, E. Data Mining: Practical Machine Learning Tools and Techniques 2nd edn (Morgan Kaufmann, 2005).
Shannon, C. E. & Weaver, W. The Mathematical Theory of Communication (University of Illinois Press, 1998).
Ihara, S. Information Theory for Continuous Systems (World Scientific, 1993).
Zarembka, P. Frontiers in Econometrics (Academic Press, 1974).
Fox, J. & Monette, G. Generalized collinearity diagnostics. J. Am. Stat. Assoc. 87, 178–183 (1992).
Wimley, W. C. & White, S. H. Experimentally determined hydrophobicity scale for proteins at membrane interfaces. Nat. Struct. Biol. 3, 842–848 (1996).
Lide, D. R. CRC Handbook of Chemistry & Physics 72nd edn (CRC Press, 1991).
Zamyatnin, A. A. Protein volume in solution. Prog. Biophys. Mol. Biol. 24, 107–123 (1972).
Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods 16, 1289–1296 (2019).
Schuldt, N. J. & Binstadt, B. A. Dual TCR T cells: identity crisis or multitaskers? J. Immunol. 202, 637–644 (2019).
We thank M.B. Brenner for helpful scientific conversations regarding this work. K.A.L. and J.B.K. are each supported by award number T32GM007753 from the National Institute of General Medical Sciences. A.N. is supported by award number T32AR007530 from the National Institute of Arthritis and Musculoskeletal and Skin Diseases. D.A.R. is supported by National Institutes of Health (NIH) NIAMS K08 AR072791 and a Career Award for Medical Sciences from the Burroughs Wellcome Fund. A.H.S. is supported by NIH P01 AI039671, P01 CA236749 and P01 AI108545. S.R. is supported by NIH grants U19-AI111224-01, P01AI148102-01A1, U01-HG009379-04S1, 1R01AR063759 and UH2-AR067677.
The authors declare no competing interests.
Peer review information
Nature Immunology thanks the anonymous reviewers for their contribution to the peer review of this work. Zoltan Fehervari was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
(a)-(e) Heatmap depicting the mutual information structure of the CDR3β amino acid sequence for CDR3βs of length 12 (a), 13 (b), 14 (c), 16 (d) and 17 (e) in the discovery dataset. The lower diagonal features normalized mutual information (NMI) between each pair of TCR positions, while the upper diagonal features the maximum mutual information achieved by conditioning on any other TCR position. The NMI color scale for (a)-(e) is provided in (a). (f) Probability of each amino acid in each TCR position depicted by a sequence logo. (g) Heatmap as in (a)-(e) for CDR1β and CDR2β loop positions as well as TCR features derived from the flanking regions of CDR3β (Methods). (h) Categorization of amino acids by isoelectric point and interfacial hydrophobicity (Methods).
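The pairwise statistic shown in the lower diagonal can be illustrated with a minimal sketch. It assumes equal-length sequences and normalizes mutual information by the smaller of the two marginal entropies; the normalization actually used in the study is defined in its Methods and may differ. The function names and the toy sequences are hypothetical.

```python
import math
from collections import Counter


def entropy(symbols):
    """Shannon entropy (in bits) of a list of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def normalized_mutual_information(seqs, i, j):
    """NMI between positions i and j across equal-length sequences.

    Assumption: MI is divided by min(H_i, H_j); other normalizations exist.
    Returns 0.0 when either position is invariant (zero entropy).
    """
    col_i = [s[i] for s in seqs]
    col_j = [s[j] for s in seqs]
    h_i, h_j = entropy(col_i), entropy(col_j)
    h_joint = entropy(list(zip(col_i, col_j)))
    mutual_information = h_i + h_j - h_joint
    denom = min(h_i, h_j)
    return mutual_information / denom if denom > 0 else 0.0


# Toy, made-up sequences (not real CDR3 data).
toy = ["CASSL", "CASSL", "CATTF", "CATTF"]
nmi_linked = normalized_mutual_information(toy, 2, 3)     # positions vary together
nmi_invariant = normalized_mutual_information(toy, 0, 2)  # position 0 never varies
```

Stratifying by CDR3β length, as the caption describes, keeps positions comparable across sequences; the sketch relies on the same equal-length assumption.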
(a) Treg odds ratio per standard deviation increase in CDR3βmr occupancy by each of the 14 relevant amino acids, estimated separately for the T1D cases in the discovery cohort (y axis) and the controls (x axis). (b) Treg odds ratio per standard deviation increase in CDR3βmr occupancy by each of the 15 relevant amino acids, estimated separately in each donor. (c) Treg odds ratio for the usage of each TRBV gene relative to the reference gene TRBV05-01, estimated separately for the T1D cases in the discovery cohort (y axis) and the controls (x axis). (d) Treg odds ratio for the usage of each TRBV gene relative to the reference gene TRBV05-01, estimated separately in each donor. P values in (a) and (c) are calculated by a two-sided t-test with Fisher transformation on Pearson’s R.
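As a rough illustration of testing a correlation through the Fisher transformation, the sketch below converts Pearson's R to a z statistic and derives a two-sided P value. It assumes the standard normal approximation with standard error 1/sqrt(n − 3), which may not match the exact test used in the study; the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_pvalue(r, n):
    """Two-sided P value for H0: rho = 0 via Fisher's z transformation.

    Assumption: atanh(r) is approximately normal with SE 1/sqrt(n - 3).
    Requires n > 3 and |r| < 1 (atanh is undefined at exactly +/-1).
    """
    if n <= 3:
        raise ValueError("need more than 3 paired observations")
    z = math.atanh(r) * math.sqrt(n - 3)
    # erfc(|z| / sqrt(2)) is the two-sided tail area of a standard normal.
    return math.erfc(abs(z) / math.sqrt(2))
```

For example, `fisher_z_pvalue(pearson_r(case_ors, control_ors), len(case_ors))` would test agreement between two lists of per-feature estimates, where `case_ors` and `control_ors` are placeholder names.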
(a)-(c) Maximum Pearson’s correlation observed between each pair of TCR features in the discovery dataset, for all possible combinations of amino acid-based TCR feature values (Methods). Heatmaps are separated by TCR region: (a) CDR3βmr, (b) TRBV-encoded (CDR1β loop, CDR2β loop and the V-region of CDR3β) and (c) TRBJ-encoded. (d) Feature selection for the V-region model based on variance inflation in estimated regression coefficients (Methods); each plot represents a candidate mixed effects logistic regression model jointly modeling the effects of TCR features on the x axis. Black arrow denotes improvement from the first model to the second model via reduction of the variance inflation factor (VIF). Black horizontal line denotes the ideal VIF: zero inflation compared to a model with uncorrelated features. (e) Same as (d), for candidate J-region models.
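To make the VIF idea concrete, here is a minimal sketch of the classical variance inflation factor, taken as the diagonal of the inverse of the feature correlation matrix. This is an assumption for illustration: the study fits mixed effects logistic regressions and cites generalized collinearity diagnostics, so its computation likely differs. Under the classical definition, a VIF of 1 means no inflation. All function names are hypothetical.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length feature columns."""
    n = len(columns[0])
    centered = [[v - sum(col) / n for v in col] for col in columns]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(k)]
        for i in range(k)
    ]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(aug[r][c]))
        if abs(aug[pivot][c]) < 1e-12:
            raise ValueError("features are perfectly collinear")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[k:] for row in aug]


def variance_inflation_factors(columns):
    """Classical VIF per feature: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[i][i] for i in range(len(columns))]


# Toy columns: the first pair is uncorrelated, the second pair nearly collinear.
vif_uncorrelated = variance_inflation_factors([[1, -1, 1, -1], [1, 1, -1, -1]])
vif_correlated = variance_inflation_factors([[1, 2, 3, 4], [1, 2, 3, 5]])
```

Dropping or merging one of two strongly correlated features, as in the move from the first candidate model to the second, lowers the VIF of the features that remain.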
Thymic selection rates for each TRBV and TRBJ gene in each donor in the discovery cohort and in a reference cohort of 666 healthy donors, inferred by relative gene usage in productive reads versus nonproductive reads (Supplementary Note).
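A simplified sketch of the productive-versus-nonproductive comparison follows. It assumes the rate is a plain ratio of usage frequencies; the exact estimator is described in the study's Supplementary Note and may differ. The function name and the toy gene labels and counts are hypothetical.

```python
from collections import Counter


def selection_ratios(productive_genes, nonproductive_genes):
    """Ratio of each gene's usage frequency in productive vs nonproductive reads.

    Assumption: nonproductive rearrangements approximate pre-selection usage,
    so a ratio above 1 suggests enrichment after selection. Genes absent from
    the nonproductive reads are skipped to avoid division by zero. Both inputs
    must be non-empty.
    """
    productive = Counter(productive_genes)
    nonproductive = Counter(nonproductive_genes)
    n_productive = sum(productive.values())
    n_nonproductive = sum(nonproductive.values())
    return {
        gene: (productive.get(gene, 0) / n_productive) / (count / n_nonproductive)
        for gene, count in nonproductive.items()
    }


# Toy read-level gene calls (made-up counts).
productive_reads = ["geneA"] * 6 + ["geneB"] * 4
nonproductive_reads = ["geneA"] * 5 + ["geneB"] * 5
ratios = selection_ratios(productive_reads, nonproductive_reads)
```

In this toy input, `geneA` is used more often among productive reads than among nonproductive ones, so its ratio exceeds 1.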
Extended Data Fig. 5 Estimated effects of physicochemical features at each TCRβ position, stratified by CDR3β length.
(a) Estimated log odds ratio for Treg fate per standard deviation of each physicochemical feature at each CDRβ(1-3) loop position in each CDR3β length; features with an estimate > 0 are positively associated with Treg fate while features with an estimate < 0 are negatively associated. For each CDR3β length, all effects were estimated jointly in an L2-regularized logistic regression with a penalty weight tuned via 10-fold cross-validation (Methods). (b) Treg odds ratio per standard deviation increase in each physicochemical feature at each CDR3βmr position for each CDR3 length (Methods, Supplementary Table 9). Error bars denote 95% confidence interval for the estimated odds ratio.
(a) scRNAseq thymic dataset (ref. 13) cells arranged in a 2-dimensional embedding by UMAP and colored by normalized expression level of select transcripts; gray (low) to red (high). (b) Transcriptional cluster assignments. (c) Average normalized expression of cell-type-relevant transcripts per cluster.
Extended Data Fig. 7 Cell type identification for tumor microenvironment T cells and reference T cells.
(a) Log-normalized CD8A, CD4 and FOXP3 mRNA expression in T cells from breast tumor biopsies in Azizi et al. 2018, organized into a 2-dimensional embedding by Uniform Manifold Approximation and Projection (UMAP). (b) Louvain clustering of breast tumor microenvironment T cells. Broad cell type labels are indicated for each cluster in the surrounding legend. (c) Levels of key surface proteins measured by CITE-seq in the CD4+ reference single cell dataset (ref. 26) (low = purple, high = light green). Protein levels are normalized by the centered log-ratio (CLR) transformation (Methods). (d) LogCP10K-normalized expression levels of key mRNA transcripts in the CD4+ reference single cell dataset (ref. 26) (low = purple, high = light green).
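The two normalizations named in panels (c) and (d) can be sketched as follows. This assumes a pseudocount of 1 for CLR, natural logarithms, and normalization within a single cell; the study's Methods may specify different choices. The function names are hypothetical.

```python
import math


def clr(counts, pseudocount=1.0):
    """Centered log-ratio: log of each value relative to the geometric mean.

    Assumption: a pseudocount is added so that zero counts stay finite.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def log_cp10k(counts):
    """log(1 + counts per 10,000 total counts) for the transcripts of one cell."""
    total = sum(counts)
    return [math.log1p(c / total * 1e4) for c in counts]


# Toy counts for one cell (made up).
protein_clr = clr([0, 3, 10])
mrna_logcp10k = log_cp10k([1, 1])
```

CLR values sum to zero by construction, so they express each protein relative to the cell's overall antibody signal, while LogCP10K removes differences in sequencing depth between cells.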
(a) Tumor microenvironment T cells mapped into the reference embedding by Symphony, colored by donor to reveal successful integration of donors. (b) Same as (a), colored by cancer type to reveal successful integration of cohorts. (c) Tumor microenvironment T cells mapped into the reference embedding by Symphony, colored by cell types derived from internal clustering (by Yost et al. for the SCC and BCC samples, and as depicted in Extended Data Fig. 7a,b for the BRCA samples) to show the extent of concordance with Symphony’s cell type solutions. (d) Same as (a), colored by the TiRP score of their TCR. TiRP is scaled such that 0 corresponds to the mean score and one unit corresponds to one standard deviation of held-out bulk sequencing TCRs (Fig. 5c). (e) FOXP3 expression differences between Tregs and Tconvs within mixed clones of three representative donor samples. Each mixed clone is represented by a line connecting the average FOXP3 expression of Tregs within the clone to the average FOXP3 expression of Tconvs within the clone. Each P value is computed by a two-sided paired t-test comparing the mean FOXP3 expression in Tregs to that in Tconvs within each mixed clone.
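The scaling described in panel (d) amounts to standardizing against a reference set. The sketch below assumes raw scores are already available and uses the sample standard deviation; the function name and the toy numbers are hypothetical, and whether the study used the sample or population standard deviation is not stated here.

```python
import statistics


def scale_to_reference(scores, reference_scores):
    """Express scores in units of the reference set's standard deviation.

    A scaled value of 0 equals the reference mean; 1 is one reference SD above.
    Assumption: sample standard deviation (n - 1 denominator).
    """
    ref_mean = statistics.mean(reference_scores)
    ref_sd = statistics.stdev(reference_scores)
    return [(s - ref_mean) / ref_sd for s in scores]


# Toy raw scores (made up): the reference has mean 2.0 and SD 1.0.
scaled = scale_to_reference([2.0, 3.0, 0.0], reference_scores=[1.0, 2.0, 3.0])
```

Scaling against held-out TCRs rather than the scored cells themselves keeps the units comparable across datasets.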
Extended Data Fig. 9 Further analysis of principal components, murine Tregs, and human memory Tconv.
(a) 67 samples from the replication cohort colored by donor ID and arranged by principal component space according to variation in TCR sequence feature frequencies. (b) Same as (a), colored by donor clinical phenotype. (c) Replication of CDR3βmr percent composition of amino acid effects in mice. Error bars correspond to 95% confidence intervals for ORs. Amino acids are colored by physicochemical categories defined in Extended Data Fig. 1h. (d) Lack of mouse-human correspondence for position-specific TCR feature effects. TCR features are colored by type; error bars denote OR 95% confidence intervals. Murine TRBV genes were mapped to their human homologs for comparison; only those with a human homolog are shown (Methods). (e) Mean TiRP component scores for CD4+ expanded pure Tconv, pure Treg and mixed clones in the tumor microenvironment (refs. 16,17). Error bars denote standard error of the mean. Tconv mTiRP compared to mixed clone mTiRP two-sided Wald test P = 2.9 × 10−4; all other comparisons nonsignificant. (f) Overall lack of correspondence between Treg-Tconv OR and memory-naïve OR for CDR3βmr percent composition of amino acids. Error bars correspond to 95% confidence intervals, and amino acids are colored by the scheme in (c). (g) Replication of memory Tconv – naïve Tconv TRBV gene odds ratios in an independent dataset of sorted memory and naïve T cells from 4 healthy donors (ref. 32). TRBV genes are colored by their Treg-Tconv odds ratios. For (c), (d), (f), and (h), R = Pearson’s correlation coefficient and P values are computed by a two-sided t-test with Fisher transformation. For (e)-(g), human Treg-Tconv ORs result from fixed-effect meta-analysis across the discovery and replication cohorts.
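Fixed-effect meta-analysis of odds ratios across two cohorts is commonly done by inverse-variance weighting on the log scale. The sketch below assumes that approach and a 95% interval with z = 1.96; the caption does not state the exact estimator, and the function names and toy estimates are hypothetical.

```python
import math


def fixed_effect_meta(log_ors, std_errors):
    """Inverse-variance weighted pooled log odds ratio and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


def odds_ratio_with_ci(log_or, se, z=1.96):
    """Convert a log odds ratio and SE into (OR, lower, upper) on the OR scale."""
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Toy per-cohort estimates (made up): discovery and replication.
pooled_log_or, pooled_se = fixed_effect_meta([0.2, 0.4], [0.1, 0.1])
pooled_or, ci_low, ci_high = odds_ratio_with_ci(pooled_log_or, pooled_se)
```

With equal standard errors the pooled estimate is the simple average of the two cohorts; a cohort with a smaller standard error would pull the pooled value toward its own estimate.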
TiRP scores of McPAS and VDJdb autoimmune TCRs (points) compared to memory Tconvs and Tregs from the replication dataset held out for testing (boxplots). Each point in the autoimmune category represents one TCR from McPAS or VDJdb, colored by disease. Error bar denotes standard error of the mean TiRP for autoreactive TCRs, which is higher than reference memory Tconvs (P = 1.5 × 10−9, two-sided Wald test), but not significantly different from reference Tregs (P = 0.43, two-sided Wald test). Within each boxplot, the horizontal lines reflect the median, the top and bottom of each box reflect the interquartile range (IQR), and the whiskers reflect the maximum and minimum values within each grouping no further than 1.5 × IQR from the hinge. T1D = Type 1 Diabetes. CD = Celiac Disease. IBD = Inflammatory Bowel Disease. MS = Multiple Sclerosis.
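The boxplot convention described above can be sketched in a few lines. This assumes quartiles from linear interpolation (`method="inclusive"` in Python's `statistics.quantiles`), which may differ slightly from the hinges computed by the plotting software used in the study; the function name and toy values are hypothetical.

```python
import statistics


def boxplot_summary(values):
    """Median, quartiles and whisker ends using the 1.5 x IQR rule."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations that lie inside the fences.
    lower_whisker = min(v for v in data if v >= lower_fence)
    upper_whisker = max(v for v in data if v <= upper_fence)
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "lower_whisker": lower_whisker,
        "upper_whisker": upper_whisker,
    }


# Toy scores (made up): 100 lies beyond the upper fence, so the whisker stops at 5.
summary = boxplot_summary([1, 2, 3, 4, 5, 100])
```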
About this article
Cite this article
Lagattuta, K.A., Kang, J.B., Nathan, A. et al. Repertoire analyses reveal T cell antigen receptor sequence features that influence T cell fate. Nat Immunol 23, 446–457 (2022). https://doi.org/10.1038/s41590-022-01129-x