Repertoire analyses reveal T cell antigen receptor sequence features that influence T cell fate


T cells acquire a regulatory phenotype when their T cell antigen receptors (TCRs) experience an intermediate- to high-affinity interaction with a self-peptide presented via the major histocompatibility complex (MHC). Using TCRβ sequences from flow-sorted human cells, we identified TCR features that promote regulatory T cell (Treg) fate. From these results, we developed a scoring system to quantify TCR-intrinsic regulatory potential (TiRP). When applied to the tumor microenvironment, TiRP scoring helped to explain why only some T cell clones maintained the conventional T cell (Tconv) phenotype through expansion. To elucidate drivers of these predictive TCR features, we then examined the two elements of the Treg TCR ligand separately: the self-peptide and the human MHC class II molecule. These analyses revealed that hydrophobicity in the third complementarity-determining region (CDR3β) of the TCR promotes reactivity to self-peptides, while TCR variable gene (TRBV gene) usage shapes the TCR’s general propensity for human MHC class II-restricted activation.

Fig. 1: Study design.
Fig. 2: TCR sequence structure.
Fig. 3: Broad differences exist between the TCRs of Treg cells and Tconv cells.
Fig. 4: Treg cells exhibit position-specific TCR sequence features.
Fig. 5: Treg TCR sequence biases replicate in independent cohorts.
Fig. 6: TiRP helps to explain clonal plasticity in the tumor microenvironment.
Fig. 7: Two axes of TCR-driven cell states.
Fig. 8: Isolating the drivers of TiRP.

Data availability

Data analyzed in this study were previously deposited in the following locations: immuneACCESS; GEO (GSE158769, GSE123813 and GSE114724); GitHub; Zenodo; ArrayExpress (E-MTAB-8581); 10x Genomics; McPAS-TCR; and VDJdb.

Code availability

Custom analysis scripts are available on GitHub.


  1. Jordan, M. S. et al. Thymic selection of CD4+CD25+ regulatory T cells induced by an agonist self-peptide. Nat. Immunol. 2, 301–306 (2001).

  2. Yun, T. J. & Bevan, M. J. The Goldilocks conditions applied to T cell development. Nat. Immunol. 2, 13–14 (2001).

  3. Sakaguchi, S., Yamaguchi, T., Nomura, T. & Ono, M. Regulatory T cells and immune tolerance. Cell 133, 775–787 (2008).

  4. Klein, L., Hinterberger, M., Wirnsberger, G. & Kyewski, B. Antigen presentation in the thymus for positive selection and central tolerance induction. Nat. Rev. Immunol. 9, 833–844 (2009).

  5. Romagnoli, P. & van Meerwijk, J. P. M. Thymic selection and lineage commitment of CD4+Foxp3+ regulatory T lymphocytes. Prog. Mol. Biol. Transl. Sci. 92, 251–277 (2010).

  6. Moran, A. E. et al. T cell receptor signal strength in Treg and iNKT cell development demonstrated by a novel fluorescent reporter mouse. J. Exp. Med. 208, 1279–1289 (2011).

  7. Ohkura, N. et al. T cell receptor stimulation-induced epigenetic changes and Foxp3 expression are independent and complementary events required for Treg cell development. Immunity 37, 785–799 (2012).

  8. Li, M. O. & Rudensky, A. Y. T cell receptor signalling in the control of regulatory T cell differentiation and function. Nat. Rev. Immunol. 16, 220–233 (2016).

  9. Sidwell, T. et al. Attenuation of TCR-induced transcription by Bach2 controls regulatory T cell differentiation and homeostasis. Nat. Commun. 11, 252 (2020).

  10. Bolotin, D. A. et al. Antigen receptor repertoire profiling from RNA-seq data. Nat. Biotechnol. 35, 908–911 (2017).

  11. Seay, H. R. et al. Tissue distribution and clonal diversity of the T and B cell repertoire in type 1 diabetes. JCI Insight 1, e88242 (2016).

  12. Gomez-Tourino, I., Kamra, Y., Baptista, R., Lorenc, A. & Peakman, M. T cell receptor β-chains display abnormal shortening and repertoire sharing in type 1 diabetes. Nat. Commun. 8, 1792 (2017).

  13. Park, J.-E. et al. A cell atlas of human thymic development defines T cell repertoire formation. Science 367, eaay3224 (2020).

  14. Khosravi-Maharlooei, M. et al. Cross-reactive public TCR sequences undergo positive selection in the human thymic repertoire. J. Clin. Invest. 129, 2446–2462 (2019).

  15. Joller, N. & Kuchroo, V. Good guys gone bad: exTreg cells promote autoimmune arthritis. Nat. Med. 20, 15–17 (2014).

  16. Sharon, E. et al. Genetic variation in MHC proteins is associated with T cell receptor expression biases. Nat. Genet. 48, 995–1002 (2016).

  17. Reche, P. A. & Reinherz, E. L. Sequence variability analysis of human class I and class II MHC molecules: functional and structural correlates of amino acid polymorphisms. J. Mol. Biol. 331, 623–641 (2003).

  18. Stadinski, B. D. et al. Hydrophobic CDR3 residues promote the development of self-reactive T cells. Nat. Immunol. 17, 946–955 (2016).

  19. Azizi, E. et al. Single-cell map of diverse immune phenotypes in the breast tumor microenvironment. Cell 174, 1293–1308 (2018).

  20. Yost, K. E. et al. Clonal replacement of tumor-specific T cells following PD-1 blockade. Nat. Med. 25, 1251–1259 (2019).

  21. Samstein, R. M., Josefowicz, S. Z., Arvey, A., Treuting, P. M. & Rudensky, A. Y. Extrathymic generation of regulatory T cells in placental mammals mitigates maternal-fetal conflict. Cell 150, 29–38 (2012).

  22. Cebula, A. et al. Thymus-derived regulatory T cells contribute to tolerance to commensal microbiota. Nature 497, 258–262 (2013).

  23. Zhou, X. et al. Instability of the transcription factor Foxp3 leads to the generation of pathogenic memory T cells in vivo. Nat. Immunol. 10, 1000–1007 (2009).

  24. Setoguchi, R., Hori, S., Takahashi, T. & Sakaguchi, S. Homeostatic maintenance of natural Foxp3+CD25+CD4+ regulatory T cells by interleukin (IL)-2 and induction of autoimmune disease by IL-2 neutralization. J. Exp. Med. 201, 723–735 (2005).

  25. Komatsu, N. et al. Pathogenic conversion of Foxp3+ T cells into TH17 cells in autoimmune arthritis. Nat. Med. 20, 62–68 (2014).

  26. Zemmour, D. et al. Single-cell gene expression reveals a landscape of regulatory T cell phenotypes shaped by the TCR. Nat. Immunol. 19, 291–301 (2018).

  27. Kang, J. B. et al. Efficient and precise single-cell reference atlas mapping with Symphony. Nat. Commun. 12, 5890 (2021).

  28. Nathan, A. et al. Multimodally profiling memory T cells from a tuberculosis cohort identifies cell state associations with demographics, environment and disease. Nat. Immunol. 22, 781–793 (2021).

  29. Jorgensen, J. L., Esser, U., Fazekas de St Groth, B., Reay, P. A. & Davis, M. M. Mapping T-cell receptor–peptide contacts by variant peptide immunization of single-chain transgenics. Nature 355, 224–230 (1992).

  30. Garcia, K. C. et al. An αβ T cell receptor structure at 2.5 Å and its orientation in the TCR–MHC complex. Science 274, 209–219 (1996).

  31. Thornton, A. M. et al. Helios+ and Helios− Treg subpopulations are phenotypically and functionally distinct and express dissimilar TCR repertoires. Eur. J. Immunol. 49, 398–412 (2019).

  32. Soto, C. et al. High frequency of shared clonotypes in human T cell receptor repertoires. Cell Rep. 32, 107882 (2020).

  33. Tickotsky, N., Sagiv, T., Prilusky, J., Shifrut, E. & Friedman, N. McPAS-TCR: a manually curated catalogue of pathology-associated T cell receptor sequences. Bioinformatics 33, 2924–2929 (2017).

  34. Shugay, M. et al. VDJdb: a curated database of T-cell receptor sequences with known antigen specificity. Nucleic Acids Res. 46, D419–D427 (2018).

  35. Lee, Y. K., Mukasa, R., Hatton, R. D. & Weaver, C. T. Developmental plasticity of TH17 and Treg cells. Curr. Opin. Immunol. 21, 274–280 (2009).

  36. Daley, S. R. et al. Cysteine and hydrophobic residues in CDR3 serve as distinct T-cell self-reactivity indices. J. Allergy Clin. Immunol. 144, 333–336 (2019).

  37. Košmrlj, A., Jha, A. K., Huseby, E. S., Kardar, M. & Chakraborty, A. K. How the thymus designs antigen-specific and self-tolerant T cell receptor sequences. Proc. Natl Acad. Sci. USA 105, 16671–16676 (2008).

  38. Miyazawa, S. & Jernigan, R. L. Estimation of effective interresidue contact energies from protein crystal structures: quasi-chemical approximation. Macromolecules 18, 534–552 (1985).

  39. Witten, I. H. & Frank, E. Data Mining: Practical Machine Learning Tools and Techniques 2nd edn (Morgan Kaufmann, 2005).

  40. Shannon, C. E. & Weaver, W. The Mathematical Theory of Communication (University of Illinois Press, 1998).

  41. Ihara, S. Information Theory for Continuous Systems (World Scientific, 1993).

  42. Zarembka, P. & Harcourt Brace & Company (1993–1999). Frontiers in Econometrics (Academic Press, 1974).

  43. Fox, J. & Monette, G. Generalized collinearity diagnostics. J. Am. Stat. Assoc. 87, 178–183 (1992).

  44. Wimley, W. C. & White, S. H. Experimentally determined hydrophobicity scale for proteins at membrane interfaces. Nat. Struct. Biol. 3, 842–848 (1996).

  45. Lide, D. R. CRC Handbook of Chemistry & Physics 72nd edn (CRC Press, 1991).

  46. Zamyatnin, A. A. Protein volume in solution. Prog. Biophys. Mol. Biol. 24, 107–123 (1972).

  47. Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods 16, 1289–1296 (2019).

  48. Schuldt, N. J. & Binstadt, B. A. Dual TCR T cells: identity crisis or multitaskers? J. Immunol. 202, 637–644 (2019).



We thank M.B. Brenner for helpful scientific conversations regarding this work. K.A.L. and J.B.K. are each supported by award number T32GM007753 from the National Institute of General Medical Sciences. A.N. is supported by award number T32AR007530 from the National Institute of Arthritis and Musculoskeletal and Skin Diseases. D.A.R. is supported by National Institutes of Health (NIH) NIAMS K08 AR072791 and a Career Award for Medical Sciences from the Burroughs Wellcome Fund. A.H.S. is supported by NIH P01 AI039671, P01 CA236749 and P01 AI108545. S.R. is supported by NIH grants U19-AI111224-01, P01AI148102-01A1, U01-HG009379-04S1, 1R01AR063759 and UH2-AR067677.

Author information

Contributions



K.A.L., K.I. and S.R. conceived the study. K.A.L. performed computational analyses with support from J.B.K. and A.N. K.A.L., K.I., S.R., J.B.K., A.N., K.E.P., A.H.J., A.H.S. and D.A.R. contributed to data interpretation. K.A.L., K.E.P., K.I. and S.R. contributed to writing the manuscript. All authors reviewed the manuscript. K.I. and S.R. supervised the study.

Corresponding authors

Correspondence to Kazuyoshi Ishigaki or Soumya Raychaudhuri.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Immunology thanks the anonymous reviewers for their contribution to the peer review of this work. Zoltan Fehervari was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Mutual information structure of the TCRβ sequence.

(a)-(e) Heatmaps depicting the mutual information structure of the CDR3β amino acid sequence for CDR3βs of length 12 (a), 13 (b), 14 (c), 16 (d) and 17 (e) in the discovery dataset. The lower triangle shows the normalized mutual information (NMI) between each pair of TCR positions, while the upper triangle shows the maximum mutual information achieved by conditioning on any other TCR position. The NMI color scale for (a)-(e) is provided in (a). (f) Probability of each amino acid at each TCR position, depicted by a sequence logo. (g) Heatmap as in (a)-(e) for CDR1β and CDR2β loop positions as well as TCR features derived from the flanking regions of CDR3β (Methods). (h) Categorization of amino acids by isoelectric point and interfacial hydrophobicity (Methods).
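As a rough illustration of the NMI quantity named in this caption, the sketch below computes mutual information between two aligned residue columns. The normalization (dividing by the smaller marginal entropy) and the toy columns are assumptions made for illustration; the paper's Methods may define NMI differently.

```python
import math
from collections import Counter

def entropy(symbols):
    """Shannon entropy (in bits) of a sequence of discrete symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())

def normalized_mutual_information(xs, ys):
    """NMI between two aligned columns of residues.

    Assumption: MI is divided by the smaller marginal entropy so the
    result lies in [0, 1]; other normalizations are also in common use.
    """
    if len(xs) != len(ys):
        raise ValueError("columns must be aligned")
    h_x, h_y = entropy(xs), entropy(ys)
    h_xy = entropy(list(zip(xs, ys)))  # joint entropy of residue pairs
    mutual_info = h_x + h_y - h_xy
    denom = min(h_x, h_y)
    return mutual_info / denom if denom > 0 else 0.0

# Hypothetical toy columns: residues at three positions across four CDR3s.
pos_a = ["A", "A", "G", "G"]
pos_b = ["L", "L", "S", "S"]  # fully determined by pos_a
pos_c = ["L", "S", "L", "S"]  # independent of pos_a
print(normalized_mutual_information(pos_a, pos_b))
print(normalized_mutual_information(pos_a, pos_c))
```

In this toy case the fully dependent pair scores 1 and the independent pair scores 0, which is the range a heatmap of pairwise NMI would span.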

Extended Data Fig. 2 Consistency of TCR feature effects across individuals and clinical phenotypes.

(a) Treg odds ratio per standard deviation increase in CDR3βmr occupancy by each of the 14 relevant amino acids, estimated separately for the T1D cases in the discovery cohort (y axis) and the controls (x axis). (b) Treg odds ratio per standard deviation increase in CDR3βmr occupancy by each of the 15 relevant amino acids, estimated separately in each donor. (c) Treg odds ratio for the usage of each TRBV gene relative to the reference gene TRBV05-01, estimated separately for the T1D cases in the discovery cohort (y axis) and the controls (x axis). (d) Treg odds ratio for the usage of each TRBV gene relative to the reference gene TRBV05-01, estimated separately in each donor. P values in (a) and (c) are calculated by a two-sided t-test with Fisher transformation on Pearson’s R.
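One common way to attach a P value to a Pearson correlation through the Fisher transformation is sketched below. It uses the normal approximation for the transformed statistic, which is an assumption here and may differ in detail from the test the authors ran; the function name is hypothetical.

```python
import math

def fisher_z_p_value(r, n):
    """Two-sided P value for Pearson's r from n paired observations.

    Assumption: z = atanh(r) * sqrt(n - 3) is treated as standard normal.
    """
    if n <= 3 or not -1.0 < r < 1.0:
        raise ValueError("need n > 3 and -1 < r < 1")
    z = math.atanh(r) * math.sqrt(n - 3)
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal.
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical inputs: the same correlation with more paired estimates
# yields a smaller P value.
print(fisher_z_p_value(0.5, 20))
print(fisher_z_p_value(0.5, 100))
```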

Extended Data Fig. 3 Multicollinearity analysis.

(a)-(c) Maximum Pearson’s correlation observed between each pair of TCR features in the discovery dataset, for all possible combinations of amino acid-based TCR feature values (Methods). Heatmaps are separated by TCR region: (a) CDR3βmr, (b) TRBV-encoded (CDR1β loop, CDR2β loop and the V-region of CDR3β) and (c) TRBJ-encoded. (d) Feature selection for the V-region model based on variance inflation in estimated regression coefficients (Methods); each plot represents a candidate mixed-effects logistic regression model jointly modeling the effects of the TCR features on the x axis. Black arrow denotes improvement from the first model to the second model via reduction of the variance inflation factor (VIF). Black horizontal line denotes the ideal VIF: zero inflation compared to a model with uncorrelated features. (e) Same as (d), for candidate J-region models.
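To make the VIF baseline concrete, the sketch below computes the classical variance inflation factor for a model with exactly two features, where VIF = 1 / (1 - r^2) and r is their Pearson correlation. This is a simplification under stated assumptions: the study fits mixed-effects logistic models with many features, and the helper names and toy vectors are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def vif_two_features(xs, ys):
    """VIF of either feature in a two-feature linear model: 1 / (1 - r^2)."""
    r = pearson(xs, ys)
    return 1.0 / (1.0 - r * r)

# Hypothetical feature vectors.
uncorrelated = vif_two_features([1, 2, 3, 4], [1, -1, -1, 1])
correlated = vif_two_features([1, 2, 3, 4], [1, 2, 3, 5])
print(uncorrelated, correlated)
```

A VIF of 1 corresponds to the "zero inflation" baseline in the caption; strongly correlated features push the value well above 1.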

Extended Data Fig. 4 Thymic selection rates for TRBV and TRBJ genes.

Thymic selection rates for each TRBV and TRBJ gene in each donor in the discovery cohort and in a reference cohort of 666 healthy donors, inferred by relative gene usage in productive reads versus nonproductive reads (Supplementary Note).
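A minimal sketch of the idea behind this inference, assuming the selection rate of a gene is the ratio of its usage frequency among productive reads to its frequency among nonproductive reads. The exact estimator is described in the Supplementary Note and may differ; the function name and read counts below are hypothetical.

```python
def selection_rate(productive_counts, nonproductive_counts, gene):
    """Relative usage of a gene in productive versus nonproductive reads.

    Assumption: a ratio above 1 suggests enrichment after selection,
    and a ratio below 1 suggests depletion.
    """
    productive_freq = productive_counts[gene] / sum(productive_counts.values())
    nonproductive_freq = nonproductive_counts[gene] / sum(nonproductive_counts.values())
    return productive_freq / nonproductive_freq

# Hypothetical read counts for two TRBV genes in one donor.
productive = {"TRBV05-01": 30, "TRBV20-01": 70}
nonproductive = {"TRBV05-01": 20, "TRBV20-01": 80}
for gene in productive:
    print(gene, selection_rate(productive, nonproductive, gene))
```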

Extended Data Fig. 5 Estimated effects of physicochemical features at each TCRβ position, stratified by CDR3β length.

(a) Estimated log odds ratio for Treg fate per standard deviation of each physicochemical feature at each CDRβ(1-3) loop position in each CDR3β length; features with an estimate > 0 are positively associated with Treg fate while features with an estimate < 0 are negatively associated. For each CDR3β length, all effects were estimated jointly in an L2-regularized logistic regression with a penalty weight tuned via 10-fold cross-validation (Methods). (b) Treg odds ratio per standard deviation increase in each physicochemical feature at each CDR3βmr position for each CDR3 length (Methods, Supplementary Table 9). Error bars denote 95% confidence interval for the estimated odds ratio.
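The effect of an L2 penalty on logistic regression coefficients can be sketched with plain gradient descent, as below. This is an illustrative toy rather than the authors' pipeline: the penalty is fixed instead of being tuned by 10-fold cross-validation, and the data, function name and hyperparameters are hypothetical.

```python
import math

def fit_l2_logistic(features, labels, penalty, learning_rate=0.1, steps=2000):
    """Fit logistic regression with an L2 penalty by batch gradient descent.

    Minimizes mean log-loss + (penalty / 2) * sum(w_j ** 2); the intercept
    is not penalized. Returns (weights, intercept).
    """
    n, p = len(features), len(features[0])
    weights, intercept = [0.0] * p, 0.0
    for _ in range(steps):
        grad_w = [penalty * w for w in weights]  # gradient of the L2 term
        grad_b = 0.0
        for row, label in zip(features, labels):
            z = intercept + sum(w * x for w, x in zip(weights, row))
            error = 1.0 / (1.0 + math.exp(-z)) - label
            for j in range(p):
                grad_w[j] += error * row[j] / n
            grad_b += error / n
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        intercept -= learning_rate * grad_b
    return weights, intercept

# Hypothetical standardized feature (one column) with Treg (1) / Tconv (0) labels.
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
weak, _ = fit_l2_logistic(X, y, penalty=0.01)
strong, _ = fit_l2_logistic(X, y, penalty=1.0)
print(weak[0], strong[0])  # the stronger penalty shrinks the coefficient toward 0
```

When features are standardized beforehand, exponentiating a fitted coefficient gives an odds ratio per standard deviation, which is the scale used in panel (b).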

Extended Data Fig. 6 Cell type identification for thymic T cells.

(a) scRNAseq thymic dataset13 cells arranged in a 2-dimensional embedding by UMAP and colored by normalized expression level of select transcripts; gray (low) to red (high). (b) Transcriptional cluster assignments. (c) Average normalized expression of cell-type-relevant transcripts per cluster.

Extended Data Fig. 7 Cell type identification for tumor microenvironment T cells and reference T cells.

(a) Log-normalized CD8A, CD4 and FOXP3 mRNA expression in T cells from breast tumor biopsies in Azizi et al. 2018, organized into a 2-dimensional embedding by Uniform Manifold Approximation and Projection (UMAP). (b) Louvain clustering of breast tumor microenvironment T cells. Broad cell type labels are indicated for each cluster in the surrounding legend. (c) Levels of key surface proteins measured by CITE-seq in the CD4+ reference single cell dataset26 (low = purple, high = light green). Protein levels are normalized by the centered log-ratio (CLR) transformation (Methods). (d) LogCP10K-normalized expression levels of key mRNA transcripts in the CD4+ reference single cell dataset26 (low = purple, high = light green).
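The two normalizations named in panels (c) and (d) can be sketched as follows. Both functions are illustrative assumptions: the pseudocount, the axis along which CLR is applied and the use of the natural logarithm are choices made here, and the Methods may specify others.

```python
import math

def clr(counts, pseudocount=1.0):
    """Centered log-ratio: log counts minus their mean (log of the geometric mean)."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

def log_cp10k(counts):
    """Scale one cell's counts to 10,000 total, then apply log(1 + x)."""
    total = sum(counts)
    return [math.log1p(c / total * 1e4) for c in counts]

# Hypothetical counts for one cell.
protein_counts = [120, 15, 3, 40]
mrna_counts = [0, 5, 5]
print(clr(protein_counts))
print(log_cp10k(mrna_counts))
```

CLR values sum to zero by construction, so they express each protein relative to the cell's overall level rather than as an absolute abundance.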

Extended Data Fig. 8 Symphony mapping details.

(a) Tumor microenvironment T cells mapped into the reference embedding by Symphony, colored by donor to reveal successful integration of donors. (b) Same as (a), colored by cancer type to reveal successful integration of cohorts. (c) Tumor microenvironment T cells mapped into the reference embedding by Symphony, colored by cell types derived from internal clustering (by Yost et al. for the SCC and BCC samples, and as depicted in Extended Data Fig. 7a-b for the BRCA samples) to show the extent of concordance with Symphony’s cell type solutions. (d) Same as (a), colored by the TiRP score of their TCR. TiRP is scaled such that 0 corresponds to the mean score and one unit corresponds to one standard deviation of held-out bulk sequencing TCRs (Fig. 5c). (e) FOXP3 expression differences between Tregs and Tconvs within mixed clones of three representative donor samples. Each mixed clone is represented by a line connecting the average FOXP3 expression of Tregs within the clone to the average FOXP3 expression of Tconvs within the clone. Each P value is computed by a two-sided paired t-test comparing the mean FOXP3 expression in Tregs to that in Tconvs within each mixed clone.
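The scaling described in panel (d) amounts to standardizing raw scores against a reference set. A minimal sketch follows, assuming the sample standard deviation is used; the function name and toy scores are hypothetical.

```python
import statistics

def scale_to_reference(raw_scores, reference_scores):
    """Express scores in units of reference standard deviations from the reference mean."""
    reference_mean = statistics.mean(reference_scores)
    reference_sd = statistics.stdev(reference_scores)  # assumption: sample SD
    return [(score - reference_mean) / reference_sd for score in raw_scores]

# Hypothetical raw scores: held-out reference TCRs versus TCRs to be scored.
held_out = [1.0, 2.0, 3.0]
print(scale_to_reference([2.0, 3.0, 4.0], held_out))
```

With this convention a scaled score of 0 matches the reference mean and a score of 1 sits one reference standard deviation above it.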

Extended Data Fig. 9 Further analysis of principal components, murine Tregs, and human memory Tconv.

(a) 67 samples from the replication cohort, colored by donor ID and arranged in principal component space according to variation in TCR sequence feature frequencies. (b) Same as (a), colored by donor clinical phenotype. (c) Replication in mice of the effects of CDR3βmr percent amino acid composition. Error bars correspond to 95% confidence intervals for ORs. Amino acids are colored by physicochemical categories defined in Extended Data Fig. 1h. (d) Lack of mouse-human correspondence for position-specific TCR feature effects. TCR features are colored by type; error bars denote OR 95% confidence intervals. Murine TRBV genes were mapped to their human homologs for comparison; only those with a human homolog are shown (Methods). (e) Mean TiRP component scores for CD4+ expanded pure Tconv, pure Treg and mixed clones in the tumor microenvironment16,17. Error bars denote standard error of the mean. Tconv mTiRP compared to mixed clone mTiRP: two-sided Wald test P = 2.9 × 10⁻⁴; all other comparisons nonsignificant. (f) Overall lack of correspondence between Treg-Tconv OR and memory-naïve OR for CDR3βmr percent composition of amino acids. Error bars correspond to 95% confidence intervals, and amino acids are colored by the scheme in (c). (g) Replication of memory Tconv versus naïve Tconv TRBV gene odds ratios in an independent dataset of sorted memory and naïve T cells from 4 healthy donors32. TRBV genes are colored by their Treg-Tconv odds ratios. For (c), (d), (f) and (g), R = Pearson’s correlation coefficient and P values are computed by a two-sided t-test with Fisher transformation. For (e)-(g), human Treg-Tconv ORs result from fixed-effect meta-analysis across the discovery and replication cohorts.
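Fixed-effect meta-analysis of odds ratios is typically done by inverse-variance weighting of log odds ratios. The sketch below shows that standard calculation; the cohort estimates are hypothetical, and the authors' exact implementation is not shown in this section.

```python
import math

def fixed_effect_meta(log_odds_ratios, standard_errors):
    """Inverse-variance weighted pooled log OR and its standard error."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_odds_ratios)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se

# Hypothetical log ORs for one TCR feature in two cohorts.
pooled, pooled_se = fixed_effect_meta([0.2, 0.4], [0.1, 0.2])
print(math.exp(pooled))  # pooled odds ratio
print(math.exp(pooled - 1.96 * pooled_se), math.exp(pooled + 1.96 * pooled_se))
```

The more precise cohort (smaller standard error) dominates the pooled estimate, and the pooled standard error is smaller than either input.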

Extended Data Fig. 10 TiRP scoring of autoreactive T cell receptors.

TiRP scores of McPAS and VDJdb autoimmune TCRs (points) compared to memory Tconvs and Tregs from the replication dataset held out for testing (boxplots). Each point in the autoimmune category represents one TCR from McPAS or VDJdb, colored by disease. Error bar denotes standard error of the mean TiRP for autoreactive TCRs, which is higher than that of reference memory Tconvs (P = 1.5 × 10⁻⁹, two-sided Wald test) but not significantly different from that of reference Tregs (P = 0.43, two-sided Wald test). Within each boxplot, the horizontal line reflects the median, the top and bottom of each box reflect the interquartile range (IQR), and the whiskers reflect the maximum and minimum values within each grouping no further than 1.5 × IQR from the hinge. T1D = Type 1 Diabetes. CD = Celiac Disease. IBD = Inflammatory Bowel Disease. MS = Multiple Sclerosis.
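The boxplot convention described in this caption can be sketched as below. The quartile method (`method="inclusive"` in Python's `statistics.quantiles`) is an assumption, since plotting libraries compute hinges in slightly different ways; the function name and toy scores are hypothetical.

```python
import statistics

def box_stats(values):
    """Median, quartiles and whisker ends under the 1.5 x IQR rule."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers extend to the most extreme observations inside the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
    }

# Hypothetical scores with one outlier that falls beyond the upper whisker.
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(stats)
```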

Supplementary information

Supplementary Information

Supplementary discussion.

Reporting Summary

Supplementary Tables

Supplementary Tables 1–16.

Cite this article

Lagattuta, K.A., Kang, J.B., Nathan, A. et al. Repertoire analyses reveal T cell antigen receptor sequence features that influence T cell fate. Nat Immunol 23, 446–457 (2022).
