Mosaic loss of chromosome Y (LOY) in circulating white blood cells is the most common form of clonal mosaicism (refs. 1–5), yet our knowledge of its causes and consequences is limited. Here, using a computational approach, we estimate that 20% of the male population represented in the UK Biobank study (n = 205,011) has detectable LOY. We identify 156 autosomal genetic determinants of LOY, which we replicate in 757,114 men of European and Japanese ancestry. These loci highlight genes that are involved in cell-cycle regulation and cancer susceptibility, as well as somatic drivers of tumour growth and targets of cancer therapy. We demonstrate that genetic susceptibility to LOY is associated with non-haematological effects on health in both men and women, which supports the hypothesis that clonal haematopoiesis is a biomarker of genomic instability in other tissues. Single-cell RNA sequencing identifies dysregulated expression of autosomal genes in leukocytes with LOY and provides insights into why clonal expansion of these cells may occur. Collectively, these data highlight the value of studying clonal mosaicism to uncover fundamental mechanisms that underlie cancer and other ageing-related diseases.
All data used in discovery analyses are available from the UK Biobank on request (https://www.ukbiobank.ac.uk).
All software used in this project is publicly available: BOLT-LMM (v.2.3.2), MoChA (v.1.0), LDSC (v.1.0), MAGENTA (v.2.4), GoShifter (v.1.0), g-chromVAR (v.0.3), SMR (v.0.712), GCTA (v.1.91.6beta), String (v.11.0), FUSION-TWAS (v.1.0), Cell Ranger (v.2.0.2), Seurat (v.2.3.1), FINEMAP (v.1.3) and METASOFT (v.2.0.1).
Jacobs, P. A., Brunton, M., Court Brown, W. M., Doll, R. & Goldstein, H. Change of human chromosome count distribution with age: evidence for a sex difference. Nature 197, 1080–1081 (1963).
Jacobs, P. A., Court Brown, W. M. & Doll, R. Distribution of human chromosome counts in relation to age. Nature 191, 1178–1180 (1961).
Zhou, W. et al. Mosaic loss of chromosome Y is associated with common variation near TCL1A. Nat. Genet. 48, 563–568 (2016).
Wright, D. J. et al. Genetic variants associated with mosaic Y chromosome loss highlight cell cycle genes and overlap with cancer susceptibility. Nat. Genet. 49, 674–679 (2017).
Forsberg, L. A., Gisselsson, D. & Dumanski, J. P. Mosaicism in health and disease — clones picking up speed. Nat. Rev. Genet. 18, 128–142 (2017).
Lee-Six, H. et al. Population dynamics of normal human blood inferred from somatic mutations. Nature 561, 473–478 (2018).
Jacobs, K. B. et al. Detectable clonal mosaicism and its relationship to aging and cancer. Nat. Genet. 44, 651–658 (2012).
Zink, F. et al. Clonal hematopoiesis, with and without candidate driver mutations, is common in the elderly. Blood 130, 742–752 (2017).
Vattathil, S. & Scheet, P. Extensive hidden genomic mosaicism revealed in normal tissue. Am. J. Hum. Genet. 98, 571–578 (2016).
Loh, P.-R. et al. Insights into clonal haematopoiesis from 8,342 mosaic chromosomal alterations. Nature 559, 350–355 (2018).
Forsberg, L. A. et al. Mosaic loss of chromosome Y in leukocytes matters. Nat. Genet. 51, 4–7 (2019).
Laurie, C. C. et al. Detectable clonal mosaicism from birth to old age and its relationship to cancer. Nat. Genet. 44, 642–650 (2012).
Machiela, M. J. et al. Characterization of large structural genetic mosaicism in human autosomes. Am. J. Hum. Genet. 96, 487–497 (2015).
Dumanski, J. P. et al. Smoking is associated with mosaic loss of chromosome Y. Science 347, 81–83 (2015).
Loftfield, E. et al. Predictors of mosaic chromosome Y loss and associations with mortality in the UK Biobank. Sci. Rep. 8, 12316 (2018).
Forsberg, L. A. et al. Mosaic loss of chromosome Y in peripheral blood is associated with shorter survival and higher risk of cancer. Nat. Genet. 46, 624–628 (2014).
Noveski, P. et al. Loss of Y chromosome in peripheral blood of colorectal and prostate cancer patients. PLoS One 11, e0146264 (2016).
Machiela, M. J. et al. Mosaic chromosome Y loss and testicular germ cell tumor risk. J. Hum. Genet. 62, 637–640 (2017).
Ganster, C. et al. New data shed light on Y-loss-related pathogenesis in myelodysplastic syndromes. Genes Chromosom. Cancer 54, 717–724 (2015).
Loftfield, E. et al. Mosaic Y loss is moderately associated with solid tumor risk. Cancer Res. 79, 461–466 (2019).
Persani, L. et al. Increased loss of the Y chromosome in peripheral blood cells in male patients with autoimmune thyroiditis. J. Autoimmun. 38, J193–J196 (2012).
Lleo, A. et al. Y chromosome loss in male patients with primary biliary cirrhosis. J. Autoimmun. 41, 87–91 (2013).
Grassmann, F. et al. Y chromosome mosaicism is associated with age-related macular degeneration. Eur. J. Hum. Genet. 27, 36–41 (2019).
Haitjema, S. et al. Loss of Y chromosome in blood is associated with major cardiovascular events during follow-up in men after carotid endarterectomy. Circ. Cardiovasc. Genet. 10, e001544 (2017).
Dumanski, J. P. et al. Mosaic loss of chromosome Y in blood is associated with Alzheimer disease. Am. J. Hum. Genet. 98, 1208–1219 (2016).
Ulirsch, J. C. et al. Interrogation of human hematopoiesis at single-cell and single-variant resolution. Nat. Genet. 51, 683–693 (2019).
Schmidt, M. K. et al. Age- and tumor subtype-specific breast cancer risk estimates for CHEK2*1100delC carriers. J. Clin. Oncol. 34, 2750–2760 (2016).
Wang, Z. et al. Imputation and subset-based association analysis across different cancer types identifies multiple independent risk loci in the TERT–CLPTM1L region on chromosome 5p15.33. Hum. Mol. Genet. 23, 6616–6633 (2014).
Day, F. R. et al. Large-scale genomic analyses link reproductive aging to hypothalamic signaling, breast cancer susceptibility and BRCA1-mediated DNA repair. Nat. Genet. 47, 1294–1303 (2015).
Titus, S. et al. Impairment of BRCA1-related DNA double-strand break repair leads to ovarian aging in mice and humans. Sci. Transl. Med. 5, 172ra21 (2013).
Laine, J., Künstle, G., Obata, T., Sha, M. & Noguchi, M. The protooncogene TCL1 is an Akt kinase coactivator. Mol. Cell 6, 395–407 (2000).
Hirota, T., Gerlich, D., Koch, B., Ellenberg, J. & Peters, J.-M. Distinct functions of condensin I and II in mitotic chromosome assembly. J. Cell Sci. 117, 6435–6445 (2004).
Petry, S. Mechanisms of mitotic spindle assembly. Annu. Rev. Biochem. 85, 659–683 (2016).
Godek, K. M., Kabeche, L. & Compton, D. A. Regulation of kinetochore-microtubule attachments through homeostatic control during mitosis. Nat. Rev. Mol. Cell Biol. 16, 57–64 (2015).
London, N. & Biggins, S. Signalling dynamics in the spindle checkpoint response. Nat. Rev. Mol. Cell Biol. 15, 736–747 (2014).
Cory, S. & Adams, J. M. The Bcl2 family: regulators of the cellular life-or-death switch. Nat. Rev. Cancer 2, 647–656 (2002).
Zaremba, T. et al. Poly(ADP-ribose) polymerase-1 (PARP-1) pharmacogenetics, activity and expression analysis in cancer patients and healthy volunteers. Biochem. J. 436, 671–679 (2011).
Bolcun-Filas, E., Rinaldi, V. D., White, M. E. & Schimenti, J. C. Reversal of female infertility by Chk2 ablation reveals the oocyte DNA damage checkpoint pathway. Science 343, 533–536 (2014).
Lin, W., Titus, S., Moy, F., Ginsburg, E. S. & Oktay, K. Ovarian aging in women with BRCA germline mutations. J. Clin. Endocrinol. Metab. 102, 3839–3847 (2017).
Weinberg-Shukron, A. et al. Essential role of BRCA2 in ovarian development and function. N. Engl. J. Med. 379, 1042–1049 (2018).
Xue, A. et al. Genome-wide association analyses identify 143 risk variants and putative regulatory mechanisms for type 2 diabetes. Nat. Commun. 9, 2941 (2018).
He, L. M. et al. Cyclin D2 protein stability is regulated in pancreatic β-cells. Mol. Endocrinol. 23, 1865–1875 (2009).
Bonnefond, A. et al. Association between large detectable clonal mosaicism and type 2 diabetes with vascular complications. Nat. Genet. 45, 1040–1043 (2013).
Case, L. K. et al. The Y chromosome as a regulatory element shaping immune cell transcriptomes and susceptibility to autoimmune disease. Genome Res. 23, 1474–1485 (2013).
Maan, A. A. et al. The Y chromosome: a blueprint for men’s health? Eur. J. Hum. Genet. 25, 1181–1188 (2017).
Diskin, S. J. et al. Adjustment of genomic waves in signal intensities from whole-genome SNP genotyping platforms. Nucleic Acids Res. 36, e126 (2008).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
Loh, P.-R. et al. Reference-based phasing using the Haplotype Reference Consortium panel. Nat. Genet. 48, 1443–1448 (2016).
Loh, P.-R. et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat. Genet. 47, 284–290 (2015).
Yang, J. et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat. Genet. 44, 369–375 (2012).
Lee, S. H., Wray, N. R., Goddard, M. E. & Visscher, P. M. Estimating missing heritability for disease from genome-wide association studies. Am. J. Hum. Genet. 88, 294–305 (2011).
Day, F. R. et al. Causal mechanisms and balancing selection inferred from genetic associations with polycystic ovary syndrome. Nat. Commun. 6, 8464 (2015).
The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).
The UK10K Consortium. The UK10K project identifies rare variants in health and disease. Nature 526, 82–90 (2015).
Gudbjartsson, D. F. et al. Large-scale whole-genome sequencing of the Icelandic population. Nat. Genet. 47, 435–444 (2015).
Kanai, M. et al. Genetic analysis of quantitative traits in the Japanese population links cell types to complex human diseases. Nat. Genet. 50, 390–400 (2018).
Trynka, G. et al. Disentangling the effects of colocalizing genomic annotations to functionally prioritize non-coding variants within complex-trait loci. Am. J. Hum. Genet. 97, 139–152 (2015).
Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
Finucane, H. K. et al. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nat. Genet. 50, 621–629 (2018).
Zhu, Z. et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat. Genet. 48, 481–487 (2016).
Võsa, U. et al. Unraveling the polygenic architecture of complex traits using blood eQTL meta-analysis. Preprint at bioRxiv https://doi.org/10.1101/447367 (2018).
Gusev, A. et al. Integrative approaches for large-scale transcriptome-wide association studies. Nat. Genet. 48, 245–252 (2016).
Szklarczyk, D. et al. The STRING database in 2017: quality-controlled protein–protein association networks, made broadly accessible. Nucleic Acids Res. 45, D362–D368 (2017).
Segrè, A. V., Groop, L., Mootha, V. K., Daly, M. J. & Altshuler, D. Common inherited variation in mitochondrial genes is not enriched for associations with type 2 diabetes or related glycemic traits. PLoS Genet. 6, e1001058 (2010).
Benner, C. et al. FINEMAP: efficient variable selection using summary data from genome-wide association studies. Bioinformatics 32, 1493–1501 (2016).
Casper, J. et al. The UCSC Genome Browser database: 2018 update. Nucleic Acids Res. 46, D762–D769 (2018).
Machiela, M. J. & Chanock, S. J. LDlink: a web-based application for exploring population-specific haplotype structure and linking correlated alleles of possible functional variants. Bioinformatics 31, 3555–3557 (2015).
Burgess, S., Dudbridge, F. & Thompson, S. G. Combining information on multiple instrumental variables in Mendelian randomization: comparison of allele score and summarized data methods. Stat. Med. 35, 1880–1906 (2016).
Bowden, J., Davey Smith, G., Haycock, P. C. & Burgess, S. Consistent estimation in Mendelian randomization with some invalid instruments using a weighted median estimator. Genet. Epidemiol. 40, 304–314 (2016).
Bowden, J., Davey Smith, G. & Burgess, S. Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. Int. J. Epidemiol. 44, 512–525 (2015).
Schumacher, F. R. et al. Association analyses of more than 140,000 men identify 63 new prostate cancer susceptibility loci. Nat. Genet. 50, 928–936 (2018).
Turnbull, C. et al. Variants near DMRT1, TERT and ATF7IP are associated with testicular germ cell cancer. Nat. Genet. 42, 604–607 (2010).
Litchfield, K. et al. Identification of 19 new risk loci and potential regulatory mechanisms influencing susceptibility to testicular germ cell tumor. Nat. Genet. 49, 1133–1140 (2017).
Scelo, G. et al. Genome-wide association study identifies multiple risk loci for renal cell carcinoma. Nat. Commun. 8, 15724 (2017).
He, Y. et al. Exploring causality in the association between circulating 25-hydroxyvitamin D and colorectal cancer risk: a large Mendelian randomisation study. BMC Med. 16, 142 (2018).
May-Wilson, S. et al. Pro-inflammatory fatty acid profile and colorectal cancer risk: a Mendelian randomisation analysis. Eur. J. Cancer 84, 228–238 (2017).
McKay, J. D. et al. Large-scale association analysis identifies new lung cancer susceptibility loci and heterogeneity in genetic susceptibility across histological subtypes. Nat. Genet. 49, 1126–1132 (2017).
Melin, B. S. et al. Genome-wide association study of glioma subtypes identifies specific differences in genetic susceptibility to glioblastoma and non-glioblastoma tumors. Nat. Genet. 49, 789–794 (2017).
Atkins, I. et al. Transcriptome-wide association study identifies new candidate susceptibility genes for glioma. Cancer Res. 79, 2065–2071 (2019).
Michailidou, K. et al. Association analysis identifies 65 new breast cancer risk loci. Nature 551, 92–94 (2017).
Milne, R. L. et al. Identification of ten variants associated with risk of estrogen-receptor-negative breast cancer. Nat. Genet. 49, 1767–1778 (2017).
Phelan, C. M. et al. Identification of 12 new susceptibility loci for different histotypes of epithelial ovarian cancer. Nat. Genet. 49, 680–691 (2017).
O’Mara, T. A. et al. Identification of nine new susceptibility loci for endometrial cancer. Nat. Commun. 9, 3166 (2018).
Law, P. J. et al. Genome-wide association analysis implicates dysregulation of immunity genes in chronic lymphocytic leukaemia. Nat. Commun. 8, 14175 (2017).
Southam, L. et al. Whole genome sequencing and imputation in isolated populations identify genetic associations with medically-relevant complex traits. Nat. Commun. 8, 15606 (2017).
Han, B. et al. A general framework for meta-analyzing dependent studies with overlapping subjects in association mapping. Hum. Mol. Genet. 25, 1857–1866 (2016).
Han, B. & Eskin, E. Random-effects model aimed at discovering associations in meta-analysis of genome-wide association studies. Am. J. Hum. Genet. 88, 586–598 (2011).
This research was conducted using the UK Biobank Resource under applications 9905 and 19808. The work was supported by the Medical Research Council (unit programme no. MC_UU_12015/2) and the European Research Council (ID no. 679744). J.R.B.P. is grateful to his incredible wife S. Perry, without whose unwavering support his contribution to this work would not be possible. Full study-specific and individual acknowledgements can be found in the Supplementary Information.
L.A.F. and J.P.D. are cofounders and shareholders in Cray Innovation AB.
Peer review information Nature thanks Don Conrad, Yasminka Jakubek, Paul Scheet and John Witte for their contribution to the peer review of this work.
Extended data figures and tables
The number of genotyped variants on each chromosome is used as a proxy measure for chromosome size.
Extended Data Fig. 2 Distribution of allele frequency and effect size for the 156 identified LOY loci.
Estimates of individual SNP effects are taken from the UK Biobank discovery sample.
Extended Data Fig. 3 Comparison of estimates of SNP β coefficients for the 156 LOY loci in discovery analyses including or excluding cancer cases.
Effect estimates were compared between a LOY discovery GWAS analysis either including cancer cases (n = 205,011 individuals analysed) or excluding cancer cases (n = 187,953 individuals analysed). The squared Pearson correlation coefficient (R²) is shown.
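As an illustration of the comparison described in this caption, the squared Pearson correlation between two sets of effect estimates can be computed as in the sketch below. This is not the authors' code: the function name `squared_pearson` and the arrays `beta_incl` and `beta_excl` are hypothetical, and the numbers are placeholder values rather than estimates from the study.

```python
def squared_pearson(x, y):
    """Return the squared Pearson correlation coefficient between x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical effect estimates (illustrative only, not from the study):
# one value per variant, from analyses including or excluding cancer cases.
beta_incl = [0.10, -0.05, 0.20, 0.03, -0.12]
beta_excl = [0.11, -0.04, 0.19, 0.02, -0.13]

r2 = squared_pearson(beta_incl, beta_excl)
print(f"R^2 = {r2:.3f}")
```

A value of R² close to 1 would indicate that excluding cancer cases changes the per-variant effect estimates very little.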
Plot shows the effect of sample size and the ratio of Y chromosome PAR1 to non-PAR on PAR-LOY power over mLRR-Y.
a–d, All analyses were performed on genome-wide summary statistic data from the UK Biobank discovery analysis (n = 205,011). Two-tailed P values for enrichment were calculated using GoShifter. a, The posterior expected number of causal variants (top), as well as the best fine-mapped variant (bottom) in each region. b, Genomic enrichments for variants, stratified by posterior probability (PP). Fine-mapped variants were enriched for accessible chromatin in haematopoiesis, as well as in exons, promoters and untranslated regions (UTRs) of protein-coding genes, but not for introns. c, Cell-type enrichments (from g-chromVAR analyses) across the haematopoietic tree for LOY. HSCs, multipotent progenitor cells (MPPs) and common myeloid progenitor cells (CMPs) meet the Bonferroni threshold (α = 0.05/18). CD4, CD4+ T cell; CD8, CD8+ T cell; CLP, common lymphoid progenitor cell; ery, erythrocyte; GMP, granulocyte–macrophage progenitor cell; gran, granulocyte; LMPP, lymphoid-primed multipotent progenitor cell; mDC, myeloid dendritic cell; mega, megakaryocyte; MEP, megakaryocyte–erythroid progenitor cell; mono, monocyte; NK, natural killer cell; pDC, plasmacytoid dendritic cell. d, Developmental patterns of accessible chromatin for variants with a posterior probability greater than 10% are shown, revealing that 14 variants are fully restricted to acting within HSPCs, 14 variants can also have regulatory effects in myeloid and lymphoid progenitors, and 17 variants are capable of acting across the majority of haematopoiesis. k-means clustering (k = 4 determined by the gap statistic) was used to identify patterns of accessibility, and cell types were hierarchically clustered. AC, accessible chromatin; M/L, myeloid and lymphoid.
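The Bonferroni threshold quoted in this caption (α = 0.05/18) is simple arithmetic, sketched below. The function name `bonferroni_threshold` and the example P values are hypothetical and are not taken from the study; only α = 0.05 and the 18 tests come from the caption.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# alpha = 0.05 and 18 tests, as stated in the caption
threshold = bonferroni_threshold(0.05, 18)

# Hypothetical enrichment P values (illustrative only, not from the study)
example_p_values = {"cell_type_A": 0.0004, "cell_type_B": 0.01}

# A cell type meets the threshold when its P value is below alpha / n_tests
significant = {name: p < threshold for name, p in example_p_values.items()}
print(threshold, significant)
```

Under this correction a cell type is reported as enriched only if its P value falls below roughly 0.0028.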
Analyses performed using LDSC-SEG, with bar chart denoting statistical significance of observed positive enrichment. CNS, central nervous system.
a, Clustering and identification of cell types using a t-SNE plot generated from a pooled dataset of PBMCs (n = 86,160 cells) isolated from peripheral blood in 19 male donors. b, Expression of TCL1A (blue) in B lymphocytes. c, Analysis of LOY status in B lymphocytes identified 277 cells with LOY (red). d, Results from a resampling test that was performed to compare the expression of TCL1A in LOY (n = 277) and non-LOY (n = 2,459) B lymphocytes. The grey and red curves represent the resampled distribution of TCL1A expression in non-LOY and LOY cells, respectively. Expression of TCL1A was increased in the LOY B lymphocytes (fold change 1.68; two-sided P < 0.0001). e, Fold changes in gene expression between LOY and non-LOY B lymphocytes for 71 selected genes from the list of genes that mapped to the 156 index variants. Genes that were expressed in more than 5% of the investigated B lymphocytes were included. The solid blue line at a fold change of 1 represents no differential expression and the dashed red line represents 50% overexpression in LOY cells. H2AFY is also known as MACROH2A1; HMHA1 is also known as ARHGAP45.
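The caption above does not specify which resampling scheme was used for panel d. As one common form of resampling test, the sketch below runs a two-sided permutation test on the difference in mean expression between two groups of cells and also reports the fold change. It is an assumption-laden illustration, not the authors' procedure: the function name `permutation_test`, the number of resamples and the expression values are all hypothetical.

```python
import random


def permutation_test(group_a, group_b, n_resamples=2000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns (observed_difference, p_value). Group labels are shuffled
    n_resamples times to build the null distribution of the difference.
    """
    if not group_a or not group_b:
        raise ValueError("both groups must be non-empty")
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n_a = len(group_a)
    n_b = len(group_b)
    observed = sum(group_a) / n_a - sum(group_b) / n_b
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated P value strictly above zero
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


# Hypothetical normalized expression values (not data from the study)
loy_cells = [2.0, 2.1, 1.9, 2.2, 2.0, 2.1]
non_loy_cells = [1.0, 1.1, 0.9, 1.2, 1.0, 1.1]

diff, p = permutation_test(loy_cells, non_loy_cells)
fold_change = (sum(loy_cells) / len(loy_cells)) / (
    sum(non_loy_cells) / len(non_loy_cells)
)
print(f"difference = {diff:.2f}, fold change = {fold_change:.2f}, P = {p:.4f}")
```

With the add-one correction, the smallest P value this sketch can report is 1/(n_resamples + 1), so resolving a P value below 0.0001 would need at least 10,000 resamples.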
Extended Data Fig. 8 Differential expression of TCL1A in B lymphocytes with and without LOY within individuals.
Error bars indicate the 95% confidence interval of the mean normalized expression of TCL1A within each group (n = 277 B lymphocytes for LOY; n = 2,459 for non-LOY). To avoid stochastic effects that might occur in estimations that use a small number of cells, results are shown for individuals with LOY in at least 10% of the B lymphocytes and with LOY in more than five individual B lymphocytes. Within each of the seven individuals (S1–S7) meeting these criteria, TCL1A showed higher expression in the LOY cells than in normal cells. This suggests that the observed TCL1A overexpression in B lymphocytes without a Y chromosome is independent of the individual genotypes at the lead GWAS SNP (rs2887399).
Extended Data Fig. 9 Many genes that are associated with LOY converge on mechanistic and regulatory aspects of the cell cycle.
All of the genes shown have been prioritized as potentially functional genes at our reported GWAS loci; gene symbols may be shown more than once. Coloured indicators next to each gene symbol specify the type of evidence on which it has been prioritized at its respective locus: blue, nearest protein-coding gene; green, eQTL; red, contains a highly correlated non-synonymous variant. Red boxes indicate each of the three known cell-cycle checkpoints. Red inhibition connectors denote that a target is inhibited by degradation; green that it is inhibited by binding. Green arrows indicate a signalling cascade and its effector or final physiological effect. Bidirectional dashed green arrows indicate the formation of a complex between the products of the two connected genes. With the exception of p53, proteins contained within green boxes have not been implicated in this GWAS, but are notable interactors of implicated genes. APC/C, anaphase-promoting complex/cyclosome; CDK, cyclin-dependent kinase; CENPA-NAC, CENPA nucleosome-associated complex; MC, mitotic checkpoint.
This file contains the consortium authorship and acknowledgements.
Association statistics for the 156 mosaic LOY-associated index variants. Signals identified in the UK Biobank discovery analysis (N=205,011) with replication in up to 757,114 additional samples. All test statistics reported are two-sided p-values from linear or logistic regression models.
All identified fine-mapped variants with a posterior probability > 0.01. All analyses based on genome-wide discovery dataset (N=205,011).
All fine-mapped variants included within the same LD block (r² > 0.9) as a fine-mapped variant. All analyses based on genome-wide discovery dataset (N=205,011).
Local permutation enrichments of fine-mapped variants. Two-sided p-values for enrichment calculated using GoShifter.
Cell-type enrichment analyses. Performed using g-chromVAR using fine-mapped variants from the genome-wide discovery dataset (N=205,011).
GTEx tissue enrichment analyses. Statistics obtained from LDSC-SEG, run on the genome-wide discovery dataset (N=205,011).
Epigenome roadmap enrichment analyses. Statistics obtained from LDSC-SEG, run on the genome-wide discovery dataset (N=205,011).
Gene annotation for the 156 mosaic LOY-associated index variants.
Expression QTL analysis results using SMR. All analyses based on genome-wide discovery dataset (N=205,011).
Expression QTL analysis results using FUSION. All analyses based on genome-wide discovery dataset (N=205,011).
Association statistics for HLA imputed alleles. All analyses based on genome-wide discovery dataset (N=205,011).
STRING pathway analysis results using genes closest to identified lead variants.
Global pathway results from MAGENTA analysis. All analyses based on genome-wide discovery dataset (N=205,011).
Overlap of the 156 mosaic LOY-associated variants with reported cancer susceptibility loci.
Mendelian Randomization results for cancer associations. All p-values are two-tailed and based on two-sample summary statistic analyses.
Association statistics for the 156 mosaic LOY-associated variants on age at natural menopause (N=106,237 women). Menopause was analysed as a continuous trait using linear regression, reporting two-sided p-values.
About this article
Cite this article
Thompson, D.J., Genovese, G., Halvardson, J. et al. Genetic predisposition to mosaic Y chromosome loss in blood. Nature 575, 652–657 (2019). https://doi.org/10.1038/s41586-019-1765-3