

Lineage tracing of human development through somatic mutations

Abstract

The ontogeny of the human haematopoietic system during fetal development has previously been characterized mainly through careful microscopic observations (ref. 1). Here we reconstruct a phylogenetic tree of blood development using whole-genome sequencing of 511 single-cell-derived haematopoietic colonies from healthy human fetuses at 8 and 18 weeks after conception, coupled with deep targeted sequencing of tissues of known embryonic origin. We found that, in healthy fetuses, individual haematopoietic progenitors acquire tens of somatic mutations by 18 weeks after conception. We used these mutations as barcodes and timed the divergence of embryonic and extra-embryonic tissues during development, and estimated the number of blood antecedents at different stages of embryonic development. Our data support a hypoblast origin of the extra-embryonic mesoderm and primitive blood in humans.
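The idea of using mutations as barcodes can be illustrated with a minimal sketch. It assumes each mutation is summarized by the set of colonies that carry it and that every site mutates at most once (no recurrent or back mutation). Under that assumption, the carrier sets of any two mutations must be nested or disjoint, and mutations with identical carrier sets sit on the same branch. The colony and mutation names below are hypothetical toy data; the published trees were built with dedicated software (MPBoot, ref. 33), not with this code.

```python
from itertools import combinations

# Hypothetical representation: mutation name -> frozenset of colony IDs carrying it.
# Assumes infinite sites (no recurrent or back mutation).


def compatible(a: frozenset, b: frozenset) -> bool:
    """Two carrier sets fit on one tree only if they are nested or disjoint."""
    return a <= b or b <= a or a.isdisjoint(b)


def conflicting_pairs(mutations: dict) -> list:
    """Return every pair of mutations whose carrier sets partially overlap."""
    items = sorted(mutations.items())
    return [
        (m1, m2)
        for (m1, s1), (m2, s2) in combinations(items, 2)
        if not compatible(s1, s2)
    ]


def branches(mutations: dict) -> dict:
    """Group mutations with identical carrier sets; each group is one branch."""
    grouped: dict = {}
    for name, carriers in sorted(mutations.items()):
        grouped.setdefault(carriers, []).append(name)
    return grouped


# Toy example with made-up colonies c1..c4 and mutations m1..m5.
toy = {
    "m1": frozenset({"c1", "c2", "c3", "c4"}),  # shared by every colony
    "m2": frozenset({"c1", "c2"}),
    "m3": frozenset({"c1", "c2"}),              # same branch as m2
    "m4": frozenset({"c3"}),                    # private to c3
    "m5": frozenset({"c2", "c3"}),              # partially overlaps m2 and m3
}
print(conflicting_pairs(toy))                  # [('m2', 'm5'), ('m3', 'm5')]
print(branches(toy)[frozenset({"c1", "c2"})])  # ['m2', 'm3']
```

In real data, a conflicting pair usually signals a genotyping error or missing coverage rather than a true violation, which is why tree-building tools tolerate some inconsistency.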


Fig. 1: Experimental workflow and phylogenetic trees.
Fig. 2: Reconstructing lineage divergence through targeted sequencing.
Fig. 3: Timing of divergence of embryonic and extra-embryonic lineages during development.
Fig. 4: Representation of haematopoietic lineages in developmentally defined non-haematopoietic tissues.

Data availability

Whole-genome and targeted sequencing data have been deposited in the European Genome–phenome Archive (EGA) (https://www.ebi.ac.uk/ega/). WGS data have been deposited with EGA accession number EGAD00001006162 and targeted sequencing data have been deposited with accession number EGAD00001006118. Data from the EGA are accessible for research use only to all bona fide researchers, as assessed by the Data Access Committee (https://www.ebi.ac.uk/ega/about/access). Data can be accessed by registering for an EGA account and contacting the Data Access Committee. All laser-capture microdissection images are deposited on Mendeley Data (‘Phylogeny_of_foetal_haematopoiesis_2020_LCM_images’, at https://doi.org/10.17632/9b264dw38s.1), and are accessible without restriction. Laser-capture microdissection images can be viewed using the free software NDP.view 2. More extensive derived datasets are available, together with the analysis code, without restriction at https://github.com/mspencerchapman/Phylogeny_of_foetal_haematopoiesis. Source data are provided with this paper.

Code availability

All scripts and some derived datasets are available at https://github.com/mspencerchapman/Phylogeny_of_foetal_haematopoiesis.

References

  1. Ivanovs, A. et al. Human haematopoietic stem cell development: from the embryo to the dish. Development 144, 2323–2337 (2017).
  2. OpenStax. Anatomy and Physiology (2016).
  3. Luckett, W. P. Origin and differentiation of the yolk sac and extraembryonic mesoderm in presomite human and rhesus monkey embryos. Am. J. Anat. 152, 59–97 (1978).
  4. Palis, J. & Yoder, M. C. Yolk-sac hematopoiesis: the first blood cells of mouse and man. Exp. Hematol. 29, 927–936 (2001).
  5. Silver, L. & Palis, J. Initiation of murine embryonic erythropoiesis: a spatial analysis. Blood 89, 1154–1164 (1997).
  6. Arnold, S. J. & Robertson, E. J. Making a commitment: cell lineage allocation and axis patterning in the early mouse embryo. Nat. Rev. Mol. Cell Biol. 10, 91–103 (2009).
  7. Rossant, J. & Tam, P. P. L. New insights into early human development: lessons for stem cell derivation and differentiation. Cell Stem Cell 20, 18–28 (2017).
  8. Enders, A. C. & King, B. F. Formation and differentiation of extraembryonic mesoderm in the rhesus monkey. Am. J. Anat. 181, 327–340 (1988).
  9. Kelemen, E., Calvo, W. & Fliedner, T. M. Atlas of Human Hemopoietic Development (1979).
  10. Charbord, P., Tavian, M., Humeau, L. & Péault, B. Early ontogeny of the human marrow from long bones: an immunohistochemical study of hematopoiesis and its microenvironment. Blood 87, 4109–4119 (1996).
  11. Blokzijl, F. et al. Tissue-specific mutation accumulation in human adult stem cells during life. Nature 538, 260–264 (2016).
  12. Baron, C. S. & van Oudenaarden, A. Unravelling cellular relationships during development and regeneration using genetic lineage tracing. Nat. Rev. Mol. Cell Biol. 20, 753–765 (2019).
  13. Lee-Six, H. et al. Population dynamics of normal human blood inferred from somatic mutations. Nature 561, 473–478 (2018).
  14. Bárcena, A. et al. Human placenta and chorion: potential additional sources of hematopoietic stem cells for transplantation. Transfusion 51, 94S–105S (2011).
  15. Hodgkinson, A. & Eyre-Walker, A. Variation in the mutation rate across mammalian genomes. Nat. Rev. Genet. 12, 756–766 (2011).
  16. Chapman, M. A. et al. Initial genome sequencing and analysis of multiple myeloma. Nature 471, 467–472 (2011).
  17. Schuster-Böckler, B. & Lehner, B. Chromatin organization is a major influence on regional mutation rates in human cancer cells. Nature 488, 504–507 (2012).
  18. Ju, Y. S. et al. Somatic mutations reveal asymmetric cellular dynamics in the early human embryo. Nature 543, 714–718 (2017).
  19. Yoshida, K. et al. Tobacco smoking and somatic mutations in human bronchial epithelium. Nature 578, 266–272 (2020).
  20. Ye, A. Y. et al. A model for postzygotic mosaicisms quantifies the allele fraction drift, mutation rate, and contribution to de novo mutations. Genome Res. 28, 943–951 (2018).
  21. Schulz, K. N. & Harrison, M. M. Mechanisms regulating zygotic genome activation. Nat. Rev. Genet. 20, 221–234 (2019).
  22. Molè, M. A., Weberling, A. & Zernicka-Goetz, M. Comparative analysis of human and mouse development: from zygote to pre-gastrulation. Curr. Top. Dev. Biol. 136, 113–138 (2020).
  23. Xiang, L. et al. A developmental landscape of 3D-cultured human pre-gastrulation embryos. Nature 577, 537–542 (2020).
  24. Kuruppumullage Don, P., Ananda, G., Chiaromonte, F. & Makova, K. D. Segmenting the human genome based on states of neutral genetic divergence. Proc. Natl Acad. Sci. USA 110, 14699–14704 (2013).
  25. Wu, J. et al. Chromatin analysis in human early development reveals epigenetic transition during ZGA. Nature 557, 256–260 (2018).
  26. Velten, L. et al. Human haematopoietic stem cell lineage commitment is a continuous process. Nat. Cell Biol. 19, 271–281 (2017).
  27. Ellis, P. et al. Reliable detection of somatic mutations in solid tissues by laser-capture microdissection and low-input DNA sequencing. Nat. Protocols 16, 841–871 (2021).
  28. Van Loo, P. et al. Allele-specific copy number analysis of tumors. Proc. Natl Acad. Sci. USA 107, 16910–16915 (2010).
  29. Conway, T. et al. Xenome—a tool for classifying reads from xenograft samples. Bioinformatics 28, i172–i178 (2012).
  30. Raine, K. M. et al. cgpPindel: identifying somatically acquired insertion and deletion events from paired end sequencing. Curr. Protoc. Bioinformatics 52, 15.7.1–15.7.12 (2015).
  31. Jones, D. et al. cgpCaVEManWrapper: simple execution of CaVEMan in order to detect somatic single nucleotide variants in NGS data. Curr. Protoc. Bioinformatics 56, 15.10.1–15.10.18 (2016).
  32. Coorens, T. H. H. et al. Embryonal precursors of Wilms tumor. Science 366, 1247–1251 (2019).
  33. Hoang, D. T. et al. MPBoot: fast phylogenetic maximum parsimony tree inference and bootstrap approximation. BMC Evol. Biol. 18, 11 (2018).


Acknowledgements

The study was supported by European Research Council project 677501 – ZF_Blood (to A.C. and A.M.R.), an EMBO Young Investigator Award (to A.C.) and core support grants from the Wellcome Trust to the Wellcome Sanger Institute and both Wellcome and the MRC to the Wellcome Trust–Medical Research Council Cambridge Stem Cell Institute (203151/Z/16/Z). M.S.C. was supported by a Wellcome Clinical PhD Fellowship. We thank the Wellcome Sanger Institute (WSI) Cytometry Core Facility for their help with single-cell index sorting; the WSI DNA pipelines for their contribution to sequencing the data; the WSI Cancer, Ageing and Somatic Mutation programme IT and sample teams for their bioinformatic and logistical support; J. Eliasova (scientific illustrator, WSI) for her support and help with the illustrations; and the Human Developmental Biology Resource (HDBR) for providing samples.

Author information


Contributions

A.C., A.M.R. and P.J.C. conceived the study; A.M.R. performed all the experiments with help from B.M.; M.S.C. carried out the computational analysis of the WGS data, under the supervision of P.J.C.; Y.H. prepared all histology sections for microdissection; T.B. assisted with the targeted sequencing strategy; T.B. and P.S.R. assisted with the laser-capture microdissections; K.Y. helped with the analysis of the bronchial epithelial phylogeny data; E.H. and L.M. assisted with annotating histology slides; N.W., T.H.H.C. and E.M. assisted with somatic mutation calling from WGS data, the construction of the phylogeny and the assignment of mutations to the tree; N.W., J.N. and K.J.D. helped to design and implement population simulation models to help with interpreting the phylogenies; A.C., A.M.R. and M.S.C. designed the figures and wrote the manuscript with inputs from the other authors. All authors approved the final version of the manuscript.

Corresponding authors

Correspondence to Peter J. Campbell or Ana Cvejic.

Ethics declarations

Competing interests

P.J.C. is a co-founder and stock-holder in Mu Genomics Ltd. The other authors declare no competing interests.

Additional information

Peer review information Nature thanks Fernando Camargo, Patrick Tam and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Fluorescence-activated cell sorting strategy for haematopoietic progenitor cells from fetal liver and bone marrow.

a, Following the exclusion of debris and cell doublets by gating, and the depletion of mature cells using a lineage antibody cocktail, we used anti-CD34 and anti-CD38 staining to sort single haematopoietic progenitor cells from the liver of an 8-pcw fetus. The panel also shows the sorting strategy for different stem and haematopoietic progenitor populations from the matched liver and two femurs of an 18-pcw fetus: following exclusion of debris and cell doublets and lineage depletion, anti-CD34, anti-CD38, anti-CD90, anti-CD45RA, anti-CD49f, anti-CD7, anti-CD10 and anti-CD123 staining was used to sort haematopoietic progenitor populations. b, Laser-capture microdissected tissues. Representative histological slides of tissue structures with different developmental origins. Between 9 and 18 sections were made for each tissue. Slides were stained with haematoxylin and eosin before microdissection. Structures microdissected from the 8-pcw sample included: mesodermal core and syncytiotrophoblast from the placenta, blood circulating in the heart, muscle from the heart, tubules from the kidney, epithelium from the gut, epidermis from the skin and vertebral disc from the vertebrae. Structures microdissected from the 18-pcw sample included: epidermis and peripheral nerves from the skin, muscle from the heart and glomeruli from the kidney. HSCs, haematopoietic stem cells; CMPs, common myeloid progenitors; MEPs, megakaryocyte–erythroid progenitors. n = 20,000 cells.

Extended Data Fig. 2 Sample contamination and sequencing coverage.

a, Box plot showing the percentage of human sequencing reads, before the exclusion of contaminating mouse reads from the feeder layer. The boxes indicate the median and interquartile range (IQR) and the whiskers extend to the largest and smallest values no more than 1.5× IQR from the box. Outlying points are plotted individually. b, Dot plot showing, for each colony of the two fetuses, the final sequencing coverage, after exclusion of mouse reads, against the percentage of human reads. The solid lines show the effect of human read percentage on the final sequencing coverage for each fetus, estimated using a linear model. The shaded area is the 95% confidence interval of this effect. c, Box plot showing final sample coverage for the two fetuses. The boxes indicate the median and IQR and the whiskers extend to the largest and smallest values no more than 1.5× IQR from the box. Outlying points are plotted individually. d, Dot plot showing the uncorrected SNV burden per colony against sample coverage (samples with <4× coverage are excluded). The solid lines show the effect of sequencing coverage on the uncorrected SNV burden for each fetus, estimated using a linear model. The shaded area is the 95% confidence interval of this effect. e, Dot plot showing the corrected SNV burden per colony against sequencing coverage (samples with <4× coverage excluded). The solid lines show the effect of sequencing coverage on the corrected SNV burden for each fetus, estimated using a linear model. The shaded area is the 95% confidence interval of this effect. f, Dot plot showing the uncorrected indel burden per colony against sequencing coverage (samples with <4× coverage excluded). The solid lines show the effect of sequencing coverage on the indel burden for each fetus, estimated using a linear model. The shaded area is the 95% confidence interval of this effect. g, ASCAT plot showing a normal male diploid karyotype for the 8-pcw and 18-pcw fetuses. 
h, Histograms showing variant allele fraction (VAF) of shared mutations with targeted sequencing depth ≥8× in the 8-pcw fetus. i, Histograms showing VAF of private mutations with targeted sequencing depth ≥8× in the 8-pcw fetus. j, Histograms showing VAF of shared mutations with targeted sequencing depth ≥8× in the 18-pcw fetus. k, Histograms showing VAF of private mutations with targeted sequencing depth ≥8× in the 18-pcw fetus. l, Histograms showing SNV burden per single-HSPC-derived colony for the 8-pcw and 18-pcw fetuses, corrected both for the proportion of private mutations that are acquired in vitro and for sample sensitivity. m, Indel burden per single-HSPC-derived colony for the 8-pcw and 18-pcw fetuses, with no correction applied. n, Mean numbers of private and shared SNVs per colony for each fetus.
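The two corrections named in panel l, and the coverage regressions in panels d–f, can be sketched as follows. This is a minimal sketch under stated assumptions: the exact correction formula is not given in this section, so `corrected_snv_burden` (a hypothetical helper) simply removes the estimated in vitro share of private SNVs and then divides by per-sample sensitivity. `coverage_slope` fits an ordinary least-squares line of burden on coverage after excluding samples below 4× coverage, mirroring the exclusion stated in the captions.

```python
import numpy as np


def corrected_snv_burden(n_shared, n_private, frac_private_in_vitro, sensitivity):
    """Hypothetical correction: drop the estimated in vitro share of private
    SNVs, then scale up by the sample's detection sensitivity (0 < s <= 1)."""
    if not 0 < sensitivity <= 1:
        raise ValueError("sensitivity must be in (0, 1]")
    if not 0 <= frac_private_in_vitro <= 1:
        raise ValueError("frac_private_in_vitro must be in [0, 1]")
    in_vivo = n_shared + n_private * (1 - frac_private_in_vitro)
    return in_vivo / sensitivity


def coverage_slope(coverage, burden, min_coverage=4.0):
    """Least-squares slope of burden against coverage, excluding low-coverage samples."""
    cov = np.asarray(coverage, dtype=float)
    bur = np.asarray(burden, dtype=float)
    keep = cov >= min_coverage
    slope, _intercept = np.polyfit(cov[keep], bur[keep], 1)
    return float(slope)


# Toy numbers, not data from the study.
print(corrected_snv_burden(n_shared=10, n_private=30,
                           frac_private_in_vitro=0.5, sensitivity=0.8))
print(coverage_slope([2, 5, 10, 20], [100, 11, 21, 41]))  # the 2x sample is ignored
```

A successful correction should leave the corrected burden roughly flat with respect to coverage (a slope near zero), which is the comparison panels d and e invite.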

Source data

Extended Data Fig. 3 Benchmarking and validation of the phylogenetic trees.

a, Heat maps of the genotype data used for tree-building. Owing to the lower average coverage, there are more missing values in the 18-pcw data. Dendrograms are from hierarchical clustering of the data, and do not represent the phylogeny. b, Internal consistency of the shared mutation data for each fetus as determined by the disagreement score. A perfect phylogeny has a score of zero. Scores for the data are compared with scores for random shuffles of the genotype data at each locus. c, Comparison of the phylogenetic trees built by MPBoot and those built by the alternative phylogeny-inference algorithms IQTree and SCITE for both the 8-pcw and 18-pcw fetuses. Clades present in one phylogeny that are absent in the other are highlighted in red. The Robinson–Foulds distance of the alternative tree from the MPBoot tree is shown. d, Robustness of each clade in three alternative bootstrapping approaches: bootstrapping of the raw sequencing read count data, bootstrapping of the mutation matrix and the bootstrap approximation method implemented in MPBoot. The proportion of bootstraps in which a clade is retained is shown, ordered by decreasing robustness. Clades from the first three generations, which are particularly important for later analysis, are highlighted in red. e, f, Comparison of the sequencing read count bootstrap trees with the original trees by quartet divergence and Robinson–Foulds similarity for the 8-pcw (e) and 18-pcw (f) fetuses. g, Bar plot showing the relative contribution of the two daughter cells of the first detected cell division in 14 phylogenies obtained from human adult bronchial epithelial cells. Original data obtained from ref. 19. MRCA, most recent common ancestor; R-F, Robinson–Foulds.
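The Robinson–Foulds comparison in panels c, e and f counts clades found in one tree but not the other. A minimal sketch for rooted trees written as nested tuples is below. It treats each internal node as the cluster of leaves beneath it and excludes the root, which every tree on the same leaves shares. The similarity normalization (one minus distance over the total number of clusters) is an assumption for illustration; this section does not specify the exact normalization used, and production work would use a phylogenetics library.

```python
def leaves(tree) -> frozenset:
    """Leaf labels of a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clusters(tree) -> set:
    """Leaf sets below each internal node, excluding the root cluster."""
    found = set()

    def walk(node) -> frozenset:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        found.add(below)
        return below

    found.discard(walk(tree))  # the root cluster is common to all trees
    return found


def robinson_foulds(t1, t2) -> int:
    """Number of clusters present in exactly one of the two trees."""
    if leaves(t1) != leaves(t2):
        raise ValueError("trees must share the same leaf set")
    return len(clusters(t1) ^ clusters(t2))


def rf_similarity(t1, t2) -> float:
    """Assumed normalization: 1 - RF / (total clusters in both trees)."""
    total = len(clusters(t1)) + len(clusters(t2))
    return 1.0 if total == 0 else 1.0 - robinson_foulds(t1, t2) / total


# Two toy trees that disagree only on where leaf "b" attaches.
t1 = ((("a", "b"), "c"), ("d", "e"))
t2 = ((("a", "c"), "b"), ("d", "e"))
print(robinson_foulds(t1, t2))  # 2: {a,b} is only in t1, {a,c} only in t2
```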

Source data

Extended Data Fig. 4 Sensitivity and specificity of targeted sequencing mutation calling in fetal tissues.

a, For the 8-pcw fetus, the sensitivity for detection of shared (top) and private (bottom) mutations at different cell fractions in each tissue. As the prior expectation for the presence of mutations is dependent on the position within the phylogeny, the sensitivity is lower for private mutations. Solid lines indicate median values, and shaded areas show 95% confidence intervals from simulations (n = 1,000). b, As in a, but for the 18-pcw fetus. c, Comparison of the cell fractions of called mutations in the 8-pcw data, and false-positive calls simulated from the error distribution. The overall call rate across tissues is printed on each panel. The data are split by mutations occurring at different generations from the zygote. Owing to the high prior expectation, early mutations are frequently called from the error distribution, but at extremely low cell fractions that do not overlap those of the data. Later mutations have more overlap in the cell fractions of calls made from the data and the error distribution, but are rarely made from the error distribution. d, As in c, but for the 18-pcw fetus. Mutations on the minor branch are not included in this figure as none were called in the data and the generation of these mutations is uncertain.
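Why sensitivity depends on cell fraction can be seen with a simple binomial read-sampling model. This sketch assumes a heterozygous autosomal mutation present in a fraction f of diploid cells, so its expected VAF is f/2, and a hypothetical calling rule of at least `min_alt` supporting reads. The authors' caller additionally uses a phylogeny-dependent prior, which this threshold model does not capture. The Monte Carlo function mirrors the idea of estimating sensitivity from repeated simulations (the caption uses n = 1,000).

```python
import math
import random


def expected_vaf(cell_fraction: float, ploidy: int = 2) -> float:
    """Heterozygous mutation in a fraction of cells; autosomal diploid by default."""
    return cell_fraction / ploidy


def detection_sensitivity(depth: int, cell_fraction: float, min_alt: int = 2) -> float:
    """Exact P(alt reads >= min_alt) when alt reads ~ Binomial(depth, VAF)."""
    p = expected_vaf(cell_fraction)
    miss = sum(
        math.comb(depth, k) * p**k * (1 - p) ** (depth - k) for k in range(min_alt)
    )
    return 1.0 - miss


def simulated_sensitivity(depth, cell_fraction, min_alt=2, n_sim=1000, seed=0):
    """Monte Carlo estimate: share of simulated sites reaching min_alt alt reads."""
    rng = random.Random(seed)
    p = expected_vaf(cell_fraction)
    hits = 0
    for _ in range(n_sim):
        alt = sum(rng.random() < p for _ in range(depth))
        hits += alt >= min_alt
    return hits / n_sim


for f in (0.05, 0.2, 0.5, 1.0):
    print(f, round(detection_sensitivity(depth=30, cell_fraction=f), 3))
```

Low cell fractions give low VAFs, so at fixed depth the chance of seeing enough supporting reads drops sharply, matching the shape of the curves in panels a and b.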

Source data

Extended Data Fig. 5 Genomic distribution of identified somatic mutations.

a, Plot showing the proportion of variants mapping to different genomic features. The box-and-whisker plots show proportions of simulated mutations in different categories (n = 524), with the boxes indicating median and IQR and the whiskers denoting the range. b, Plot showing the proportion of variants within protein-coding sequences. The histograms show the distribution obtained by 500 simulations of random acquisition of mutations across the genome. c, Plot showing the proportion of variants within introns of protein-coding genes. The histograms show the distribution obtained by 500 simulations of random acquisition of mutations across the genome. d, Mutations in protein-coding sequences in the 18-pcw fetus, mapped on the branch of acquisition. Only those found in ≥2 colonies are shown. e, Mutations in protein-coding sequences in the 8-pcw fetus, mapped on the branch of acquisition. Only those found in ≥2 colonies are shown.
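The null distributions in panels b and c come from simulating random acquisition of mutations across the genome. The sketch below shows the general approach under a simplifying assumption: mutations land uniformly at random, ignoring sequence context and mappability. The interval coordinates are hypothetical toy values, not a real annotation. Each simulation records the proportion of mutations falling inside the target intervals (for example, coding sequence), and an observed proportion can then be compared against that distribution.

```python
import bisect
import random


def make_lookup(intervals):
    """Membership test for sorted, non-overlapping half-open [start, end) intervals."""
    ivs = sorted(intervals)
    starts = [s for s, _ in ivs]
    ends = [e for _, e in ivs]

    def contains(pos: int) -> bool:
        i = bisect.bisect_right(starts, pos) - 1
        return i >= 0 and pos < ends[i]

    return contains


def null_proportions(n_mut, genome_len, intervals, n_sim=500, seed=0):
    """Per simulation, the share of n_mut uniform mutations falling in intervals."""
    rng = random.Random(seed)
    contains = make_lookup(intervals)
    return [
        sum(contains(rng.randrange(genome_len)) for _ in range(n_mut)) / n_mut
        for _ in range(n_sim)
    ]


def empirical_p_lower(observed, null):
    """One-sided P value for depletion, with the usual +1 correction."""
    return (1 + sum(x <= observed for x in null)) / (1 + len(null))


# Toy genome of 2,000 bp in which 10% of positions are "coding".
toy_coding = [(0, 100), (500, 600)]
null = null_proportions(n_mut=1000, genome_len=2000, intervals=toy_coding, n_sim=200)
print(sum(null) / len(null))           # close to 0.10
print(empirical_p_lower(0.05, null))   # small: 5% would be a strong depletion
```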

Source data

Extended Data Fig. 6 Mutational signatures.

a–c, Mutational signatures incorporating the trinucleotide context of shared mutations (a), private mutations assigned to the clonal peak in the binomial mixture model (that is, likely in vivo-acquired mutations) (b) and private mutations assigned to subclonal peaks in the binomial mixture model (that is, likely in vitro-acquired mutations) (c). d, Mutations detected in WGS of two 8-pcw trophoblast microbiopsies.
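Panels b and c split private mutations according to a binomial mixture model. A minimal expectation–maximization (EM) sketch of such a model is shown below, assuming each mutation contributes an alt-read count and a depth, and that a clonal (in vivo) mutation in a heterozygous state sits near VAF 0.5 while in vitro mutations form lower subclonal peaks. The number of components, starting values and stopping rule are illustrative assumptions, not the authors' settings.

```python
import numpy as np


def binomial_mixture_em(alt, depth, init_p, n_iter=200, tol=1e-8):
    """Fit a K-component binomial mixture by EM.

    Returns (component VAFs, mixing weights, per-mutation responsibilities).
    """
    alt = np.asarray(alt, dtype=float)
    depth = np.asarray(depth, dtype=float)
    p = np.asarray(init_p, dtype=float)
    w = np.full(p.size, 1.0 / p.size)
    prev = -np.inf
    for _ in range(n_iter):
        # E-step: log-likelihood per mutation and component; the binomial
        # coefficient is identical across components, so it is dropped.
        log_lik = (alt[:, None] * np.log(p)
                   + (depth - alt)[:, None] * np.log1p(-p)
                   + np.log(w))
        log_norm = np.logaddexp.reduce(log_lik, axis=1)
        resp = np.exp(log_lik - log_norm[:, None])
        # M-step: update mixing weights and component VAFs.
        w = resp.mean(axis=0)
        p = (resp * alt[:, None]).sum(axis=0) / (resp * depth[:, None]).sum(axis=0)
        p = np.clip(p, 1e-6, 1 - 1e-6)
        total = log_norm.sum()
        if total - prev < tol:
            break
        prev = total
    return p, w, resp


# Toy data: 300 clonal mutations (VAF 0.5) and 200 subclonal ones (VAF 0.15).
rng = np.random.default_rng(1)
depth = np.full(500, 30)
alt = rng.binomial(depth, np.r_[np.full(300, 0.5), np.full(200, 0.15)])
p, w, resp = binomial_mixture_em(alt, depth, init_p=[0.4, 0.1])
clonal = int(np.argmin(np.abs(p - 0.5)))  # component nearest VAF 0.5
print(np.round(p, 3), np.round(w, 3), "clonal component:", clonal)
```

Each mutation can then be assigned to the component with the highest responsibility (`resp.argmax(axis=1)`), which is one simple way to separate likely in vivo from likely in vitro mutations.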

Source data

Extended Data Fig. 7 Timing of divergence of embryonic and extra-embryonic lineages during development.

a, Heat map of the targeted sequencing data from the 8-pcw fetus. Each column and each row represents a single tissue. The colour shows the level of correlation. b, Density plots showing the distribution of variant allele fraction of mutations identified through WGS of specific microbiopsies. c, Heat map of the targeted sequencing data from the 18-pcw fetus. Each column represents an individual mutation, and each row a single tissue. The colour shows the variant allele fraction of the mutation in that tissue; grey indicates that the mutation was not detected. d, Heat map of the targeted sequencing data from the 18-pcw fetus. Each column and each row represents a single tissue. The colour shows the level of correlation. e, Line plot showing lineage loss of microdissected tissues in the 18-pcw fetus at different times (represented as cell generations from the zygote). Shaded areas represent confidence intervals. f, Phylogenetic tree showing the clonal relationships of 18-pcw HSPCs. Mutations identified in HSPCs by WGS that were also detected in non-haematopoietic tissues are coloured on the tree according to the earliest diverging tissue in which the mutation was reliably detected. Branch lengths are proportional to the number of mutations accumulated. The tree contains only those SNVs included in the bait set for targeted sequencing. In all heat maps, the tissues are clustered using soft cosine similarity (Methods). VAF, variant allele fraction.
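Soft cosine similarity generalizes cosine similarity by letting different features count as partially similar. For two tissue profiles a and b (for example, VAF vectors over mutations) and a feature-similarity matrix S, it is aᵀSb / (√(aᵀSa) √(bᵀSb)); with S equal to the identity it reduces to ordinary cosine similarity. How S is built is described in the Methods, which are not part of this section, so the matrix in this sketch is a hypothetical placeholder. S should be symmetric and positive semidefinite so the square roots are defined.

```python
import numpy as np


def soft_cosine(a, b, s) -> float:
    """Soft cosine similarity a^T S b / (sqrt(a^T S a) * sqrt(b^T S b))."""
    a, b, s = (np.asarray(x, dtype=float) for x in (a, b, s))
    denom = np.sqrt(a @ s @ a) * np.sqrt(b @ s @ b)
    return float(a @ s @ b / denom) if denom > 0 else 0.0


def similarity_matrix(profiles, s):
    """Pairwise soft cosine similarity between rows (tissues x features)."""
    x = np.asarray(profiles, dtype=float)
    n = x.shape[0]
    out = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            out[i, j] = soft_cosine(x[i], x[j], s)
    return out


# Hypothetical S: features 0 and 1 are treated as related (0.5), feature 2 as unrelated.
s_toy = np.array([[1.0, 0.5, 0.0],
                  [0.5, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
tissues = [[0.4, 0.0, 0.1],   # made-up VAF profiles
           [0.0, 0.4, 0.1],
           [0.0, 0.0, 0.5]]
print(np.round(similarity_matrix(tissues, s_toy), 3))
```

The resulting matrix (or one minus it, as a distance) can then be passed to any standard hierarchical clustering routine to order rows and columns of a heat map.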

Source data

Extended Data Fig. 8 High level of intermixing of 18-pcw HSPCs among fetal liver and two sites of bone marrow.

a, Ultrametric phylogenetic tree of 18-pcw HSPCs with branches coloured by the tissue from which HSPCs bearing the mutations were isolated. Black branches indicate that cells were found in more than one tissue; coloured branches indicate cells were unique to one specific tissue. b, Analysis of molecular variance used to formally test for clustering on the phylogeny of HSPCs isolated from the same tissue. The histogram shows the null distribution used to detect clustering. Distributions were obtained by randomly permuting which cells were assigned to which tissue (n = 30,000). The observed value of the phi statistic is shown as a red line, with the P value indicating no statistically significant clustering by tissue. c, Bar plot showing the number of new mutations identified through WGS in the 18 microbiopsies with coverage >15×. d, Heat map showing the cell fractions of individual new mutations shared by more than one tissue, identified through WGS in the different tissues. e, Line plot showing the fraction of captured haematopoietic lineages shared by non-haematopoietic tissues in the 18-pcw phylogeny, grouped by germ layer and plotted over successive generations. f, g, Phylogenetic trees highlighting mutations detected in HSPCs that were shared with the ectoderm (f) and mesoderm (g) in the 18-pcw fetus. The fraction of cells deriving from each lineage is plotted on a pie chart. Trees were made ultrametric. VAF, variant allele fraction.
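Panel b tests whether HSPCs from the same tissue cluster on the tree, using an analysis of molecular variance (AMOVA) Φ statistic against a permutation null. The sketch below uses the standard AMOVA variance components computed from a pairwise distance matrix, and shuffles tissue labels to build the null distribution. The distance matrix here is a hypothetical stand-in (for the real analysis, something like distances between colonies on the phylogeny), and the toy run uses 999 permutations rather than the 30,000 reported in the caption.

```python
import numpy as np


def phi_st(dist, groups) -> float:
    """AMOVA Phi_ST from a pairwise distance matrix and group labels."""
    d2 = np.asarray(dist, dtype=float) ** 2
    groups = np.asarray(groups)
    n = groups.size
    labels, counts = np.unique(groups, return_counts=True)
    k = labels.size
    # Sums of squared deviations, each from the upper triangle of d^2.
    ssd_total = d2[np.triu_indices(n, 1)].sum() / n
    ssd_within = 0.0
    for label, n_g in zip(labels, counts):
        idx = np.flatnonzero(groups == label)
        sub = d2[np.ix_(idx, idx)]
        ssd_within += sub[np.triu_indices(n_g, 1)].sum() / n_g
    ssd_among = ssd_total - ssd_within
    n_c = (n - (counts ** 2).sum() / n) / (k - 1)
    sigma_within = ssd_within / (n - k)
    sigma_among = (ssd_among / (k - 1) - sigma_within) / n_c
    return sigma_among / (sigma_among + sigma_within)


def permutation_test(dist, groups, n_perm=999, seed=0):
    """Observed Phi_ST and one-sided P value from shuffled group labels."""
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)
    observed = phi_st(dist, groups)
    null = np.array([phi_st(dist, rng.permutation(groups)) for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed - 1e-12)) / (1 + n_perm)
    return observed, p_value


# Toy example: two tight, well-separated groups of five points on a line.
x = np.r_[np.arange(5) * 0.1, 10 + np.arange(5) * 0.1]
dist = np.abs(x[:, None] - x[None, :])
labels = ["liver"] * 5 + ["femur"] * 5
print(permutation_test(dist, labels))  # Phi near 1, small P value
```

Because Φ near zero (or negative) means cells from different tissues are as closely related as cells from the same tissue, a non-significant P value, as reported in panel b, is consistent with extensive intermixing.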

Source data

Extended Data Fig. 9 Inference of lineages.

a, Potential outcomes of cell division of a multipotent antecedent observed in the HSPC phylogeny. b, Top of the 8-pcw phylogeny. Mutations detected in trophectoderm tissues are highlighted in red. Dark blue circles represent antecedent cells that are committed to the inner cell mass, but with a direct ancestor that was multipotent (contributing to both inner cell mass and trophectoderm). c, Top of the 8-pcw phylogeny. Mutations detected in extra-embryonic-mesoderm-derived tissues are highlighted in orange. Dark blue circles represent the 20 antecedent cells that are committed to the epiblast, but with a direct ancestor that was multipotent (contributing to both epiblast and hypoblast).

Supplementary information

Supplementary Methods

This file contains further detailed methods not covered in the main Methods section.

Reporting Summary

Peer Review File

Supplementary Table 1

A list of antibodies used for the study, including clone and manufacturer.

Supplementary Table 2

Laser-capture microdissection biopsies undergoing targeted sequencing for the 8-pcw fetus.

Supplementary Table 3

Laser-capture microdissection biopsies undergoing targeted sequencing for the 18-pcw fetus.

Supplementary Table 4

Laser-capture microdissection biopsies undergoing whole-genome sequencing (8-pcw fetus).

Source data


About this article


Cite this article

Spencer Chapman, M., Ranzoni, A.M., Myers, B. et al. Lineage tracing of human development through somatic mutations. Nature 595, 85–90 (2021). https://doi.org/10.1038/s41586-021-03548-6
