The ontogeny of the human haematopoietic system during fetal development has previously been characterized mainly through careful microscopic observations (ref. 1). Here we reconstruct a phylogenetic tree of blood development using whole-genome sequencing of 511 single-cell-derived haematopoietic colonies from healthy human fetuses at 8 and 18 weeks after conception, coupled with deep targeted sequencing of tissues of known embryonic origin. We found that, in healthy fetuses, individual haematopoietic progenitors acquire tens of somatic mutations by 18 weeks after conception. We used these mutations as barcodes and timed the divergence of embryonic and extra-embryonic tissues during development, and estimated the number of blood antecedents at different stages of embryonic development. Our data support a hypoblast origin of the extra-embryonic mesoderm and primitive blood in humans.
Whole genomes and targeted sequencing data have been deposited in the European Genome–phenome Archive (EGA) (https://www.ebi.ac.uk/ega/). WGS data have been deposited with EGA accession number EGAD00001006162 and targeted sequencing data have been deposited with accession number EGAD00001006118. Data from the EGA are accessible for research use only to all bona fide researchers, as assessed by the Data Access Committee (https://www.ebi.ac.uk/ega/about/access). Data can be accessed by registering for an EGA account and contacting the Data Access Committee. All laser-capture microdissection images are deposited on Mendeley Data (‘Phylogeny_of_foetal_haematopoiesis_2020_LCM_images’, at https://doi.org/10.17632/9b264dw38s.1), and are accessible without restriction. Laser-capture microdissection images can be viewed using the free software NDP.view 2. More extensive derived datasets are available, together with the analysis code, without restriction at https://github.com/mspencerchapman/Phylogeny_of_foetal_haematopoiesis. Source data are provided with this paper.
All scripts and some derived datasets are available at https://github.com/mspencerchapman/Phylogeny_of_foetal_haematopoiesis.
Ivanovs, A. et al. Human haematopoietic stem cell development: from the embryo to the dish. Development 144, 2323–2337 (2017).
OpenStax. Anatomy and Physiology (2016).
Luckett, W. P. Origin and differentiation of the yolk sac and extraembryonic mesoderm in presomite human and rhesus monkey embryos. Am. J. Anat. 152, 59–97 (1978).
Palis, J. & Yoder, M. C. Yolk-sac hematopoiesis: the first blood cells of mouse and man. Exp. Hematol. 29, 927–936 (2001).
Silver, L. & Palis, J. Initiation of murine embryonic erythropoiesis: a spatial analysis. Blood 89, 1154–1164 (1997).
Arnold, S. J. & Robertson, E. J. Making a commitment: cell lineage allocation and axis patterning in the early mouse embryo. Nat. Rev. Mol. Cell Biol. 10, 91–103 (2009).
Rossant, J. & Tam, P. P. L. New insights into early human development: lessons for stem cell derivation and differentiation. Cell Stem Cell 20, 18–28 (2017).
Enders, A. C. & King, B. F. Formation and differentiation of extraembryonic mesoderm in the rhesus monkey. Am. J. Anat. 181, 327–340 (1988).
Kelemen, E., Calvo, W. & Fliedner, T. M. Atlas of Human Hemopoietic Development (1979).
Charbord, P., Tavian, M., Humeau, L. & Péault, B. Early ontogeny of the human marrow from long bones: an immunohistochemical study of hematopoiesis and its microenvironment. Blood 87, 4109–4119 (1996).
Blokzijl, F. et al. Tissue-specific mutation accumulation in human adult stem cells during life. Nature 538, 260–264 (2016).
Baron, C. S. & van Oudenaarden, A. Unravelling cellular relationships during development and regeneration using genetic lineage tracing. Nat. Rev. Mol. Cell Biol. 20, 753–765 (2019).
Lee-Six, H. et al. Population dynamics of normal human blood inferred from somatic mutations. Nature 561, 473–478 (2018).
Bárcena, A. et al. Human placenta and chorion: potential additional sources of hematopoietic stem cells for transplantation. Transfusion 51, 94S–105S (2011).
Hodgkinson, A. & Eyre-Walker, A. Variation in the mutation rate across mammalian genomes. Nat. Rev. Genet. 12, 756–766 (2011).
Chapman, M. A. et al. Initial genome sequencing and analysis of multiple myeloma. Nature 471, 467–472 (2011).
Schuster-Böckler, B. & Lehner, B. Chromatin organization is a major influence on regional mutation rates in human cancer cells. Nature 488, 504–507 (2012).
Ju, Y. S. et al. Somatic mutations reveal asymmetric cellular dynamics in the early human embryo. Nature 543, 714–718 (2017).
Yoshida, K. et al. Tobacco smoking and somatic mutations in human bronchial epithelium. Nature 578, 266–272 (2020).
Ye, A. Y. et al. A model for postzygotic mosaicisms quantifies the allele fraction drift, mutation rate, and contribution to de novo mutations. Genome Res. 28, 943–951 (2018).
Schulz, K. N. & Harrison, M. M. Mechanisms regulating zygotic genome activation. Nat. Rev. Genet. 20, 221–234 (2019).
Molè, M. A., Weberling, A. & Zernicka-Goetz, M. Comparative analysis of human and mouse development: from zygote to pre-gastrulation. Curr. Top. Dev. Biol. 136, 113–138 (2020).
Xiang, L. et al. A developmental landscape of 3D-cultured human pre-gastrulation embryos. Nature 577, 537–542 (2020).
Kuruppumullage Don, P., Ananda, G., Chiaromonte, F. & Makova, K. D. Segmenting the human genome based on states of neutral genetic divergence. Proc. Natl Acad. Sci. USA 110, 14699–14704 (2013).
Wu, J. et al. Chromatin analysis in human early development reveals epigenetic transition during ZGA. Nature 557, 256–260 (2018).
Velten, L. et al. Human haematopoietic stem cell lineage commitment is a continuous process. Nat. Cell Biol. 19, 271–281 (2017).
Ellis, P. et al. Reliable detection of somatic mutations in solid tissues by laser-capture microdissection and low-input DNA sequencing. Nat. Protocols 16, 841–871 (2021).
Van Loo, P. et al. Allele-specific copy number analysis of tumors. Proc. Natl Acad. Sci. USA 107, 16910–16915 (2010).
Conway, T. et al. Xenome—a tool for classifying reads from xenograft samples. Bioinformatics 28, i172–i178 (2012).
Raine, K. M. et al. cgpPindel: identifying somatically acquired insertion and deletion events from paired end sequencing. Curr. Protoc. Bioinformatics 52, 15.7.1–15.7.12 (2015).
Jones, D. et al. cgpCaVEManWrapper: simple execution of CaVEMan in order to detect somatic single nucleotide variants in NGS data. Curr. Protoc. Bioinformatics 56, 15.10.1–15.10.18 (2016).
Coorens, T. H. H. et al. Embryonal precursors of Wilms tumor. Science 366, 1247–1251 (2019).
Hoang, D. T. et al. MPBoot: fast phylogenetic maximum parsimony tree inference and bootstrap approximation. BMC Evol. Biol. 18, 11 (2018).
The study was supported by European Research Council project 677501 – ZF_Blood (to A.C. and A.M.R.), an EMBO Young Investigator Award (to A.C.) and core support grants from the Wellcome Trust to the Wellcome Sanger Institute and both Wellcome and the MRC to the Wellcome Trust–Medical Research Council Cambridge Stem Cell Institute (203151/Z/16/Z). M.S.C. was supported by a Wellcome Clinical PhD Fellowship. We thank the Wellcome Sanger Institute (WSI) Cytometry Core Facility for their help with single-cell index sorting; the WSI DNA pipelines for their contribution to sequencing the data; the WSI Cancer, Ageing and Somatic Mutation programme IT and sample teams for their bioinformatic and logistical support; J. Eliasova (scientific illustrator, WSI) for her support and help with the illustrations; and the Human Developmental Biology Resource (HDBR) for providing samples.
P.J.C. is a co-founder and stock-holder in Mu Genomics Ltd. The other authors declare no competing interests.
Peer review information Nature thanks Fernando Camargo, Patrick Tam and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Extended data figures and tables
Extended Data Fig. 1 Fluorescence-activated cell sorting strategy for haematopoietic progenitor cells from fetal liver and bone marrow.
a, Following the exclusion of debris and cell doublets by gating and depletion of mature cells using a lineage antibody cocktail, we used anti-CD34 and anti-CD38 staining to sort single haematopoietic progenitor cells from the liver of an 8-pcw fetus. A separate sorting strategy was used for different stem and haematopoietic progenitor populations from the matched liver and two femurs of an 18-pcw fetus: following exclusion of debris and cell doublets and lineage depletion, anti-CD34, anti-CD38, anti-CD90, anti-CD45RA, anti-CD49f, anti-CD7, anti-CD10 and anti-CD123 staining was used to sort haematopoietic progenitor populations. b, Laser-capture microdissected tissues. Representative histological slides of tissue structures with different developmental origins. Between 9 and 18 sections were made for each tissue. Slides were stained with haematoxylin and eosin before microdissection. Structures microdissected from the 8-pcw sample included: mesodermal core and syncytiotrophoblast from the placenta, blood circulating in the heart, muscle from the heart, tubules from the kidney, epithelium from the gut, epidermis from the skin and vertebral disc from the vertebrae. Structures microdissected from the 18-pcw sample included: epidermis and peripheral nerves from the skin, muscle from the heart and glomeruli from the kidney. HSCs, haematopoietic stem cells; CMPs, common myeloid progenitors; MEPs, megakaryocyte–erythroid progenitors. n = 20,000 cells.
a, Box plot showing the percentage of human sequencing reads, before the exclusion of contaminating mouse reads from the feeder layer. The boxes indicate the median and interquartile range (IQR) and the whiskers extend to the largest and smallest values no more than 1.5× IQR from the box. Outlying points are plotted individually. b, Dot plot showing, for each colony of the two fetuses, the final sequencing coverage, after exclusion of mouse reads, against the percentage of human reads. The solid lines show the effect of human read percentage on the final sequencing coverage for each fetus, estimated using a linear model. The shaded area is the 95% confidence interval of this effect. c, Box plot showing final sample coverage for the two fetuses. The boxes indicate the median and IQR and the whiskers extend to the largest and smallest values no more than 1.5× IQR from the box. Outlying points are plotted individually. d, Dot plot showing the uncorrected SNV burden per colony against sample coverage (samples with <4× coverage are excluded). The solid lines show the effect of sequencing coverage on the uncorrected SNV burden for each fetus, estimated using a linear model. The shaded area is the 95% confidence interval of this effect. e, Dot plot showing the corrected SNV burden per colony against sequencing coverage (samples with <4× coverage excluded). The solid lines show the effect of sequencing coverage on the corrected SNV burden for each fetus, estimated using a linear model. The shaded area is the 95% confidence interval of this effect. f, Dot plot showing the uncorrected indel burden per colony against sequencing coverage (samples with <4× coverage excluded). The solid lines show the effect of sequencing coverage on the indel burden for each fetus, estimated using a linear model. The shaded area is the 95% confidence interval of this effect. g, ASCAT plot showing a normal male diploid karyotype for the 8-pcw and 18-pcw fetuses. 
h, Histograms showing variant allele fraction (VAF) of shared mutations with targeted sequencing depth ≥8× in the 8-pcw fetus. i, Histograms showing VAF of private mutations with targeted sequencing depth ≥8× in the 8-pcw fetus. j, Histograms showing VAF of shared mutations with targeted sequencing depth ≥8× in the 18-pcw fetus. k, Histograms showing VAF of private mutations with targeted sequencing depth ≥8× in the 18-pcw fetus. l, Histograms showing SNV burden per single-HSPC-derived colony for the 8-pcw and 18-pcw fetuses, corrected both for the proportion of private mutations that are acquired in vitro and for sample sensitivity. m, Indel burden per single-HSPC-derived colony for the 8-pcw and 18-pcw fetuses, with no correction applied. n, Mean numbers of private and shared SNVs per colony for each fetus.
a, Heat maps of the genotype data used for tree-building. Owing to the lower average coverage, there are more missing values in the 18-pcw data. Dendrograms are from hierarchical clustering of the data, and do not represent the phylogeny. b, Internal consistency of the shared mutation data for each fetus as determined by the disagreement score. A perfect phylogeny has a score of zero. Scores for the data are compared with scores for random shuffles of the genotype data at each locus. c, Comparison of the phylogenetic trees built by MPBoot with those built by the alternative phylogeny-inference algorithms IQTree and SCITE for both the 8-pcw and 18-pcw fetuses. Clades present in one phylogeny that are absent in the other are highlighted in red. The Robinson–Foulds distance of the alternative tree as compared with the MPBoot tree is shown. d, Robustness of each clade in three alternative bootstrapping approaches: bootstrapping of the raw sequencing read count data, bootstrapping of the mutation matrix and the bootstrap approximation method implemented with MPBoot. The proportion of bootstraps in which a clade is retained is shown, ordered by decreasing robustness. Clades from the first three generations, which are particularly important for later analysis, are highlighted in red. e, f, Comparison of the sequencing read count bootstrap trees to the original trees by the quartet divergence and Robinson–Foulds similarity for the 8-pcw (e) and 18-pcw (f) fetuses. g, Bar plot showing the relative contribution of the two daughter cells of the first detected cell division in 14 phylogenies obtained from human adult bronchial epithelial cells. Original data obtained from ref. 19. MRCA, most recent common ancestor; R-F, Robinson–Foulds.
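The Robinson–Foulds comparison described in panel c can be illustrated with a minimal Python sketch. It assumes rooted trees written as nested tuples of hypothetical colony names (c1 to c4) and simply counts the clades found in one tree but not the other. This is an illustration of the idea only, not the implementation used in the study; unrooted or normalised variants of the distance give different values.

```python
def clades(tree):
    """Return the non-trivial clades of a rooted tree as frozensets of leaf names.

    A tree is a nested tuple; a leaf is a string (hypothetical colony name).
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    all_leaves = leaves(tree)
    found.discard(all_leaves)  # the root clade is shared by every tree on these leaves
    return found


def robinson_foulds(tree_a, tree_b):
    """Count clades present in exactly one of the two trees."""
    return len(clades(tree_a) ^ clades(tree_b))


# Two hypothetical four-colony trees that disagree on one early split.
tree_a = ((("c1", "c2"), "c3"), "c4")
tree_b = ((("c1", "c3"), "c2"), "c4")
print(robinson_foulds(tree_a, tree_b))  # prints 2
```

A distance of zero means the two trees contain exactly the same clades; each clade highlighted in red in panel c would add one to the count in this simplified version.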
Extended Data Fig. 4 Sensitivity and specificity of targeted sequencing mutation calling in fetal tissues.
a, For the 8-pcw fetus, the sensitivity for detection of shared (top) and private (bottom) mutations at different cell fractions in each tissue. As the prior expectation for the presence of mutations is dependent on the position within the phylogeny, the sensitivity is lower for private mutations. Solid lines indicate median values, and shaded areas show 95% confidence intervals from simulations (n = 1,000). b, As in a, but for the 18-pcw fetus. c, Comparison of the cell fractions of called mutations in the 8-pcw data, and false-positive calls simulated from the error distribution. The overall call rate across tissues is printed on each panel. The data are split by mutations occurring at different generations from the zygote. Owing to the high prior expectation, early mutations are frequently called from the error distribution, but at extremely low cell fractions that do not overlap those of the data. Later mutations have more overlap in the cell fractions of calls made from the data and the error distribution, but are rarely made from the error distribution. d, As in c, but for the 18-pcw fetus. Mutations on the minor branch are not included in this figure as none were called in the data and the generation of these mutations is uncertain.
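The dependence of sensitivity on cell fraction can be illustrated with a simplified simulation in Python. The sketch below assumes a heterozygous mutation on a diploid autosome (expected variant allele fraction equal to half the cell fraction), binomial sampling of reads, and a fixed minimum number of supporting reads; the depth, the `min_alt_reads` threshold and the function name are hypothetical. It deliberately ignores the phylogeny-dependent prior described in the caption, so it is not the calling method used in the study.

```python
import random


def simulated_sensitivity(cell_fraction, depth, min_alt_reads=3, n_sim=1000, seed=1):
    """Fraction of simulations in which a mutation is detected.

    Assumptions (hypothetical, for illustration only):
    - heterozygous mutation on a diploid autosome, so VAF = cell_fraction / 2
    - each read independently carries the mutant allele with probability VAF
    - a mutation is called when at least `min_alt_reads` mutant reads are seen
    """
    rng = random.Random(seed)
    vaf = cell_fraction / 2
    detected = 0
    for _ in range(n_sim):
        alt_reads = sum(rng.random() < vaf for _ in range(depth))
        if alt_reads >= min_alt_reads:
            detected += 1
    return detected / n_sim


# Sensitivity at a few cell fractions for a hypothetical depth of 100x.
for fraction in (0.02, 0.05, 0.2):
    print(fraction, simulated_sensitivity(fraction, depth=100))
```

Under these assumptions sensitivity rises steeply with cell fraction, which is the qualitative behaviour shown by the median lines in panels a and b.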
a, Plot showing the proportion of variants mapping to different genomic features. The box-and-whisker plots show proportions of simulated mutations in different categories (n = 524), with the boxes indicating median and IQR and the whiskers denoting the range. b, Plot showing the proportion of variants within protein-coding sequences. The histograms show the distribution obtained by 500 simulations of random acquisition of mutations across the genome. c, Plot showing the proportion of variants within introns of protein-coding genes. The histograms show the distribution obtained by 500 simulations of random acquisition of mutations across the genome. d, Mutations in protein-coding sequences in the 18-pcw fetus, mapped on the branch of acquisition. Only those found in ≥2 colonies are shown. e, Mutations in protein-coding sequences in the 8-pcw fetus, mapped on the branch of acquisition. Only those found in ≥2 colonies are shown.
a–c, Mutational signatures incorporating the trinucleotide context of shared mutations (a), private mutations assigned to the clonal peak in the binomial mixture model (that is, likely in vivo-acquired mutations) (b) and private mutations assigned to subclonal peaks in the binomial mixture (that is, likely in vitro-acquired mutations) (c). d, Mutations detected in WGS of two 8-pcw trophoblast microbiopsies.
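Signatures that incorporate trinucleotide context are conventionally summarised as counts over 96 channels, with each substitution reported relative to the pyrimidine (C or T) of the mutated base pair. The Python sketch below shows that classification step under the assumption that each mutation is supplied as a three-base reference context plus the mutant base; the function names and example mutations are hypothetical and are not taken from the study's code.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def sbs96_channel(context, alt):
    """Label a single-base substitution by its pyrimidine-centred trinucleotide context.

    context: three reference bases centred on the mutated base (e.g. "ACG")
    alt:     the mutant base
    """
    ref = context[1]
    if ref in "AG":  # purine reference: report the mutation on the opposite strand
        context = context.translate(COMPLEMENT)[::-1]
        alt = alt.translate(COMPLEMENT)
        ref = context[1]
    return f"{context[0]}[{ref}>{alt}]{context[2]}"


def count_channels(mutations):
    """Tally (context, alt) pairs into trinucleotide channels."""
    return Counter(sbs96_channel(context, alt) for context, alt in mutations)


# Hypothetical mutations: the first two are the same change seen on opposite strands.
example = [("ACG", "T"), ("CGT", "A"), ("TTA", "C")]
print(count_channels(example))
```

Collapsing onto the pyrimidine strand is why a G>A change in a CGT context and a C>T change in an ACG context fall into the same channel.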
Extended Data Fig. 7 Timing of divergence of embryonic and extra-embryonic lineages during development.
a, Heat map of the targeted sequencing data from the 8-pcw fetus. Each column and each row represents a single tissue. The colour shows the level of correlation. b, Density plots showing the distribution of variant allele fraction of mutations identified through WGS of specific microbiopsies. c, Heat map of the targeted sequencing data from the 18-pcw fetus. Each column represents an individual mutation, and each row a single tissue. The colour shows the variant allele fraction of the mutation in that tissue; grey indicates that the mutation was not detected. d, Heat map of the targeted sequencing data from the 18-pcw fetus. Each column and each row represents a single tissue. The colour shows the level of correlation. e, Line plot showing lineage loss of microdissected tissues in the 18-pcw fetus at different times (represented as cell generations from the zygote). Shaded areas represent confidence intervals. f, Phylogenetic tree showing the clonal relationships of 18-pcw HSPCs. Mutations identified in HSPCs by WGS that were also detected in non-haematopoietic tissues are coloured on the tree according to the earliest diverging tissue in which the mutation was reliably detected. Branch lengths are proportional to the number of mutations accumulated. The tree contains only those SNVs included in the bait set for targeted sequencing. In all heat maps, the tissues are clustered using soft cosine similarity (Methods). VAF, variant allele fraction.
Extended Data Fig. 8 High level of intermixing of 18-pcw HSPCs among fetal liver and two sites of bone marrow.
a, Ultrametric phylogenetic tree of 18-pcw HSPCs with branches coloured by the tissue from which HSPCs bearing the mutations were isolated. Black branches indicate that cells were found in more than one tissue; coloured branches indicate cells were unique to one specific tissue. b, Analysis of molecular variance used to formally test for clustering on the phylogeny of HSPCs isolated from the same tissue. The histogram shows the null distribution used to detect clustering. Distributions were obtained by randomly permuting which cells were assigned to which tissue (n = 30,000). The observed value of the phi statistic is shown as a red line, with the P value indicating no statistically significant clustering by tissue. c, Bar plot showing the number of new mutations identified through WGS in the 18 microbiopsies with coverage >15×. d, Heat map showing the cell fractions of individual new mutations shared by more than one tissue, identified through WGS in the different tissues. e, Line plot showing the fraction of captured haematopoietic lineages shared by non-haematopoietic tissues in the 18-pcw phylogeny, grouped by germ layer and plotted over successive generations. f, g, Phylogenetic trees highlighting mutations detected in HSPCs that were shared with the ectoderm (f) and mesoderm (g) in the 18-pcw fetus. The fraction of cells deriving from each lineage is plotted on a pie chart. Trees were made ultrametric. VAF, variant allele fraction.
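The permutation logic behind panel b can be sketched in Python. The example below assumes a precomputed matrix of pairwise phylogenetic distances between cells and uses the mean within-tissue distance as a simple stand-in statistic; it does not compute the phi statistic from the analysis of molecular variance used in the study, and the distances, tissue labels and function names are hypothetical.

```python
import random
from itertools import combinations


def mean_within_group_distance(dist, labels):
    """Mean pairwise distance between cells that share a tissue label."""
    pairs = [(i, j) for i, j in combinations(range(len(labels)), 2)
             if labels[i] == labels[j]]
    return sum(dist[i][j] for i, j in pairs) / len(pairs)


def permutation_p_value(dist, labels, n_perm=1000, seed=0):
    """One-sided P value for clustering by tissue.

    The null distribution is built by randomly reassigning cells to tissues.
    Small within-tissue distances indicate clustering, so the P value is the
    share of permutations at least as clustered as the observed labelling.
    """
    rng = random.Random(seed)
    observed = mean_within_group_distance(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mean_within_group_distance(dist, shuffled) <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical distances: cells 0 and 1 are close, cells 2 and 3 are close.
dist = [
    [0, 1, 10, 10],
    [1, 0, 10, 10],
    [10, 10, 0, 1],
    [10, 10, 1, 0],
]
intermixed = ["liver", "femur", "liver", "femur"]
print(permutation_p_value(dist, intermixed))
```

When related cells are spread across tissues, as in the intermixed labelling above, no permutation is less clustered than the observation and the P value is large, which corresponds to the absence of significant clustering reported in the caption.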
a, Potential outcomes of cell division of a multipotent antecedent observed in the HSPC phylogeny. b, Top of the 8-pcw phylogeny. Mutations detected in trophectoderm tissues are highlighted in red. Dark blue circles represent antecedent cells that are committed to the inner cell mass, but with a direct ancestor that was multipotent (contributing to both inner cell mass and trophectoderm). c, Top of the 8-pcw phylogeny. Mutations detected in extra-embryonic-mesoderm-derived tissues are highlighted in orange. Dark blue circles represent the 20 antecedent cells that are committed to the epiblast, but with a direct ancestor that was multipotent (contributing to both epiblast and hypoblast).
This file contains further detailed methods not covered in the main Methods section.
A list of antibodies used for the study, including clone and manufacturer.
Laser-capture microdissection biopsies undergoing targeted sequencing for the 8-pcw fetus.
Laser-capture microdissection biopsies undergoing targeted sequencing for the 18-pcw fetus.
Laser-capture microdissection biopsies undergoing whole-genome sequencing (8-pcw fetus).
About this article
Cite this article
Spencer Chapman, M., Ranzoni, A.M., Myers, B. et al. Lineage tracing of human development through somatic mutations. Nature 595, 85–90 (2021). https://doi.org/10.1038/s41586-021-03548-6