Understanding how gene regulatory networks control the progressive restriction of cell fates is a long-standing challenge. Recent advances in measuring gene expression in single cells are providing new insights into lineage commitment. However, the regulatory events underlying these changes remain unclear. Here we investigate the dynamics of chromatin regulatory landscapes during embryogenesis at single-cell resolution. Using single-cell combinatorial indexing assay for transposase accessible chromatin with sequencing (sci-ATAC-seq)1, we profiled chromatin accessibility in over 20,000 single nuclei from fixed Drosophila melanogaster embryos spanning three landmark embryonic stages: 2–4 h after egg laying (predominantly stage 5 blastoderm nuclei), when each embryo comprises around 6,000 multipotent cells; 6–8 h after egg laying (predominantly stage 10–11), to capture a midpoint in embryonic development when major lineages in the mesoderm and ectoderm are specified; and 10–12 h after egg laying (predominantly stage 13), when each of the embryo’s more than 20,000 cells are undergoing terminal differentiation. Our results show that there is spatial heterogeneity in the accessibility of the regulatory genome before gastrulation, a feature that aligns with future cell fate, and that nuclei can be temporally ordered along developmental trajectories. During mid-embryogenesis, tissue granularity emerges such that individual cell types can be inferred by their chromatin accessibility while maintaining a signature of their germ layer of origin. Analysis of the data reveals overlapping usage of regulatory elements between cells of the endoderm and non-myogenic mesoderm, suggesting a common developmental program that is reminiscent of the mesendoderm lineage in other species2,3,4. We identify 30,075 distal regulatory elements that exhibit tissue-specific accessibility. We validated the germ-layer specificity of a subset of these predicted enhancers in transgenic embryos, achieving an accuracy of 90%. Overall, our results demonstrate the power of shotgun single-cell profiling of embryos to resolve dynamic changes in the chromatin landscape during development, and to uncover the cis-regulatory programs of metazoan germ layers and cell types.
This is a preview of subscription content
Subscription info for Chinese customers
We have a dedicated website for our Chinese customers. Please go to naturechina.com to subscribe to this journal.
Get time limited or full article access on ReadCube.
All prices are NET prices.
Cusanovich, D. A . et al. Multiplex single cell profiling of chromatin accessibility by combinatorial cellular indexing. Science 348, 910–914 (2015)
Maduro, M. F ., Meneghini, M. D ., Bowerman, B ., Broitman-Maduro, G . & Rothman, J. H. Restriction of mesendoderm to a single blastomere by the combined action of SKN-1 and a GSK-3β homolog is mediated by MED-1 and -2 in C. elegans. Mol. Cell 7, 475–485 (2001).
Sethi, A. J ., Wikramanayake, R. M ., Angerer, R. C ., Range, R. C. & Angerer, L. M. Sequential signaling crosstalk regulates endomesoderm segregation in sea urchin embryos. Science 335, 590–593 (2012)
Rodaway, A. & Patient, R. Mesendoderm. An ancient germ layer? Cell 105, 169–172 (2001).
Thomas, S . et al. Dynamic reprogramming of chromatin accessibility during Drosophila embryo development. Genome Biol. 12, R43 (2011)
Bonn, S . et al. Tissue-specific analysis of chromatin state identifies temporal signatures of enhancer activity during embryonic development. Nat. Genet. 44, 148–156 (2012)
Kvon, E. Z . et al. Genome-scale functional characterization of Drosophila developmental enhancers in vivo. Nature 512, 91–95 (2014)
Gallo, S. M . et al. REDfly v3.0: toward a comprehensive database of transcriptional regulatory elements in Drosophila. Nucleic Acids Res. 39, D118–D123 (2011)
Frise, E ., Hammonds, A. S. & Celniker, S. E. Systematic image-driven analysis of the spatial Drosophila embryonic expression landscape. Mol. Syst. Biol. 6, 345 (2010)
Tomancak, P . et al. Systematic determination of patterns of gene expression during Drosophila embryogenesis. Genome Biol 3, research0088.1 (2002)
Bonn, S . et al. Cell type-specific chromatin immunoprecipitation from multicellular complex samples using BiTS-ChIP. Nat. Protoc. 7, 978–994 (2012)
Doe, C. Q. Temporal patterning in the Drosophila CNS. Annu. Rev. Cell Dev. Biol. 33, 219–240 (2017)
Ciglar, L. & Furlong, E. E. Conservation and divergence in developmental networks: a view from Drosophila myogenesis. Curr. Opin. Cell Biol. 21, 754–760 (2009)
Spahn, P . et al. Multiple regulatory safeguards confine the expression of the GATA factor serpent to the hemocyte primordium within the Drosophila mesoderm. Dev. Biol. 386, 272–279 (2014)
Reuter, R. The gene serpent has homeotic properties and specifies endoderm versus ectoderm within the Drosophila gut. Development 120, 1123–1135 (1994)
Cannavò, E . et al. Genetic variants regulating expression levels and isoform diversity during embryogenesis. Nature 541, 402–406 (2017)
Van Der Maaten, L. & Hinton, G. H. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008)
Rodriguez, A. & Laio, A. Machine learning. Clustering by fast search and find of density peaks. Science 344, 1492–1496 (2014)
Qiu, X . et al. Reversed graph embedding resolves complex single-cell developmental trajectories. Nat. Methods 14, 979–982 (2017)
Lecuit, T ., Samanta, R. & Wieschaus, E. slam encodes a developmental regulator of polarized membrane growth during cleavage of the Drosophila embryo. Dev. Cell 2, 425–436 (2002)
Beiman, M ., Shilo, B. Z. & Volk, T. Heartless, a Drosophila FGF receptor homolog, is essential for cell migration and establishment of several mesodermal lineages. Genes Dev. 10, 2993–3002 (1996)
Okumura, T ., Matsumoto, A ., Tanimura, T . & Murakami, R. An endoderm-specific GATA factor gene, dGATAe, is required for the terminal differentiation of the Drosophila endoderm. Dev. Biol. 278, 576–586 (2005)
Clark, H. F . et al. Dachsous encodes a member of the cadherin superfamily that controls imaginal disc morphogenesis in Drosophila. Genes Dev. 9, 1530–1542 (1995)
Simcox, A. A. & Sang, J. H. When does determination occur in Drosophila embryos? Dev. Biol. 97, 212–221 (1983)
Tingvall, T. O ., Roos, E. & Engström, Y. The GATA factor serpent is required for the onset of the humoral immune response in Drosophila embryos. Proc. Natl Acad. Sci. USA 98, 3884–3888 (2001)
Cao, J . et al. Comprehensive single-cell transcriptional profiling of a multicellular organism. Science 357, 661–667 (2017)
McKenna, A . et al. Whole-organism lineage tracing by combinatorial and cumulative genome editing. Science 353, aaf7907 (2016)
Raj, B . et al. Simultaneous single-cell profiling of lineages and cell types in the vertebrate brain by scGESTALT. Preprint at https://doi.org/10.1101/205534 (2017)
Karaiskos, N . et al. The Drosophila embryo at single-cell transcriptome resolution. Science 358, 194–199 (2017)
Frieda, K. L . et al. Synthetic recording and in situ readout of lineage information in single cells. Nature 541, 107–111 (2017)
Tomancak, P. et al. Global analysis of patterns of gene expression during Drosophila embryogenesis. Genome Biol. 8, R145 (2007)
Hammonds, A. S. et al. Spatial expression of transcription factors in Drosophila embryonic organ development. Genome Biol. 14, R140 (2013)
Sandmann, T., Jakobsen, J. S. & Furlong, E. E. ChIP-on-chip protocol for genome-wide analysis of transcription factor binding in Drosophila melanogaster embryos. Nat. Protoc. 1, 2839–2855 (2006)
Buenrostro, J. D., Giresi, P. G., Zaba, L. C., Chang, H. Y. & Greenleaf, W. J. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat. Methods 10, 1213–1218 (2013)
Amini, S. et al. Haplotype-resolved whole-genome sequencing by contiguity-preserving transposition and combinatorial indexing. Nat. Genet. 46, 1343–1349 (2014)
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012)
Fraley, C. & Raftery, A. E. Model-based clustering, discriminant analysis and density estimation. J. Am. Stat. Assoc. 97, 611–631 (2002)
Fraley, C ., Raftery, A. E ., Murphy, T. B. & Scrucca, L. Version 4 for R: Normal Mixture Modeling for Model-Based Clustering, Classification, and Density Estimation Technical Report No. 597 (Department of Statistics, Univ. of Washington, 2012)
Zhang, Y. et al. Model-based analysis of ChIP–seq (MACS). Genome Biol. 9, R137 (2008)
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010)
Trapnell, C. et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat. Biotechnol. 32, 381–386 (2014)
Setty, M. & Leslie, C. S. SeqGL identifies context-dependent binding signals in genome-wide regulatory element maps. PLOS Comput. Biol. 11, e1004271 (2015)
Ghandi, M., Lee, D., Mohammad-Noori, M. & Beer, M. A. Enhanced regulatory sequence prediction using gapped k-mer features. PLOS Comput. Biol. 10, e1003711 (2014)
Grant, C. E., Bailey, T. L. & Noble, W. S. FIMO: scanning for occurrences of a given motif. Bioinformatics 27, 1017–1018 (2011)
Van Der Maaten, L. Accelerating t-SNE using tree-based algorithms. J. Mach. Learn. Res. 15, 3221–3245 (2014)
Krijthe, J. H. Rtsne: t-distributed stochastic neighbor embedding using a Barnes–Hut implementation. https://github.com/jkrijthe/Rtsne (2015)
Pliner, H. et al. Chromatin accessibility dynamics of myogenesis at single cell resolution. Preprint at https://doi.org/10.1101/155473 (2017)
Rubin, G. M. & Spradling, A. C. Genetic transformation of Drosophila with transposable element vectors. Science 218, 348–353 (1982).
Bischof, J., Maeda, R. K., Hediger, M., Karch, F. & Basler, K. An optimized transgenesis system for Drosophila using germ-line-specific φC31 integrases. Proc. Natl Acad. Sci. USA 104, 3312–3317 (2007)
Furlong, E. E., Andersen, E. C., Null, B., White, K. P. & Scott, M. P. Patterns of gene expression during Drosophila mesoderm development. Science 293, 1629–1633 (2001)
Schindelin, J. et al. Fiji: an open-source platform for biological-image analysis. Nat. Methods 9, 676–682 (2012)
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009)
Girardot, C., Scholtalbers, J., Sauer, S., Su, S. Y. & Furlong, E. E. Je, a versatile suite to handle multiplexed NGS libraries with unique molecular identifiers. BMC Bioinformatics 17, 419 (2016)
This work was technically supported by the EMBL Advanced Light Microscopy, Genomics and Flow Cytometry Facilities. We thank D. Prunkard and L. Gitari in the UW-Pathology Flow Cytometry Facility for their assistance with sorting, and all members of the Furlong and Shendure laboratories for discussions and comments. This work was financially supported by BMBF (TransDiag-2) funds to E.E.M.F., and NIH (DP1HG007811 and R01HG006283) and the Paul G. Allen Family Foundation funds to J.S. D.A.C. was partly supported by T32HL007828 from the National Heart, Lung, and Blood Institute. J.S. is a Howard Hughes Medical Institute Investigator.
L.C. and F.J.S. own shares in and are employed by Illumina, Inc. transduction.
Reviewer Information Nature thanks M. Bulyk, S. Gisselbrecht and B. Gottgens for their contribution to the peer review of this work.
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
a, log10 counts of sci-ATAC-seq reads per barcode at each time point are bimodally distributed. A threshold of 500 reads was used to identify barcodes corresponding to valid cells versus background. b, Fragment size distribution at each time point is consistent with expected nucleosomal banding pattern of standard (bulk) ATAC-seq experiments. c, Violin plot for distribution of unique, mappable reads per cell at each time point (2–4 h, n = 8,024; 6–8 h, n = 7,880; 10–12 h, n = 7,181) plotted on a logarithmic scale. White point indicates median value, thick black line extends to 25th and 75th percentile, and thin black lines extend to most extreme values within 1.5 times the interquartile range of the median. The filled colour width represents a density estimate of the distribution of cells along the y axis. d, Fraction of previously characterized DHS covered in at least 10 cells upon sampling a given number of cells (solid lines) as compared to random genomic windows (dashed lines). e, An UpSet plot shows the degree to which the top 20,000 windows overlap between the three time points. Each bar shows the number of sites included in a specific intersection and the ‘peg board’ below shows which particular comparison is included in that bar. f, Bar plot of the number of sites identified as significantly open in each clade (1% FDR; grey bar, first cutoff) and the number of sites specific to that clade (orange bar, second cutoff). Overlaid on the barplot (purple points) is the fraction of sites passing the first cut-off that also pass the second cut-off (count of orange bar/count of grey bar).
Enrichment for tissue-of-expression information for characterized distal enhancers overlapping clade-specific peaks at 6–8 h (a) and 10–12 h (b). Each column represents a different clade and each row represents an annotation term assigned to tested enhancer elements. Shading indicates the odds ratio for the intersection of enhancers sharing a given annotation with clade-specific accessible sites. Shown are all categories in the top ten enrichments of any clade (enrichment scores capped at 15 for display) containing at least 35 known enhancer overlaps.
Extended Data Figure 3 Relationship between transcription-factor binding motifs and occupancy, and LSI clade-specific accessibility.
a–c, SeqGL was run on LSI clade-specific distal peaks at each time point to identify enriched sequence motifs. The top five most-enriched unique motifs for each clade are displayed. Coloured circles indicate the clade represented by each line. For the later time points (6–8 h and 10–12 h), blue is neurogenic ectoderm, yellow is non-neurogenic ectoderm, red is myogenic mesoderm and green is mesendoderm. The results show an enrichment of motifs for factors associated with early development at 2–4 h with more tissue-specific factor motifs (for example, mesodermal factor Mef2 or neural regulator Tramtrack) within germ-layer annotated clades at later stages of development. d–l, Using ChIP occupancy data (peaks) and transcription factor binding motifs compiled previously16, we scanned for all transcription factor motif instances under ChIP peaks from datasets spanning 6–8 h of development using FIMO. Aggregate read counts in 4-kb windows centred on each identified motif instance are shown for each of the four LSI clades at 6–8 h. Green, endoderm; red, myogenic mesoderm; yellow, non-neurogenic ectoderm; blue, neurogenic ectoderm. Light shading in the same colours indicates 95% confidence intervals. d–g, Aggregate plots for four ubiquitous transcription factors (BEAF32, CTCF, Pho, and Trl) at 6–8 h. h–l, Aggregate plots for mesodermal transcription factors (Bap, Lmd, Mef2, Tin, Twi) at 6–8 h.
In addition to processing data from each time point independently, data from all cells can be analysed together (with the caveat that time point and batch are confounded). Here, we show binarized, LSI-transformed and clustered count data for 2-kb windows across the genome for cells from all three time points (blue, 2–4 h; red, 6–8 h; orange, 10–12 h) processed together. The predominant pattern is one in which 2–4-h cells cluster separately from 6–8-h and 10–12-h cells. Cells from 6–8 h and 10–12 h are intermingled, clustering first (roughly) by germ layer of origin.
Extended Data Figure 5 Sex of individual cells identified by ratio of X-chromosome to autosomal reads.
Embryos at all stages consist of a mixture of male and female embryos (males, XY; females, XX). a, t-SNE plots of three time points from analysis in which sex chromosome sites were not excluded. Many clusters exhibited a bi-lobed structure, where each individual cluster was made up of two mirrored lobes (red circles identify one example of a bi-lobed cluster from each time point). This was most apparent at the 10–12-h time point. b, Histogram of the ratio of X-chromosome to autosomal reads in individual cells. To explore whether this bi-lobed structure was a function of sex biases in clustering, we attempted to sex individual cells. The ratio of X-chromosome to autosomal reads shows a bimodal distribution, as expected in a system with heterogametic (XY) males and no evidence of imprinting. The purple line marks the local minimum between the two peaks of the histograms. c, Initial t-SNE clusters coloured according to sex assignment. Red indicates female cells and blue indicates male cells. Colouring individual cells by their sex reveals that the bi-lobed architecture is largely driven by sex biases in clustering. d, After removing X-chromosomal reads, data were re-clustered and individual cells were recoloured according to the ratio of X-chromosome to autosomal reads (red, female; blue, male). The resulting clusters showed an approximately equal number of male and female cells except for clusters 1 and 10 at the 2–4-h time point.
a–c, t-SNE maps of cells at 2–4 h with colour representing either the Monocle-inferred pseudotime of each cell (a) or the ratio of reads per cell at enhancers active at different stages of development (b, c). Read counts within temporally characterized enhancers provide insight into the specific stage of development from which a cell is derived. Plotted here are ratios of counts in earlier versus later active enhancers showing a rough temporal progression from left to right that is also inferred by Monocle. d–f, Heat maps of sites that are significantly associated with pseudotime (based on a likelihood ratio test). For each site, a spline was fitted to the data across pseudotime. Sites (rows) were ordered for the heat maps based on the pseudotime at which they first reached half the maximum predicted accessibility from the fit curve. The colours indicate the spline-predicted accessibility across pseudotime, scaled as the fraction of the maximum accessibility for that site. g–i, Single-locus plots of the most significant closing, opening and transient sites. Histogram of percentage of cells in which the specified site is accessible in 10 bins across pseudotime, within the 2–4-h time point. The curve is from spline fit for accessibility in cells through pseudotime. j–l, As in g–i, examples of sites with lineage-specific association with pseudotime. One example of a branch-specific opening site for each germ layer: ectoderm (j), endoderm (k) and mesoderm (l). In g–i, P values were calculated for likelihood ratio tests evaluating the effect of progress through pseudotime on accessibility (n = 100 bins of cells; see Methods for details). Note that the branch point in pseudotime occurs at approximately 5.6 on the x axis.
Extended Data Figure 7 Library complexity and fraction of X chromosome reads highlights clusters of collisions between cells from different tissues.
Density plots of the estimated library complexity (using the same equation implemented in Picard; left) and the representation of X-chromosome reads (right) in individual clusters. While most of the clusters defined by t-SNE are readily biologically interpretable, a small number of clusters (containing relatively few cells) were not easily characterized and are marked by an increase in both estimated library complexity and an unusual distribution of X chromosome to autosomal reads. These clusters are likely to be clusters of collisions; that is, cases in which two or more distinct cells share the same barcode as a consequence of the combinatorial indexing protocol. The black line is the global distribution for all cells in that time point. The grey lines show the results of randomly sampling an equal number of cells to the cluster in question. The coloured line marks the distribution for the cluster being interrogated. a, c, e, Most clusters show relatively similar distributions of library complexity (left) and a characteristic, bimodal distribution among cells in the ratio of X-chromosome to autosomal reads (reflecting our use of a pool of male (XY) and female (XX) embryos, right). b, d, Putative collision clusters show a clear increase in the average library complexity (left) and a unimodal rather than bimodal distribution of X-chromosome to autosomal reads (right). f, These features are not universally diagnostic (for example, cluster 10 at 2–4 h seems to show a strong, bona fide sex bias), but the combination of features is strongly predictive of clusters containing few cells and conflicting biological annotations based on gene or enhancer overlaps.
t-SNE maps of cells from each of the three time points coloured according to the LSI clade to which they were previously assigned (Fig. 1a–c). For the post-gastrulation time points, green is endoderm, red is myogenic mesoderm, yellow is non-neurogenic ectoderm and blue is neurogenic ectoderm. There is strong correspondence between the germ-layer-level clade annotations from the LSI analysis and tissue-specific t-SNE clusters, particularly at the post-gastrulation time points (6–8 h and 10–12 h).
Extended Data Figure 9 Cell cluster assignment is similar using either enhancer or gene tissue activity.
For each time point, cell clusters were annotated by first dividing peaks into TSS-distal (putative enhancers) and TSS-proximal (gene promoters) peaks. Each cell cluster was then annotated separately by overlaps between cluster-enriched peaks and: (1) enhancers, comparing the TSS-distal elements to the tissue or cell-type activity of characterized enhancers; (2) genes, comparing TSS-proximal elements to the tissue expression of genes; and (3) Gene Ontology (GO) information (see Methods). Shown are the cluster assignments based on enhancer, gene expression, or Gene Ontology annotation alone. The final assignment, used in the main figures, combines all enrichment information to produce more robust assignments.
All candidate clade-specific enhancers tested in transgenic reporters. For each time point, upper panels show single cells visualized by t-SNE with the blue intensity representing the number of sci-ATAC-seq reads obtained from each tested element in each individual cell. Cell clusters bounded by dashed lines correspond to the predicted clade of activity. Lower panels show representative embryos for each time point with nuclei stained with DAPI (grey), in situ hybridization of the reporter gene driven by the enhancer (yellow) and a tissue marker (magenta). Annotation of each element’s activity involved observations across hundreds of embryos. Scale bar, 50 μm.
This file contains Supplementary Tables 1-3, 5-13 and a Supplementary Table Guide. (ZIP 20356 kb)
Enrichment analyses for t-SNE-defined clades. This data file contains sheets for the total enrichment, for enrichment of proximal elements (within 500bp of an annotated transcription start site) and distal (>500bp from an annotated TSS). Annotation datasets are described in “Gene Expression, Enhancer Expression, and TF Binding Data” in Methods for details. The term ‘custom’ refers to our database of TF ChIP peaks, while ‘stark’ refers to a subset of the CAD database from7 that are active at 2-4hr using terms that are specific to early development that are missing from the primary CAD database. These terms were calculate for all time points, but were only used to annotate clusters at 2-4hr. Excel file with multiple sheets. Statistical significance for each test was determined by a two-sided Fisher’s Exact Test with the number of significant and tested peaks for each category given in the supplementary table. Only statistically significant categories are listed. A full list of significant and tested elements for each cluster and each time point can be found in Table S1. A list of all tested categories can be found in Table S13. (XLSX 27021 kb)
About this article
Cite this article
Cusanovich, D., Reddington, J., Garfield, D. et al. The cis-regulatory dynamics of embryonic development at single-cell resolution. Nature 555, 538–542 (2018). https://doi.org/10.1038/nature25981
ASC proneural factors are necessary for chromatin remodeling during neuroectodermal to neuroblast fate transition to ensure the timely initiation of the neural stem cell program
BMC Biology (2022)
Genome Biology (2022)
BMC Genomics (2022)
Harnessing changes in open chromatin determined by ATAC-seq to generate insulin-responsive reporter constructs
BMC Genomics (2022)
Transcription factor-driven coordination of cell cycle exit and lineage-specification in vivo during granulocytic differentiation
Nature Communications (2022)