Direct lineage reprogramming represents a remarkable conversion of cellular and transcriptome states1,2,3. However, the intermediate stages through which individual cells progress during reprogramming are largely undefined. Here we use single-cell RNA sequencing4,5,6,7 at multiple time points to dissect direct reprogramming from mouse embryonic fibroblasts to induced neuronal cells. By deconstructing heterogeneity at each time point and ordering cells by transcriptome similarity, we find that the molecular reprogramming path is remarkably continuous. Overexpression of the proneural pioneer factor Ascl1 results in a well-defined initialization, causing cells to exit the cell cycle and re-focus gene expression through distinct neural transcription factors. The initial transcriptional response is relatively homogeneous among fibroblasts, suggesting that the early steps are not limiting for productive reprogramming. Instead, the later emergence of a competing myogenic program and variable transgene dynamics over time appear to be the major efficiency limits of direct reprogramming. Moreover, a transcriptional state, distinct from donor and target cell programs, is transiently induced in cells undergoing productive reprogramming. Our data provide a high-resolution approach for understanding transcriptome states during lineage differentiation.
Xu, J., Du, Y. & Deng, H. Direct lineage reprogramming: strategies, mechanisms, and applications. Cell Stem Cell 16, 119–134 (2015).
Arlotta, P. & Berninger, B. Brains in metamorphosis: reprogramming cell identity within the central nervous system. Curr. Opin. Neurobiol. 27, 208–214 (2014).
Graf, T. Historical origins of transdifferentiation and reprogramming. Cell Stem Cell 9, 504–516 (2011).
Treutlein, B. et al. Reconstructing lineage hierarchies of the distal lung epithelium using single-cell RNA-seq. Nature 509, 371–375 (2014).
Shalek, A. K. et al. Single-cell transcriptomics reveals bimodality in expression and splicing in immune cells. Nature 498, 236–240 (2013).
Zeisel, A. et al. Brain structure. Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq. Science 347, 1138–1142 (2015).
Ramsköld, D. et al. Full-length mRNA-seq from single-cell levels of RNA and individual circulating tumor cells. Nat. Biotechnol. 30, 777–782 (2012).
Vierbuchen, T. et al. Direct conversion of fibroblasts to functional neurons by defined factors. Nature 463, 1035–1041 (2010).
Pfisterer, U. et al. Direct conversion of human fibroblasts to dopaminergic neurons. Proc. Natl Acad. Sci. USA 108, 10343–10348 (2011).
Yoo, A. S. et al. MicroRNA-mediated conversion of human fibroblasts to neurons. Nature 476, 228–231 (2011).
Ambasudhan, R. et al. Direct reprogramming of adult human fibroblasts to functional neurons under defined conditions. Cell Stem Cell 9, 113–118 (2011).
Caiazzo, M. et al. Direct generation of functional dopaminergic neurons from mouse and human fibroblasts. Nature 476, 224–227 (2011).
Wapinski, O. L. et al. Hierarchical mechanisms for direct reprogramming of fibroblasts to neurons. Cell 155, 621–635 (2013).
Chanda, S. et al. Generation of induced neuronal cells by the single reprogramming factor ASCL1. Stem Cell Rep. 3, 282–296 (2014).
Trapnell, C. et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat. Biotechnol. 32, 381–386 (2014).
Buettner, F. et al. Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells. Nat. Biotechnol. 33, 155–160 (2015).
Merkle, F. T. & Eggan, K. Modeling human disease with pluripotent stem cells: from genome association to function. Cell Stem Cell 12, 656–668 (2013).
Perrier, A. L. et al. Derivation of midbrain dopamine neurons from human embryonic stem cells. Proc. Natl Acad. Sci. USA 101, 12543–12548 (2004).
Li, X. J. et al. Specification of motoneurons from human embryonic stem cells. Nat. Biotechnol. 23, 215–221 (2005).
Camp, J. G. et al. Human cerebral organoids recapitulate gene expression programs of fetal neocortex development. Proc. Natl Acad. Sci. USA 112, 15672–15677 (2015).
Di Stefano, B. et al. C/EBPα poises B cells for rapid reprogramming into induced pluripotent stem cells. Nature 506, 235–239 (2014).
Lujan, E. et al. Early reprogramming regulators identified by prospective isolation and mass cytometry. Nature 521, 352–356 (2015).
Takahashi, K. et al. Induction of pluripotency in human somatic cells via a transient state resembling primitive streak-like mesendoderm. Nat. Commun. 5, 3678 (2014).
Zhang, Y. et al. An RNA-sequencing transcriptome and splicing database of glia, neurons, and vascular cells of the cerebral cortex. J. Neurosci. 34, 11929–11947 (2014).
Trapnell, C. et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol. 28, 511–515 (2010).
Baker, S. C. et al. The External RNA Controls Consortium: a progress report. Nat. Methods 2, 731–734 (2005).
Jiang, L. et al. Synthetic spike-in standards for RNA-seq experiments. Genome Res. 21, 1543–1551 (2011).
Carpenter, A. E. et al. CellProfiler: image analysis software for identifying and quantifying cell phenotypes. Genome Biol. 7, R100 (2006).
Wu, A. R. et al. Quantitative assessment of single-cell RNA-sequencing methods. Nat. Methods 11, 41–46 (2014).
Babraham Institute. Babraham Bioinformatics. FASTQC. http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc
Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17, 10–12 (2011).
Schmieder, R. & Edwards, R. Quality control and preprocessing of metagenomic datasets. Bioinformatics 27, 863–864 (2011).
Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Trapnell, C., Pachter, L. & Salzberg, S. L. TopHat: discovering splice junctions with RNA-seq. Bioinformatics 25, 1105–1111 (2009).
Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
RStudio. Integrated Development for R. RStudio, Inc., Boston, MA URL http://www.rstudio.com/ (2015).
R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing. http://www.R-project.org/
Macosko, E. Z. et al. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell 161, 1202–1214 (2015).
Satija, R., Farrell, J. A., Gennert, D., Schier, A. F. & Regev, A. Spatial reconstruction of single-cell gene expression data. Nat. Biotechnol. 33, 495–502 (2015).
Csardi, G. & Nepusz, T. The igraph software package for complex network research. InterJournal 1695 (2006).
Huang, W., Sherman, B. T. & Lempicki, R. A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protocols 4, 44–57 (2009).
Zhang, H. M. et al. AnimalTFDB: a comprehensive animal transcription factor database. Nucleic Acids Res. 40, D144–D149 (2012).
Kumar, R. M. et al. Deconstructing transcriptional heterogeneity in pluripotent stem cells. Nature 516, 56–61 (2014).
Patel, A. P. et al. Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma. Science 344, 1396–1401 (2014).
The authors would like to acknowledge B. Passarelli and B. Vernot for discussions regarding bioinformatic pipelines, P. Lovelace for support with FACS and other Quake and Wernig laboratory members for discussions and support. This work was supported by NIH grant RC4NS073015-01 (M.W., S.R.Q., B.T.), the Stinehart-Reed Foundation, the Ellison Medical Foundation, the New York Stem Cell Foundation, CIRM grant RB5-07466 (all to M.W.), a National Science Scholarship from the Agency for Science, Technology and Research (Q.Y.L.), NIH grant GM092925 (S.A.M.S., J.S.), the German Research Foundation (M.M.) and a PhRMA foundation Informatics fellowship (J.G.C.). S.R.Q. is an investigator of the Howard Hughes Medical Institute. M.W. is a New York Stem Cell Foundation (NYSCF) Robertson Investigator and a Tashia and John Morgridge Faculty Scholar at the Child Health Research Institute at Stanford.
S.R.Q. is a founder and consultant for Fluidigm Corporation.
Reviewer Information Nature thanks F. Tang and the other anonymous reviewer(s) for their contribution to the peer review of this work.
Extended data figures and tables
Extended Data Figure 1 The majority of MEFs are actively undergoing cell cycle, but exit cell cycle upon Ascl1 induction.
a, Live cell imaging of Tau–eGFP reporter over the course of BAM-mediated iN cell reprogramming. Tau–eGFP fluorescence normalized to the maximum expression is shown in relation to days post-BAM induction. Tau–eGFP expression began at day 5 and reached a peak at day 8 after induction. Shown are representative images from day 0, day 5 and day 9. b, Box plots of intercellular transcriptome variance showed that MEFs are more heterogeneous than mouse embryonic stem cells under 2iLIF culture conditions44 and less heterogeneous than glioblastoma cells45. c, PCA of genes with the most variance in day 0 MEFs revealed MEF heterogeneity (blue, A). A density plot showing the distribution of the number of cells along PC1 loading is shown above the PCA plot. d, Heat map and hierarchical clustering of genes used for the PCA in panel c show two major MEF subpopulations. Each column represents a single cell, and each row a gene. Subpopulation A is highlighted in blue in the dendrogram. e, GO enrichment for genes in c shows that MEF subpopulation A is distinguished by low or absent expression of genes enriched for cell cycle terms. f, g, PCA and heat map of the same genes used in panels c–e, this time including day 0 MEFs (circles, light green) and day 2 cells (squares, dark green), showed that most of the day 2 cells had the same cell cycle signature as MEF subpopulation A. Cells in columns of both heat maps are ordered based on PC1 loading.
Extended Data Figure 2 Total number of transcripts per cell decreases during MEF-to-iN cell reprogramming.
a, Average detected transcript levels (mean FPKM, log2) for 92 ERCC RNA spike-ins as a function of provided number of molecules per lysis reaction for each of the 8 independent single-cell RNA-seq experiments. Linear regression fits through data points are shown. The length of each ERCC RNA spike-in transcript is encoded in the size of the data points. No particular bias towards the detection of shorter versus longer transcripts is observed. The linear regression fit was used to convert FPKM values to approximate number of transcripts. b, Box plots showing the distribution of the total number of transcripts per single cell for each experiment. Number of transcripts per cell were calculated from the FPKM values of all genes in each cell using the correlation between number of transcripts of exogenous spike-in mRNA sequences and their respective measured mean FPKM values (calibration curves are shown in panel a). The total number of transcripts expressed by a single cell and detected by single-cell RNA-seq is highest in MEFs and is more than twofold decreased upon overexpression of Ascl1 or BAM. c, Box plots showing the distribution of the median transcript number per gene across all cells of one experiment. The distributions are similar over the course of iN cell reprogramming.
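The spike-in calibration described in panels a and b can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the regression is fitted in log2–log2 space (log2 FPKM against log2 molecules per reaction), and the function names, spike-in amounts and gene FPKM values are all hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def calibrate(spike_molecules, spike_fpkm):
    """Fit log2(FPKM) against log2(molecules) using only detected spike-ins."""
    pairs = [(m, f) for m, f in zip(spike_molecules, spike_fpkm) if m > 0 and f > 0]
    xs = [math.log2(m) for m, _ in pairs]
    ys = [math.log2(f) for _, f in pairs]
    return fit_line(xs, ys)

def fpkm_to_transcripts(fpkm, slope, intercept):
    """Invert the calibration line to estimate a molecule count from an FPKM value."""
    if fpkm <= 0:
        return 0.0
    return 2 ** ((math.log2(fpkm) - intercept) / slope)

# Hypothetical spike-in series in which FPKM is exactly 4x the molecule count.
spike_molecules = [1, 4, 16, 64, 256]
spike_fpkm = [4.0, 16.0, 64.0, 256.0, 1024.0]
slope, intercept = calibrate(spike_molecules, spike_fpkm)

# Hypothetical FPKM values for one cell; undetected genes contribute zero.
cell_fpkm = {"GeneA": 40.0, "GeneB": 8.0, "GeneC": 0.0}
total_transcripts = sum(
    fpkm_to_transcripts(value, slope, intercept) for value in cell_fpkm.values()
)
print(slope, intercept, total_transcripts)
```

With these made-up numbers the fitted slope is 1 and the intercept is 2, so the cell total comes to about 12 transcripts. In practice one calibration curve would be fitted per experiment, as in panel a.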
Extended Data Figure 3 Clonal MEFs reprogram successfully into iN cells, and Ascl1-only and BAM induce similar responses during early iN cell reprogramming.
a, Immunostaining of heterogeneous Ascl1-infected MEFs and clonal MEFs with homogeneous Ascl1 transgene insertions, fixed 12 days after Ascl1 induction, using rabbit anti-Tubb3 (red) and mouse anti-Map2 (cyan) antibodies and DAPI (blue) as a nuclear stain. Reprogramming efficiencies are comparable regardless of variation in Ascl1 copy numbers. Images are representative for one reprogramming experiment. b, Bar plots showing expression of Ascl1-target genes (Hes6, Zfp238, Snca, Cox8b, Bex1, Dner) and MEF marker genes averaged across single cells from day 0 MEFs and day 2 Ascl1-only cells, as well as from bulk RNA-seq data from MEFs, day 2 BAM, and day 2 Ascl1-only cells. These data show that the initiation of reprogramming at day 2 is similar for Ascl1-alone and BAM-mediated reprogramming.
a, Bonferroni-corrected P values for gene ontology enrichments are shown for each group of genes from Fig. 2a, with representative genes listed (Supplementary Data 4). b, Biplot showing Tau–eGFP fluorescence intensity as a function of Ascl1 transcript level in day 5 cells. Point size is proportional to eGFP transcript levels in log2[FPKM]. There is a positive correlation (R² = 0.49) indicating that cells with higher Ascl1 expression are more likely to reprogram. c, Heat map of eGFP–Ascl1 expression in 14 individual cells (columns) during live cell imaging. Rows represent time post Ascl1 induction in 45-min intervals.
Extended Data Figure 5 Live cell imaging shows diminishing of eGFP–Ascl1 signal in cells that fail to reprogram.
a, Immunostaining for Tubb3 and Map2 at day 12 post induction of Ascl1, C-terminal tagged Ascl1–eGFP and N-terminal tagged eGFP–Ascl1 in CD-1 MEFs. eGFP–Ascl1 has comparable reprogramming efficiency to untagged Ascl1, while Ascl1–eGFP has a much reduced reprogramming efficiency, so eGFP–Ascl1 was chosen for live cell imaging. Images are representative for one reprogramming experiment per condition. b, Representative images from live cell imaging showing an example of diminishing eGFP signal in a cell that failed to reprogram (that is, the cell was Tuj1-negative at day 6). c, Live cell imaging of eGFP signal of eGFP–Ascl1 infected MEFs between 3 and 6 days post dox induction. d, eGFP imaging of live cells 6 days post induction of Ascl1 and corresponding immunostaining for Tubb3 after fixation.
Extended Data Figure 6 Brn2 and Myt1l repress alternative fates that compete with the iN cell fate during advanced Ascl1 reprogramming.
a, Scatter plot showing PC1 and PC2 loadings from principal component analysis (PCA) of single cells from all time points with experimental time point and reprogramming condition (Ascl1 versus BAM) encoded in point shape and colour. b, Overview of quadratic programming. Fractional identities are calculated assuming a linear combination of different cell fates. c, Biplots showing the fractional fibroblast identity as a function of fractional neuron (left) and fractional myocyte (right) identity for each cell with points shaped and colour coded based on reprogramming time point and condition. d, Correlation of transcriptomes from days 0, 2, 5, and 20/22 cells (Ascl1-only and BAM-induced) with bulk RNA-seq from MEFs, cortical neurons and myocytes. Bottom bars show Tau–eGFP fluorescence intensity. e, Bar plot quantifying the number of cells with a maximum correlation to bulk RNA-seq data from each of the observed fates (d). f, Immunofluorescent detection of Tau–eGFP (green), DAPI (blue), Myh3 (red) and Tubb3 (cyan) for day 22 cells that were infected with Ascl1 co-infected with Brn2 or Myt1l. See Fig. 3e for respective data for cells infected with Ascl1-only or all three BAM factors. Images are representative for four biological replicates. Right, mean fractions of eGFP+ cells that express either Tubb3 or Myh3. Only Tubb3+ cells with a neuronal morphology were counted. Co-expression of Ascl1 with Brn2 and/or Myt1l increases fraction of Tau–eGFP+ cells that are also Tubb3+, while decreasing the number of cells that are Myh3+. Six or seven images were analysed for each of four biological replicates. Error bars, s.e.m. g–i, qRT–PCR of selected myogenic (g), neuronal (h), and fibroblast (i) markers using day 22 cells that are infected with Ascl1 only or co-infected with Brn2 or Myt1l or both and FAC-sorted by Tau–eGFP (n = 3, biological replicates; error bars, s.e.m.). 
Myogenic genes were significantly downregulated in Tau–eGFP+ cells that were co-infected with Brn2 and/or Myt1l compared to those infected with Ascl1 alone, while some neuronal genes are significantly upregulated (Map2, Gria) (*P < 0.05, **P < 0.01, ***P < 0.001, two-tailed t-test).
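The fractional-identity idea in panel b (each cell modelled as a linear combination of reference cell fates) can be illustrated with a small constrained least-squares sketch. The constraints used here (non-negative weights that sum to one), the projected-gradient solver and all expression values are assumptions made for illustration; the legend does not specify the authors' exact formulation or solver.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_i >= 0, sum(w) = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for i, u_i in enumerate(u, start=1):
        cumulative += u_i
        t = (cumulative - 1.0) / i
        if u_i - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]

def fractional_identities(cell, references, iterations=2000):
    """Minimize ||cell - sum_k w_k * references[k]||^2 over the simplex
    by projected gradient descent (a simple quadratic-programming solver)."""
    k = len(references)
    # Gram matrix of the reference profiles and their inner products with the cell.
    gram = [[sum(p * q for p, q in zip(ra, rb)) for rb in references]
            for ra in references]
    lin = [sum(p * q for p, q in zip(ra, cell)) for ra in references]
    # Step size 1/L, where L = 2 * trace(gram) bounds the gradient's Lipschitz constant.
    step = 1.0 / (2.0 * sum(gram[a][a] for a in range(k)))
    w = [1.0 / k] * k
    for _ in range(iterations):
        grad = [2.0 * (sum(gram[a][b] * w[b] for b in range(k)) - lin[a])
                for a in range(k)]
        w = project_to_simplex([w[a] - step * grad[a] for a in range(k)])
    return w

# Hypothetical bulk reference profiles over four genes.
fibroblast = [10.0, 0.0, 0.0, 1.0]
neuron = [0.0, 10.0, 0.0, 1.0]
myocyte = [0.0, 0.0, 10.0, 1.0]

# Hypothetical single cell that is 30% fibroblast-like and 70% neuron-like.
cell = [3.0, 7.0, 0.0, 1.0]
weights = fractional_identities(cell, [fibroblast, neuron, myocyte])
print([round(x, 3) for x in weights])
```

For this toy cell the recovered weights are approximately 0.3, 0.7 and 0 for the fibroblast, neuron and myocyte references. A dedicated QP solver would normally replace the hand-written loop for real gene sets.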
Extended Data Figure 7 Comparison of Monocle and quadratic programming with respect to ordering of neuronal cells through the reprogramming path.
a, Biplot showing the total number of transcripts per cell for all cells on the MEF-to-iN cell lineage as a function of the fraction neuron identity of each cell (see Fig. 4). The total number of transcripts decreases during the reprogramming process. b, Cells (depicted as circles) are arranged in the 2D independent component space based on the expression of genes used for quadratic programming in Fig. 4a. Lines connecting cells represent the edges of a minimal spanning tree with the bold black line indicating the longest path. Time points are colour coded. c, Monocle plots with single cells coloured based on gene expression that distinguishes the stages of iN cell reprogramming. d, Biplot shows the correlation between ordering of cells based on pseudo-time (Monocle) and fractional identity (quadratic programming). Time points are colour coded. Pearson correlation coefficient = 0.91.
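The ordering in panel b rests on a minimal spanning tree and its longest path. The sketch below shows that construction on a handful of 2D points; it is an illustration under stated assumptions rather than Monocle's implementation. The coordinates are hypothetical stand-ins for cells in the independent component space, and the function names are invented for this example.

```python
import math

def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph.
    Returns an adjacency dict {node: [(neighbor, weight), ...]}."""
    n = len(points)
    adjacency = {i: [] for i in range(n)}
    # best[j] = (distance to the closest tree node, that tree node)
    best = {j: (math.dist(points[0], points[j]), 0) for j in range(1, n)}
    while best:
        j = min(best, key=lambda node: best[node][0])
        weight, parent = best.pop(j)
        adjacency[parent].append((j, weight))
        adjacency[j].append((parent, weight))
        for other in best:
            d = math.dist(points[j], points[other])
            if d < best[other][0]:
                best[other] = (d, j)
    return adjacency

def farthest_node(adjacency, start):
    """Return the node farthest from start along tree edges, plus the parent map."""
    dist = {start: 0.0}
    parent = {start: None}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbor, weight in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + weight
                parent[neighbor] = node
                stack.append(neighbor)
    return max(dist, key=dist.get), parent

def longest_path(adjacency):
    """Tree diameter via two traversals: the farthest node from an arbitrary
    start is one end of the longest path; the farthest node from it is the other."""
    first_end, _ = farthest_node(adjacency, next(iter(adjacency)))
    node, parent = farthest_node(adjacency, first_end)
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]

# Hypothetical cell coordinates: five cells along a line plus one side branch.
cells = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (2, 1)]
tree = minimum_spanning_tree(cells)
backbone = longest_path(tree)
print(backbone)
```

Here the backbone runs through the five collinear cells and leaves out the side-branch cell (index 5). The position of each cell along such a backbone is what gives a pseudo-temporal ordering.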
Extended Data Figure 8 Neuronal maturation proceeds through expression of distinct transcriptional regulators.
a, Correlogram showing transcriptional regulators (TRs) highly correlated within MEFs as well as the initiation phase and the maturation phase of reprogramming. b, Heat map shows expression of TRs that control the two stages of MEF to iN cell reprogramming (Fig. 4d) in cells ordered based on fractional neuron identity. Each row represents a single cell, each column a gene. Experimental time point (green/blue sidebar) and fractional neuron identity (yellow/red sidebar) are shown at the top. c–e, Pseudo-temporal expression dynamics of exemplary TRs marking the initiation stage (c) and the maturation stage (d) of iN cell reprogramming as well as MEF identity (e). Transcript levels of the TRs are shown across all single cells on the MEF-to-iN cell lineage ordered based on fractional neuron identity. Growth curves based on a model-free spline method were fitted to the data. f, qRT–PCR of selected TRs from initiation and maturation subnetworks from Fig. 4d. Uninfected MEF controls and day 2–12 Ascl1-infected cells were assayed for all selected TRs, and day 22 Ascl1-alone and BAM-infected cells were additionally assayed for maturation TRs. Cells for day 5 to day 22 samples were FAC-sorted into Tau–eGFP+ and Tau–eGFP− populations (n = 4 for all populations, biological replicates; error bars, s.e.m.). g, Western blot for selected TRs from the initiation subnetwork presented in panel b. β-Actin was used as a loading control (Supplementary Data 8).
About this article
Cite this article
Treutlein, B., Lee, Q., Camp, J. et al. Dissecting direct reprogramming from fibroblast to neuron using single-cell RNA-seq. Nature 534, 391–395 (2016). https://doi.org/10.1038/nature18323