The emerging diversity of single-cell RNA-seq datasets allows for the full transcriptional characterization of cell types across a wide variety of biological and clinical conditions. However, it is challenging to analyze them together, particularly when datasets are assayed with different technologies, because biological and technical differences are interspersed. We present Harmony (https://github.com/immunogenomics/harmony), an algorithm that projects cells into a shared embedding in which cells group by cell type rather than dataset-specific conditions. Harmony simultaneously accounts for multiple experimental and biological factors. In six analyses, we demonstrate that Harmony outperforms previously published algorithms while requiring fewer computational resources. Harmony enables the integration of ~10^6 cells on a personal computer. We apply Harmony to peripheral blood mononuclear cells from datasets with large experimental differences, five studies of pancreatic islet cells, mouse embryogenesis datasets and the integration of scRNA-seq with spatial transcriptomics data.
All data analyzed in this article are publicly available through online sources. We included links to all data sources in Supplementary Table 8.
Harmony and LISI are available as R packages on https://github.com/immunogenomics/harmony and https://github.com/immunogenomics/lisi. Scripts to reproduce results of the primary analyses will be made available on https://github.com/immunogenomics/harmony2019. Additionally, vignettes are included as Supplementary Notes. Supplementary Note 1 provides a detailed walkthrough of Harmony, connecting theoretical algorithm components to their code implementations. Supplementary Note 2 demonstrates the LISI metric and how to evaluate its statistical significance. Supplementary Note 3 uses Harmony with simulated datasets.
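To give a sense of what a LISI-style score measures, the following is a minimal Python sketch of an inverse Simpson's index computed over each cell's nearest neighbours in an embedding. It is not the LISI R package and does not reproduce its implementation: the function names `inverse_simpson` and `simple_lisi` are hypothetical, the neighbours are weighted uniformly (an assumption made for brevity; to our understanding the published metric uses distance-based neighbourhood weights), and the neighbour search is brute force.

```python
from collections import Counter
import math


def inverse_simpson(labels):
    """Inverse Simpson's index, 1 / sum_b p_b^2, for a list of labels."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 / sum((c / n) ** 2 for c in counts.values())


def simple_lisi(embedding, labels, k):
    """Uniformly weighted, LISI-like score per cell (illustrative only).

    embedding: list of coordinate tuples, one per cell
    labels:    list of batch (or cell-type) labels, one per cell
    k:         number of nearest neighbours, excluding the cell itself
    """
    scores = []
    for i, point in enumerate(embedding):
        # Brute-force distances from cell i to every other cell.
        dists = sorted(
            (math.dist(point, other), j)
            for j, other in enumerate(embedding)
            if j != i
        )
        neighbour_labels = [labels[j] for _, j in dists[:k]]
        scores.append(inverse_simpson(neighbour_labels))
    return scores


# Two batches that occupy distant regions: every neighbourhood is pure.
separated = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
separated_labels = ["A", "A", "A", "B", "B", "B"]
separated_scores = simple_lisi(separated, separated_labels, k=2)

# Two batches that overlap: every neighbourhood contains both labels.
mixed = [(0, 0), (0, 0.1), (1, 0), (1, 0.1)]
mixed_labels = ["A", "B", "A", "B"]
mixed_scores = simple_lisi(mixed, mixed_labels, k=3)
```

Under these assumptions, a score near 1 indicates a neighbourhood dominated by a single label, and a score approaching the number of labels indicates a well-mixed neighbourhood.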
This work was supported in part by funding from the National Institutes of Health (grant nos. UH2AR067677 and U19AI111224 and no. 1R01AR063759 (to S.R.) and T32 AR007530-31 (to I.K.)). We thank members of the Raychaudhuri and Brenner labs for comments and discussion. I.K. and K.W. were funded as part of a collaborative research agreement with F. Hoffmann-La Roche Ltd (Basel, Switzerland), to S.R. and M.B.B.
I.K. does paid bioinformatics consulting through Brilyant LLC.
Peer review information Nicole Rusk was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Figs. 1–19, Supplementary Results and Supplementary Notes 1–3.
Harmony R package. Software to perform Harmony integration analysis.
LISI R package. Software to compute the Local Inverse Simpson’s Index.
Jurkat LISI, Time benchmark, Memory Benchmark, HCA LISI, PBMC LISI, Inhibitory, Excitatory, Data Sources.
Korsunsky, I., Millard, N., Fan, J. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat Methods 16, 1289–1296 (2019). https://doi.org/10.1038/s41592-019-0619-0