Cell type annotation of single-cell chromatin accessibility data via supervised Bayesian embedding

Abstract

Recent advances in single-cell technologies have enabled the characterization of epigenomic heterogeneity at the cellular level. Computational methods for automatic cell type annotation are urgently needed given the exponential growth in the number of cells. In particular, annotation of single-cell chromatin accessibility sequencing (scCAS) data, which can capture the chromatin regulatory landscape that governs transcription in each cell type, has not been fully investigated. Here we propose EpiAnno, a probabilistic generative model integrated with a Bayesian neural network, to annotate scCAS data automatically in a supervised manner. We systematically validate the superior performance of EpiAnno for both intra- and inter-dataset annotation on various datasets. We further demonstrate the advantages of EpiAnno for interpretable embedding and biological implications via expression enrichment analysis, partitioned heritability analysis, enhancer identification, cis-coaccessibility analysis and pathway enrichment analysis. In addition, we show that EpiAnno has the potential to reveal cell type-specific motifs and facilitate scCAS data simulation.

Fig. 1: EpiAnno annotates the cell type of scCAS data via supervised Bayesian embedding.
Fig. 2: Evaluation of intra-dataset annotation performance.
Fig. 3: Biological implications of the cell type-specific peaks identified by EpiAnno.
Fig. 4: Evaluation of inter-dataset annotation performance.
Fig. 5: Motif enrichment analysis and scCAS data simulation.

Data availability

The CLP_LMPP_MPP and CLP_CMP_MPP datasets were collected from the NCBI Gene Expression Omnibus (GEO) under accession no. GSE96772. The forebrain dataset can be accessed from GEO under accession no. GSE100033. The InSilico dataset was collected from GEO under accession no. GSE65360. The leukaemia dataset can be accessed from GEO under accession no. GSE74310. The mouse brain datasets are available at http://atlas.gs.washington.edu/mouse-atac/data/. The PBMC5k and PBMC10k datasets are available at https://support.10xgenomics.com/single-cell-atac/datasets.

Code availability

The EpiAnno software, including detailed documentation and a tutorial, is freely available on GitHub (https://github.com/xy-chen16/EpiAnno) and Zenodo (https://doi.org/10.5281/zenodo.5716525; ref. 61).

References

  1. The Tabula Muris Consortium. Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris. Nature 562, 367–372 (2018).

  2. Cao, J. et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature 566, 496–502 (2019).

  3. Abdelaal, T. et al. A comparison of automatic cell identification methods for single-cell RNA sequencing data. Genome Biol. 20, 194 (2019).

  4. Kiselev, V. Y., Yiu, A. & Hemberg, M. scmap: projection of single-cell RNA-seq data across data sets. Nat. Methods 15, 359–362 (2018).

  5. Pliner, H. A., Shendure, J. & Trapnell, C. Supervised classification enables rapid annotation of cell atlases. Nat. Methods 16, 983–986 (2019).

  6. Xie, P. et al. SuperCT: a supervised-learning framework for enhanced characterization of single-cell transcriptomic profiles. Nucleic Acids Res. 47, e48 (2019).

  7. Klemm, S. L., Shipony, Z. & Greenleaf, W. J. Chromatin accessibility and the regulatory epigenome. Nat. Rev. Genet. 20, 207–220 (2019).

  8. Cusanovich, D. A. et al. A single-cell atlas of in vivo mammalian chromatin accessibility. Cell 174, 1309–1324 (2018).

  9. Chen, H. et al. Assessment of computational methods for the analysis of single-cell ATAC-seq data. Genome Biol. 20, 241 (2019).

  10. Preissl, S. et al. Single-nucleus analysis of accessible chromatin in developing mouse forebrain reveals cell-type-specific transcriptional regulation. Nat. Neurosci. 21, 432–439 (2018).

  11. Domcke, S. et al. A human cell atlas of fetal chromatin accessibility. Science 370, eaba7612 (2020).

  12. Navidi, Z., Zhang, L. & Wang, B. simATAC: a single-cell ATAC-seq simulation framework. Genome Biol. 22, 74 (2021).

  13. Sun, T., Song, D., Li, W. V. & Li, J. J. scDesign2: a transparent simulator that generates high-fidelity single-cell gene expression count data with gene correlations captured. Genome Biol. 22, 163 (2021).

  14. Zamanighomi, M. et al. Unsupervised clustering and epigenetic classification of single cells. Nat. Commun. 9, 2410 (2018).

  15. Xiong, L. et al. SCALE method for single-cell ATAC-seq analysis via latent feature extraction. Nat. Commun. 10, 4576 (2019).

  16. Chen, S. et al. RA3 is a reference-guided approach for epigenetic characterization of single cells. Nat. Commun. 12, 2177 (2021).

  17. Cusanovich, D. A. et al. Multiplex single cell profiling of chromatin accessibility by combinatorial cellular indexing. Science 348, 910–914 (2015).

  18. Buenrostro, J. D. et al. Integrated single-cell analysis maps the continuous regulatory landscape of human hematopoietic differentiation. Cell 173, 1535–1548 (2018).

  19. Ma, W., Su, K. & Wu, H. Evaluation of some aspects in supervised cell type identification for single-cell RNA-seq: classifier, feature selection and reference construction. Genome Biol. 22, 264 (2021).

  20. Ma, F. & Pellegrini, M. ACTINN: automated identification of cell types in single cell RNA sequencing. Bioinformatics 36, 533–538 (2020).

  21. Sun, S., Zhu, J., Ma, Y. & Zhou, X. Accuracy, robustness and scalability of dimensionality reduction methods for single-cell RNA-seq analysis. Genome Biol. 20, 269 (2019).

  22. Slowikowski, K., Hu, X. & Raychaudhuri, S. SNPsea: an algorithm to identify cell types, tissues and pathways affected by risk loci. Bioinformatics 30, 2496–2497 (2014).

  23. Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).

  24. Visel, A., Minovitsky, S., Dubchak, I. & Pennacchio, L. A. VISTA Enhancer Browser—a database of tissue-specific human enhancers. Nucleic Acids Res. 35, D88–D92 (2007).

  25. Gao, T. & Qian, J. EnhancerAtlas 2.0: an updated resource with enhancer annotation in 586 tissue/cell types across nine species. Nucleic Acids Res. 48, D58–D64 (2020).

  26. Hinrichs, A. S. et al. The UCSC Genome Browser Database: update 2006. Nucleic Acids Res. 34, D590–D598 (2006).

  27. Gao, T. et al. scEnhancer: a single-cell enhancer resource with annotation across hundreds of tissue/cell types in three species. Nucleic Acids Res. https://doi.org/10.1093/nar/gkab1032 (2021).

  28. Zeng, W. et al. SilencerDB: a comprehensive database of silencers. Nucleic Acids Res. 49, D221–D228 (2021).

  29. Pliner, H. A. et al. Cicero predicts cis-regulatory DNA interactions from single-cell chromatin accessibility data. Mol. Cell 71, 858–871 (2018).

  30. McLean, C. Y. et al. GREAT improves functional interpretation of cis-regulatory regions. Nat. Biotechnol. 28, 495–501 (2010).

  31. Allen, N. J. Astrocyte regulation of synaptic behavior. Annu. Rev. Cell Dev. Biol. 30, 439–463 (2014).

  32. Buenrostro, J. D. et al. Single-cell chromatin accessibility reveals principles of regulatory variation. Nature 523, 486–490 (2015).

  33. Corces, M. R. et al. Lineage-specific and single-cell chromatin accessibility charts human hematopoiesis and leukemia evolution. Nat. Genet. 48, 1193–1203 (2016).

  34. Schep, A. N., Wu, B., Buenrostro, J. D. & Greenleaf, W. J. chromVAR: inferring transcription-factor-associated accessibility from single-cell epigenomic data. Nat. Methods 14, 975–978 (2017).

  35. Bannwarth, S. et al. Organization of the human tarbp2 gene reveals two promoters that are repressed in an astrocytic cell line. J. Biol. Chem. 276, 48803–48813 (2001).

  36. Fujiyama, T. et al. Inhibitory and excitatory subtypes of cochlear nucleus neurons are defined by distinct bHLH transcription factors, Ptf1a and Atoh1. Development 136, 2049–2058 (2009).

  37. Jin, S. et al. Inference and analysis of cell–cell communication using CellChat. Nat. Commun. 12, 1088 (2021).

  38. Nathanson, J. L. et al. Short promoters in viral vectors drive selective expression in mammalian inhibitory neurons, but do not restrict activity to specific inhibitory cell-types. Front. Neural Circuits 3, 19 (2009).

  39. Wang, P., Zhao, D., Lachman, H. M. & Zheng, D. Enriched expression of genes associated with autism spectrum disorders in human inhibitory neurons. Transl. Psychiatry 8, 13 (2018).

  40. Matcovitch-Natan, O. et al. Microglia development follows a stepwise program to regulate brain homeostasis. Science 353, aad8670 (2016).

  41. Zusso, M. et al. Regulation of postnatal forebrain amoeboid microglial cell proliferation and development by the transcription factor Runx1. J. Neurosci. 32, 11285–11298 (2012).

  42. Wittstatt, J., Reiprich, S. & Küspert, M. Crazy little thing called Sox—new insights in oligodendroglial Sox protein function. Int. J. Mol. Sci. 20, 2713 (2019).

  43. Romano, S., Vinh, N. X., Bailey, J. & Verspoor, K. Adjusting for chance clustering comparison measures. J. Mach. Learn. Res. 17, 4635–4666 (2016).

  44. Nataf, S., Guillen, M. & Pays, L. TGFB1-mediated gliosis in multiple sclerosis spinal cords is favored by the regionalized expression of HOXA5 and the age-dependent decline in androgen receptor ligands. Int. J. Mol. Sci. 20, 5934 (2019).

  45. Lananna, B. V. et al. Cell-autonomous regulation of astrocyte activation by the circadian clock protein BMAL1. Cell Rep. 25, 1–9 (2018).

  46. Liu, Q., Xu, J., Jiang, R. & Wong, W. H. Density estimation using deep generative neural networks. Proc. Natl Acad. Sci. USA 118, e2101344118 (2021).

  47. Risso, D., Perraudeau, F., Gribkova, S., Dudoit, S. & Vert, J. P. A general and flexible method for signal extraction from single-cell RNA-seq data. Nat. Commun. 9, 284 (2018).

  48. Eraslan, G., Simon, L. M., Mircea, M., Mueller, N. S. & Theis, F. J. Single-cell RNA-seq denoising using a deep count autoencoder. Nat. Commun. 10, 390 (2019).

  49. Lopez, R., Regier, J., Cole, M. B., Jordan, M. I. & Yosef, N. Deep generative modeling for single-cell transcriptomics. Nat. Methods 15, 1053–1058 (2018).

  50. Tian, T., Wan, J., Song, Q. & Wei, Z. Clustering single-cell RNA-seq data with a model-based deep learning approach. Nat. Mach. Intell. 1, 191–198 (2019).

  51. Liu, Q., Chen, S., Jiang, R. & Wong, W. H. Simultaneous deep generative modeling and clustering of single cell genomic data. Nat. Mach. Intell. 3, 536–544 (2021).

  52. Stuart, T., Srivastava, A., Madad, S., Lareau, C. A. & Satija, R. Single-cell chromatin state analysis with Signac. Nat. Methods 18, 1333–1341 (2021).

  53. Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2014).

  54. Chen, X., Chen, S. & Jiang, R. EnClaSC: a novel ensemble approach for accurate and robust cell-type classification of single-cell transcriptomes. BMC Bioinformatics 21, 392 (2020).

  55. Li, Y. & Luo, Y. Performance-weighted-voting model: an ensemble machine learning method for cancer type classification using whole-exome sequencing mutation. Quant. Biol. 8, 347–358 (2020).

  56. Wold, S., Esbensen, K. & Geladi, P. Principal component analysis. Chemometr. Intell. Lab. Syst. 2, 37–52 (1987).

  57. Vierstra, J. et al. Mouse regulatory DNA landscapes reveal global principles of cis-regulatory evolution. Science 346, 1007–1012 (2014).

  58. Su, A. I. et al. A gene atlas of the mouse and human protein-encoding transcriptomes. Proc. Natl Acad. Sci. USA 101, 6062–6067 (2004).

  59. Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).

  60. Chen, S. Q., Zhang, B. H., Chen, X. Y., Zhang, X. G. & Jiang, R. stPlus: a reference-based method for the accurate enhancement of spatial transcriptomics. Bioinformatics 37, I299–I307 (2021).

  61. Chen, X. et al. xy-chen16/EpiAnno: EpiAnno. Zenodo https://doi.org/10.5281/zenodo.5716525 (2021).

Acknowledgements

This work was supported by the National Key Research and Development Program of China grant no. 2021YFF1200902 (R.J.), the National Natural Science Foundation of China grant nos. 61873141 (R.J.), 61721003 (X.Z.), 61573207 (R.J.) and U1736210 (H.L.), a grant from the Guoqiang Institute, Tsinghua University (R.J.), and the Tsinghua-Fuzhou Institute for Data Technology. We thank S. Lei for helpful suggestions and L. Xiong for the cell type labels of the forebrain dataset.

Author information

Contributions

R.J. conceived the study and supervised the project. X.C. and S.C. designed, implemented and validated EpiAnno. S.S., Z.G. and L.H. helped with analysing the results. X.C., S.C., R.J., H.L. and X.Z. wrote the manuscript, with input from all the authors.

Corresponding author

Correspondence to Rui Jiang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Wei Lin and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Notes 1–5, Figs. 1–10 and Tables 1–3.

Reporting Summary

Cite this article

Chen, X., Chen, S., Song, S. et al. Cell type annotation of single-cell chromatin accessibility data via supervised Bayesian embedding. Nat Mach Intell 4, 116–126 (2022). https://doi.org/10.1038/s42256-021-00432-w

  • DOI: https://doi.org/10.1038/s42256-021-00432-w
