The human genome expresses thousands of natural antisense transcripts (NATs) that can regulate the epigenetic state, transcription, RNA stability or translation of their overlapping genes (refs. 1, 2). Here we describe MAPT-AS1, a brain-enriched NAT that is conserved in primates and contains an embedded mammalian-wide interspersed repeat (MIR), which represses tau translation by competing for ribosomal RNA pairing with the MAPT mRNA internal ribosome entry site (ref. 3). MAPT encodes tau, a neuronal intrinsically disordered protein (IDP) that stabilizes axonal microtubules. Hyperphosphorylated, aggregation-prone tau forms the hallmark inclusions of tauopathies (ref. 4). Mutations in MAPT cause familial frontotemporal dementia, and common variations forming the MAPT H1 haplotype are a significant risk factor in many tauopathies (ref. 5) and Parkinson’s disease. Notably, expression of MAPT-AS1 or minimal essential sequences from MAPT-AS1 (including MIR) reduces neuronal tau levels, whereas silencing MAPT-AS1 expression increases them, and MAPT-AS1 levels correlate with tau pathology in human brain. Moreover, we identified many additional NATs with embedded MIRs (MIR-NATs), which are overrepresented at coding genes linked to neurodegeneration and/or encoding IDPs, and confirmed MIR-NAT-mediated translational control of one such gene, PLCG1. These results demonstrate a key role for MAPT-AS1 in tauopathies and reveal a potentially broad contribution of MIR-NATs to the tightly controlled translation of IDPs (ref. 6), with particular relevance for proteostasis in neurodegeneration.
Customized code used throughout this study can be found at https://github.com/robertosimone-ucl/scripts_RIBOseq_QuantSeq and https://github.com/robertosimone-ucl/scripts_DEG_in_AD. The code used by PINOT can be found at https://www.reading.ac.uk/bioinf/downloads/PINOT_scripts/. Further details are available upon reasonable request from the corresponding authors. In all other cases software tools used for specific analyses are reported and cited in the Methods.
Pelechano, V. & Steinmetz, L. M. Gene regulation by antisense transcription. Nat. Rev. Genet. 14, 880–893 (2013).
Statello, L., Guo, C.-J., Chen, L.-L. & Huarte, M. Gene regulation by long non-coding RNAs and its biological functions. Nat. Rev. Mol. Cell Biol. 22, 96–118 (2021).
Veo, B. L. & Krushel, L. A. Secondary RNA structure and nucleotide specificity contribute to internal initiation mediated by the human tau 5′ leader. RNA Biol. 9, 1344–1360 (2012).
Spillantini, M. G. & Goedert, M. Tau pathology and neurodegeneration. Lancet Neurol. 12, 609–622 (2013).
Pittman, A. M. et al. Linkage disequilibrium fine mapping and haplotype association analysis of the tau gene in progressive supranuclear palsy and corticobasal degeneration. J. Med. Genet. 42, 837–846 (2005).
Gsponer, J., Futschik, M. E., Teichmann, S. A. & Babu, M. M. Tight regulation of unstructured proteins: from transcript synthesis to protein degradation. Science 322, 1365–1368 (2008).
Zucchelli, S. et al. Antisense transcription in loci associated to hereditary neurodegenerative diseases. Mol. Neurobiol. 56, 5392–5415 (2019).
Sibley, C. R. et al. Recursive splicing in long vertebrate genes. Nature 521, 371–375 (2015).
Miller, J. A. et al. Neuropathological and transcriptomic characteristics of the aged brain. eLife 6, e31126 (2017).
Bennett, D. A. et al. Religious orders study and rush memory and aging project. J. Alzheimers Dis. 64 (s1), S161–S189 (2018).
Coupland, K. G. et al. Role of the long non-coding RNA MAPT-AS1 in regulation of microtubule associated protein tau (MAPT) expression in Parkinson’s disease. PLoS ONE 11, e0157924 (2016).
Elkouris, M. et al. Long non-coding RNAs associated with neurodegeneration-linked genes are reduced in Parkinson’s disease patients. Front. Cell. Neurosci. 13, 58 (2019).
Smit, A. F. & Riggs, A. D. MIRs are classic, tRNA-derived SINEs that amplified before the mammalian radiation. Nucleic Acids Res. 23, 98–102 (1995).
Gilbert, N. & Labuda, D. CORE-SINEs: eukaryotic short interspersed retroposing elements with common sequence motifs. Proc. Natl Acad. Sci. USA 96, 2869–2874 (1999).
Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
Morita, T. & Sobue, K. Specification of neuronal polarity regulated by local translation of CRMP2 and Tau via the mTOR–p70S6K pathway. J. Biol. Chem. 284, 27734–27745 (2009).
Bottley, A., Phillips, N. M., Webb, T. E., Willis, A. E. & Spriggs, K. A. eIF4A inhibition allows translational regulation of mRNAs encoding proteins involved in Alzheimer’s disease. PLoS ONE 5, e13030 (2010).
Mauro, V. P. & Edelman, G. M. The ribosome filter hypothesis. Proc. Natl Acad. Sci. USA 99, 12031–12036 (2002).
Andorfer, C. et al. Hyperphosphorylation and aggregation of tau in mice expressing normal human tau isoforms. J. Neurochem. 86, 582–590 (2003).
Nalls, M. A. et al. Large-scale meta-analysis of genome-wide association data identifies six new risk loci for Parkinson’s disease. Nat. Genet. 46, 989–993 (2014).
Hon, C.-C. et al. An atlas of human long non-coding RNAs with accurate 5′ ends. Nature 543, 199–204 (2017).
Kapusta, A. et al. Transposable elements are major contributors to the origin, diversification, and regulation of vertebrate long noncoding RNAs. PLoS Genet. 9, e1003470 (2013).
Holcik, M. in The Oxford Handbook of Neuronal Protein Synthesis (ed. Sossin, W. S.) (Oxford Univ. Press, 2018).
Weingarten-Gabbay, S. et al. Comparative genetics. Systematic discovery of cap-independent translation sequences in human and viral genomes. Science 351, aad4939 (2016).
Paek, K. Y. et al. Translation initiation mediated by RNA looping. Proc. Natl Acad. Sci. USA 112, 1041–1046 (2015).
Mathys, H. et al. Single-cell transcriptomic analysis of Alzheimer’s disease. Nature 570, 332–337 (2019).
Grubman, A. et al. A single-cell atlas of entorhinal cortex from individuals with Alzheimer’s disease reveals cell-type-specific gene expression regulation. Nat. Neurosci. 22, 2087–2097 (2019).
Friedman, B. A. et al. Diverse brain myeloid expression profiles reveal distinct microglial activation states and aspects of Alzheimer’s disease not evident in mouse models. Cell Rep. 22, 832–847 (2018).
Tomkins, J. E. et al. PINOT: an intuitive resource for integrating protein-protein interactions. Cell Commun. Signal. 18, 92 (2020).
Oates, M. E. et al. D2P2: database of disordered protein predictions. Nucleic Acids Res. 41, D508–D516 (2013).
Ciryam, P., Tartaglia, G. G., Morimoto, R. I., Dobson, C. M. & Vendruscolo, M. Widespread aggregation and neurodegenerative diseases are associated with supersaturated proteins. Cell Rep. 5, 781–790 (2013).
Edwards, Y. J. K., Lobley, A. E., Pentony, M. M. & Jones, D. T. Insights into the regulation of intrinsically disordered proteins in the human proteome by analyzing sequence and gene expression data. Genome Biol. 10, R50 (2009).
Sposito, T. et al. Developmental regulation of tau splicing is disrupted in stem cell-derived neurons from frontotemporal dementia patients with the 10 + 16 splice-site mutation in MAPT. Hum. Mol. Genet. 24, 5260–5269 (2015).
Shi, Y., Kirwan, P. & Livesey, F. J. Directed differentiation of human pluripotent stem cells to cerebral cortex neurons and neural networks. Nat. Protoc. 7, 1836–1846 (2012).
Hall, C. E. et al. Progressive motor neuron pathology and the role of astrocytes in a human stem cell model of VCP-related ALS. Cell Rep. 19, 1739–1749 (2017).
De Palma, M. & Naldini, L. Transduction of a gene expression cassette using advanced generation lentiviral vectors. Methods Enzymol. 346, 514–529 (2002).
Kutner, R. H., Zhang, X.-Y. & Reiser, J. Production, concentration and titration of pseudotyped HIV-1-based lentiviral vectors. Nat. Protoc. 4, 495–505 (2009).
Paxinos, G. & Franklin, K. The Mouse Brain in Stereotaxic Coordinates (Academic, 2004).
Kopec, A. M., Rivera, P. D., Lacagnina, M. J., Hanamsagar, R. & Bilbo, S. D. Optimized solubilization of TRIzol-precipitated protein permits western blotting analysis to maximize data available from brain tissue. J. Neurosci. Methods 280, 64–76 (2017).
Schindelin, J. et al. Fiji: an open-source platform for biological-image analysis. Nat. Methods 9, 676–682 (2012).
Potter, C. J. & Luo, L. Splinkerette PCR for mapping transposable elements in Drosophila. PLoS ONE 5, e10168 (2010).
Trabzuni, D. et al. Quality control parameters on a large dataset of regionally dissected human control brains for whole genome expression studies. J. Neurochem. 119, 275–282 (2011).
Ovcharenko, I., Nobrega, M. A., Loots, G. G. & Stubbs, L. ECR Browser: a tool for visualizing and accessing data from comparisons of multiple vertebrate genomes. Nucleic Acids Res. 32, W280–W286 (2004).
Stoneley, M., Paulin, F. E., Le Quesne, J. P., Chappell, S. A. & Willis, A. E. C-Myc 5′ untranslated region contains an internal ribosome entry segment. Oncogene 16, 423–428 (1998).
Kraushar, M. L. et al. Temporally defined neocortical translation and polysome assembly are determined by the RNA-binding protein Hu antigen R. Proc. Natl Acad. Sci. USA 111, E3815–E3824 (2014).
McGlincy, N. J. & Ingolia, N. T. Transcriptome-wide measurement of translation by ribosome profiling. Methods 126, 112–129 (2017).
Adiconis, X. et al. Comparative analysis of RNA sequencing methods for degraded or low-input samples. Nat. Methods 10, 623–629 (2013).
Blazquez, L. et al. Exon junction complex shapes the transcriptome by repressing recursive splicing. Mol. Cell 72, 496–509.e9 (2018).
Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 17, 10 (2011).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Smith, T., Heger, A. & Sudbery, I. UMI-tools: modeling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy. Genome Res. 27, 491–499 (2017).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Moll, P., Ante, M., Seitz, A. & Reda, T. QuantSeq 3′ mRNA sequencing for RNA quantification. Nat. Methods 11, i–iii (2014).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Wickham, H. Ggplot2: Elegant Graphics for Data Analysis (Springer, 2009).
Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
Waterhouse, A. M., Procter, J. B., Martin, D. M. A., Clamp, M. & Barton, G. J. Jalview Version 2–a multiple sequence alignment editor and analysis workbench. Bioinformatics 25, 1189–1191 (2009).
Lin, M. F., Jungreis, I. & Kellis, M. PhyloCSF: a comparative genomics method to distinguish protein coding and non-coding regions. Bioinformatics 27, i275–i282 (2011).
Plessy, C. et al. Linking promoters to functional transcripts in small samples with nanoCAGE and CAGEscan. Nat. Methods 7, 528–534 (2010).
Lizio, M. et al. Gateways to the FANTOM5 promoter level mammalian expression atlas. Genome Biol. 16, 22 (2015).
Lorenz, R. et al. ViennaRNA Package 2.0. Algorithms Mol. Biol. 6, 26 (2011).
Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).
Law, C. W., Chen, Y., Shi, W. & Smyth, G. K. voom: Precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol. 15, R29 (2014).
Gu, Z., Eils, R. & Schlesner, M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics 32, 2847–2849 (2016).
Chen, E. Y. et al. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinformatics 14, 128 (2013).
Wang, J., Duncan, D., Shi, Z. & Zhang, B. WEB-based GEne SeT AnaLysis Toolkit (WebGestalt): update 2013. Nucleic Acids Res. 41, W77–W83 (2013).
Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).
Xia, J., Benner, M. J. & Hancock, R. E. W. NetworkAnalyst—integrative approaches for protein-protein interaction network analysis and visual exploration. Nucleic Acids Res. 42, W167–W174 (2014).
Machiela, M. J. & Chanock, S. J. LDlink: a web-based application for exploring population-specific haplotype structure and linking correlated alleles of possible functional variants. Bioinformatics 31, 3555–3557 (2015).
Luisier, R. et al. Intron retention and nuclear loss of SFPQ are molecular hallmarks of ALS. Nat. Commun. 9, 2010 (2018).
Pisarev, A. V., Kolupaeva, V. G., Yusupov, M. M., Hellen, C. U. T. & Pestova, T. V. Ribosomal position and contacts of mRNA in eukaryotic translation initiation complexes. EMBO J. 27, 1609–1621 (2008).
We thank L. Wilson and A. Willis for providing pRF and pRhcvF luciferase reporter vectors; P. Fratta and A. Isaacs for suggestions and comments on the manuscript; and other members of the UK Brain Expression Consortium: S. Guelfi, K. D’Sa, M. Matarin, J. Vandrovcova, A. Ramasamy, J. A. Botia, C. Smith and P. Forabosco. This work was supported by the Reta Lila Weston Trust for Medical Research (funding to T.T.W., R.d.S. and R.S.); CBD Solutions (funding to R.d.S., R.S. and P.S.); the Medical Research Council (G0501560 to R.d.S.), Parkinson’s UK (K1212 to R.d.S.), PSP Association (R.d.S.), CurePSP (R.d.S.), Brain Research UK (R.d.S.) and Alzheimer’s Research UK (R.d.S.); a BBSRC LiDo PhD studentship to F.J.; an AgeUK PhD Studentship to V.A.K.; the NIHR Queen Square Dementia BRU to S.W., E.P. and J.A.H.; the Italian Ministry of Education, University and Research Futuro in Ricerca (RBFR-0895DC) ‘Mechanisms of post-transcriptional regulation of gene expression in dementias’ to M.A.D.; a University of Trento PhD studentship and an IBRO InEurope Short Stay grant to K.S.; and the MRC Sudden Death Brain Bank. This work was supported by the Francis Crick Institute, which receives its core funding from Cancer Research UK (FC001002), the UK Medical Research Council (FC001002) and the Wellcome Trust (FC001002). This research was funded in part by the Wellcome Trust (4 Year Wellcome Trust Studentship to O.G.W.) and by the European Research Council under the European Union’s Seventh Framework Programme (617837-Translate to J.U.) and under the European Union’s Horizon 2020 research and innovation programme (835300-RNPdynamics to J.U.). S.W. holds an Alzheimer’s Research UK Senior Research Fellowship (ARUK-SRFEXT2020-001); R.P. holds an MRC Senior Clinical Fellowship (MR/S006591/1).
This work was also supported by the UK Dementia Research Institute which receives its funding from DRI Ltd, funded by the UK Medical Research Council, Alzheimer’s Society and Alzheimer’s Research UK; Medical Research Council (award number MR/N026004/1 to J.A.H.), Wellcome Trust (award number 202903/Z/16/Z to J.A.H.), Dolby Family Fund to J.A.H., National Institute for Health Research University College London Hospitals Biomedical Research Centre funding to J.A.H.
R.S. and R.d.S. are named as inventors on Patent WO2017199041A1, which is based on this work.
Peer review information Nature thanks Anton Komar, Claes Wahlestedt and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
a, SNPs within the MAPT-AS1 genomic region that are linked (R² ≥ 0.5) to tagging SNPs from the NHGRI GWAS catalogue. The trait associated with each tagging SNP is shown together with the P value from the GWAS and the PubMed ID of the cited publication. All P values ≤ 5 × 10⁻⁸ were considered significant. Linkage disequilibrium (LD) correlations (R²) were calculated using LDlink 1.1 (ref. 70) for different populations. ASW: Americans of African Ancestry in SW USA; CEU: Utah Residents (CEPH) with Northern and Western European Ancestry; CHB: Han Chinese in Beijing, China; CHD: Chinese in Metropolitan Denver, Colorado; GIH: Gujarati Indians in Houston, Texas; JPT: Japanese in Tokyo, Japan; LWK: Luhya in Webuye, Kenya; MXL: Mexican ancestry in Los Angeles, California; MKK: Maasai in Kinyawa, Kenya; TSI: Toscani in Italy; YRI: Yoruba in Ibadan, Nigeria. b, For each linked SNP listed in a, the minor allele frequency (MAF) from the 1000 Genomes Project is given, together with the exon/intron location. c, Pairwise linkage disequilibrium heat map created using LDmatrix (https://ldlink.nci.nih.gov/?tab=ldmatrix). Red squares of increasing hue indicate increasing linkage disequilibrium correlation between SNPs. A physical map of the genomic region is shown together with annotated RefSeq transcripts for each gene. d, Enlarged view of the MAPT-AS1 3′-exon (in grey) containing the inverted MIRc element (in green), with two exonic linked SNPs downstream (rs17690326, rs17763596). e, Detailed scheme of the H1/H2 inversion haplotypes (hg19). All major annotated genes in the LD region are coloured in blue for the H1 haplotype and in orange for the H2 inversion haplotype, with a white arrow representing their relative orientation. Arrays of low copy repeats (LCRs), delimiting the inversion region, are represented by tandem arrows. The MAPT-AS1 gene is coloured in yellow.
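The R² ≥ 0.5 linkage threshold in panel a can be illustrated with a minimal sketch that computes the standard LD correlation between two biallelic SNPs from phased haplotypes. This is not LDlink's code; the function name `ld_r2` and the toy haplotypes are hypothetical and are not data from this study.

```python
from collections import Counter

def ld_r2(haplotypes):
    """Return r^2 between two biallelic SNPs.

    `haplotypes` is a list of (allele_at_snp1, allele_at_snp2) tuples,
    one per phased chromosome, with alleles coded 0 or 1.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes supplied")
    counts = Counter(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at SNP 1
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at SNP 2
    p_ab = counts[(1, 1)] / n                 # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    return d * d / denom

# Hypothetical toy data: 10 phased chromosomes, not taken from the study.
toy = [(1, 1)] * 4 + [(0, 0)] * 4 + [(1, 0)] + [(0, 1)]
r2 = ld_r2(toy)
is_linked = r2 >= 0.5   # threshold quoted in the legend
```

With these toy haplotypes the two SNPs fall below the 0.5 threshold, so under the legend's criterion they would not be reported as linked.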
Extended Data Fig. 2 Evolutionary conservation of t-NAT1 and -2 isoforms and MAPT-AS1 promoter region across primates.
a, b, Scheme of human t-NAT1 and t-NAT2l transcript isoforms, showing exons (grey), the region of overlap with MAPT (green) and the inverted MIR element at the 3′-end (red). Multiple sequence alignment of the human t-NAT1 and t-NAT2l transcripts with the genomic sequences of 10 non-human primates (baboon, bonobo, chimp, gibbon, gorilla, marmoset, mouse lemur, orangutan, rhesus, squirrel monkey). Sequences were aligned using MUSCLE 3.8 (ref. 57) and graphically displayed using Jalview 2 (ref. 58). Pyrimidines are in cyan and purines in magenta; the splice junction is highlighted in yellow. A consensus sequence is shown at the base of each multi-alignment, with a bar plot representing percentage sequence identity. c, d, Multi-alignment showing sequence similarity between the 3′-ends of human t-NAT1 (388-449) and t-NAT2l (510-554) and consensus MIR elements of different subfamilies (MIR3, MIR, MIRb, MIRc), as annotated by RepeatMasker. Homology regions of 62 and 45 nt, respectively, are shared with the CORE-SINE, a 65-nt evolutionarily conserved domain at the centre of each MIR repeat element, schematically represented here and originally described in ref. 14. e, g, Phylogenetic trees for the t-NAT1 and t-NAT2l multi-alignments shown in a, b, obtained with the neighbour joining method using Jalview 2. Numbers reported on each connecting line in the tree represent Jaccard distances based on pairwise sequence similarity. f, h, Negative PhyloCSF score (ref. 59; https://github.com/mlin/PhyloCSF/wiki) showing low protein-coding potential of t-NAT1 and t-NAT2l. The plots represent the distribution of scores for each codon in each frame within each t-NAT isoform, across 29 mammals. i, Evolutionary conservation of the MAPT-AS1 promoter region across 6 distant species (Homo sapiens, Macaca mulatta, Mus musculus, Rattus norvegicus, Canis familiaris, Bos taurus), computed using the ECR Browser (ref. 43). Exons: yellow, introns: orange and repeat elements: green. Peaks represent percentage of identity to the human sequence.
At bottom, CAGE and nanoCAGE (ref. 60) tag clusters from FANTOM4 and FANTOM5 datasets retrieved from the ZENBU genome browser (ref. 61), mapped to the MAPT-AS1 promoter region, on the sense (blue) or antisense strand (red). Values on the y-axis represent CAGE counts normalized per million tags (tpm).
Extended Data Fig. 3 Expression of MAPT and MAPT-AS1 across brain regions and inverse correlation to tau pathology; levels and localization of endogenous MAPT mRNA are unaffected by stable expression of MAPT-AS1, whereas tau protein is increased by MAPT-AS1 with a flipped-MIR.
a, RNA-seq read counts from ref. 8 for MAPT mRNA and MAPT-AS1 lncRNA transcripts (t-NAT2s, t-NAT1, t-NAT2l) across 12 different regions of four independent human brains. Values represent mean counts ± s.d. CBRL, cerebellum; FCTX, frontal cortex; HIPP, hippocampus; HYPO, hypothalamus; MEDU, medulla; OCTX, occipital cortex; PUTM, putamen; SNIG, substantia nigra; SPCO, spinal cord; TCTX, temporal cortex; THAL, thalamus; WHMT, white matter. b, Single-molecule RNA fluorescent in situ hybridization (smRNA-FISH) showing MAPT-AS1 (green) and MAPT (grey) transcripts expressed in both the nucleus (DAPI, blue) and the cytoplasm of SH-SY5Y neuroblastoma cells. Representative images of n = 3 independent experiments. Scale bars represent 10 μm. c, 2D-density scatter plot of MAPT-AS1 and MAPT expression (FPKM) from post-mortem brains (Allen Brain Institute) coloured by Braak stage. Red lines delimit middle points. Inset numbers represent samples. d, Braak-stage distributions within the upper (Q2+3), lower (Q1+4), left (Q1+2) or right (Q3+4) hemi-plot as in c are significantly different (two-sided unpaired Wilcoxon rank-sum test). e, Cumulative proportion (y-axis) of phospho-tau immunohistochemistry (AT8-IHC, fraction of labelled pixels in ROI), phospho-tau to total-tau ratio (p-Tau/Tau ratio) and Aβ42 to Aβ40 ratio (Aβ42/Aβ40 ratio) (x-axis) for different Braak stages (0-1, 2-4, 5-6). f, Cumulative proportion (y-axis) of MAPT, MAPT-AS1 and KANSL1-AS1 gene expression levels (normalized FPKM, x-axis) for different Braak stages (0-1, 2-4, 5-6). For data in e, f, *P < 0.05, ***P < 0.001, two-sided Kolmogorov–Smirnov (KS) test, n = 377 human post-mortem brains. RNA-seq, IHC and Illuminex-immunoassay data in this analysis are from the Allen Brain Institute’s Dementia, Ageing and Traumatic Brain Injury study (http://aging.brain-map.org/; ref. 9).
g, Normalized MAPT and MAPT-AS1 RNA expression levels (fold-changes) detected by RT–qPCR from SH-SY5Y cells stably expressing different deletion mutants of MAPT-AS1: t-NAT1 with flipped overlapping region (Flip), t-NAT1 with region not overlapping with the 5′UTR (Nover), t-NAT1 with overlapping region (Over), t-NAT1 with deleted 5′-exon (t-NAT1Δ5′), t-NAT1 with deleted 3′-exon (t-NAT1Δ3′), t-NAT2l with deleted 5′-exon (t-NAT2Δ5′), t-NAT2l with deleted 3′-exon (t-NAT2Δ3′). Values are normalized to cells stably transfected with an empty vector (Empty). Data represent independent SH-SY5Y clones stably expressing each construct (n = 3 for Empty, n = 4 for Flip, Nover and Over, mean ± s.d.; two-sided Kruskal–Wallis with Dunn’s multiple comparison test). h, Both full-length (FL) MAPT-AS1 and mutants with deleted MIR element (ΔM) localize to both cytosol and nucleus without altering the nucleo-cytoplasmic distribution of MAPT mRNA, as detected by RT–qPCR (data represent independent SH-SY5Y clones stably expressing each construct: n = 3 Empty, n = 3 t-NAT1-FL, n = 6 t-NAT1-ΔM, n = 3 t-NAT2-FL, n = 6 t-NAT2-ΔM, mean ± s.d.; two-sided Kruskal–Wallis with Dunn’s multiple comparison test). i, Quantitative expression of human MAPT-AS1 and MAPT transcripts measured by RT–qPCR (2^−ΔΔCt) in sub-cellular fractions of SH-SY5Y cells (n = 3 independent experiments, mean ± s.d.). j, Quantification of immunoblots probed with anti-tau and anti-β-actin antibodies. Protein lysates (20 μg) were from independent clones of SH-SY5Y cells stably expressing different MAPT-AS1 splice isoforms, either full-length (t-NAT1-FL, t-NAT2-FL), with deleted MIR (t-NAT1-ΔM, t-NAT2-ΔM) or with a flipped MIR repeat (t-NAT1-Mflip). For each construct, total tau was normalized to β-actin levels quantified using ImageJ (n = 6 independent stable clones, mean ± s.d.; one-way ANOVA with Dunnett’s test). As with deletion of the whole MIR (t-NAT1-ΔM), the flipped MIR (t-NAT1-Mflip, delimited by red lines) increases tau protein.
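The cumulative-proportion comparison in panels e, f rests on the two-sample Kolmogorov–Smirnov statistic, the largest vertical gap between two empirical cumulative distributions. The sketch below computes only that statistic, not the P value, and is not the authors' analysis code; the function names and toy values are hypothetical.

```python
def ecdf(sample, x):
    """Fraction of values in `sample` that are <= x (empirical CDF at x)."""
    return sum(v <= x for v in sample) / len(sample)

def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    # The gap can only change at observed values, so checking those suffices.
    points = sorted(set(a) | set(b))
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in points)

# Hypothetical expression values for two Braak-stage groups; not study data.
group_low = [1.0, 2.0, 3.0, 4.0]
group_high = [3.0, 4.0, 5.0, 6.0]
d_stat = ks_statistic(group_low, group_high)
```

A statistic of 0 means the two cumulative curves coincide, and 1 means the samples do not overlap at all.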
Extended Data Fig. 4 Characterization of human induced pluripotent stem cell-derived cortical and motor neurons.
a, Control-1 (male) human iPSCs (hiPSCs) differentiated into cortical neurons using dual SMAD inhibition followed by specification of both deep- and upper-layer cortical excitatory neurons (ref. 34). Neural rosettes at 20 days in vitro (DIV) express the cortical progenitor markers PAX6 and OTX2, the proliferation marker Ki67 and the neuronal marker TUJ1. By 100 DIV, terminally differentiated neurons express βIII-tubulin, and later-born upper-layer neurons express SATB2 and BRN2. Scale bars, 20 μm; n = 3 independent experiments. b, Quantitative expression of MAPT and MAPT-AS1 (t-NAT1, t-NAT2s, t-NAT2l) in 3 independent inductions of hiPSC-derived cortical neurons (from 0 to 100 DIV, one male healthy donor) measured by RT–qPCR (2^−ΔΔCt/2^−ΔΔCtmax). c, hiPSCs (control-1 and control-3), differentiated into motor neurons (MNs) using a previously established protocol (ref. 71), were immunostained for NPC and MN markers and imaged by the Opera-Phenix (PerkinElmer). Images were acquired and quantified using Columbus v18.104.22.168890. NPCs at 18 DIV express OLIG2 and NKX6.1, whereas 25 DIV MNs express SMI32 and choline acetyltransferase (ChAT); bar graphs on the right (mean ± s.e.m., n = 23 (NKX6.1), n = 27 (OLIG2), n = 29 (SMI32), n = 22 (ChAT) imaged wells across 3 different lines; scale bars, 20 μm). d, ICC images of MNs (26 DIV), immunolabelled for TUJ1 and total tau and stained with DAPI after transduction with lentivirus (MOI 10) expressing shRNAs targeting either exon 4 of MAPT-AS1 (shEx4) or the Renilla luciferase ORF as a negative control (shRen) (mean ± s.d., n = 3 for control-1 and control-2 iPSC-MNs; scale bars, 40 μm). Relative tau levels normalized to TUJ1, measured as the ratio of integrated densities, are compared between the two groups in the bar graph on the right (unpaired two-tailed t-test). e, Western blot of MNs (26-28 DIV) from two healthy controls, transduced with LV-shRen (n = 5) or LV-shEx4 (n = 6), probed with anti-total-tau and anti-GAPDH antibodies.
Quantification is shown on the right (mean ± s.d. *P < 0.05, two-sided Wilcoxon-test).
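The 2^−ΔΔCt/2^−ΔΔCtmax quantity in panel b can be read as relative expression by the comparative Ct method, rescaled so that the highest time point equals 1. The sketch below shows that arithmetic under stated assumptions: the reference gene, the choice of 0 DIV as calibrator, the function names and all Ct values are hypothetical and are not taken from the study.

```python
def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """2^-ddCt for one sample relative to a calibrator sample."""
    delta_ct = ct_target - ct_reference              # normalize to reference gene
    delta_ct_cal = ct_target_cal - ct_reference_cal  # same for the calibrator
    return 2.0 ** -(delta_ct - delta_ct_cal)

def scale_to_max(values):
    """Divide every value by the series maximum, so the peak equals 1."""
    peak = max(values)
    return [v / peak for v in values]

# Hypothetical (target Ct, reference Ct) pairs keyed by days in vitro.
timecourse = {0: (30.0, 20.0), 50: (28.0, 20.0), 100: (26.0, 20.0)}
cal_target, cal_ref = timecourse[0]   # assumption: 0 DIV used as calibrator
rel = [relative_expression(t, r, cal_target, cal_ref) for t, r in timecourse.values()]
scaled = scale_to_max(rel)
```

Scaling to the maximum makes time courses from independent inductions comparable even when their absolute fold changes differ.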
Extended Data Fig. 5 MAPT-AS1 represses tau IRES-mediated translation in a MIR-dependent manner, with no effect on MAPT 3′-UTR and no major off-targets.
a, Reported secondary structure of the MAPT 5′UTR (-242 to -1 relative to AUG; ref. 3). Domains 1 and 2 and the 5′-TOP motif of the tau-IRES are indicated, and a blue line denotes overlap with t-NAT1 (5′-exon position 88-163). b, Relative abundance of MAPT-AS1, MAPT and β-actin mRNAs in polysomal fractions from cells stably expressing FL or ΔM MAPT-AS1 isoforms (mean ± s.d.). Absorbance profiles (254 nm) are in the background. c, Relative abundance of MAPT mRNA in fraction pools corresponding to 40-60S, 80S, light, medium or heavy polysomes. FL but not ΔM t-NAT1 or t-NAT2 significantly reduced MAPT mRNA association with heavy polysomes (n = 3 Empty, n = 4 t-NAT1FL, n = 6 t-NAT1ΔM, n = 3 t-NAT2FL, n = 5 t-NAT2ΔM in b, c) (mean ± s.e.m., one-way ANOVA with Holm-Sidak’s test; two points outside of axes in c). d, The pRTF or pRF construct was co-transfected into SH-SY5Y cells with pcDNA3.1 empty vector, full-length t-NAT1 (FL) or t-NAT1 with deleted MIR (t-NAT1-ΔM), and relative luciferase levels were measured after 48 h. A significant reduction of tau-IRES activity (Fluc/Rluc ratio) was detected in cells expressing t-NAT1-FL but not in cells expressing t-NAT1-ΔM, which instead showed a significant increase in MAPT IRES-mediated cap-independent translation. Similarly, t-NAT2l-FL repressed MAPT IRES activity, whereas t-NAT2l-ΔM, with deleted MIR, had no such effect. Data in d represent mean ± s.d., n = 3 independent experiments (**P < 0.01, *P < 0.05, one-way ANOVA with Dunnett’s test). e, Schematic representation of luciferase constructs (pMIR-reporter) used to study MAPT-AS1 effects on the MAPT 3′-UTR following co-transfection in SH-SY5Y cells. Either the full-length (FL) MAPT 3′-UTR or 3 partially overlapping fragments of it (Fr1, Fr2, Fr3) were cloned downstream of the firefly luciferase ORF. f, Top, firefly luciferase (Fluc) normalized to Renilla luciferase (Rluc) was quantified in SH-SY5Y cells co-transfected with either an empty pcDNA3.1 vector or different variants of t-NAT1 lncRNA (n = 3 independent experiments).
f, Bottom, Fluc/Rluc ratio was quantified in SH-SY5Y cells co-transfected with either empty vector or different variants of t-NAT2l lncRNA (n = 3 independent experiments). In all cases differences were not statistically significant except for t-NAT1-Δ3′ (one-way ANOVA with Dunnett’s test). g, Representative genome-wide metaplot of ribosome density over protein-coding mRNAs; a large majority of reads align as expected with the 5′UTR and CDS, with a minority at 3′UTRs. RIBO-seq libraries were from 3 independent SH-SY5Y clones stably expressing each MAPT-AS1 variant or an empty vector (n = 17). h, Bar plot of the relative number of RIBO-seq reads with 5′-end in each reading frame, showing periodicity of ribosome footprints (RFPs) (n = 17). i, RIBO-seq volcano plot showing differentially translated genes in SH-SY5Y cells stably expressing full-length t-NAT1 (FL) compared to those with empty vector (Empty). The vertical red line at the position of MAPT (log2(fold change) = −1.45, P = 0.036, DESeq2 Wald test) shows that few other genes are similarly depleted of RFPs; only 6 of them (gene symbols in grey) have a total of 170 counts across 17 samples (a sample was excluded due to barcode cross-contamination with an unrelated CLIP library in the same sequencing run), and none has a significant adjusted P value. j, QuantSeq volcano plot showing differentially expressed genes in SH-SY5Y cells stably expressing full-length t-NAT1 (FL) compared to cells with empty vector (Empty). MAPT (red) mRNA levels are not significantly different. Only genes with at least 1,000 read counts across 18 samples are named by their symbol (grey), although their adjusted P values were not significant. Only three genes show a significant downregulation at the mRNA level (in blue, adjusted P value < 0.05), likely representing transcriptional off-targets. P values in i, j were computed by DESeq2 using the Wald test with Bonferroni multiple comparison correction.
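The gap between a nominal P value such as the 0.036 quoted for MAPT in panel i and the adjusted values follows from how Bonferroni correction works: each raw P value is multiplied by the number of tests and capped at 1. The sketch below shows only that arithmetic; it does not reproduce the DESeq2 run, and the function name and the number of tests are hypothetical.

```python
def bonferroni(p_values, n_tests=None):
    """Bonferroni-adjust raw P values: multiply by the test count, cap at 1.

    `n_tests` defaults to the number of P values supplied.
    """
    m = len(p_values) if n_tests is None else n_tests
    if m <= 0:
        raise ValueError("number of tests must be positive")
    return [min(1.0, p * m) for p in p_values]

raw_p = 0.036        # nominal Wald-test P value quoted in the legend
n_genes = 10_000     # hypothetical number of genes tested; not from the study
adjusted = bonferroni([raw_p], n_tests=n_genes)[0]
```

Because the correction scales with the number of genes tested, a nominal P value of this size cannot remain below 0.05 in a transcriptome-wide comparison.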
Extended Data Fig. 6 Distribution of 7-mer MIR-complementary motifs along the human 18S rRNA secondary structure.
Human 18S ribosomal RNA secondary structure, as retrieved from http://apollo.chemistry.gatech.edu/RibosomeGallery/, is divided into an “active region” (red) and an “inactive region” (grey). As described in ref. 24, the active region is enriched for motifs able to mediate 40S ribosome recruitment through direct mRNA-rRNA interactions with the 5′-UTRs of about 10% of human genes. Here, the 18S rRNA secondary structure is superimposed with 7-mers of complementary motifs (black dots) contained within each MIR embedded in MIR-NATs overlapping with 5′-UTRs of PC genes. Only 7-mers complementary to the 18S active region are shown. The 7-mer motifs represented here map to both the MIR elements within antisense MIR-NATs and the 5′-UTRs of the respective target genes, as reported in detail in Supplementary Table 8. Matching positions of MIR motif-1 and -2 from MAPT-AS1 are indicated (blue lines). 18S rRNA helices previously reported by Pisarev et al. (ref. 72) to interact with mRNA regions upstream (yellow ovals) or downstream (salmon ovals) of the AUG start codon are indicated.
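One way to picture the 7-mer complementarity mapping is to slide a 7-nt window along a query RNA and look for its reverse complement in a target RNA. The sketch below does this with strict Watson-Crick pairing only (G-U wobble pairs are ignored, which is an assumption); the function names and both sequences are hypothetical toy examples, not the MIR or 18S rRNA sequences analysed in the study.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def reverse_complement(rna):
    """Reverse complement of an RNA string over the alphabet A, C, G, U."""
    return rna.translate(COMPLEMENT)[::-1]

def complementary_kmers(query, target, k=7):
    """List (query_pos, target_pos, kmer) for every k-mer of `query`
    whose reverse complement occurs in `target` (0-based positions)."""
    hits = []
    for i in range(len(query) - k + 1):
        kmer = query[i:i + k]
        site = reverse_complement(kmer)
        j = target.find(site)
        while j != -1:                    # report every occurrence
            hits.append((i, j, kmer))
            j = target.find(site, j + 1)
    return hits

# Hypothetical toy sequences; not taken from the study.
toy_rrna = "GGAUCCUUAGCAAGG"
toy_mir = "AACGCUAAGGCC"
hits = complementary_kmers(toy_mir, toy_rrna)
```

Restricting `target` to a sub-sequence would mimic the legend's filter for motifs complementary to the active region only.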
Extended Data Fig. 7 MIR-NATs S-AS pairs within networks of interacting proteins, enriched for NDD-genes.
a, MIRs are more frequent in lncRNAs than in mRNAs (5′UTR, 3′UTR, CDS). b, 1,197 GENCODE v19 MIR-NATs form S-AS pairs with 1,045 protein-coding (PC) genes: 40.69% overlap the 5′UTR, 32.50% overlap the CDS and 26.81% overlap the 3′UTR. c, PC genes with 5′UTR-overlapping MIR-NATs (n = 630) are more highly expressed in human brain (log10 FPKM) than genes with 3′UTR (n = 392) or CDS (n = 474) overlaps. Box plot: median with upper and lower quartiles; whiskers, values outside of interquartile range; points represent outliers (Welch two-sample t-test; one-way ANOVA across all gene regions P = 0.0214). d, Enriched cellular components and disease GO terms ranked by Enrichr. 5′UTR-overlapping genes significantly associate with dementia (one-sided Fisher’s exact test P values combined with z-scores, Supplementary Table 3). e, PC genes cognate to MIR-NATs, sorted by their overlap (3′UTR, 5′UTR, CDS), form networks of interacting proteins (coloured seeds), computed using PINOT (ref. 29), and are associated with neurodegenerative diseases, with enrichment within the 5′UTR network (P = 1.5 × 10⁻⁴, 100,000 random simulations, pnorm).
Extended Data Fig. 8 Brain RNA-seq co-expression analysis. Genes paired with antisense MIR-NATs have significantly more structured 5′- and 3′-UTRs.
a, Co-expression heat maps representing the distribution of RNA-seq read counts for the 100 most abundant MIR-NAT target protein-coding genes (left panel) and the 100 most abundant MIR-NAT genes (right panel), both hierarchically clustered based on their expression level in 12 different regions of 4 independent post-mortem brains from healthy human donors. Genes are clustered on the y-axis; brain regions are on the x-axis (CBRL, cerebellum; FCTX, frontal cortex; HIPP, hippocampus; HYPO, hypothalamus; MEDU, medulla; OCTX, occipital cortex; PUTM, putamen; SNIG, substantia nigra; SPCO, spinal cord; TCTX, temporal cortex; THAL, thalamus; WHMT, white matter). For each brain region, 4 independent brain samples are represented in each column. The colour key with histogram for each heat map shows the z-values associated with each colour on the x-axis and RNA-seq counts on the y-axis. The histogram represents the distribution of the RNA-seq counts for each z-value. b, Similar co-expression heat maps, as in a, representing 1,045 MIR-NAT target protein-coding genes (on the left side) and 1,197 antisense MIR-NAT genes (on the right side). c, Pie chart showing the percentage of MIR-NAT S-AS pairs annotated in GENCODE v19 and with 5′-UTR overlap, sorted by their Pearson’s correlation coefficient. The majority of S-AS pairs show positive correlations. d, Histogram representing frequency of occurrence for 1,197 MIR-NAT S-AS pairs in bins of Pearson’s correlation (from -1 to +1 in bins of 0.05). All MIR-NAT S-AS pairs are visualized together, irrespective of their pattern of overlap. The MAPT-AS1-MAPT correlation coefficient is indicated. e, f, 3′-UTR (e) or 5′-UTR (f) minimum free energy (MFE), normalized by length, was computed using RNAfold 2.1.9 for each protein-coding gene in the human genome (hg19), and sorted based on the respective type of lncRNA overlap. Box plot presents median, upper and lower quartile boundaries for each group of protein-coding (PC) genes.
PC genes pairing with MIR-NATs have significantly more structured 3′- and 5′-UTRs than PC genes without lncRNA overlap (***, P < 0.0001, one-way ANOVA followed by Dunnett’s test). PC gene groups are as follows: PC genes overlapping antisense with MIR-NAT, ‘PC-MIRlncRNA’; PC genes overlapping with any lncRNA without embedded MIR repeat, ‘PC-lncRNA-NOMIR’; all PC genes with any overlapping lncRNA, ‘PC-lncRNA’; MIR-NATs, ‘MIRlncRNA’; PC genes without lncRNA overlap, ‘PC-NO-lncRNA’.
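The correlation histogram of panel d can be illustrated with a minimal sketch: a Pearson coefficient is computed for each sense-antisense expression pair and assigned to one of 40 bins of width 0.05 spanning -1 to +1. The function names and toy expression vectors are assumptions for illustration and do not reproduce the authors' pipeline.

```python
import math

N_BINS = 40  # bins of width 0.05 covering the interval from -1 to +1


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bin_index(r):
    """Map r in [-1, 1] to a bin index; r = +1 falls in the last bin.
    Multiplying by 20 instead of dividing by 0.05 avoids rounding drift."""
    return max(0, min(N_BINS - 1, math.floor((r + 1.0) * 20)))


def correlation_histogram(pairs):
    """Count sense-antisense pairs per correlation bin."""
    counts = [0] * N_BINS
    for sense, antisense in pairs:
        counts[bin_index(pearson(sense, antisense))] += 1
    return counts


# Toy expression vectors (hypothetical values across four samples).
toy_pairs = [
    ([1, 2, 3, 4], [2, 4, 6, 8]),    # perfectly correlated
    ([1, 2, 3, 4], [8, 6, 4, 2]),    # perfectly anticorrelated
    ([1, 2, 3, 4], [1, -1, -1, 1]),  # uncorrelated
]
print(correlation_histogram(toy_pairs))
```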
Extended Data Fig. 9 Majority of genes targeted by antisense MIR-NATs interact in a PPI network and are enriched for neurodegenerative disease-associated and immune system-associated genes.
a, Protein–protein interaction (PPI)-network obtained from literature-curated interaction data from InnateDB database, using 392 seed proteins participating in S-AS pairs with MIR-NATs. Genes encoding proteins associated with neurodegenerative diseases, represented as red-filled circles, are significantly enriched in the network (P = 1.63 × 10−8, Benjamini-Hochberg FDR using WebGestalt). Only primary interactions are represented in a zero-degree interaction network generated with NetworkAnalyst tool69. Self-interactions are not considered. b, Schematic structures of representative genes pairing with antisense MIR-NATs and involved in different neurodegenerative diseases. GENCODE v19 annotated isoforms of the human SNCA, APP, MBNL1 and SLC1A2 genes and respective overlapping antisense MIR-NAT. MIR elements within each lncRNA are indicated (red). c, Protein–protein interaction (PPI)-network obtained from literature-curated interaction data from InnateDB database, using 392 seed proteins participating in S-AS pairs with MIR-NATs. Genes encoding proteins associated with either the immune system (green) or innate immune system (blue) are significantly enriched in the network (P = 0.0041 and P = 0.0328, respectively, Benjamini-Hochberg FDR using NetworkAnalyst). Only primary interactions are represented in a zero-degree network generated using NetworkAnalyst tool69. Self-interactions are not considered. d, Gene expression heat map for 487 protein-coding genes with 5′-UTR overlapping with antisense MIR-NATs in 126 normal human tissues, from 557 publicly available microarray datasets, retrieved from the Enrichment Profiler Database (http://xavierlab2.mgh.harvard.edu/EnrichmentProfiler/index.html). Genes are clustered on the y-axis and tissues are clustered on the x-axis. Scale bar at bottom indicates colours associated with each z-score in the expression heat map. e, Scheme of the PLCG1 and PLCG1-AS genes is reported (hg19); the inverted MIRb is in red.
Immunoblots of 6 independent SH-SY5Y clones stably expressing either empty vector (Empty), PLCG1-AS full-length (FL) or with whole inverted MIRb deleted (ΔM), probed with anti-PLCG1 and β-actin antibodies. f, PLCG1 protein level is reduced in cells expressing FL- but not ΔM-PLCG1-AS as quantified in the graph (n = 6 independent stable SH-SY5Y clones for each construct, mean ± s.d., *P < 0.05; one-way ANOVA with Dunnett’s test). g, PLCG1 mRNA expression level from bulk RNA-seq of temporal cortex (TC) and prefrontal cortex (PFC) from the Mayo Clinic (n = 160) and ROS-MAP (n = 632) datasets, respectively, is significantly increased in AD patients (AD) compared to asymptomatic AD (AsymAD) and healthy controls (Control) (box plots: midpoints, medians; boxes, 25th and 75th percentiles; whiskers, minima and maxima; two-sided Wilcoxon test; data from http://swaruplab.bio.uci.edu:3838/bulkRNA/). Control samples were classified as Braak stage 0-I. Early-stage pathology samples were defined as Braak stage II-IV and CERAD score of possible AD, while late-stage pathology samples were Braak stage V-VI and CERAD score of probable and definite AD.
Extended Data Fig. 10 446 genes targeted by MIR-NATs contribute to the transcriptional signature of Alzheimer’s disease.
a, Meta-analysis of snRNA-seq from Mathys (M), Grubman (G) and bulk RNA-seq from Friedman (GSE95587) datasets: rows are 446 MIR-NAT differentially expressed genes (DEG): 38 NDD-genes and 69 lncRNAs. DEGs across datasets partially overlap with 65 (27.7% up, 72.3% down) within Mathys, 160 (48.1% up, 51.9% down) within Grubman and 307 (58% up, 42% down) within Friedman datasets. Cell types: excitatory neurons (Ex), inhibitory neurons (In), neurons (Neu), astrocytes (Ast), oligodendrocytes (Olig), oligodendrocyte precursors (OPC), microglia (Mic), hybrid cells (Hyb), endothelial (Endo), unidentified cells (Unid). DEG counts are log2(mean gene expression in AD-pathology/mean gene expression in no-pathology) > 0.25 (two-sided Wilcoxon rank-sum test FDR <0.01 and Poisson mixed-model FDR <0.05, Mathys; two-sided Wilcoxon rank-sum test, FDR <0.05, Grubman and GSE95587). Annotations: gene-type (biotype), NDD-genes in DisGeNET database (disease), MIR orientation (MIR), S-AS region (overlap), percentage of protein IDRs by 75% of D2P2 predictors (disorder), number of protein–protein interactors (degree).
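The effect-size part of the DEG criterion in this legend can be sketched as below. The legend gives the threshold as a log2 ratio greater than 0.25 while also reporting down-regulated genes, so applying the threshold to the absolute value is an assumption here. The significance filters (Wilcoxon rank-sum and Poisson mixed-model FDR) are not reproduced, and the function names and values are hypothetical.

```python
import math
from statistics import mean


def log2_fold_change(pathology, no_pathology):
    """log2(mean expression in AD-pathology / mean expression in no-pathology)."""
    return math.log2(mean(pathology) / mean(no_pathology))


def classify(lfc, threshold=0.25):
    """Label a gene by effect size only (assumes a symmetric threshold)."""
    if lfc > threshold:
        return "up"
    if lfc < -threshold:
        return "down"
    return "unchanged"


# Toy mean-normalised expression values for one gene (hypothetical).
lfc = log2_fold_change([2.0, 2.0], [1.0, 1.0])
print(lfc, classify(lfc))
```

In practice a gene would be counted as a DEG only if it also passed the FDR cut-offs listed in the legend for the corresponding dataset.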
Extended Data Fig. 11 Majority of genes targeted by MIR-NATs are enriched for interacting intrinsically disordered proteins (IDPs).
a, Extended protein–protein interaction (PPI)-network from experimentally validated interaction data from various databases mined by PINOT29, using 760 nonredundant seed proteins participating in S-AS pairs with MIR-NATs. 399 seeds (40.3%), which are genes encoding IDPs with more than 90% IDRs and are represented as red-filled circles, are significantly enriched in the network (P = 0.0096, 100,000 random simulations in R, with Bonferroni correction, details in Supplementary Table 7). Only first-degree interactions are represented. Percentage of sequence predicted to span intrinsically disordered regions (IDRs) by at least 75% of the 9 algorithms from the D2P2 database30 is colour coded from blue (0–30%) to red (>90%). b, 11 NDD-hub proteins in the above network are presented in this zoom-in view (APP, ATP13A2, DCTN1, GABARAPL1, HSP90AA1, MAPT, MATR3, PLCG1, SNCA, SRRM2, VIM). c, Topological properties of the extended PPI network, computed by Cytoscape68.
Uncropped images for protein gels.
List of oligonucleotides and probes used.
List of HGNC symbols and gene description for 1,045 protein-coding genes forming sense-antisense pairs with MIR-NATs.
Diseases and cellular components GO-terms significantly enriched for protein-coding genes forming sense-antisense pairs with MIR-NATs, for each region of overlap.
List of HGNC symbols, UNIPROT ID and gene description for protein-coding genes with 5′UTR-overlapping MIR-NATs.
List of HGNC symbols, UNIPROT ID and gene description for protein-coding genes with CDS-overlapping MIR-NATs.
List of HGNC symbols, UNIPROT ID and gene description for protein-coding genes with 3′UTR-overlapping MIR-NATs.
List of 989 Ensembl genes sorted by their percentage of IDR, and overrepresentation statistical analysis by 100,000 random simulations for 0-30% IDR, 50-90% IDR, >90% IDR.
k-mers matching the “active region” of 18S rRNA and embedded in each MIR-NAT transcript.
List of all abbreviations used in the manuscript.
Cite this article
Simone, R., Javad, F., Emmett, W. et al. MIR-NATs repress MAPT translation and aid proteostasis in neurodegeneration. Nature 594, 117–123 (2021). https://doi.org/10.1038/s41586-021-03556-6