Somatic mutations are the driving force of cancer genome evolution1. The rate of somatic mutations appears to vary greatly across the genome owing to variations in chromatin organization, DNA accessibility and replication timing2,3,4,5. However, other variables that may influence the mutation rate locally, such as a role for DNA-binding proteins, remain unknown. Here we demonstrate that the rate of somatic mutations in melanomas is highly increased at active transcription factor binding sites and nucleosome-embedded DNA, compared to their flanking regions. Using recently available excision-repair sequencing (XR-seq) data6, we show that the higher mutation rate at these sites is caused by decreased levels of nucleotide excision repair (NER) activity. Our work demonstrates that DNA-bound proteins interfere with the NER machinery, which results in an increased rate of DNA mutations at the protein binding sites. This finding has important implications for our understanding of mutational and DNA repair processes and for the identification of cancer driver mutations.
Yates, L. R. & Campbell, P. J. Evolution of the cancer genome. Nature Rev. Genet. 13, 795–806 (2012)
Schuster-Böckler, B. & Lehner, B. Chromatin organization is a major influence on regional mutation rates in human cancer cells. Nature 488, 504–507 (2012)
Lawrence, M. S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013)
Polak, P. et al. Reduced local mutation density in regulatory DNA of cancer genomes is linked to DNA repair. Nature Biotechnol. 32, 71–75 (2014)
Polak, P. et al. Cell-of-origin chromatin organization shapes the mutational landscape of cancer. Nature 518, 360–364 (2015)
Hu, J., Adar, S., Selby, C. P., Lieb, J. D. & Sancar, A. Genome-wide analysis of human global and transcription-coupled excision repair of UV damage at single-nucleotide resolution. Genes Dev. 29, 948–960 (2015)
Gao, S., Drouin, R. & Holmquist, G. P. DNA repair rates mapped along the human PGK1 gene at nucleotide resolution. Science 263, 1438–1440 (1994)
Conconi, A., Liu, X., Koriazova, L., Ackerman, E. J. & Smerdon, M. J. Tight correlation between inhibition of DNA repair in vitro and transcription factor IIIA binding in a 5S ribosomal RNA gene. EMBO J. 18, 1387–1396 (1999)
The International Cancer Genome Consortium. International network of cancer genome projects. Nature 464, 993–998 (2010)
The Cancer Genome Atlas Research Network. The Cancer Genome Atlas pan-cancer analysis project. Nature Genet. 45, 1113–1120 (2013)
Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013)
Supek, F. & Lehner, B. Differential DNA mismatch repair underlies mutation rate variation across the human genome. Nature 521, 81–84 (2015)
Hara, R., Mo, J. & Sancar, A. DNA damage in the nucleosome core is refractory to repair by human excision nuclease. Mol. Cell. Biol. 20, 9173–9181 (2000)
Yazdi, P. G. et al. Increasing nucleosome occupancy is correlated with an increasing mutation rate so long as DNA repair machinery is intact. PLoS ONE 10, e0136574 (2015)
Tolstorukov, M. Y., Volfovsky, N., Stephens, R. M. & Park, P. J. Impact of chromatin structure on sequence variability in the human genome. Nature Struct. Mol. Biol. 18, 510–515 (2011)
Tornaletti, S. & Pfeifer, G. P. UV damage and repair mechanisms in mammalian cells. Bioessays 18, 221–228 (1996)
Reijns, M. A. et al. Lagging-strand replication shapes the mutational landscape of the genome. Nature 518, 502–506 (2015)
Katainen, R. et al. CTCF/cohesin-binding sites are frequently mutated in cancer. Nature Genet. 47, 818–821 (2015)
Fredriksson, N. J., Ny, L., Nilsson, J. A. & Larsson, E. Systematic analysis of noncoding somatic mutations and gene expression alterations across 14 tumor types. Nature Genet. 46, 1258–1263 (2014)
Martincorena, I. et al. High burden and pervasive positive selection of somatic mutations in normal human skin. Science 348, 880–886 (2015)
Marteijn, J. A., Lans, H., Vermeulen, W. & Hoeijmakers, J. H. J. Understanding nucleotide excision repair and its roles in cancer and ageing. Nature Rev. Mol. Cell Biol. 15, 465–481 (2014)
Tornaletti, S. & Pfeifer, G. P. UV light as a footprinting agent: modulation of UV-induced DNA damage by transcription factors bound at the promoters of three human genes. J. Mol. Biol. 249, 714–728 (1995)
Gale, J. M., Nissen, K. A. & Smerdon, M. J. UV-induced formation of pyrimidine dimers in nucleosome core DNA is strongly modulated with a period of 10.3 bases. Proc. Natl Acad. Sci. USA 84, 6644–6648 (1987)
ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012)
Goodman, M. F. & Woodgate, R. Translesion DNA polymerases. Cold Spring Harb. Perspect. Biol. 5, a010363 (2013)
Nouspikel, T. DNA repair in mammalian cells. Cell. Mol. Life Sci. 66, 994–1009 (2009)
Wyrick, J. J. & Roberts, S. A. Genomic approaches to DNA repair and mutagenesis. DNA Repair (Amst.) 36, 146–155 (2015)
Khurana, E. et al. Integrative annotation of variants from 1092 humans: application to cancer genomics. Science 342, 1235587 (2013)
Harrow, J. et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 22, 1760–1774 (2012)
Roadmap Epigenomics Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015)
Sherwood, R. I. et al. Discovery of directional and nondirectional pioneer transcription factors by modeling DNase profile magnitude and shape. Nature Biotechnol. 32, 171–178 (2014)
Derrien, T. et al. Fast computation and applications of genome mappability. PLoS ONE 7, e30377 (2012)
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. 57, 289–300 (1995)
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010)
R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria (2014)
Haradhvala, N. J. et al. Mutational strand asymmetries in cancer genomes reveal mechanisms of DNA damage and repair. Cell 164, 538–549 (2016)
We acknowledge funding from the Spanish Ministry of Economy and Competitiveness (grant number SAF2012-36199), the Marató de TV3 Foundation, and the Spanish National Institute of Bioinformatics (INB). R.S. is supported by an EMBO Long-Term Fellowship (ALTF 568-2014) co-funded by the European Commission (EMBOCOFUND2012, GA-2012-600394) with support from Marie Curie Actions. A.G.-P. is supported by a Ramón y Cajal contract (RYC-2013-14554).
The authors declare no competing financial interests.
Extended data figures and tables
The mutation rate is higher in active TFBS (bound by their TF and overlapping DHS; bound-DHS, red line) compared to: (i) inactive TFBS (not overlapping any DHS; bound-noDHS, blue line); and (ii) unbound inactive TFBS (not bound by TF and not overlapping any DHS; unbound-noDHS, orange line). The binding sites considered here correspond to the subset of TFs (n = 58) for which both the bound and unbound motif predictions are available from the ENCODE integrative analysis24. For comparison purposes, we sampled an equal number of unbound-noDHS TFBS (unbound-noDHS sampled, brown line) as in the set of bound-DHS, and confirmed that the mutation rate is still higher in the bound TFBS. The background mutation rates of each group are represented as black lines. The zero coordinate in the x axis corresponds to the TFBS mid-point, and the magenta line above it represents the average size of TFBS.
Extended Data Figure 2 Elevated mutation rate at the binding sites of individual transcription factors in melanoma.
Here, we show the mutation rate of the TFBS of all TFs with at least 1,000 binding sites overlapping melanocytes DHS. The observed mutation rate is shown in red (light colour in the background corresponds to the actual data points, and the thick solid line on top is the best-fit spline), while the background mutation rate is represented by the black line. The zero coordinate in the x axis corresponds to the TFBS mid-point, and the magenta line above it represents the average size of TFBS.
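The position-wise profiles described in these captions can be illustrated with a minimal sketch. It assumes sites are given as `(chrom, start, end)` tuples and mutations as `(chrom, pos)` tuples; the function name `mutation_profile` and the `flank` parameter are hypothetical, and the spline smoothing and background model mentioned in the caption are not reproduced here.

```python
from collections import Counter


def mutation_profile(sites, mutations, flank=10):
    """Average number of mutations per site at each offset from the site mid-point.

    sites:     list of (chrom, start, end) tuples (assumed input layout)
    mutations: list of (chrom, pos) tuples (assumed input layout)
    flank:     number of positions to report on each side of the mid-point
    """
    # Group mutation positions by chromosome to avoid cross-chromosome matches.
    by_chrom = {}
    for chrom, pos in mutations:
        by_chrom.setdefault(chrom, []).append(pos)

    counts = Counter()
    for chrom, start, end in sites:
        mid = (start + end) // 2  # the zero coordinate of the profile
        for pos in by_chrom.get(chrom, []):
            offset = pos - mid
            if -flank <= offset <= flank:
                counts[offset] += 1

    n_sites = len(sites)
    # Normalise by the number of sites so profiles of different groups are comparable.
    return {off: counts[off] / n_sites for off in range(-flank, flank + 1)}
```

The nested loop is quadratic and only suitable for small inputs; a genome-scale analysis would more plausibly use an interval tool such as BEDTools (cited in the reference list) to intersect mutations with sites first.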
a–c, Mutation rate centred on DHS sites in melanomas is shown for all DHS genome-wide (a), a subset of DHS regions overlapping promoters (2.5 kb from TSS) (b) and DHS regions outside promoters (c). Within b and c, the first row shows the mutation rate in regions that do not contain sequences of any overlapping TFBS (noTFBS), neither predicted TFBS (from PIQ31, corresponding to 1,284 different motifs) nor known TFBS (mapped from ENCODE28 ChIP-seq analysis, corresponding to 109 TFs). The second row contains only predicted TFBS (predTFBS), removing any sequences that overlap the known TFBS. The third row contains the subset of sequences that overlap with all predicted TFBS, without removing the known ones (predTFBSall). The last row contains the subset of sequences with known TFBS. The barplot at the right of each panel compares the mutation rate in the DHS and the flank for each group of regions, and the P value (from a chi-square test) indicates the enrichment of mutation rate between the two groups. The increase in predicted TFBS is, as expected, lower than that observed within the TFBS mapped by ENCODE (DHS-promoter-TFBS), reflecting the lower precision in the mapping of the predictions compared to mapping by ChIP-seq. The zero coordinate in the x axis corresponds to the DHS peak mid-point, and the magenta line above it represents the average size of DHS (~150 nt).
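As a rough illustration of the chi-square comparison between the DHS and its flank, the sketch below tests a 2×2 table of mutated versus non-mutated positions in two regions. The table layout, the function name `chi_square_2x2` and the absence of a continuity correction are assumptions made for this example; the caption does not specify how the test was set up.

```python
import math


def chi_square_2x2(mut_a, total_a, mut_b, total_b):
    """Pearson chi-square test (1 degree of freedom, no continuity correction).

    mut_a / total_a: mutated positions and total positions in region A (e.g. DHS)
    mut_b / total_b: mutated positions and total positions in region B (e.g. flank)
    Returns (chi-square statistic, P value).
    Assumes both column totals are non-zero (at least one mutated and one
    non-mutated position overall); otherwise an expected count would be zero.
    """
    observed = [
        [mut_a, total_a - mut_a],
        [mut_b, total_b - mut_b],
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    n = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

For example, `chi_square_2x2(30, 1000, 10, 1000)` compares 30 mutated positions out of 1,000 in one region with 10 out of 1,000 in the other.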
a, Mutation rate around TFBS plotted alongside the average repair of two types of UV-light-induced DNA damage (CPD and 6–4PP) in the wild-type NHF1 skin fibroblast cell line and the CS-B mutant cell line, for proximal (left column) and distal (right column) TFBS. b, A lower level of nucleotide excision repair is also observed at the binding sites of individual transcription factors; the results for CTCF, ETS1, IRF1 and TAF1 are shown as examples. In both a and b, the observed mutation rate is shown in red (light colour in the background corresponds to the actual data points, and the thick solid line on top is the best-fit spline). The two top rows show the CPD repair on NHF1 and CS-B cells, respectively, and the two bottom rows show the 6–4PP repair on NHF1 and CS-B cells, respectively. Here the average repair levels are shown separately for the forward and reverse strands of the genome (as obtained from ref. 6).
Extended Data Figure 5 The level of nucleotide excision repair, and the resulting mutation rate in TFBS correlate with the strength of the binding signal of transcription factors to their sites.
Regions around TFBS sites were obtained from ref. 17. As in ref. 17, the binding sites were classified into four quartiles (low to high) using the ChIP-seq read coverage that reflects the strength of binding or occupancy. The binding sites in the ‘high’ quartile (fourth column) tend to bear higher mutation rates at the centre (correlating with lower repair) compared to the ‘low’ quartile (first column). The nucleotide excision repairs of two photoproducts (CPD and 6–4PP) shown here are from NHF1 wild-type cell line. Average repair levels are shown separately for the forward and reverse strands of the genome (as obtained from ref. 6).
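The quartile classification by ChIP-seq read coverage can be sketched as a simple rank-based split. This is an assumption about the procedure, not the method of ref. 17: ties are broken by input order, and the function name `assign_quartiles` is hypothetical.

```python
def assign_quartiles(coverage):
    """Label each binding site with a quartile from 1 (low) to 4 (high).

    coverage: list of ChIP-seq read coverage values, one per binding site.
    Sites are ranked by coverage and the ranks are cut into four
    (near-)equal-sized groups.
    """
    n = len(coverage)
    # Indices of the sites ordered from lowest to highest coverage.
    order = sorted(range(n), key=lambda i: coverage[i])
    labels = [0] * n
    for rank, idx in enumerate(order):
        # rank * 4 // n maps ranks 0..n-1 onto 0..3.
        labels[idx] = rank * 4 // n + 1
    return labels
```

Mutation and repair profiles would then be computed separately for the sites in each of the four groups.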
a–c, The distribution of nucleotide excision repair, for the two types of UV-light-induced DNA damage, is shown for all DHS genome-wide (a), DHS regions overlapping promoters (2.5 kb from TSS) (b) and DHS regions outside promoters (c). Within b and c the first column shows the mutation rate in regions that do not contain sequences of any overlapping TFBS (noTFBS), neither predicted TFBS (from PIQ31, corresponding to 1,284 different motifs) nor known TFBS (mapped from ENCODE28 ChIP-seq analysis, corresponding to 109 TFs). The second column contains only predicted TFBS (predTFBS), removing any sequences that overlap the known TFBS. The third column contains the subset of sequences that overlap with all predicted TFBS, without removing the known ones (predTFBSall). The last column contains the subset of sequences with known TFBS. The two top rows in a, b and c show the CPD repair on NHF1 and CS-B cells, respectively, and the two bottom rows show the 6–4PP repair on NHF1 and CS-B cells, respectively. Here average repair levels are shown separately for the forward and reverse strands of the genome (as obtained from ref. 6). The zero coordinate in the x axis corresponds to the DHS peak mid-point, and the magenta line above it represents the average size of DHS (~150 nt).
To carry out this analysis, TFBS overlapping transcribed regions (located 200–500 bp downstream of TSS) were centred at the TFBS mid-point. We plotted the mutation rate and the repair rate of UV-induced damage (CPD and 6–4PP) in XP-C cells, which possess only transcription-coupled repair capability. TFBS in either strand were separated: those in the template strand of the gene are shown in the left panel, while those in the non-template strand are presented in the right panel. All TFBS and their flanking regions are shown in the same orientation (5′ to 3′). This result shows that TF binding to both strands results in lower transcription-coupled NER activity.
Mutation rates around TFBS of promoter regions of lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), and colorectal cancer (CRC) are shown. CRC samples are separated into two groups: those with missense mutations of the DNA polymerase epsilon (POL-E) gene (hypermutated; n = 8 samples) and the rest (hypomutated; n = 34 samples). In the left column, the mutation rate is shown for active TFBS that overlap DHS sites (red line) and inactive TFBS that do not overlap DHS (green line). The right column graphs present the mutation rate of six different changes separately in active TFBS. In lung cancers (LUAD and LUSC), C > A changes, caused by tobacco carcinogens, contribute more to the elevated mutation rate, which indicates that NER activity is lower at these active TFBS.
Overrepresentation of mutations at TFBS as compared to their immediate flanking regions for different cancer types and mutational signatures. The mutational process/signatures specific to each cancer type are defined as in ref. 36: UV-light associated signature (C>T) in melanoma (SKCM), tobacco smoking associated signature (C>A) in lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC), mutated POL-E associated signatures (T(C>A)T, T(C>T)G) in colorectal samples, and APOBEC associated mutational signature (T(C>G)T, T(C>G)A, T(C>T)T, T(C>T)A) in breast (BRCA), bladder (BLCA) and head and neck squamous cell carcinomas (HNSC). Mutations in each sample that do not follow the aforementioned mutational signatures are grouped into one class (referred to as ‘others’) for each cancer type. The log2 fold change in the x axis represents how much higher (positive fold change) or lower (negative fold change) the observed mutation rate in TFBS is than expected; the corresponding significance value (derived from a chi-square test) is shown on the y axis for each cancer type-signature combination. These results show that the only tumour samples with mutations clearly overrepresented at TFBS are lung carcinomas and melanomas. In both cases it is the predominant mutational signature, induced by the external mutagenic agent (UV-caused C > T mutations in melanomas, and tobacco-caused C > A mutations in lung carcinomas), which originally causes bulky lesions in the DNA that are repaired by NER. In contrast, no increment of the mutation rate in TFBS is observed in colon adenocarcinomas, where NER activity is not expected to play a major role in the mutational process, and only a modest increment is detected in other tumour types.
Note that given the small number of whole-genome samples available and the lower mutational burden of breast, bladder and head and neck tumours compared to melanomas, lung carcinomas and colorectal tumours (Extended Data Table 1), the results for these tumour types should be taken with caution. Future analyses with larger cohorts of whole genomes, which would also allow a more accurate and specific separation of mutations by mutational processes, should shed clearer light on this question.
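The log2 fold change plotted on the x axis can be illustrated with a short sketch. It assumes that the expected number of mutations in TFBS is the flanking-region mutation rate scaled to the total TFBS length; how the expectation was actually derived is not stated in this caption, and the function name `log2_fold_change` and its parameters are hypothetical.

```python
import math


def log2_fold_change(obs_tfbs, len_tfbs, obs_flank, len_flank):
    """log2 ratio of observed to expected mutations in TFBS.

    obs_tfbs / len_tfbs:   mutation count and total length (nt) of the TFBS
    obs_flank / len_flank: mutation count and total length (nt) of the flanks
    Assumption: expected count = flank mutation rate x TFBS length.
    Requires non-zero counts and lengths.
    """
    expected = obs_flank / len_flank * len_tfbs
    return math.log2(obs_tfbs / expected)
```

A positive value means more mutations in TFBS than the flanks would predict, and a negative value means fewer.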
This file contains the results of mutation rate enrichment at the binding sites of individual TFs in melanoma. (XLSX 16 kb)
This file contains the results of sample-wise analysis of mutation rate enrichment at the active TFBS for 38 melanoma samples and one normal skin sample. (XLSX 9 kb)
About this article
Cite this article
Sabarinathan, R., Mularoni, L., Deu-Pons, J. et al. Nucleotide excision repair is impaired by binding of transcription factors to DNA. Nature 532, 264–267 (2016). https://doi.org/10.1038/nature17661