Severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), the causative agent of coronavirus disease 2019 (COVID-19), emerged in late 2019 and has since caused a pandemic. Although there have been extensive studies worldwide, our understanding of this newly emerged pathogen is far from sufficient. The pathogenesis of the SARS-CoV-2 infection is not fully understood, although a “two-stage” hypothesis was proposed in our previous study.1
As an enveloped virus of the family Coronaviridae, SARS-CoV-2 uses a densely glycosylated spike (S) protein to gain entry into host cells. The S protein is a trimeric class I transmembrane protein composed of two functional subunits. The S1 subunit binds to cellular angiotensin-converting enzyme 2 (ACE2) for host cell recognition, and the S2 subunit functions in viral–host cell membrane fusion. The S protein is the most attractive immunogen for eliciting antibody responses and is therefore the primary focus of neutralizing antibody and vaccine development.
The glycosylation of viral envelope proteins has a wide range of functions, including regulating viral tropism, maintaining protein stability and shielding the underlying epitopes from immune surveillance. Thus, a full understanding of the glycosylation of the SARS-CoV-2 S protein is critical for revealing the pathogenesis of the virus and for guiding the design of therapeutic and prophylactic strategies. A total of 22 N-glycosites were mapped in the in vitro-expressed S protein ectodomain and in the S protein extracted from virions.2 Owing to technical limitations, only a few O-glycosylation modifications have been confirmed on purified S protein,2,3,4,5 and none has been reported on the S protein extracted from SARS-CoV-2 virions, which is the most representative antigen on virions.
To obtain a comprehensive N- and O-linked glycosylation landscape of the S protein in its native state, including glycosites, glycoforms and their relative intensities, we extracted S protein from SARS-CoV-2 virions and purified recombinant full-length wild-type (WT) S protein expressed in human embryonic kidney 293T cells (Supplementary information, Fig. S1a, b). To generate glycopeptides and maximize coverage of the protein sequence, the S protein was digested separately with chymotrypsin, α-lytic endopeptidase or LysC-trypsin. The glycopeptides were analyzed using nano liquid chromatography (nLC) coupled to an ultra-high-resolution Orbitrap Eclipse Tribrid mass spectrometer, with stepped collisional energy (SCE) HCD and HCDpdEThcD applied for fragmentation. The data were processed with Byonic (v3.8.13, Protein Metrics Inc., Cupertino) and Byologic glyco-analysis software (v3.8-11 ×64, Protein Metrics Inc., Cupertino).
We applied multiple approaches specifically for O-glycosylation analysis and site confirmation. First, after protease digestion, N-glycans were removed by an additional treatment with PNGase F in 18O-labeled water, in which deamidation of asparagine (Asn) yields an aspartic acid residue with a mass shift of +2.98 Da. This shift discriminated modified N-glycosites from unoccupied Asn or glutamine (Gln) and thus excluded interference from N-glycosylation in the identification of O-glycosylation. Second, we searched for N- and O-linked glycans simultaneously in the same samples. To conclusively rule out artifactual assignment of an N-linked glycan to a nearby Ser (S) or Thr (T), we used four criteria for the characterization of O-glycopeptides: (1) the MS/MS spectra contain glycans or oxonium ions (i.e., feature B ions); (2) the MS/MS spectra contain feature Y ions; (3) the isotope distribution of the precursor is reasonable; and (4) the retention times of glycopeptides and non-glycopeptides are comparable.
We performed extensive manual validation to select valid O-glycopeptides for site-specific analysis.6 The presence of diagnostic ions was a required criterion for O-glycosite determination.
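The +2.98 Da shift quoted for PNGase F treatment in 18O water can be checked with simple arithmetic. The sketch below is illustrative only and is not part of the authors' workflow: it assumes standard monoisotopic masses, and the function name `deamidation_shift` is hypothetical. Converting the Asn side-chain amide to a carboxylic acid loses one N and one H and gains one O from the solvent, so the shift depends on which oxygen isotope the water supplies.

```python
# Standard monoisotopic masses in Da, rounded to six decimals (assumed values,
# not taken from the article).
MASS = {"H": 1.007825, "N": 14.003074, "O16": 15.994915, "O18": 17.999160}


def deamidation_shift(oxygen: str = "O16") -> float:
    """Mass change when an Asn side chain (-CONH2) becomes Asp (-COOH).

    Net change: lose NH2 and gain OH, i.e. minus N, minus H, plus one O
    supplied by the solvent water (16O or 18O).
    """
    return MASS[oxygen] - MASS["N"] - MASS["H"]


shift_h2o = deamidation_shift("O16")     # ordinary water
shift_h2_18o = deamidation_shift("O18")  # 18O-labeled water

print(f"Deamidation in H2(16)O: +{shift_h2o:.3f} Da")
print(f"Deamidation in H2(18)O: +{shift_h2_18o:.3f} Da")
```

The roughly 2 Da gap between the two cases is what lets enzymatic deglycosylation in 18O water be told apart from the ordinary +0.98 Da deamidation.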
Consistent with the previous study, we confirmed 22 N-glycosites (Fig. 1a; Supplementary information, Fig. S2 and Dataset S1).7 For the first time, a total of 17 O-glycosites were identified on the S protein extracted from SARS-CoV-2 virions (Fig. 1a; Supplementary information, Fig. S3, Table S1 and Dataset S1), among which 14 sites were determined with diagnostic ions (Fig. 1a; Supplementary information, Fig. S3 and Table S1). The O-glycoforms of the S protein extracted from virions are diverse and differ sharply from the reported glycoforms of purified S protein (Fig. 1b).5 We found that O-glycosylation occurred in clusters on the S protein. The S1 domain was more heavily O-glycosylated, with 11 sites, while the remaining 6 sites were detected at the N-terminus of the S2 domain (Fig. 1a). Interestingly, 11 out of the 17 identified O-glycosites were located near glycosylated Asn, including S60, T124, S151, T236, T604/S605, T618, S659, T1076, T1077, S1097 and T1100 (Fig. 1a, c; Supplementary information, Fig. S3 and Table S1). The glycopeptide containing the T604 and S605 sites was well characterized; however, we were not able to determine the exact glycosylation site owing to a lack of diagnostic ions. Therefore, we counted T604/S605 as one O-glycosite.
To further investigate the interplay between N- and O-linked glycosylation, we defined the three amino acids on each side of the glycosylated Asn within the consensus motif NxS/T (x is not proline (P)) as the “position associated to N-sequon” (named N ± 1–3). There are 35 S/T residues within positions associated to N-sequon; 11 of them were O-glycosylated, and the exact site was determined for 10 of these. It is intriguing that 7 out of the 10 sites (70%) were located at the N + 2 position, which is part of the consensus motif of N-glycosylation (Fig. 1d). All the identified N-glycosites and O-glycosites associated to N-sequon were mapped on the surface of the S protein based on the cryo-EM structure of the trimeric SARS-CoV-2 S protein (Protein Data Bank (PDB) ID 6XR8) (Fig. 1e).
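The N ± 1–3 definition can be made concrete with a short sequence scan. This is a minimal sketch under stated assumptions, not the authors' analysis code: the function name `sequon_neighbours` and the toy peptide are hypothetical, and the scan finds every NxS/T (x ≠ P) sequon in a sequence, whereas the study considers only Asn residues experimentally shown to be glycosylated, so real use would filter the result by occupancy data.

```python
import re

# N-x-S/T with x != P. The lookahead keeps the match one residue long,
# so overlapping sequons are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def sequon_neighbours(seq: str, window: int = 3) -> dict:
    """Map each sequon Asn (1-based position) to the S/T residues within
    +/- `window` residues, as (offset, residue, 1-based position) tuples."""
    result = {}
    for match in SEQUON.finditer(seq):
        asn = match.start()  # 0-based index of the sequon Asn
        hits = []
        for offset in range(-window, window + 1):
            idx = asn + offset
            if offset != 0 and 0 <= idx < len(seq) and seq[idx] in "ST":
                hits.append((offset, seq[idx], idx + 1))
        result[asn + 1] = hits
    return result


# Hypothetical toy peptide: N4 starts a sequon (N-G-T); N10 does not (N-P-S).
toy = "AASNGTAAPNPSA"
neighbours = sequon_neighbours(toy)
for asn_pos, hits in neighbours.items():
    for offset, residue, pos in hits:
        print(f"N{asn_pos}: {residue}{pos} at N{offset:+d}")
```

In this toy case the scan reports S3 at N-1 and T6 at N+2 for N4, and skips N10 because proline occupies the x position. A residue lying near two sequons would appear under both, so counting unique candidate S/T positions needs a deduplication step.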
To further validate the phenomenon that N- and O-linked glycosylation occur together in N-sequon-associated positions, we carried out site-directed mutagenesis. An N-to-Q mutation was generated at the N-linked glycosite N616 of purified full-length WT S protein. The O-glycosite T618 was analyzed together with the deamidated N616 (Supplementary information, Fig. S4). Mutation of N616 completely abolished the O-glycosylation on T618 (Fig. 1f), indicating that the presence of glycosylated Asn is a prerequisite for O-glycosylation associated to N-sequon.
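At the sequence level, the logic of the N-to-Q experiment can be sketched as follows. The peptide and the helper names (`point_mutant`, `sequon_positions`) are hypothetical illustrations, not the spike sequence or the authors' code. The sketch shows only that the substitution removes the N-glycosylation sequon while leaving the downstream Thr in place; the loss of O-glycosylation on that Thr is the experimental observation reported above and cannot be derived from the sequence alone.

```python
import re

# N-x-S/T with x != P; lookahead so only the Asn itself is consumed.
SEQUON = re.compile(r"N(?=[^P][ST])")


def point_mutant(seq: str, pos: int, old: str, new: str) -> str:
    """Return `seq` with the residue at 1-based `pos` changed from `old` to `new`."""
    if seq[pos - 1] != old:
        raise ValueError(f"expected {old} at position {pos}, found {seq[pos - 1]}")
    return seq[: pos - 1] + new + seq[pos:]


def sequon_positions(seq: str) -> list:
    """1-based positions of Asn residues that start an N-x-S/T (x != P) sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq)]


# Hypothetical toy peptide: sequon Asn at position 4, Thr at N+2 (position 6).
wild_type = "AAVNCTEVA"
mutant = point_mutant(wild_type, 4, "N", "Q")

print(wild_type, sequon_positions(wild_type))
print(mutant, sequon_positions(mutant))
```

The Thr at position 6 is unchanged in the mutant, which is why any change in its O-glycosylation can be attributed to the loss of the neighbouring N-glycosite rather than to loss of the acceptor residue.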
Based on the observations above, we propose an “O-Follow-N” rule, whereby O-glycosylation occurs near the glycosylated Asn in the N-sequon. This rule may also apply to other proteins and facilitate the identification of O-glycosites (Fig. 1g). It has been reported that GalNAc-transferases (GalNAc-Ts), which mediate the initiation of mucin-type O-glycosylation, contain a lectin-like domain that binds glycans.8 We reason that GalNAc-Ts may recognize glycans on the N-glycosites and thereby catalyze O-glycosylation near the N-glycosites.
In summary, we conducted a site-specific glycosylation analysis and comprehensively profiled the N- and O-linked glycosylation of the S protein, both extracted from virions and expressed in vitro. To our knowledge, this is the first and largest glycosylation dataset of S protein directly extracted from SARS-CoV-2 virions to date, and it broadens our understanding of the glycosylation of the SARS-CoV-2 S protein. In this context, we observed a unique pattern in which O-glycosites were located near glycosylated Asn in the N-sequon. This phenomenon was also observed by Sanda et al. on purified S protein and by other research groups on other proteins,4,9,10 but no rule had been proposed because of the limited number of identified sites.
The observation reported here of O-glycosites in close proximity to N-glycosylation suggests a possible “O-Follow-N” rule. It has long been known that N-glycosylation occurs in the NxS/T (x is not proline (P)) consensus motif; however, it is not clear whether the S/T after the glycosylated Asn is prone to O-glycosylation. If this is the case, it would be interesting to explore whether the glycosylation of S/T depends on the nearby N-glycosylation. The “O-Follow-N” rule proposed in this study sheds light on potential new mechanisms of O-glycosylation, especially the synergies between N- and O-glycosylation, and should benefit fundamental glycobiology studies.
The mass spectrometry raw files have been deposited in the MassIVE proteomics database under the accession number PXD023346.
1. Tian, W. et al. Nat. Commun. 11, 5859 (2020).
2. Watanabe, Y., Allen, J. D., Wrapp, D., McLellan, J. S. & Crispin, M. Science 369, 330–333 (2020).
3. Shajahan, A., Supekar, N. T., Gleinich, A. S. & Azadi, P. Glycobiology 30, 981–988 (2020).
4. Sanda, M., Morrison, L. & Goldman, R. Anal. Chem. 93, 2003–2009 (2021).
5. Bagdonaite, I. et al. Viruses 13, 551 (2021).
6. Xu, T., Wong, C. C., Kashina, A. & Yates, J. R. 3rd. Nat. Protoc. 4, 325–332 (2009).
7. Yao, H. et al. Cell 183, 730–738.e13 (2020).
8. Imberty, A., Piller, V., Piller, F. & Breton, C. Protein Eng. 10, 1353–1356 (1997).
9. Riethmueller, S. et al. PLoS Biol. 15, e2000080 (2017).
10. Chandrasekhar, K. D. et al. J. Physiol. 589, 3721–3730 (2011).
We would like to thank Min Huang, Yue Zhou, Guanbo Wang and Minjie Tan for their assistance in mass spectrometry experiments. We would also like to thank Thermo Fisher Scientific for their support in collaboration. We would like to thank Shuaixin Gao, Yu Hu, Chao Su and Qihui Wang for their assistance in S protein purification. We thank Hui Wang and Beijing Institute of Biological Products Co., Ltd. for providing SARS-CoV-2 virions. We are grateful to Xuefei Yin (PMI Inc.), Yan Zhang, Wenfeng Zeng and Mingliang Ye for their assistance in the use of the software. We thank Yan Zhang and Xin Chen for their insightful discussion. This work was supported by the Fundamental Research Funds for the Central Universities (BMU2017YJ003, BMU2018XTZ002), the Research Funds from Health@InnoHK Program launched by the Innovation Technology Commission of the Hong Kong Special Administrative Region, the PKU-Baidu Fund (2019BD007), the Training Program of the Big Science Strategy Plan (2020YFE0202200), the National Program on Key Research Projects of China (2016YFD0500301), and the Tianjin Synthetic Biotechnology Innovation Capacity Improvement Project (TSBICIP-KJGG-004).
The authors declare no competing interests.
Cite this article
Tian, W., Li, D., Zhang, N. et al. O-glycosylation pattern of the SARS-CoV-2 spike protein reveals an “O-Follow-N” rule. Cell Res 31, 1123–1125 (2021). https://doi.org/10.1038/s41422-021-00545-2