arising from S. Flaxman et al. Nature https://doi.org/10.1038/s41586-020-2405-7 (2020)
Flaxman et al.1 took on the challenge of estimating the effectiveness of five categories of non-pharmaceutical intervention (NPI) on the spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2): social distancing encouraged, self-isolation, school closures, public events banned, and complete lockdown. On the basis of mortality data collected between January and early May 2020, they concluded that only one of these, the lockdown, had been effective in 10 out of the 11 European countries that were studied. However, here we use simulations with the original model code to suggest that the conclusions of Flaxman et al. regarding the effectiveness of individual NPIs are not justified. Although the NPIs that were considered have indisputably contributed to reducing the spread of the virus, our analysis indicates that the individual effectiveness of these NPIs cannot be reliably quantified.
Flaxman et al.1 presented a method to estimate the effects of NPIs on the time-varying reproduction number (Rt) of SARS-CoV-2 infection. Data from 11 European countries were pooled under the assumption that the effects of NPIs on Rt are not country-specific: the factor by which a particular NPI changes Rt was assumed to be the same in every country in which the NPI was implemented.
Some country-specific flexibility was, however, provided through the basic reproduction number (R0) being country-specific. More notably, additional flexibility was introduced by ascribing a country-specific effect to the NPI that was introduced last in each country. This replaced the parameterization in a preprint version (Imperial College Report 13)2, in which a country-specific effect was instead assigned to the lockdown NPI.
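To make the pooling structure concrete, the sketch below evaluates a multiplicative parameterization of the kind described above: a country-specific R0, pooled per-NPI effects shared by all countries, and an extra country-specific term applied to whichever NPI was implemented last. This is a schematic reading of the description in this comment, not code from the covid19model repository3; the helper name `rt_path`, the exponential link, the day indices and all numerical values are illustrative assumptions.

```python
import math


def rt_path(r0, pooled_alpha, npi_start, last_alpha, n_days):
    """Hypothetical helper: R_t for days 0..n_days-1 in one country.

    R_t = R0 * exp(-sum_k alpha_k * I_k(t) - alpha_last * I_last(t)),
    where I_k(t) = 1 once NPI k is in force. The pooled alpha_k are the same
    in every country; r0 and last_alpha are country-specific.
    """
    # The "last" NPI is the one with the latest start day (ties: first listed).
    last_npi = max(npi_start, key=npi_start.get)
    path = []
    for t in range(n_days):
        exponent = 0.0
        for npi, start in npi_start.items():
            if t >= start:
                exponent += pooled_alpha[npi]
                if npi == last_npi:
                    # Country-specific adjustment, only for the last NPI.
                    exponent += last_alpha
        path.append(r0 * math.exp(-exponent))
    return path


# Illustrative values only (not estimates from the paper).
pooled = {"public_events_banned": 0.02, "lockdown": 1.7}
starts = {"public_events_banned": 10, "lockdown": 14}
rt = rt_path(r0=3.0, pooled_alpha=pooled, npi_start=starts, last_alpha=0.1, n_days=30)
```

With these made-up values, Rt drops only slightly on day 10 and falls below 1 from day 14, where the pooled lockdown effect and the country-specific last-NPI term act together. Because the last-NPI term is free per country, it can absorb whatever drop the data require at that date.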
Our criticism concerns the final published version of the model1,3. Previous iterations of the model are not explicitly considered, but we reference them for two purposes: (1) to demonstrate the sensitivity of the final published model to subtle and realistic alterations in parameter values; and (2) to illustrate that some modelling choices appear to have no motivation other than to introduce flexibility, which masks sensitivity issues in the fundamental structure of the model. As detailed below, we believe the core problem is that the death data are not descriptive enough to support the conclusions of Flaxman et al., which were based on simulation results obtained using an over-flexible model.
Of the 11 modelled countries, Sweden is worthy of particular attention, given that it was the only country in which no lockdown took place. As we have previously shown4, the estimated effects of NPIs change markedly when the model is not allowed to give the Swedish data the special treatment that the country-specific last-NPI parameter enables. This parameter is needed to explain the decrease in Rt indicated by the Swedish death data, and to provide a good model fit despite the absence of a lockdown in Sweden.
Figure 1 shows the outcome for Sweden when executing the model3,5 either with (Fig. 1a) or without (Fig. 1b) the last NPI adjustment in place. With the last NPI adjustment in place, the public events ban results in a mean reduction of Rt of 71% (95% credible interval: 59–81%) in Sweden, which contrasts with the negligible effect of the public events ban in the other 10 countries (less than 2% mean reduction of Rt and less than 15% with 95% credibility). Notably, the estimated effectiveness of the public events ban in Sweden is comparable to that of lockdown in the 10 countries in which one was implemented. As lockdown was the last intervention in most countries, its estimated effect comprises a pooled effect (82% mean reduction of Rt) and a separate country-specific ‘last NPI’ effect (mean change in Rt of between −24% and 18% for the countries considered).
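Because both effects act multiplicatively on Rt, the pooled lockdown reduction and the country-specific last-NPI change combine as a product of factors. The arithmetic below plugs in the mean values quoted above; this is only a rough guide, since the posterior mean of a product is not in general the product of the posterior means, and the function name `combined_reduction` is our own.

```python
def combined_reduction(pooled_reduction, last_npi_change):
    """Total fractional reduction in R_t from a pooled effect and a last-NPI change.

    pooled_reduction: e.g. 0.82 for an 82% reduction (R_t multiplied by 0.18).
    last_npi_change: relative change in R_t, e.g. -0.24 for -24%, 0.18 for +18%.
    """
    factor = (1.0 - pooled_reduction) * (1.0 + last_npi_change)
    return 1.0 - factor


for change in (-0.24, 0.18):
    total = combined_reduction(0.82, change)
    print(f"last-NPI change {change:+.0%}: total reduction {total:.1%}")
```

Over the quoted range of last-NPI changes, the combined lockdown effect thus spans roughly 79% to 86% reduction of Rt.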
The result above, namely that the public events ban in Sweden and the lockdown in the 10 other European countries were comparably effective, was not addressed by Flaxman et al. This is noteworthy, as the result undermines the conclusion that lockdown was especially effective. Furthermore, had the last-intervention parameter not been introduced after the publication of the preprint2, the inconsistency would have been readily visible in the reported plots (Fig. 1b).
It seems unlikely to be a coincidence that lockdown was implemented in the 10 countries in which it had a large effect on Rt, and omitted in the single country in which the public events ban instead had a similar effect (sufficient to drive Rt below 1). An alternative hypothesis is that the infection-to-death distribution used by the model, combined with the death data that were available by early May, causes the model to ascribe almost all of the reduction in Rt to the last intervention implemented in each country. This hypothesis is supported by executing the model code3,5 with different interventions defined as having occurred last in the country in which no lockdown occurred (Sweden), as shown in Fig. 2.
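One ingredient of this hypothesis is the delay between infection and death. Expected deaths on day t can be written as a discrete convolution of earlier infections with the infection-to-death distribution, scaled by an infection fatality ratio. The sketch below uses a made-up, truncated delay distribution and a constant fatality ratio (neither is the one used by Flaxman et al.), and the helper `expected_deaths` is hypothetical. It shows that two infection curves that agree until day t0 yield identical expected deaths until day t0 plus the shortest possible delay, so the death data carry no information about a change in transmission during that window.

```python
def expected_deaths(infections, delay_pmf, ifr):
    """d_t = ifr * sum_{s <= t} infections[s] * delay_pmf[t - s].

    delay_pmf[k] is the (assumed) probability that death occurs k days after
    infection; lags beyond the end of delay_pmf are treated as zero.
    """
    deaths = []
    for t in range(len(infections)):
        total = 0.0
        for s in range(t + 1):
            lag = t - s
            if lag < len(delay_pmf):
                total += infections[s] * delay_pmf[lag]
        deaths.append(ifr * total)
    return deaths


# Illustrative delay: no deaths within 10 days of infection, then uniform
# mass over days 10-29 (an assumption for illustration only).
min_lag = 10
delay = [0.0] * min_lag + [1.0 / 20] * 20

# Two infection curves, identical before day t0 and diverging from day t0,
# e.g. because an intervention on day t0 did or did not take effect.
t0 = 30
growing = [100 * 1.1 ** t for t in range(60)]
curbed = growing[:t0] + [growing[t0 - 1] * 0.9 ** (t - t0 + 1) for t in range(t0, 60)]

d_growing = expected_deaths(growing, delay, ifr=0.01)
d_curbed = expected_deaths(curbed, delay, ifr=0.01)
```

In this toy setting the two death curves coincide up to day 39 and only separate from day 40 onward; with limited data after the final interventions, the model has correspondingly little information with which to attribute the drop to any particular one of them.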
Exchanging the last intervention for a different one is not merely of theoretical interest. For example, it is hard to judge whether transitioning to online teaching at high-school and university levels, while keeping elementary schools and preschools open, constitutes a school closure. Similarly, the crowd-size limit associated with the public events ban NPI is a parameter to be decided by the modeller. Early versions of the model defined the public events ban as having taken place in Sweden on 12 March 2020, when gatherings exceeding 500 persons were prohibited. This was later changed to 29 March 2020, when gatherings exceeding 50 persons were prohibited. These subtle changes in definition determine which NPI (school closure, public events ban or social distancing encouraged) was the last to be implemented in Sweden. In each case, the model uses the last intervention to explain most of the drop in Rt to below 1, which is needed to stay consistent with the decrease in reported deaths.
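The dependence of the "last NPI" on these definitional choices can be shown in a few lines. Apart from the two public events ban dates quoted above (12 and 29 March 2020), the dates below are placeholders chosen for illustration and are not taken from the model's data files; the helper `last_npi` is hypothetical.

```python
from datetime import date


def last_npi(npi_dates):
    """Return the name of the NPI with the latest implementation date."""
    return max(npi_dates, key=npi_dates.get)


# Placeholder dates (illustrative only, NOT the model's input data).
base = {
    "social_distancing_encouraged": date(2020, 3, 16),
    "school_closures": date(2020, 3, 18),
}

# Public events ban dates as quoted in the text.
early_definition = {**base, "public_events_banned": date(2020, 3, 12)}  # >500 persons
late_definition = {**base, "public_events_banned": date(2020, 3, 29)}   # >50 persons

# Treating online teaching as "no school closure" removes that NPI entirely.
no_school_closure = {k: v for k, v in early_definition.items() if k != "school_closures"}

print(last_npi(early_definition))   # school_closures
print(last_npi(late_definition))    # public_events_banned
print(last_npi(no_school_closure))  # social_distancing_encouraged
```

Each of the three plausible definitions hands the country-specific last-NPI effect to a different intervention.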
As mentioned above, our analyses were conducted using the original model implementation3,5 referenced from the final published paper1, and we have considered the definitions of NPIs reported in the preceding versions of the model1,2,3 solely to highlight how small and plausible perturbations of these definitions can result in a lack of practical identifiability, in the statistical sense. Identifiability issues have to some extent been acknowledged by the authors; Flaxman et al. state that “The close spacing of interventions in time [...] means that the individual effects of the other interventions are not identifiable”1. However, this is overshadowed by the subsequent presentation of credible intervals for the effects of the different NPIs, and the claim that “Lockdown has an identifiable large effect on transmission (81% (75–87%) reduction)”1. We believe that the basis of this claim is unclear. As seen in the supplementary videos of the Nature article1, the credible intervals narrow as more data become available, further hiding the identifiability problems of the underlying model and potentially giving the results a false sense of reliability.
Our point here is not to argue whether or not a school closure took place in Sweden, or what the most appropriate crowd-size limit is. Instead, our findings highlight that the model presented by Flaxman et al. is very sensitive to reasonable, minor changes in the input data. As indicated by our simulation examples, and further supported by our previous analyses4, there is a fundamental problem with the identifiability of the effectiveness of individual NPIs, including the lockdown. This problem is caused by the close temporal spacing between the implementation of these NPIs throughout Europe. In particular, we note in relation to the lockdown NPI that an estimated value that is considerably larger than zero should not be confused with statistical identifiability of the corresponding parameter.
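A simple way to see why closely spaced NPIs are hard to separate is to look at the collinearity of their indicator functions, which enter the model for Rt as covariates. The sketch below computes the Pearson correlation between two step indicators over a hypothetical 60-day window, for NPIs introduced a few days apart. It is a simplified diagnostic on the Rt covariates only; it ignores the infection-to-death delay, which would be expected to blur the distinction further. The helpers `step` and `pearson` are our own.

```python
import math


def step(start, n_days):
    """Indicator series: 0 before day `start`, 1 from day `start` onwards."""
    return [1.0 if t >= start else 0.0 for t in range(n_days)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


n_days = 60
for gap in (3, 7, 14):
    r = pearson(step(20, n_days), step(20 + gap, n_days))
    print(f"NPIs {gap:2d} days apart: correlation {r:.2f}")
```

In this setting the correlation is roughly 0.90, 0.78 and 0.62 for gaps of 3, 7 and 14 days. Highly correlated covariates allow the same fit to be obtained from very different splits of the total effect between NPIs, which is the practical identifiability problem discussed above.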
Although we fully support the ambition of Flaxman et al.1 to estimate the effectiveness of different NPIs from the available data, we find the underlying modelling approach problematic. Flexible parameterization leads to identifiability issues, which are masked by model assumptions. In particular, we find it questionable to assign a country-specific effectiveness parameter to the last NPI introduced in each country. Besides the large variations in estimated NPI effectiveness illustrated in Fig. 2, this choice precludes prospective use of the model, as it is unknown at any given time whether the latest NPI will also be the last one to be implemented in a particular country.
We conclude that the model1,3 is in effect too flexible, and therefore allows the data to be explained in various ways. This has led the authors to go beyond the data in reporting that particular interventions are especially effective. This kind of error—mistaking assumptions for conclusions—is easy to make, and not especially easy to catch, in Bayesian analysis. As NPIs are revoked, and possibly reintroduced over an extended period of time, more data will become available and practical identifiability of the separate effects of NPIs may be obtained. Until then, we suggest that the model1,3, and its conclusion that all NPIs apart from lockdown have been of low effectiveness, should be treated with caution with regard to policy-making decisions.
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.
A fork of the original code and associated data, which was used to generate the figures presented here, is provided in a separate GitHub repository5. This fork is based on the GitHub repository commit 885466d of the original code3, in which the README file states that it was “the exact code that was used [in the Nature article1]”. We have, however, noticed discrepancies between the original code3 and the figures in the article1. For example, the code that was used to generate figure 1 in Flaxman et al.1 defines the self-isolation NPI as having been implemented as the last NPI in Spain on 17 March 2020, whereas the code in the commit defines this date as 14 March 2020.
1. Flaxman, S. et al. Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe. Nature 584, 257–261 (2020).
2. Flaxman, S. et al. Report 13: Estimating the number of infections and the impact of non-pharmaceutical interventions on COVID-19 in 11 European countries. https://doi.org/10.25561/77731 (Imperial College London, 2020).
3. Imperial College London. covid19model. Available at https://github.com/ImperialCollegeLondon/covid19model (2020).
4. Soltesz, K. et al. On the sensitivity of non-pharmaceutical intervention models for SARS-CoV-2 spread estimation. Preprint at https://doi.org/10.1101/2020.06.10.20127324 (2020).
5. Heimerson, A. covid19model fork. Available at https://github.com/albheim/covid19model (2020).
We acknowledge Ericsson Research for hosting our model runs in their data centre. This work has been partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation; the ELLIIT strategic research area on IT and mobile communications; the Swedish Research Council (grant reference number 2017-04989); the Swedish Foundation for Strategic Research (SSF) via the project ASSEMBLE (grant reference number RIT15-0012); and ALF Grants, Region Östergötland.
The authors declare no competing interests.
Soltesz, K., Gustafsson, F., Timpka, T. et al. The effect of interventions on COVID-19. Nature 588, E26–E28 (2020). https://doi.org/10.1038/s41586-020-3025-y