Oxford’s Stringency Index is Falling Apart

December 24, 2020

Since the pandemic started, the chattering classes ‒ and particularly those of us who have been somewhat skeptical of the policy interventions ‒ have used a chart tool from Our World in Data. This site, run by Oxford researcher Max Roser with a great team of scholars and communicators, is a fantastic tool for visualizing and condensing the world’s scientific data. I use it almost every day.

For every country, the chart in question gives a daily value for what’s known as “Government Response Stringency.” Commentators, policy-makers, researchers, and journalists have used it to gauge differences in countries’ response to the pandemic. This is exactly the kind of useful research we want academia to produce: third-party, unbiased ranking, by political scientists and data scientists, such that the rest of us can get a quick understanding of what’s going on.

Even though all the data used (and its methods) are publicly available, nobody really cared to look under the hood. These are well-established researchers at one of the world’s best universities (no bias on my part!), with a strong commitment to accuracy and excellence ‒ they couldn’t possibly have gotten anything seriously wrong, could they?

Think again.

The Blavatnik School of Government at Oxford, from which the ranking stems, employs hundreds of people for this effort, sifting through news and public communication to distil the pandemic responses by governments in 184 countries and territories (minus a handful where data isn’t reported or credible).

First of all, several different indices exist, not the single one that Our World in Data reports. Secondly, the Our World in Data team has confused the issue by mislabeling the chart they construct from the Blavatnik team’s data: they report a “Government Response Stringency Index” whereas the Blavatnik team lists both a “Government Response Index” and a “Stringency Index.” The source tab, titled “Stringency Index,” describes the variables that go into a third index ‒ “Containment and Health Index” ‒ while the actual data reported comes from the “Stringency Index,” i.e. not the one titling their graph.

Third, those of us who repeatedly use this graph to monitor the policy responses in different countries noticed a distinct shift sometime in late October or early November. I noticed it particularly for the Nordic countries, as my co-authors Dan Klein and Christian Bjørnskov and I closely follow their development. Until recently, Sweden, which most media coverage couldn’t get enough of, was the least stringent of all the Nordics. Life was freer, pandemic restrictions were less invasive, and policy responses less strong; this aligned with Nordic people’s experience on the ground. The chart we used in our paper therefore looked like this:

Sweden had trailed the other Nordics since the beginning of the pandemic, and as summer approached, the others ‒ bar Denmark ‒ came down to Swedish levels. Sweden did not “do nothing,” but revoked freedoms and bossed people around a little less than everyone else.

If we plot the same data today, using the same dates (January 22 to August 3), the chart looks like this:

“Huh!” say the inhabitants of any Nordic country, as this wouldn’t pass the most rudimentary sniff test. In this updated graph, May 22 marks the date when the Blavatnik team downgraded Denmark, after which Sweden looks like the strictest of all the Nordics. In reality, Denmark barred even Swedish citizens from entering the country and made select non-Danish nationals (Germans, Icelanders, Norwegians) jump through very restrictive hoops to enter at all. On July 9 the Danish authorities reversed course on masks: after the health agency had spent all spring recommending against them for lack of evidence that they worked (the public transport company DSB even forbade its employees to wear them), masks were now mandated on buses and trains. Sweden did none of that, yet judging by the updated chart, it was now considered stricter.

Something is rotten in the state of Oxford.

The story of Iceland is actually even worse for the ranking. Despite its heavy controls, it looks like the most lenient of all the Nordics. The variable for international travel (“C8,” which wasn’t even included in the original rankings in the spring) best displays the failures of the Blavatnik team’s rankings. The five-step ranking (highest is worst: these numbers are ordinal but take on numerical values in the calculation) reads:

0: No restrictions
1: Screening arrivals
2: Quarantine arrivals
3: Ban arrivals from some regions
4: Ban arrivals from all regions

Now, I would quibble even with the basic ranking in that quarantining lots of people from neighboring countries is worse than banning a handful of arrivals from far-off countries. But ‘3’ captures a world of differences that don’t get reflected in the index.

When Denmark admitted some people into its lands on May 25, while barring everyone else, it was promptly downgraded to rank 3 ‒ the same as Sweden. Because Sweden followed the baseline European Union policy of restricting non-EEA nationals, it kept a 3 throughout the pandemic. When Sweden included or excluded certain non-European countries, made it easier for those with family connections to arrive, or allowed those holding work permits to come, none of it was reflected in the ranking. Iceland, for invasively quarantining arrivals for various lengths of time, had a 2 early in the pandemic. Thereafter it was ranked 3, even though the island nation first barred almost everyone, then in June opened to summer tourists in exchange for a quick-and-easy test at the border, then in August imposed a 5-day quarantine bracketed by two tests (all at travelers’ expense). Very different regimes, but the ‘3’ remained throughout.

Contrast this with Sweden: no quarantines, no tests, and the bare minimum of restrictions on far-off travelers (who weren’t traveling anyway).
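How these regimes collapse into a single code can be made concrete with a small sketch. The regime descriptions and C8 codes below are taken from this article’s account, not read from the Blavatnik dataset; the 0-100 rescaling (code divided by the scale maximum of 4) is a simplifying assumption about how an ordinal value enters the calculation, and all names are hypothetical.

```python
from collections import defaultdict

# C8 codes as described in this article (not read from the Oxford dataset).
C8_MAX = 4
regimes = [
    ("Denmark, after May 25: some neighbors admitted, everyone else barred", 3),
    ("Sweden: EU baseline restrictions on non-EEA nationals", 3),
    ("Iceland, spring: almost everyone barred", 3),
    ("Iceland, June: open to tourists with a test at the border", 3),
    ("Iceland, August: 5-day quarantine between two tests", 3),
    ("Iceland, early pandemic: quarantine for arrivals", 2),
]


def rescale(code, max_code=C8_MAX):
    """Assumed simplification: map an ordinal code onto a 0-100 scale."""
    return 100 * code / max_code


# Group the regime descriptions by the ordinal code they were assigned.
by_code = defaultdict(list)
for description, code in regimes:
    by_code[code].append(description)

for code, descriptions in sorted(by_code.items()):
    print(f"C8 = {code} (score {rescale(code):.0f}): {len(descriptions)} regime(s)")
    for description in descriptions:
        print("   ", description)
```

Run as written, five quite different border regimes land under C8 = 3; the code alone cannot tell them apart.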

Not Fit For Purpose

The problems of the ranking go way deeper, to its catch-all bins, its ad hoc interpretations and substandard references.

Let’s look at May 22 again. The Swedish health authorities, in keeping with their decentralized and expert-led effort, had offered recommendations not to travel if you were ill and that everyone else ought to consider whether they really needed to travel. This somehow warranted a ‘1’ on the Internal Movement sub-index (C7), misleadingly equaling the Danish rank and policy messaging. On May 22, the Blavatnik team cited the U.S. Embassy saying that there were “no inter-city or inter-regional restrictions on internal movement in Denmark.” While noting that authorities recommended working from home and avoiding public transport if possible, the team ranked Denmark ‘0’ (“No Measures”), while it kept Sweden at ‘1’. How that made sense is anyone’s guess.

Three days later, the ranking on subindex C8 (international travel) moved Denmark from the maximum value of 4 to the same as Sweden (3) because it now allowed business travel from some neighboring countries. Never mind that Sweden allowed almost any traveler from all across the European Union.

C8 isn’t even the most mind-numbing subindex; H1, Public Information Campaigns, is. With three levels (no information; some public caution; “coordinated public information”), virtually every single country receives full marks throughout the entire pandemic. A variable that doesn’t vary ‒ neither over time nor between countries ‒ isn’t of much use. The indicator on private gatherings (C4) has ridiculously wide bins (everything between 11 and 100 people receives the same rank). The stay-at-home variable (C6) puts Sweden at 1 (“recommend not leaving house”), because authorities asked employees to consider working from home ‒ a ranking that equaled the UK’s (except for a few select weeks in spring and fall). Between October 21 and November 8, Denmark mysteriously received a 0 on this subindex; the source cited for the downgrade was a statement by the Prime Minister that said nothing at all about that indicator.
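The effect of a wide bin is easy to see in a short sketch. Only the 11-100 bin comes from the text above; the other cut-offs and the function name `c4_code` are illustrative assumptions, not the tracker’s published codebook.

```python
def c4_code(max_people):
    """Map a cap on gathering size to an ordinal code.

    Only the 11-100 bin is taken from the article; the remaining
    thresholds are assumed for illustration. None means no cap.
    """
    if max_people is None:
        return 0
    if max_people > 1000:
        return 1
    if max_people > 100:
        return 2
    if max_people > 10:
        return 3
    return 4


# A cap of 100 people and a cap of 11 people receive the same code.
for cap in (100, 50, 11):
    print(cap, "->", c4_code(cap))
```

A country that lets a hundred people gather and one that stops at eleven are indistinguishable once the cap has been binned.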

The fastidious observer notes that mask mandates (indicator H6) aren’t even part of the Stringency Index presented on Our World in Data’s site. What kind of a stringency index does not include the physically and politically perhaps most invasive policy there is?

Most metrics in the Blavatnik School ranking don’t measure what thoughtful observers look for: they are clearly not nuanced enough or spaced enough to capture the differences between countries or reflect what any normal person would mean by “lockdowns” or government mandates. Their researchers seem to interpret statements and press releases a little arbitrarily (sometimes from English-speaking sources only, and two or three steps removed), and they use the strictest policy in any region as a stand-in for the entire country.
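What the strictest-region rule does to a national score can be sketched as follows. The two regions, their population shares and their codes are entirely hypothetical, and the population-weighted figure is only a point of comparison, not something the Blavatnik team computes.

```python
# Hypothetical country: (region, population share, ordinal code on one indicator)
regions = [
    ("Capital region", 0.20, 3),
    ("Rest of country", 0.80, 1),
]

# Strictest-region rule: the highest regional code stands in for the country.
strictest = max(code for _, _, code in regions)

# Comparison point (assumed, not the tracker's method): weight by population.
weighted = sum(share * code for _, share, code in regions)

print(f"strictest-region code:    {strictest}")
print(f"population-weighted code: {weighted:.1f}")
```

In this made-up case the country is coded as if everyone lived under the capital’s rules, although four-fifths of the population does not.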

Can We Still Use It?

The benefit of trusting a third-party assessor for ranking policy responses ‒ like we do for the human development index, democracy index or economic freedom index ‒ is that the researcher using the numbers took no part in creating them. One hopes that those creating them did so with no obvious ulterior motive for how they would be used. The drawback is that you’re entrusting the interpretative judgment to a third-party observer. That’s fine if that judgment is balanced and accurate, disastrous if it’s not.

The avalanche of information for every single country that the Covid pandemic unleashed meant that trackers like the much-used Oxford Covid-19 Government Response Tracker became swamped with lots of small pieces of specialized information, all of which had to be jammed into 3-, 4-, or 5-step scales according to 20 different indicators ‒ only some of which made it into “the” index publicly available at sites like Our World in Data.

A world of difference separates public recommendations that gyms or public transport adjust or operate differently from outright bans enforced under threat of fines, lawsuits or police violence. Being quarantined for days, required to present several negative tests, or allowed entry for reasons of business only is not the same as, for Europeans at least, the anything-goes approach in place in Sweden since March. (What they are doing in recent weeks is much stricter and more invasive, and we’ll see what the Blavatnik team makes of that.)

When governments recommend precautions, life may proceed almost as normal; lockdowns or prohibitive laws upend life entirely. On many counts, those differences go almost entirely unnoticed in the Blavatnik team’s rankings. As the pandemic has developed, the index’s bins have proven too wide and too ambiguous to capture what people want such an index to display.

In the spring, when the ranking roughly seemed to reflect what was happening in various countries, nobody objected to it ‒ or even looked under its hood. Now that its strange rewriting of history and poor handling of the Nordic countries tell such misleading stories, nobody should trust it without looking at the details.

Joakim Book is a writer, researcher and editor on all things money, finance and financial history. He holds a master’s degree from the University of Oxford and was a visiting scholar at the American Institute for Economic Research in 2018 and 2019.

His work has been featured in the Financial Times, FT Alphaville, Neue Zürcher Zeitung, Svenska Dagbladet, Zero Hedge, The Property Chronicle and many other outlets. He is a co-founder of and regular contributor to the Swedish liberty site Cospaia.se, and a frequent writer at CapX, NotesOnLiberty, and HumanProgress.org.