Did Hamlet wish for better data?

To open or not to open? that is the question:

Whether ’tis nobler in the mind to suffer

The slings and arrows of outrageous fortune,

Or to take arms against a sea of troubles

And by opposing end them.

William Shakespeare, Hamlet

Hopefully governments are constantly wishing for better data as they ponder the trade-offs of various calculated risks, from opening restaurants to opening borders. As statisticians, we like to think we know what to do with data, how to interpret it, and how to leverage it for better decision-making, individually and collectively. COVID-19 data are a particular challenge.  We stare at them but we don’t always know what to do with them. It might be pandemic fatigue; it might be literal fatigue; it might be statistical insight.

We begin with the data we have and the understanding we don’t. We explore what at least two tired statisticians think about when they stare at COVID-19 data and try to make individual decisions. We then scale up to the national level, to Italy, and to the current suite of new decisions that affect everyone in the country and, in fact, everyone on the planet. What are the doubts, and what data would we need to remove them?

A statistical anecdote about the data we have and the data we don’t

Over the holidays, we walked past a restaurant. Inside was a table, a long table, filled with young people, toasting and chatting and having a lovely early (very early by Italian standards) dinner. Now, in Italy, this was not allowed, so we looked a little more closely and saw that it was twelve people sitting at five little square tables, each about half a meter apart. Technically, then, there was a maximum of three people at a table (including the end tables) and about a meter of distance between people. They were following the letter of the law but certainly not its spirit. They were all masks-off and chatting, eating, laughing, shouting to a person two tiny tables down. They didn’t look like a family or even a table of roommates. It was a gathering of friends.

As two statisticians out for a walk, we tried to calculate just how risky this was: what were the odds that someone at that table had COVID-19 and was contagious? The conversation went something like this. There are about 1,200 new cases a day in a population of about 6 million (regional-scale data), with a downward trend indicating that actual conditions might be a little better than what the data show. Assume that the diners are all independent (no one lives with, or already spends a lot of time with, someone else in the group; probably an incorrect assumption but necessary to move the calculations forward), and assume further that no one is actually symptomatic (they are young, but let’s give them credit for a little common sense). Conservatively, there are about 2 (or up to maybe 4) days of contagiousness before symptoms appear, so triple the number of expected people and add a little more for all the asymptomatic people (a number we still don’t really have a grip on) and … wait! Those are already two critical bits of data we don’t have, and all our calculations should also be conditional on the fact that these 12 people were willing to show up at this early evening fest and feel happy about it. Neither of us would even enter that restaurant, let alone sit down, remove masks, and eat a meal. By being at that table, these are all people comfortable with that risk, which suggests that they may have taken similar risks on other days and are therefore much more likely to have COVID-19. How much more likely? How will we ever get data on COVID-19 stratified by “willingness to take risks”? How will we ever calculate the risk of eating at that table if those data will never be available?
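For readers who like to see the arithmetic, here is a minimal sketch of that sidewalk calculation in Python. It uses only the rough figures quoted above (about 1200 new cases a day, about 6 million people, a tripling for pre-symptomatic days, 12 diners) and the same independence assumption. It leaves out the asymptomatic cases we could not quantify. The function name and the `risk_multiplier` parameter are our own illustrative inventions, and the multiplier values are made up: they stand in for exactly the “willingness to take risks” data that nobody has.

```python
def prob_at_least_one_contagious(daily_new_cases, population, contagious_days,
                                 group_size, risk_multiplier=1.0):
    """Chance that at least one of `group_size` independent people is contagious."""
    # Per-person chance of being in the contagious window: a few days'
    # worth of new cases divided by the population, optionally scaled
    # by a hypothetical risk-taking multiplier.
    p = risk_multiplier * daily_new_cases * contagious_days / population
    p = min(1.0, p)  # a probability cannot exceed 1
    # Complement of "nobody at the table is contagious".
    return 1.0 - (1.0 - p) ** group_size


# Figures quoted in the post: ~1200 cases/day, ~6M people, 12 diners,
# and "triple" the daily count for the pre-symptomatic days.
baseline = prob_at_least_one_contagious(1200, 6_000_000, 3, 12)
print(f"baseline: {baseline:.2%}")

# The multipliers below are invented purely to show sensitivity.
for m in (2, 5, 10):
    risk = prob_at_least_one_contagious(1200, 6_000_000, 3, 12, risk_multiplier=m)
    print(f"risk-takers x{m}: {risk:.2%}")
```

Under these assumptions the baseline comes out a bit under 1%. The point of the loop is not the specific outputs but how strongly the answer depends on a multiplier we have no data to estimate.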

To be or not to be? Individual decision-making based on COVID-19 data

We are not public health experts, virologists, or epidemiologists.  We are not qualified to tell readers what actions they should or should not take to protect themselves from COVID-19.  But we are uniquely qualified to share what at least a small sub-sample of statisticians look at in the data and how we use it to make everyday decisions.

Let’s look at the reported number of “new positives” by day. Why? Because it is an understandable measurement. If you are going to look at data, make sure you understand it. Next, separate status and trend.  Status is the value of new positive test results today; the trend is how the value changes over time. The status, or value today, of new infections is an indicator of how much virus was in circulation 10 days ago.

The smoothed trend (yes, mentally smoothed, no fancy programming required) prevents one from getting too excited or too anxious about fluctuations. Every Sunday and Monday, the reported numbers drop. It means nothing other than that there aren’t enough folks entering data on the weekend or, perhaps, that people tend to get tested when pharmacies are open all day (Monday to Friday). The numbers must then go up on Tuesday. It does not mean that more virus is circulating on Tuesdays. The smoothed trend gives an indication of what the reported numbers might look like in 10 days and, because of that same 10-day lag, of the amount of virus circulating today. No calculations (there’s too much uncertainty in the trend for that), but it is something to consider.
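Mental smoothing is all we actually do, but for anyone who would rather let a computer do the squinting, a 7-day trailing average is one common way to wash out the weekend effect. The sketch below is our illustration, not an official method. The daily counts are invented (a flat week with a Sunday and Monday dip, repeated three times), and the function names are hypothetical.

```python
def trailing_mean(values, window=7):
    """Trailing moving average; returns one value per full window."""
    if window <= 0:
        raise ValueError("window must be positive")
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        if i >= window - 1:
            out.append(running / window)
    return out


def week_over_week_change(smoothed):
    """Trend: latest smoothed value minus the smoothed value 7 days earlier."""
    if len(smoothed) < 8:
        raise ValueError("need at least 8 smoothed values")
    return smoothed[-1] - smoothed[-8]


# Invented counts, Sunday first: low on Sunday and Monday, a catch-up on Tuesday.
one_week = [700, 800, 1400, 1300, 1250, 1200, 1050]
daily = one_week * 3  # three identical weeks, so the true trend is flat

smoothed = trailing_mean(daily)
print("status (latest smoothed value):", smoothed[-1])
print("trend (change over one week):", week_over_week_change(smoothed))
```

Because every 7-day window contains exactly one of each weekday, the Tuesday “spike” disappears and the smoothed series is flat, which is what the underlying (invented) epidemic was doing all along.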

Remember, also, that to make decisions based on this indicator we need to assume that the same number of tests is performed every week (or find data on the number of tests and further assume that the population getting tested stays relatively similar over time). So what do we do with this ad hoc indicator? Steep downward trend? The risk is likely lower than it seems from the data. Maybe it is a good time to finally get a haircut. Upward trend? Minimize errands, and reconsider meeting friends even for a masked walk in the park.

For an individual’s risk, it also helps to download data (if available, of course) at the neighborhood level. What is the risk of going to the grocery store or browsing local shops? To compare places, one can divide by population size. To compare over time, one might divide by the number of tests, as mentioned above. Here, though, we land again on data we only wish we had, such as changes over time in the demographics of people seeking tests. Are more and more healthy people getting tests just to fly on planes? One can try to correct, lightly, mentally, for the increased testing of symptom-free people and close contacts of those who are COVID-19 positive, but the gap between what we know and what we wish to know grows larger.
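The two divisions just described are simple enough to write down. This is only a sketch: the population and case figures reuse the rough regional numbers from our walk, the weekly test counts are hypothetical, and the function names are ours.

```python
def per_100k(count, population):
    """Cases per 100,000 residents, for comparing places of different size."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 100_000 / population


def positivity(positives, tests):
    """Share of tests that came back positive, for comparing over time."""
    if tests <= 0:
        raise ValueError("tests must be positive")
    return positives / tests


# Regional figures quoted earlier: ~1200 new cases in ~6M people.
print("new cases per 100k:", per_100k(1200, 6_000_000))

# Hypothetical test counts: same positives, more testing in the second week.
print(f"week A positivity: {positivity(1200, 40_000):.1%}")
print(f"week B positivity: {positivity(1200, 60_000):.1%}")
```

Neither ratio fixes the deeper problem: if the mix of people seeking tests changes, positivity changes too, even when the amount of virus in circulation does not.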

So, we live with the reality of how much we don’t know, finding a balance between “we know nothing” and “there is a huge amount of data.” Under these conditions, many disciplines turn to the precautionary principle, which generally offers guidelines for mitigating risk when not all the data and information are available yet. Modifying and borrowing from other writers, one could suggest three components to the precautionary principle for everyday decision-making during a pandemic: take preventive action in the face of uncertainty; shift the burden of proof to demonstrating that an activity is safe; and apply efficient skepticism in the face of mountains of new data.

To open or not to open? What can data offer governments to support decision-making?

In Italy, starting Monday, April 26, 15 out of 20 regions will be allowed to relax several restrictions. Restaurants and bars will open. Lunch and dinner at restaurants will be allowed in the open air. Schools will allow 75% presence (100% where possible) and universities will do the same. Cinemas, theaters, and museums will be open again, although with limited access.

Lawmakers are also creating a regional pass that will allow people to move between regions under some constraints. You could get a pass if you are vaccinated (fully vaccinated), or if you receive a negative result in a rapid or molecular test within the 48 hours preceding your trip. And that sounds a little crazy! Who are the individuals ready to move around? Presumably, on average, they are more willing to take risks; they are more like the friends gathering inside at the restaurant. What are the odds that a population more willing to take risks encounters COVID-19 in the 24 hours prior to travel? Experiments in The Netherlands are trying to move us closer to these answers, but they too are running into obstacles. And what is the false negative rate in asymptomatic individuals? Finally, the pass could be offered to those who have recovered from COVID-19 within 6 months, but without a mandatory antibody test. At least part of the precautionary principle will be applied. The recommendations call for continued vigilance, wearing masks, and keeping distances.

From the government, we hear that the opening is a “calculated risk.” Calculated by whom and, above all, calculated how and with what data? We are experienced with numbers and with data, but still, we are not confident in how currently available data can be used to accurately calculate these risks.


All posts are written by authors in their personal capacity and in no way represent the view of the organisations, universities, governments, or agencies where they are employed or with which they are associated, or the views of the International Statistical Institute (ISI).

One comment on this post

  1. I doubt that the “calculated risk” has actually been calculated. It simply means “we are taking a chance.”
