Author: Nadine Touma
Nadine Touma is a consultant in Insights, Strategy & Innovation based in Dubai. In her blog, Let’s talk business, she shares her 20 years of experience in the corporate world and 5 as an independent consultant, with real-life stories, simple advice and the no-nonsense approach that characterizes her work.
The series “Data are not insights” aims to explain what, in my view, takes you from simple data to powerful insights. This article is the last one devoted to the quality of data analysis; the next will move on to the art of insights. Allow me to open with a simple remark:
Figures are never big or small in absolute terms; they are always relative to the value you compare them to.
At school, I was first in my class with 18/20. In prep school, I could be first with 6/20. Although both grades were on the same scale of 20, a 6 in prep school could be a “big” score if the rest of the class scored 5 or under. And believe me, we often scored under! Since we were all preparing for competitive exams, the absolute value had no real meaning; what mattered was being ahead of the others.
I can see you all rolling your eyes at this very simplistic opening… Duh!… Indeed, duh!… And yet, in the illustration I will use throughout this article, it is this very simple remark that makes ME roll my eyes and go… Duh!
The example I have chosen to illustrate the importance of benchmarking is, in my view, the absolute worst case of data analysis in the whole history of data analysis. I present to you: the case of COVID 19. But before we go any further, allow me a few disclaimers.
Disclaimer 1: I am NOT anti-vaccine. I have been vaccinated 4 times (2 doses each of 2 different vaccines, a pure delight!).
Disclaimer 2: I do NOT think COVID does not exist. Nor do I think it gives you chickenpox, AIDS or God knows what else. I do not subscribe either to the theory that Bill Gates or any other tech entrepreneur can now listen to us through the 5G installed in our cells via the vaccine. They have enough data through Siri, Facebook, and Google anyway…
Disclaimer 3: What follows does NOT aim to give any opinion on the pandemic or on how it was handled, nor does it contain a professional diagnosis of the disease. I am not a doctor. It simply looks at how people, and first and foremost the media, handled the data as it unfolded, and how, along the way, they utterly failed to present any insights whatsoever, preferring to overwhelm us with an avalanche of seemingly apocalyptic figures.
The first thing I would like to note is that people are convinced their opinion on the virus is based on facts, when it is most certainly based on emotions and on their personal experience with the disease. And that is perfectly understandable. This microscopic thing killed people, brought our economies to a halt, and locked us up in our homes. It is difficult not to feel emotional about all this.
Now, for the sake of this article, let’s set aside how we feel about it and the things we do or don’t do, and look at it coldly from a data analysis standpoint.
Since the outbreak of COVID 19, what we all experienced was the following: number of cases, number of hospitalizations, number of deaths. On repeat. Every day. Several times a day. 500… 1,000… 5,000… 10,000… 50,000… As the numbers grew, so did the panic. Let’s face it, something that reads in thousands seems pretty big, right? When I started seeing this, the first thing I did was research the number of cases of comparable diseases, the regular flu and the Spanish flu, as well as of famous deadly viruses, namely Ebola. Instinctively, I set out to benchmark the information I was bombarded with because, as I said at the beginning, no number is big or small in absolute terms. None. And again, I am not a doctor; I have no clue what those numbers mean.
Allow me to do here what I did then, the notable difference being that we now have two years’ worth of data, and to draw the key insights from the recap table below (all based on France’s numbers, with sources cited, of course; see Part 1). Kindly note that the conclusions would differ depending on the benchmarks you use. The point here is simply to illustrate how values taken in absolute terms mean nothing, while values taken relatively tell a story.
- In its first year (2020), COVID 19 infected about as many people in total as the average flu (approximately 2.5M people).
- The disease was, however, more virulent than the average flu, with deaths amounting to 2.6% of the total number of cases (vs 0.2% for the average flu).
- The virus grew more contagious in year 2 (2021) and infected 7.1% of the population (vs 3.8% in 2020). It is worth noting, though, that the more screenings you do, the more cases you are likely to record.
- However, its lethality dropped: deaths amounted to 1.1% of the total number of cases (vs 2.6% in 2020). But again, more screenings enlarge the reference base and would tend to lower the ratio.
- Despite being more contagious (in year 2) and more virulent than the average flu, COVID 19 remains far less contagious and deadly than the Spanish flu that ravaged Europe in 1918-19, claiming millions of lives. By comparison, the Spanish flu infected 40 to 50% of the French population (20 million people!), and it is now estimated that at least 5% died of it (vs an original estimate of 2%). Relative to the size of the population at the time, the Spanish flu killed 1% of the French population, vs 0.1% for COVID in 2020 and 0.08% in 2021.
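The ratios in the list above can be reproduced in a few lines. This is a minimal sketch, not the author’s own workbook: it assumes a French population of about 65M (the same 6.5 × 10M scaling used later in this article) and takes the percentages exactly as quoted. It also makes one identity explicit: deaths over population equals deaths over cases multiplied by cases over population.

```python
# Assumption: France has ~65M inhabitants (the 6.5 x 10M scaling used later in the article).
POPULATION_FRANCE = 65_000_000


def share_of_population(count: float, population: float) -> float:
    """Turn an absolute count into a fraction of the population."""
    return count / population


def deaths_per_population(case_fatality: float, infected_share: float) -> float:
    """deaths/population = (deaths/cases) * (cases/population)."""
    return case_fatality * infected_share


# (label, share of the population infected, deaths over cases), as quoted in the article
rows = [
    ("COVID 19, 2020", share_of_population(2_500_000, POPULATION_FRANCE), 0.026),
    ("COVID 19, 2021", 0.071, 0.011),
    ("Average flu", share_of_population(2_500_000, POPULATION_FRANCE), 0.002),
]

for label, infected, cfr in rows:
    mortality = deaths_per_population(cfr, infected)
    print(f"{label:<15} infected {infected:6.2%}  deaths/cases {cfr:5.1%}  "
          f"deaths/population {mortality:.3%}")
```

With these inputs the sketch gives roughly 0.10% of the population for 2020 and 0.08% for 2021, in line with the figures above; swapping in a different benchmark row is all it takes to tell a different relative story.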
And that is the basic key story, from a data analysis standpoint, that these specific benchmarks tell. I hope you will agree with me that this topic, in the way it was presented, is the worst illustration of confusing data with insights. A number of cases, hospitalizations or deaths, given daily, means nothing without perspective (say, how the monthly average is evolving), without a benchmark (how it fares compared to other diseases), and without a few basic calculations that any schoolboy or schoolgirl can do (namely ratios).
You will probably argue that it is easy to say this now, with two years of historical data, and you would be right. However, what you can do early on is something simple called projection. France registered on average 3,000 daily cases from March to April, 1,000 from May to July, 3,000 to 5,000 in August, 5,000 to 10,000 in September and 15,000 to 50,000 at its peak in October-November, closing the year at 10,000 to 20,000 in December. (Mind you, averages and brackets like these give you a better idea than hearing the countdown of doom every day: 3,486… 4,036… 5,389… 15,987… 25,623…, don’t they?)
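Turning a daily countdown into the monthly averages and brackets described above is a simple grouping operation. The sketch below is hypothetical: it reuses the countdown figures quoted in the text but attaches invented dates to them, purely to show the mechanics; `monthly_summary` is an illustrative name, not a function from any library.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_summary(daily_counts):
    """Collapse {date: daily count} into {(year, month): (average, low, high)}."""
    by_month = defaultdict(list)
    for day, count in daily_counts.items():
        by_month[(day.year, day.month)].append(count)
    return {
        month: (mean(counts), min(counts), max(counts))
        for month, counts in sorted(by_month.items())
    }


# Hypothetical series: the countdown figures from the text, on invented dates.
daily = {
    date(2020, 9, 28): 3_486,
    date(2020, 9, 29): 4_036,
    date(2020, 9, 30): 5_389,
    date(2020, 10, 1): 15_987,
    date(2020, 10, 2): 25_623,
}

summary = monthly_summary(daily)
for (year, month), (avg, low, high) in summary.items():
    print(f"{year}-{month:02d}: average {avg:,.0f}, bracket {low:,} to {high:,}")
```

Reading one line per month, with an average and a bracket, makes the trend visible without the daily drumroll; with real data you would feed in a full day-by-day series.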
When I decided to step back and put all this in perspective, it was probably April-May 2020 and, although cases dropped massively in May due to the lockdown, I chose to project with higher numbers and see how many cases we would have by the end of the year at 5,000 to 10,000 cases a day… I ended up with a number between 1.5M and 3M. Bullseye! We finished the year at 2.5M, the same number as the average flu. We already knew then that the disease was more lethal than the average flu but, I am sorry to say, it was still nothing compared to Ebola, which had a 40% to 50% (40% to 50%!!!) death rate.
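The back-of-the-envelope projection boils down to one multiplication per end of the bracket. The article does not state how many days the projection covered; about 300 days (roughly March to December 2020) is an assumption here, chosen because it reproduces the 1.5M to 3M range.

```python
def project_cases(daily_low: int, daily_high: int, days: int) -> tuple[int, int]:
    """Project a bracket of cumulative cases from a bracket of daily cases."""
    return daily_low * days, daily_high * days


DAYS = 300  # assumption: roughly March to December 2020
ACTUAL_2020 = 2_500_000  # cases recorded in France in 2020, as quoted in the article

low, high = project_cases(5_000, 10_000, DAYS)
print(f"Projected: {low:,} to {high:,} cases")
print("Actual 2020 total within the bracket:", low <= ACTUAL_2020 <= high)
```

A bracket rather than a single number is the point: it is crude, but it already tells you the order of magnitude to compare against the flu.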
If you are not too shocked by my cold analysis, allow me to give one more illustration, still on the subject of COVID 19. I have kept quiet, on social media and in real life alike, about my own (very cold, data-driven) perspective on the pandemic, because I respected the fact that people were emotional and afraid. However, this summer I had a mind-boggling confrontation with someone I considered, until then, a friend. That friend, like many others, had grown over time into a hard-core pro-vaccine (“pro-vax”) advocate. His social media was inundated every day with countless articles, videos and graphs, sprinkled with hateful comments against “anti-vax”. Said “anti-vax” responded in kind, of course, and the resulting “conversations” were, to me, absolutely surreal. One day, he posted a graph comparing the number of hospitalizations of vaccinated and non-vaccinated people (daily average, in France, summer 2021). He commented proudly: “6 times more chances to be hospitalized due to COVID when you are not vaccinated!” Conclusion: get vaccinated, don’t be an idiot! Game. Set. Match.
While his calculation was correct (87 for the non-vaccinated vs 15 for the vaccinated, nearly 6 times more), I couldn’t help but notice the scale of the chart, which read “daily hospitalizations, for 10M people”, and therefore couldn’t help but comment: “Don’t use this argument to prove your point. In my humble opinion, it is an anti-vax argument, as if you calculate the ratio 87/10M, brought back to the year (X365), brought back to the French population (X6.5), you come up with a number of hospitalizations at 2% that could be considered a ‘calculated risk’ as it is – relatively – ‘statistically negligible’ ” (I’m sorry, this is what it is called in statistical lingo). Oh my Lord, what did I say! I received hateful comments from people I had never met in my life about “how poorly I regarded human life”, “a life is not a ratio!!!!!” (the exclamation points are important) and so on and so forth. Well, I am sorry, but if you are trying to show how clever you are by displaying figures, you should at least understand said figures. Enough said.
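The arithmetic in the comment can be laid out step by step. This sketch assumes that 87 and 15 are daily hospitalizations per 10M non-vaccinated and vaccinated people respectively, as the chart’s label suggests, and that France has about 65M inhabitants (the X6.5 factor). It also shows why the denominator deserves care: annualized, 87 per 10M is roughly 0.3% per person per year, while the figure of about 2% corresponds to the annual count for all of France (about 206,000) set against the chart’s 10M base.

```python
DAILY_NON_VACCINATED_PER_10M = 87  # assumption: daily hospitalizations per 10M non-vaccinated
DAILY_VACCINATED_PER_10M = 15      # assumption: daily hospitalizations per 10M vaccinated
CHART_BASE = 10_000_000
DAYS_PER_YEAR = 365
SCALE_TO_FRANCE = 6.5              # ~65M inhabitants / 10M chart base

ratio = DAILY_NON_VACCINATED_PER_10M / DAILY_VACCINATED_PER_10M
annual_per_10m = DAILY_NON_VACCINATED_PER_10M * DAYS_PER_YEAR  # per 10M people, per year
annual_rate = annual_per_10m / CHART_BASE                       # per person, per year
annual_france = annual_per_10m * SCALE_TO_FRANCE                # count for ~65M people

print(f"Non-vaccinated vs vaccinated: {ratio:.1f}x")
print(f"Hospitalizations per 10M people per year: {annual_per_10m:,}")
print(f"Annual rate per person: {annual_rate:.2%}")
print(f"Scaled to the French population: {annual_france:,.0f} per year")
print(f"That count set against the 10M chart base: {annual_france / CHART_BASE:.1%}")
```

Either way the order of magnitude stays small, which is the author’s point; the sketch simply keeps each ratio tied to the population it is computed over.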
I have been warned about writing about COVID. I surely hope I will not receive death threats over this article. I am most definitely sensitive to matters of disease and death and value human life above anything else. However, data analysis is my field and sometimes, well, I guess you need to have the courage to say something.