COVID-19 Math: Some of the stats used to better understand the virus

James Hodgens
7 min read · Jun 10, 2021

Throughout the COVID-19 pandemic, many different people have contributed to the fight against the virus. One of these groups was data scientists. This blog post will explore some of the important statistics used to influence public health decisions.

One of the primary statistics used to track the spread of COVID-19 throughout the pandemic has been R0 (pronounced R-naught). R0 measures how rapidly the disease is spreading. It is a fairly straightforward statistic (which is why it is a good place to start when reviewing the COVID-19 metrics) and is defined as the average number of people that one infected person will also infect. If R0 > 1, each round of infections is larger than the last, so the number of cases grows. If R0 < 1, each round is smaller and the spread of the virus slows. (Strictly speaking, R0 describes spread in a population where no one is immune yet; the version that changes over time with behavior and immunity is often written Rt. This post uses R0 for both.)
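To make the R0 > 1 versus R0 < 1 distinction concrete, here is a minimal sketch that counts expected new cases generation by generation. It assumes that every case infects exactly R0 others and that the supply of susceptible people never runs out; the function name and the example values of R0 are hypothetical, chosen only for illustration.

```python
def cases_by_generation(r0, generations, initial_cases=1):
    """Expected new cases per generation if each case infects r0 others.

    Simplifying assumption: the pool of susceptible people is unlimited.
    """
    return [initial_cases * r0 ** g for g in range(generations + 1)]


# Hypothetical values of R0, not estimates for COVID-19.
growing = cases_by_generation(2.0, 5)    # each case infects 2 others
shrinking = cases_by_generation(0.5, 5)  # each case infects 0.5 others on average

print("R0 = 2.0:", growing)
print("R0 = 0.5:", shrinking)
```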

The above graph is a simplified version of one that we saw countless times on the news in 2020. The y-axis represents the number of cases and the x-axis represents time. We’ll return to this graph in a little while, but first, here’s a biology (or virology, if you want to be more specific) crash course to cover some basic concepts before we dive deeper into the data:

  • Viruses require a host to replicate and once they have replicated they can spread new virus copies to other hosts
  • A virus will continue to replicate until the host’s immune system eradicates it (or the virus kills the host)
  • Once the immune system eradicates a virus, the host gains some level of protection against future infection by this particular virus (called immunity)

Now that we’ve covered the basics of how a virus replicates and spreads, here’s another chart:

Let’s assume this represents the spread of COVID-19 in a population with no control measures. We’ll also assume that this virus does not mutate and that once an infected individual recovers, they gain full immunity. Like a fire that eventually burns itself out, this virus will go away on its own once it has infected (almost) everyone in the population (and they have gained immunity from future infection).
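One way to see this burn-out behavior is with the textbook SIR (susceptible, infected, recovered) model, which is not something this post derives but which encodes the same assumptions: no control measures, no mutation, and full immunity after recovery. The sketch below is a rough simulation under those assumptions; the R0 of 3, the 10-day infectious period, and the starting fraction infected are all hypothetical values.

```python
def simulate_sir(r0, infectious_days=10.0, days=365,
                 initial_infected=1e-4, steps_per_day=10):
    """Simple SIR simulation on population fractions (Euler steps).

    Returns (fraction infected at the start of each day,
             final susceptible fraction, final recovered fraction).
    """
    gamma = 1.0 / infectious_days  # recovery rate per day
    beta = r0 * gamma              # transmission rate per day
    dt = 1.0 / steps_per_day
    s, i, r = 1.0 - initial_infected, initial_infected, 0.0
    infected_by_day = []
    for _day in range(days):
        infected_by_day.append(i)
        for _step in range(steps_per_day):
            new_infections = beta * s * i * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
    return infected_by_day, s, r


# Hypothetical parameters, chosen only to illustrate the shape of the curve.
infected_by_day, s_final, r_final = simulate_sir(r0=3.0)
peak_day = infected_by_day.index(max(infected_by_day))
print(f"peak on day {peak_day}: {max(infected_by_day):.1%} infected at once")
print(f"ever infected by the end: {r_final:.1%}")
```

With these made-up numbers the infected fraction rises, peaks, and then falls back toward zero once most of the population has recovered, which is the fire-burning-itself-out picture described above.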

One strategy to defeat this hypothetical virus would be just to let it run its course. Everyone gains immunity and the virus has no future hosts to infect. This is the strategy some parents used to take when allowing their children to get chicken pox. However, in the case of COVID-19, this was not feasible for several reasons. First, the disease caused by this virus was fatal for some individuals. Second, at the height of the pandemic, hospitals were so overrun with severe cases of COVID-19 that they struggled to keep up with demand, which is why public health officials urged citizens to adopt behaviors that would help society “flatten the curve”.

We will explore flattening the curve some more, but first, a recap: R0 is the average number of individuals that one infected person will also infect. That number is not a constant and can be influenced by the behavior of the population. After over a year of COVID, we all know the behavioral measures that public health officials have pushed, including mask wearing, social distancing, and isolation/quarantining. Something I felt was lacking from the discussion, however, was a probabilistic framing of these ideas.

Theoretically, it takes just one viral particle to make someone sick (if that virus is able to get inside the host and replicate). However, every day we’re exposed to many things that can make us sick, yet we stay healthy most of the time. This is because our immune system fights off most pathogens. The probability of us getting sick increases when our immune system gets overworked. It is then that an invading virus would be free to replicate inside the host. We can use this chart to think about this more probabilistically:

On the x-axis above, we have the amount of virus that a person is exposed to. On the y-axis, we have the probability that an individual gets sick. The more virus a person is exposed to, the more likely they are to get sick, and vice versa. Let’s use another analogy to drive this idea home. Suppose you are in an open field and there is one bee out in the field with you. Are you going to get stung? It’s possible, but not very likely. Now suppose you are in a broom closet with an entire beehive. You are much more likely to get stung in this case. Even though we can’t see them, exposure to viruses works in a similar manner to exposure to the bees.
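The shape of that exposure-versus-probability relationship, and the bee analogy, can be sketched with two small formulas. Both are assumptions made for illustration rather than anything measured for COVID-19: the first uses an exponential dose-response curve with a made-up constant `k`, and the second treats each bee as an independent chance of a sting with a made-up per-bee probability.

```python
import math


def infection_probability(dose, k=0.01):
    """Exponential dose-response curve: more exposure, higher probability.

    `k` is a hypothetical constant, not a measured value.
    """
    return 1.0 - math.exp(-k * dose)


def sting_probability(bees, p_per_bee=0.01):
    """Chance of at least one sting if each bee stings independently."""
    return 1.0 - (1.0 - p_per_bee) ** bees


for dose in (1, 10, 100, 1000):
    print(f"dose {dose:>4}: P(sick) = {infection_probability(dose):.3f}")

print(f"one bee in a field: {sting_probability(1):.3f}")
print(f"hive of 500 bees:   {sting_probability(500):.3f}")
```

In both functions the probability climbs toward 1 as exposure grows but never quite reaches it, and it never drops to exactly 0 for any exposure above zero.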

Reframing the control measures that public health officials have advocated for (mask wearing, social distancing, and isolation/quarantining) in this probabilistic context helps paint a better picture of why they have been important. These measures reduce exposure to the virus, which lowers the probability that an individual gets sick. It’s not a guarantee. Many people have heard of cases where an individual followed all of the precautions but still got sick. On the other hand, there were cases where some members of a household got sick while others remained healthy, even though they had very close contact with each other for long periods of time. Thinking probabilistically helps us explain these exceptions. It also helps public health officials model scenarios at the population level. As any good data scientist knows, probability might not help us much at the individual level, but as our sample size grows, we can be more confident in what we infer from the data.
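The sample-size point can be illustrated with a short simulation. It assumes every person has the same, independent chance of getting infected, which is a simplification, and the 20% probability is a hypothetical number. For one person the outcome is all or nothing, but the infected fraction in a large group settles close to the underlying probability.

```python
import random


def simulate_infections(n_people, p_infect, seed=0):
    """Count infections when each person is infected independently with p_infect."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return sum(rng.random() < p_infect for _ in range(n_people))


p = 0.2  # hypothetical per-person probability of infection
for n in (10, 1_000, 100_000):
    infected = simulate_infections(n, p)
    print(f"{n:>7} people: {infected / n:.3f} infected (expected {p})")
```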

Another way to think about social distancing measures is in terms of reducing R0. If we take our logic one step further, we can see how this is possible:

  • Social distancing reduces exposure to viral particles
  • This reduces the probability that someone will get infected
  • At the population level, we can infer that when thousands (or millions) of people adopt social distancing, R0 will decrease
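The chain of reasoning in the list above can be turned into arithmetic with a common back-of-the-envelope decomposition: the reproduction number as contacts per day, times the probability of transmission per contact, times the number of days a person is infectious. This decomposition and every number below are assumptions for illustration, not figures from this post or estimates for COVID-19.

```python
def reproduction_number(contacts_per_day, p_transmission, infectious_days):
    """Rough estimate: contacts/day x P(transmission per contact) x days infectious."""
    return contacts_per_day * p_transmission * infectious_days


# Hypothetical baseline behavior.
baseline = reproduction_number(contacts_per_day=10, p_transmission=0.05, infectious_days=5)

# Hypothetical effect of distancing (fewer contacts) and masks (lower chance per contact).
with_measures = reproduction_number(contacts_per_day=5, p_transmission=0.03, infectious_days=5)

print(f"baseline:      {baseline:.2f}")
print(f"with measures: {with_measures:.2f}")
```

With these made-up inputs the estimate moves from above 1 to below 1, which is the population-level effect the last bullet describes.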

R0 is also closely tied to the rate at which a pathogen spreads through a population. Remember that any contagious pathogen left unchecked will eventually make its way through a population until enough people have developed immunity. Therefore, by decreasing the average number of people that catch a virus from an infected individual, we slow down the rate of its transmission. As calculus teaches us, the rate of change of a quantity can be depicted visually as the slope of its graph over time. Let’s take a look at what happens when we decrease the rate of transmission of a virus:

In other words…

An important note on flattening the curve (that involves some more calculus): if we calculate the total number of infected people in both of these scenarios (by taking the integral, or area under the curve, of each), we find that in this simplified picture roughly the same number of people get infected in each scenario.
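Here is a small numerical sketch of that area-under-the-curve comparison. The two curves are hypothetical bell shapes deliberately built to have the same total, mirroring the simplified graphs; they are not output from an epidemic model, and a mechanistic model would not necessarily produce equal totals. The integral is approximated with the trapezoid rule.

```python
import math


def bell_curve(t, total, peak_day, width):
    """Bell-shaped daily case curve whose area equals `total`."""
    scale = total / (width * math.sqrt(2 * math.pi))
    return scale * math.exp(-0.5 * ((t - peak_day) / width) ** 2)


def area_under_curve(values, dt):
    """Trapezoid-rule approximation of the integral of evenly spaced samples."""
    return dt * (sum(values) - 0.5 * (values[0] + values[-1]))


dt = 0.5
days = [k * dt for k in range(801)]  # day 0 through day 400

# Hypothetical scenarios: same total, different peak height and timing.
sharp = [bell_curve(t, total=100_000, peak_day=100, width=15) for t in days]
flat = [bell_curve(t, total=100_000, peak_day=200, width=45) for t in days]

for name, curve in (("no measures", sharp), ("flattened", flat)):
    print(f"{name:>11}: peak {max(curve):,.0f} cases/day, "
          f"total {area_under_curve(curve, dt):,.0f} cases")
```

The totals come out essentially equal while the flattened curve peaks at about a third of the height, spread over a longer period.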

So if, in the end, the same number of people get sick in both scenarios, why don’t we just use the chicken pox approach to build up herd immunity (the point where enough people are immune that the virus begins to die out) as a society? In the case of COVID-19, there were two reasons.

The first was that we wanted to keep hospitals from reaching their capacity. The point where the number of COVID cases surpasses hospitals’ capacity to treat patients is represented by the vertical line above. As you can see, flattening the curve prevents us from reaching that point.
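The capacity argument can be checked with a tiny example. The case counts and the capacity below are made-up numbers, arranged so that both scenarios add up to the same total; the only thing the sketch measures is how many days each scenario spends above capacity.

```python
def days_over_capacity(daily_cases, capacity):
    """Number of days on which cases needing care exceed hospital capacity."""
    return sum(1 for cases in daily_cases if cases > capacity)


# Hypothetical daily counts of cases needing hospital care.
unmitigated = [10, 40, 90, 140, 90, 40, 10]
flattened = [10, 25, 45, 60, 70, 70, 60, 45, 25, 10]
capacity = 80  # hypothetical number of patients hospitals can treat at once

print("totals:", sum(unmitigated), sum(flattened))
print("days over capacity, no measures:", days_over_capacity(unmitigated, capacity))
print("days over capacity, flattened:  ", days_over_capacity(flattened, capacity))
```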

The second reason is that flattening the curve was a way to mitigate the impact of the virus until a vaccine became available (represented by the vertical line above). In a future post, I will look at how data scientists are modeling the impact of the vaccines now that they are available in order to better understand the future of COVID-19. Although we have reached a significant milestone in the fight against the virus, there is still much to be learned from the data!

