WD

COVID-19 waves

During the COVID-19 pandemic[1], we talk openly about waves without really thinking about what they are. At the time of this writing, most news bulletins agree that South Africa is recovering from its third wave.

I did not do research about the origin of these statements because any sideways glance at the data would convince you that they are right. Nonetheless, a wave seems to be an arbitrary concept that is rather subjective. Can we be more objective about a wave?

So, the problem I explore in this article is:
Discover an algorithm that identifies the COVID-19 waves for a given country.

Please note: all data for this article is sourced from the Humanitarian Data Exchange[2].

The wave depends on the country

It is sometimes easy for us to make a subjective judgement call on the waves when we look at the data for a country, but at other times it can be a little tricky.

For instance, from Figure 1 below, it is easy to see that South Africa (ZA) has had three waves, but the waves for Great Britain (GB) are a little less clear. Does the GB data show 3 waves or 2 waves, or perhaps even 4?

Fig 1: The cases reported for South Africa (ZA) and Great Britain (GB)

Your answer is likely to depend on your definition of a wave, but it is also influenced by the specifics of the country. The same data that is a wave in one country could simply end up being a bump in another.

What is a wave?

This is the key question. I will answer it in terms of up-spikes.

Think of an up-spike as a triangle pointing upwards. The tip of the triangle is the maximum point of the wave. Let's call this tip the apex of the up-spike.

The left-base of the up-spike is a lower point on the left-hand side of the apex, and the right-base is another lower point on the right-hand side of the apex.

We call the line from the left-base to the apex the incline, and the line on the other side, from the apex to the right-base, the decline of the spike.

It is easy to see that a COVID-19 wave is an up-spike. More precisely we should say it is the incline of an up-spike.

There are many up-spikes in the data, and not all of them can be waves. So, we need to eliminate the smaller ones, which gives us the first parameter of the algorithm: the minimum span, $\Omega$.

The span of a spike is the difference (in days) between the right-base and the left-base. In this article we use 60 days, meaning that the incline and decline of an up-spike should together take at least 60 days for it to be considered a wave.

We use:

$$\Omega = 60$$

The input function

The input data can be considered as a simple sequence of integers. Each integer in the sequence is the number of cases reported on a day. For a country, the sequence starts on the first day that a non-zero number of cases was reported, which we call day-zero.

Thus, the data can be seen as a function that maps the day index to some value. If we use $\mathbb I = \{0 \ldots n-1\}$, where $n$ is the number of items in the sequence, then we get for the function that describes the data:

$$d: \mathbb I \mapsto \mathbb Z$$

Using this idea, any point can be identified by its index. In general, a point $\bold p: \mathbb I \mapsto (\mathbb I, \mathbb Z)$ is

$$\bold p(i) = (i, d(i))$$

I cleaned up the original data a little to remove zero values because we can assume that zero numbers are missing data. Such values are filled by the average of the valid point before and the valid point after the missing value. After this, $d: \mathbb I \mapsto \mathbb R$ and the following holds:

$$\forall i \in \mathbb I: d(i) \neq 0$$
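The clean-up described above could be sketched in Python roughly as follows. This is a minimal illustration, not the code used for the article: the name `fill_missing` is hypothetical, and the handling of a zero with only one valid neighbour (use that neighbour alone) is an assumption the article does not spell out.

```python
from typing import List


def fill_missing(cases: List[int]) -> List[float]:
    """Replace zero entries with the mean of the nearest valid neighbours."""
    filled = [float(c) for c in cases]
    n = len(cases)
    for i, c in enumerate(cases):
        if c != 0:
            continue
        # Nearest non-zero value before and after index i, if one exists.
        before = next((cases[j] for j in range(i - 1, -1, -1) if cases[j] != 0), None)
        after = next((cases[j] for j in range(i + 1, n) if cases[j] != 0), None)
        # Assumption: with only one valid neighbour, that value is used as-is.
        neighbours = [v for v in (before, after) if v is not None]
        if neighbours:
            filled[i] = sum(neighbours) / len(neighbours)
    return filled
```

For example, `fill_missing([10, 0, 20])` fills the middle day with the mean of 10 and 20.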

Step 1: Identify key trend points

The first observation about an apex is that it appears before the point where the trend of the data changes from an upward slope to a downward slope.

In order to identify changes in the trend of the data, we use two moving averages[3]: a long moving average (with a window of 42 days, $a_{42}$) and a short one (with a window of 14 days, $a_{14}$). This idea can be seen in Figure 2.

Fig 2: The short and long moving averages for ZA

When the short moving average is above the long one, the trend is increasing, and when it is below the long one, the trend is decreasing. In Figure 2, you can see some points where the trend switches. What you cannot see clearly are the smaller trend switches in October.

We start by defining a boolean trend function, $t$, such that it is true when the moving averages indicate an incline:

$$t(i) = a_{14}(i) > a_{42}(i)$$
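As a rough sketch, the two moving averages and the trend function might look like this in Python. The article does not say how the window is aligned, so a trailing window that is simply shorter during the first days is assumed here; the names `moving_average` and `trend` are illustrative.

```python
from typing import List


def moving_average(values: List[float], window: int) -> List[float]:
    """Trailing mean over at most `window` values ending at each index."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            # Drop the value that just fell out of the window.
            running -= values[i - window]
        out.append(running / min(i + 1, window))
    return out


def trend(values: List[float], short_window: int = 14, long_window: int = 42) -> List[bool]:
    """True on days where the short average lies above the long average."""
    a_short = moving_average(values, short_window)
    a_long = moving_average(values, long_window)
    return [s > l for s, l in zip(a_short, a_long)]
```

The defaults of 14 and 42 days match the windows used in this article; other countries or data sources may call for different values.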

And from this, we can identify the set of breakpoints, $\bold B$. These are the points where the trend slope switches from one sign to another.
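Given the trend as a list of booleans, one per day, the breakpoints can be collected with a single pass. This is a sketch under the assumption that a breakpoint is the first day on which the trend value differs from the previous day; the function name is hypothetical.

```python
from typing import List


def breakpoints(t: List[bool]) -> List[int]:
    """Indices where the trend value differs from the day before."""
    return [i for i in range(1, len(t)) if t[i] != t[i - 1]]
```

For the trend `[False, False, True, True, False]` this yields the days 2 and 4.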

Step 2: Find optimums and spikes

For every point $i$ in $\bold B$, we can identify a line from $\bold p(i-1)$ to $\bold p(i)$. If the line is increasing, the optimum for that line is the maximum point, and if it is decreasing, the optimum for that line is the minimum point.

From these optimums, we get a zigzag picture that neatly identifies spikes. See Figure 3 for the general idea.

Fig 3: The optimum values and spikes for ZA
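One possible reading of this step, sketched in Python: the data is cut into segments at the breakpoints, and each segment contributes its maximum when the trend is increasing and its minimum otherwise. Treating day 0 and the end of the series as extra segment edges is an assumption, as are the function name and the `"max"`/`"min"` labels.

```python
from typing import List, Tuple


def optimums(d: List[float], t: List[bool], breaks: List[int]) -> List[Tuple[int, str]]:
    """Return (day index, kind) for the optimum of each trend segment."""
    # Assumption: the start and end of the series close the outer segments.
    edges = [0] + list(breaks) + [len(d)]
    found = []
    for start, end in zip(edges, edges[1:]):
        days = range(start, end)
        if not days:
            continue
        if t[start]:
            # Increasing trend: the optimum is the highest point.
            found.append((max(days, key=lambda i: d[i]), "max"))
        else:
            # Decreasing trend: the optimum is the lowest point.
            found.append((min(days, key=lambda i: d[i]), "min"))
    return found
```

Because the trend alternates between segments, the result alternates between maximums and minimums, which is the zigzag of spikes.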

Step 3: Eliminate the short spans

This is where $\Omega$ comes in. First, eliminate all the down-spikes that are too short, and then eliminate the up-spikes that have a span less than $\Omega$.

What we end up with are the objective waves. The end-result for Great Britain can be seen in Figure 4.

Fig 4: Three waves identified for GB
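The final filter on up-spikes can be sketched as below. It only covers the second half of this step: how a too-short down-spike is merged into its neighbours is not specified in the article, so that part is left out. The input format (a zigzag of `(day, kind)` pairs) and all names are assumptions made for illustration.

```python
from typing import List, Tuple

# (left_base, apex, right_base) as day indices
Spike = Tuple[int, int, int]


def up_spikes(zigzag: List[Tuple[int, str]]) -> List[Spike]:
    """Collect every min-max-min triple from an alternating zigzag."""
    spikes = []
    for (left, lk), (apex, ak), (right, rk) in zip(zigzag, zigzag[1:], zigzag[2:]):
        if (lk, ak, rk) == ("min", "max", "min"):
            spikes.append((left, apex, right))
    return spikes


def long_enough(spikes: List[Spike], omega: int = 60) -> List[Spike]:
    """Keep spikes whose span (right-base minus left-base) is at least omega days."""
    return [s for s in spikes if s[2] - s[0] >= omega]
```

With a zigzag of minimums on days 0, 70 and 95 and maximums on days 30 and 80, only the first up-spike (span 70) survives a minimum span of 60 days; the second (span 25) is dropped.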

Conclusion

A relatively simple process to identify waves has been found and is described here.

The algorithm has three parameters: the minimum spike span, the window of the long moving average, and the window of the short moving average.

The description provided here is semiformal but should be complete enough to follow unambiguously.

References

  1. COVID-19 pandemic - Wikipedia
  2. COVID-19 data - Humanitarian Data Exchange
  3. Moving average - Wikipedia