Exploration of COVID19 dataset: Effectiveness of Measures against COVID19

Importing everything from the previous part

First, customary importing of packages and data if you haven't already done so from the previous sections.

I also imported the exact same code as from the previous section to speed up the tutorial.

import pandas as pd
import matplotlib.pyplot as plt
ASEAN_countries_list = ['Brunei', 'Cambodia', 'Indonesia', 'Laos', 'Malaysia', 'Burma', 'Philippines', 'Singapore', 'Vietnam']

link = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv'
covid19_data = pd.read_csv(link)
asean_df = covid19_data[covid19_data['Country/Region'].isin(ASEAN_countries_list)]
asean_df.set_index('Country/Region', inplace = True)
asean_df_dropped = asean_df.drop(columns = ['Province/State', 'Lat', 'Long'])
asean_t = asean_df_dropped.T

def country_diff(df, country):
    df_s = df[country].copy()
    df_s_F = df_s.shift(1)
    country_df = pd.concat([df_s, df_s_F], axis=1)
    country_df.columns = [f'{country}_Cases_O', f'{country}_Cases_ShiftF']
    country_df[f'{country}_difference'] = country_df[f'{country}_Cases_O'] - country_df[f'{country}_Cases_ShiftF']
    return country_df

asean_daily_df = pd.DataFrame()
for country in ASEAN_countries_list:
    temp_df = country_diff(asean_t, country)
    asean_daily_df = pd.concat([asean_daily_df, temp_df[f'{country}_difference']],sort = False, axis = 1)
asean_daily_df = asean_daily_df.iloc[1:,]
asean_daily_df
  

	Brunei_difference	Cambodia_difference	Indonesia_difference	Laos_difference	Malaysia_difference	Burma_difference	Philippines_difference	Singapore_difference	Vietnam_difference
1/23/20	0.0	0.0	0.0	0.0	0.0	0.0	0.0	1.0	2.0
1/24/20	0.0	0.0	0.0	0.0	0.0	0.0	0.0	2.0	0.0
1/25/20	0.0	0.0	0.0	0.0	3.0	0.0	0.0	0.0	0.0
1/26/20	0.0	0.0	0.0	0.0	1.0	0.0	0.0	1.0	0.0
1/27/20	0.0	1.0	0.0	0.0	0.0	0.0	0.0	1.0	0.0
...	...	...	...	...	...	...	...	...	...
7/31/21	1.0	658.0	37284.0	380.0	17786.0	4725.0	8141.0	120.0	8938.0
8/1/21	0.0	671.0	30738.0	267.0	17150.0	3480.0	8724.0	121.0	7447.0
8/2/21	1.0	560.0	22404.0	199.0	15764.0	3689.0	8073.0	111.0	0.0
8/3/21	0.0	577.0	33900.0	250.0	17105.0	4713.0	6779.0	102.0	16954.0
8/4/21	0.0	583.0	35867.0	290.0	19819.0	4051.0	7283.0	95.0	7295.0

560 rows × 9 columns

Since we would be handling dates in this tutorial, we would need to format the index of the dataframe for easier use!

asean_daily_df.index = pd.to_datetime(asean_daily_df.index)
asean_daily_df
  

	Brunei_difference	Cambodia_difference	Indonesia_difference	Laos_difference	Malaysia_difference	Burma_difference	Philippines_difference	Singapore_difference	Vietnam_difference
2020-01-23	0.0	0.0	0.0	0.0	0.0	0.0	0.0	1.0	2.0
2020-01-24	0.0	0.0	0.0	0.0	0.0	0.0	0.0	2.0	0.0
2020-01-25	0.0	0.0	0.0	0.0	3.0	0.0	0.0	0.0	0.0
2020-01-26	0.0	0.0	0.0	0.0	1.0	0.0	0.0	1.0	0.0
2020-01-27	0.0	1.0	0.0	0.0	0.0	0.0	0.0	1.0	0.0
...	...	...	...	...	...	...	...	...	...
2021-07-31	1.0	658.0	37284.0	380.0	17786.0	4725.0	8141.0	120.0	8938.0
2021-08-01	0.0	671.0	30738.0	267.0	17150.0	3480.0	8724.0	121.0	7447.0
2021-08-02	1.0	560.0	22404.0	199.0	15764.0	3689.0	8073.0	111.0	0.0
2021-08-03	0.0	577.0	33900.0	250.0	17105.0	4713.0	6779.0	102.0	16954.0
2021-08-04	0.0	583.0	35867.0	290.0	19819.0	4051.0	7283.0	95.0	7295.0

560 rows × 9 columns

Plotting Daily Cases: Singapore

For the sake of this tutorial, we would be taking a look at the effectiveness of the different measures against COVID19 in Singapore only. Recall that we can plot the daily cases against time like so:

fig, ax = plt.subplots(figsize = (15,10))

asean_daily_df['Singapore_difference'].plot(ax=ax)
ax.set_ylabel('Confirmed Daily Cases')
ax.set_xlabel('Dates')
ax.legend()
plt.show()
  

png

From this wiki page, we can have a table of dates, types of measures and details of measures!

Date (YYYY-MM-DD)	Detail of Measure	Type of Measure (Tighten / Relax)
2020-04-07	Circuit Breaker	Tighten
2020-04-21	Circuit Breaker 2	Tighten
2020-06-02	Phase 1 Day 1	Relax
2020-06-19	Phase 2 Day 1	Relax
2020-12-28	Phase 3 Day 1	Relax
2021-05-08	Phase 2 part 2 Day 1	Tighten
2021-05-16	Phase 2 HA Day 1	Tighten
2021-06-14	Phase 3 HA Day 1	Relax
2021-07-12	Phase 3 HA Groups of 5 Allowed	Relax
2021-07-22	Phase 2 HA Day 1	Tighten

From this table, we can easily use the datetime package to label these dates for us!

from datetime import date

circuit_breaker_date = date(2020,4,7)
cb_2_date = date(2020,4 ,21)
phase_1_date = date(2020, 6, 2)
phase_2_date = date(2020, 6, 19)
phase_3_date = date(2020, 12, 28)
phase_3_HA_date =  date(2021, 5, 8)
phase_2_HA_date = date(2021, 5, 16)
phase_3_HA2_date = date(2021, 6, 14)
phase_3_5_date = date(2021, 7, 12)
phase_2_HA2_date = date(2021, 7, 22)


#We are going to use a dictionary here to identify each date's details!
dates_dist = {circuit_breaker_date:['Circuit Breaker', 'Tighten'],
              cb_2_date:['Circuit Breaker 2', 'Tighten'],
              phase_1_date:['Phase 1', 'Relax'],
              phase_2_date:['Phase 2', 'Relax'],
              phase_3_date:['Phase 3', 'Relax'],
              phase_3_HA_date:['Phase 3 HA', 'Tighten'],
              phase_2_HA_date:['Phase 2 HA', 'Tighten'],
              phase_3_HA2_date:['Phase 3 HA2', 'Relax'],
              phase_2_HA2_date:['Phase 2 HA2', 'Tighten'],
              phase_3_5_date:['Group of 5 Allowed', 'Relax']}

col_dict = {'Tighten': 'r', 'Relax':'g'}
  

fig, ax = plt.subplots(figsize = (15,10))
asean_daily_df['Singapore_difference'].plot(ax=ax, label = '')

#Iterate over the keys of the dictionary. aka the different dates
for dates in dates_dist:
    #plot the date, for colour, use the dictionary to see if "Tighten" or "Relax"
    #Then use that as a key for our col_dict to specify our colour (red / green)
    ax.axvline(dates, color=col_dict[dates_dist[dates][1]], 
               linestyle='--', lw=2)

ax.axvline(circuit_breaker_date,color = 'r', label ='Tighten',linestyle='--', lw=2)
ax.axvline(phase_3_date,color = 'g', label ='Relax',linestyle='--', lw=2)
ax.set_ylabel('Confirmed Daily Cases')
ax.set_xlabel('Dates')
ax.legend()
plt.show()
  

png

The graph might be hard to see as it is roughly 2 years worth of data squeezed into a single axis, we might want to split into two graphs to visualise.

fig, ax = plt.subplots(figsize = (15,10), nrows = 2)
asean_daily_df['Singapore_difference'].plot(ax=ax[0], label = '')
asean_daily_df['Singapore_difference'].plot(ax=ax[1], label = '')

#Iterate over the keys of the dictionary. aka the different dates
for dates in dates_dist:
    #plot the date, for colour, use the dictionary to see if "Tighten" or "Relax"
    #Then use that as a key for our col_dict to specify our colour (red / green)
    ax[0].axvline(dates, color=col_dict[dates_dist[dates][1]], 
               linestyle='--', lw=2)
    ax[1].axvline(dates, color=col_dict[dates_dist[dates][1]], 
               linestyle='--', lw=2)
    

ax[0].axvline(circuit_breaker_date,color = 'r', label ='Tighten',linestyle='--', lw=2)
ax[0].axvline(phase_3_date,color = 'g', label ='Relax',linestyle='--', lw=2)
ax[0].set_ylabel('Confirmed Daily Cases')
ax[0].set_xlabel('Dates')
ax[0].set_xlim(date(2020, 1, 22), date(2020, 12, 31))
ax[0].legend()
ax[0].set_title('Daily Cases in Singapore for 2020')


ax[1].axvline(circuit_breaker_date,color = 'r', label ='Tighten',linestyle='--', lw=2)
ax[1].axvline(phase_3_date,color = 'g', label ='Relax',linestyle='--', lw=2)
ax[1].set_ylabel('Confirmed Daily Cases')
ax[1].set_xlabel('Dates')
ax[1].set_xlim(date(2021, 1, 1), date(2021, 8, 4))
ax[1].set_ylim(-1, 400)
ax[1].legend()
ax[1].set_title('Daily Cases in Singapore for 2021')

plt.tight_layout()
plt.show()
  

png

Recap:

Date (YYYY-MM-DD)	Detail of Measure	Type of Measure (Tighten / Relax)
2020-04-07	Circuit Breaker	Tighten
2020-04-21	Circuit Breaker 2	Tighten
2020-06-02	Phase 1 Day 1	Relax
2020-06-19	Phase 2 Day 1	Relax
2020-12-28	Phase 3 Day 1	Relax
2021-05-08	Phase 2 part 2 Day 1	Tighten
2021-05-16	Phase 2 HA Day 1	Tighten
2021-06-14	Phase 3 HA Day 1	Relax
2021-07-12	Phase 3 HA Groups of 5 Allowed	Relax
2021-07-22	Phase 2 HA Day 1	Tighten

Just by inspecting the rise and fall of daily cases with regards to tightening / relaxation of measures we can summarise all of these in a table:

Date (YYYY-MM-DD)	Detail of Measure	Type of Measure (Tighten / Relax)	Effects on Daily Cases (Increase / Decrease / No Change)
2020-04-07	Circuit Breaker	Tighten	Increase
2020-04-21	Circuit Breaker 2	Tighten	Decrease
2020-06-02	Phase 1 Day 1	Relax	Decrease
2020-06-19	Phase 2 Day 1	Relax	Increase THEN Decrease
2020-12-28	Phase 3 Day 1	Relax	No Change
2021-05-08	Phase 2 part 2 Day 1	Tighten	Increase
2021-05-16	Phase 2 HA Day 1	Tighten	Decrease
2021-06-14	Phase 3 HA Day 1	Relax	No Change
2021-07-12	Phase 3 HA Groups of 5 Allowed	Relax	Increase
2021-07-22	Phase 2 HA Day 1	Tighten	Decrease

From the table, five out of ten changes in COVID19 measures resulted in a decrease in daily cases. On a superficial glance, it seems that tightening or relaxation of measures does not affect daily cases.

However, there are many more dimensions in this dataset such as, clusters, linked / unlinked cases / waves etc. We would not be going into all these things as they would turn this workshop in a statistics course and, we would probably be working for various govenments to analyse all the statistics ;).