Exploration of COVID19 dataset: Time-Series plots

Recap & Grabbing Time-Series Data

First, import the packages and define the list of ASEAN countries, if you haven't already done so in the previous sections.

import pandas as pd
import matplotlib.pyplot as plt
ASEAN_countries_list = ['Brunei', 'Cambodia', 'Indonesia', 'Laos', 'Malaysia', 'Burma', 'Philippines', 'Singapore', 'Vietnam']

To plot COVID19 as a time series, it would be very inefficient to use datasets in which each file holds the data for a single day. Luckily, the same source also provides a collated dataset with one column for each day. Let's load it the same way as before.

We can also reuse the code from the previous sections to select only the ASEAN countries!

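link = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv'
covid19_data = pd.read_csv(link)

# Keep only the ASEAN rows and index them by country name
# (set_index on the filtered result returns a new DataFrame instead of modifying a slice in place)
asean_df = covid19_data[covid19_data['Country/Region'].isin(ASEAN_countries_list)].set_index('Country/Region')

# Drop the non-date columns so that only the daily cumulative counts remain, then display the result
asean_df_dropped = asean_df.drop(columns = ['Province/State', 'Lat', 'Long'])
asean_df_dropped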
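To see what the filtering, set_index and drop steps do without downloading the full CSV, here is a minimal sketch on a tiny hand-made DataFrame with the same column layout as the JHU file. The numbers and the extra non-ASEAN row are made up purely for illustration.

```python
import pandas as pd

# Made-up data in the same wide layout as the JHU CSV:
# one row per country/province, one column per date.
toy = pd.DataFrame({
    'Province/State': [None, None, None],
    'Country/Region': ['Singapore', 'Malaysia', 'Japan'],
    'Lat': [1.0, 4.0, 36.0],
    'Long': [104.0, 102.0, 138.0],
    '1/22/20': [0, 0, 2],
    '1/23/20': [1, 0, 2],
    '1/24/20': [3, 0, 2],
})

asean = ['Singapore', 'Malaysia']

# Keep only the rows whose country is in the list, indexed by country name.
toy_asean = toy[toy['Country/Region'].isin(asean)].set_index('Country/Region')

# Drop the non-date columns so only the daily counts remain.
toy_dates = toy_asean.drop(columns=['Province/State', 'Lat', 'Long'])
print(toy_dates)
```

The 'Japan' row is filtered out, and what remains has countries on the index and dates as columns, the same shape as the real table that follows.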
1/22/20 1/23/20 1/24/20 1/25/20 1/26/20 1/27/20 1/28/20 1/29/20 1/30/20 1/31/20 ... 7/25/21 7/26/21 7/27/21 7/28/21 7/29/21 7/30/21 7/31/21 8/1/21 8/2/21 8/3/21
Brunei 0 0 0 0 0 0 0 0 0 0 ... 321 333 333 333 333 336 337 337 338 338
Burma 0 0 0 0 0 0 0 0 0 0 ... 269525 274155 279119 284099 289333 294460 299185 302665 306354 311067
Cambodia 0 0 0 0 0 1 1 1 1 1 ... 72923 73701 74386 75152 75917 76585 77243 77914 78474 79051
Indonesia 0 0 0 0 0 0 0 0 0 0 ... 3166505 3194733 3239936 3287727 3331206 3372374 3409658 3440396 3462800 3496700
Laos 0 0 0 0 0 0 0 0 0 0 ... 4762 4985 5154 5434 5675 5919 6299 6566 6765 7015
Malaysia 0 0 0 3 4 4 4 7 8 8 ... 1013438 1027954 1044071 1061476 1078646 1095486 1113272 1130422 1146186 1163291
Philippines 0 0 0 0 0 0 0 0 1 1 ... 1548755 1555396 1562420 1566667 1572287 1580824 1588965 1597689 1605762 1612541
Singapore 0 1 3 3 4 5 7 7 10 13 ... 64179 64314 64453 64589 64722 64861 64981 65102 65213 65315
Vietnam 0 2 2 2 2 2 2 2 2 2 ... 101173 106347 117121 123640 133405 141122 150060 157507 157507 174461

9 rows × 560 columns


The table shows that the dates are the columns and the countries form the index. This layout makes it awkward to plot a time series with the dates on the x-axis, because pandas plots each column as a separate line against the index.

To solve this, we transpose the dataframe so that the dates become the rows (the index) and the countries become the columns.

asean_t = asean_df_dropped.T
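After transposing, the index still holds the date headers as plain strings such as '1/22/20', so pandas treats them as labels rather than as dates. An optional extra step, not part of the original walkthrough, is to convert the index into a DatetimeIndex. The sketch below assumes the headers follow a month/day/two-digit-year pattern, which is what the table suggests, and uses made-up numbers.

```python
import pandas as pd

# Toy transposed frame: date strings (like the JHU headers) on the index.
toy_t = pd.DataFrame(
    {'Singapore': [0, 1, 3], 'Malaysia': [0, 0, 0]},
    index=['1/22/20', '1/23/20', '1/24/20'],
)

# Assumed header pattern: month/day/two-digit-year, i.e. '%m/%d/%y'.
toy_t.index = pd.to_datetime(toy_t.index, format='%m/%d/%y')

print(toy_t.index[0])  # 2020-01-22 00:00:00
```

On the real data, the equivalent line would be `asean_t.index = pd.to_datetime(asean_t.index, format='%m/%d/%y')`, after which the x-axis of the plots is formatted as dates.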
Country/Region Brunei Burma Cambodia Indonesia Laos Malaysia Philippines Singapore Vietnam
1/22/20 0 0 0 0 0 0 0 0 0
1/23/20 0 0 0 0 0 0 0 1 2
1/24/20 0 0 0 0 0 0 0 3 2
1/25/20 0 0 0 0 0 3 0 3 2
1/26/20 0 0 0 0 0 4 0 4 2
... ... ... ... ... ... ... ... ... ...
7/30/21 336 294460 76585 3372374 5919 1095486 1580824 64861 141122
7/31/21 337 299185 77243 3409658 6299 1113272 1588965 64981 150060
8/1/21 337 302665 77914 3440396 6566 1130422 1597689 65102 157507
8/2/21 338 306354 78474 3462800 6765 1146186 1605762 65213 157507
8/3/21 338 311067 79051 3496700 7015 1163291 1612541 65315 174461

560 rows × 9 columns

Time-Series Plotting

Plotting a time-series chart is really easy with pandas!

ax = asean_t['Singapore'].plot()
ax.set_ylabel('Confirmed Cases')


To plot several countries on the same chart, we can create a figure and axes with the plt.subplots() function, as shown in the matplotlib section previously, and draw every country onto the same axes!

fig, ax = plt.subplots()

# Iterate through our countries so we can plot them all automatically onto the same axes!
for country in ASEAN_countries_list:
    asean_t[country].plot(ax = ax, label = country)

ax.set_ylabel('Confirmed Cases')
ax.legend()
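Because the transposed DataFrame already has one column per country, pandas can also draw every country in a single call: DataFrame.plot plots each column as its own line on the given axes and, by default, adds a legend labelled with the column names. A minimal sketch on made-up numbers, using the off-screen 'Agg' backend so it also runs without a display:

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up cumulative counts in the transposed layout: dates as rows, countries as columns.
dates = pd.date_range('2020-01-22', periods=4, freq='D')
toy_t = pd.DataFrame(
    {'Malaysia': [0, 0, 3, 4], 'Singapore': [0, 1, 3, 3], 'Vietnam': [0, 2, 2, 2]},
    index=dates,
)

fig, ax = plt.subplots()
# One line per column, drawn on the same Axes, with a legend built from the column names.
toy_t.plot(ax=ax)
ax.set_ylabel('Confirmed Cases')
```

On the real data this would simply be `asean_t.plot(ax = ax)`. The explicit loop is still handy when you want to style or filter countries individually.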


From the time-series plot, we can tell that, as of 8/3/21 (the last date in our data), Indonesia has the highest cumulative number of confirmed cases among all the ASEAN countries!
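Reading the leader off a chart works, but the same conclusion can be checked directly from the numbers. Since the data are cumulative, the last row holds each country's most recent total, and idxmax returns the country with the largest value. A minimal sketch on made-up numbers:

```python
import pandas as pd

# Made-up cumulative counts in the transposed layout (dates as rows, countries as columns).
toy_t = pd.DataFrame(
    {'Indonesia': [0, 5, 12], 'Malaysia': [3, 4, 7], 'Singapore': [1, 3, 4]},
    index=pd.to_datetime(['2021-08-01', '2021-08-02', '2021-08-03']),
)

# The last row holds the most recent cumulative total for every country.
latest = toy_t.iloc[-1]

# Rank the countries from most to fewest cumulative cases on that date.
print(latest.sort_values(ascending=False))

# idxmax returns the column label (country) with the largest value.
print(latest.idxmax())  # Indonesia
```

On the real data, `asean_t.iloc[-1].idxmax()` performs the same check; given the 8/3/21 values in the table above, it should return 'Indonesia'.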