Exploration of COVID19 dataset: Time-Series plots
Recap & Grabbing Time-Series Data
First, the customary imports of packages and data, if you haven't already done so in the previous sections.
import pandas as pd
import matplotlib.pyplot as plt
ASEAN_countries_list = ['Brunei', 'Cambodia', 'Indonesia', 'Laos', 'Malaysia', 'Burma', 'Philippines', 'Singapore', 'Vietnam']
To make time-series plots of COVID-19, it would be very inefficient to use the daily-report datasets, where each file holds the data for a single day. Luckily for us, the same source also provides a collated time-series dataset with one column for each day. Let's load it the same way as before.
We can also reuse the code from the previous sections to select only the ASEAN countries!
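To make the one-column-per-day layout concrete, here is a minimal sketch on a tiny hand-made DataFrame, `wide` (a hypothetical name), that mimics the time-series file: four metadata columns (Province/State, Country/Region, Lat, Long) followed by one column per date. The case counts are copied from the first days shown later in this section; the coordinates are placeholders, not real values.

```python
import pandas as pd

# Hypothetical miniature of the JHU CSSE time-series layout:
# four metadata columns, then one column per date.
wide = pd.DataFrame({
    'Province/State': [None, None],
    'Country/Region': ['Singapore', 'Vietnam'],
    'Lat': [0.0, 0.0],    # placeholder coordinates, not the real values
    'Long': [0.0, 0.0],   # placeholder coordinates, not the real values
    '1/22/20': [0, 0],
    '1/23/20': [1, 2],
    '1/24/20': [3, 2],
})

# Every column after the first four is a date, so a country's
# entire history sits in a single row.
date_columns = wide.columns[4:]
print(list(date_columns))
print(wide.loc[wide['Country/Region'] == 'Singapore', date_columns])
```

Because the whole history lives in one row per country, a single file is enough to build a time series, with no need to stitch daily files together.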
link = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv'
covid19_data = pd.read_csv(link)
# Keep only the ASEAN rows; set_index returns a new DataFrame, so no inplace change to a filtered slice is needed
asean_df = covid19_data[covid19_data['Country/Region'].isin(ASEAN_countries_list)].set_index('Country/Region')
# Drop the non-date metadata columns, leaving one column per day
asean_df_dropped = asean_df.drop(columns=['Province/State', 'Lat', 'Long'])
Country/Region | 1/22/20 | 1/23/20 | 1/24/20 | 1/25/20 | 1/26/20 | 1/27/20 | 1/28/20 | 1/29/20 | 1/30/20 | 1/31/20 | ... | 7/25/21 | 7/26/21 | 7/27/21 | 7/28/21 | 7/29/21 | 7/30/21 | 7/31/21 | 8/1/21 | 8/2/21 | 8/3/21 |
Brunei | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 321 | 333 | 333 | 333 | 333 | 336 | 337 | 337 | 338 | 338 |
Burma | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 269525 | 274155 | 279119 | 284099 | 289333 | 294460 | 299185 | 302665 | 306354 | 311067 |
Cambodia | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | ... | 72923 | 73701 | 74386 | 75152 | 75917 | 76585 | 77243 | 77914 | 78474 | 79051 |
Indonesia | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 3166505 | 3194733 | 3239936 | 3287727 | 3331206 | 3372374 | 3409658 | 3440396 | 3462800 | 3496700 |
Laos | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 4762 | 4985 | 5154 | 5434 | 5675 | 5919 | 6299 | 6566 | 6765 | 7015 |
Malaysia | 0 | 0 | 0 | 3 | 4 | 4 | 4 | 7 | 8 | 8 | ... | 1013438 | 1027954 | 1044071 | 1061476 | 1078646 | 1095486 | 1113272 | 1130422 | 1146186 | 1163291 |
Philippines | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | ... | 1548755 | 1555396 | 1562420 | 1566667 | 1572287 | 1580824 | 1588965 | 1597689 | 1605762 | 1612541 |
Singapore | 0 | 1 | 3 | 3 | 4 | 5 | 7 | 7 | 10 | 13 | ... | 64179 | 64314 | 64453 | 64589 | 64722 | 64861 | 64981 | 65102 | 65213 | 65315 |
Vietnam | 0 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | ... | 101173 | 106347 | 117121 | 123640 | 133405 | 141122 | 150060 | 157507 | 157507 | 174461 |
9 rows × 560 columns
In this table the dates are the columns and the countries form the index. That layout makes it awkward to plot a time series with the dates on the x-axis, because pandas plots each column as a separate line against the index.
To solve this problem, we transpose the DataFrame so that its columns become rows (and its rows become columns).
asean_t = asean_df_dropped.T
Country/Region | Brunei | Burma | Cambodia | Indonesia | Laos | Malaysia | Philippines | Singapore | Vietnam |
1/22/20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1/23/20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 |
1/24/20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 2 |
1/25/20 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 3 | 2 |
1/26/20 | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 4 | 2 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
7/30/21 | 336 | 294460 | 76585 | 3372374 | 5919 | 1095486 | 1580824 | 64861 | 141122 |
7/31/21 | 337 | 299185 | 77243 | 3409658 | 6299 | 1113272 | 1588965 | 64981 | 150060 |
8/1/21 | 337 | 302665 | 77914 | 3440396 | 6566 | 1130422 | 1597689 | 65102 | 157507 |
8/2/21 | 338 | 306354 | 78474 | 3462800 | 6765 | 1146186 | 1605762 | 65213 | 157507 |
8/3/21 | 338 | 311067 | 79051 | 3496700 | 7015 | 1163291 | 1612541 | 65315 | 174461 |
560 rows × 9 columns
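After transposing, the index still holds the dates as plain strings such as '1/22/20'. pandas can plot against those labels, but an optional extra step (not part of the original walkthrough) is to convert them into a proper DatetimeIndex with `pd.to_datetime`, so the x-axis is treated as real dates. A minimal sketch on a tiny made-up frame, `asean_t_demo` (a hypothetical name), with values copied from the first rows of the table above; on the real data you would apply the same line to `asean_t.index`.

```python
import pandas as pd

# Hypothetical miniature of asean_t: string dates as the index,
# one column per country (values from the first rows of the table).
asean_t_demo = pd.DataFrame(
    {'Singapore': [0, 1, 3, 3, 4], 'Vietnam': [0, 2, 2, 2, 2]},
    index=['1/22/20', '1/23/20', '1/24/20', '1/25/20', '1/26/20'],
)

# The dates are month/day/two-digit-year strings, so '%m/%d/%y' parses them.
asean_t_demo.index = pd.to_datetime(asean_t_demo.index, format='%m/%d/%y')

print(asean_t_demo.index)
```

With a DatetimeIndex, pandas formats the x-axis ticks as dates when plotting.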
Time-Series Plotting
Plotting a time-series chart is really easy with pandas!
ax = asean_t['Singapore'].plot()
ax.set_ylabel('Confirmed Cases')
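`Series.plot()` returns a matplotlib Axes, which is why the snippet above can call `set_ylabel` on its result. A minimal sketch on made-up data, assuming you are running a script outside a notebook: it selects the non-interactive Agg backend and saves the chart to a file (`singapore_cases.png` is a hypothetical name) instead of displaying it inline. The five values are Singapore's first five days from the table above.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, for scripts without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for asean_t['Singapore'] (first five days from the table)
singapore = pd.Series(
    [0, 1, 3, 3, 4],
    index=pd.to_datetime(['1/22/20', '1/23/20', '1/24/20', '1/25/20', '1/26/20'],
                         format='%m/%d/%y'),
    name='Singapore',
)

# Series.plot() draws onto an Axes and returns it, so styling goes through ax
ax = singapore.plot()
ax.set_ylabel('Confirmed Cases')
ax.set_title('Singapore')

# Without inline display, save the figure to a file, then release it
fig = ax.get_figure()
fig.savefig('singapore_cases.png')
plt.close(fig)
```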
To plot several countries on the same chart, we first create a figure and a shared set of axes with the plt.subplots() function, as shown in the matplotlib section previously, and then draw every country onto that same ax!
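fig, ax = plt.subplots()
# Iterate through our countries so we can plot them automatically!
for country in ASEAN_countries_list:
    # Draw each country's series onto the shared axes
    asean_t[country].plot(ax=ax, label=country)
ax.set_ylabel('Confirmed Cases')
ax.legend()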
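Because `DataFrame.plot` draws one line per column and labels the legend with the column names, the loop can also be replaced by a single call on a column selection. A minimal sketch on a made-up three-country miniature of `asean_t` (the name `asean_t_demo` is hypothetical; values are the last three days from the table above). The log-scale line is an optional addition not in the original: Indonesia's count is roughly ten thousand times Brunei's, so on a linear axis the smaller countries flatten against zero.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, for scripts without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical miniature of asean_t (last three days from the table)
asean_t_demo = pd.DataFrame(
    {'Brunei': [337, 338, 338],
     'Indonesia': [3440396, 3462800, 3496700],
     'Singapore': [65102, 65213, 65315]},
    index=pd.to_datetime(['8/1/21', '8/2/21', '8/3/21'], format='%m/%d/%y'),
)

fig, ax = plt.subplots()
# One line per selected column, with a legend built from the column names
asean_t_demo[['Brunei', 'Indonesia', 'Singapore']].plot(ax=ax)
ax.set_ylabel('Confirmed Cases')
# Optional: a log y-axis keeps small counts visible next to large ones
ax.set_yscale('log')
plt.close(fig)
```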
From the time-series plots, we can tell that, as of the latest date in the data (8/3/21), Indonesia has the highest cumulative number of confirmed cases among all the ASEAN countries, at 3,496,700!
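The same conclusion can be checked numerically instead of by eye. A minimal sketch: the Series `latest` (a hypothetical name) holds the 8/3/21 values copied from the last row of the transposed table; on the real data, `asean_t.iloc[-1]` gives the same row for whatever the most recent date is.

```python
import pandas as pd

# Cumulative confirmed cases on 8/3/21, copied from the last row of the table
latest = pd.Series({
    'Brunei': 338, 'Burma': 311067, 'Cambodia': 79051,
    'Indonesia': 3496700, 'Laos': 7015, 'Malaysia': 1163291,
    'Philippines': 1612541, 'Singapore': 65315, 'Vietnam': 174461,
})

# idxmax returns the label (country) of the largest value
print(latest.idxmax())
# Countries ranked from most to fewest cumulative cases
print(latest.sort_values(ascending=False).head(3))
```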