Housekeeping

First, the customary import of packages and data, in case you have not already done so in the previous sections.

import pandas as pd
import matplotlib.pyplot as plt

# Zero-padded strings (not integers), matching the MM-DD-YYYY.csv file names
month_int = '08'
day_int = '03'

df = pd.read_csv(f"https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/{month_int}-{day_int}-2021.csv")
df
FIPS Admin2 Province_State Country_Region Last_Update Lat Long_ Confirmed Deaths Recovered Active Combined_Key Incident_Rate Case_Fatality_Ratio
0 NaN NaN NaN Afghanistan 2021-08-04 04:21:25 33.939110 67.709953 148572 6804 82586.0 59182.0 Afghanistan 381.655103 4.579598
1 NaN NaN NaN Albania 2021-08-04 04:21:25 41.153300 20.168300 133211 2457 130291.0 463.0 Albania 4628.917923 1.844442
2 NaN NaN NaN Algeria 2021-08-04 04:21:25 28.033900 1.659600 175229 4370 117557.0 53302.0 Algeria 399.600529 2.493879
3 NaN NaN NaN Andorra 2021-08-04 04:21:25 42.506300 1.521800 14766 128 14348.0 290.0 Andorra 19110.852262 0.866856
4 NaN NaN NaN Angola 2021-08-04 04:21:25 -11.202700 17.873900 43070 1022 39389.0 2659.0 Angola 131.046214 2.372881
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3982 NaN NaN NaN Vietnam 2021-08-04 04:21:25 14.058324 108.277199 174461 2071 50831.0 121559.0 Vietnam 179.231087 1.187085
3983 NaN NaN NaN West Bank and Gaza 2021-08-04 04:21:25 31.952200 35.233200 317264 3609 312289.0 1366.0 West Bank and Gaza 6219.136020 1.137538
3984 NaN NaN NaN Yemen 2021-08-04 04:21:25 15.552727 48.516388 7086 1380 4232.0 1474.0 Yemen 23.757821 19.475021
3985 NaN NaN NaN Zambia 2021-08-04 04:21:25 -13.133897 27.849332 197123 3422 189341.0 4360.0 Zambia 1072.255612 1.735972
3986 NaN NaN NaN Zimbabwe 2021-08-04 04:21:25 -19.015438 29.154857 112435 3676 81570.0 27189.0 Zimbabwe 756.479528 3.269445

3987 rows × 14 columns
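
The month and day above are typed by hand as zero-padded strings. As a minimal sketch, the same file name can be built from a date object so that `strftime` handles the padding. The helper name `daily_report_url` and the constant `BASE_URL` are hypothetical and not part of the original notebook.

```python
from datetime import date

# Folder that holds the daily reports (same path as in the read_csv call above)
BASE_URL = (
    "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
    "csse_covid_19_data/csse_covid_19_daily_reports/"
)

def daily_report_url(report_date: date) -> str:
    """Return the URL of the daily report file named MM-DD-YYYY.csv."""
    return f"{BASE_URL}{report_date.strftime('%m-%d-%Y')}.csv"

url = daily_report_url(date(2021, 8, 3))
print(url)
# df = pd.read_csv(url)  # same call as above; needs pandas and network access
```

Passing a different `date` then selects a different daily report without editing the strings by hand.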

As you can tell, the table is large, with 3987 rows. Let us subset the data to just the ASEAN countries.

ASEAN_countries_list = ['Brunei', 'Cambodia', 'Indonesia', 'Laos', 'Malaysia', 'Burma', 'Philippines', 'Singapore', 'Vietnam']

# Keep only the ASEAN rows, then index by country name (chained rather than inplace=True on a filtered slice)
asean_df = df[df['Country_Region'].isin(ASEAN_countries_list)].set_index('Country_Region')
asean_df
FIPS Admin2 Province_State Last_Update Lat Long_ Confirmed Deaths Recovered Active Combined_Key Incident_Rate Case_Fatality_Ratio
Country_Region
Brunei NaN NaN NaN 2021-08-04 04:21:25 4.535300 114.727700 338 3 280.0 55.0 Brunei 77.260145 0.887574
Burma NaN NaN NaN 2021-08-04 04:21:25 21.916200 95.956000 311067 10373 220887.0 79807.0 Burma 571.711409 3.334651
Cambodia NaN NaN NaN 2021-08-04 04:21:25 11.550000 104.916700 79051 1471 72145.0 5435.0 Cambodia 472.822161 1.860824
Indonesia NaN NaN NaN 2021-08-04 04:21:25 -0.789300 113.921300 3496700 98889 2873669.0 524142.0 Indonesia 1278.390505 2.828066
Laos NaN NaN NaN 2021-08-04 04:21:25 19.856270 102.495496 7015 7 3392.0 3616.0 Laos 96.418748 0.099786
Malaysia NaN NaN NaN 2021-08-04 04:21:25 4.210484 101.975766 1163291 9598 950029.0 203664.0 Malaysia 3594.176209 0.825073
Philippines NaN NaN NaN 2021-08-04 04:21:25 12.879721 121.774017 1612541 28141 1521263.0 63137.0 Philippines 1471.550496 1.745134
Singapore NaN NaN NaN 2021-08-04 04:21:25 1.283300 103.833300 65315 38 63252.0 2025.0 Singapore 1116.430267 0.058180
Vietnam NaN NaN NaN 2021-08-04 04:21:25 14.058324 108.277199 174461 2071 50831.0 121559.0 Vietnam 179.231087 1.187085
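
The subsetting step can be tried on a small frame without downloading anything. The sketch below uses a few rows copied from the output above (three columns only, for brevity) and also reports which requested names found no match in the data, a cheap guard against spelling differences such as Burma versus Myanmar. The names `sample`, `wanted` and `missing` are illustrative.

```python
import pandas as pd

# A handful of rows copied from the daily report shown above
sample = pd.DataFrame({
    "Country_Region": ["Afghanistan", "Brunei", "Burma", "Vietnam", "Zambia"],
    "Confirmed": [148572, 338, 311067, 174461, 197123],
    "Deaths": [6804, 3, 10373, 2071, 3422],
})

# "Myanmar" is included on purpose: the data uses "Burma", so it will not match
wanted = ["Brunei", "Burma", "Vietnam", "Myanmar"]

# Keep the matching rows and index them by country name
subset = sample[sample["Country_Region"].isin(wanted)].set_index("Country_Region")

# Requested names that do not appear in the data at all
missing = sorted(set(wanted) - set(sample["Country_Region"]))

print(subset)
print("No match for:", missing)
```

`isin` silently ignores names it cannot find, so a check like this is worth running on the full list. Separately, ASEAN has ten member states and the list above names nine of them (Thailand is not in it), so the subset shown covers those nine.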

Several columns are not useful here: FIPS, Admin2 and Province_State are NaN for every ASEAN row, and we are unlikely to need variables such as Lat and Long_. We can thus remove these columns.

asean_df_dropped = asean_df.drop(columns = ['FIPS', 'Admin2','Province_State', 'Last_Update', 'Lat', 'Long_', 'Combined_Key'])
asean_df_dropped
                Confirmed  Deaths  Recovered    Active  Incident_Rate  Case_Fatality_Ratio
Country_Region
Brunei                338       3      280.0      55.0      77.260145             0.887574
Burma              311067   10373   220887.0   79807.0     571.711409             3.334651
Cambodia            79051    1471    72145.0    5435.0     472.822161             1.860824
Indonesia         3496700   98889  2873669.0  524142.0    1278.390505             2.828066
Laos                 7015       7     3392.0    3616.0      96.418748             0.099786
Malaysia          1163291    9598   950029.0  203664.0    3594.176209             0.825073
Philippines       1612541   28141  1521263.0   63137.0    1471.550496             1.745134
Singapore           65315      38    63252.0    2025.0    1116.430267             0.058180
Vietnam            174461    2071    50831.0  121559.0     179.231087             1.187085
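
Columns that are NaN in every row can also be dropped without naming them. The following is a sketch of that alternative, using `DataFrame.dropna(axis=1, how='all')` on a small hand-built frame. It is not what the notebook does above, and columns that do hold values (Lat, for example) still need an explicit `drop`.

```python
import numpy as np
import pandas as pd

# Hand-built sample with two all-NaN columns, mimicking the ASEAN subset
sample = pd.DataFrame(
    {
        "FIPS": [np.nan, np.nan],
        "Admin2": [np.nan, np.nan],
        "Lat": [4.5353, 1.2833],
        "Confirmed": [338, 65315],
        "Deaths": [3, 38],
    },
    index=pd.Index(["Brunei", "Singapore"], name="Country_Region"),
)

# Drop columns in which every value is NaN, then drop Lat by name
trimmed = sample.dropna(axis=1, how="all").drop(columns=["Lat"])
print(trimmed.columns.tolist())
```

The trade-off is that `how='all'` depends on the rows present: a column that is empty for this subset may hold values for other countries.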

Description of each field (from the GitHub repository)

  1. Country_Region: Country, region or sovereignty name. The names of locations included on the Website correspond with the official designations used by the U.S. Department of State.

  2. Confirmed: Total Counts include confirmed and probable (where reported).

  3. Deaths: Total Counts include confirmed and probable (where reported).

  4. Recovered: Recovered cases are estimates based on local media reports, and state and local reporting when available, and therefore may be substantially lower than the true number. US state-level recovered cases are from COVID Tracking Project.

  5. Active: Active cases = total cases - total recovered - total deaths.

  6. Incident_Rate: Incidence Rate = cases per 100,000 persons.

  7. Case_Fatality_Ratio (%): Case-Fatality Ratio (%) = Number of recorded deaths / Number of cases, expressed as a percentage (that is, multiplied by 100).

All cases, deaths, and recoveries reported are based on the date of initial report. Exceptions to this are noted in the "Data Modification" and "Retrospective reporting of (probable) cases and deaths" subsections of the source repository's documentation.
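
The field definitions above can be checked against the numbers in the table. The sketch below rebuilds three rows by hand (values copied from the output above) and recomputes Active and Case_Fatality_Ratio from the other columns. The last line, which divides Confirmed by Incident_Rate to back out the population figure implied by the rate, is an inference from the "cases per 100,000 persons" definition rather than something the source states.

```python
import pandas as pd

# Three rows copied from the trimmed ASEAN table
check = pd.DataFrame(
    {
        "Confirmed": [338, 311067, 65315],
        "Deaths": [3, 10373, 38],
        "Recovered": [280.0, 220887.0, 63252.0],
        "Active": [55.0, 79807.0, 2025.0],
        "Incident_Rate": [77.260145, 571.711409, 1116.430267],
        "Case_Fatality_Ratio": [0.887574, 3.334651, 0.058180],
    },
    index=pd.Index(["Brunei", "Burma", "Singapore"], name="Country_Region"),
)

# Active cases = total cases - total recovered - total deaths
active = check["Confirmed"] - check["Recovered"] - check["Deaths"]

# Case-fatality ratio in percent = deaths / cases * 100
cfr_percent = check["Deaths"] / check["Confirmed"] * 100

# Population implied by "cases per 100,000 persons" (an inference, see above)
implied_population = check["Confirmed"] / check["Incident_Rate"] * 100_000

print((active == check["Active"]).all())
print((cfr_percent - check["Case_Fatality_Ratio"]).abs().max())
print(implied_population.round())
```

For these rows the recomputed Active values match exactly, and the recomputed ratio agrees with the Case_Fatality_Ratio column to within rounding.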