Housekeeping
First, the customary import of packages and data, in case you have not already done so in the previous sections.
import pandas as pd
import matplotlib.pyplot as plt
month_int = '08'
day_int = '03'
df = pd.read_csv(f"https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/{month_int}-{day_int}-2021.csv")
df
| | FIPS | Admin2 | Province_State | Country_Region | Last_Update | Lat | Long_ | Confirmed | Deaths | Recovered | Active | Combined_Key | Incident_Rate | Case_Fatality_Ratio |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | NaN | NaN | NaN | Afghanistan | 2021-08-04 04:21:25 | 33.939110 | 67.709953 | 148572 | 6804 | 82586.0 | 59182.0 | Afghanistan | 381.655103 | 4.579598 |
1 | NaN | NaN | NaN | Albania | 2021-08-04 04:21:25 | 41.153300 | 20.168300 | 133211 | 2457 | 130291.0 | 463.0 | Albania | 4628.917923 | 1.844442 |
2 | NaN | NaN | NaN | Algeria | 2021-08-04 04:21:25 | 28.033900 | 1.659600 | 175229 | 4370 | 117557.0 | 53302.0 | Algeria | 399.600529 | 2.493879 |
3 | NaN | NaN | NaN | Andorra | 2021-08-04 04:21:25 | 42.506300 | 1.521800 | 14766 | 128 | 14348.0 | 290.0 | Andorra | 19110.852262 | 0.866856 |
4 | NaN | NaN | NaN | Angola | 2021-08-04 04:21:25 | -11.202700 | 17.873900 | 43070 | 1022 | 39389.0 | 2659.0 | Angola | 131.046214 | 2.372881 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
3982 | NaN | NaN | NaN | Vietnam | 2021-08-04 04:21:25 | 14.058324 | 108.277199 | 174461 | 2071 | 50831.0 | 121559.0 | Vietnam | 179.231087 | 1.187085 |
3983 | NaN | NaN | NaN | West Bank and Gaza | 2021-08-04 04:21:25 | 31.952200 | 35.233200 | 317264 | 3609 | 312289.0 | 1366.0 | West Bank and Gaza | 6219.136020 | 1.137538 |
3984 | NaN | NaN | NaN | Yemen | 2021-08-04 04:21:25 | 15.552727 | 48.516388 | 7086 | 1380 | 4232.0 | 1474.0 | Yemen | 23.757821 | 19.475021 |
3985 | NaN | NaN | NaN | Zambia | 2021-08-04 04:21:25 | -13.133897 | 27.849332 | 197123 | 3422 | 189341.0 | 4360.0 | Zambia | 1072.255612 | 1.735972 |
3986 | NaN | NaN | NaN | Zimbabwe | 2021-08-04 04:21:25 | -19.015438 | 29.154857 | 112435 | 3676 | 81570.0 | 27189.0 | Zimbabwe | 756.479528 | 3.269445 |
3987 rows × 14 columns
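The month and day above are hard-coded strings. As a minimal sketch, the same URL can be built from a date object, assuming the daily report files keep the MM-DD-YYYY.csv naming seen in the call above. The helper name `daily_report_url` is hypothetical and not part of pandas or the repository.

```python
from datetime import date

# Base URL copied from the read_csv call above.
BASE_URL = (
    "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
    "csse_covid_19_data/csse_covid_19_daily_reports/"
)


def daily_report_url(report_date: date) -> str:
    """Return the daily report URL for a date (hypothetical helper)."""
    # Assumption: file names follow the MM-DD-YYYY.csv pattern used above.
    return f"{BASE_URL}{report_date.strftime('%m-%d-%Y')}.csv"


url = daily_report_url(date(2021, 8, 3))
print(url)
```

The resulting string can be passed to `pd.read_csv` exactly as before, and changing the date only requires changing the `date(...)` argument.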
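As you can tell, the table is large, with about 4,000 rows. Let us subset the data to ASEAN countries only. Note that the dataset lists Myanmar as Burma, and that the list below leaves out Thailand, which is also an ASEAN member.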
ASEAN_countries_list = ['Brunei', 'Cambodia', 'Indonesia', 'Laos', 'Malaysia', 'Burma', 'Philippines', 'Singapore', 'Vietnam']
# Filter the rows, then index by country; chaining avoids modifying a filtered copy in place
asean_df = df[df['Country_Region'].isin(ASEAN_countries_list)].set_index('Country_Region')
asean_df
Country_Region | FIPS | Admin2 | Province_State | Last_Update | Lat | Long_ | Confirmed | Deaths | Recovered | Active | Combined_Key | Incident_Rate | Case_Fatality_Ratio |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Brunei | NaN | NaN | NaN | 2021-08-04 04:21:25 | 4.535300 | 114.727700 | 338 | 3 | 280.0 | 55.0 | Brunei | 77.260145 | 0.887574 |
Burma | NaN | NaN | NaN | 2021-08-04 04:21:25 | 21.916200 | 95.956000 | 311067 | 10373 | 220887.0 | 79807.0 | Burma | 571.711409 | 3.334651 |
Cambodia | NaN | NaN | NaN | 2021-08-04 04:21:25 | 11.550000 | 104.916700 | 79051 | 1471 | 72145.0 | 5435.0 | Cambodia | 472.822161 | 1.860824 |
Indonesia | NaN | NaN | NaN | 2021-08-04 04:21:25 | -0.789300 | 113.921300 | 3496700 | 98889 | 2873669.0 | 524142.0 | Indonesia | 1278.390505 | 2.828066 |
Laos | NaN | NaN | NaN | 2021-08-04 04:21:25 | 19.856270 | 102.495496 | 7015 | 7 | 3392.0 | 3616.0 | Laos | 96.418748 | 0.099786 |
Malaysia | NaN | NaN | NaN | 2021-08-04 04:21:25 | 4.210484 | 101.975766 | 1163291 | 9598 | 950029.0 | 203664.0 | Malaysia | 3594.176209 | 0.825073 |
Philippines | NaN | NaN | NaN | 2021-08-04 04:21:25 | 12.879721 | 121.774017 | 1612541 | 28141 | 1521263.0 | 63137.0 | Philippines | 1471.550496 | 1.745134 |
Singapore | NaN | NaN | NaN | 2021-08-04 04:21:25 | 1.283300 | 103.833300 | 65315 | 38 | 63252.0 | 2025.0 | Singapore | 1116.430267 | 0.058180 |
Vietnam | NaN | NaN | NaN | 2021-08-04 04:21:25 | 14.058324 | 108.277199 | 174461 | 2071 | 50831.0 | 121559.0 | Vietnam | 179.231087 | 1.187085 |
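Because isin() silently ignores names that do not occur in the column, a misspelled or differently labeled country (the dataset uses 'Burma', for example) simply vanishes from the subset. The sketch below shows one way to catch that, using a small made-up frame in place of df; the names `toy_df` and `wanted` are illustrative only.

```python
import pandas as pd

# Toy stand-in for df; these rows are illustrative, not real data.
toy_df = pd.DataFrame(
    {"Country_Region": ["Brunei", "Burma", "Laos", "Laos", "Zambia"]}
)
wanted = ["Brunei", "Myanmar", "Laos"]

# Requested names that never appear in the column.
missing = sorted(set(wanted) - set(toy_df["Country_Region"]))
print(missing)

# The subset keeps every matching row, including repeated countries.
subset = toy_df[toy_df["Country_Region"].isin(wanted)]
print(len(subset))
```

Running the same set difference against the real df and ASEAN_countries_list is a quick check that every requested country was actually found.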
Several columns hold only NaN for these rows and carry no information. We are also unlikely to need variables such as Lat and Long_, so we can remove these columns.
asean_df_dropped = asean_df.drop(columns = ['FIPS', 'Admin2','Province_State', 'Last_Update', 'Lat', 'Long_', 'Combined_Key'])
asean_df_dropped
Country_Region | Confirmed | Deaths | Recovered | Active | Incident_Rate | Case_Fatality_Ratio |
---|---|---|---|---|---|---|
Brunei | 338 | 3 | 280.0 | 55.0 | 77.260145 | 0.887574 |
Burma | 311067 | 10373 | 220887.0 | 79807.0 | 571.711409 | 3.334651 |
Cambodia | 79051 | 1471 | 72145.0 | 5435.0 | 472.822161 | 1.860824 |
Indonesia | 3496700 | 98889 | 2873669.0 | 524142.0 | 1278.390505 | 2.828066 |
Laos | 7015 | 7 | 3392.0 | 3616.0 | 96.418748 | 0.099786 |
Malaysia | 1163291 | 9598 | 950029.0 | 203664.0 | 3594.176209 | 0.825073 |
Philippines | 1612541 | 28141 | 1521263.0 | 63137.0 | 1471.550496 | 1.745134 |
Singapore | 65315 | 38 | 63252.0 | 2025.0 | 1116.430267 | 0.058180 |
Vietnam | 174461 | 2071 | 50831.0 | 121559.0 | 179.231087 | 1.187085 |
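Listing the columns by hand works, but the all-NaN columns could also be dropped automatically. The sketch below uses `DataFrame.dropna(axis=1, how="all")` on a small made-up frame; the values are placeholders, not taken from the dataset. This approach would not remove columns such as Lat, Long_, Last_Update or Combined_Key, since those contain values, so they would still need an explicit drop.

```python
import numpy as np
import pandas as pd

# Toy frame with placeholder values; only the FIPS column is entirely NaN.
toy = pd.DataFrame(
    {
        "FIPS": [np.nan, np.nan],
        "Confirmed": [10, 20],
        "Recovered": [5.0, np.nan],
    },
    index=["A", "B"],
)

# Drop columns in which every value is missing; partially filled columns stay.
trimmed = toy.dropna(axis=1, how="all")
print(list(trimmed.columns))
```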
Description of each field (from the GitHub repository)
- Country_Region: Country, region or sovereignty name. The names of locations included on the Website correspond with the official designations used by the U.S. Department of State.
- Confirmed: Counts include confirmed and probable cases (where reported).
- Deaths: Counts include confirmed and probable deaths (where reported).
- Recovered: Recovered cases are estimates based on local media reports, and state and local reporting when available, and therefore may be substantially lower than the true number. US state-level recovered cases are from the COVID Tracking Project.
- Active: Active cases = total cases - total recovered - total deaths.
- Incident_Rate: Incidence Rate = cases per 100,000 persons.
- Case_Fatality_Ratio (%): Case-Fatality Ratio (%) = number of recorded deaths / number of cases, expressed as a percentage.
All cases, deaths, and recoveries reported are based on the date of initial report. Exceptions to this are noted in the "Data Modification" and "Retrospective reporting of (probable) cases and deaths" subsections of the repository's documentation.
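The Active and Case_Fatality_Ratio definitions can be checked against the table. The sketch below recomputes both for three rows copied from the ASEAN output above (Brunei, Laos, Singapore); the frame name `check` and the tolerance are choices made for this illustration.

```python
import pandas as pd

# Values copied from the ASEAN table above.
check = pd.DataFrame(
    {
        "Confirmed": [338, 7015, 65315],
        "Deaths": [3, 7, 38],
        "Recovered": [280.0, 3392.0, 63252.0],
        "Active": [55.0, 3616.0, 2025.0],
        "Case_Fatality_Ratio": [0.887574, 0.099786, 0.058180],
    },
    index=["Brunei", "Laos", "Singapore"],
)

# Active cases = total cases - total recovered - total deaths.
active = check["Confirmed"] - check["Recovered"] - check["Deaths"]

# Case-fatality ratio as a percentage of confirmed cases.
cfr = 100 * check["Deaths"] / check["Confirmed"]

# Compare with the reported columns; the ratio is rounded to six decimals in the table.
active_matches = bool((active == check["Active"]).all())
cfr_max_diff = float((cfr - check["Case_Fatality_Ratio"]).abs().max())
print(active_matches, cfr_max_diff < 1e-5)
```

For these rows both columns agree with the definitions, which also confirms that Case_Fatality_Ratio is stored as a percentage rather than a fraction.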