Visualizing COVID-19 Cases on a Heat Map using GeoPandas and Python

Currently, there is a great need to provide relevant information on COVID-19. Geographic heat maps are particularly suitable for this purpose. A geographic heat map, also known as a choropleth map, is a map in which regions or elements (e.g. countries, cities, or towns) are shaded in different colours. The colour shades are defined in a colour palette and are determined by numerical values on a scale. In this way, geographic heat maps give the viewer a quick overview of what’s going on in different regions of the map.

This tutorial will guide you through the most important steps to visualize COVID-19 data on a geographic heat map. The heat map will show COVID-19 growth rates and total cases in different countries. For this, we use the GeoPandas package, an open source project for working with geospatial data in Python.

GeoPandas is well suited for working with maps because it extends the data types of pandas so that they support spatial operations on geometric types. In this way, GeoPandas allows us to create maps without having to call the underlying geospatial libraries directly.
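To make the idea of "pandas plus geometry" concrete, here is a minimal sketch. It assumes geopandas and shapely are installed, and the two rectangular "countries" are made up purely for illustration.

```python
import geopandas as gpd
from shapely.geometry import box

# Two made-up rectangular "countries"; names and shapes are purely illustrative
toy_gdf = gpd.GeoDataFrame(
    {"country": ["Squareland", "Wideland"]},
    geometry=[box(0, 0, 2, 2), box(3, 0, 6, 1)],
)

# Ordinary pandas operations such as boolean filtering still work
wide = toy_gdf[toy_gdf["country"] == "Wideland"]

# The geometry column adds spatial attributes, e.g. the area of each shape
toy_gdf["area"] = toy_gdf.area
print(toy_gdf[["country", "area"]])
```

The same pattern applies to the world map used later: each row is a country, and the geometry column holds its outline.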

1 Prerequisites

First, we need to install the geopandas package via the console. For this, we use the following command:

conda install --channel conda-forge geopandas

Alternatively, you can use:

pip install geopandas

However, some people have reported error messages with pip, so it is better to use the conda install command.

Update (23.9.2020):

With the release of Python 3.8 there is a new install procedure:

conda create -n geo_env
conda activate geo_env 
conda config --env --add channels conda-forge 
conda config --env --set channel_priority strict 
conda install python=3 geopandas
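After installing, it can help to check that the packages imported later in this tutorial are available in the active environment. The following is a small sketch using only the standard library; the helper name missing_packages is hypothetical, and the package list is taken from the import statements used further below.

```python
from importlib.util import find_spec

# Packages imported later in this tutorial
REQUIRED = ["geopandas", "pandas", "matplotlib", "requests", "country_converter"]


def missing_packages(names):
    """Return the names that cannot be imported in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("Missing packages:", missing if missing else "none")
```

If the list is not empty, install the missing packages into the same environment before continuing.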

2 Getting the Geographic Map Data

First, we will get the map with the geo-spatial data. To render a map with GeoPandas, we need a shapefile. Once loaded, a shapefile is basically a dataframe with geometric data attached. For instance, there are shapefiles that show cities, countries, continents or maps of the whole world. The example presented in this tutorial uses a world map. So in our case, the shapefile is a list of countries, where each country has its own graphical representation in the form of polygons.

There are many sources on the web where we can download shapefiles for different geographical regions and in varying detail. A good source for a world map is naturalearthdata.com. To download the map, go to the naturalearthdata.com webpage and click the green button to download version 4.1.0.


After the download is complete, unpack the files into the folder of your Python notebook or into a subfolder structure such as data/shapefiles/worldmap/ inside that folder.
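A shapefile is not a single file but a bundle: next to the .shp file, the .shx and .dbf files are mandatory. The following sketch checks that all three were unpacked. The helper name missing_shapefile_parts is hypothetical, and the path is the one used later in this tutorial.

```python
from pathlib import Path

# A shapefile consists of several files; .shp, .shx and .dbf are mandatory
REQUIRED_SUFFIXES = (".shp", ".shx", ".dbf")


def missing_shapefile_parts(shp_path):
    """Return the mandatory files that are missing next to shp_path."""
    shp_path = Path(shp_path)
    return [
        shp_path.with_suffix(suffix).name
        for suffix in REQUIRED_SUFFIXES
        if not shp_path.with_suffix(suffix).exists()
    ]


missing = missing_shapefile_parts("data/shapefiles/worldmap/ne_10m_admin_0_countries.shp")
print("Missing files:", missing if missing else "none")
```

If a file is reported as missing, unpack the downloaded archive again and make sure all of its files end up in the same folder.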

3 Loading the COVID-19 Data

Next, we retrieve the COVID-19 data for all countries via an API. If you are not yet familiar with using APIs, check out my previous post on using APIs.

# Setting up Packages
import json
import country_converter as coco
from datetime import datetime, timedelta
import requests
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt

# Getting the data
PAYLOAD = {'code': 'ALL'}
URL = 'https://api.statworx.com/covid'
RESPONSE = requests.post(url=URL, data=json.dumps(PAYLOAD))

# Convert the response to a data frame
covid_df = pd.DataFrame.from_dict(json.loads(RESPONSE.text))
covid_df.head(3)

This is the COVID-19 data that we will display on our map.
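If the API is not reachable, the conversion step can still be tried offline. The sketch below assumes a column-oriented JSON payload, which is the layout pd.DataFrame.from_dict reads by default. The column names are the ones used later in this tutorial; the numbers are invented and are not real case counts.

```python
import json

import pandas as pd

# Made-up payload in a column-oriented layout; the values are invented for illustration
sample_text = json.dumps({
    "date": ["2020-05-01", "2020-05-02", "2020-05-02"],
    "code": ["DE", "DE", "FR"],
    "cases": [10, 12, 7],
    "cases_cum": [100, 112, 70],
})

# Same conversion as above, applied to the sample text instead of RESPONSE.text
sample_df = pd.DataFrame.from_dict(json.loads(sample_text))

# Basic sanity check: are the columns we rely on later present?
expected_columns = {"date", "code", "cases", "cases_cum"}
missing_columns = expected_columns - set(sample_df.columns)
print(sample_df)
print("Missing columns:", missing_columns or "none")
```

A check like this makes it easier to spot a changed response format before the later merge and plotting steps fail.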

4 Setting Up a Geo Map

We can set up our map by specifying the file path and loading the data via GeoPandas.

# Setting the path to the shapefile
SHAPEFILE = 'data/shapefiles/worldmap/ne_10m_admin_0_countries.shp'

# Read shapefile using Geopandas
geo_df = gpd.read_file(SHAPEFILE)[['ADMIN', 'ADM0_A3', 'geometry']]

# Rename columns.
geo_df.columns = ['country', 'country_code', 'geometry']
geo_df.head(3)

As we can tell from the output, we have a dataframe with three columns. The column geometry contains the graphical representation of the countries. With that, we are all set up to plot the geographic map:

# Drop row for 'Antarctica'. It takes a lot of space in the map and is not of much use
geo_df = geo_df.drop(geo_df.loc[geo_df['country'] == 'Antarctica'].index)

# Print the map
geo_df.plot(figsize=(20, 20), edgecolor='white', linewidth=1, color='lightblue')

World map

If you get the error “ImportError: The descartes package is required for plotting polygons in geopandas.”, you first have to install the descartes package. You can do this by typing the following in your console: conda install descartes

5 Bringing It All Together

Next, we need to ensure that the country codes in our two data sets match. The dataframe with the geo-spatial data of the world map contains three-letter country codes in the style of ISO3, whereas our COVID-19 data uses ISO2 codes. Luckily, the country_converter package does this job for us. The code below passes the country names from the map data to the converter and stores the resulting ISO2 codes in a new column:

# Collect the country names from the map data
country_names = geo_df['country'].to_list()

# Convert the country names to ISO2 codes
iso2_codes_list = coco.convert(names=country_names, to='ISO2', not_found='NULL')

# Add the list with ISO2 codes to the dataframe
geo_df['iso2_code'] = iso2_codes_list

# There are some countries for which the converter could not find a country code.
# We will drop these countries.
geo_df = geo_df.drop(geo_df.loc[geo_df['iso2_code'] == 'NULL'].index)

Now we have a list with the names (country) and codes (country_code) of all countries in the world and in an additional column a geographical representation for each country.
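Before joining, it is worth checking which codes appear in only one of the two data sets, because such countries stay uncolored on the map or are silently dropped. Here is a small sketch with toy stand-ins for geo_df and covid_df; the rows are made up.

```python
import pandas as pd

# Toy stand-ins for geo_df and covid_df; the rows are made up
toy_geo = pd.DataFrame({"country": ["Germany", "France", "Kosovo"],
                        "iso2_code": ["DE", "FR", "XK"]})
toy_covid = pd.DataFrame({"code": ["DE", "FR", "IT"], "cases": [12, 7, 9]})

geo_codes = set(toy_geo["iso2_code"])
covid_codes = set(toy_covid["code"])

only_in_map = sorted(geo_codes - covid_codes)    # shapes that would get no data
only_in_data = sorted(covid_codes - geo_codes)   # data rows with no shape to draw
print("Only in map:", only_in_map)
print("Only in data:", only_in_data)
```

With the real dataframes, the same two set differences show how well the converted codes line up with the COVID-19 data.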

6 Preprocessing

Our COVID-19 data so far contains the whole history of cases. We want to drop the history and only get the data from the last day. Then we merge the data frames.
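Filtering on yesterday's date returns an empty dataframe if the source has not published that day yet. A hedged alternative is to select the most recent date that is actually present in the data. The sketch below uses made-up rows and assumes that the date column holds strings in the format YYYY-MM-DD, as in the filter used in this section.

```python
import pandas as pd

# Made-up history for two countries
toy_covid = pd.DataFrame({
    "date": ["2020-05-01", "2020-05-02", "2020-05-01", "2020-05-02"],
    "code": ["DE", "DE", "FR", "FR"],
    "cases": [10, 12, 5, 7],
})

# Date strings in the format YYYY-MM-DD sort chronologically,
# so max() returns the latest day that is present in the data
latest_date = toy_covid["date"].max()
latest_df = toy_covid[toy_covid["date"] == latest_date]
print(latest_date)
print(latest_df)
```

This keeps one row per country for the latest reported day, independent of when the notebook is run.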

Before we plot the heat map, we also have to specify a column with the data that determines the color of the countries on the map. We will use the growth rate of COVID-19 cases per day. The formula for the growth rate is new cases / total cases (the values in the columns cases and cases_cum).

# We want to drop the history and only get the data from the last day
d = datetime.today()-timedelta(days=1)
date_yesterday = d.strftime("%Y-%m-%d")

# Preparing the data
covid_df = covid_df[covid_df['date'] == date_yesterday]

# Merge the two dataframes
merged_df = pd.merge(left=geo_df, right=covid_df, how='left', left_on='iso2_code', right_on='code')

# Delete some columns that we won't use
df = merged_df.drop(['day', 'month', 'year', 'country_y', 'code'], axis=1)

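# Create the indicator values
df['case_growth_rate'] = round(df['cases']/df['cases_cum'], 2)

# Countries without data get a growth rate of 0; assign the result instead of using inplace=True
df['case_growth_rate'] = df['case_growth_rate'].fillna(0)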
df.head(3)

Dataset with geo-spatial data and COVID-19 data
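The growth rate is a simple division, but it has two edge cases: countries without data produce missing values, and a cumulative count of zero leads to a division by zero. The following sketch on made-up numbers shows one possible way to handle both; treating these cases as 0 is an assumption that matches the fillna(0) step above.

```python
import numpy as np
import pandas as pd

# Made-up values: a normal country, one with zero cases, and one without data
toy = pd.DataFrame({
    "country": ["A", "B", "C"],
    "cases": [50, 0, np.nan],         # new cases on the selected day
    "cases_cum": [1000, 0, np.nan],   # cumulative cases
})

rate = toy["cases"] / toy["cases_cum"]

# 0/0 yields NaN and x/0 yields inf; treat both, and missing data, as 0
toy["case_growth_rate"] = rate.replace([np.inf, -np.inf], np.nan).fillna(0).round(2)
print(toy)
```

For country A the result is 50 / 1000 = 0.05, i.e. the case count grew by 5 % on that day.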

7 Creating a Geographic Heat Map

In the previous step, we set up the data for our map. Now it’s time to create the heat map. Do this by running the following code.

# Settings for the choropleth: title, data column, source note, value range and colormap
title = 'Daily COVID-19 Growth Rates'
col = 'case_growth_rate'
source = 'Source: relataly.com \nGrowth Rate = New cases / All previous cases'
vmin = df[col].min()
vmax = df[col].max()
cmap = 'viridis'

# Create figure and axes for Matplotlib
fig, ax = plt.subplots(1, figsize=(20, 8))

# Remove the axis
ax.axis('off')
df.plot(column=col, ax=ax, edgecolor='0.8', linewidth=1, cmap=cmap)

# Add a title
ax.set_title(title, fontdict={'fontsize': '25', 'fontweight': '3'})

# Create an annotation for the data source
ax.annotate(source, xy=(0.1, .08), xycoords='figure fraction', horizontalalignment='left', 
            verticalalignment='bottom', fontsize=10)
            
# Create colorbar as a legend
sm = plt.cm.ScalarMappable(norm=plt.Normalize(vmin=vmin, vmax=vmax), cmap=cmap)

# Empty array for the data range (set via the public method instead of the private attribute sm._A)
sm.set_array([])

# Add the colorbar to the figure
cbaxes = fig.add_axes([0.15, 0.25, 0.01, 0.4])
cbar = fig.colorbar(sm, cax=cbaxes)

Geographic heat map showing COVID-19 growth rates in different countries of the world

There are different color palettes. You can use them by changing the cmap variable. Below is a sample of ready-to-use color scales. More color scales can be found on the matplotlib page.

Colormaps
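To get a feeling for how a colormap turns numbers into colors, the following sketch maps three values to hex colors for a few palettes. The value range from 0 to 0.2 is an illustrative assumption; the colormap names are standard matplotlib names.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize, to_hex

# Illustrative value range; in the tutorial this comes from vmin and vmax
norm = Normalize(vmin=0.0, vmax=0.2)

palette_samples = {}
for name in ["viridis", "plasma", "OrRd"]:
    cmap = plt.get_cmap(name)
    # norm() scales a value to the range 0..1, cmap() turns it into an RGBA color
    palette_samples[name] = [to_hex(cmap(norm(v))) for v in (0.0, 0.1, 0.2)]
    print(name, palette_samples[name])
```

Sequential palettes such as these run from light to dark (or dark to light) and are usually a good fit for quantities like case counts or growth rates.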

8 Zooming in on Specific Regions

The map tells us that the highest COVID-19 growth rates are currently reported by countries in central Asia and in Africa. If we want to zoom in on a continent such as Africa, we need to filter our dataframe and only keep the rows of the relevant countries. This is what we will do next for Africa. Luckily, there is no need to enter the country codes for the filter operation manually. Instead, we can use a list of country codes that I found on datahub.io.

The following code filters the geo-spatial data to African countries. After this, we can plot the map using the same code as before.

# The map shows that many African countries are currently reporting increasing case numbers
# Next, we create a new dataframe that only contains African countries
africa_country_list = ['ZM', 'BF', 'TZ', 'EG', 'UG', 'TN', 'TG', 'SZ', 'SD', 
                       'EH', 'SS', 'ZW', 'ZA', 'SO', 'SL', 'SC', 'SN', 'ST', 
                       'SH', 'RW', 'RE', 'GW', 'NG', 'NE', 'NA', 'MZ', 'MA', 
                       'MU', 'MR', 'ML', 'MW', 'MG', 'LY', 'LR', 'LS', 'KE', 
                       'CI', 'GN', 'GH', 'GM', 'GA', 'DJ', 'ER', 'ET', 'GQ', 
                       'BJ', 'CD', 'CG', 'YT', 'KM', 'TD', 'CF', 'CV', 'CM', 
                       'BI', 'BW', 'AO', 'DZ']
africa_map_df = df[df['iso2_code'].isin(africa_country_list)]

# Plot the map for Africa
title = 'COVID-19 Growth Rate per Day in Africa'
col = 'case_growth_rate'
source = 'Source: relataly.com \nGrowth Rate = New cases / All previous cases'

# Keep the worldwide value range so that the colors are comparable with the world map
vmin = df[col].min()
vmax = df[col].max()
fig, ax = plt.subplots(1, figsize=(20, 9))
ax.axis('off')

# Pass vmin and vmax to the plot so that the map colors match the colorbar
africa_map_df.plot(column=col, ax=ax, edgecolor='0.8', linewidth=1, cmap=cmap,
                   vmin=vmin, vmax=vmax)
ax.set_title(title, fontdict={'fontsize': '25', 'fontweight': '3'})
ax.annotate(source, xy=(0.24, .08), xycoords='figure fraction',
            horizontalalignment='left',
            verticalalignment='bottom', fontsize=10)
sm = plt.cm.ScalarMappable(norm=plt.Normalize(vmin=vmin, vmax=vmax), cmap=cmap)
sm.set_array([])
cbaxes = fig.add_axes([0.35, 0.25, 0.01, 0.5])
cbar = fig.colorbar(sm, cax=cbaxes)

Geographic heat map of Africa showing COVID-19 growth rates in different countries

Voilà, now we only see the African continent. The highest growth rate was reported by South Sudan, followed by Botswana and Niger.

Let’s take a look at the total cases per country in Africa:

# Work on a copy so that the dataframe with the growth rates stays untouched
africa_map_df2 = africa_map_df.copy()

# Alternative indicator: cases relative to population
# africa_map_df2['cases_population'] = round(africa_map_df2['cases_cum'] / africa_map_df2['population'] * 100)

# Replace missing values with 0
africa_map_df2['cases_cum'] = africa_map_df2['cases_cum'].fillna(0)

# Show the data
africa_map_df2.head()

# Plot the map
title = 'Total COVID-19 Cases on the African Continent'
col = 'cases_cum'
source = 'Source: relataly.com '
vmin = africa_map_df2[col].min()
vmax = africa_map_df2[col].max()
fig, ax = plt.subplots(1, figsize=(20, 9))
ax.axis('off')
africa_map_df2.plot(column=col, ax=ax, edgecolor='1', linewidth=1, cmap=cmap)
ax.set_title(title, fontdict={'fontsize': '25', 'fontweight' : '3'})
ax.annotate(
    source, xy=(0.24, .08), xycoords='figure fraction', horizontalalignment='left', 
    verticalalignment='bottom', fontsize=10)
sm = plt.cm.ScalarMappable(norm=plt.Normalize(vmin=vmin, vmax=vmax), cmap=cmap)
sm.set_array([])
cbaxes = fig.add_axes([0.35, 0.25, 0.01, 0.5])
cbar = fig.colorbar(sm, cax=cbaxes)

Geographic heat map of Africa showing COVID-19 total cases in different countries

The map tells us that the highest total case numbers in Africa are currently reported by South Africa, Algeria, Morocco, Cameroon, and Egypt.

9 Saving a Geo Heat Map as a PNG

If you want to save the map, you can do this with the following command.

# Save the map as a PNG
fig.savefig('map_export.png', dpi=300)
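The export can be tuned with a few optional savefig parameters. The sketch below is self-contained and uses a placeholder drawing instead of the map; the file name and the temporary folder are illustrative assumptions.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(4, 2))
ax.plot([0, 1], [0, 1])  # placeholder drawing standing in for the map
ax.axis("off")

# bbox_inches='tight' trims the surrounding whitespace,
# facecolor sets the background color of the exported image
out_path = Path(tempfile.gettempdir()) / "map_export_tight.png"
fig.savefig(out_path, dpi=150, bbox_inches="tight", facecolor="white")

# Close the figure to free its memory once the file has been written
plt.close(fig)
print("Saved to", out_path)
```

A higher dpi value gives a sharper image at the cost of a larger file, which matters mostly for print or large screens.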

Summary

This tutorial has guided you through the steps to create powerful geographic heat maps. You have learned how to prepare data for geographic heat maps by joining geo map information with other data frames as the foundation of a colorized geo map. In addition, this tutorial has shown how you can focus on certain geographical regions on a map by filtering the dataframe.

I hope you enjoyed the tutorial and if you found it helpful, please let me know in the comment section.

Author

Florian Müller, Data Scientist & Machine Learning Consultant

Hi, my name is Florian! I am a Zurich-based Data Scientist with a passion for Artificial Intelligence and Machine Learning. After completing my PhD in Business Informatics at the University of Bremen, I started working as a Machine Learning Consultant for the Swiss consulting firm ipt. When I'm not working on use cases for our clients, I work on my own analytics projects and report on them in this blog.