Visualizing Geographic Data

Creating Geographic Heat Maps to Visualize COVID-19 data using GeoPandas and Python

This article shows how to create geographic heat maps in Python using the GeoPandas library. Due to the worldwide spread of COVID-19, there is currently a great need to display region- or country-specific information. Geographic heat maps are particularly suitable for this purpose. They use color shades to visualize data that has a spatial component and refers, for example, to countries, cities, or mountains. The color shades are defined in a color palette and determined by numerical values on a scale. In this way, geographic heat maps provide the viewer with a quick overview of what is happening in different map regions.

In the following, we will go through the steps to visualize COVID-19 data on a geographic heat map. Our heat map will use color shades to visualize growth rates and total cases of COVID-19 in different countries. To plot the maps, we will be using the GeoPandas library, an open-source project for working with geospatial data in Python.

Geographic heat map showing COVID-19 growth rates in different countries of the world
A geographic heat map of the kind we will create in this article with Python.

Creating Geographic Heat Maps in Python

In the following, we implement several geographic heat maps using the GeoPandas package. GeoPandas extends the data types of pandas so that they support spatial operations on geometric data types. In this way, GeoPandas allows us to create maps without dealing with other dependencies directly, which makes it an excellent way to work with spatial information in Python.

Prerequisites

Before we start the coding part, make sure that you have set up your Python 3 environment and required packages. If you don’t have an environment set up yet, you can follow the steps in this tutorial to set up the Anaconda environment.

Also, make sure you install all required packages. In this tutorial, we will be working with the following packages, which are imported in Step #1: pandas, requests, Matplotlib, and country_converter.

You can install packages using console commands:

  • pip install <package name>
  • conda install <package name> (if you are using the Anaconda package manager)

We will create geographic heat maps with the GeoPandas Python library. You can install GeoPandas via the console by using the following command:

  • conda install --channel conda-forge geopandas
  • pip install geopandas

Update (23.9.2020): With the release of Python 3.8, there is a new install procedure:

  • conda create -n geo_env
  • conda activate geo_env
  • conda config --env --add channels conda-forge
  • conda config --env --set channel_priority strict
  • conda install python=3 geopandas
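
Before moving on, it can help to confirm that the packages are importable from the active environment. The following is a minimal sketch and not part of the original setup: the helper name `missing_packages` is hypothetical, and the package list is taken from the import statements used later in this tutorial.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the subset of top-level package names that cannot be imported."""
    return [name for name in names if find_spec(name) is None]


# Package names taken from the imports used later in this tutorial
REQUIRED = ['pandas', 'geopandas', 'matplotlib', 'requests', 'country_converter']

missing = missing_packages(REQUIRED)
if missing:
    print('Missing packages:', ', '.join(missing))
else:
    print('All required packages are installed.')
```

If the check reports missing packages, install them with one of the commands above and run it again.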

Download the Geographic Map Data

First, we will get the map with the geospatial data. To render a map, GeoPandas needs geometry data, which we will load from a shapefile. A shapefile is a file format for geospatial vector data that stores geometries together with their attributes; GeoPandas reads it into a dataframe with a geometry column. For instance, some shapefiles show cities, countries, continents, or maps of the whole world. The example presented in this tutorial will use a world map. So in our case, the shapefile contains a list of countries, each with its graphical representation as polygons.

Various sources on the web provide shapefiles for different geographical regions and in varying detail. For example, naturalearthdata.com provides a map of the world. To download the map, go to the naturalearthdata.com webpage and click the green button to download version 4.1.0.

naturalearthdata.com geographic shapefiles
naturalearthdata.com

Once the download is complete, unpack the files into the folder of your Python notebook or a subfolder in the folder of your Python notebook (e.g., data/shapefiles/worldmap/).
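
A shapefile is not a single file: the `.shp` file holds the geometries, and it is accompanied by at least a `.shx` index and a `.dbf` attribute table, which need to sit in the same folder. A small sketch like the following can catch an incomplete unpacking early. The helper name `missing_shapefile_parts` is hypothetical; the path is the one used later in this tutorial.

```python
from pathlib import Path

# Mandatory components of a shapefile (geometry, index, attribute table)
REQUIRED_SUFFIXES = ('.shp', '.shx', '.dbf')


def missing_shapefile_parts(shp_path):
    """Return the mandatory shapefile components that are missing on disk."""
    base = Path(shp_path)
    return [suffix for suffix in REQUIRED_SUFFIXES
            if not base.with_suffix(suffix).exists()]


missing = missing_shapefile_parts(
    'data/shapefiles/worldmap/ne_10m_admin_0_countries.shp')
if missing:
    print('Missing shapefile components:', missing)
```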

Step #1 Loading the COVID-19 Data

Next, we retrieve the COVID-19 data for all countries via an API. If you are not yet familiar with using APIs, check out my previous article about using APIs.

# Setting up Packages
import json
import country_converter as coco
from datetime import datetime, timedelta
import requests
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt

# Getting the data
PAYLOAD = {'code': 'ALL'}
URL = 'https://api.statworx.com/covid'
RESPONSE = requests.post(url=URL, data=json.dumps(PAYLOAD))

# Convert the response to a data frame
covid_df = pd.DataFrame.from_dict(json.loads(RESPONSE.text))
covid_df.head(3)
The covid-19 data used for visualization
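
Since the request depends on an external service, it is worth checking the response before working with it. The sketch below parses a JSON payload into a data frame and verifies that the columns used later in this tutorial are present. The helper `payload_to_frame` is hypothetical, and the sample payload is made up for illustration: it only mimics a column-oriented layout that `pd.DataFrame.from_dict` can read and is not actual API output.

```python
import json

import pandas as pd

# Columns that the later steps of this tutorial rely on
REQUIRED_COLUMNS = {'date', 'code', 'cases', 'cases_cum'}


def payload_to_frame(payload_text):
    """Parse a JSON payload into a data frame and check the expected columns."""
    frame = pd.DataFrame.from_dict(json.loads(payload_text))
    missing = REQUIRED_COLUMNS - set(frame.columns)
    if missing:
        raise ValueError(f'Response lacks expected columns: {sorted(missing)}')
    return frame


# Made-up sample payload for illustration only (not real API output)
sample_payload = json.dumps({
    'date': ['2020-05-01', '2020-05-01'],
    'code': ['AA', 'BB'],
    'cases': [10, 0],
    'cases_cum': [200, 35],
})
sample_df = payload_to_frame(sample_payload)
print(sample_df.head())
```

With the real request, you could call `RESPONSE.raise_for_status()` first and then pass `RESPONSE.text` to the helper.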

With the COVID-19 data loaded, we continue by preparing the map on which we will visualize it.

Step #3 Specifying a Shapefile

The next step is to specify the shapefile that GeoPandas will use to plot the map. For this, we set the file path to our shapefile and load the data via the GeoPandas read_file function.

# Setting the path to the shapefile
SHAPEFILE = 'data/shapefiles/worldmap/ne_10m_admin_0_countries.shp'

# Read shapefile using Geopandas
geo_df = gpd.read_file(SHAPEFILE)[['ADMIN', 'ADM0_A3', 'geometry']]

# Rename columns.
geo_df.columns = ['country', 'country_code', 'geometry']
geo_df.head(3)

As you can see above, we have created a dataframe with three columns. The column geometry contains the graphical representation of countries. With that, we are already all set up to plot our geographic map. We create the map by using the GeoPandas plot function.

# Drop row for 'Antarctica'. It takes a lot of space in the map and is not of much use
geo_df = geo_df.drop(geo_df.loc[geo_df['country'] == 'Antarctica'].index)

# Print the map
geo_df.plot(figsize=(20, 20), edgecolor='white', linewidth=1, color='lightblue')
world map vector data
World map

Step #4 Bringing It All Together

Next, we need to ensure that the country codes in our two data sets match. The dataframe with the geospatial data of the world map contains three-letter ISO3 country codes. However, our COVID-19 data uses two-letter ISO2 codes. Luckily, the country_converter package can do the conversion for us:

# Collect the country names from the map data; the converter accepts names as input
country_names = geo_df['country'].to_list()

# Convert the country names to ISO2 codes
iso2_codes_list = coco.convert(names=country_names, to='ISO2', not_found='NULL')

# Add the list with iso2 codes to the dataframe
geo_df['iso2_code'] = iso2_codes_list

# There are some countries for which the converter could not find a country code. 
# We will drop these countries.
geo_df = geo_df.drop(geo_df.loc[geo_df['iso2_code'] == 'NULL'].index)

We have a list with the names (country) and codes (country_code) of all nations. An additional column includes the geographical representation for each country.

Step #5 Preprocessing

Our COVID-19 data so far contains the historical cases. We want to drop these historical cases and only get the data from the last day. Then we merge the data frames.
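
Filtering on yesterday's date assumes that the data source has already published that day. A slightly more defensive variant, sketched below on made-up sample rows, selects the most recent date that is actually present in the data. It assumes that the `date` column holds ISO-formatted strings (`YYYY-MM-DD`), which sort chronologically; the variable names are illustrative.

```python
import pandas as pd

# Made-up sample rows for illustration only
history_df = pd.DataFrame({
    'date': ['2020-05-01', '2020-05-02', '2020-05-01', '2020-05-02'],
    'code': ['AA', 'AA', 'BB', 'BB'],
    'cases': [5, 8, 1, 0],
})

# ISO dates sort correctly as strings, so max() gives the latest day
latest_date = history_df['date'].max()
latest_df = history_df[history_df['date'] == latest_date]
print(latest_date, len(latest_df))
```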

Before we plot the heat map, we have to specify a variable that determines the color of the countries on the map. Our goal is to color the countries depending on the growth rate of COVID-19 cases per day. The formula for the growth rate is ‘new cases’ / total present cases.
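
As a small worked example of this formula, consider a country with 50 new cases and 1,000 total cases: its growth rate is 50 / 1000 = 0.05. The sketch below applies the same calculation to made-up numbers and guards against division by zero for countries without any recorded cases, an edge case the formula itself does not cover. Treating such countries as having a growth rate of 0 is an assumption made here for plotting purposes.

```python
import numpy as np
import pandas as pd

# Made-up numbers for illustration only
sample = pd.DataFrame({
    'code': ['AA', 'BB', 'CC'],
    'cases': [50, 0, 5],          # new cases on the last day
    'cases_cum': [1000, 0, 20],   # total cases so far
})

# Replace a zero denominator with NaN so that the division yields NaN, not inf
rate = sample['cases'] / sample['cases_cum'].replace(0, np.nan)
sample['case_growth_rate'] = rate.round(2).fillna(0)
print(sample)
```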

# We want to drop the history and only get the data from the last day
d = datetime.today()-timedelta(days=1)
date_yesterday = d.strftime("%Y-%m-%d")

# Preparing the data
covid_df = covid_df[covid_df['date'] == date_yesterday]

# Merge the two dataframes
merged_df = pd.merge(left=geo_df, right=covid_df, how='left', left_on='iso2_code', right_on='code')

# Delete some columns that we won't use
df = merged_df.drop(['day', 'month', 'year', 'country_y', 'code'], axis=1)

# Create the indicator values
df['case_growth_rate'] = round(df['cases']/df['cases_cum'], 2)
# Assign the result back instead of filling in place on a column selection
df['case_growth_rate'] = df['case_growth_rate'].fillna(0)
df.head(3)
Dataset with geo-spatial data and data on COVID-19
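
Because the merge is a left join, countries from the map that have no matching COVID-19 record are kept with empty values, and they end up with a growth rate of 0 after the `fillna` call. To see which countries are affected, pandas can flag the origin of each row via the `indicator` parameter of `pd.merge`. The following sketch uses made-up sample frames; the column names mirror the ones used above.

```python
import pandas as pd

# Made-up sample frames for illustration only
geo_sample = pd.DataFrame({'country': ['Country A', 'Country B', 'Country C'],
                           'iso2_code': ['AA', 'BB', 'CC']})
covid_sample = pd.DataFrame({'code': ['AA', 'CC'],
                             'cases': [12, 3]})

# indicator=True adds a '_merge' column: 'both' or 'left_only' for a left join
checked = pd.merge(left=geo_sample, right=covid_sample, how='left',
                   left_on='iso2_code', right_on='code', indicator=True)

unmatched = checked.loc[checked['_merge'] == 'left_only', 'country'].tolist()
print('Countries without COVID-19 data:', unmatched)
```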

Step #6 Creating a Geographic Heat Map

In the previous step, we have set up the data for our map. Next, we create the geographical heat map.

# Print the map
# Set the range for the choropleth
title = 'Daily COVID-19 Growth Rates'
col = 'case_growth_rate'
source = 'Source: relataly.com \nGrowth Rate = New cases / All previous cases'
vmin = df[col].min()
vmax = df[col].max()
cmap = 'viridis'

# Create figure and axes for Matplotlib
fig, ax = plt.subplots(1, figsize=(20, 8))

# Remove the axis
ax.axis('off')
df.plot(column=col, ax=ax, edgecolor='0.8', linewidth=1, cmap=cmap)

# Add a title
ax.set_title(title, fontdict={'fontsize': '25', 'fontweight': '3'})

# Create an annotation for the data source
ax.annotate(source, xy=(0.1, .08), xycoords='figure fraction', horizontalalignment='left', 
            verticalalignment='bottom', fontsize=10)
            
# Create colorbar as a legend
sm = plt.cm.ScalarMappable(norm=plt.Normalize(vmin=vmin, vmax=vmax), cmap=cmap)

# Empty array for the data range (set via the public method instead of the private attribute)
sm.set_array([])

# Add the colorbar to the figure
cbaxes = fig.add_axes([0.15, 0.25, 0.01, 0.4])
cbar = fig.colorbar(sm, cax=cbaxes)
Geographic heat map showing COVID-19 growth rates in different countries of the world

As shown in the map above, the highest COVID-19 growth rates are currently reported by countries in Central Asia and Africa.

There are different color palettes. You can use them by altering the cmap variable. Below is a sample of ready-to-use color scales. You can find more color scales on the Matplotlib page.

Color scales
Colormaps
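
To make the role of the color scale more concrete, the sketch below shows how a numeric value is translated into a color: the value is first normalized to the range 0 to 1 between `vmin` and `vmax`, and the colormap then returns an RGBA tuple for that position. The values are made up; swapping the colormap name (here `plasma` serves as a second example) changes the resulting color but not the normalization.

```python
import matplotlib.pyplot as plt

# Made-up value range for illustration only
vmin, vmax = 0.0, 0.5
norm = plt.Normalize(vmin=vmin, vmax=vmax)

# A growth rate of 0.25 sits exactly in the middle of the range
position = float(norm(0.25))

# Look up the color at that position in two different colormaps
viridis_rgba = plt.get_cmap('viridis')(position)
plasma_rgba = plt.get_cmap('plasma')(position)
print(position, viridis_rgba, plasma_rgba)
```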

Step #7 Zooming in on Specific Regions

We can zoom in on a continent or a country by filtering our dataframe. In the following, we create a geographic map specifically for Africa. Luckily there is no need to enter the country codes for the filter operation manually. Instead, we can use a list of country codes that I found on datahub.io.

The following code will filter the geospatial data to African countries. After this, we can plot the map using the same code as before.

# The map shows that many African countries are currently reporting increasing case numbers
# Next, we create a new dataframe based on a filter for African countries
africa_country_list = ['ZM', 'BF', 'TZ', 'EG', 'UG', 'TN', 'TG', 'SZ', 'SD', 
                       'EH', 'SS', 'ZW', 'ZA', 'SO', 'SL', 'SC', 'SN', 'ST', 
                       'SH', 'RW', 'RE', 'GW', 'NG', 'NE', 'NA', 'MZ', 'MA', 
                       'MU', 'MR', 'ML', 'MW', 'MG', 'LY', 'LR', 'LS', 'KE', 
                       'CI', 'GN', 'GH', 'GM', 'GA', 'DJ', 'ER', 'ET', 'GQ', 
                       'BJ', 'CD', 'CG', 'YT', 'KM', 'TD', 'CF', 'CV', 'CM', 
                       'BI', 'BW', 'AO', 'DZ']
africa_map_df = df[df['iso2_code'].isin(africa_country_list)]

# Plot the map for Africa
title = 'COVID-19 Growth Rate per Day in Africa'
col = 'case_growth_rate'
source = 'Source: relataly.com \nGrowth Rate = New cases / All previous cases'
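# Use the value range of the plotted subset so that the colorbar matches the map colors
vmin = africa_map_df[col].min()
vmax = africa_map_df[col].max()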
fig, ax = plt.subplots(1, figsize=(20, 9))
ax.axis('off')
africa_map_df.plot(column=col, ax=ax, edgecolor='0.8', linewidth=1, cmap=cmap)
ax.set_title(title, fontdict={'fontsize': '25', 'fontweight': '3'})
ax.annotate(source, xy=(0.24, .08), xycoords='figure fraction',
            horizontalalignment='left',
            verticalalignment='bottom', fontsize=10)
sm = plt.cm.ScalarMappable(norm=plt.Normalize(vmin=vmin, vmax=vmax), cmap=cmap)
cbaxes = fig.add_axes([0.35, 0.25, 0.01, 0.5])
cbar = fig.colorbar(sm, cax=cbaxes)
Geographic heat map of Africa showing COVID-19 growth rates in different countries
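
When filtering with a hand-maintained list of codes, a typo or a code that is absent from the map data goes unnoticed, because `isin` simply skips it. The sketch below, using made-up sample data, applies the same `isin` filter and additionally reports the codes from the list that did not match any row. The variable names are illustrative.

```python
import pandas as pd

# Made-up sample data for illustration only
sample_df = pd.DataFrame({'iso2_code': ['AA', 'BB', 'CC', 'DD'],
                          'case_growth_rate': [0.01, 0.05, 0.00, 0.02]})
region_codes = ['BB', 'DD', 'ZZ']  # 'ZZ' has no counterpart in the data

# Same filter technique as used for the Africa map
region_df = sample_df[sample_df['iso2_code'].isin(region_codes)]

# Codes in the list that did not match any row
unmatched_codes = sorted(set(region_codes) - set(sample_df['iso2_code']))
print(len(region_df), unmatched_codes)
```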

Voilà, now we only see the African continent. The highest growth rate was reported by South Sudan, followed by Botswana and Niger.

Let’s take a look at the total cases per country in Africa:

# Work on a copy so that the original dataframe stays unchanged
africa_map_df2 = africa_map_df.copy()

# Alternative indicator: cases per population
# africa_map_df2['cases_population'] = round(africa_map_df['cases_cum'] / africa_map_df['population'] * 100)

# Replace missing values with 0
africa_map_df2['cases_cum'] = africa_map_df2['cases_cum'].fillna(0)

# Show the data
africa_map_df2.head()

# Plot the map
title = 'Total COVID-19 Cases on the African Continent'
col = 'cases_cum'
source = 'Source: relataly.com '
vmin = africa_map_df2[col].min()
vmax = africa_map_df2[col].max()
fig, ax = plt.subplots(1, figsize=(20, 9))
ax.axis('off')
africa_map_df2.plot(column=col, ax=ax, edgecolor='1', linewidth=1, cmap=cmap)
ax.set_title(title, fontdict={'fontsize': '25', 'fontweight': '3'})
ax.annotate(
    source, xy=(0.24, .08), xycoords='figure fraction', horizontalalignment='left',
    verticalalignment='bottom', fontsize=10)
sm = plt.cm.ScalarMappable(norm=plt.Normalize(vmin=vmin, vmax=vmax), cmap=cmap)
cbaxes = fig.add_axes([0.35, 0.25, 0.01, 0.5])
cbar = fig.colorbar(sm, cax=cbaxes)
Geographic heat map of Africa showing COVID-19 total cases in different countries

The map shows that the countries in Africa that currently report the highest total case numbers are South Africa, Algeria, Morocco, Cameroon, and Egypt.

Step #8 Saving a Geographic Heat Map as PNG

If you want to save the map, you can do this with the following command.

# Save the map as a PNG file
fig.savefig('map_export.png', dpi=300)
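
`savefig` accepts further options that are often useful for maps. The sketch below is a self-contained variant with a placeholder plot: `bbox_inches='tight'` trims the surrounding whitespace, and closing the figure afterwards frees its memory. The `Agg` backend is selected so that the example also runs without a display; the output file name is illustrative.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, no display required
import matplotlib.pyplot as plt

# Placeholder figure standing in for the map created above
fig, ax = plt.subplots(1, figsize=(4, 2))
ax.axis('off')
ax.set_title('Placeholder for the map')

# Save as PNG; bbox_inches='tight' crops empty margins around the content
output_path = 'map_export_tight.png'
fig.savefig(output_path, dpi=150, bbox_inches='tight')
plt.close(fig)
```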

Summary

In this article, we have created geographic heat maps using GeoPandas in Python. We have prepared spatial data and color-coded the maps using COVID-19 data. In addition, you have learned how to create maps for specific geographical regions by filtering the dataframe. With this knowledge, you are well equipped to use geographic maps to visualize other spatial data.

I hope this article was helpful. If you have any questions or remarks, please write them in the comments.

Author

  • Hi, I am Florian, a Zurich-based consultant for AI and Data. Since the completion of my Ph.D. in 2017, I have been working on the design and implementation of ML use cases in the Swiss financial sector. I started this blog in 2020 with the goal in mind to share my experiences and create a place where you can find key concepts of machine learning and materials that will allow you to kick-start your own Python projects.
