Correlation Matrix in Python: How Correlated are COVID-19 Cases and Different Financial Assets?

Since the sudden emergence of COVID-19, financial markets have experienced turbulent times. A sharp slump in the stock markets was followed by a sharp rise from mid-2020 onwards, bringing stock indices back to record highs by the end of the year. A particular beneficiary of the crisis is the tech industry. For instance, online services such as Amazon or Netflix have seen a sharp rise in the number of customers and have been able to increase their valuations during the pandemic. Is this only a coincidence, or are there financial assets that correlate with the number of COVID-19 cases? In this article, we will try to answer this question by creating a stock market correlation matrix in Python. The matrix will display the correlation between COVID-19 cases and different financial assets such as Bitcoin or Apple stock.

The rest of this article is structured as follows: The first part gives a brief introduction to the Pearson correlation and explains how to interpret it. The second part then guides you through the steps to create a stock market correlation matrix in Python. In this matrix, we will measure the correlation between COVID-19 cases and several financial assets such as gold, Bitcoin and different stocks.

About Pearson Correlation

The Pearson correlation coefficient r is a standard measure for quantifying the strength and direction of a linear relationship. In other words, r measures how strongly two continuous variables (for example, price or volume) tend to change together. For the Pearson correlation coefficient to return a meaningful value, the following conditions should be met:

  • Both variables x and y are measured on a metric (interval or ratio) scale and are continuous.
  • The relationship between the two variables is approximately linear.
  • The observations, i.e. the pairs (xi, yi), are independent of each other.

Correlation measures how strongly two variables are associated. The Pearson correlation is calculated by dividing the covariance of two variables (x, y) by the product of their standard deviations.

Formula for the Pearson Correlation Coefficient r
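Written out for n paired observations (xi, yi) with sample means x̄ and ȳ, the formula reads as follows. Here s_xy denotes the sample covariance and s_x, s_y the sample standard deviations; their common normalizing factor 1/(n-1) cancels, which is why it does not appear in the first fraction:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
  = \frac{s_{xy}}{s_x \, s_y}
```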

Interpreting the Pearson Correlation Coefficient

So how can we interpret the Pearson correlation coefficient? First of all, r always lies between -1 and 1. Furthermore, we can distinguish the following cases:

  • The closer |r| is to 1, the stronger the relationship and the more tightly the points (xi, yi) cluster around the regression line. A positive r means that y tends to increase as x increases; a negative r means that y tends to decrease.
  • The closer r is to 0, the weaker the linear relationship and the more widely the points scatter around the regression line.
  • The extreme cases r = 1 and r = -1 occur only when the relationship can be described exactly by a linear equation of the form y = a + bx, with b > 0 for r = 1 and b < 0 for r = -1. In this case, all points (xi, yi) lie on the regression line.
Graphical representation of different correlation coefficients
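These cases can be checked numerically. The following minimal sketch computes r directly from the formula, compares it with NumPy's np.corrcoef, and shows that an exact linear relation y = a + bx yields r = 1 or r = -1. The function name pearson_r and the simulated data are illustrative assumptions, not part of the original tutorial:

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient computed directly from the formula."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    return np.sum(dx * dy) / (np.sqrt(np.sum(dx**2)) * np.sqrt(np.sum(dy**2)))


rng = np.random.default_rng(seed=0)
x = rng.normal(size=200)

# Exact linear relations y = a + b*x give r = 1 (b > 0) or r = -1 (b < 0)
r_pos = pearson_r(x, 2.0 + 3.0 * x)
r_neg = pearson_r(x, 2.0 - 3.0 * x)

# Adding noise weakens the relation, so r moves away from 1
y_noisy = 2.0 + 3.0 * x + rng.normal(scale=5.0, size=200)
r_noisy = pearson_r(x, y_noisy)

# Cross-check against NumPy's built-in implementation
r_numpy = np.corrcoef(x, y_noisy)[0, 1]

print(round(r_pos, 6), round(r_neg, 6), round(r_noisy, 3), round(r_numpy, 3))
```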

Be aware that the correlation coefficient is often misinterpreted! For example, an empirical correlation coefficient with a value > 0 merely states that a relationship can be observed in a sample. It does not explain why this relationship exists: correlation does not imply causation. In addition, r ≈ 0 does not mean that two variables are unrelated in general. It only means that no linear relationship could be detected.
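A small simulated example illustrates the last point. Below, y is completely determined by x, yet because the relationship is a parabola rather than a line, Pearson's r comes out at zero (up to floating-point error). The data is made up purely for illustration:

```python
import numpy as np

# y is fully determined by x, but the relation is a parabola, not a line
x = np.linspace(-1.0, 1.0, 101)
y = x**2

r = np.corrcoef(x, y)[0, 1]
print(f"r = {r:.3f}")  # r is zero up to floating-point error
```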

To learn more about the math behind correlation and covariance, I recommend this YouTube tutorial.

Implementing a Correlation Matrix in Python

Let’s get started with the coding part and develop a correlation matrix in Python that shows the correlation between COVID-19 cases and a selection of diverse financial assets. We will also include data on the spread of cases and on COVID-19 deaths. This example was chosen because it is currently very relevant.

Prerequisites

This tutorial assumes that you have set up a Python environment. If you have not done this yet, you can follow this tutorial on how to set up the Anaconda Python environment.

Furthermore, it is assumed that you have the following packages installed: numpy, pandas, pandas-datareader, requests, matplotlib and seaborn. You can install these packages using one of the following console commands:

pip install <packagename>
conda install <packagename> (if you use the Anaconda Python environment)

1 Loading Data

We begin by loading historical data on COVID-19 cases and different financial assets.

1.1 Loading Historical COVID-19 Data

First, we will download historical COVID-19 case numbers using the Statworx API. This API provides historical time series data on the number of COVID-19 cases in different countries. In addition, the data contains the number of deaths. If you are not yet familiar with APIs, consider my recent tutorial on working with APIs in Python.

# Set up packages
import json
from datetime import datetime

import numpy as np
import pandas as pd
import pandas_datareader as web
import requests
import seaborn as sns
from matplotlib import pyplot as plt

# Load the dataset with COVID-19 cases for all countries
payload = {"code": "ALL"}
URL = "https://api.statworx.com/covid"
response = requests.post(url=URL, data=json.dumps(payload))
response.raise_for_status()  # stop early if the request failed
df_covid = pd.DataFrame.from_dict(response.json())
# Optional: restrict the data to a single country, e.g. the US
# df_covid = df_covid[df_covid['code'] == 'US']

# Convert the date strings into a datetime column
df_covid["Date"] = pd.to_datetime(df_covid["date"])

# Delete columns that we won't use
df_covid.drop(
    ["day", "month", "year", "country", "code", "population", "date"],
    axis=1,
    inplace=True,
)

# Sum up cases and deaths over all countries for each day
df_covid = df_covid.groupby(["Date"]).sum()
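To make the last two steps more tangible, here is a minimal sketch on a small, made-up DataFrame in the same long format (one row per country and day). The values are hypothetical; only the column names date, code, cases and deaths mirror the API data used above:

```python
import pandas as pd

# Hypothetical toy data in long format: one row per country and day
df_toy = pd.DataFrame(
    {
        "date": ["2020-03-01", "2020-03-01", "2020-03-02", "2020-03-02"],
        "code": ["US", "DE", "US", "DE"],
        "cases": [10, 5, 20, 7],
        "deaths": [1, 0, 2, 1],
    }
)
df_toy["Date"] = pd.to_datetime(df_toy["date"])

# Drop the identifier columns, then add up all countries for each day
df_daily = df_toy.drop(["date", "code"], axis=1).groupby(["Date"]).sum()
print(df_daily)
```

The result has one row per day with the values of all countries added up, and Date becomes the index of the resulting DataFrame.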

1.2 Loading Data on Selected Financial Assets

Next, we will make another API call, this time to Yahoo Finance, to retrieve data on financial assets. The data starts on January 1, 2020, shortly after the first COVID-19 cases were documented. The following assets are included:

Stock Market Indexes

  • S&P500
  • DAX
  • Nikkei 225 (N225)
  • S&P500 Futures

Stocks: Online Services

  • Amazon
  • Netflix
  • Apple
  • Google
  • Microsoft

Stocks: Airlines

  • Lufthansa Stock
  • American Airlines

Resource Futures

  • Crude Oil Price
  • Gold
  • Soybean Price

Treasury Bond Futures

  • US Treasury Bonds

Exchange Rates

  • EUR-USD
  • CHF-EUR
  • GBP-USD
  • GBP-EUR

Cryptocurrencies

  • BTC-USD
  • ETH-USD

# Read the data for different assets from Yahoo Finance
today_date = datetime.today().strftime("%Y-%m-%d")
start_date = "2020-01-01"
data_source = "yahoo"
# Each entry: (column name, Yahoo Finance ticker symbol)
asset_list = [
    ("SP500", "^GSPC"),
    ("DAX", "^GDAXI"),
    ("N225", "^N225"),
    ("SP500FutJune20", "ES=F"),
    ("Lufthansa", "LHA.DE"),
    ("AmericanAirlines", "AAL"),
    ("Netflix", "NFLX"),
    ("Amazon", "AMZN"),
    ("Apple", "AAPL"),
    ("Microsoft", "MSFT"),
    ("Google", "GOOG"),
    ("BTCUSD", "BTC-USD"),
    ("ETHUSD", "ETH-USD"),
    ("Oil", "CL=F"),
    ("Gold", "GC=F"),
    ("Soybean", "ZS=F"),
    ("UsTreasuryBond", "ZB=F"),
    ("GBPEUR", "GBPEUR=X"),
    ("EURUSD", "EURUSD=X"),
    ("CHFEUR", "CHFEUR=X"),
    ("GBPUSD", "GBPUSD=X"),
]
col_list = []

# Download each asset and join its closing prices with the COVID-19 data
for name, ticker in asset_list:
    print(name)  # show progress
    col_list.append(name)
    df_temp = web.DataReader(
        ticker, start=start_date, end=today_date, data_source=data_source
    )
    df_temp.rename(columns={"Close": name}, inplace=True)  # rename the Close column
    df_temp = df_temp[[name]]  # keep only the closing price
    df_temp.index = pd.to_datetime(df_temp.index)  # ensure a DatetimeIndex
    # Inner join on the date: keep only days with data in both tables
    df_covid = pd.merge(
        left=df_covid,
        right=df_temp,
        how="inner",
        left_on="Date",
        right_on=df_temp.index,
    )

# Add the COVID-19 columns to the list of variables
col_list += ["cases", "deaths", "cases_cum", "deaths_cum"]
Data on COVID-19 and financial assets
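Because price data only exists on trading days, the inner join in the loop above drops every day on which an exchange is closed, including the COVID-19 numbers reported on weekends. Since the joins are chained, this also removes the weekend prices of assets that trade every day, such as BTC-USD. The following sketch illustrates the effect on hypothetical data (the values and the column name Asset are made up). It uses DataFrame.join on the date index, which performs the same kind of inner join on dates as the pd.merge call:

```python
import pandas as pd

# Hypothetical daily COVID-19 counts, including a weekend (Sat 7 and Sun 8 March 2020)
dates = pd.date_range("2020-03-05", "2020-03-10", freq="D")
df_cases = pd.DataFrame({"cases": [100, 120, 130, 90, 150, 170]}, index=dates)
df_cases.index.name = "Date"

# Hypothetical closing prices that only exist on business days (Mon-Fri)
trading_days = pd.bdate_range("2020-03-05", "2020-03-10")
df_price = pd.DataFrame({"Asset": [100.0, 98.5, 91.0, 95.2]}, index=trading_days)

# Inner join on the date index: weekend rows disappear from the result
df_joined = df_cases.join(df_price, how="inner")
print(df_joined)
```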

Feel free to add the assets of your choice to the asset list. If you are looking for the symbols of other financial assets, you can simply search for them on finance.yahoo.com.

2 Plotting the Data

Once we have obtained fresh data from the web, it is best practice to visualize it first, before doing any transformations. For this reason, we will create a line chart for each variable in col_list.

# Create one line chart per variable, arranged in a grid
import math

from pandas.plotting import register_matplotlib_converters

register_matplotlib_converters()  # let matplotlib handle pandas datetime values

x = df_covid["Date"]
nrows = 5
ncols = math.ceil(len(col_list) / nrows)  # round up so that no variable is left out
fig, ax = plt.subplots(
    nrows=nrows, ncols=ncols, sharex=True, figsize=(14, 14), squeeze=False
)
fig.subplots_adjust(hspace=0.5, wspace=0.5)
for axis, assetname in zip(ax.flat, col_list):
    axis.plot(x, df_covid[assetname])
    axis.set_title(assetname)
    axis.tick_params(axis="x", rotation=90, labelsize=10, length=0)
# Hide empty panels if the grid has more cells than variables
for axis in ax.flat[len(col_list):]:
    axis.set_visible(False)
plt.show()
Line Charts of COVID-19 Data and Financial Assets

If you compare the plot lines of the different assets above, you can already identify pairs that appear to be correlated. However, we cannot be sure until we measure the correlation.

3 Creating a Correlation Matrix

Now that we have the data available, it is time to create the stock market correlation matrix. The matrix shows the Pearson correlation coefficients between all pairs of variables (X, Y) contained in our dataset.

# Use a plain white plotting style
sns.set(style="white")

# Compute the Pearson correlation matrix for all selected columns
df = df_covid[col_list]
corr = df.corr()
corr  # in a Jupyter Notebook, this line displays the matrix
Correlation Matrix

4 Visualizing the Correlation Matrix in a Heatmap

Heatmaps are a common way to visualize the correlation between multiple variables in a matrix. A heatmap maps numeric values to the colors of a color palette. This is especially helpful when there are many values, because differences and similarities can be spotted at a glance. In Python, we can create a heatmap using the seaborn package.

# Generate a mask for the upper triangle (it mirrors the lower triangle)
mask = np.triu(np.ones_like(corr, dtype=bool))

# Set up the matplotlib figure
f, ax = plt.subplots(figsize=(11, 9))

# Use a diverging colormap: red for negative, blue for positive correlations
cmap = "RdBu"

# Draw the heatmap with the mask, a fixed color scale from -1 to 1 and square cells
sns.heatmap(
    corr, mask=mask, cmap=cmap, vmin=-1, vmax=1, center=0,
    square=True, linewidths=.5, cbar_kws={"shrink": .5}, ax=ax,
)
plt.show()
Visualization of the Correlation Matrix in Form of a Heatmap

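The correlation matrix is symmetric because the correlation between a pair of variables X and Y is the same as the correlation between Y and X. This is why the mask in the code above hides the upper triangle, including the diagonal, whose values are always 1: these cells would only repeat information that the lower triangle already shows.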
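Both properties, the symmetry and the diagonal of ones, are easy to verify on a small example. The following sketch uses simulated data (the DataFrame df_toy and its columns are hypothetical) together with the same np.triu mask as in the heatmap code to show which cells remain visible:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with three variables
rng = np.random.default_rng(seed=1)
a = rng.normal(size=100)
df_toy = pd.DataFrame(
    {"a": a, "b": 2 * a + rng.normal(size=100), "c": rng.normal(size=100)}
)
corr_toy = df_toy.corr()

# The matrix equals its transpose, and the diagonal is always 1
print(np.allclose(corr_toy.to_numpy(), corr_toy.to_numpy().T))
print(np.allclose(np.diag(corr_toy.to_numpy()), 1.0))

# np.triu marks the diagonal and upper triangle; these cells are hidden in the heatmap
mask = np.triu(np.ones_like(corr_toy, dtype=bool))
lower_only = corr_toy.where(~mask)  # masked cells become NaN
print(lower_only)
```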

5 Interpreting the Correlation Matrix

The heatmap uses a color palette that ranges from blue (positive correlation) through white (no correlation) to red (negative correlation). The intensity of the color indicates the strength of the correlation. We will therefore use the colors to distinguish pairs that are positively correlated from those that are uncorrelated or negatively correlated. In the following, we compare the different asset classes step by step.
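Reading individual values off a large heatmap can be tedious. As an optional complement, here is a minimal sketch of a helper that lists the pairs with the largest absolute correlation from a correlation matrix such as corr. The name strongest_pairs and the toy matrix are hypothetical, not part of the original tutorial:

```python
import numpy as np
import pandas as pd


def strongest_pairs(corr: pd.DataFrame, n: int = 5) -> pd.Series:
    """Return the n variable pairs with the largest absolute correlation."""
    # Row/column positions of the lower triangle, excluding the diagonal of ones
    rows, cols = np.tril_indices(len(corr), k=-1)
    pairs = pd.Series(
        corr.to_numpy()[rows, cols],
        index=pd.MultiIndex.from_arrays([corr.index[rows], corr.columns[cols]]),
    )
    # Sort by strength, ignoring the sign of the correlation
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(n)


# Hypothetical 3x3 correlation matrix for illustration
corr_toy = pd.DataFrame(
    [[1.0, 0.9, -0.2], [0.9, 1.0, -0.95], [-0.2, -0.95, 1.0]],
    index=["a", "b", "c"],
    columns=["a", "b", "c"],
)
top = strongest_pairs(corr_toy, n=2)
print(top)
```

Applied to the matrix computed earlier, strongest_pairs(corr, n=10) would list the ten most strongly correlated pairs, positive or negative, which can help to double-check what the colors in the heatmap suggest.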

5.1 Stock Market Indices / COVID-19

Let us start with the pairs of stock market indices and COVID-19 data. As expected, the heatmap signals a negative correlation between the indices (DAX, S&P500, N225) and COVID-19. In other words, when the number of cases rises, stock market indices tend to fall in value. On closer inspection, the number of new cases (cases) seems to be more strongly correlated with the indices than the cumulative number of cases (cases_cum) or deaths (deaths_cum). In addition, the stock market indices are correlated with each other.

5.2 Stock Market Indices / Online Service Provider Stocks

When we compare the stock markets with the shares of online service providers, the picture is mixed. The shares of Microsoft and Google are positively correlated with the overall market development. The shares of Netflix, Amazon and Apple, on the other hand, are hardly correlated with it.

5.3 Stock Market Indices / Airline Stocks

Airlines are particularly affected by the pandemic. It is therefore plausible that we observe a strong positive correlation between airline stocks and the stock market indices. This means that airline stocks tend to follow the general market direction.

5.4 Stock Market Indices / Crypto-Currencies

Next, we compare cryptocurrencies with the stock market indices. The results are surprising: BTC-USD shows a strong positive correlation with the general development of the stock markets. Between ETH-USD and the markets, however, the correlation is only slightly positive.

5.5 COVID-19 / Currency Exchange Rates

Exchange rates and COVID-19 are rather weakly correlated. Of the four exchange rates considered, GBP/EUR, EUR/USD and GBP/USD show a slightly negative correlation. The exception is CHF/EUR, which shows a positive correlation with COVID-19 cases. This means that as the number of COVID-19 cases rose, the Swiss franc tended to gain value against the euro.

5.6 Treasury Bonds / Resources

When we look at the coefficients of resources and US Treasury bonds, we observe a strong negative correlation between COVID-19 cases and the oil price, and a strong positive correlation between COVID-19 cases and the gold price.

5.7 Crypto-Currencies / Resources

Finally, let us consider the coefficients of resources and cryptocurrencies. The only noticeable pattern here is that BTC-USD correlates with the oil price. Given the absence of a correlation with gold, one might conclude that BTC-USD is not a crisis currency comparable to gold. Other cryptocurrencies such as ETH-USD, however, are relatively uncorrelated and were less affected by the recent market slump.

Summary

Congratulations, you have reached the end of this tutorial. You have learned how to load data on COVID-19 and financial assets into your Jupyter Notebook via APIs. You have also learned how to create a stock market correlation matrix that contains financial assets and COVID-19 numbers. Finally, we have interpreted the matrix and drawn some conclusions about the correlation of different variables.

I hope you found this tutorial useful. Should you have any questions or suggestions, let me know in the comments! 🙂

Author

  • Hi, my name is Florian! I am a Zurich-based Data Scientist with a passion for Artificial Intelligence and Machine Learning. After completing my PhD in Business Informatics at the University of Bremen, I started working as a Machine Learning Consultant for the Swiss consulting firm ipt. When I'm not working on use cases for our clients, I work on my own analytics projects and report on them in this blog.

Florian Müller, Data Scientist & Machine Learning Consultant
