## About Pearson Correlation

The Pearson correlation coefficient r is a standard measure of the strength of a linear relationship. In other words, r measures how strongly two continuous variables (for example, price and volume) tend to change together. For the Pearson correlation coefficient to return a meaningful value, the following conditions must be met:

- Both variables x and y are metrically scaled and continuous.
- The relationship between the two variables is approximately linear.
- The observation pairs (xi, yi) are independent of one another.

Correlation measures how strongly two variables are associated. The Pearson correlation is calculated by dividing the covariance of the two variables (x, y) by the product of their standard deviations.
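To make the calculation concrete, here is a minimal sketch that computes r by hand and compares it with NumPy's built-in `np.corrcoef`. The helper name `pearson_r` and the toy price and volume values are assumptions made for illustration; they are not part of the dataset used later in this tutorial.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson r: covariance divided by the product of the standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Sample covariance and sample standard deviations (both use ddof=1)
    cov_xy = np.cov(x, y, ddof=1)[0, 1]
    return cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))


# Hypothetical toy data, not taken from the tutorial's dataset
price = [10.0, 11.0, 12.5, 12.0, 14.0]
volume = [200.0, 210.0, 260.0, 240.0, 300.0]

r_manual = pearson_r(price, volume)
r_numpy = np.corrcoef(price, volume)[0, 1]
print(round(r_manual, 4), round(r_numpy, 4))
```

Both values should agree up to floating-point error, because `np.corrcoef` is based on the same ratio.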

## Interpreting the Pearson Correlation Coefficient

So how can we interpret the Pearson correlation coefficient? First of all, the value of r is restricted to the range between -1 and 1. Furthermore, we can differentiate the following cases:

- The closer r is to 1 or -1, the stronger the linear relationship and the closer the points (xi, yi) lie to the regression line.
- The closer r is to 0, the weaker the correlation and the more widely the points are spread around the regression line.
- The extreme cases r = 1 and r = -1 result from a functional relation that can be described exactly by a linear equation of the form y = a + bx. In this case, all points (xi, yi) are located on the regression line.
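The extreme cases can be checked with a few lines of NumPy. This is a small sketch; the intercept `a`, the slope `b`, and the x values are arbitrary numbers chosen for illustration.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
a, b = 2.0, 0.5  # hypothetical intercept and slope

# All points lie exactly on a line, so r is +1 or -1 depending on the slope
r_up = np.corrcoef(x, a + b * x)[0, 1]    # rising line
r_down = np.corrcoef(x, a - b * x)[0, 1]  # falling line
print(r_up, r_down)
```

Note that the size of the slope does not matter here: only its sign decides whether r is 1 or -1.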

Be aware that the correlation coefficient is often misinterpreted! For example, an empirical correlation coefficient greater than 0 merely indicates that a positive linear relation was observed in the sample. It does not explain why this relation exists. In addition, r ≈ 0 does not mean that two variables are unrelated in general. It only means that no linear relation was found.
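The last point is easy to demonstrate. In the following sketch, y depends entirely on x, but the relationship is quadratic rather than linear. The data is synthetic and chosen so that x is symmetric around zero.

```python
import numpy as np

# y is fully determined by x, but the relation is not linear
x = np.linspace(-3.0, 3.0, 61)
y = x ** 2

r_nonlinear = np.corrcoef(x, y)[0, 1]
print(abs(r_nonlinear) < 1e-9)
```

Although y can be predicted perfectly from x, the Pearson coefficient is practically zero, because the rising and falling halves of the parabola cancel each other out.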

To learn more about the math behind correlation and covariance, I recommend this YouTube tutorial.

## Implementing a Correlation Matrix in Python

Let’s get started with the coding part and develop a correlation matrix in Python that shows the correlation between COVID-19 cases and a selection of diverse financial assets. We will also include data on the spread of cases and COVID-19 casualties. This example was chosen because it is currently very relevant.

### Prerequisites

This tutorial assumes that you have set up a Python environment. In case you have not done this yet, you can follow this tutorial on how to set up the Anaconda Python environment.

Furthermore, it is assumed that you have the following packages installed: *numpy*, *pandas*, *pandas_datareader*, *requests*, *matplotlib* and *seaborn*. You can install these packages using one of the following console commands:

- `pip install <packagename>`
- `conda install <packagename>` (if you use the Anaconda Python environment)

### 1 Loading Data

We begin by loading historic data on COVID-19 cases and different financial assets.

#### 1.1 Loading Historic COVID-19 Data

First, we will download historic COVID-19 cases by using the Statworx API. This API provides historic time series data on the number of COVID-19 cases in different countries. In addition, the data contains the number of casualties. If you are not yet familiar with APIs, consider my recent tutorial on working with APIs in Python.

```python
# Set up packages
import pandas as pd
import pandas_datareader as web
import numpy as np
import matplotlib as matplot
from matplotlib import pyplot as plt
import matplotlib.dates as mdates
import matplotlib.cbook as cbook
import requests
import json
from datetime import datetime
import seaborn as sns

# Load the dataset with COVID-19 cases
payload = {"code": "ALL"}
URL = "https://api.statworx.com/covid"
response = requests.post(url=URL, data=json.dumps(payload))
df_covid = pd.DataFrame.from_dict(json.loads(response.text))

# Optional: restrict the data to a single country
# df_covid = df_covid[df_covid['code'] == 'US']

# Convert the date strings to a datetime column
df_covid["Date"] = pd.to_datetime(df_covid["date"])

# Delete the columns that we won't use
df_covid.drop(
    ["day", "month", "year", "country", "code", "population", "date"],
    axis=1,
    inplace=True,
)

# Sum the cases over all countries ("Date" becomes the index)
df_covid = df_covid.groupby(["Date"]).sum()
```
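If you want to see what the last two steps do without calling the API, the following sketch applies them to a handful of made-up records. The records and their values are hypothetical; they only mimic a few of the column names used in the code above.

```python
import pandas as pd

# Hypothetical records mimicking some of the columns the tutorial's code expects
records = [
    {"date": "2020-03-01", "code": "US", "cases": 10, "deaths": 1},
    {"date": "2020-03-01", "code": "DE", "cases": 5, "deaths": 0},
    {"date": "2020-03-02", "code": "US", "cases": 20, "deaths": 2},
    {"date": "2020-03-02", "code": "DE", "cases": 8, "deaths": 1},
]
df_demo = pd.DataFrame.from_records(records)
df_demo["Date"] = pd.to_datetime(df_demo["date"])
df_demo = df_demo.drop(columns=["code", "date"])

# One row per day, summed over all countries; "Date" becomes the index
df_daily = df_demo.groupby("Date").sum()
print(df_daily)
```

The result contains one row per date, so the country-level detail is gone and only the worldwide totals per day remain.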

#### 1.2 Loading Data on Selected Financial Assets

Next, we will make another API call to Yahoo Finance to retrieve data on financial assets. The data covers the period since the first documented COVID-19 cases, and the following assets are included:

**Stock Market Indexes**

- S&P500
- DAX
- Nikkei 225 (N225)
- S&P500 Futures

**Stocks: Online Services**

- Amazon
- Netflix
- Apple
- Microsoft
- Google

**Stocks: Airlines**

- Lufthansa Stock
- American Airlines

**Resource Futures**

- Crude Oil Price
- Gold
- Soybean Price

**Treasury Bond Futures**

- US Treasury Bonds

**Exchange Rates**

- EUR-USD
- CHF-EUR
- GBP-USD
- GBP-EUR

**Crypto Currencies**

- BTC-USD
- ETH-USD

```python
# Read the data for different assets
today_date = datetime.today().strftime("%Y-%m-%d")
start_date = "2020-01-01"
data_source = "yahoo"

# (column name, Yahoo Finance symbol)
asset_list = [
    ("SP500", "^GSPC"),
    ("DAX", "DAX"),
    ("N225", "^N225"),
    ("SP500FutJune20", "ES=F"),
    ("Lufthansa", "LHA.DE"),
    ("AmericanAirlines", "AAL"),
    ("Netflix", "NFLX"),
    ("Amazon", "AMZN"),
    ("Apple", "AAPL"),  # fixed: the symbol was "NFLX", which loaded Netflix twice
    ("Microsoft", "MSFT"),
    ("Google", "GOOG"),
    ("BTCUSD", "BTC-USD"),
    ("ETHUSD", "ETH-USD"),
    ("Oil", "CL=F"),
    ("Gold", "GC=F"),
    ("Soybean", "ZS=F"),
    ("UsTreasuryBond", "ZB=F"),
    ("GBPEUR", "GBPEUR=X"),
    ("EURUSD", "EURUSD=X"),
    ("CHFEUR", "CHFEUR=X"),
    ("GBPUSD", "GBPUSD=X"),
]
col_list = []

# Join the dataframes
for i in asset_list:
    print(i[0])
    col_list.append(i[0])
    df_temp = web.DataReader(
        i[1], start=start_date, end=today_date, data_source=data_source
    )
    df_temp.rename(columns={"Close": i[0]}, inplace=True)  # Rename Close column
    df_temp = df_temp[[i[0]]]  # Select relevant columns
    df_temp.index = pd.to_datetime(df_temp.index)  # Convert index to date format
    # Merge with df_covid
    df_covid = pd.merge(
        left=df_covid,
        right=df_temp,
        how="inner",
        left_on="Date",
        right_on=df_temp.index,
    )

# Add the COVID-19 columns to the list of variables
col_list.append("cases")
col_list.append("deaths")
col_list.append("cases_cum")
col_list.append("deaths_cum")
```

Feel free to add assets of your choice to the asset list. If you are looking for symbols of other financial assets, you can simply search for them on finance.yahoo.com.
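One consequence of the inner join used in the loop is worth knowing: a date survives only if it exists in both tables. The following sketch illustrates this with made-up numbers. It uses `DataFrame.join` on the index instead of the `pd.merge` call from the tutorial code, purely to keep the example short.

```python
import pandas as pd

# Hypothetical daily COVID-19 counts, including a weekend (7 and 8 March 2020)
covid = pd.DataFrame(
    {"cases": [100, 120, 150, 170, 200]},
    index=pd.date_range("2020-03-05", periods=5, freq="D", name="Date"),
)

# Hypothetical closing prices, available on trading days only
prices = pd.DataFrame(
    {"SP500": [3000.0, 2950.0, 2750.0]},
    index=pd.to_datetime(["2020-03-05", "2020-03-06", "2020-03-09"]),
)
prices.index.name = "Date"

# The inner join keeps only the dates that appear in both tables
merged = covid.join(prices, how="inner")
print(merged)
```

Because stock exchanges are closed on weekends, the weekend rows of the COVID-19 data disappear. With many assets, every additional inner join can shrink the dataset further, for example on national holidays.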

### 2 Plotting the Data

Once we have obtained fresh data from the web, it is best practice to visualize the data before doing any transformations. For this reason, we will create line charts for all variables in our column list.

```python
# Create a grid of line charts, one per variable
from pandas.plotting import register_matplotlib_converters

register_matplotlib_converters()  # let matplotlib handle pandas datetime values

f = 0  # index of the next variable to plot
x = df_covid["Date"]
list_length = len(col_list)
nrows = 5
ncols = int(round(list_length / nrows, 0))

fig, ax = plt.subplots(nrows=nrows, ncols=ncols, sharex=True, figsize=(14, 14))
fig.subplots_adjust(hspace=0.5, wspace=0.5)
for i in range(nrows):
    for j in range(ncols):
        if f < list_length:
            assetname = col_list[f]
            y = df_covid[assetname]
            f += 1
            ax[i, j].plot(x, y)
            ax[i, j].set_title(assetname)
            ax[i, j].tick_params(axis="x", rotation=90, labelsize=10, length=0)
plt.show()
```

If you compare the line plots of the different assets above, you can already identify pairs that appear to be correlated. But we cannot really be sure until we measure the correlation.
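A side note on the grid size: the plotting code derives the number of columns by rounding. That works for the 25 variables used here, but if you extend the asset list, rounding down can leave you with fewer cells than variables. A possible alternative, sketched below with a hypothetical helper `grid_shape`, is to round up with `math.ceil`.

```python
import math


def grid_shape(n_plots, nrows=5):
    """Return (nrows, ncols) with enough cells for n_plots panels."""
    ncols = math.ceil(n_plots / nrows)
    return nrows, ncols


# 25 variables fit exactly into a 5 x 5 grid
shape_25 = grid_shape(25)

# With 26 variables, rounding yields 5 columns (25 cells), one panel too few
ncols_rounded = int(round(26 / 5, 0))
shape_26 = grid_shape(26)
print(shape_25, ncols_rounded, shape_26)
```

With `math.ceil`, some cells of the last row may stay empty, but no variable is silently left out of the figure.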

### 3 Creating a Correlation Matrix

Now that we have the data available, it is time to create the stock market correlation matrix. The matrix shows the Pearson correlation coefficients between all pairs of variables (X, Y) contained in our dataset.

```python
# Plotting a diagonal correlation matrix
sns.set(style="white")

# Compute the correlation matrix
df = pd.DataFrame(df_covid, columns=col_list)
corr = df.corr()
corr
```
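To get a feeling for what `corr()` returns, you can try it on synthetic data first. In the following sketch, the column names `asset_a`, `asset_b` and `asset_c` and the random data are made up for illustration: one series follows the base series, the other moves against it.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)  # fixed seed for reproducibility
base = rng.normal(size=200)

df_demo = pd.DataFrame({
    "asset_a": base,
    "asset_b": 2.0 * base + rng.normal(scale=0.5, size=200),  # follows asset_a
    "asset_c": -base + rng.normal(scale=0.5, size=200),       # moves against asset_a
})

# "pearson" is the default method of DataFrame.corr
corr_demo = df_demo.corr(method="pearson")
print(corr_demo.round(2))
```

The result is a square table with one row and one column per variable. The diagonal is always 1, because every variable is perfectly correlated with itself.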

### 4 Visualizing the Correlation Matrix in a Heatmap

Heatmaps are a common way to visualize the correlation between multiple variables in a matrix. A heatmap applies a color palette to represent numeric values on a scale in different colors. This is especially helpful when there are many values, because differences and similarities can be spotted faster in a heatmap than in a table of numbers. In Python, we can create a heatmap using the seaborn package.

```python
# Generate a mask for the upper triangle
# (np.bool was removed from recent NumPy versions, so we use the built-in bool)
mask = np.triu(np.ones_like(corr, dtype=bool))

# Set up the matplotlib figure
f, ax = plt.subplots(figsize=(11, 9))

# Choose a diverging colormap
cmap = "RdBu"
# cmap = sns.diverging_palette(1000, 500, as_cmap=True)

# Draw the heatmap with the mask and correct aspect ratio
sns.heatmap(corr, mask=mask, cmap=cmap, vmax=.3, center=0,
            square=True, linewidths=.5, cbar_kws={"shrink": .5})
```

The correlation matrix is symmetric. This is because the correlation between a pair of variables X and Y is the same as the correlation between Y and X.
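The symmetry is also the reason why the heatmap code can hide the upper triangle without losing information. The following sketch checks this on a small, made-up correlation matrix and shows which cells remain visible after masking.

```python
import numpy as np

# Hypothetical 3 x 3 correlation matrix
corr_demo = np.array([
    [1.0, 0.8, -0.3],
    [0.8, 1.0, 0.1],
    [-0.3, 0.1, 1.0],
])

# A symmetric matrix equals its own transpose
is_symmetric = np.allclose(corr_demo, corr_demo.T)

# True marks the hidden cells: the diagonal and everything above it
mask = np.triu(np.ones_like(corr_demo, dtype=bool))
visible = np.where(mask, np.nan, corr_demo)
print(is_symmetric)
print(visible)
```

Only the cells below the diagonal keep their values, and each pair of variables appears exactly once.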

### 5 Interpreting the Correlation Matrix

The heatmap uses a color palette that ranges from blue (positive correlation) via white (no correlation) to red (negative correlation). The different shades of the three colors visualize the extent of the correlation. Consequently, we will focus on the differences between the colors to distinguish pairs that are positively correlated from those that are uncorrelated or negatively correlated. In the following, we will compare different asset classes step by step.
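Colors are great for a first overview, but sometimes a sorted list is easier to read. As a complement to the heatmap, the following sketch ranks all pairs by the absolute value of their correlation. The helper `rank_pairs` and the demo matrix with the variables A, B and C are hypothetical and only serve as an illustration.

```python
import pandas as pd


def rank_pairs(corr):
    """List each pair of variables once, sorted by absolute correlation."""
    cols = list(corr.columns)
    # Walk through the cells below the diagonal so that every pair appears once
    rows = [
        (cols[i], cols[j], corr.iloc[i, j])
        for i in range(len(cols))
        for j in range(i)
    ]
    pairs = pd.DataFrame(rows, columns=["var_1", "var_2", "r"])
    pairs = pairs.sort_values("r", key=lambda s: s.abs(), ascending=False)
    return pairs.reset_index(drop=True)


# Hypothetical correlation matrix
corr_demo = pd.DataFrame(
    [[1.0, 0.8, -0.9], [0.8, 1.0, -0.6], [-0.9, -0.6, 1.0]],
    index=["A", "B", "C"],
    columns=["A", "B", "C"],
)
ranked = rank_pairs(corr_demo)
print(ranked)
```

Sorting by the absolute value puts strong negative correlations next to strong positive ones, which matches the way we read the dark red and dark blue cells of the heatmap. Applied to the tutorial's data, the call would be `rank_pairs(corr)`.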

#### 5.1 Stock Market Indices / COVID-19

Let us start with the pairs of stock market indices and COVID-19 data. As expected, the heatmap signals a negative correlation between the indices (DAX, S&P500, N225) and COVID-19. In other words, when the number of cases rises, stock market indices tend to fall in value. On closer inspection, the number of new cases (cases) seems to be more strongly correlated with the indices than the cumulative number of cases (cases_cum) or deaths (deaths_cum). In addition, one can observe that the stock market indices are correlated with each other.

#### 5.2 Stock Market Indices / Online Service Provider Stocks

When we compare the stock market indices with the shares of online service providers, the picture is mixed. The shares of Microsoft and Google are positively correlated with the overall development of the markets. On the other hand, the shares of Netflix, Amazon and Apple are hardly correlated with the market development.

#### 5.3 Stock Market Indices / Airline Stocks

Airlines are particularly affected by the pandemic. It is therefore plausible that we observe a strong positive correlation between airline stocks and the stock market indices. This means that airline stocks tend to follow the general market direction.

#### 5.4 Stock Market Indices / Crypto-Currencies

Next, we compare cryptocurrencies with the stock market indices. The results are surprising: BTC-USD shows a strong positive correlation with the general development of the stock markets. For ETH-USD, however, the correlation with the markets is only slightly positive.

#### 5.5 COVID-19 / Currency Exchange Rates

Exchange rates and COVID-19 are only weakly correlated. Of the four exchange rates considered, GBP/EUR, EUR/USD and GBP/USD show a slightly negative correlation. The exception is CHF/EUR, which shows a positive correlation with COVID-19 cases. This means that, as the number of COVID-19 cases rose, the Swiss franc gained value against the euro.

#### 5.6 Treasury Bonds / Resources

When we look at the coefficients of resources and US Treasury Bonds, we can observe a strong negative correlation of COVID-19 cases with the oil price and a strong positive correlation with the gold price.

#### 5.7 Crypto-Currencies / Resources

Finally, let us consider the coefficients of resources and cryptocurrencies. The only noticeable thing here is that BTC-USD correlates with the oil price. Based on the absence of a correlation with gold, one might conclude that Bitcoin is not a crisis currency comparable to gold. However, other cryptocurrencies such as ETH-USD are relatively uncorrelated and were less affected by the recent market slump.

## Summary

Congratulations, you have reached the end of this tutorial. You have learned how to load data on COVID-19 and financial assets via an API into your Jupyter Notebook. You have also learned how to create a stock market correlation matrix that contains financial assets and COVID-19 numbers. Finally, we have interpreted the matrix and drawn some conclusions about the correlation between different variables.

I hope you found this tutorial useful. Should you have any questions or suggestions, let me know in the comments! 🙂
