APIs are a modern shortcut to consume data sources and services on the internet or make them available to others. Regardless of whether these sources are publicly available or reside in corporate networks, it is important for data scientists to understand how APIs can be used to access data sources or interact with services. This tutorial gives a brief introduction to using public APIs in Python to access interesting data sources. It covers the following two packages: the requests package and the pandas webreader.
After completing this tutorial, you will know how to make a request to an API and get interesting data that you can use directly in your Python project for statistics.
What are APIs?
In a wider sense, an API is a contract between the provider and the consumer of a web service, who communicate with each other and exchange data. The contract is necessary because communication can lead to misunderstandings, for example when one party sends information that is not as expected. To avoid such misunderstandings, the contract defines standards for what is communicated between the parties and how it is communicated. The standards are specified in the API documentation in the form of input and output parameters. It is because of this standardization that it is possible to automate communication between different parties.
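To make the idea of a contract more concrete, here is a minimal sketch of how a provider might check an incoming request against its documented input parameters. The `CONTRACT` dictionary and the `validate_request` helper are hypothetical and serve only as an illustration; a real API documents its own parameters and rules.

```python
# Hypothetical contract: allowed input parameters and their expected types.
# These names are illustrative assumptions, not taken from a real API spec.
CONTRACT = {"code": str, "country": str}


def validate_request(payload: dict) -> list:
    """Return a list of contract violations (empty if the payload is valid)."""
    errors = []
    for name, value in payload.items():
        if name not in CONTRACT:
            errors.append(f"unknown parameter: {name}")
        elif not isinstance(value, CONTRACT[name]):
            errors.append(f"{name} must be of type {CONTRACT[name].__name__}")
    return errors


print(validate_request({"code": "US"}))       # valid request
print(validate_request({"code": 42}))         # wrong type
print(validate_request({"continent": "EU"}))  # parameter not in the contract
```

Because both sides can run checks like this automatically, a malformed request can be rejected with a clear message instead of producing an unexpected result.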

APIs can be designed using different standards. An important architectural style used to design APIs is Representational State Transfer (REST). REST has become very popular in recent years and is often considered a simpler and more modern alternative to the traditional Simple Object Access Protocol (SOAP). Both SOAP and REST are based on established rules, and compliance with these rules is the basis of automated information exchange.
REST APIs
If you work with public APIs in Python, you will most likely, knowingly or unknowingly, use a REST API. A service provider that offers a REST API exposes a URL that receives requests. Requests made to the resource URL can have a payload in JSON, HTML, or XML format. A REST API will typically return the response in JSON format, but other formats, such as comma-separated values (CSV), are also possible. REST APIs build on the HTTP methods (GET, HEAD, POST, PUT, PATCH, DELETE, CONNECT, OPTIONS and TRACE). However, the most important ones are:
- GET: Used to receive data.
- POST: Typically used to send new data to the service provider. Sometimes also used to define how data should be retrieved in subsequent requests.
- PUT: Used to update data at the service provider.
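The three methods above can be sketched with the requests package by preparing requests without sending them. The URL `https://api.example.com/items` and the payload fields are placeholders chosen for illustration, not a real service.

```python
import requests

# Placeholder URL for illustration only; no request is actually sent.
URL = "https://api.example.com/items"

# GET: parameters travel in the query string of the URL.
get_req = requests.Request("GET", URL, params={"code": "US"}).prepare()

# POST: new data travels in the request body, here encoded as JSON.
post_req = requests.Request("POST", URL, json={"name": "new item"}).prepare()

# PUT: updated data for an existing resource, also sent in the body.
put_req = requests.Request("PUT", URL + "/1", json={"name": "renamed"}).prepare()

for req in (get_req, post_req, put_req):
    print(req.method, req.url, req.body)
```

Preparing a request this way is a convenient way to inspect what would go over the wire before calling a real API.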
There are various packages available for Python that offer built-in functionality for interacting with REST-based APIs, so we don't need to program everything from scratch. Common packages are the requests package and the pandas webreader, both of which we will use in the following.
Using REST APIs in Python
Prerequisites
This tutorial assumes that you have set up a Python environment. In case you have not yet set the environment up, you can follow this tutorial on how to set up the Anaconda Python environment. Furthermore, it is assumed that you have the following packages installed: requests, pandas, pandas-datareader (the pandas webreader), and matplotlib. The json module is part of the Python standard library and does not need to be installed separately.
Access Data using the Requests Package
The requests package is an HTTP library, and we will use it as a first method to make an API call. According to download statistics, it ranks among the most downloaded Python packages. It provides functionality to send an HTTP request to an API and receive the response, e.g., in JSON format.
There are many APIs out there which provide more or less reliable data on COVID-19 cases. I found that a good one to use is api.statworx.com/covid. This API provides historic data on the number of COVID-19 cases. Since the API does not require an authentication key, it is quickly set up.
In the following, we will send a POST request to the statworx.com API and get back COVID-19 data in JSON format as a response. We can then convert the response into a dataframe. With the following code we can send an HTTP request to the URL provided and will get the requested data back in JSON format:
import json

import pandas as pd
import requests

# Call the API
payload = {'code': 'US'}  # or {'country': 'Germany'}
# To retrieve data for all countries use {'code': 'ALL'}
URL = 'https://api.statworx.com/covid'
response = requests.post(url=URL, data=json.dumps(payload))

# Convert to data frame
df = pd.DataFrame.from_dict(json.loads(response.text))
df.head()
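Public APIs are not always reachable, so it can help to check the response before converting it. The following is a minimal sketch, not part of the original example: the `fetch_json` helper, its injectable `post` parameter, and the 10-second timeout are assumptions made for illustration.

```python
import json

import requests


def fetch_json(url, payload, post=requests.post, timeout=10):
    """POST a JSON payload and return the decoded JSON response.

    `post` is injectable so the function can be tried without network access.
    """
    response = post(url=url, data=json.dumps(payload), timeout=timeout)
    # Raises requests.HTTPError for 4xx/5xx status codes.
    response.raise_for_status()
    return response.json()
```

A call such as `fetch_json('https://api.statworx.com/covid', {'code': 'US'})` would then either return the parsed data or raise an exception instead of silently handing an error page to pandas.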

By altering the information in our request, we can change the data that we will get back in the response. For example, we have set the country code to 'US' in the payload. Thus, the response will contain only COVID-19 data for the US. In contrast, if we want to retrieve data for all countries, we can do this by setting the code to 'ALL'.
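To see what actually gets sent, the payload variants mentioned so far can be serialized without calling the API. The `build_payload` helper below is hypothetical; it only wraps the `code` and `country` keys from the example in `json.dumps`.

```python
import json


def build_payload(code=None, country=None):
    """Serialize a request payload with json.dumps, as in the call above."""
    # Exactly one of the two keys is expected in this sketch.
    if (code is None) == (country is None):
        raise ValueError("pass exactly one of code or country")
    payload = {"code": code} if code is not None else {"country": country}
    return json.dumps(payload)


print(build_payload(code="US"))
print(build_payload(country="Germany"))
print(build_payload(code="ALL"))
```

The returned string is what would be passed as the `data` argument of `requests.post`.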
# We will get data for all countries
payload = {'code': 'ALL'}
URL = 'https://api.statworx.com/covid'
response = requests.post(url=URL, data=json.dumps(payload))

# Convert the response to a data frame
df = pd.DataFrame.from_dict(json.loads(response.text))

# Convert date column to date format
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')

# Filter specific countries
# The following countries will be included
list_of_countries = ['Germany', 'Switzerland', 'France', 'Spain', 'China', 'United_States_of_America', 'Canada']
dff = df[df['country'].isin(list_of_countries)]

# Filter the data to a specific timeframe
date_start = '2020-01-15'
date_today = df['date'].max()
dff = dff[dff['date'] > date_start]
If you like, you can use the following code to export the dataframe with the COVID-19 data to a CSV file. The CSV file will then be exported to the folder of your Python notebook.
# Save the file as csv
df.to_csv('yourfilename.csv', index=False)
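When the file is loaded again later, the date column comes back as plain text unless pandas is told to parse it. The sketch below shows the round trip on a tiny data frame with made-up values (not real COVID-19 figures); the column names follow the ones used in this tutorial and the temporary file name is arbitrary.

```python
import os
import tempfile

import pandas as pd

# Made-up toy values for illustration only; not real COVID-19 figures.
toy = pd.DataFrame({
    "date": pd.to_datetime(["2020-03-01", "2020-03-02"]),
    "country": ["Germany", "Germany"],
    "cases_cum": [10, 20],
})

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "toy.csv")
    toy.to_csv(path, index=False)
    # parse_dates restores the datetime type, which plain CSV text does not keep.
    restored = pd.read_csv(path, parse_dates=["date"])

print(restored.dtypes)
```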
Now let’s make something out of the data and create a simple plot.
# Plot the data
import matplotlib.pyplot as plt

fig, ax1 = plt.subplots(figsize=(16, 10))
plt.ylabel('Total Cases', fontsize=20, color='black')
plt.grid()

for countryname in list_of_countries:
    print(countryname)
    # Use each country's own dates so x and y always have the same length
    countrydata = dff[dff['country'] == countryname]
    plt.plot(countrydata['date'], countrydata['cases_cum'], label=countryname)

plt.legend(loc='upper left')
plt.show()

Access Financial Data using Pandas Webreader and the Yahoo Finance API
A second way to make API calls is the pandas webreader (the pandas-datareader package), a companion to the famous pandas package. The following code example demonstrates how to use it to retrieve data on the German stock index DAX from Yahoo Finance.
import pandas as pd
import pandas_datareader as webreader
from datetime import date

# Set the date range
today = date.today()
date_today = today.strftime("%Y-%m-%d")
date_start = '2010-01-01'

# Get the DAX quote
symbol = 'DAX'
df = webreader.DataReader(symbol, start=date_start, end=date_today, data_source='yahoo')
df.head(5)

The pandas webreader supports a wide range of service providers, including the World Bank, Eurostat, the OECD, and several stock markets such as the NASDAQ. Each source has its own data reader that requires specific input arguments. If you want to learn more about the supported service providers and the pandas webreader, check out the pandas webreader documentation.
Summary
In this quick tutorial you have learned how to access data sources using REST APIs in Python. The blog post has presented two different ways of making API calls. First, we used the requests package to send an HTTP request with a JSON payload and retrieve COVID-19 data. Second, we used the pandas webreader to retrieve data on the German stock market index.
I hope you learned something relevant. Let me know in the comments if you found the post helpful.