Getting Started with REST APIs in Python


APIs are a modern shortcut for consuming data sources and services on the internet, or for making them available to others. Regardless of whether these sources are publicly available or reside in corporate networks, data scientists should understand how APIs can be used to access data sources or interact with services. This tutorial gives a brief introduction to using public APIs in Python to access interesting data sources. It covers the following two examples:

  • In Example A, we use the statworx API to access COVID-19 data
  • In Example B, we use the pandas-datareader package to access financial data from Yahoo Finance

After completing this tutorial, you will know how to make a request to an API and retrieve data that you can use directly in your Python projects.

What are APIs?

In a wider sense, an API is a contract between the provider and the consumer of a web service, who communicate with each other and exchange data. The contract is necessary because communication can lead to misunderstandings, for example when one party sends information that is not as expected. To avoid such misunderstandings, the contract defines standards for what is communicated between the parties and how it is communicated. The standards are specified in the API documentation in the form of input and output parameters. It is because of this standardization that it is possible to automate communication between different parties.
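
The idea of a contract can be sketched in a few lines of Python. The example below assumes a hypothetical API whose documentation promises three fields per record; the field names and types are invented for illustration and do not describe a real service. The consumer checks each record against that expectation.

```python
# Hypothetical contract: the fields a consumer expects in each response record,
# as they might be listed in an API's documentation.
EXPECTED_FIELDS = {"date": str, "country": str, "cases": int}


def violations(record):
    """Return a list of ways in which a record deviates from the contract."""
    problems = []
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"wrong type for {field}: {type(record[field]).__name__}")
    return problems


good = {"date": "2020-03-01", "country": "Germany", "cases": 54}
bad = {"date": "2020-03-01", "cases": "54"}

print(violations(good))  # []
print(violations(bad))   # ['missing field: country', 'wrong type for cases: str']
```

A check like this is what makes automation possible: a program can decide on its own whether a response is usable.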

Communication via an API

APIs can be designed using different standards. An important architectural style for designing APIs is Representational State Transfer (REST). REST has become very popular in recent years and is often considered a simpler and more modern alternative to the traditional Simple Object Access Protocol (SOAP). Both SOAP and REST are based on established rules, and compliance with these rules is the basis of automated information exchange.

About RESTful APIs

If you work with public APIs in Python, you will most likely, knowingly or unknowingly, use REST. A service provider that offers a REST API exposes a URL that receives requests. Requests made to the resource URL can have a payload in JSON, HTML, or XML format. A REST API will typically return the response in JSON format, but other formats, such as comma-separated values (CSV), are also possible. REST APIs build on the methods defined by HTTP (GET, HEAD, POST, PUT, PATCH, DELETE, CONNECT, OPTIONS and TRACE). The most important ones are:

  • GET: Used to receive data.
  • POST: Typically used to send new data to the service provider. Sometimes also used to define how data should be retrieved in subsequent requests.
  • PUT: Used to update data at the service provider.
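
To make the three methods concrete, the following sketch builds one request of each kind with the requests package and inspects it without sending anything. The URL https://api.example.com/items and the payload fields are hypothetical placeholders, not a real service, and the helper build_request is my own name for illustration.

```python
import json

import requests

# Hypothetical endpoint, used only for illustration; nothing is sent over the network.
BASE_URL = "https://api.example.com/items"


def build_request(method, url, payload=None, params=None):
    """Prepare an HTTP request so its method, URL, and body can be inspected."""
    body = json.dumps(payload) if payload is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return requests.Request(method, url, data=body, params=params, headers=headers).prepare()


# GET reads data, POST sends new data, PUT updates existing data
get_req = build_request("GET", BASE_URL, params={"limit": 10})
post_req = build_request("POST", BASE_URL, payload={"name": "a"})
put_req = build_request("PUT", BASE_URL + "/1", payload={"name": "b"})

for req in (get_req, post_req, put_req):
    print(req.method, req.url, req.body)
```

Note that the GET request carries its parameters in the URL, while POST and PUT carry their data in the request body.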

Various Python packages offer built-in functionality for interacting with REST-based APIs, so we don’t need to program everything from scratch. Common packages are pandas-datareader (referred to in this post as the pandas webreader) and the requests package, both of which we will use in the following.

Two API Examples in Python

Prerequisites

Before we start the coding part, make sure that you have set up your Python 3 environment and the required packages. If you don’t have an environment set up yet, you can follow this tutorial to set up the Anaconda environment.

Also make sure you install all required packages. In this tutorial, we will be working with the following standard packages: pandas, numpy, and matplotlib. The math module is part of the Python standard library and does not need to be installed.

In addition, we will be working with the Requests package, a widely used HTTP library for interacting with REST APIs. It provides functionality to send an HTTP request to an API and receive a response. Example B additionally requires the pandas-datareader package.

You can install packages using console commands:

  • pip install <package name>
  • conda install <package name> (if you are using the Anaconda package manager)
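
If you are unsure whether everything is in place, a quick check like the following can help. It is a minimal sketch that only tests whether the packages can be found in the current environment; the list of import names is my assumption based on the imports used in this tutorial.

```python
import importlib.util

# Import names of the packages used in this tutorial
# (pandas_datareader is the import name of the pandas-datareader package).
REQUIRED = ["pandas", "numpy", "matplotlib", "requests", "pandas_datareader"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("Missing packages:", missing if missing else "none")
```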

Example A: Pulling COVID-19 Data using the Statworx API

There are many APIs out there which provide more or less reliable data on COVID-19 cases. I found that a good one to use is api.statworx.com/covid. This API provides historic data on the number of COVID-19 cases. Since the API does not require an authentication key, it is quickly set up.

Step #1 Define the Payload

In the following, we will send a POST request to the statworx API and get back COVID-19 data in JSON format as a response. We can then convert the response into a DataFrame. First, we import the required packages and define the payload and the endpoint URL:

# the data is provided by the European Centre for Disease Prevention and Control
import json

import matplotlib.pyplot as plt
import pandas as pd
import requests

# define the payload that will be sent to the API endpoint, and the endpoint URL
# "code" selects the countries for which we will retrieve data
# to retrieve data for a specific country, use e.g. {"code": "US"}
payload = {"code": "ALL"}
URL = "https://api.statworx.com/covid"

By altering the information in our request, we can change the data that we get back in the response. Here, we have set the code to “ALL” in the payload, so the response will contain COVID-19 data for all countries. If we instead want data for a single country, we can set the code to that country's code, for example “US”.

Step #2 Call the REST API Endpoint

# call the api
response = requests.post(url=URL, data=json.dumps(payload))
response # if the request was successful, you should see a response code 200
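
The call above assumes that the request succeeds. A slightly more defensive variant, sketched here, adds a timeout and turns HTTP error codes into exceptions. The helper name fetch_json and the 30-second timeout are my own choices for illustration; the session argument exists only so that the function can be tried with a stand-in object instead of real network access.

```python
import json

import requests


def fetch_json(url, payload, timeout=30, session=None):
    """POST a JSON payload and return the decoded JSON response.

    Raises requests.HTTPError for 4xx/5xx status codes.
    """
    http = session or requests
    response = http.post(url=url, data=json.dumps(payload), timeout=timeout)
    response.raise_for_status()  # raises if the status code signals an error
    return response.json()


# Usage (performs a real network call, so it is left commented out here):
# data = fetch_json("https://api.statworx.com/covid", {"code": "ALL"})
```

Without a timeout, a request to an unresponsive server can block indefinitely, which is why setting one is usually worthwhile in scripts that run unattended.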

Step #3 Convert the Data to a DataFrame

# convert the response data to a data frame
df = pd.DataFrame.from_dict(json.loads(response.text))
df

Step #4 Filter the Data

Now let’s make something out of the data. Before we create a simple plot, we convert the date column and filter the data to a few countries and a specific timeframe.

# convert date column to datetime format
# (plain column assignment ensures the column takes on the datetime dtype)
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")

# filter specific countries
list_of_countries = ["Germany", "Switzerland", "France", "Spain", "Canada"]
df_1 = df[df["country"].isin(list_of_countries)]

# filter the data to a specific timeframe
df_new = df_1[df_1["date"] > "2020-01-15"]
df_new.head()
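
The conversion and filter steps can also be tried without calling the API. The sketch below applies them to a small hand-written JSON string. The records and their values are invented for illustration, and the column-oriented layout is an assumption; only the column names date, country, and cases_cum are taken from the code in this tutorial.

```python
import json

import pandas as pd

# Invented sample data; a real response may be structured differently.
sample_text = json.dumps(
    {
        "date": ["2020-03-01", "2020-03-02", "2020-03-01"],
        "country": ["Germany", "Germany", "France"],
        "cases_cum": [111, 129, 100],
    }
)

# same conversion as in Step #3, followed by the date conversion
sample_df = pd.DataFrame.from_dict(json.loads(sample_text))
sample_df["date"] = pd.to_datetime(sample_df["date"], format="%Y-%m-%d")

# keep one country and only the rows after a cut-off date
is_germany = sample_df["country"] == "Germany"
after_cutoff = sample_df["date"] > "2020-03-01"
germany = sample_df[is_germany & after_cutoff]
print(germany)
```

Because the date column holds real datetime values after the conversion, pandas can compare it against a date given as a string.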

Step #5 Plot the Case Counts

# create separate plot lines for each country in the dataset
fig, ax1 = plt.subplots(figsize=(12, 8))
plt.ylabel("Total Cases", fontsize=20, color="black")
for countryname in list_of_countries:
    x = df_new[df_new["country"] == countryname]["date"]
    y = df_new[df_new["country"] == countryname]["cases_cum"]
    plt.plot(x, y, label=countryname)

plt.legend(list_of_countries, loc="upper left")
plt.show()

Example B: Access Financial Data using pandas-datareader and the Yahoo Finance API

A second way to make API calls is to use pandas-datareader, a companion package to pandas that is installed separately and imported here under the alias webreader. The following code example demonstrates how to use this package to retrieve Bitcoin price quotes (BTC-USD) from Yahoo Finance.

Step #1: Define the API Request Parameters

# the pandas webreader provides remote data access to apis
import pandas_datareader as webreader

date_today = "2020-01-01"
date_start = "2010-01-01"

# set the symbol to bitcoin-usd quotes
# for more symbols, check finance.yahoo.com
symbol = "BTC-USD"

Step #2: Send the Request to the API Endpoint

pandas-datareader supports a wide range of service providers, including the World Bank, Eurostat, the OECD, and several stock markets such as the NASDAQ. Each source has its own data reader that requires specific input arguments. If you want to learn more about the supported service providers, check out the pandas-datareader documentation.

# now we will send the request to the yahoo finance api endpoint
df = webreader.DataReader(symbol, start=date_start, end=date_today, data_source="yahoo")
df.head(5)

Step #3: Save the Data to a CSV File

# finally we save the data to a csv file
# the index holds the dates, so we keep it in the file
df.to_csv("price_quotes.csv")
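
Reading the file back is the natural counterpart to saving it. The sketch below writes a tiny invented price table, including its date index, to a temporary file and restores it with read_csv. The values, the Close column, and the temporary path are placeholders for illustration, not downloaded data.

```python
import os
import tempfile

import pandas as pd

# Invented stand-in for the downloaded quotes, indexed by date.
quotes = pd.DataFrame(
    {"Close": [100.0, 101.5, 99.8]},
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]).rename("Date"),
)

path = os.path.join(tempfile.mkdtemp(), "price_quotes.csv")
quotes.to_csv(path)  # the date index is written as the first column

# restore the first column as a parsed date index
restored = pd.read_csv(path, index_col=0, parse_dates=True)
print(restored)
```

Passing index_col=0 and parse_dates=True tells pandas to treat the first column as the index and to parse it as dates, so the restored table can be sliced by date again.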

Summary

In this quick tutorial, you have learned how to access data sources using REST APIs in Python. The post presented two different ways of making API calls: first, we sent an HTTP request with a JSON payload to retrieve COVID-19 data; second, we used pandas-datareader to retrieve Bitcoin price quotes from Yahoo Finance.

Let me know in the comments if you found the post helpful.

Author

  • Hi, I am a Zurich-based Data Scientist with a passion for Machine Learning and Investing. After completing my Ph.D. in Business Informatics at the University of Bremen, I started working as a Machine Learning Consultant for the Swiss consulting firm ipt. When I'm not working on use cases for our clients, I work on my own analytics projects and report on them in this blog.

Florian Müller, Data Scientist & Machine Learning Consultant