Accessing Remote Data Sources via REST APIs in Python

REST APIs provide straightforward access to remote data sources. It is therefore essential for data scientists to understand how they work and how to use them. This article briefly describes how to access remote data sources via REST APIs in Python.

The article proceeds as follows: We begin with a brief introduction to RESTful APIs. Then, we look at two examples of how to access remote APIs. We will define API requests, learn about parameters, and request data from an API. We will then handle the response and either load it into a DataFrame or save it to a local CSV file. Two specific examples are given: pulling COVID-19 data from the Statworx API, and retrieving financial data from Yahoo Finance with the pandas-datareader package.

What is an API?

An API (Application Programming Interface) is an interface for consuming remote data sources and services on the internet or for making them available to others. Regardless of whether these sources are publicly available or reside in corporate networks, data scientists need to understand how to work with APIs.

In the broader sense, an API is a contract between the provider and the consumer of a web service, who communicate and exchange data. The agreement is necessary because communication can lead to misunderstandings, for example, when one party sends information that is not as expected. The contract avoids such misunderstandings by defining standards for what the parties communicate and how it is communicated.

A popular architectural style used to design APIs is Representational State Transfer (REST). REST has become very popular in recent years and is often considered a more straightforward and modern alternative to the traditional Simple Object Access Protocol (SOAP). Both SOAP and REST are based on established rules, and compliance with these rules is the basis of automated information exchange. In data science, however, REST is now the most common choice.


About RESTful APIs

If you work with public APIs in Python, you will most likely encounter REST APIs. A service provider that operates a REST API exposes a URL that receives requests. Requests made to the resource URL can have a payload in JSON, HTML, or XML format. A REST API will typically return the response in JSON format, but other formats, such as comma-separated values (CSV), are also possible. REST APIs rely on the HTTP methods (GET, HEAD, POST, PUT, PATCH, DELETE, CONNECT, OPTIONS, and TRACE). The most important ones are:

  • GET: Used to request data from a data provider.
  • POST: Typically used to send new data to the service provider. Sometimes also used to define how data should be retrieved in subsequent requests.
  • PUT: Used to update data at the service provider.
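
To make the three methods more concrete, here is a minimal sketch that builds one request of each kind with the requests package, without sending anything over the network. The URL https://api.example.com/items, the id 42, and the field names are hypothetical placeholders and do not refer to a real service.

```python
import requests

# hypothetical endpoint, used only for illustration
BASE_URL = "https://api.example.com/items"

# GET: query parameters are appended to the URL
get_request = requests.Request("GET", BASE_URL, params={"country": "Germany"}).prepare()

# POST: new data travels in the request body, here as JSON
post_request = requests.Request("POST", BASE_URL, json={"name": "new item"}).prepare()

# PUT: updates an existing resource, addressed here by a hypothetical id
put_request = requests.Request("PUT", f"{BASE_URL}/42", json={"name": "renamed item"}).prepare()

# inspect what would be sent for each method
for prepared in (get_request, post_request, put_request):
    print(prepared.method, prepared.url, prepared.body)
```

A prepared request like these could then be sent with requests.Session().send(prepared). In everyday use, the shortcuts requests.get, requests.post, and requests.put build and send the request in one step.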

There are various packages available for Python that offer built-in functionality for interacting with REST-based APIs, so we don’t need to program everything from scratch. Two widely used packages are pandas-datareader and requests, both of which we will use in the following.

Two REST API Examples in Python

Prerequisites

Before we start the coding part, make sure that you have set up your Python 3 environment and required packages. If you don’t have an environment set up yet, you can follow the steps in this tutorial to set up the Anaconda environment.

Also, make sure you install all required packages. In this tutorial, we will be working with the following standard packages: pandas and Matplotlib.

In addition, we will be working with the requests package, a standard HTTP library used to interact with REST APIs. It provides functionality to send an HTTP request to an API and receive a response.

You can install packages using console commands:

  • pip install <package name>
  • conda install <package name> (if you are using the Anaconda package manager)

Example A: Pulling COVID-19 Data using the Statworx API

There are many APIs out there that provide more or less reliable data on COVID-19 cases. I found that a good one to use is api.statworx.com/covid. This API offers historical data on the number of COVID-19 cases. The API is very accessible since it does not require an authentication key.

We will call this API using the requests package, a standard package to interact with APIs in Python. An alternative would be to use a Python package that provides API-specific functions to interact with the API. We will look at this case in the second example.

Step #1 Define the Payload

In the following, we send a POST request to the Statworx API and get back COVID-19 data in JSON format as a response, which we can then convert into a DataFrame. First, we import the required packages and define the payload and the endpoint URL:

# the data is provided by the European Centre for Disease Prevention and Control
import json

import matplotlib.pyplot as plt  # needed for the plot in step #5
import pandas as pd
import requests

# define the payload that will be sent to the api endpoint, and the endpoint url
# code defines the countries for which we will retrieve data
# to retrieve data for specific countries use e.g. {'country': 'Germany'}
payload = {"code": "ALL"}
URL = "https://api.statworx.com/covid"

We can change the data we get back in the response by altering the parameters in our request. In the payload above, we have set the code to ALL, so the response contains COVID-19 data for all countries. If we want data for a single country only, we can set the code to that country’s code instead, for example, “US”.
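
As a small illustration of how the payload drives the request, the sketch below wraps the payload in a helper and prints the JSON text that ends up in the request body. The helper name build_payload is hypothetical; the code key and the values ALL and US are the ones mentioned in this article.

```python
import json


def build_payload(country_code="ALL"):
    """Return the request payload for a country code (hypothetical helper)."""
    return {"code": country_code}


# the payload is serialized to a JSON string before it is sent
print(json.dumps(build_payload()))      # {"code": "ALL"}
print(json.dumps(build_payload("US")))  # {"code": "US"}
```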

Step #2 Call the REST API Endpoint

Next, we send the payload to the endpoint URL in a POST request.

# call the api
response = requests.post(url=URL, data=json.dumps(payload))
response # if the request was successful, you should see a response code 200
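
In practice, a request can fail or hang, so it can help to set a timeout and check the status code explicitly. The following is a minimal sketch and not part of the original example: the function name post_json and the default timeout of 10 seconds are assumptions. The optional session argument is only there so that the function can be tried out without network access.

```python
import json

import requests


def post_json(url, payload, session=None, timeout=10):
    """Send a JSON payload via POST and return the decoded JSON response."""
    if session is None:
        session = requests.Session()
    response = session.post(url, data=json.dumps(payload), timeout=timeout)
    # raises requests.HTTPError for 4xx and 5xx status codes
    response.raise_for_status()
    # decode the JSON body into Python objects
    return response.json()
```

With the variables defined in step #1, the call would be data = post_json(URL, payload).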

Step #3 Convert the Data to a DataFrame

The API call returns a response in JSON format. A common way of handling the response from an API is to convert it into a DataFrame and then process it further.

# convert the response data to a data frame
df = pd.DataFrame.from_dict(json.loads(response.text))
df
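
To see what the conversion does without calling the API, the sketch below applies the same two steps to a small hand-written JSON string. The sample values are made up for illustration, and the column-oriented layout is an assumption about the response format. Only the column names date, country, and cases_cum are taken from the code in this article.

```python
import json

import pandas as pd

# made-up sample standing in for response.text
sample_text = """
{
  "date": ["2020-03-01", "2020-03-02", "2020-03-01"],
  "country": ["Germany", "Germany", "France"],
  "cases_cum": [10, 15, 7]
}
"""

# json.loads turns the JSON text into a Python dict of lists
records = json.loads(sample_text)

# each key becomes a column, each list position becomes a row
sample_df = pd.DataFrame.from_dict(records)
print(sample_df.shape)  # (3, 3)
```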

Step #4 Filter the Data

Before we plot the data, we convert the date column to a datetime format and filter the data to a few countries and a specific timeframe.

# convert date column to date format
# (assign the whole column so that it gets a datetime dtype)
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")

# filter specific countries
list_of_countries = ["Germany", "Switzerland", "France", "Spain", "Canada"]
df_1 = df[df["country"].isin(list_of_countries)]

# filter the data to a specific timeframe
df_new = df_1[df_1["date"] > "2020-01-15"]
df_new.head()
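
The two filters can also be combined into a single boolean mask. The following sketch shows this on made-up sample data that uses the same column names as above; the values themselves are invented for illustration.

```python
import pandas as pd

# made-up sample data with the same column names as above
sample_df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2020-01-10", "2020-02-01", "2020-02-01", "2020-03-01"]),
        "country": ["Germany", "Germany", "Italy", "France"],
        "cases_cum": [0, 8, 3, 100],
    }
)

countries = ["Germany", "France"]

# both conditions must hold, so the two masks are combined with &
mask = sample_df["country"].isin(countries) & (sample_df["date"] > "2020-01-15")
filtered = sample_df[mask]
print(filtered)
```

Only the rows that belong to one of the listed countries and fall after the cutoff date remain in filtered.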

Step #5 Plot the Case Counts

Finally, we can create a plot to visualize the data that we have received from the API.

# create separate plot lines for each country in the dataset
fig, ax1 = plt.subplots(figsize=(12, 8))
plt.ylabel("Total Cases", fontsize=20, color="black")
for countryname in list_of_countries:
    x = df_new[df_new["country"] == countryname]["date"]
    y = df_new[df_new["country"] == countryname]["cases_cum"]
    plt.plot(x, y, label=countryname)

plt.legend(list_of_countries, loc="upper left")
plt.show()
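
An alternative to looping over the countries is to reshape the data so that each country becomes its own column. The sketch below shows this reshaping on made-up sample data; it is one possible approach and not the method used in this article.

```python
import pandas as pd

# made-up sample data in the same long format as above
sample_df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2020-03-01", "2020-03-01", "2020-03-02", "2020-03-02"]),
        "country": ["Germany", "France", "Germany", "France"],
        "cases_cum": [10, 7, 15, 9],
    }
)

# one row per date, one column per country
wide = sample_df.pivot(index="date", columns="country", values="cases_cum")
print(wide)
```

Calling wide.plot(figsize=(12, 8)) on such a table would then draw one line per country.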

Example B: Access Financial Data using pandas-datareader and the Yahoo Finance REST API in Python

A second way of using REST APIs in Python is with an API-specific Python package. Many API providers offer such standard packages. These facilitate the interaction with the API by providing customized functions and parameters.

An example of such a package is the pandas-datareader package, which provides functions that let us easily interact with several popular remote data sources on the web. In this example, we use pandas-datareader to request data on the German stock market index DAX from the Yahoo Finance API.

Step #1: Define the API Request Parameters

Many APIs accept parameters that specify which data the API should return. In our case, the Yahoo Finance API provides the option to limit the period for which we want to retrieve price data. Furthermore, we can define the ticker symbol of the financial instrument for which we wish to request the price data. The ticker symbol for the German stock market index is ^GDAXI. If you want to retrieve price data for other stocks or indices, you can search for the respective ticker symbols on Yahoo Finance.

# pandas-datareader provides remote data access to apis
import pandas_datareader as webreader

# start and end of the period for which we request price data
date_today = "2020-01-01"
date_start = "2010-01-01"

# set the symbol to the DAX index quotes
# for more symbols check finance.yahoo.com
symbol = "^GDAXI"
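
Instead of hard-coding the period, the two dates can also be derived from the current date. The sketch below shows one possible way to do this with the standard library; the window of roughly ten years is an arbitrary choice for illustration.

```python
from datetime import date, timedelta

today = date.today()

# end of the period: today, formatted as YYYY-MM-DD
date_today = today.strftime("%Y-%m-%d")

# start of the period: roughly ten years (3650 days) earlier
date_start = (today - timedelta(days=3650)).strftime("%Y-%m-%d")

print(date_start, date_today)
```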

Step #2: Send the Request to the REST API Endpoint

The pandas-datareader package supports several remote data providers, including the World Bank, Eurostat, the OECD, and stock market data sources such as Nasdaq. Each source has a separate data reader class that takes specific input arguments. If you want to learn more about the supported service providers, check out the pandas-datareader documentation.

# now we will send the request to the yahoo finance api endpoint
df = webreader.DataReader(symbol, start=date_start, end=date_today, data_source="yahoo")
df.head(5)

Step #3: Save the Data to a CSV File

Finally, we will save the data to a local CSV file.

# finally we save the data to a csv file
# keep the index, because it holds the dates of the price quotes
df.to_csv("price_quotes.csv")
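
When the file is read back later, the dates have to be parsed again. The following sketch shows the round trip on made-up price data, under the assumption that the dates are stored in the index and written to the file along with the prices. The column name Close and the index name Date are illustrative, and the file is written to a temporary directory so that nothing is left behind.

```python
import os
import tempfile

import pandas as pd

# made-up price data with the dates in the index
prices = pd.DataFrame(
    {"Close": [100.0, 101.5, 99.8]},
    index=pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]),
)
prices.index.name = "Date"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "price_quotes.csv")
    # the index is written as the first column of the file
    prices.to_csv(path)
    # read the file back and restore the dates as a DatetimeIndex
    restored = pd.read_csv(path, index_col="Date", parse_dates=True)

print(restored)
```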

Summary

This article has shown how to access remote data sources using REST APIs in Python. We have looked into two different ways of requesting remote data: First, we used an HTTP POST request with a JSON payload to retrieve COVID-19 data from the Statworx API and loaded the response into a pandas DataFrame. In the second example, we requested data on the German stock market index using the pandas-datareader package.

I hope this post was helpful. If you have any remarks or questions, let me know in the comments.

Author

  • Hi, I am Florian, a Zurich-based consultant for AI and Data. Since the completion of my Ph.D. in 2017, I have been working on the design and implementation of ML use cases in the Swiss financial sector. I started this blog in 2020 with the goal in mind to share my experiences and create a place where you can find key concepts of machine learning and materials that will allow you to kick-start your own Python projects.
