Accessing Remote Data Sources via REST APIs in Python

REST APIs provide straightforward access to remote data sources. For data scientists, APIs (Application Programming Interfaces) are an important way to obtain data from databases, web services, and other applications, and to integrate data from different sources into their analysis and modeling. Knowing how REST APIs work is also useful for data scientists who want to share their own data or models by building APIs that other applications can call. This article briefly describes how to access remote data sources via REST APIs in Python.

The article proceeds as follows: We begin with a brief introduction to RESTful APIs. Then, we will look at two examples of how to access remote APIs. We will define API requests, learn about parameters, and request data from an API. We will handle the response and save it to a dataframe or a local CSV file.

What is an API?

In data science, REST APIs are often used as a modern shortcut to consume remote data sources and services on the internet or to make them available to others. Whether these sources are publicly available or reside in corporate networks, data scientists need to understand how to work with APIs. In the broader sense, an API is a contract between the provider and the consumer of a web service, who communicate and exchange data. The contract is necessary because communication can lead to misunderstandings, for example, when one party sends information in a form the other does not expect. It avoids such misunderstandings by defining standards for what the parties communicate and how they communicate it.

What are RESTful APIs?

A popular architectural style for designing APIs is Representational State Transfer (REST). REST has become very popular in recent years and is often considered a more straightforward and modern alternative to the traditional Simple Object Access Protocol (SOAP). Both SOAP and REST rely on established rules, and compliance with these rules is the basis of automated information exchange. In data science, however, REST is now the more common of the two.

Working with REST APIs in Python
Communication via an API

If you work with public APIs in Python, you will most likely work with REST. A service provider that operates a REST API exposes a URL that receives requests. Requests to the resource URL can carry a payload in JSON, HTML, or XML format. A REST API will typically return the response in JSON format, but other formats, such as comma-separated values (CSV), are also possible. REST APIs build on the HTTP methods (GET, HEAD, POST, PUT, PATCH, DELETE, CONNECT, OPTIONS, and TRACE). However, the most important ones are:

  • GET: Used to request data from a data provider.
  • POST: Typically used to send new data to the service provider. Sometimes also used to define the data that the API returns in response to subsequent requests.
  • PUT: Used to update data at the service provider.
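To make the three methods more concrete, here is a minimal sketch that builds (but does not send) one request per method with the standard library's urllib.request module. The URL https://api.example.com/items and the item fields are hypothetical placeholders and do not refer to a real service.

```python
import json
import urllib.request

# Hypothetical endpoint, used only for illustration; nothing is sent over the network.
BASE_URL = "https://api.example.com/items"
JSON_HEADERS = {"Content-Type": "application/json"}

# GET: request data from the provider (no request body).
get_request = urllib.request.Request(BASE_URL, method="GET")

# POST: send new data to the provider as a JSON-encoded body.
new_item = json.dumps({"name": "example"}).encode("utf-8")
post_request = urllib.request.Request(
    BASE_URL, data=new_item, headers=JSON_HEADERS, method="POST"
)

# PUT: update an existing resource, here the item with the made-up id 1.
updated_item = json.dumps({"name": "renamed"}).encode("utf-8")
put_request = urllib.request.Request(
    f"{BASE_URL}/1", data=updated_item, headers=JSON_HEADERS, method="PUT"
)

# Show the method and target URL of each prepared request.
for request in (get_request, post_request, put_request):
    print(request.get_method(), request.full_url)
```

The same URL can serve several methods; the method tells the provider whether the caller wants to read, create, or update data.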

Next, let’s look at how we can interact with REST APIs in Python.

How can we interact with REST APIs in Python?

There are several ways to access REST APIs in Python, including:

  1. Using the requests library: The requests library is a popular Python library for making HTTP requests. It provides a simple, intuitive API for sending and receiving requests, and supports a wide range of HTTP methods, such as GET, POST, and DELETE.
  2. Using the urllib library: The urllib library is a built-in Python library for working with URLs and HTTP requests. It provides a lower-level API than the requests library but is still relatively easy to use and allows you to customize your requests in more detail.
  3. Using the http.client library: The http.client library is another built-in Python library for working with HTTP requests. It implements the client side of HTTP at a lower level than urllib and gives you fine-grained control over connections and requests, but it requires more programming effort than the other options.
  4. Service-specific libraries: Many popular online services can be accessed using Python libraries that offer built-in functionality for interacting with REST-based APIs. For example, Twitter, Reddit, or Yahoo Finance offer libraries that make interacting with their REST APIs much easier.

The choice of which Python library to use for accessing REST APIs will depend on your specific needs and preferences. You may want to try out a few different options and see which one works best for your project. In the following tutorial, we will work with the requests library.

Requesting Historical COVID-19 Data from the Statworx API with the Requests Library

There are many APIs out there that provide more or less reliable data on COVID-19 cases. A good one to use is api.statworx.com/covid. This API offers historical data on the number of COVID-19 cases. The API is very accessible since it does not require an authentication key.

In the following, we will go through two different cases of how to request data from REST APIs in Python. First, we use the requests library to request COVID-19 data from the statworx API. Requests is a standard package for interacting with APIs in Python. An alternative would be to use a Python package that provides API-specific functions to interact with the API. We will look at this case in the second example.

The code is available on the GitHub repository.

Prerequisites

Before starting the coding part, ensure you have set up your Python 3 environment and required libraries. If you don’t have an environment, follow the steps in this tutorial to set up the Anaconda environment.

Also, make sure you install all required packages. In this tutorial, we will be working with the following standard packages: pandas and matplotlib.

You can install packages using console commands:

  • pip install <package name>
  • conda install <package name> (if you are using the Anaconda package manager)

In addition, we will be working with the requests library, a widely used HTTP library for interacting with REST APIs. It provides functionality to send an HTTP request to an API and receive a response. Requests is a third-party package and not part of the Python standard library. It ships with the Anaconda distribution, so you may already have it; otherwise, install it with pip install requests.

Step #1 Define the Payload

In order to make a request to an API, it is necessary to specify the URL that the request should be sent to. This URL is typically provided by the API documentation and may include certain parameters or queries to specify the specific data that you want to retrieve. For example, you might want to specify a specific date range or filter the results by a certain category.
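As a minimal sketch of how such parameters end up in a URL, the following example builds a query string with the standard library. The endpoint https://api.example.com/cases and the parameter names country, start, and end are hypothetical; the names an actual API accepts are listed in its documentation.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint and parameter names, chosen only for illustration.
BASE_URL = "https://api.example.com/cases"
params = {"country": "Germany", "start": "2020-01-15", "end": "2020-03-31"}

# urlencode turns the dictionary into a properly escaped query string.
query_url = f"{BASE_URL}?{urlencode(params)}"
print(query_url)

# Parsing the URL again recovers the parameters (each value comes back as a list).
print(parse_qs(urlsplit(query_url).query))
```

With the requests library, the same dictionary can be passed through the params argument of requests.get, which encodes it into the URL for you.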

In the following, we send a POST request to the statworx API and get back COVID-19 data in JSON format, which we can then convert into a DataFrame. The code below imports the required packages and defines the payload and the endpoint URL for this request:

# the data is provided by the European Centre for Disease Prevention and Control
import json

import matplotlib.pyplot as plt  # needed for the plot in step #5
import pandas as pd
import requests

# define the payload that will be sent to the API endpoint, and the endpoint URL
# "code" defines the countries for which we will retrieve data
# to retrieve data for specific countries use e.g. {'country': 'Germany'}
payload = {"code": "ALL"}
URL = "https://api.statworx.com/covid"

We can change the data we get back in the response by altering the parameters in our request. In the payload above, the code is set to ALL, so the response contains COVID-19 data for all countries. If we set the country code to “US” instead, the response contains only COVID-19 data for the US.
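The payload travels in the request body as a JSON string. The following sketch shows what that string looks like for different country codes. The helper name build_payload is hypothetical, and the codes are the ones mentioned in the text.

```python
import json


def build_payload(code: str = "ALL") -> str:
    """Return the JSON string for a given country code (hypothetical helper)."""
    return json.dumps({"code": code})


print(build_payload())      # {"code": "ALL"}
print(build_payload("US"))  # {"code": "US"}
```

Keeping the payload in a small function like this makes it easy to request several countries in a loop without editing the dictionary by hand.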

Step #2 Call the REST API Endpoint

Once you have defined the URL and the payload, you can use the “requests” library to send an HTTP request to the API and retrieve the data. Here, this is done using the “post” method, which sends a POST request with the payload to the specified URL and returns a response object containing the data returned by the API.

It is also important to consider any authentication or authorization requirements that the API may have. The statworx API does not require authentication. However, some APIs may require you to provide a valid API key or other credentials in order to access the data. These requirements will also be specified in the API documentation.
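For APIs that do require credentials, the key is usually sent along with the request, often in an HTTP header. The following is a minimal sketch under several assumptions: the endpoint https://api.example.com/data, the environment variable EXAMPLE_API_KEY, and the Bearer scheme are all hypothetical, and the documentation of the service you use defines the actual mechanism. The request is only built with urllib.request, not sent.

```python
import json
import os
import urllib.request


def build_authenticated_request(url, payload, api_key):
    """Build (but do not send) a POST request that carries an API key."""
    headers = {
        "Content-Type": "application/json",
        # Assumed scheme; some APIs expect a custom header or a query parameter instead.
        "Authorization": f"Bearer {api_key}",
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Read the key from the environment rather than hard-coding it in the script.
api_key = os.environ.get("EXAMPLE_API_KEY", "demo-key")
request = build_authenticated_request(
    "https://api.example.com/data", {"code": "ALL"}, api_key
)
print(request.get_method(), request.full_url)
```

With the requests library, the same headers dictionary can be passed through the headers argument of requests.post.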

# call the api
response = requests.post(url=URL, data=json.dumps(payload))
response # if the request was successful, you should see a response code 200
<Response [200]>
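A status code other than 200 usually means that something went wrong. The sketch below shows one way to turn a numeric status code into a readable message with the standard library. The helper name describe_status is hypothetical. With requests, the code is available as response.status_code, and response.raise_for_status() raises an exception for 4xx and 5xx codes.

```python
from http import HTTPStatus


def describe_status(status_code: int) -> str:
    """Return a short, readable description of an HTTP status code."""
    try:
        phrase = HTTPStatus(status_code).phrase
    except ValueError:
        # Code is not defined in the standard library's HTTPStatus enum.
        phrase = "Unknown"

    if 200 <= status_code < 300:
        category = "success"
    elif 300 <= status_code < 400:
        category = "redirect"
    elif 400 <= status_code < 500:
        category = "client error"
    elif 500 <= status_code < 600:
        category = "server error"
    else:
        category = "other"
    return f"{status_code} {phrase} ({category})"


for code in (200, 404, 500):
    print(describe_status(code))
```

Client errors (4xx) typically point to a problem with the request, such as a wrong URL or payload, while server errors (5xx) originate on the provider's side.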

Step #3 Convert the Data to a DataFrame

When making an API request using Python, the API will typically return a response in JSON format. JSON (JavaScript Object Notation) is a lightweight data interchange format that is easy for humans to read and write and easy for machines to parse and generate. It is often used as a default data format in REST APIs because it is easy to work with and can be easily converted into other data formats such as CSV or Pandas DataFrames.

A common way to handle the response from an API is to convert it into a DataFrame with the Pandas library. In the code below, we parse the response text with json.loads and pass the result to pd.DataFrame.from_dict. Once the data is in a DataFrame, it can be easily processed and analyzed using the various tools and functions provided by Pandas.

It is also possible to parse the JSON response directly using Python’s built-in “json” module. This can be useful if you want to extract specific pieces of data from the response or if you want to perform additional processing on the data before converting it into a DataFrame.
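Parsing with the json module alone can be sketched as follows. The sample text stands in for response.text and assumes a column-oriented layout (one list of values per column); the layout of the actual API response may differ, so treat the structure as an assumption.

```python
import json

# Stand-in for response.text; the layout (one list per column) is an assumption.
response_text = json.dumps(
    {
        "date": ["2019-12-31", "2020-01-01"],
        "country": ["Afghanistan", "Afghanistan"],
        "code": ["AF", "AF"],
        "cases": [0, 0],
    }
)

data = json.loads(response_text)  # JSON text -> Python dict of lists

# Inspect which fields the response contains.
print(list(data.keys()))

# Extract a single record by position.
first_row = {column: values[0] for column, values in data.items()}
print(first_row)

# Extract specific pieces of data, e.g. the distinct countries.
countries = sorted(set(data["country"]))
print(countries)
```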

# convert the response data to a data frame
df = pd.DataFrame.from_dict(json.loads(response.text))
df
           date  day  month  year  cases  deaths      country  code  population  continent  cases_cum  deaths_cum
  0  2019-12-31   31     12  2019      0       0  Afghanistan    AF  38041757.0       Asia          0           0
  1  2020-01-01   01     01  2020      0       0  Afghanistan    AF  38041757.0       Asia          0           0
  2  2020-01-02   02     01  2020      0       0  Afghanistan    AF  38041757.0       Asia          0           0
  3  2020-01-03   03     01  2020      0       0  Afghanistan    AF  38041757.0       Asia          0           0
  4  2020-01-04   04     01  2020      0       0  Afghanistan    AF  38041757.0       Asia          0           0
...         ...  ...    ...   ...    ...     ...          ...   ...         ...        ...        ...         ...
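The introduction also mentions saving the data to a local CSV file. A minimal sketch of that step with pandas follows. The sample reuses values from the first two rows of the table above and assumes a column-oriented JSON layout. It writes to an in-memory buffer so that it runs without touching the file system; the file name in the comment is a hypothetical example.

```python
import io
import json

import pandas as pd

# Stand-in for response.text; the column-oriented layout is an assumption.
response_text = json.dumps(
    {
        "date": ["2019-12-31", "2020-01-01"],
        "cases": [0, 0],
        "deaths": [0, 0],
        "country": ["Afghanistan", "Afghanistan"],
        "code": ["AF", "AF"],
    }
)
df_sample = pd.DataFrame.from_dict(json.loads(response_text))

# Write the DataFrame as CSV. To write to disk instead, pass a path,
# e.g. df_sample.to_csv("covid_data.csv", index=False).
buffer = io.StringIO()
df_sample.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# Reading the CSV back restores the same rows and columns.
df_restored = pd.read_csv(io.StringIO(csv_text))
print(df_restored.shape)
```

Setting index=False leaves out the row index, which is usually not needed when the file is read back later.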

Step #4 Filter the Data

Now let’s make something out of the data and create a simple plot. Our goal is to create a lineplot that shows how cases have developed for different countries. As part of the data preprocessing, we will reduce the data to include a smaller number of selected countries.

# convert the date column to datetime
# (plain column assignment, so that the column dtype actually changes)
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")

# filter specific countries
list_of_countries = ["Germany", "Switzerland", "France", "Spain", "Canada"]
df_1 = df[df["country"].isin(list_of_countries)]

# filter the data to a specific timeframe
df_new = df_1[df_1["date"] > "2020-01-15"]
df_new.head()
             date  day  month  year  cases  deaths  country  code  population  continent  cases_cum  deaths_cum
10332  2020-01-16   16     01  2020      0       0   Canada    CA  37411038.0    America          0           0
10333  2020-01-17   17     01  2020      0       0   Canada    CA  37411038.0    America          0           0
10334  2020-01-18   18     01  2020      0       0   Canada    CA  37411038.0    America          0           0
10335  2020-01-19   19     01  2020      0       0   Canada    CA  37411038.0    America          0           0
10336  2020-01-20   20     01  2020      0       0   Canada    CA  37411038.0    America          0           0

Step #5 Plot the Case Counts

Finally, we use Matplotlib to create a lineplot that visualizes the data that we have received from the API.

# create separate plot lines for each country in the dataset
fig, ax1 = plt.subplots(figsize=(12, 8))
plt.ylabel("Total Cases", fontsize=20, color="black")
for countryname in list_of_countries:
    x = df_new[df_new["country"] == countryname]["date"]
    y = df_new[df_new["country"] == countryname]["cases_cum"]
    plt.plot(x, y, label=countryname)

plt.legend(list_of_countries, loc="upper left")
plt.show()

If your chart shows one line of cumulative case counts per selected country, you can be confident that you have successfully loaded the data into your project.

Summary

In conclusion, this tutorial has demonstrated how to use REST APIs in Python to access and retrieve data from remote sources. We used the statworx API to fetch COVID-19 data and used the Pandas library to convert the data into a DataFrame. We also created a small sample plot to visualize the number of COVID-19 cases in different countries.

By following the steps outlined in this tutorial, you should now be able to use Python to access and work with data from a variety of sources using REST APIs. This can be a powerful tool for data scientists and analysts, allowing them to easily retrieve and analyze data from a wide range of sources in order to gain insights and make informed decisions.

I hope this post was helpful. If you have any remarks or questions, let me know.

Sources and Further Reading

  1. statworx API
  2. https://pypi.org/project/requests/

Author

  • Florian Follonier

    Hi, I am Florian, a Zurich-based Cloud Solution Architect for AI and Data. Since the completion of my Ph.D. in 2017, I have been working on the design and implementation of ML use cases in the Swiss financial sector. I started this blog in 2020 with the goal in mind to share my experiences and create a place where you can find key concepts of machine learning and materials that will allow you to kick-start your own Python projects.
