Forecasting Beer Sales with ARIMA in Python

Sales Forecasting with Autoregressive Integrated Moving Average (ARIMA) in Python

This article explains the basics of ARIMA modeling and shows how you can develop your own ARIMA model in Python. The case study is predicting the volume of US beer sales. ARIMA (Auto-Regressive Integrated Moving Average) is a powerful statistical modeling technique for time series analysis and forecasting. Compared to machine learning, ARIMA is a classical modeling technique that is particularly powerful when the time series being analyzed follows a clear pattern. For example, when the time series exhibits seasonality. Due to its effectiveness, ARIMA is still frequently used in various economic and scientific applications. Examples are forecasting weather data, predicting order quantities, and sales forecasting.

The remainder of this article has two parts: The first part covers the theory. You will learn the basics of how ARIMA works, what stationarity means, and when it is appropriate to use ARIMA. The second part is the practical implementation of an auto-ARIMA model in Python. We load a set of sample time series data, then check if the data is stationary and train an ARIMA prediction model.

sales forecasting with ARIMA at the example of forecasting beer sales
Cheers 🙂

Introduction to ARIMA Time Series Modelling

Working with ARIMA requires a good understanding of the concept of stationarity and knowledge of the transformations used to make time-series data stationary. The concept of stationarity is therefore first on our agenda.

The Concept of Stationarity

Stationarity is an essential concept in stochastic processes that describes the nature of a time series. We consider a time series strictly stationary if its statistical properties do not change over time. In this case, summary statistics such as the mean and variance do not change over time. However, the time-series data we encounter in the real world is often characterized by a trend or significant irregular fluctuations, making them non-stationary or weakly stationary.

So why is stationarity so important? If a time series is stationary, we can assume that the past values of the time series are highly predictive of the future development of the time series. In other words, a stationary time series exhibits consistent behavior that makes it predictable. On the other hand, a non-stationary time series is characterized by a kind of random behavior that will be difficult to capture in modeling. Namely, if random movements characterized the past, there is a high probability that the future will be no different.

Fortunately, in many cases, it is possible to transform a time series that is non-stationary to a stationary form, and in this way, to build better prediction models.

Testing for Stationarity

There are different ways how we can determine whether a time series is stationary.

  • Plotting: We can plot the time series and check visually if it shows consistent behavior or is changing over a longer period of time.
  • Summary statistics: We can split the time series into different periods and calculate the summary statistics such as the variance. If these metrics are subject to major changes, this is an indication that the time series is non-stationary. However, the results will also depend on the respective periods, which can lead to false conclusions.
  • Statistic tests: There are various tests to determine the stationary of a time series, such as Kwiatkowski–Phillips–Schmidt–Shin, Augmented Dickey-Fuller, or Phillips–Perron. These tests systematically check a time series and measure the results against the null hypothesis, in this way providing an indicator of the trustworthiness of the results

What is an (S)ARIMA Model?

As the name implies, ARIMA uses autoregression (AR), integration (differencing), and moving averages (MA) to fit a linear regression model to a time series.

ARIMA Parameters

The ARIMA(p, d, q) modeling function has three parameters:

d (differencing): In the case of a non-stationary time series, there is a chance to remove a trend from the data by differencing once or several times, thus bringing the data to a stationary state. The model parameter d determines the order of the differentiation. A value of d = 0 simplifies the ARIMA model to an ARMA model, lacking the integration aspect. In this case, integration is not needed because the time series is already stationary.

p (order of the AR terms): The autoregressive process describes the dependent relationship between an observation and several lagged observations (lags). Predictions are then based on past data from the same time series using linear functions. A value of p = 1 means that the model uses values that lag by one period.

q (order of the MA terms): The parameter q determines the number of lagged forecast errors in the prediction equation. In contrast to the AR process, the MA process assumes that values at a future point in time depend on the errors made by predictions at current and past points in time. This means that it is not previous events that determine the predictions but rather the previous estimation or prediction errors used in calculating the next value of a time series.


In the real world, many time series are characterized by seasonal effects. Examples are monthly retail sales figures, temperature reports, or weekly airline passenger data, etc. To take this into account, we can specify a seasonal range (e.g., m=12 for monthly data) and additional seasonal AR or MA components for our model that deal with the seasonality. Such a model is also called a SARIMA model, and it is defined as a model(p, d, q)(P, D, Q)[m].


When working with ARIMA, we can set the model parameters manually or use auto-ARIMA and let the model search for the optimal parameters. We do this by varying the parameters and then testing against stationarity. With the seasonal option enabled, the process tries to identify the optimal hyperparameters for the seasonal components of the model. Auto-ARIMA works by conducting differencing tests to determine the order of differencing, d and then fitting models with parameters in defined ranges, e.g., start_pmax_p as well as start_qmax_q. If our model has a seasonal component, we can also define parameter ranges for the seasonal part of the model.

Time Series Forecasting with ARIMA in Python

Now that we went through the theoretical context, it’s time to turn to the practical part and implement an ARIMA model in Python. Our model will generate a forecast for beer sales volumes in the US.

Several standard Python packages support ARIMA modeling. So there is no need to develop everything from scratch. In the following, we will build our ARIMA model using the statsmodels library and pmdarima.


Before we start the coding part, make sure that you have set up your Python 3 environment and required packages. If you don’t have an environment set up yet, you can follow this tutorial to set up the Anaconda environment.

Also, make sure you install all required packages. In this tutorial, we will be working with the following standard packages: 

In addition, we will be using the statsmodels library and pmdarima.

You can install packages using console commands:

  • pip install <package name>
  • conda install <package name> (if you are using the anaconda packet manager)

Step #1 Load the Data

In the following, we will work with a dataset from The data contains monthly beer sales in the United States (in million US $) between 1992 and 2018. We will start by loading the data into our project. We can do this by running the following code, which will download the data from the relataly Github repository, so there is no need to download the data manually.

# Setting up packages for data manipulation and machine learning
import pandas as pd
import numpy as np
import math
import matplotlib.pyplot as plt
import matplotlib as mpl 
import pmdarima as pm
from statsmodels.tsa.arima.model import ARIMA
from import plot_acf, plot_pacf
import statsmodels.api as sm
from statsmodels.tsa.seasonal import seasonal_decompose

# Link to the dataset: 

path = ""
df = pd.read_csv(path)

As shown above, the sales figures in this dataset were taken at the first of each month.

Step #2 Visualize the Time Series and check it for Stationarity

Before we start modeling, we visualize our time series and test it for stationarity. Understanding the time series will help us with the choice of the parameters for our ARIMA model. Therefore, this is an essential step in modeling ARIMA.

First, we will look at the different components of the time series. We do this by using the seasonal_decompose function of the statsmodels library.

# Decompose the time series
plt.rcParams["figure.figsize"] = (10,8)
result = seasonal_decompose(df['beer'], model='multiplicative', period = 12)
forecasting us beer sales - time series decomposition

To test for stationarity, we use the ADFuller test. We often need to run this test multiple times throughout our project. Therefore, we encapsulate the code in a function.

def check_stationarity(dates, sales, title_string):
    # Visualize the data
    fig, ax = plt.subplots(figsize=(16, 8))
    plt.title(title_string, fontsize=14)
    plt.plot(dates, sales, label='monthly sales', color="black", linewidth=1.0)
    ax.fill_between(dates, 1400, sales, color='#b9e1fa')
    ma = sales.rolling(window=12).mean()
    plt.plot(dates, ma, label='12-month ma', color="red", linewidth=2.0)
    ma = sales.rolling(window=25).mean()
    plt.plot(dates, ma, label='25-month ma', color="orange", linewidth=2.0)
    sales = sales.dropna()
    # Perform an Ad Fuller Test
    # the default alpha = .05 stands for a 95% confidence interval
    adf_test = pm.arima.ADFTest(alpha = 0.05) 
df_sales = df['beer']
df_dates = pd.to_datetime(df['date']) 
title = "Beer sales in the US between 1992 and 2018 in million US$/month"
check_stationarity(df_dates, df_sales, title)
time series data - US beer sales

The data does not appear to be stationary. We can see that our time series is steadily increasing and shows annual seasonality. The steady increase indicates a continuous growth in beer consumption over the last decades. The seasonality can be explained by the fact that people drink more beer in summer than in winter.

Step #3 Exemplary Differencing and Autocorrelation

The chart from the previous section shows that our time series is non-stationary. The reason is that it follows a clear upward trend. We also know that the time series has a seasonal component. Therefore, we need to define additional parameters and construct a SARIMA model.

Before we use auto-correlation to determine the optimal parameters, we will try manual differencing to make the time series stationary. There is no guarantee that differencing works. It is essential to keep in mind that differencing can sometimes also worsen prediction performance. So be careful, not to overdifference! We could also trust that the auto-ARIMA model chooses the best parameters for us. However, I recommend validating the selected parameters.

The ideal differencing parameter is the least number of differencing steps to achieve a stationary time series. To check whether differencing was successful, we will monitor the results at autocorrelation plots.

We print the autocorrelation for the original time series and after the first and second-order differencing.

# 3.1 Non-seasonal part
def auto_correlation(df, prefix, lags):
    plt.rcParams.update({'figure.figsize':(7,7), 'figure.dpi':120})
    fig, axes = plt.subplots(3,2, sharex=False)

    # First Difference
    axes[0, 0].plot(df)
    axes[0, 0].set_title('Original' + prefix)
    plot_acf(df_sales, lags=lags, ax=axes[0, 1])

    # First Difference
    df_first_diff = df.diff().dropna()
    axes[1, 0].plot(df_first_diff)
    axes[1, 0].set_title('First Order Difference' + prefix)
    plot_acf(df_first_diff, lags=lags - 1, ax=axes[1, 1])

    # Second Difference
    df_second_diff = df.diff().diff().dropna()
    axes[2, 0].plot(df_second_diff)
    axes[2, 0].set_title('Second Order Difference' + prefix)
    plot_acf(df_second_diff, lags=lags - 2, ax=axes[2, 1])
auto_correlation(df_sales, '', 50)

(0.019143247561160443, False)

The charts above show that the time series becomes stationary after one order differencing. However, we can see that the lag goes into the negative very quickly, which indicates overdifferencing.

Next, we perform the same procedure for the seasonal part of our time series.

# 3.2 Seasonal part

# Reduce the timeframe to a single seasonal period
df_sales_s = df_sales[0:12]

# Autocorrelation for the seasonal part
auto_correlation(df_sales_s, '', 11)
time series components
# Check if the First difference is stationary
df_dates = pd.date_range(df_sales_s_first_diff.iloc[1], periods=11, freq='MS') 

title = "First Difference (Seasonal)"
check_stationarity(df_dates, df_sales_s_first_diff, title)
seasonality after differencing
(0.99, True)

After first order differencing, the seasonal part of the time series is stationary. Concerning the autocorrelation plot, the plot shows that the values go into the negative but remain within acceptable boundaries. Second-order differencing does not seem to improve these values. We conclude that first-order differencing is a good choice for the D parameter.

Step #4 Finding an Optimal Model with Auto-ARIMA

Next, we auto-fit an ARIMA model to our time series. In preparation for this, we will split our dataset into train and test. In this way, we ensure that we can later measure the performance of our model against a fresh set of data that the model has not seen so far.

Once we have created the train and test data sets, we can configure the parameters for the auto_arima stepwise optimization. By setting max_d = 1, we tell the model to test no-differencing and first-order differencing. Also, we set max_p and max_q to 3.

To deal with the seasonality in our time series, we set the “seasonal” parameter to True and the “m” parameter to 12 data points. In this way, we turn our model into a SARIMA model that allows us to configure additional D, P, and Q parameters. We define a max value for Q and P of 3. Previously we have already seen that further differencing does not improve the stationarity. Therefore, we can set the value of D to 1.

After we have configured the parameters, next, we fit the model to the time series. The model will try to find the optimal parameters and choose the model with the least AIC.

# split into train and test
pred_periods = 30
split_number = df_sales.count() - pred_periods # corresponds to a prediction horizion  of 2,5 years
df_train = df_sales[:split_number]
df_test = df_sales[split_number:]

df_train_dates = df_dates[:split_number]
df_test_dates = df_dates[split_number:] 

# auto_arima
model_fit = pm.auto_arima(df_train, test='adf', 
                         max_p=3, max_d=1, max_q=3, 
                         seasonal=True, m=12,
                         max_P=3, max_D=2, max_Q=3,

# summarize the model characteristics
outputs during the auto-arima process

Auto-ARIMA has determined that the best model is (3,0,0)(3,1,1). These results match well with the results from section 3, in which we manually performed differencing steps.

Step #5 Simulate the Time Series with in-sample Forecasting

Now that we have a trained model, we can use it to simulate the whole time series. We will do this by calling the predict in the sample function. The prediction will match the same period as the original time series with which the model was trained. Because the model predicts one step, the prediction results will naturally be close to the actual values.

# Generate in-sample Predictions
# The parameter dynamic=False means that the model makes predictions upon the lagged values.
# This means that the model is trained until a point in the time series and then tries to predict the next value.
pred = model_fit.predict_in_sample(dynamic=False) # works only with auto-arima

# Calculate the percentage difference
diff_percent = abs((df_train - pred) / df_train)* 100

# Print the predicted time series
fig, ax1 = plt.subplots(figsize=(16, 8))
plt.title("In Sample Sales Prediction", fontsize=14)
plt.plot(df_train_dates, df_train, label='train', color="black", linewidth=1.0)
plt.plot(df_train_dates, pred, label='prediction', color="green", linewidth=1.0)
ax1.fill_between(df_train_dates, 0, df_train, color='#b9e1fa')
ax1.set_ylim([1000, 6500])
ax1.set_xlim([df_train_dates.min(), df_train_dates.max()])

# Print percentage prediction errors on a separate axis (ax2)
ax2 = ax1.twinx() 
ax2.set_ylabel('Prediction Errors in %', color='tab:purple', fontsize=14)  
ax2.set_yticks(np.arange(0, 50, 5.0))
ax2.set_ylim([0, 50]), diff_percent, width=20, color='purple', label='absolute errors')
ax2.set_xlim([df_train_dates[20], df_train_dates.max()])
time series forecast on us beer sales with arima

Next, we take a look at the prediction errors.

Step #6 Generate and Visualize a Sales Forecast

Now that we have trained an optimal model, we are ready to generate a sales forecast. First, we specify the number of periods that we want to predict. In addition, we create an index from the number of predictions, which is adjacent to the original time series and continues it (prediction_index).

# Generate prediction for n periods, 
# Predictions start from the last date of the training data
pred = model_fit.predict(n_periods=pred_periods, dynamic=False)

# Print the predicted time series
fig, ax = plt.subplots(figsize=(16, 8))
plt.title("Test/Pred Comparison", fontsize=14)
plt.plot(df_train_dates, df_train, label='train', color="black", linewidth=1.0)
plt.plot(df_test_dates, df_test, label='test', color="blue", linewidth=1.0)
plt.plot(df_test_dates, pred, label='forecast', color="green", linewidth=1.0)
ax.fill_between(df_train_dates, 1400, df_train, color='#b9e1fa')
ax.fill_between(df_test_dates, 1400, pred, color='lightgreen')
time series forecast on us beer sales with arima

As shown above, our model’s forecast continues the seasonal pattern of the beer sales time series. On the one hand, this indicates that US beer sales will continue to rise and, on the other hand, that our model works just fine 🙂

Step #7 Measure Prediction Errors

The chart with the model simulation from the previous section shows few outliers among the prediction errors. Therefore, we focus our analysis on the percentage errors by calculating the mean absolute error (MAPE) and the mean absolute percentage error (MDAPE).

# Mean Absolute Percentage Error (MAPE)
MAPE = np.mean((np.abs(np.subtract(df_test, pred)/ df_test))) * 100
print(f'Mean Absolute Percentage Error (MAPE): {np.round(MAPE, 2)} %')

# Median Absolute Percentage Error (MDAPE)
MDAPE = np.median((np.abs(np.subtract(df_test, pred)/ df_test)) ) * 100
print(f'Median Absolute Percentage Error (MDAPE): {np.round(MDAPE, 2)} %')
Mean Absolute Percentage Error (MAPE): 3.94 % 
Median Absolute Percentage Error (MDAPE): 3.49 %

The percent errors show that our ARIMA model has decent predictive performance.


This article has analyzed a seasonal time series with (S)ARIMA in this post. Our goal was to create a forecast for US beer sales. For this purpose, we created an in-sample forecast and used Auto-ARIMA to find the optimal parameters for our forecasting model. I hope you gained some insights from this article that will help you solve your use cases.

If you liked this post, smash the like button! Also, if you have any questions or suggestions, please let me know in the comments.

Want to learn more about time series analysis and prediction?
Check out these recent posts:


  • Hi, I am Florian, a Zurich-based consultant for AI and Data. Since the completion of my Ph.D. in 2017, I have been working on the design and implementation of ML use cases in the Swiss financial sector. I started this blog in 2020 with the goal in mind to share my experiences and create a place where you can find key concepts of machine learning and materials that will allow you to kick-start your own Python projects.

Leave a Reply