## Time series forecasting models

Evaluating performance is a crucial step in the development of forecasting models, and this is especially true for time series models. Unlike classification, time series predictions cannot simply be divided into right and wrong. Instead, the deviation between prediction and actual value is measured as a numeric quantity for each prediction. As a result, prediction errors can be heterogeneously distributed over the course of a time series, which makes them difficult to summarize in a single number. This blog post demonstrates different error metrics that are commonly used to evaluate time series forecasting models in Python.

To realistically assess the range of possible prediction errors, data scientists and researchers should be familiar with different error metrics. One of my earlier blog posts, Time Series Forecasting – Error Metrics Cheat Sheet, presents the most frequently used error metrics in time series analysis along with their formulas and specificities. If you are not yet familiar with this topic, I recommend starting with that article.

## Sample time series project

The sample evaluation of a time series forecasting model will use data generated from multiplied sine curves. The data will be used to train a neural network that predicts the further course of the sine curve time series. Using this example, the post covers the steps to calculate different error metrics and use them to evaluate the performance of the forecasting model.

*This post covers the following steps:*

- Creating a sample time series
- Preparing the data
- Training a time series forecasting model
- Making predictions
- Calculating performance metrics
- Evaluating model performance
- Summary

### Python Environment

This tutorial assumes that you have set up your Python environment. I recommend using the Anaconda environment. If you have not yet set up an environment, you can follow this tutorial. It is also assumed that you have the following packages installed: *keras* (2.0 or higher) with the *Tensorflow* backend, *numpy*, *pandas*, *matplotlib*, and *sklearn*. Missing packages can be installed using the console command:

```
pip install <packagename>
```

### 1) Creating a sample time series

We start by creating some artificial sample data based on three multiplied sine curves.

```python
# Setting up packages for data manipulation and machine learning
import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl
from keras.models import Sequential
from sklearn.preprocessing import MinMaxScaler
from keras.layers import LSTM, Dense, TimeDistributed, Dropout, Activation

# Creating the sample sine curve dataset
steps = 1000
gradient = 0.002
list_a = []
for i in range(0, steps, 1):
    y = (
        100
        * round(math.sin(math.pi * i * 0.02 + 0.01), 4)
        * round(math.sin(math.pi * i * 0.005 + 0.01), 4)
        * round(math.sin(math.pi * i * 0.005 + 0.01), 4)
    )
    list_a.append(y)
df = pd.DataFrame({"valid": list_a}, columns=["valid"])

# Visualizing the data
fig, ax1 = plt.subplots(figsize=(16, 4))
ax1.xaxis.set_major_locator(plt.MaxNLocator(30))
plt.title("Sine Curve Data", fontsize=14)
plt.plot(df[["valid"]], color="black", linewidth=2.0)
plt.show()
```

### 2) Preparing the data

The following code will prepare the data to train a recurrent neural network model.

```python
# Settings
epochs = 4
batch_size = 1
sequencelength = 15
n_features = 1

# Get the number of rows to train the model on 60% of the data
npdataset = df.values
training_data_length = math.ceil(len(npdataset) * 0.6)

# Transform features by scaling each feature to a range between 0 and 1
mmscaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = mmscaler.fit_transform(npdataset)

# Create a scaled training data set
train_data = scaled_data[0:training_data_length, :]

# Split the data into x_train and y_train data sets
x_train = []
y_train = []
trainingdatasize = len(train_data)
for i in range(sequencelength, trainingdatasize - 1):
    x_train.append(train_data[i - sequencelength : i, 0])  # input sequence
    y_train.append(train_data[i, 0])  # value to predict

# Convert x_train and y_train to numpy arrays
x_train = np.array(x_train)
y_train = np.array(y_train)

# Reshape the data into the form (samples, timesteps, features)
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
print("x_train.shape: " + str(x_train.shape) + " -- y_train.shape: " + str(y_train.shape))
```

```
Out: x_train.shape: (584, 15, 1) -- y_train.shape: (584,)
```

### 3) Training a forecasting model

Now we can train a forecasting model. For this, we will use a recurrent neural network. Understanding neural networks in depth is not a prerequisite for this tutorial. If you want to learn more about the architecture and functioning of neural networks, I can recommend this YouTube video.

The following code creates and compiles the model architecture. The second code block then trains the model:

```python
# Configure and compile the neural network model
# The number of LSTM neurons is defined by the sequence length multiplied by the number of features
lstm_neuron_number = sequencelength * n_features

# Create the model
model = Sequential()
model.add(
    LSTM(lstm_neuron_number, return_sequences=False, input_shape=(x_train.shape[1], 1))
)
model.add(Dense(1))
model.compile(optimizer="adam", loss="mean_squared_error")
```

```python
# Settings
batch_size = 5

# Train the model
history = model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs)
```

```
Epoch 1/4
584/584 [==============================] - 1s 2ms/step - loss: 0.1047
Epoch 2/4
584/584 [==============================] - 1s 1ms/step - loss: 0.0153
Epoch 3/4
584/584 [==============================] - 1s 1ms/step - loss: 0.0102
Epoch 4/4
584/584 [==============================] - 1s 1ms/step - loss: 0.0064
```

### 4) Making test predictions

```python
# Create the data sets x_test and y_test
test_data = scaled_data[training_data_length - sequencelength :, :]
test_data_len = test_data.shape[0]
x_test, y_test = [], []
for i in range(sequencelength, test_data_len):
    x_test.append(test_data[i - sequencelength : i, 0])
    y_test.append(test_data[i, 0])

# Convert x_test and y_test to numpy arrays
x_test, y_test = np.array(x_test), np.array(y_test)
print(x_test.shape, y_test.shape)

# Reshape x_test, so that we get an array with multiple test datasets
x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1))
```

```
(400, 15) (400,)
```

```python
# Get the predicted values and rescale them to the original range
y_pred = model.predict(x_test)
y_pred = mmscaler.inverse_transform(y_pred)
```

Next, we plot the predictions against the actual values.

```python
# Visualize the data
train = df[:training_data_length]
valid = df[training_data_length:]
valid.insert(1, "y_pred", y_pred, True)

fig, ax1 = plt.subplots(figsize=(16, 8), sharex=True)
xv = valid.index
yv = valid[["valid", "y_pred"]]
ax1.tick_params(axis="x", rotation=0, labelsize=10, length=0)
plt.title("y_pred vs y_test", fontsize=18)
plt.plot(yv["y_pred"], color="red")
plt.plot(yv["valid"], color="black", linewidth=2)
plt.legend(["y_pred", "y_test"], loc="upper left")

# Fill the area between the plot lines
mpl.rc("hatch", color="k", linewidth=2)
ax1.fill_between(
    xv, yv["valid"], yv["y_pred"], facecolor="white", hatch="||", edgecolor="blue", alpha=0.9
)
plt.show()
```

### 5) Calculating error metrics

Now comes the interesting part. With the following code you’ll calculate six common error metrics:

```python
y_pred = yv["y_pred"]
y_test = yv["valid"]
print(y_test.shape, y_pred.shape)

# Mean Absolute Error (MAE)
MAE = np.mean(abs(y_pred - y_test))
print("Mean Absolute Error (MAE): " + str(np.round(MAE, 2)))

# Median Absolute Error (MedAE)
MEDAE = np.median(abs(y_pred - y_test))
print("Median Absolute Error (MedAE): " + str(np.round(MEDAE, 2)))

# Mean Squared Error (MSE)
MSE = np.square(np.subtract(y_pred, y_test)).mean()
print("Mean Squared Error (MSE): " + str(np.round(MSE, 2)))

# Root Mean Squared Error (RMSE)
RMSE = np.sqrt(np.mean(np.square(y_pred - y_test)))
print("Root Mean Squared Error (RMSE): " + str(np.round(RMSE, 2)))

# Mean Absolute Percentage Error (MAPE)
MAPE = np.mean(np.abs(np.subtract(y_test, y_pred) / y_test)) * 100
print("Mean Absolute Percentage Error (MAPE): " + str(np.round(MAPE, 2)) + " %")

# Median Absolute Percentage Error (MDAPE)
MDAPE = np.median(np.abs(np.subtract(y_test, y_pred) / y_test)) * 100
print("Median Absolute Percentage Error (MDAPE): " + str(np.round(MDAPE, 2)) + " %")
```

```
Mean Absolute Error (MAE): 6.95
Median Absolute Error (MedAE): 5.05
Mean Squared Error (MSE): 78.7
Root Mean Squared Error (RMSE): 8.87
Mean Absolute Percentage Error (MAPE): 10339.13 %
Median Absolute Percentage Error (MDAPE): 26.8 %
```

### 6) Evaluating model performance

First, let’s take a look at the MAE and the MedAE. The MAE is 6.95 and the MedAE is 5.05. These values are quite close to each other, which indicates that the prediction errors are fairly evenly distributed and that there are probably only a few large outliers in the predictions.

To get a better picture of possible outliers, we take a look at the MSE and the RMSE. With a value of 78.7, the MSE is only moderately higher than the square of the MAE (6.95² ≈ 48.3), and the RMSE of 8.87 is only slightly higher than the MAE. Both are further indications that the prediction errors lie in a narrow range.
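To see why comparing the mean and the median of the absolute errors exposes outliers, here is a small sketch on made-up residuals (the arrays are purely illustrative, not the model’s actual errors):

```python
import numpy as np

# Residuals without outliers: mean and median absolute error are close
errors_even = np.array([5.0, -6.0, 5.5, -5.0, 6.0])
print(np.mean(np.abs(errors_even)))    # MAE = 5.5
print(np.median(np.abs(errors_even)))  # MedAE = 5.5

# A single large outlier inflates the MAE but barely moves the MedAE
errors_outlier = np.array([5.0, -6.0, 5.5, -5.0, 60.0])
print(np.mean(np.abs(errors_outlier)))    # MAE = 16.3
print(np.median(np.abs(errors_outlier)))  # MedAE = 5.5
```

A large gap between MAE and MedAE is therefore a quick warning sign for outliers in the predictions.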

How much do the predictions of our model deviate from the actual values in percentage terms? The MAPE is typically used as a starting point to answer this question. With a value of 10339.13 percent, it is extremely high. So is our model very much mistaken? The answer is no – the MAPE is misleading here. The problem is that several actual values are close to zero. While the predictions of our model are close to the actual values in absolute terms, the MAPE divides each residual by the corresponding actual value. A tiny residual at a near-zero actual value thus turns into a huge percentage error, and the average explodes.
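A tiny, hypothetical example makes this failure mode concrete (the values below are invented for illustration):

```python
import numpy as np

# All three predictions are close to the actuals in absolute terms,
# but the third actual value is almost zero
y_true = np.array([100.0, 50.0, 0.001])
y_pred = np.array([95.0, 52.0, 0.5])

ape = np.abs((y_true - y_pred) / y_true) * 100
print(ape)             # the near-zero actual turns a tiny residual into ~49900 %
print(np.mean(ape))    # MAPE is dominated by that single term
print(np.median(ape))  # MDAPE stays at a sensible 5 %
```

This is exactly what happens in our sine curve data, where the series repeatedly crosses zero.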

This is where the median matters. The MDAPE is 26.8%, which means that 50% of our percentage errors are higher than 26.8% and 50% are lower. Consequently, when our model makes a prediction, there is a 50% probability that it deviates by less than 26.8% from the actual value. That is not as terrible as the MAPE would have us believe. The plot lines of the predictions and actual values reflect these findings.

### Summary

In this post, you have learned to evaluate time series forecasting models using different error metrics. You have also seen that performance metrics in time series forecasting can be misleading. Therefore, they should be used with caution and preferably in combination. If there is one key takeaway from this post, it’s this: never trust a single error metric.
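As a closing sketch, the metrics from step 5 can be bundled into one helper so that several of them are always reported together. The function name `evaluate_forecast` and the toy inputs are illustrative choices, not part of the tutorial code above:

```python
import numpy as np

def evaluate_forecast(y_true, y_pred):
    """Return a dict of common error metrics for a forecast (assumes no zero actuals)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    abs_err = np.abs(y_pred - y_true)
    pct_err = np.abs((y_true - y_pred) / y_true) * 100
    return {
        "MAE": np.mean(abs_err),
        "MedAE": np.median(abs_err),
        "MSE": np.mean(abs_err ** 2),
        "RMSE": np.sqrt(np.mean(abs_err ** 2)),
        "MAPE": np.mean(pct_err),
        "MDAPE": np.median(pct_err),
    }

# Toy usage example with made-up values
metrics = evaluate_forecast([100, 50, 20], [95, 52, 19])
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Reporting all six numbers side by side makes it much harder for a single misleading metric to distort your judgment.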

I hope you found this post useful. Please leave a comment if you have any remarks or questions remaining.
