The evaluation of the prediction quality is a crucial step in the development of regression models. To evaluate regression models, we measure the deviation between predictions and actual values in numerical terms. Therefore, unlike classification models, regression models allow for different gradations of false or true. However, choosing the right metrics is a particular challenge. For example, prediction errors can be heterogeneously distributed over a time series or influenced by outliers. This blog post demonstrates various error metrics commonly used to evaluate time series prediction models with Python.

To realistically determine the range of possible prediction errors, data scientists and researchers should be familiar with different error metrics. One of my earlier blog posts, Time Series Forecasting – Error Metrics Cheat Sheet, presents a cheat sheet of the most frequently used error metrics in time series analysis, along with their formulas and particularities. If you are not yet familiar with this topic, I recommend starting with that article.

## Measuring the Performance of a Time Series Forecasting Model in Python

The sample evaluation of a time series forecasting model will use data generated by multiplying three sine curves. We will use the data to train a neural network that predicts the further course of this time series. The article covers the steps to calculate error metrics and use them to evaluate the performance of the forecasting model.

### Prerequisites

Before we start the coding part, make sure that you have set up your Python 3 environment and required packages. If you don’t have an environment set up yet, you can follow the steps in this tutorial to set up the Anaconda environment.

Also, make sure you install all required packages. In this tutorial, we will be working with the following standard packages: NumPy, pandas, and Matplotlib.

In addition, we will be using *Keras* (2.0 or higher) with the *TensorFlow* backend and the machine learning library scikit-learn.

You can install packages using console commands:

- *pip install <package name>*
- *conda install <package name>* (if you are using the Anaconda package manager)

### Step #1 Generate Sample Time Series Data

We start by creating some artificial sample data based on three multiplied sine curves.

```python
# Setting up packages for data manipulation and machine learning
import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl
from keras.models import Sequential
from sklearn.preprocessing import MinMaxScaler
from keras.layers import LSTM, Dense, TimeDistributed, Dropout, Activation

# Creating the sample sine curve dataset
steps = 1000
gradient = 0.002
list_a = []
for i in range(0, steps, 1):
    y = (
        100
        * round(math.sin(math.pi * i * 0.02 + 0.01), 4)
        * round(math.sin(math.pi * i * 0.005 + 0.01), 4)
        * round(math.sin(math.pi * i * 0.005 + 0.01), 4)
    )
    list_a.append(y)
df = pd.DataFrame({"valid": list_a}, columns=["valid"])

# Visualizing the data
fig, ax1 = plt.subplots(figsize=(16, 4))
ax1.xaxis.set_major_locator(plt.MaxNLocator(30))
plt.title("Sine Curve Data", fontsize=14)
plt.plot(df[["valid"]], color="black", linewidth=2.0)
plt.show()
```
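The loop above can also be written with NumPy array operations. The following is a minimal sketch rather than the code used in this post: the function name `make_sine_series` is made up for illustration, and it assumes the same constants as the loop. Since `np.round` and Python's built-in `round` can differ in rare edge cases, individual values may deviate marginally from the loop version.

```python
import numpy as np


def make_sine_series(steps=1000):
    """Return the product of three rounded sine terms, scaled by 100."""
    i = np.arange(steps)
    fast = np.round(np.sin(np.pi * i * 0.02 + 0.01), 4)
    slow = np.round(np.sin(np.pi * i * 0.005 + 0.01), 4)
    # The slow sine term appears twice in the loop, so it is multiplied in twice here.
    return 100 * fast * slow * slow


series = make_sine_series()
print(series.shape)  # (1000,)
print(series.min(), series.max())
```

Because the slow term is squared, the result is a fast oscillation whose amplitude swells and shrinks over time, which also means many values lie close to zero.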

### Step #2 Data Preparation

The following code will prepare the data to train a recurrent neural network model.

```python
# Settings
epochs = 4
batch_size = 1
sequencelength = 15
n_features = 1

# Get the number of rows to train the model on 60% of the data
npdataset = df.values
training_data_length = math.ceil(len(npdataset) * 0.6)

# Transform features by scaling each feature to a range between 0 and 1
mmscaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = mmscaler.fit_transform(npdataset)

# Create a scaled training data set
train_data = scaled_data[0:training_data_length, :]

# Split the data into x_train and y_train data sets
x_train = []
y_train = []
trainingdatasize = len(train_data)
for i in range(sequencelength, trainingdatasize - 1):
    # Input: the sequencelength values that precede position i
    x_train.append(train_data[i - sequencelength : i, 0])
    # Target: the value at position i that the model should predict
    y_train.append(train_data[i, 0])

# Convert x_train and y_train to numpy arrays
x_train = np.array(x_train)
y_train = np.array(y_train)

# Reshape the data to (samples, time steps, features)
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
print("x_train.shape: " + str(x_train.shape) + " -- y_train.shape: " + str(y_train.shape))
```

```
Out: x_train.shape: (584, 15, 1) -- y_train.shape: (584,)
```
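The sliding-window logic is easier to follow on a tiny series. Here is a minimal sketch, assuming a hypothetical helper called `make_windows` that is not part of the tutorial code. Each sample consists of `window` consecutive values, and the target is the value that follows them.

```python
import numpy as np


def make_windows(values, window):
    """Split a 1-D series into overlapping input windows and next-step targets."""
    values = np.asarray(values, dtype=float)
    x, y = [], []
    for i in range(window, len(values)):
        x.append(values[i - window:i])  # the `window` values preceding position i
        y.append(values[i])             # the value the model should predict
    x = np.array(x).reshape(-1, window, 1)  # (samples, time steps, features)
    return x, np.array(y)


x_demo, y_demo = make_windows(np.arange(10), window=3)
print(x_demo.shape, y_demo.shape)    # (7, 3, 1) (7,)
print(x_demo[0].ravel(), y_demo[0])  # [0. 1. 2.] 3.0
```

Note that this helper iterates up to the last element, whereas the loop in this step stops one position earlier. On 600 training rows with a window of 15, the helper would yield 585 samples instead of 584.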

### Step #3 Train Time Series Neural Network Regression Model

Now, we can train a forecasting model. For this, we will use a recurrent neural network. Understanding neural networks in depth is not a prerequisite for this tutorial. If you want to learn more about the architecture and functioning of neural networks, I can recommend this YouTube video.

The following code creates the model architecture, including the input shape of the neural net, and compiles the model. The second code block then trains the model:

```python
# Configure and compile the neural network model
# The number of input neurons is defined by the sequence length multiplied by the number of features
lstm_neuron_number = sequencelength * n_features

# Create the model
model = Sequential()
model.add(
    LSTM(lstm_neuron_number, return_sequences=False, input_shape=(x_train.shape[1], 1))
)
model.add(Dense(1))
model.compile(optimizer="adam", loss="mean_squared_error")
```

```python
# Settings
batch_size = 5

# Train the model
history = model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs)
```

```
Epoch 1/4
584/584 [==============================] - 1s 2ms/step - loss: 0.1047
Epoch 2/4
584/584 [==============================] - 1s 1ms/step - loss: 0.0153
Epoch 3/4
584/584 [==============================] - 1s 1ms/step - loss: 0.0102
Epoch 4/4
584/584 [==============================] - 1s 1ms/step - loss: 0.0064
```

### Step #4 Making test predictions

```python
# Create the data sets x_test and y_test
test_data = scaled_data[training_data_length - sequencelength :, :]
test_data_len = test_data.shape[0]
x_test, y_test = [], []
for i in range(sequencelength, test_data_len):
    x_test.append(test_data[i - sequencelength : i, 0])
    # Target: the single value that follows each input window
    y_test.append(test_data[i, 0])

# Convert x_test and y_test to numpy arrays
x_test, y_test = np.array(x_test), np.array(y_test)
print(x_test.shape, y_test.shape)

# Reshape x_test, so that we get an array with multiple test datasets
x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1))
```

```
(400, 15) (400,)
```
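The test slice deliberately starts `sequencelength` rows before the train/test split, so that the very first test window can predict the first held-out point. The arithmetic can be checked with plain integers. This is a small sketch using the values from this tutorial (1000 points, a 60% split, and a window of 15). The helper variable names other than `sequencelength` and `training_data_length` are made up for illustration.

```python
import math

n_points = 1000
sequencelength = 15
training_data_length = math.ceil(n_points * 0.6)  # 600

# The test slice reaches back one full window into the training range.
test_start = training_data_length - sequencelength  # 585
test_rows = n_points - test_start                   # 415

# One window per position from `sequencelength` to the end of the slice.
n_test_windows = test_rows - sequencelength         # 400

# Position of the first target, expressed as an index into the full series.
first_target_index = test_start + sequencelength    # 600

print(test_start, test_rows, n_test_windows, first_target_index)
# 585 415 400 600
```

The 400 test windows therefore line up exactly with the 400 rows that follow the training range.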

```python
# Get the predicted values
predictions = model.predict(x_test)
predictions = mmscaler.inverse_transform(predictions)
```
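The inverse transformation is needed because the model was trained on scaled values, while we want to judge the errors in the original units. For a single feature and the range (0, 1), min-max scaling boils down to the two formulas below. This is a NumPy-only sketch with made-up numbers, not a replacement for `MinMaxScaler`.

```python
import numpy as np

values = np.array([-50.0, 0.0, 25.0, 100.0])

# Forward transform: map the observed minimum to 0 and the maximum to 1.
v_min, v_max = values.min(), values.max()
scaled = (values - v_min) / (v_max - v_min)

# Inverse transform: map scaled values back to the original units.
restored = scaled * (v_max - v_min) + v_min

print(scaled)
print(restored)
```

Error metrics computed on the scaled values would be hard to interpret, which is why the predictions are transformed back before the evaluation.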

Next, we plot the predictions.

```python
# Visualize the data
train = df[:training_data_length]
valid = df[training_data_length:].copy()
# Add the predictions as a 1-D column next to the actual values
valid.insert(1, "y_pred", predictions.ravel(), True)
fig, ax1 = plt.subplots(figsize=(16, 8), sharex=True)
xt = train.index
yt = train[["valid"]]
xv = valid.index
yv = valid[["valid", "y_pred"]]
ax1.tick_params(axis="x", rotation=0, labelsize=10, length=0)
plt.title("y_pred vs y_test Truth", fontsize=18)
plt.plot(yv["y_pred"], color="red")
plt.plot(yv["valid"], color="black", linewidth=2)
plt.legend(["y_pred", "y_test"], loc="upper left")

# Fill between plotlines (mpl was imported in Step #1)
mpl.rc("hatch", color="k", linewidth=2)
ax1.fill_between(
    xv, yv["valid"], yv["y_pred"],
    facecolor="white", hatch="||", edgecolor="blue", alpha=0.9,
)
plt.show()
```

### Step #5 Calculating error metrics

Now comes the exciting part. With the following code, you’ll calculate six standard error metrics:

```python
y_pred = yv["y_pred"]
y_test = yv["valid"]
print(y_test.shape, y_pred.shape)

# Mean Absolute Error (MAE)
MAE = np.mean(abs(y_pred - y_test))
print('Mean Absolute Error (MAE): ' + str(np.round(MAE, 2)))

# Median Absolute Error (MedAE)
MEDAE = np.median(abs(y_pred - y_test))
print('Median Absolute Error (MedAE): ' + str(np.round(MEDAE, 2)))

# Mean Squared Error (MSE)
MSE = np.square(np.subtract(y_pred, y_test)).mean()
print('Mean Squared Error (MSE): ' + str(np.round(MSE, 2)))

# Root Mean Squared Error (RMSE)
RMSE = np.sqrt(np.mean(np.square(y_pred - y_test)))
print('Root Mean Squared Error (RMSE): ' + str(np.round(RMSE, 2)))

# Mean Absolute Percentage Error (MAPE)
MAPE = np.mean((np.abs(np.subtract(y_test, y_pred) / y_test))) * 100
print('Mean Absolute Percentage Error (MAPE): ' + str(np.round(MAPE, 2)) + ' %')

# Median Absolute Percentage Error (MDAPE)
MDAPE = np.median((np.abs(np.subtract(y_test, y_pred) / y_test))) * 100
print('Median Absolute Percentage Error (MDAPE): ' + str(np.round(MDAPE, 2)) + ' %')
```

```
Mean Absolute Error (MAE): 6.95
Median Absolute Error (MedAE): 5.05
Mean Squared Error (MSE): 78.7
Root Mean Squared Error (RMSE): 8.87
Mean Absolute Percentage Error (MAPE): 10339.13 %
Median Absolute Percentage Error (MDAPE): 26.8 %
```
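If you want to reuse these calculations, they can be wrapped in a single function. The following is one possible sketch: the function name `regression_errors` and the toy numbers are my own and not part of the tutorial data. It assumes that no actual value is exactly zero, since the percentage errors are undefined in that case.

```python
import numpy as np


def regression_errors(y_true, y_pred):
    """Return the six error metrics from this step as a dictionary."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_pred - y_true
    # Percentage errors are undefined where the actual value is exactly zero.
    ape = np.abs(residuals / y_true) * 100
    return {
        "MAE": np.mean(np.abs(residuals)),
        "MedAE": np.median(np.abs(residuals)),
        "MSE": np.mean(residuals ** 2),
        "RMSE": np.sqrt(np.mean(residuals ** 2)),
        "MAPE": np.mean(ape),
        "MDAPE": np.median(ape),
    }


demo = regression_errors([10.0, 20.0, 40.0], [12.0, 19.0, 36.0])
for name, value in demo.items():
    print(f"{name}: {value:.2f}")
```

Returning a dictionary makes it easy to compare several models side by side, for example by collecting the results in a pandas DataFrame.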

### Step #6 Evaluating model performance

Let’s take a look at the MAE and the MedAE. The MAE is 6.95, and the MedAE is 5.05. These values are close to each other, which indicates that our prediction errors are evenly distributed and that there are few significant outliers in the predictions.

To get a better picture of possible outliers, we take a look at the MSE. With a value of 78.7, the MSE is somewhat higher than the square of the MAE (6.95² ≈ 48.3). The RMSE of 8.87 is slightly higher than the MAE, which is another indication that the prediction errors lie in a narrow range.
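Why does the gap between the RMSE and the MAE tell us something about outliers? Squaring gives large errors more weight, so a single large error pulls the RMSE up much more than the MAE. The following toy example, with made-up error values, illustrates this effect:

```python
import numpy as np


def mae(errors):
    return np.mean(np.abs(errors))


def rmse(errors):
    return np.sqrt(np.mean(np.square(errors)))


even_errors = np.array([1.0, 1.0, 1.0, 1.0])
one_outlier = np.array([1.0, 1.0, 1.0, 9.0])

for label, errors in [("even", even_errors), ("outlier", one_outlier)]:
    print(label, "MAE:", mae(errors), "RMSE:", round(float(rmse(errors)), 2))
# even MAE: 1.0 RMSE: 1.0
# outlier MAE: 3.0 RMSE: 4.58
```

When all errors have the same size, the MAE and the RMSE are identical. The more the errors vary, the further the RMSE moves away from the MAE.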

How much do the predictions of our model deviate from the actual values in percentage terms? The MAPE is typically used as a starting point to answer this question. At 10339.13 percent, it is incredibly high. So is our model very much mistaken? The answer is no: the MAPE is misleading. The problem is that several actual values are close to zero, e.g., 0.00001. While the predictions of our model are close to the real values in absolute numbers, the MAPE divides each residual by the corresponding actual value, e.g., 0.00001, and averages the results. A handful of near-zero actual values is therefore enough to make the MAPE very large.
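This effect is easy to reproduce with three made-up data points, one of which has an actual value close to zero. The numbers below are purely illustrative and not taken from the sine curve data:

```python
import numpy as np

y_true = np.array([100.0, 50.0, 0.00001])
y_pred = np.array([101.0, 49.0, 1.0])

abs_error = np.abs(y_pred - y_true)
ape = abs_error / np.abs(y_true) * 100  # absolute percentage error per point

print("MAE:  ", np.mean(abs_error))
print("MAPE: ", np.mean(ape))
print("MDAPE:", np.median(ape))
```

All three predictions miss the actual value by about 1 in absolute terms, yet the MAPE ends up in the millions of percent because of the single near-zero actual value. The MDAPE stays at 2%.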

The median plays an important role in measuring the quality of a forecast. This becomes evident if we look at the MDAPE, which is 26.8%. So, 50% of our forecasting errors are higher than 26.8%, and 50% are lower. Consequently, we can assume that when our model makes a prediction, there is a 50% probability that it deviates by less than 26.8% from the actual value. That is not as terrible as the MAPE would have us believe. The plotlines of the predictions and actual values reflect these findings.
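The "50% above, 50% below" reading of the median can be verified directly. Here is a short sketch with hypothetical absolute percentage errors (again, not the values from our model). It also prints a higher percentile, which shows how far the remaining errors reach:

```python
import numpy as np

# Hypothetical absolute percentage errors for ten forecasts.
ape = np.array([2.0, 5.0, 8.0, 12.0, 20.0, 30.0, 45.0, 60.0, 90.0, 4000.0])

mdape = np.median(ape)
share_at_or_below = np.mean(ape <= mdape)

print("MDAPE:", mdape)                       # 25.0
print("share <= MDAPE:", share_at_or_below)  # 0.5
print("90th percentile:", np.percentile(ape, 90))
```

The extreme value of 4000 does not move the median at all, which is exactly why the MDAPE is more robust than the MAPE.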

## Summary

In this post, you have learned to evaluate time series forecasting models using different error metrics. You have also seen that performance metrics in time series forecasting can be misleading. Therefore, they should be used with caution and preferably in combination. If there is a crucial takeaway from this post, then it’s “never trust a single error metric.”

I hope this article was helpful. If you have any remarks or questions remaining, write them in the comments. I try to respond within two days.
