Rolling Time Series Forecasting: Creating a Multi-Step Prediction for a Rising Sine Curve using Neural Networks in Python

We can solve many time forecasting problems by looking at a single step into the future. However, some forecasting problems require us to understand how a signal will develop over a more extended period. Such cases require a multi-step time series forecasting approach that generates a forecast for multiple time steps at once. Multi-step time series forecasting is about modeling the distribution of future values of a signal over a prediction horizon. This article will cover this multi-step prediction approach with the example of a rising sine curve. We create a rolling forecast for the sine curve using Keras neural networks with LSTM layers in Python. We generate several single-step predictions and iteratively reuse them as input to predict further steps in the future.

The remainder of this article proceeds as follows: We begin by looking at the sine curve problem and provide a quick intro to recurrent neural networks. After this conceptual introduction, we turn to the hands-on part in Python. We generate synthetic sine curve data and preprocess the data to train univariate models using a Keras neural network. We thereby experiment with different architectures and hyperparameters. Then, we use these model variants to create rolling multi-step forecasts. Finally, we evaluate the model performance.

A multi-step time series forecast for a rising sine curve, as we will create it in this article
A multi-step time series forecast for a rising sine curve, as we will create in this article.

If you are just getting started with time-series forecasting, we have covered the single-step forecasting approach in two previous articles:

Predicting Stock Markets with Neural Networks – A Step-by-Step Guide
Stock Market Prediction – Adjusting Time Series Prediction Intervals in Python

The Problem of a Rising Sine Curve

The line plot below illustrates the sample of a rising sine curve. The goal is to use the ground truth values (blue) to forecast several points in this curve (purple).

Traditional mathematical methods can resolve this function by decomposing it into constituent parts. Thus, we could easily foresee the further course of the curve. But in practice, the function might not be precisely periodic and change over a more extended period. Therefore, recurrent neural networks can achieve better results than traditional mathematical approaches, especially when they train on extensive data.

A time series forecasting problem: predicting sine curve data, Rolling forecasting
A time series forecasting problem: predicting sine curve data

A rising sine wave may sound like an abstract problem at first. However, similar issues are widespread. Imagine you are the owner of an online shop. The users who visit your shop and buy something fluctuate depending on the time and the weekday. For instance, there are fewer visitors at night, and on weekends, the number of visitors rises sharply. At the same time, the overall number of users increases over a more extended period as the shop becomes known to a broader audience. To plan the number of goods to be held in stock, you need to see the number of orders at any point in time during several weeks. It is a typical multi-step time series problem, and similar problems exist in various domains:

  • Healthcare: e.g., forecasting of health signals such as heart, blood, or breathing signals
  • Network Security: e.g., analysis of network traffic in intrusion detection systems
  • Sales and Marketing: e.g., forecasting of market demand
  • Production demand forecasting: e.g., for power consumption and capacity planning
  • Prediction and filtering of sensor signals, e.g., of audio signals

Recurrent Neural Networks

The model used in this article is a recurrent neural network with Long short-term memory (LSTM) layers. Unlike feedforward neural networks, recurrent networks with LSTM layers have loops to pass output values from one training instance to the next.

LSTM layer as part of a time series regression model, Rolling forecasting

The Training Process of a Recurrent Neural Network

The training process of neural networks covers several epochs. An epoch is a training iteration over the whole input data. The entire training dataset is passed forward and backward in multiple slices through the neural network during an epoch. The network adjusts the weights throughout this process. In addition, the batch size determines after how many examples the model updates the weights between its neurons.

After one epoch, a model will typically underfit the data, resulting in lousy prediction performance. Therefore, one iteration is often not enough, and we need to pass the whole dataset multiple times through the neural network to learn. On the other hand, one should be careful not to choose the number of epochs too high. The reason is that a model tends to overfit after some time. Such a model will achieve excellent performance on the training data but poor performance on any other data.

About LSTM Layers

In recent years, recurrent neural networks have achieved excellent results. The LSTM architecture enables the network to preserve specific learnings throughout the training process. What the network learned in a previous iteration informs later epochs. In this way, the network considers information on patterns on different levels of abstraction. This chain-like structure predestines recurrent neural networks for working with sequences and lists.

This article uses LSTM layers combined with a rolling forecast approach to predict the course of a sinus curve with a linear slope. As illustrated below, this approach generates predictions for multiple timesteps by iteratively reusing the model outputs of the previous training run.

Functioning of an LSTM layer as it is used for time-series prediction models. Rolling forecasting
Functioning of an LSTM layer

Creating a Rolling Multi-Step Time Series Forecast in Python

In the following, we will use Python to create a rolling multi-step forecast for a synthetically generated rising sine curve. After completing this tutorial, you should understand the steps involved in multi-step time series forecasting.

The code is available on the GitHub repository.

Prerequisites

Before starting the coding part, make sure that you have set up your Python 3 environment and required packages. If you don’t have an environment set up yet, you can follow this tutorial to set up the Anaconda environment.

Also, make sure you install all required packages. In this tutorial, we will be working with the following standard packages: 

In addition, we will be using Keras (2.0 or higher) with Tensorflow backend and the machine learning library Scikit-learn.

You can install packages using console commands:

  • pip install <package name>
  • conda install <package name> (if you are using the anaconda packet manager)

Step #1 Generating Synthetic Data

We begin by creating a synthetic dataset. The data contains 300 values of the sinus function combined with a slight linear upward slope of 0.02. Running the code below creates the data and visualizes it in a line plot.

# A tutorial for this file is available at www.relataly.com

import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, TimeDistributed, Dropout, Activation
from sklearn.preprocessing import RobustScaler
from sklearn.metrics import mean_absolute_error
import tensorflow as tf
import seaborn as sns
sns.set_style('white', { 'axes.spines.right': False, 'axes.spines.top': False})

# check the tensorflow version and the number of available GPUs
print('Tensorflow Version: ' + tf.__version__)
physical_devices = tf.config.list_physical_devices('GPU')
print("Num GPUs:", len(physical_devices))

# Creating the synthetic sinus curve dataset
steps = 300
gradient = 0.02
list_a = []
for i in range(0, steps, 1):
    y = round(gradient * i + math.sin(math.pi * 0.125 * i), 5)
    list_a.append(y)
df = pd.DataFrame({"sine_curve": list_a}, columns=["sine_curve"])

# Visualizing the data
fig, ax1 = plt.subplots(figsize=(16, 4))
ax1.xaxis.set_major_locator(plt.MaxNLocator(30))
plt.title("Sinus Data")
sns.lineplot(data=df[["sine_curve"]], color="#039dfc")
plt.legend()
plt.show()

The signal curve oscillates and is steadily moving upward.

Step #2 Preprocessing

Next, we preprocess the data to bring it into the shape required by our neural network model. Running the code below will perform the following tasks:

  • Scale the data to a standard range.
  • Partition the data into multiple periods with a predefined sequence length. Each partition series contains 110 data points. The number of steps needs to match the number of neurons in the first layer of the neural network architecture.
  • Split the data into two separate datasets for training and testing.
data = df.copy()

# Get the number of rows in the data
nrows = df.shape[0]

# Convert the data to numpy values
np_data_unscaled = np.array(df)
np_data_unscaled = np.reshape(np_data_unscaled, (nrows, -1))
print(np_data_unscaled.shape)

# Transform the data by scaling each feature to a range between 0 and 1
scaler = RobustScaler()
np_data_scaled = scaler.fit_transform(np_data_unscaled)

# Set the sequence length - this is the timeframe used to make a single prediction
sequence_length = 110

# Prediction Index
index_Close = 0

# Split the training data into train and train data sets
# As a first step, we get the number of rows to train the model on 80% of the data 
train_data_len = math.ceil(np_data_scaled.shape[0] * 0.8)

# Create the training and test data
train_data = np_data_scaled[0:train_data_len, :]
test_data = np_data_scaled[train_data_len - sequence_length:, :]

# The RNN needs data with the format of [samples, time steps, features]
# Here, we create N samples, sequence_length time steps per sample, and 6 features
def partition_dataset(sequence_length, data):
    x, y = [], []
    data_len = data.shape[0]
    for i in range(sequence_length, data_len):
        x.append(data[i-sequence_length:i,:]) #contains sequence_length values 0-sequence_length * columsn
        y.append(data[i, index_Close]) #contains the prediction values for validation (3rd column = Close),  for single-step prediction
    
    # Convert the x and y to numpy arrays
    x = np.array(x)
    y = np.array(y)
    return x, y

# Generate training data and test data
x_train, y_train = partition_dataset(sequence_length, train_data)
x_test, y_test = partition_dataset(sequence_length, test_data)

# Print the shapes: the result is: (rows, training_sequence, features) (prediction value, )
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)

# Validate that the prediction value and the input match up
# The last close price of the second input sample should equal the first prediction value
print(x_test[1][sequence_length-1][index_Close])
print(y_test[0])
multi-step time series regression. data processing output

Step #3 Preparing Data and Model

Now that we have prepared the synthetic data, we need to determine the architecture of our neural network. There are no clear guidelines available on how to configure a neural network. Most problems are unique, and the model settings depend on the nature and extent of the data. Thus, configuring LSTM layers can be a real challenge.

3.1 Overview of Model Parameters

Finding the optimal configuration is often a process of trial and error. Below you find a list of model parameters with which you can experiment:

Source: keras.io/layers/recurrent – view the Keras documentation for a complete list of parameters.

3.2 Choosing Model Parameters

Finding a suitable architecture for a neural network is not easy and requires a systematic approach. A common practice is to start with an established standard architecture. We can then change the model parameters slightly and record the configurations and outcomes. We can also automate configuring and testing the model (Hyperparameter Tuning). Still, because this is not the focus of this article, we will use a manual approach.

We start with five epochs, a batch_size of 1, and configure our recurrent model with one LSTM layer with 110 neurons, corresponding to a data slice of 110 input values. In addition, we add a dense layer that provides us with a single output value.

# Configure the neural network model
epochs = 12; batch_size = 1;

# Model with n_neurons = inputshape Timestamps, each with x_train.shape[2] variables
n_neurons = x_train.shape[1] * x_train.shape[2]
model = Sequential()
model.add(LSTM(n_neurons, return_sequences=True, input_shape=(x_train.shape[1], 1)))
model.add(LSTM(n_neurons, return_sequences=False))
model.add(Dense(5))
model.add(Dense(1))
model.compile(optimizer="adam", loss="mean_squared_error")

Step #3 Training the Prediction Model

After designing the architecture of our recurrent neural network, the next step is to train the model. Training may take some time, depending on the performance of your computer.

# Train the model
history = model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs)

The complexity of the model is relatively low, and we only use five training epochs. It should thus only take a couple of minutes to train the model.

Step #4 Predicting a Single-step Ahead

We continue by making single-step predictions based on the training data. Then we calculate the mean squared error and the median error to measure the performance of our model.

# Reshape the data, so that we get an array with multiple test datasets
x_test_np = np.array(x_test)
x_test_reshape = np.reshape(x_test_np, (x_test_np.shape[0], x_test_np.shape[1], 1))

# Get the predicted values
y_pred = model.predict(x_test_reshape)
y_pred_unscaled = scaler.inverse_transform(y_pred)
y_test_unscaled = scaler.inverse_transform(y_test.reshape(-1, 1))

# Mean Absolute Error (MAE)
MAE = mean_absolute_error(y_test_unscaled, y_pred_unscaled)
print(f'Median Absolute Error (MAE): {np.round(MAE, 2)}')

# Mean Absolute Percentage Error (MAPE)
MAPE = np.mean((np.abs(np.subtract(y_test_unscaled, y_pred_unscaled)/ y_test_unscaled))) * 100
print(f'Mean Absolute Percentage Error (MAPE): {np.round(MAPE, 2)} %')

# Median Absolute Percentage Error (MDAPE)
MDAPE = np.median((np.abs(np.subtract(y_test_unscaled, y_pred_unscaled)/ y_test_unscaled)) ) * 100
print(f'Median Absolute Percentage Error (MDAPE): {np.round(MDAPE, 2)} %')

Out: me: 0.0223, rmse: 0.0206

Both the mean error and the squared mean error are pretty small. As a result, the values predicted by our model are close to the actual values of the ground truth. Even though it is unlikely because of the small number of epochs, it could still be that the model is over-fitting.

Step #5 Visualizing Predictions and Loss

Next, we look at the quality of the training predictions by plotting the training predictions and the actual values (i.e., ground truth).

# Visualize the data
#train = df[:train_data_len]
df_valid_pred = df[train_data_len:]
df_valid_pred.insert(1, "y_pred", y_pred_unscaled, True)

# Create the lineplot
fig, ax1 = plt.subplots(figsize=(32, 5), sharex=True)
ax1.tick_params(axis="x", rotation=0, labelsize=10, length=0)
plt.title("Predictions vs Ground Truth")
sns.lineplot(data=df_valid_pred)
plt.show()
multi-step time series regression
Training Predictions vs. Ground Truth

The smaller the area between the two lines, the better our model predictions. So we can tell from the plot that the predictions are not entirely wrong. We also check the learning path of the regression model.

# Plot training & validation loss values
fig, ax = plt.subplots(figsize=(10, 5), sharex=True)
sns.lineplot(data=history.history["loss"])
plt.title("Model loss")
plt.ylabel("Loss")
plt.xlabel("Epoch")
ax.xaxis.set_major_locator(plt.MaxNLocator(epochs))
plt.legend(["Train"], loc="upper left")
plt.grid()
plt.show()
loss function of our neural network model - multi-step time series regression

The loss drops quickly, and after five epochs, the model seems to have converged.

Step #6 Rolling Forecasting: Creating Multi-step Time Series Predictions

Next, we will generate the rolling multi-step forecast. This approach is different from a single-step approach in that we predict several points of a signal within a prediction window and not just a single value. However, the prediction quality decreases over more extended periods because reusing predictions creates a feedback loop that amplifies potential errors over time.

The forecasting process begins with an initial prediction for a single time step. After that, we add the predicted value to the input values for another projection, and so on. In this way, we create the rolling forecast with multiple time steps.

# Settings and Model Labels
rolling_forecast_range = 30
titletext = "Forecast Chart Model A"
ms = [
    ["epochs", epochs],
    ["batch_size", batch_size],
    ["lstm_neuron_number", n_neurons],
    ["rolling_forecast_range", rolling_forecast_range],
    ["layers", "LSTM, DENSE(1)"],
]
settings_text = ""
lms = len(ms)
for i in range(0, lms):
    settings_text += ms[i][0] + ": " + str(ms[i][1])
    
    if i < lms - 1:
        settings_text = settings_text + ",  "

# Making a Multi-Step Prediction
# Create the initial input data
new_df = df.filter(["sine_curve"])
for i in range(0, rolling_forecast_range):
    # Select the last sequence from the dataframe as input for the prediction model
    last_values = new_df[-n_neurons:].values
    
    # Scale the input data and bring it into shape
    last_values_scaled = scaler.transform(last_values)
    X_input = np.array(last_values_scaled).reshape([1, 110, 1])
    X_test = np.reshape(X_input, (X_input.shape[0], X_input.shape[1], 1))
    
    # Predict and unscale the predictions
    pred_value = model.predict(X_input)
    pred_value_unscaled = scaler.inverse_transform(pred_value)
    
    # Add the prediction to the next input dataframe
    new_df = pd.concat([new_df, pd.DataFrame({"sine_curve": pred_value_unscaled[0, 0]}, index=new_df.iloc[[-1]].index.values + 1)])
    new_df_length = new_df.size
forecast = new_df[new_df_length - rolling_forecast_range : new_df_length].rename(
    columns={"sine_curve": "Forecast"}
)

We can plot the forecast together with the ground truth.

#Visualize the results
validxs = df_valid_pred.copy()
dflen = new_df.size - 1
dfs = pd.concat([validxs, forecast], sort=False)
dfs.at[dflen, "y_pred_train"] = dfs.at[dflen, "y_pred"]
dfs

# Zoom in to a closer timeframe
df_zoom = dfs[dfs.index > 200]

# Visualize the data
fig, ax = plt.subplots(figsize=(16, 5), sharex=True)
ax.tick_params(axis="x", rotation=0, labelsize=10, length=0)
ax.xaxis.set_major_locator(plt.MaxNLocator(rolling_forecast_range))
plt.title('Forecast And Predictions (y_pred)')
sns.lineplot(data=df_zoom[["sine_curve", "y_pred"]], ax=ax)
sns.scatterplot(data=df_zoom[["Forecast"]], x=df_zoom.index, y='Forecast', linewidth=1.0, ax=ax)
plt.legend(["x_train", "y_pred", "y_test_rolling"], loc="lower right")
ax.annotate('ModelSettings: ' + settings_text, xy=(0.06, .015),  xycoords='figure fraction', horizontalalignment='left', verticalalignment='bottom', fontsize=10)
plt.show()
Forecast for the sinus curve with 30 timesteps (initial try)

The model learned to predict the periodic movement of the curve and thus succeeds in modeling its further course. However, the amplitude of the predicted curve increases over time, which increases the prediction errors.

Step #7 Comparing Results for Different Parameters

There is plenty of room to improve the forecasting model further. So far, our model seems not to consider that the amplitude of the sinus curve is gradually increasing. Over time, errors are amplified, leading to a growing deviation from the ground truth signal.

We can improve the model by changing the model parameters and the model architecture. However, I would not recommend changing them one at a time.

I tested several model configurations with varying epochs and neuron numbers/sample sizes to further optimize the model. Parameters such as Batch_size (=1), the timeframe for the forecast (=30 timesteps), and the model architecture (=1 LSTM Layer, 1 DENSE Layer) were left unchanged. The illustrations below show the results:

Different configurations of the time-series forecasting model
Different designs of the rolling time series forecasting model

The model that looks most promising is model #6. This model considers both the periodic movements of the sinus curve and the steadily increasing slope. The configuration of this model uses 15 epochs and 115 neurons/sample values.

Errors are amplified over more extended periods when they enter a feedback loop. For this reason, we can see in the graph below that the prediction accuracy decreases over time.

Long-time multi-step forecast
Long-time multi-step forecast (model #6)

Summary

In this tutorial, we have created a rolling time-series forecast for a rising sine curve. A multi-step forecast helps to understand better how a signal will develop throughout a more extended period. Finally, we have tested and compared different model variants and selected the best-performing model.

There are several ways to improve model accuracy further. For example, the current network architecture was deliberately kept simple and had only a single LSTM layer. Consequently, we could try to add additional layers. Another possibility is to experiment with different hyperparameters. For example, we could increase the training epochs and use dropout to prevent overfitting. In addition, we could experiment with other activation functions. Feel free to try it out.

Another forecasting approach is to train a neural network model that predicts multiple outputs per input batch. Another relataly article demonstrates how this multi-output regression approach works: Stock Market Prediction – Multi-output regression in Python

Also consider non-neural network approaches to time series forecasting, such as ARIMA, which can achieve great results too.

I hope this article was helpful. If you have questions remaining, let me know in the comments.

Author

  • Hi, I am Florian, a Zurich-based consultant for AI and Data. Since the completion of my Ph.D. in 2017, I have been working on the design and implementation of ML use cases in the Swiss financial sector. I started this blog in 2020 with the goal in mind to share my experiences and create a place where you can find key concepts of machine learning and materials that will allow you to kick-start your own Python projects.

2 thoughts on “Rolling Time Series Forecasting: Creating a Multi-Step Prediction for a Rising Sine Curve using Neural Networks in Python”

  1. If I wanted to include in the graph a prediction of the train-subset (not only the test-subset) how could that be achieved. I’m asking because the way that the arrays are built makes the x_train have a different length from the train.

    Reply
  2. Hi, I was trying to combine it with 007 “Multivariate time series” but in the section Step6 I am always getting errors that operand with shape (1,1) doesn’t match the broadcast shape (1,6) … How can Step 6 be adjusted to be able to work with mulitvariate series? I would highly appreciate your feedback!

    Reply

Leave a Comment