Rolling Time Series Forecasting: Creating a Multi-Step Prediction for a Rising Sine Curve using Neural Networks in Python

We can solve many time series forecasting problems by looking at a single step into the future. However, some forecasting problems require us to understand how a signal will develop over a more extended period, so it is not enough to forecast one step at a time. Such problems require a multi-step time series forecasting approach, which predicts a sequence of future values of a signal over a prediction horizon. This article covers this approach using the example of a rising sine curve. We create a rolling forecast for the sine curve using Keras neural networks with LSTM layers in Python. In this approach, we generate several single-step predictions and iteratively reuse them as input to predict further steps into the future.

The remainder of this article proceeds as follows: We begin by looking at the sine curve problem and provide a quick intro to recurrent neural networks. After this conceptual introduction, we turn to the hands-on part in Python. We generate synthetic sine curve data and preprocess the data to train univariate models using a Keras neural network. We thereby experiment with different architectures and hyperparameters. Then, we use these model variants to create rolling multi-step forecasts. Finally, we evaluate the model performance.

A multi-step time series forecast for a rising sine curve, as we will create it in this article.

If you are just getting started with time-series forecasting, we have covered the single-step forecasting approach in two previous articles:

Predicting Stock Markets with Neural Networks – A Step-by-Step Guide
Stock Market Prediction – Adjusting Time Series Prediction Intervals in Python

The Problem of a Rising Sine Curve

The line plot below illustrates a sample of a rising sine curve. The goal is to use the ground truth values (blue) to forecast several points in this curve (purple).

A time series forecasting problem: predicting sine curve data

Traditional mathematical methods can resolve this function by decomposing it into its constituent parts (a linear trend and a periodic component). With the decomposition at hand, we could easily foresee the further course of the curve. In practice, however, the function might not be precisely periodic and may change over a more extended period. In such cases, recurrent neural networks can achieve better results than traditional mathematical approaches, especially when they train on extensive data.
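To make this tangible, here is a minimal sketch of such a decomposition, assuming the trend is linear: we estimate the trend with a least-squares fit and subtract it to recover the periodic component.

import numpy as np

# Synthetic rising sine signal, analogous to the one used in this article
t = np.arange(300)
signal = 0.02 * t + np.sin(np.pi * 0.125 * t)

# Estimate the linear trend with a first-degree least-squares fit
slope, intercept = np.polyfit(t, signal, deg=1)

# Subtracting the estimated trend leaves the (approximately) periodic component
periodic = signal - (slope * t + intercept)
print(round(slope, 3))  # close to the true gradient of 0.02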

A rising sine wave may sound like an abstract problem at first, but similar issues are widespread. Imagine you are the owner of an online shop. The number of users who visit your shop and buy something fluctuates with the time of day and the weekday. For instance, at night there are fewer visitors, and on weekends the number of visitors rises sharply. At the same time, the overall number of users increases over a more extended period as the shop becomes known to a broader audience. To plan the number of goods to hold in stock, you need to anticipate the number of orders at any point in time over several weeks. This is a typical multi-step time series problem, and similar problems exist in various domains:

  • Healthcare: e.g., forecasting of health signals such as heart, blood, or breathing signals
  • Network Security: e.g., analysis of network traffic in intrusion detection systems
  • Sales and Marketing: e.g., forecasting of market demand
  • Production: e.g., demand forecasting for power consumption and capacity planning
  • Signal Processing: e.g., prediction and filtering of sensor signals such as audio signals

Recurrent Neural Networks

The model used in this article is a recurrent neural network with Long Short-Term Memory (LSTM) layers. Unlike feedforward neural networks, recurrent networks with LSTM layers have loops that enable them to pass information from one step of a sequence to the next.

Illustration of an LSTM layer

The Training Process of a Recurrent Neural Network

The training process of a neural network covers several epochs. An epoch is one training iteration over the whole input data: during an epoch, the entire training dataset is passed forward and backward through the neural network in multiple batches, and the network adjusts its weights throughout this process. The batch size determines after how many samples the model updates the weights between its neurons.

After a single epoch, a model will typically underfit the data, resulting in poor prediction performance. One iteration is therefore often not enough, and we need to pass the whole dataset through the neural network multiple times for it to learn. On the other hand, one should be careful not to choose the number of epochs too high, because after some time the model tends to overfit. Such a model achieves excellent performance on the training data but poor performance on any other data.
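In Keras, both settings are simply arguments to model.fit(). A minimal sketch (here, model stands in for any compiled Keras model, and x_train/y_train for the training data):

# Hypothetical compiled model and training data
history = model.fit(
    x_train, y_train,
    epochs=12,     # number of full passes over the training data
    batch_size=1,  # the weights are updated after every single sample
)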

About LSTM Layers

The LSTM architecture enables the network to preserve specific learnings while processing a sequence: an internal cell state carries information from one time step to the next, so what the network saw earlier in a sequence informs later steps. In this way, the network considers information on patterns on different levels of abstraction. This chain-like structure makes recurrent neural networks a natural fit for working with sequences and lists, and in recent years they have achieved excellent results in these areas.
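To make the sequence handling tangible, the following minimal sketch (with made-up toy data) shows how a Keras LSTM layer consumes input of the shape (samples, timesteps, features) – the same format we will use below:

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

# A toy batch: 4 samples, 10 timesteps, 1 feature per timestep
toy_x = np.random.rand(4, 10, 1)

toy_model = Sequential()
toy_model.add(LSTM(8, input_shape=(10, 1)))  # the cell state is carried across the 10 timesteps
toy_model.add(Dense(1))                      # one output value per sample
print(toy_model.predict(toy_x).shape)        # (4, 1)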

This article uses LSTM layers combined with a rolling forecast approach to predict the course of a sine curve with a linear slope. As illustrated below, this approach generates predictions for multiple timesteps by iteratively reusing the outputs of previous prediction steps as input.

Functioning of an LSTM layer

Creating a Rolling Multi-Step Time Series Forecast in Python

In the following, we will use Python to create a rolling multi-step forecast for a synthetically generated rising sine curve. After completing this tutorial, you should understand the steps involved in multi-step time series forecasting.

Prerequisites

Before we start the coding part, make sure that you have set up your Python 3 environment and required packages. If you don’t have an environment set up yet, you can follow this tutorial to set up the Anaconda environment.

Also, make sure you install all required packages. In this tutorial, we will be working with the following standard packages: NumPy, Pandas, and Matplotlib.

In addition, we will be using Keras (2.0 or higher) with the TensorFlow backend and the machine learning library scikit-learn.

You can install packages using console commands:

  • pip install <package name>
  • conda install <package name> (if you are using the Anaconda package manager)
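For example, assuming a fresh environment, the packages used in this tutorial can be installed in one go:

pip install numpy pandas matplotlib tensorflow keras scikit-learn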

Step #1 Generating Synthetic Data

We will kick off this tutorial by creating a synthetic dataset. The data contains 300 values of the sine function combined with a slight linear upward slope of 0.02.

import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import LSTM, Dense
from sklearn.preprocessing import RobustScaler

# Creating the sample sinus curve dataset
steps = 300
gradient = 0.02
list_a = []
for i in range(0, steps, 1):
    y = round(gradient * i + math.sin(math.pi * 0.125 * i), 5)
    list_a.append(y)
df = pd.DataFrame({"valid": list_a}, columns=["valid"])

# Visualizing the data
fig, ax1 = plt.subplots(figsize=(16, 4))
ax1.xaxis.set_major_locator(plt.MaxNLocator(30))
plt.title("Sinus Data")
plt.plot(df[["valid"]], color="#039dfc", linewidth=3.0)
plt.grid()
plt.show()

As shown below, the signal curve is steadily moving upward.

Step #2 Preparing Data and Model

There are no clear guidelines available on how to configure a neural network. Most problems are unique, and the model settings depend on the nature and extent of the data. Thus, configuring LSTM layers can be a real challenge.

2.1 Overview of Model Parameters

Finding the optimal configuration is often a process of trial and error. Below you find a selection of model parameters with which you can experiment:

  • units: the number of neurons in the LSTM layer
  • activation: the activation function of the layer, e.g., tanh or relu
  • recurrent_activation: the activation function used for the recurrent step
  • dropout / recurrent_dropout: the fraction of input and recurrent units to drop during training
  • return_sequences: whether the layer returns the full output sequence or only the last output
  • epochs and batch_size: the training parameters discussed above

Source: keras.io/layers/recurrent – view the Keras documentation for a complete list of parameters.

2.2 Choosing Model Parameters

Finding a suitable architecture for a neural network is not easy and requires a systematic approach.

We can often achieve good results by slightly changing the model parameters and recording the configurations and outcomes. You can also try to automate the process of configuring and testing the model (Hyperparameter Tuning). Still, because this is not the focus of this article, we will use a manual approach.
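For illustration, here is a minimal sketch of what such an automated search could look like, assuming a hypothetical build_model(n_neurons) helper that returns a compiled model like the one we define below:

# Hypothetical grid search over neuron counts and epoch numbers
results = []
for n in [50, 100, 115]:
    for n_epochs in [5, 10, 15]:
        model = build_model(n)  # assumed helper returning a compiled model
        hist = model.fit(x_train, y_train, epochs=n_epochs, batch_size=1, verbose=0)
        results.append((n, n_epochs, hist.history["loss"][-1]))

# Rank the configurations by their final training loss, best first
for config in sorted(results, key=lambda r: r[2]):
    print(config)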

We start with twelve epochs and a batch_size of 1, and configure our recurrent model with one LSTM layer whose number of neurons (110) matches the sequence length, i.e., the number of input values per sample. In addition, we add a dense layer that provides us with a single output value.

# Get the number of rows in the data
nrows = df.shape[0]

# Convert the data to numpy values
np_data_unscaled = np.array(df)
np_data_unscaled = np.reshape(np_data_unscaled, (nrows, -1))
print(np_data_unscaled.shape)

# Scale the data; the RobustScaler centers each feature on the median and scales by the interquartile range
scaler = RobustScaler()
np_data = scaler.fit_transform(np_data_unscaled)

# Set the sequence length - this is the timeframe used to make a single prediction
sequence_length = 110

# Index of the column that contains the prediction target
# (the variable name is kept from our related stock prediction tutorials)
index_Close = 0

# Split the data into train and test data sets
# As a first step, we get the number of rows to train the model on 80% of the data
train_data_len = math.ceil(np_data.shape[0] * 0.8)

# Create the training and test data
train_data = np_data[0:train_data_len, :]
test_data = np_data[train_data_len - sequence_length:, :]

# The RNN needs data with the format of [samples, time steps, features]
# Here, we create N samples with sequence_length time steps and 1 feature each
def partition_dataset(sequence_length, data):
    x, y = [], []
    data_len = data.shape[0]
    for i in range(sequence_length, data_len):
        x.append(data[i-sequence_length:i, :]) # each input sample contains sequence_length consecutive values
        y.append(data[i, index_Close]) # the value right after the input sequence is the single-step prediction target
    
    # Convert the x and y to numpy arrays
    x = np.array(x)
    y = np.array(y)
    return x, y

# Generate training data and test data
x_train, y_train = partition_dataset(sequence_length, train_data)
x_test, y_test = partition_dataset(sequence_length, test_data)

# Print the shapes: x has the shape (samples, sequence_length, features), y has the shape (samples,)
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)

# Validate that the prediction value and the input match up:
# the last value of the second input sample should equal the first prediction value
print(x_test[1][sequence_length-1][index_Close])
print(y_test[0])

Step #3 Training the Prediction Model

After designing the architecture, the next step is to train the recurrent neural network.

# Configure the neural network model
epochs = 12
batch_size = 1

# Set n_neurons to the number of input values per sample (time steps x features)
n_neurons = x_train.shape[1] * x_train.shape[2]
model = Sequential()
model.add(LSTM(n_neurons, return_sequences=False, input_shape=(x_train.shape[1], 1)))
model.add(Dense(1))
model.compile(optimizer="adam", loss="mean_squared_error")

# Train the model
history = model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs)

The complexity of the model is relatively low, and we only train for twelve epochs. It should thus take only a couple of minutes to train the model.

Step #4 Predicting a Single-step Ahead

We continue by making single-step predictions on the test data. Then we calculate the root mean squared error and the median error to measure the performance of our model.

# Reshape the data, so that we get an array with multiple test samples
x_test = np.array(x_test)
x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1))

# Get the predicted values and scale predictions and targets back to the original range
predictions = model.predict(x_test)
predictions = scaler.inverse_transform(predictions)
y_test_unscaled = scaler.inverse_transform(y_test.reshape(-1, 1))

# Get the root mean squared error (RMSE) and the median error (ME)
rmse = np.sqrt(np.mean((predictions - y_test_unscaled) ** 2))
me = np.median(y_test_unscaled - predictions)
print("me: " + str(round(me, 4)) + ", rmse: " + str(round(rmse, 4)))

Out: me: 0.0223, rmse: 0.0206

Both the median error and the root mean squared error are pretty small, so the values predicted by our model are close to the actual values of the ground truth. Even though it is unlikely because of the small number of epochs, the model could still be overfitting.

Step #5 Visualizing Predictions and Loss

Next, we check the quality of the predictions by plotting them together with the actual values (i.e., the ground truth).

# Visualize the data
train = df[:train_data_len]
valid = df[train_data_len:].copy()
valid.insert(1, "Predictions", predictions, True)
fig, ax1 = plt.subplots(figsize=(32, 5), sharex=True)
yt = train[["valid"]]
yv = valid[["valid", "Predictions"]]
ax1.tick_params(axis="x", rotation=0, labelsize=10, length=0)
plt.title("Predictions vs Ground Truth", fontsize=18)
plt.plot(yv["Predictions"], color="#F9A048")
plt.plot(yv["valid"], color="#A951DC")
plt.legend(["Predictions", "Ground Truth"], loc="upper left")
plt.grid()
plt.show()
Predictions vs. Ground Truth

The smaller the area between the two lines, the better the predictions of our model. We can tell from the plot that the predictions are not entirely wrong.

We also check the learning path of the regression model.

# Plot training & validation loss values
fig, ax = plt.subplots(figsize=(5, 5), sharex=True)
plt.plot(history.history["loss"])
plt.title("Model loss")
plt.ylabel("Loss")
plt.xlabel("Epoch")
ax.xaxis.set_major_locator(plt.MaxNLocator(epochs))
plt.legend(["Train", "Test"], loc="upper left")
plt.grid()
plt.show()
The loss function of our neural network model

The loss drops quickly, and after a few epochs, the model seems to have converged.
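Note that we recorded the training loss only. To also monitor a validation loss and detect overfitting early, Keras can hold out part of the training data – a minimal sketch:

# Hold out 10% of the training data to track a validation loss per epoch
history = model.fit(x_train, y_train, batch_size=batch_size,
                    epochs=epochs, validation_split=0.1)

plt.plot(history.history["loss"])
plt.plot(history.history["val_loss"])
plt.legend(["Train", "Validation"], loc="upper left")
plt.show()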

Step #6 Multi-step Time Series Predictions

Next, we will generate the rolling multi-step forecast. This approach is different from a single-step approach in that we predict several points of a signal within a prediction window and not just a single value. However, the quality of the prediction decreases over more extended periods because reusing predictions creates a feedback loop that amplifies potential errors over time.

The forecasting process begins with an initial prediction for a single time step. After that, we add the predicted value to the input values for another projection, and so on. In this way, we create the rolling forecast with multiple time steps.

# Settings and Model Labels
rolling_forecast_range = 30
titletext = "Forecast Chart Model A"
ms = [
    ["epochs", epochs],
    ["batch_size", batch_size],
    ["lstm_neuron_number", n_neurons],
    ["rolling_forecast_range", rolling_forecast_range],
    ["layers", "LSTM, DENSE(1)"],
]
settings_text = ""
lms = len(ms)
for i in range(0, lms):
    settings_text += ms[i][0] + ": " + str(ms[i][1])
    
    if i < lms - 1:
        settings_text = settings_text + ",  "

# Making a Multi-Step Prediction
new_df = df.filter(["valid"])
for i in range(0, rolling_forecast_range):
    # Use the most recent values as model input (here, n_neurons equals the sequence length)
    last_values = new_df[-n_neurons:].values
    last_values_scaled = scaler.transform(last_values)

    # Reshape the input to the expected format [samples, time steps, features]
    X_input = np.array([last_values_scaled])
    X_input = np.reshape(X_input, (X_input.shape[0], X_input.shape[1], 1))

    # Predict the next value and scale it back to the original range
    pred_value = model.predict(X_input)
    pred_value_unscaled = scaler.inverse_transform(pred_value)
    pred_value_f = round(pred_value_unscaled[0, 0], 4)

    # Append the prediction to the input data for the next iteration
    # (DataFrame.append was removed in pandas 2.0, so we use pd.concat)
    next_index = new_df.iloc[[-1]].index.values + 1
    new_row = pd.DataFrame({"valid": pred_value_f}, index=next_index)
    new_df = pd.concat([new_df, new_row])

new_df_length = new_df.size
forecast = new_df[new_df_length - rolling_forecast_range : new_df_length].rename(
    columns={"valid": "Forecast"}
)

We can plot the forecast together with the ground truth.

#Visualize the results
validxs = valid.copy()
dflen = new_df.size - 1
validxs.insert(2, "Forecast", forecast, True)
dfs = pd.concat([validxs, forecast], sort=False)
dfs.at[dflen, "Forecast"] = dfs.at[dflen, "Predictions"]

# Zoom in to a closer timeframe
dfs = dfs[dfs.index > 200]
yt = dfs[["valid"]]
yv = dfs[["Predictions"]]
yz = dfs[["Forecast"]]
xz = dfs[["Forecast"]].index

# Visualize the data
fig, ax1 = plt.subplots(figsize=(16, 5), sharex=True)
ax1.tick_params(axis="x", rotation=0, labelsize=10, length=0)
ax1.xaxis.set_major_locator(plt.MaxNLocator(30))
plt.title('Forecast Basic Model', fontsize=18)
plt.plot(yt, color="#039dfc", linewidth=1.5)
plt.plot(yv, color="#F9A048", linewidth=1.5)
plt.scatter(xz, yz, color="#F332E6", linewidth=1.0)
plt.plot(yz, color="#F332E6", linewidth=0.5)
plt.legend(["Ground Truth", "TestPredictions", "Forecast"], loc="upper left")
ax1.annotate('ModelSettings: ' + settings_text, xy=(0.06, .015),  xycoords='figure fraction', horizontalalignment='left', verticalalignment='bottom', fontsize=10)
plt.grid()
plt.show()
Multi-step forecast for the sine curve with 30 timesteps (initial try)

The model has learned the periodic movement of the curve and thus succeeds in modeling its further course. However, the amplitude of the predicted curve increases over time, which increases the prediction errors.

Step #7 Comparing Results for Different Parameters

There is plenty of room to improve the forecasting model further. So far, our model does not seem to fully capture the steadily rising level of the sine curve. Errors are amplified over time and lead to a growing deviation from the ground truth signal.

We can try to improve the model by changing the model parameters and the model architecture. However, I would recommend changing only one of them at a time so that you can attribute differences in performance to a specific change.

To optimize the model further, I tested several configurations with varying numbers of epochs and neurons/sample sizes. Parameters such as the batch_size (=1), the forecast timeframe (=30 timesteps), and the model architecture (=1 LSTM layer, 1 Dense layer) were left unchanged. The illustrations below show the results:

Different configurations of the time series forecasting model

The most promising model is model #6. It captures both the periodic movement of the sine curve and the steadily increasing slope. The configuration of this model uses 15 epochs and 115 neurons/sample values.

Errors are amplified over more extended periods when they enter a feedback loop. For this reason, we can see in the graph below that the prediction accuracy decreases over time.

Long-time multi-step forecast (model #6)

Summary

Congratulations, you have reached the end of this article! We have created a rolling time-series forecast for a rising sine curve. A multi-step forecast helps us better understand how a signal will develop over a more extended period. Finally, we have tested and compared different model variants and selected the best-performing model.

There are several ways to improve model accuracy further. For example, the network architecture was deliberately kept simple and has only a single LSTM layer, so we could try adding additional layers. Another possibility is to experiment with different hyperparameters: we could increase the number of training epochs, use dropout to prevent overfitting, or experiment with other activation functions. Feel free to try it out.
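As a starting point, here is a sketch of such a deeper variant with stacked LSTM layers and dropout; the concrete hyperparameters are illustrative, not tuned:

from keras.layers import Dropout

model = Sequential()
# The first LSTM layer returns the full sequence so the next LSTM layer can consume it
model.add(LSTM(n_neurons, return_sequences=True, input_shape=(x_train.shape[1], 1)))
model.add(Dropout(0.2))  # randomly drop 20% of the units to reduce overfitting
model.add(LSTM(n_neurons // 2, return_sequences=False))
model.add(Dense(1))
model.compile(optimizer="adam", loss="mean_squared_error")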

I hope this article was helpful. If you have questions remaining, let me know in the comments.
