Multi-step Time Series Forecasting with Python: Step-by-Step Guide

Multi-step time series forecasting

Time series forecasting is about estimating the future value of a time series on the basis of past data. Many time series problems can be solved by looking a single step into the future. I have recently covered this topic in two blog posts (Part I, Part II). However, for some problems it is not enough to know one value at a time. Instead it is more important to know how a signal will develop over a longer period. Such problem require a multi-step time series forecasting approach and it will be covered in this blog post.

Multi-step time series prediction models the distribution of future values of a signal over a prediction horizon. In other words, this approach predicts multiple output values at the same time. In this tutorial you will learn how to apply a multi-step time series forecasting approach to predict the further course of a gradually rising sine wave. The model that will be used in this tutorial is a recurrent neural network with a single LSTM layer. As we will see, teaching the course of a rising sine curve to a neural network can be a certain challenge.

The problem of a rising sine curve

The histogram below illustrates the sample of a rising sine curve. The goal is to use the ground truth values (blue) to predict the values of this curve for several points in the future (purple).

A time series forecasting problem: predicting sine curve data
A time series forecasting problem: predicting sine curve data

Traditional mathematical methods can resolve this function by decomposing it into constituent parts. Thus, we could easily foresee the further course of the curve. But in practice, the function might not be exactly periodic and also change over a longer period of time. Recurrent neural networks therefore have been able to achieve better results as traditional mathematical approaches, when they are trained with large amounts of data.

Application fields

A rising sine wave may sound like an abstract problem at first. However, similar problems are very common. Imagine you are the owner of an online shop. The number of users who visit your shop and make orders fluctuates depending on the time of day and day of the week. For instance, at night there are fewer visitors and on weekends the number of visitors rises sharply. At the same time, the overall number of users increases over a longer period of time, as the shop becomes known to a wider audience. To plan the number of goods to be held in stock you need to know the number of orders at any point in time during several weeks.

This is a typical multi-step time series problem and similar problems exist in various domains:

  • Healthcare: e.g., forecasting of health signals such as heart, blood or breathing signals
  • Network Security: e.g., analysis of network traffic in intrusion detection systems
  • Sales and Marketing: e.g., forecasting of market demand
  • Production demand forecasting: e.g., for power consumption and capacity planning
  • Prediction and filtering of sensor signals, e.g., of audio signals

About Recurrent Neural Networks

The model used in this tutorial will be a recurrent neural network with Long short-term memory (LSTM) layers. Unlike feedforward neural networks, recurrent networks with LSTM layers have loops that enable them to pass output values from one training instance to iterative instances.

Neural Networks are trained in multiple epochs. An epoch is a training iteration over the whole input data. During an epoch, the entire training dataset is passed forward and backward in multiple slices through the neural network. The weights of the network are adjusted throughout this process. In addition, the batch size determines after how many examples the model updates the weights between its neurons.

However, after one epoch a network is typically underfitting the data, resulting in bad prediction performance. Therefore, one iteration is typically not enough and we need to pass the whole dataset multiple times through the neural network to enable it to learn. On the other hand, one should be careful not to choose the number of epochs to high. The reason is that a model tends to overfit after some time. Such a model will achieve great performance on the training data, but poor performance on any other data.

About LSTM layers

The LSTM architecture enables the network to preserve certain learnings throughout the whole training process. What the network learned in a previous iteration informs later epochs. This allows the network to consider information on patterns on different levels of abstraction. Because of this chain like nature, recurrent neural networks are predestined to work on sequences and lists. In recent years, recurrent neural networks have achieved excellent results in these areas.

In this tutorial, we will use LSTM layers in combination with a rolling forecast approach to forecast the sinus curve with a linear slope. As illustrated below, this approach generates predictions for multiple timesteps by iteratively reusing the model outputs of the previous training run.

Functioning of an LSTM layer
Functioning of an LSTM layer

Creating a multi-step time series forecasting model in Python

After completing this tutorial, you should be able to understand the steps involved in multi-step time series forecasting. In addition, you should be able to apply them to other domains of your choice.

The steps covered in this tutorial are as follows:

  1. Generating sample time series data
  2. Configuring the time series prediction model
  3. Training the model
  4. Generating a single-step prediction
  5. Predicting a single-step value
  6. Multi-step time series predictions
  7. Comparing results for different parameters
  8. Ideas to further optimize the prediction model


This tutorial assumes that you have setup your python environment. I recommend using the Anaconda environment. If you have not yet set the environment up, you can follow this tutorial.

It is also assumed that you have the following packages installed: keras (2.0 or higher) with Tensorflow backend, numpypandasmatplotsklearn. The packages can be installed using the console command:

pip install <packagename> or when you are using the anaconda environment, conda install <packagename>

1) Generating sample time series data

Let’s kick of this tutorial by creating a test dataset: This set contains 300 values of the sinus function combined with a slight linear slope of 0.02.

# Setting up packages for data manipulation and machine learning
import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from keras.models import Sequential
from sklearn.preprocessing import MinMaxScaler
from keras.layers import LSTM, Dense, TimeDistributed, Dropout, Activation

# Creating the sample sinus curve dataset
steps = 300
gradient = 0.02
list_a = []
for i in range(0, steps, 1):
    y = round(gradient * i + math.sin(math.pi * 0.125 * i), 5)
df = pd.DataFrame({"valid": list_a}, columns=["valid"])

# Visualizing the data
fig, ax1 = plt.subplots(figsize=(16, 4))
plt.title("Sinus Data")
plt.plot(df[["valid"]], color="#039dfc", linewidth=3.0)

As shown below, now we have created a dataset with a signal curve that is steadily moving upward.

2) Preparing data and model

Configuring LSTM layers can be a challenge because there are no clear guidelines available on how to configure the network. In fact, most problems are quite unique and the model settings depend on the nature and extent of the data. Therefore, finding the optimal configuration is often a process of trial and error.

Source: – for a full list of parameters view the keras documentation

Finding the optimal parameter range requires a systematic approach. Good results can be achieved by slightly altering model parameters and keeping track of which configurations were already tried along with the results.

We start with five epochs, a batch_size of 1, and configure our recurrent model with one LSTM layer with 100 neurons, which corresponds to a data slice of 100 input values. In addtion, we add a dense layer that provides us with a single output value.

# Settings
epochs = 12; batch_size = 1; lstm_neuron_number = 110

# Get the number of rows to train the model on 80% of the data
npdataset = df.values
training_data_length = math.ceil(len(npdataset) * 0.8)

# Transform features by scaling each feature to a range between 0 and 1
mmscaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = mmscaler.fit_transform(npdataset)

# Create a scaled training data set
train_data = scaled_data[0:training_data_length, :]

# Split the data into x_train and y_train data sets
x_train = []
y_train = []
trainingdatasize = len(train_data)
for i in range(lstm_neuron_number, trainingdatasize):
        train_data[i - lstm_neuron_number : i, 0]
    )  # contains lstm_neuron_number values 0-lstm_neuron_number
    y_train.append(train_data[i, 0])  # contains all other values

# Convert the x_train and y_train to numpy arrays
x_train = np.array(x_train)
y_train = np.array(y_train)

# Reshape the data
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
print("x_tain.shape: " + str(x_train.shape) + " -- y_tain.shape: " + str(y_train.shape))

# Configure and compile the neural network model
model1 = Sequential()
    LSTM(lstm_neuron_number, return_sequences=False, input_shape=(x_train.shape[1], 1))
model1.compile(optimizer="adam", loss="mean_squared_error")

# Create the test data set
test_data = scaled_data[training_data_length - lstm_neuron_number :, :]

# Create the data sets x_test and y_test
x_test = []
y_test = npdataset[training_data_length:, :]
for i in range(lstm_neuron_number, len(test_data)):
    x_test.append(test_data[i - lstm_neuron_number : i, 0])

3) Training the model

The next step is to train the recurrent neural network model.

# Train the model
history =, y_train, batch_size=batch_size, epochs=epochs)

It should only take a couple of minutes to train the model. This is because the complexity of the model is rather low and we only use five training epochs,

4) Predicting a single-step value

We continue by making single-step predictions based on the training data. We also calculate mean squared error and median error as measures for the performance of our model.

# Reshape the data, so that we get an array with multiple test datasets
x_test = np.array(x_test)
x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1))

# Get the predicted values
predictions = model1.predict(x_test)
predictions = mmscaler.inverse_transform(predictions)

# Get the root mean squarred error (RMSE) and the meadian error (ME)
rmse = np.sqrt(np.mean(predictions - y_test) ** 2)
me = np.median(y_test - predictions)
print("me: " + str(round(me, 4)) + ", rmse: " + str(round(rmse, 4)))

Out: me: 0.0223, rmse: 0.0206

Both the mean error and the squared mean error are quite small. This means that the values predicted by our model are close to the actual values of the ground truth. Even though it is unlikely because of the small number of epochs, it could still be that the model is over-adapting.

5) Visualizing predictions and loss

Next we will get an overview of how good the training predictions are. We will do this by plotting the train predictions and the actual values (i.e., ground truth).

# Visualize the data
train = df[:training_data_length]
valid = df[training_data_length:]
valid.insert(1, "Predictions", predictions, True)
fig, ax1 = plt.subplots(figsize=(32, 5), sharex=True)
yt = train[["valid"]]
yv = valid[["valid", "Predictions"]]
ax1.tick_params(axis="x", rotation=0, labelsize=10, length=0)
plt.title("Predictions vs Ground Truth", fontsize=18)
plt.plot(yv["Predictions"], color="#F9A048")
plt.plot(yv["valid"], color="#A951DC")
plt.legend(["Ground Truth", "Train"], loc="upper left")
Training Predictions vs Ground Truth

Take a look at the illustration above. The smaller the area between the two lines, the better are the predictions of our model. We can tell from the plot that the predictions of our model are not completely wrong.

Because it is a best practice, we will also check the learning path of the neural network model.

# Plot training & validation loss values
fig, ax = plt.subplots(figsize=(5, 5), sharex=True)
plt.title("Model loss")
plt.legend(["Train", "Test"], loc="upper left")
loss function
loss function

Loss drops quickly and after five epochs, our model seems to have reached quite a good fit.

6) Multi-step time series predictions

Next, we will generate a recursive multi-step forecast. These models are different from single-step models in that they predict several points of a signal within a prediction window and not just a single value. However, reusing predictions creates a feedback loop that amplifies potential errors over time. Therefore, the quality of the prediction decreases over longer periods of time.

We will start with an initial prediction for a single time-step. After that we will add the predicted value to the input values for another prediction, and so on. In this way, we create predictions for multiple time steps.

# Settings and Model Labels
rolling_forecast_range = 30
titletext = "Forecast Chart Model A"
ms = [
    ["epochs", epochs],
    ["batch_size", batch_size],
    ["lstm_neuron_number", lstm_neuron_number],
    ["rolling_forecast_range", rolling_forecast_range],
    ["layers", "LSTM, DENSE(1)"],
settings_text = ""
lms = len(ms)
for i in range(0, lms):
    settings_text += ms[i][0] + ": " + str(ms[i][1])
    if i < lms - 1:
        settings_text = settings_text + ",  "

# Making a Multi-Step Prediction
new_df = df.filter(["valid"])
for i in range(0, rolling_forecast_range):
    last_values = new_df[-lstm_neuron_number:].values
    last_values_scaled = mmscaler.transform(last_values)
    X_input = []
    X_input = np.array(X_input)
    X_test = np.reshape(X_input, (X_input.shape[0], X_input.shape[1], 1))
    pred_value = model1.predict(X_input)
    pred_value_unscaled = mmscaler.inverse_transform(pred_value)
    pred_value_f = round(pred_value_unscaled[0, 0], 4)
    next_index = new_df.iloc[[-1]].index.values + 1
    new_df = new_df.append(pd.DataFrame({"valid": pred_value_f}, index=next_index))
    new_df_length = new_df.size
forecast = new_df[new_df_length - rolling_forecast_range : new_df_length].rename(
    columns={"valid": "Forecast"}

Now we can plot the forecasted values together with the ground truth and the test predictions.

#Visualize the results
validxs = valid.copy()
dflen = new_df.size - 1
validxs.insert(2, "Forecast", forecast, True)
dfs = pd.concat([validxs, forecast], sort=False)[dflen, "Forecast"] =[dflen, "Predictions"]

# Zoom in to a closer timeframe
dfs = dfs[dfs.index > 200]
yt = dfs[["valid"]]
yv = dfs[["Predictions"]]
yz = dfs[["Forecast"]]
xz = dfs[["Forecast"]].index

# Visualize the data
fig, ax1 = plt.subplots(figsize=(16, 5), sharex=True)
ax1.tick_params(axis="x", rotation=0, labelsize=10, length=0)
plt.title('Forecast Basic Model', fontsize=18)
plt.plot(yt, color="#039dfc", linewidth=1.5)
plt.plot(yv, color="#F9A048", linewidth=1.5)
plt.scatter(xz, yz, color="#F332E6", linewidth=1.0)
plt.plot(yz, color="#F332E6", linewidth=0.5)
plt.legend(["Ground Truth", "TestPredictions", "Forecast"], loc="upper left")
ax1.annotate('ModelSettings: ' + settings_text, xy=(0.06, .015),  xycoords='figure fraction', horizontalalignment='left', verticalalignment='bottom', fontsize=10)
Forecast for the sinus curve with 30 timesteps (initial try)

Not bad for a first try. We can see that the model learned to predict the periodic movement of the curve and thus succeeds in modeling the curve further. However, what we can also see is that the amplitude of the curve increases over time. One reason is certainly that errors are amplified over time and lead to a growing deviation from the ground truth signal. Another reason is that the model fails to take into account the gradually increasing slope of the sinus curve.

7) Comparing results for different parameters

There is definitely room for further improvement of our model. We can try to improve the model by changing the model parameters and the model architecture. However, I would not recommend to change them one at a time.

To further optimize the model, I tested several model configurations with varying epochs and neuron-numbers/sample-sizes. Parameters such as Batch_size (=1), the timeframe for the forecast (=30 timesteps) and the model architecture (=1 LSTM Layer, 1 DENSE Layer) were left unchanged.

I tested several model configurations. The results are shown below:

Different configurations of the time-series forecasting model
Different configurations of the time series forecasting model

The model that looks most promising is model #6. This model considers both the periodic movements of the sinus curve and the steadily increasing slope. The configuration of this model uses 15 epochs and 115 neurons/sample values.

Errors are amplified over time when they enter a feedback loop. For this reason, we can see in the graph below that the accuracy of the prediction decreases over time.

Long-time multi-step forecast
Long-time multi-step forecast (model #6)

8) Ideas to further optimize the prediction model

there is definitely room for further improving the model. For instance, the current network architecture was deliberately kept simple. For instance, it has only a single LSTM layer. Consequently, a first way to optimize the model could be to add additional layers. However, this might also require a larger amount of training data. In addition, one could try to increase the training epochs. However, then one should also add a dropout rate to prevent the model from overfitting. A third option that could lead to improvements is to try different parameter configurations. For instance, one could use another activation function. Feel free to try it out.


This tutorial has guided you through the steps to create multi-step time series forecasting model. We have compared different settings and finally selected a best-performing model.

Please let me know in the comments if you found this tutorial useful or if you have questions remaining. And if you want to learn more about time series forecasting, you might want to check out my article on evaluating time series forecasting models. 🙂

Follow Florian Müller:

Data Scientist & Machine Learning Consultant

Hi, my name is Florian! I am a Zurich-based Data Scientist with a passion for Artificial Intelligence and Machine Learning. After completing my PhD in Business Informatics at the University of Bremen, I started working as a Machine Learning Consultant for the swiss consulting firm ipt. When I'm not working on use cases for our clients, I work on own analytics projects and report on them in this blog.

2 Responses

  1. Recurrent Neural Networks Part 2: Looking further ahead –

    […] Multi-Step Rolling Forecasting: Another way is to train the model on its own output. We do this by maintaining the predictions from one output and reuse them as input in the subsequent training run. In this way, with each iteration the model looks one time step further ahead. Based on daily input timesteps, after seven iterations, the model will have provided the output for a weekly prediction. I have covered this topic in a separate post. […]

Leave a Reply