Time series prediction is a hot topic in machine learning. In a previous post on stock market forecasting, I showed how you can build a prediction model for the S&P500 stock market index. The prediction interval used in that example was a single day, meaning the predictions reached one day ahead. But what if we want to look further into the future, say, weeks or months? Depending on the use case, you may want to use a different prediction interval. This blog post shows how you can adjust the length of the time steps to make predictions that range further into the future.

Warning: Stock markets can be highly volatile and are generally difficult to predict. The prediction model developed in this post only serves to illustrate a use case for Time Series models. It should not be assumed that a simple neural network as described in this blog post is capable of fully mapping the complexity of price development.

## Prediction Intervals

When forecasting one step ahead, the prediction interval is the length of the time step for which a prediction model simulates the next value. Basically, there are three different ways to change the prediction interval:

- **Single-step Forecasting with bigger timesteps:** In a single-step forecasting approach, the length of a time step is defined by the input data. For example, a model that uses daily prices as input data will also provide daily forecasts. Changing the length of the input steps changes the output steps to the same extent.
- **Multi-step Rolling Forecasting:** Another way is to train the model on its own output. We do this by retaining the predictions from one output and reusing them as input in the subsequent training run. In this way, the predictions range one time step further ahead with each iteration. Based on daily input timesteps, after seven iterations the model will have provided the output for a weekly prediction. I have covered this topic in a separate post.
- **Deep Multi-Output Forecasting:** A third option is to create a neural network that does not only predict a single output value for one timestep, but instead provides a whole series of predictions for multiple timesteps.
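The rolling approach can be sketched in a few lines of Python. The following is a minimal illustration only: the `predict_next` function is a made-up placeholder standing in for the trained network's one-step prediction, and the values are dummy data.

```python
def predict_next(window):
    # placeholder for the trained model's one-step prediction;
    # here we simply extrapolate the last difference for illustration
    return 2 * window[-1] - window[-2]

history = [1.0, 2.0, 3.0, 4.0, 5.0]
horizon = 7  # roll forward seven steps, e.g. one week of daily data

for _ in range(horizon):
    next_value = predict_next(history)
    history.append(next_value)  # feed the prediction back in as new input

print(history[-1])  # -> 12.0
```

The key idea is in the loop body: each prediction is appended to the input series, so the next iteration forecasts one step further ahead.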

In this post, I will cover the first pathway – single-step forecasting with bigger timesteps. An advantage of this procedure is that it can be implemented quickly and we can reuse most of the code from the previous example. I will show how you can increase the timesteps and create a model that looks further ahead.

## Predicting the S&P500 for Next Week

This post builds on the code from a previous post. If you don’t have the Python code yet, you can find it here on GitHub. At the end of this tutorial we will have modified the NN from the previous post, so that it predicts the market price of the S&P500 Index for the coming week.

### Prerequisites

This tutorial assumes that you have set up your Python environment. I personally use the Anaconda environment. It is also assumed that you have the following packages installed: *keras* (2.0 or higher) with a *TensorFlow* backend, *numpy*, *pandas*, *matplotlib*, and *sklearn*. The packages can be installed using the console command:

```
pip install <packagename>
```

## 1 Adjusting the shape of the input data

As before, we start by setting up some basic imports. Then we retrieve the S&P500 data via the pandas-datareader API.

```python
# remote data access for pandas; we use this to get the data
import pandas_datareader as webreader
# mathematical functions
import math
# fundamental package for scientific computing with Python
import numpy as np
# additional functions for analysing and manipulating data
import pandas as pd
# visualization - we use this to plot the market data
import matplotlib.pyplot as plt
# formatting dates
import matplotlib.dates as mdates
# deep learning library, used for neural networks
from keras.models import Sequential
# tools for predictive data analysis; we use the MinMaxScaler to normalize the price data
from sklearn.preprocessing import MinMaxScaler
# base class for recurrent layers
from keras.layers import LSTM
# the regular densely-connected neural network layer
from keras.layers import Dense
```

```python
# calendar function for obtaining today's date
from datetime import date

today = date.today()
date_today = today.strftime("%Y-%m-%d")
date_start = '2010-01-01'
stockname = 'S&P500'

# get the S&P500 quote
df = webreader.DataReader('^GSPC', start=date_start, end=date_today, data_source='yahoo')
```

Now we have a dataframe that contains the daily price quotes for the S&P500. If we want our model to provide weekly price predictions, we need to change the data so that the input contains weekly price quotes. A simple way to achieve this is to iterate through the rows and only keep every 7th row.

```python
# next we have to change the data structure to a dataframe with weekly price quotes
df['index1'] = range(1, len(df) + 1)
rownumber = df.shape[0]
lst = list(range(rownumber))
list_of_relevant_numbers = lst[0::7]
wdf = df[df['index1'].isin(list_of_relevant_numbers)]
wdf.head(5)
```
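As a side note, the same downsampling can be written more concisely with pandas integer-location slicing. A minimal sketch with made-up data (the DataFrame below is illustrative, not the real quotes):

```python
import numpy as np
import pandas as pd

# hypothetical daily closing prices for illustration
idx = pd.date_range("2010-01-01", periods=21, freq="D")
daily = pd.DataFrame({"Close": np.arange(21, dtype=float)}, index=idx)

# keep every 7th row, starting with the first one
weekly = daily.iloc[::7]
print(len(weekly))  # -> 3
```

`daily.iloc[::7]` selects rows 0, 7, 14, and so on, which is exactly what the helper column above achieves.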

In the next code section, we will define the training data. Because we use weekly price quotes instead of daily quotes, we reduce the input window to 50 weeks.

```python
# create a new dataframe with only the Close column of the weekly data
data = wdf.filter(['Close'])
# convert the dataframe to a numpy array
npdataset = data.values
# train the model on 80% of the data
training_data_length = math.ceil(len(npdataset) * 0.8)

# transform features by scaling each feature to a range between 0 and 1
mmscaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = mmscaler.fit_transform(npdataset)

# create a scaled training data set
train_data = scaled_data[0:training_data_length, :]

# split the data into x_train and y_train data sets
x_train = []
y_train = []
trainingdatasize = len(train_data)
for i in range(50, trainingdatasize):
    x_train.append(train_data[i-50:i, 0])  # the previous 50 weekly prices
    y_train.append(train_data[i, 0])       # the price to predict

# convert x_train and y_train to numpy arrays
x_train = np.array(x_train)
y_train = np.array(y_train)

# reshape the data to (samples, timesteps, features)
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
x_train.shape
```

**Out:** (245, 50, 1)

## 2 Building a time series prediction model

The input shape of the first layer in our neural network needs to match the training data: 50 timesteps with one price quote each. In addition, we give the LSTM layers 50 neurons each; note that the number of neurons is a separate hyperparameter and only coincidentally matches the number of timesteps here.

We use the following input arguments for the model fit:

- **x_train:** Vector, matrix, or array of training data. Can also be a list (as in our case) if the model has multiple inputs.
- **y_train:** Vector, matrix, or array of target data. This is the labeled data the model tries to predict; in other words, these are the results for x_train.
- **epochs:** Integer value that defines how many times the model goes through the training set.
- **batch_size:** Integer value that defines the number of samples that will be propagated through the network. After each propagation, the network adjusts the weights of the nodes in each layer.

```python
# configure the neural network model
model = Sequential()
# model with 50 neurons; input_shape = 50 weekly price quotes
model.add(LSTM(50, return_sequences=True, input_shape=(x_train.shape[1], 1)))
model.add(LSTM(50))
model.add(Dense(25, activation='relu'))
model.add(Dense(1))

# compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

# train the model
model.fit(x_train, y_train, batch_size=1, epochs=5)
```

## 3 Validating the model

In the next code section, we change the number of values in our test data to 50, so that it matches the input size of our training data set. Then we validate the model by calculating the root mean squared error (RMSE) and the median error (ME) for our predictions.

```python
# create the test data set; include the last 50 training weeks as input
test_data = scaled_data[training_data_length - 50:, :]

# create the data sets x_test and y_test
x_test = []
y_test = npdataset[training_data_length:, :]  # unscaled prices for comparison
for i in range(50, len(test_data)):
    x_test.append(test_data[i-50:i, 0])

# convert the data to a numpy array and reshape it
x_test = np.array(x_test)
x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1))

# get the model's predicted price values and undo the scaling
predictions = model.predict(x_test)
predictions = mmscaler.inverse_transform(predictions)

# get the root mean squared error (RMSE)
rmse = np.sqrt(np.mean((predictions - y_test)**2))
round(rmse, 1)

# get the median error (ME)
me = np.median(y_test - predictions)
round(me, 1)
```

**Out:** 114.9 (RMSE) / 155.9 (ME)
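To make the two metrics tangible, here is a quick sanity check with made-up numbers (the values are purely illustrative and unrelated to the model's output):

```python
import numpy as np

# hypothetical actual and predicted prices
y_true = np.array([2500.0, 2550.0, 2600.0, 2650.0])
y_pred = np.array([2480.0, 2560.0, 2580.0, 2700.0])

# RMSE penalizes large deviations quadratically
rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))

# the median error keeps its sign: positive means the predictions
# tend to be too low, negative means they tend to be too high
me = np.median(y_true - y_pred)

print(round(rmse, 1), me)  # -> 29.2 5.0
```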

## 4 Evaluating model performance

Next, we can plot the data and see how well our model has performed over the validation timeframe.

```python
# split the data and add the difference between the valid and predicted prices
train = data[:training_data_length]
valid = data[training_data_length:]
valid.insert(1, "Predictions", predictions, True)
valid.insert(1, "Difference", valid['Predictions'] - valid['Close'], True)

# visualize the data
fig, ax1 = plt.subplots(figsize=(22, 10), sharex=True)
ax1.xaxis.set_major_locator(mdates.YearLocator())

# zoom in to a closer timeframe
# valid = valid[valid.index > '2018-01-01']
# train = train[train.index > '2018-01-01']

yt = train[['Close']]
xt = train.index
yv = valid[['Close', 'Predictions']]
xv = valid.index

plt.title(stockname + ' - Predictions vs Valid Price', fontsize=18)
plt.ylabel(stockname, fontsize=18)
plt.plot(yt, color='#039dfc')
plt.plot(yv['Predictions'], color='#F9A048')
plt.plot(yv['Close'], color='#A951DC')
plt.legend(['Train', 'Valid', 'Predictions'], loc='upper left')
ax1.fill_between(xt, 0, yt['Close'], color='#b9e1fa')
ax1.fill_between(xv, 0, yv['Predictions'], color='#FFB579')

# create the bar plot with the differences
x = valid.index
y = valid['Difference']
plt.bar(x, y, width=5, color='darkblue')
plt.grid()
plt.show()
```

The plot shows the prices as they occurred: in dark blue during the training timeframe and in light blue during the test timeframe. The orange line shows the predictions.

With the following code we can zoom in to a closer timeframe and take a detailed look at the differences between predictions and valid values:

```python
valid = valid[valid.index > '2018-01-01']
train = train[train.index > '2018-01-01']
```

On the bottom we can see the differences between predictions and valid data. Positive values mean that the predictions were too optimistic. Negative values mean that the predictions were too pessimistic, and that the true value turned out to be higher than the prediction.

The plot shows that our model does not perform badly when markets evolve gradually. However, we also see that the model fails to predict sharp price drops, such as the recent market crash. In general, the model is often too pessimistic; when there is a sharp market drop, however, it is too optimistic, as it continues to follow the long-term market trend.

## 5 Making a prediction for the next week

Now we can use the model to predict next week’s price for the S&P500.

```python
# create a new dataframe from the weekly data
new_df = wdf.filter(['Close'])
# get the last 50 weekly closing prices and convert the dataframe to an array
last_50_weeks = new_df[-50:].values

# scale the data to values between 0 and 1
last_50_weeks_scaled = mmscaler.transform(last_50_weeks)

# create an empty list and append the past 50 weeks
X_test = []
X_test.append(last_50_weeks_scaled)

# convert X_test to a numpy array and reshape it
X_test = np.array(X_test)
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))

# get the predicted scaled price and undo the scaling
pred_price = model.predict(X_test)
pred_price = mmscaler.inverse_transform(pred_price)

# this is the price the model believes the index will reach next week
print(pred_price)
round(pred_price[0, 0], 1)
```

**Out:** 2652.3

So for the 9th of April 2020, the model predicts that the S&P500 will close at 2652.3 points.

Considering that today’s (2nd of April 2020) price is 2528 points, our model expects the S&P500 to gain roughly 124 points in the coming 7 days. Of course, this is by no means financial advice. As we have seen before, our model is often wrong.

## Summary

In this tutorial, you have seen how to adjust the prediction interval of a time series prediction model. For this purpose, we created a neural network that predicts the price of the S&P500 stock market index one week in advance. We trained and validated the model, and finally used it to predict the price for the next week.

As mentioned before, varying the input shape is a quick approach to changing the forecasting time steps. However, it also has a disadvantage: by increasing the length of the time steps, we reduce the amount of data we can use for training and testing. In our case we still have enough data available. But in other cases, where less data is available, this can become a problem, and I would then recommend using one of the other two approaches.
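A quick back-of-the-envelope calculation illustrates this trade-off (the row counts below are rough assumptions for about ten years of trading days, not exact figures from the data):

```python
daily_rows = 2600               # assumed number of daily price quotes
weekly_rows = daily_rows // 7   # after keeping every 7th row

window = 50  # input timesteps per training sample

# each sample needs a full lookback window, so the window is lost once
daily_samples = daily_rows - window
weekly_samples = weekly_rows - window

print(daily_samples, weekly_samples)  # -> 2550 321
```

Downsampling to weekly quotes shrinks the pool of training samples by roughly a factor of eight, which is why this shortcut only works when plenty of history is available.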

I hope you enjoyed this post. If you like, you can drop your questions or remarks in the comments.
