Feature Engineering for Multivariate Time Series Prediction with Python

Multivariate time series predictions, and especially stock market forecasts, pose challenging machine learning problems. Unlike univariate forecasting models, multivariate models do not rely exclusively on historical time series data, but use additional features that are often derived from the time series itself. The underlying assumption is that additional indicators increase prediction accuracy by helping the model identify patterns in the historical data that indicate future price movements. This tutorial demonstrates how feature engineering for multivariate neural network models works, using stock market forecasting as an example.

If you are new to time series forecasting, I recommend that you first read my introductory articles on the topic.

Feature Engineering in Stock Market Forecasting

The training of machine learning algorithms always requires some input data. This input typically comes in the form of structured columns, which are the model features. Which features lead to good results depends on the application context and the data used. With a growing number of features, model complexity and training time increase, but not necessarily performance. In other words, simply adding features won't do the trick and can even decrease model performance. The real challenge is to find the right combination of features and to create an input shape that enables the model to detect meaningful patterns.

The process of checking and selecting features is exploratory and characterized by trial and error, which can be very time-consuming, especially in less familiar application areas. However, there are some common application domains where we can draw upon established features and don't have to develop everything from scratch. Stock market forecasting is a good example of such an established domain, because a large number of indicators are available from chart analysis that we can use as features for our model.

Chart analysis aims to forecast future price developments by studying historical prices and trading volume. The underlying assumption is that certain patterns or chart formations in the data can signal the timing of advantageous buying or selling decisions. When we develop predictive machine learning models, the difference from classic chart analysis is that we do not analyse the chart manually ourselves, but instead try to develop a machine learning model, for example a recurrent neural network, that does the job for us.

Exemplary chart with technical indicators (Bollinger Bands, RSI and Double-EMA)

Does this Really Work?

It is important to point out that the effectiveness of chart analysis and algorithmic trading is controversial, and there is at least as much controversy about whether it is possible to predict stock market prices with neural networks. Various studies have examined the effectiveness of chart analysis without coming to a clear conclusion. One of the biggest points of criticism is that it cannot take external events into account. Nevertheless, many financial analysts take financial indicators into account when making investment decisions, which means a lot of money is moved simply because many people believe in statistical indicators.

So, without knowing how well this will work, it is definitely worth an attempt to feed a neural network with different financial indicators. But first and foremost, I see this as a good way to show how the process of feature engineering works. Just make sure not to rely blindly on the predictions of these models.

Selected Statistical Indicators

Chart analysis offers a multitude of statistical indicators, which are often used in combination. In the following, I briefly introduce some indicators that are commonly used in chart analysis and that we will later use in developing our prediction model.

Relative Strength Index (RSI)

The Relative Strength Index (RSI) is one of the most commonly used oscillating indicators. It was developed in 1978 by J. Welles Wilder to determine the momentum of price movements and compares the strength of price losses in a period with the strength of price gains. It can take values between 0 and 100.

Reference lines are used to determine how long an existing trend will last before a trend reversal is expected. In other words, when the price is heavily oversold or overbought, one should expect a trend reversal.

  • With an upward trend, the reference lines are at 40% (oversold) and 80% (overbought)
  • With a downtrend, the reference lines are at 20% (oversold) and 60% (overbought)

The RSI is calculated as follows:

First, the sums of all positive and all negative price changes in a period (e.g., 30 days) are calculated.

The mean values of these sums are then calculated, which gives the average gain and the average loss for the period. The ratio of the two is the relative strength: RS = average gain / average loss

Finally, the RSI is calculated with the following formula: RSI = 100 - 100 / (1 + RS)
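
To make these steps concrete, here is a minimal sketch of how the RSI could be computed with pandas. It assumes a DataFrame df with a 'Close' column (like the dataset we load later) and uses a simple rolling mean for the averages; Wilder's original formulation uses an exponential smoothing instead.

# Minimal RSI sketch (assumes a pandas DataFrame 'df' with a 'Close' column)
import pandas as pd

def relative_strength_index(close: pd.Series, period: int = 30) -> pd.Series:
    delta = close.diff()                             # daily price changes
    gains = delta.clip(lower=0)                      # positive changes, negatives set to 0
    losses = -delta.clip(upper=0)                    # absolute values of the negative changes
    avg_gain = gains.rolling(window=period).mean()   # average gain in the period
    avg_loss = losses.rolling(window=period).mean()  # average loss in the period
    rs = avg_gain / avg_loss                         # relative strength
    return 100 - 100 / (1 + rs)                      # RSI between 0 and 100

# Example usage: df['RSI30'] = relative_strength_index(df['Close'], period=30)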

Simple Moving Averages (SMA)

Simple Moving Averages (SMA) are another technical indicator used to determine whether an asset price will continue a trend or reverse it. As the name says, an SMA is calculated as the average of all values within a certain period. Financial analysts pay close attention to the 200-day SMA (SMA-200). When the price crosses the SMA, this may signal a trend reversal. Further SMAs are often calculated for 50-day (SMA-50) and 100-day (SMA-100) periods. In this regard, two popular trading patterns are the death cross and the golden cross.

  • A death cross occurs when the trend line of the SMA-50/100 crosses below the 200-day SMA. This suggests that a falling trend will likely accelerate downwards.
  • A golden cross occurs when the trend line of the SMA-50/100 crosses above the 200-day SMA. This suggests that a rising trend will likely accelerate upwards.

We can use the SMA in the input of our model simply by measuring the distance between two trend lines.
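
As a small sketch of this idea (again assuming a pandas DataFrame df with a 'Close' column), the two trend lines and their distance could be computed like this:

# SMA trend lines and their distance as a candidate feature
# (assumes a pandas DataFrame 'df' with a 'Close' column)
df['SMA50'] = df['Close'].rolling(window=50).mean()
df['SMA200'] = df['Close'].rolling(window=200).mean()

# Distance between the two trend lines: a sign change from positive to negative
# corresponds to a death cross, a change from negative to positive to a golden cross
df['DIFF-SMA50-SMA200'] = df['SMA50'] - df['SMA200']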

Exponential Moving Averages (EMA)

The exponential moving average (EMA) is another lagging trend indicator. Similar to the SMA, it is used to measure the strength of a price trend. The difference between the SMA and the EMA is that the SMA assigns equal weight to all price points, while the EMA uses a multiplier that weights recent prices more heavily.

The EMA is calculated as follows: calculating the EMA requires an initial value, which is typically the simple moving average of the first period. For example, for a 30-day EMA, we first calculate the average of the price values for the past 30 days. Each subsequent value is then calculated from the current price and the previous EMA using a weighting factor, the multiplier. The formula for this multiplier is: smoothing factor / (1 + days)

Different smoothing factors can be defined, but a smoothing factor of 2 is most common. For a 30-day moving average, the multiplier would therefore be 2 / (30 + 1) ≈ 0.065.

As soon as we have calculated the EMA for the first data point, we can use the following formula to calculate the EMA for all subsequent data points: EMA = closing price x multiplier + EMA (previous day) x (1 - multiplier)
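
To illustrate the calculation (again only a sketch, assuming a pandas DataFrame df with a 'Close' column), the multiplier, the recursive formula, and the pandas shortcut used later in the feature code look like this:

# EMA sketch for a 30-day period (assumes a pandas DataFrame 'df' with a 'Close' column)
span = 30
multiplier = 2 / (span + 1)  # smoothing factor of 2 -> 2 / (30 + 1) ≈ 0.065

# Seed the EMA with the simple moving average, then apply the recursive formula
ema = df['Close'].rolling(window=span).mean()
for i in range(span, len(df)):
    ema.iloc[i] = df['Close'].iloc[i] * multiplier + ema.iloc[i - 1] * (1 - multiplier)
df['EMA30'] = ema

# pandas offers a closely related shortcut, which seeds with the first close instead of an SMA:
# df['EMA30'] = df['Close'].ewm(span=span, adjust=False).mean()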

Python Implementation

In the following, this tutorial will guide you through the process of implementing a multivariate time series prediction model for the NASDAQ stock market index. You will learn how to implement and use different features to train the model and measure model performance.

The example covers the following steps:

  1. Imports & Loading the Data
  2. Explore the Data
  3. Feature Engineering
  4. Scaling and Transforming the Data
  5. Train the Prediction Model
  6. Evaluate Model Performance
  7. Overview of Selected Models
  8. Conclusions

Warning: Stock markets can be highly volatile and predicting price movements is an extremely difficult task. The prediction model developed in this post only serves to illustrate a use case for time series models. However, you should not take any price predictions for granted.

Python Environment

This tutorial assumes that you have set up your Python environment. I recommend using the Anaconda environment. If you have not yet set up the environment, check out this tutorial. It is also assumed that you have the following packages installed: Keras (2.0 or higher) with the TensorFlow backend, numpy, pandas, matplotlib, and scikit-learn. The packages can be installed with the console command:

pip install <packagename>, or if you are using the Anaconda environment, conda install <packagename>

1) Imports & Loading the Data

Let's start by setting up the imports and loading the data. We will load price data for the NASDAQ Composite index (symbol: ^IXIC) from finance.yahoo.com into our Python project.

# Remote data access for pandas
import pandas_datareader as webreader
# Mathematical functions 
import math 
# Fundamental package for scientific computing with Python
import numpy as np 
# Additional functions for analysing and manipulating data
import pandas as pd 
# Date Functions
from datetime import date, timedelta, datetime
# This function adds plotting functions for calendar dates
from pandas.plotting import register_matplotlib_converters
# Important package for visualization - we use this to plot the market data
import matplotlib.pyplot as plt 
# Formatting dates
import matplotlib.dates as mdates
# Packages for measuring model performance / errors
from sklearn.metrics import mean_absolute_error, mean_squared_error
# Deep learning library, used for neural networks
from keras.models import Sequential 
# Deep learning classes for recurrent and regular densely-connected layers
from keras.layers import LSTM, Dense, Dropout
# EarlyStopping during model training
from keras.callbacks import EarlyStopping
# Adam optimizer, used to set the learning rate during training
from keras.optimizers import Adam
# This Scaler removes the median and scales the data according to the quantile range to normalize the price data 
from sklearn.preprocessing import RobustScaler

# Setting the timeframe for the data extraction
today = date.today()
date_today = today.strftime("%Y-%m-%d")
date_start = '2010-01-01'

# Getting NASDAQ quotes
stockname = 'NASDAQ'
symbol = '^IXIC'
df = webreader.DataReader(
    symbol, start=date_start, end=date_today, data_source="yahoo"
)

# Quick overview of dataset
train_dfs = df.copy()
train_dfs

2) Explore the Data

Let's take a quick look at the data by plotting each column of our dataset.

# Plot each column
register_matplotlib_converters()
nrows = 3
ncols = int(round(train_dfs.shape[1] / nrows, 0))
fig, ax = plt.subplots(nrows=nrows, ncols=ncols, sharex=True, figsize=(16, 7))
fig.subplots_adjust(hspace=0.3, wspace=0.3)
x = train_dfs.index
columns = train_dfs.columns
f = 0
for i in range(nrows):
    for j in range(ncols):
        ax[i, j].xaxis.set_major_locator(mdates.YearLocator())
        assetname = columns[f]
        y = train_dfs[assetname]
        f += 1
        ax[i, j].plot(x, y, color='#039dfc', label=stockname, linewidth=1.0)
        ax[i, j].set_title(assetname)
        ax[i, j].tick_params(axis="x", rotation=90, labelsize=10, length=0)   
plt.show()
Line plots of the columns in our initial dataset

We can see that the initial dataset already contains several columns (Open, High, Low, Close, Adj Close, and Volume) from which we can build features. This looks good, so we will proceed.

3) Feature Engineering

Now comes the interesting part: we will implement the features on which we will then train our prediction model. Various indicators from chart analysis are available, all of which can be calculated from the initial stock market dataset. For example, as described at the beginning, there are moving averages for different periods, as well as stochastic oscillators that measure momentum.

# Sort the training data by date
train_df = train_dfs.sort_values(by=['Date']).copy()

# We save a copy of the date index before we reset it to a number index
date_index = train_df.index
date_index_df = pd.DataFrame(date_index)

# Adding Day, Month, and Year in separate columns
d = pd.to_datetime(train_df.index)
train_df['Day'] = d.strftime("%d") 
train_df['Month'] = d.strftime("%m") 
train_df['Year'] = d.strftime("%Y") 

# We reset the index, so we can convert the date-index to a number-index
train_df.reset_index(level=0, inplace=True)
train_df.tail(5)

Which combinations of indicators lead to good results can hardly be estimated in advance, so we have to test different variants. More indicators are not necessarily better, because additional indicators make it more difficult for the model to separate important influencing factors from less important ones. Moreover, each indicator we add also increases the complexity of the model, and with each additional indicator the time required to train the model increases.

Not to forget, of course, that hyperparameters such as the learning rate, the optimizer, and the batch size, as well as the selected time frame of the data (sequence_length), have a great influence on the performance of the model. There is therefore a lot of scope for tuning and improving the model.

With the following code, we create a set of indicators for the training data. In the next step, however, we will restrict this set, since a model with all of these indicators does not achieve good results and would take far too long to train on a local computer.

# Feature Engineering
def createFeatures(df):
    df = pd.DataFrame(df)

    # Moving averages - different periods
    df['MA200'] = df['Close'].rolling(window=200).mean() 
    df['MA100'] = df['Close'].rolling(window=100).mean() 
    df['MA50'] = df['Close'].rolling(window=50).mean() 
    df['MA26'] = df['Close'].rolling(window=26).mean() 
    df['MA20'] = df['Close'].rolling(window=20).mean() 
    df['MA12'] = df['Close'].rolling(window=12).mean() 
    
    # SMA Differences - different periods
    df['DIFF-MA200-MA50'] = df['MA200'] - df['MA50']
    df['DIFF-MA200-MA100'] = df['MA200'] - df['MA100']
    df['DIFF-MA200-CLOSE'] = df['MA200'] - df['Close']
    df['DIFF-MA100-CLOSE'] = df['MA100'] - df['Close']
    df['DIFF-MA50-CLOSE'] = df['MA50'] - df['Close']
    
    # Moving Averages on high, lows, and std - different periods
    df['MA200_low'] = df['Low'].rolling(window=200).min()
    df['MA14_low'] = df['Low'].rolling(window=14).min()
    df['MA200_high'] = df['High'].rolling(window=200).max()
    df['MA14_high'] = df['High'].rolling(window=14).max()
    df['MA20dSTD'] = df['Close'].rolling(window=20).std() 
    
    # Exponential Moving Averages (EMAS) - different periods
    df['EMA12'] = df['Close'].ewm(span=12, adjust=False).mean()
    df['EMA20'] = df['Close'].ewm(span=20, adjust=False).mean()
    df['EMA26'] = df['Close'].ewm(span=26, adjust=False).mean()
    df['EMA100'] = df['Close'].ewm(span=100, adjust=False).mean()
    df['EMA200'] = df['Close'].ewm(span=200, adjust=False).mean()

    # Shifted closing prices (shift(-1) and shift(-2) pull in the close of the next one and two days)
    df['close_shift-1'] = df.shift(-1)['Close']
    df['close_shift-2'] = df.shift(-2)['Close']

    # Bollinger Bands
    df['Bollinger_Upper'] = df['MA20'] + (df['MA20dSTD'] * 2)
    df['Bollinger_Lower'] = df['MA20'] - (df['MA20dSTD'] * 2)
    
    # Stochastic oscillator (%K and its 3-day average, stored here as StochRSI)
    df['K-ratio'] = 100*((df['Close'] - df['MA14_low']) / (df['MA14_high'] - df['MA14_low']) )
    df['StochRSI'] = df['K-ratio'].rolling(window=3).mean() 

    # Moving Average Convergence/Divergence (MACD)
    df['MACD'] = df['EMA12'] - df['EMA26']
    
    # Replace NaNs with the last closing price
    nareplace = df.at[df.index.max(), 'Close']    
    df.fillna((nareplace), inplace=True)
    
    return df

Now that we have created a number of features, we are going to limit them. Then we create a plot that shows us, as in a typical chart view, which features are actually taken into account when training the model.

# List of considered Features
FEATURES = [
#             'High',
#             'Low',
#             'Open',
              'Close',
#             'Volume',
              'Date',
#             'Day',
#             'Month',
#             'Year',
#             'Adj Close',
#             'close_shift-1',
#             'close_shift-2',
#             'MACD',
#             'StochRSI',
#             'MA200',
#             'MA200_high',
#             'MA200_low',
              'Bollinger_Upper',
              'Bollinger_Lower',
#             'MA100',            
#             'MA50',
#             'MA26',
#             'MA14_low',
#             'MA14_high',
#             'MA12',
#             'EMA20',
#             'EMA100',
#             'EMA200',
#             'DIFF-MA200-MA50',
#             'DIFF-MA200-MA100',
#             'DIFF-MA200-CLOSE',
#             'DIFF-MA100-CLOSE',
#             'DIFF-MA50-CLOSE'
           ]

# Create the dataset with features
data = createFeatures(train_df)

# Shift the start of the timeframe by 10 months so that the long moving averages have enough history
use_start_date = pd.to_datetime("2010-11-01" )
data = data[data['Date'] > use_start_date].copy()

# Filter the data to the list of FEATURES
data_filtered = data[FEATURES]

# We add a prediction column and set dummy values to prepare the data for scaling
data_filtered_ext = data_filtered.copy()
data_filtered_ext['Prediction'] = data_filtered_ext['Close'] 
print(data_filtered_ext.tail().to_string())


# Remove the Date and Prediction columns before plotting and scaling
dfs = data_filtered_ext.copy()
del dfs['Date']
del dfs['Prediction']

# Register matplotlib converters
register_matplotlib_converters()

# Define plot parameters 
nrows = dfs.shape[1]
fig, ax = plt.subplots(figsize=(16, 8))
x = data_filtered_ext['Date']
assetname_list = []

# Plot each column
for i in range(nrows):
    assetname = dfs.columns[i]
    y = data_filtered_ext[assetname]
    ax.plot(x, y, label=assetname, linewidth=1.0)
    assetname_list.append(assetname)

# Configure and show the plot    
ax.set_title(stockname + ' price chart')
ax.legend()
ax.tick_params(axis="x", rotation=90, labelsize=10, length=0)   
plt.show()
NASDAQ price chart with the closing price and the upper and lower Bollinger Bands

4) Scaling and Transforming the Data

Before we can start training our model, we need to scale and transform the data. This step also includes dividing the data into training and test set.

Most of the code used in this section is taken from my previous article on multivariate time-series prediction, where you can find more information about the different steps of transforming the data.

# Calculate the number of rows in the data
nrows = dfs.shape[0]
np_data_unscaled = np.reshape(np.array(dfs), (nrows, -1))
print(np_data_unscaled.shape)

# Transform the data: the RobustScaler removes the median and scales each feature by its interquartile range
scaler = RobustScaler()
np_data = scaler.fit_transform(np_data_unscaled)

# Creating a separate scaler that works on a single column for scaling predictions
scaler_pred = RobustScaler()
df_Close = pd.DataFrame(data_filtered_ext['Close'])
np_Close_scaled = scaler_pred.fit_transform(df_Close)
Out: (2619, 6)

After scaling the data, we split it into a training and a test set. x_train and x_test contain the data with our selected features, while y_train and y_test contain the actual values that our model will try to predict.

# Define Sequence Length
sequence_length = 40

# Get the number of rows to train the model on 80% of the data 
train_data_len = math.ceil(np_data.shape[0] * 0.8) 

# Create the training data with 80% of the full dataset
train_data = np_data[0:train_data_len, :]
x_train, y_train = [], []

# The RNN needs data with the format of [samples, time steps, features].
# Here, we create N samples, sequence_length time steps per sample, and z features
for i in range(sequence_length, train_data_len):
    x_train.append(train_data[i-sequence_length:i,:]) #contains n values 0-sequence_length * columns
    y_train.append(train_data[i, 0]) #contains the prediction values for validation

# Convert the x_train and y_train to numpy arrays
x_train, y_train = np.array(x_train), np.array(y_train)

# Create the test data with the remaining 20% of the dataset
test_data = np_data[train_data_len - sequence_length:, :]

# Split the test data into x_test and y_test
x_test, y_test = [], []
test_data_len = test_data.shape[0]
for i in range(sequence_length, test_data_len):
    x_test.append(test_data[i-sequence_length:i,:]) #contains n values 0-sequence_length * columns
    y_test.append(test_data[i, 0]) #contains the prediction values for validation
    
# Convert x_test and y_test to numpy arrays
x_test, y_test = np.array(x_test), np.array(y_test)

# Print Shapes
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)
Out:
(1914, 30, 1) (1914,)
(486, 30, 1) (486,)

5) Train the Prediction Model

Next, it's time to train our prediction model! We will use a neural network with three LSTM layers and two dense layers. This architecture is not too complex and is well suited to experimenting with different features. The input of our model is a three-dimensional matrix whose shape is defined by the number of features and the length of the time window (sequence_length).

The following code will define the model, train it and then print the training loss curve.

# Configure the neural network model
model = Sequential()

# Configure the Neural Network Model with n Neurons - inputshape = t Timestamps x f Features
n_neurons = x_train.shape[1] * x_train.shape[2]
print('timesteps: ' + str(x_train.shape[1]) + ',' + ' features:' + str(x_train.shape[2]))
model.add(LSTM(n_neurons, return_sequences=True, input_shape=(x_train.shape[1], x_train.shape[2]))) 
#model.add(Dropout(0.1))
model.add(LSTM(n_neurons, return_sequences=True))
#model.add(Dropout(0.1))
model.add(LSTM(n_neurons, return_sequences=False))
model.add(Dense(32))
model.add(Dense(1, activation='relu'))

# Configure the Model  
optimizer='adam'; loss='mean_squared_error'; epochs = 100; batch_size = 32; patience = 8; learn_rate = 0.06
adam = Adam(lr=learn_rate)
parameter_list = ['epochs ' + str(epochs), 'batch_size ' + str(batch_size), 'patience ' + str(patience), 'optimizer ' + str(optimizer) + ' with learn rate ' + str(learn_rate), 'loss ' + str(loss)]
print('Parameters: ' + str(parameter_list))

# Compile and train the model (pass the Adam instance so the configured learning rate is used)
model.compile(optimizer=adam, loss=loss)
early_stop = EarlyStopping(monitor='loss', patience=patience, verbose=1)
history = model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, callbacks=[early_stop], shuffle = True,
                  validation_data=(x_test, y_test))

# Plot training & validation loss values
fig, ax = plt.subplots(figsize=(6, 6), sharex=True)
plt.plot(history.history["loss"])
plt.title("Model loss")
plt.ylabel("Loss")
plt.xlabel("Epoch")
ax.xaxis.set_major_locator(plt.MaxNLocator(epochs))
plt.legend(["Train", "Test"], loc="upper left")
plt.grid()
plt.show()
Loss Curve

Loss drops quickly and the training process looks promising.

6) Evaluate Model Performance

If we test a feature, we also want to know how it impacts the performance of our model. Feature engineering is therefore closely related to evaluating model performance. So, let's take a look at how our prediction model performs. For this purpose, we score the model on the test dataset (x_test) and then visualize the predictions together with the actual values (y_test) in a chart.

# Get the predicted values
y_pred = model.predict(x_test)

# Invert the scaling of the predicted and actual values
pred_unscaled = scaler_pred.inverse_transform(y_pred.reshape(-1, 1))
y_test_unscaled = scaler_pred.inverse_transform(y_test.reshape(-1, 1))

# Mean Absolute Percentage Error (MAPE)
MAPE = np.mean((np.abs(np.subtract(y_test_unscaled, pred_unscaled)/ y_test_unscaled))) * 100
print('Mean Absolute Percentage Error (MAPE): ' + str(np.round(MAPE, 2)) + ' %')

# Median Absolute Percentage Error (MDAPE)
MDAPE = np.median((np.abs(np.subtract(y_test_unscaled, pred_unscaled)/ y_test_unscaled)) ) * 100
print('Median Absolute Percentage Error (MDAPE): ' + str(np.round(MDAPE, 2)) + ' %')

# Mean Absolute Error (MAE)
print('Mean Absolute Error (MAE): ' + str(np.round(mean_absolute_error(y_test_unscaled, pred_unscaled), 4)))

# Mean Squared Error (MSE)
print('Mean Squared Error (MSE): ' + str(np.round(mean_squared_error(y_test_unscaled, pred_unscaled), 4)))

# The date from which on the data is displayed
display_start_date = pd.Timestamp('today') - timedelta(days=500)

# Add the date column
data_filtered_sub = data_filtered.copy()
# Filter the date index to the same timeframe as the feature data
date_index = date_index_df[date_index_df['Date'] > use_start_date].copy()
data_filtered_sub['Date'] = date_index

# Add the predictions and their percentage deviation from the actual close prices
train = data_filtered_sub[:train_data_len + 1]
valid = data_filtered_sub[train_data_len:]
valid.insert(1, "Prediction", pred_unscaled.ravel(), True)
valid.insert(1, "Percentage Deviation", (valid["Prediction"] - valid["Close"]) * 100 / valid["Close"], True)

# Zoom in to a closer timeframe
valid = valid[valid['Date'] > display_start_date]
train = train[train['Date'] > display_start_date]

# Visualize the data
fig, ax1 = plt.subplots(figsize=(22, 10), sharex=True)
xt = train['Date']; yt = train[["Close"]]
xv = valid['Date']; yv = valid[["Close", "Prediction"]]
plt.title("Predictions vs Actual Values", fontsize=24)
plt.ylabel(stockname, fontsize=18)
plt.plot(xt, yt, color="#039dfc", linewidth=2.0)
plt.plot(xv, yv["Prediction"], color="#E91D9E", linewidth=1.0)
plt.plot(xv, yv["Close"], color="black", linewidth=1.0)
plt.legend(["Train", "Test Predictions", "Actual Values"], loc="upper left")

# Create the bar plot with the differences
x = valid['Date']
y = valid["Percentage Deviation"]

# Create custom color range for positive and negative differences
valid.loc[y >= 0, 'diff_color'] = "#2BC97A"
valid.loc[y < 0, 'diff_color'] = "#C92B2B"

#Configure Axis 1
ax1.set_ylim([0,max(valid["Close"])])

#Configure Axis 2
ax2 = ax1.twinx()  # instantiate a second axes that shares the same x-axis
ax2.set_ylabel('Prediction Error in %', color='tab:blue', fontsize=20)  # we already handled the x-label with ax1
ax2.tick_params(axis='y')
ax2.set_yticks(np.arange(-50, 50, 5.0))
ax2.set_ylim([-50,50])
plt.bar(x, y, width=0.5, color=valid['diff_color'])
#ax1.xaxis.set_major_locator(plt.MaxNLocator(x.count()))

ax1.annotate('features: ' + str(assetname_list) + '--- performance: MAPE: ' + str(MAPE) + ', MDAPE: ' + str(MDAPE), xy=(0.07, .04), xycoords='figure fraction', 
    horizontalalignment='left', verticalalignment='bottom', fontsize=11)
ax1.annotate('hyperparameters: ' + str(parameter_list), xy=(0.07, .005), xycoords='figure fraction', 
    horizontalalignment='left', verticalalignment='bottom', fontsize=11)

#Plot the chart
ax1.xaxis.grid()
plt.show()
Mean Absolute Percentage Error (MAPE): 1.75 % 
Median Absolute Percentage Error (MDAPE): 1.08 % 
Mean Absolute Error (MAE): 138.8632 
Mean Squared Error (MSE): 40948.7719

On average, the predictions of our model deviate from the actual values by 1.75% (the median deviation is 1.08%). Well, only about one percent? This may not sound like a big problem at first. But imagine you based your daily trading actions on these predictions: an error of this size can quickly add up and lead to steady losses.

7) Overview of Selected Models

While creating this tutorial, I tested a variety of models based on different features. The configuration of the hyperparameters and the architecture of the neural network were the same as above in the tutorial, except for the learning rate, for which I tested different values. Each configuration used a subset of the following features: Close, Bollinger_Upper, Bollinger_Lower, RSI, DIFF-MA50-CLOSE, DIFF-MA100-CLOSE, DIFF-MA200-CLOSE, Volume, MACD, MA12, MA14_low, and MA14_high. The results of these different model configurations are:

 #    Learning rate    Features    MAPE      MDAPE
 1    0.050            3           1.47 %    0.91 %
 2    0.050            4           1.86 %    1.17 %
 4    0.050            4           2.00 %    1.28 %
 5    0.010            3           2.39 %    1.36 %
 6    0.051            3           1.81 %    1.37 %
 7    0.050            1           1.76 %    1.47 %
 8    0.002            1           1.98 %    1.60 %
 9    0.010            9           3.93 %    1.72 %
10    0.010            4           2.41 %    1.87 %
 3    0.050            3           2.35 %    2.00 %
11    0.001            4           2.54 %    2.05 %
12    0.001            9           3.24 %    2.18 %
13    0.050            5           3.37 %    2.20 %
14    0.060            3           2.58 %    2.33 %
15    0.020            6           2.70 %    2.58 %
16    0.001            7           3.15 %    2.91 %
17    0.049            3           3.71 %    3.60 %
Overview of different model configurations and their performance (sorted by MDAPE)

8) Conclusions

The results allow various conclusions to be drawn:

  • Of the tested configurations, a learning rate of 0.05 achieved the best results.
  • Of all the features, only the Bollinger Bands had a clearly positive effect on model performance.
  • As expected, performance tends to decrease as the number of features grows.
  • In general, the performance of the models seems to be influenced more by the hyperparameters than by the choice of features.

Finally, one should also keep in mind that these results are based on configurations in which only the learning rate was adjusted, while all other parameters such as the optimizer, the architecture of the neural network, and the sequence_length remained the same. So there is plenty of room for improvement and experimentation. With more time for experiments and more computational power, it will certainly be possible to identify better combinations of features and model configurations. So, have fun experimenting! 🙂

Summary

In this tutorial, you have learned how to perform feature engineering for stock market prediction. We developed different features to train our model and then measured and illustrated the performance of the resulting models. Finally, the tutorial gave an overview of how the prediction performance of different model configurations compares.

If you have understood the essential steps, then you are well prepared to apply this knowledge to any other multivariate time series prediction problem.

2 Responses

  1. Shawn P

    Do the scikit-learn scalers use forward information? For example, scaling data from Jan 12 using a maximum that occurred after Jan 12? Should you only use rolling scalers?

    • Florian Müller

      Hi Shawn, in general the Scikit-learn scalers use all the data we provide them. So if you have large outliers in your data, it is a good idea to treat them before scaling.
