Feature Engineering for Multivariate Time-Series Prediction with Python

Multivariate time series forecasting models often do not rely exclusively on historical time series data but use additional features such as moving averages or momentum indicators. The underlying assumption is that multiple variables increase the accuracy of a forecast by helping the model identify patterns in the historical data indicative of future price movements. Creating these variables is called feature engineering. It plays an essential role in stock market forecasting, where it draws on various metrics from chart analysis. In this article, we use the example of stock market forecasting to show how feature engineering works. For this purpose, we create several features (e.g., Bollinger bands, RSI, Moving Averages) and use them for training a recurrent neural network with LSTM layers using Python and Keras.

This article has two parts: The conceptual part briefly introduces metrics from financial analysis, such as the RSI and the moving average. In the practical part, we then develop multivariate time series models for stock market forecasting in Python. The model is a recurrent neural network with LSTM layers based on the Keras library. We engineer a variety of KPIs and use a selection of them to train different model variations. Finally, we compare the performance of these models and draw conclusions about the influence of the KPIs on prediction quality.

A multivariate time-series forecast, as we will create it in this article.

New to time series modeling?
Consider starting with the following tutorial on univariate time series models: Stock-market forecasting using Keras Recurrent Neural Networks and Python.

Feature Engineering in Stock Market Forecasting

The training of machine learning algorithms always requires some input data. This input is typically in the form of structured columns, which are the model features. Which features lead to good results depends on the application context and the data used. With the number of features, model complexity and training time increase, but not necessarily performance. Simply adding features won’t do the trick and can even decrease model performance. Instead, the real challenge is finding the right combination of features and creating an input shape that enables the model to detect meaningful patterns.

The process of checking and selecting features is exploratory and characterized by trial and error, which can be very time-consuming, especially in less familiar application areas. However, in some common application domains, we can draw upon established features and don’t have to develop everything from scratch. Stock market forecasting is an excellent example of such an established domain. In this area, many indicators are available from chart analysis, which we can use as features for our model.

Chart analysis aims to forecast future price developments by studying historical prices and trading volume. The underlying assumption is that specific patterns or chart formations in the data can signal the timing of beneficial buying or selling decisions. When we develop predictive machine learning models, the difference from chart analysis is that we do not analyze the chart manually ourselves but instead create a machine learning model, for example, a recurrent neural network, that does the job for us.

Exemplary chart with technical indicators (Bollinger bands, RSI, and Double-EMA)

Does this Really Work?

It is essential to point out that the effectiveness of chart analysis and algorithmic trading is controversial. There is at least as much controversy about whether it is possible to predict stock market prices with neural networks. Various studies have examined the effectiveness of chart analysis without coming to a clear conclusion. One of the most significant points of criticism is that it cannot take external events into account. Nevertheless, many financial analysts take financial indicators into account when making investment decisions, so a lot of money is moved simply because many people believe in statistical indicators.

So, without knowing how well this will work, it is worth attempting to feed a neural network with different financial indicators. But first and foremost, I see this as an excellent way to show how feature engineering works. Just make sure not to rely blindly on the predictions of these models.

Selected Statistical Indicators

The following indicators are commonly used in chart analysis and may be helpful when creating forecasting models:

Relative Strength Index (RSI)

The Relative Strength Index (RSI) is one of the most commonly used oscillating indicators. It was developed in 1978 by Welles Wilder to determine the momentum of price movements and compares the strength of price losses in a period with price gains. It can take percentage values between 0 and 100.

Reference lines indicate how long an existing trend is likely to last before we can expect a trend reversal. In other words, when the price is heavily oversold or overbought, one should expect a trend reversal.

  • With an upward trend, the reference lines are at 40% (oversold) and 80% (overbought)
  • With a downtrend, the reference lines are at 20% (oversold) and 60% (overbought)

The formula for the RSI is as follows:

First, calculate the sum of all positive and the sum of all negative price changes in a period (e.g., 30 days).

The mean value of each sum is then calculated: average gain = sum of gains / period, and average loss = sum of losses / period.

Finally, we calculate the relative strength, RS = average gain / average loss, and from it the RSI:

RSI = 100 - 100 / (1 + RS)
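To make these steps concrete, here is a minimal pandas sketch of the calculation described above. It is an illustration, not part of the later model code: the function name rsi, the Series name close, and the 30-day period are assumptions, and it uses simple rolling means, whereas Wilder's original RSI smooths the averages exponentially.

# Minimal RSI sketch (assumes a pandas Series named 'close' with closing prices)
import pandas as pd

def rsi(close: pd.Series, period: int = 30) -> pd.Series:
    delta = close.diff()                            # day-over-day price changes
    gains = delta.clip(lower=0)                     # positive changes; negatives set to 0
    losses = -delta.clip(upper=0)                   # negative changes as positive values
    avg_gain = gains.rolling(window=period).mean()  # average gain over the period
    avg_loss = losses.rolling(window=period).mean() # average loss over the period
    rs = avg_gain / avg_loss                        # relative strength
    return 100 - 100 / (1 + rs)                     # RSI between 0 and 100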

Simple Moving Averages (SMA)

The Simple Moving Average (SMA) is another technical indicator that financial analysts use to determine whether an asset price will continue or reverse a trend. The SMA is calculated as the average of all values within a certain period. Financial analysts pay close attention to the 200-day SMA (SMA-200). When the price crosses the SMA, this may signal a trend reversal. Further SMAs are often calculated for 50-day (SMA-50) and 100-day (SMA-100) periods. In this regard, two popular trading patterns are the death cross and the golden cross.

  • A death cross occurs when the trend line of the SMA-50/100 crosses below the 200-day SMA. This suggests that a falling trend will likely accelerate downwards.
  • A golden cross occurs when the trend line of the SMA-50/100 crosses over the 200-day SMA. This suggests that a rising trend will likely accelerate upwards.

We can use the SMA as model input, for example, by measuring the distance between two trend lines, as sketched below.
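Here is a minimal sketch of how this could look in pandas. It assumes a DataFrame named df with a 'Close' column; the column and variable names are chosen for illustration and are not part of the later tutorial code.

# SMA trend lines and their distance as a candidate model feature
# (assumes a pandas DataFrame 'df' with a 'Close' column)
df['SMA50'] = df['Close'].rolling(window=50).mean()
df['SMA200'] = df['Close'].rolling(window=200).mean()
df['DIFF-SMA50-SMA200'] = df['SMA50'] - df['SMA200']

# Golden cross: SMA50 crosses above SMA200; death cross: SMA50 crosses below
golden_cross = (df['SMA50'] > df['SMA200']) & (df['SMA50'].shift(1) <= df['SMA200'].shift(1))
death_cross = (df['SMA50'] < df['SMA200']) & (df['SMA50'].shift(1) >= df['SMA200'].shift(1))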

Exponential Moving Averages (EMA)

The Exponential Moving Average (EMA) is another lagging trend indicator. Similar to the SMA, it is used to measure the strength of a price trend. The difference between SMA and EMA is that the SMA assigns equal weight to all price points, while the EMA uses a multiplier that weights recent prices more heavily.

The formula for the EMA is as follows: Calculating the EMA for a given data point requires past price values. For the first data point, we seed the EMA with a simple average, for example, the mean of the price values of the past 30 days. For all further data points, a multiplier weights the current price against the previous EMA. The formula for this multiplier is: smoothing factor / (days + 1).

It is common to use different smoothing factors. For a 30-day moving average with a smoothing factor of 2, the multiplier would be 2 / (30 + 1) ≈ 0.0645.

As soon as we have calculated the EMA for the first data point, we can use the following formula to calculate the EMA for all subsequent data points: EMA = closing price x multiplier + EMA (previous day) x (1 - multiplier)
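The following sketch illustrates this recursion. It assumes a pandas Series named close and a smoothing factor of 2; seeding the EMA with the first closing price is one common convention (another is to seed with the SMA of the first period).

# EMA recursion for a 30-day period (assumes a pandas Series named 'close')
span = 30
multiplier = 2 / (span + 1)   # ≈ 0.0645

ema = close.iloc[0]           # seed the EMA with the first closing price
ema_values = []
for price in close:
    ema = price * multiplier + ema * (1 - multiplier)
    ema_values.append(ema)

# pandas implements the same recursion directly:
ema_pandas = close.ewm(span=span, adjust=False).mean()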

Feature Engineering for Time Series Prediction Models in Python

In the following, this tutorial will guide you through the process of implementing a multivariate time series prediction model for the NASDAQ stock market index. You will learn how to implement and use different features to train the model and measure model performance.

Prerequisites

Before we start the coding part, make sure that you have set up your Python 3 environment and required packages. If you don’t have an environment set up yet, you can follow this tutorial to set up the Anaconda environment.

Also, make sure you install all required packages. In this tutorial, we will be working with the following standard packages: numpy, pandas, matplotlib, and seaborn, as well as math and datetime from the Python standard library.

In addition, we will be using Keras (2.0 or higher) with the TensorFlow backend to train the neural network, the machine learning library scikit-learn, and pandas-datareader. You can install these packages using the following console commands:

  • pip install <package name>
  • conda install <package name> (if you are using the anaconda packet manager)

Step #1 Load the Data

Let’s start by setting up the imports and loading the data. We will use price data for the NASDAQ composite index (symbol: ^IXIC) from finance.yahoo.com in our Python project.

# Remote data access for pandas
import pandas_datareader as webreader
# Mathematical functions 
import math 
# Fundamental package for scientific computing with Python
import numpy as np 
# Additional functions for analysing and manipulating data
import pandas as pd 
# Date Functions
from datetime import date, timedelta, datetime
# This function adds plotting functions for calendar dates
from pandas.plotting import register_matplotlib_converters
# Important package for visualization - we use this to plot the market data
import matplotlib.pyplot as plt 
# Formatting dates
import matplotlib.dates as mdates
# Statistical data visualization - used for the line plots in step #2
import seaborn as sns
# Packages for measuring model performance / errors
from sklearn.metrics import mean_absolute_error, mean_squared_error
# Deep learning library, used for neural networks
from keras.models import Sequential 
# Deep learning classes for recurrent and regular densely-connected layers
from keras.layers import LSTM, Dense, Dropout
# Adam optimizer with a configurable learning rate
from keras.optimizers import Adam
# EarlyStopping during model training
from keras.callbacks import EarlyStopping
# This Scaler removes the median and scales the data according to the quantile range to normalize the price data 
from sklearn.preprocessing import RobustScaler

# Setting the timeframe for the data extraction
today = date.today()
date_today = today.strftime("%Y-%m-%d")
date_start = '2010-01-01'

# Getting NASDAQ quotes
stockname = 'NASDAQ'
symbol = '^IXIC'
df = webreader.DataReader(
    symbol, start=date_start, end=date_today, data_source="yahoo"
)

# Quick overview of dataset
train_dfs = df.copy()
train_dfs

Step #2 Explore the Data

Let’s take a quick look at the data by creating line charts for the columns of our data set.

# Plot line charts
df_plot = train_dfs.copy()

list_length = df_plot.shape[1]
ncols = 2
nrows = int(round(list_length / ncols, 0))

fig, ax = plt.subplots(nrows=nrows, ncols=ncols, sharex=True, figsize=(14, 7))
fig.subplots_adjust(hspace=0.5, wspace=0.5)
for i in range(0, list_length):
        ax = plt.subplot(nrows,ncols,i+1)
        sns.lineplot(data = df_plot.iloc[:, i], ax=ax)
        ax.set_title(df_plot.columns[i])
        ax.tick_params(axis="x", rotation=30, labelsize=10, length=0)
        ax.xaxis.set_major_locator(mdates.AutoDateLocator())
fig.tight_layout()
plt.show()
Line charts of the columns in our initial dataset

Our initial dataset includes six features: High, Low, Open, Close, Volume, and Adj Close.

Step #3 Feature Engineering

Now comes the exciting part – we will implement additional features. For this, we can use various indicators from chart analysis, for example, moving averages over different periods and stochastic oscillators that measure momentum.

# Sort the data by date
train_df = train_dfs.sort_values(by=['Date']).copy()

# We save a copy of the dates index before we reset it to a numerical index
date_index = train_df.index
date_index_df = pd.DataFrame(date_index)

# Adding Day, Month, and Year in separate columns
d = pd.to_datetime(train_df.index)
train_df['Day'] = d.strftime("%d") 
train_df['Month'] = d.strftime("%m") 
train_df['Year'] = d.strftime("%Y") 

# We reset the index, so we can convert the date-index to a number-index
train_df.reset_index(level=0, inplace=True)
train_df.tail(5)

With the following code, we create a set of indicators for the training data. However, we will narrow this selection down in the next step, since a model with all these indicators does not achieve good results and would take far too long to train on a local computer.

# Feature Engineering
def createFeatures(df):
    df = pd.DataFrame(df)

    # Moving averages - different periods
    df['MA200'] = df['Close'].rolling(window=200).mean() 
    df['MA100'] = df['Close'].rolling(window=100).mean() 
    df['MA50'] = df['Close'].rolling(window=50).mean() 
    df['MA26'] = df['Close'].rolling(window=26).mean() 
    df['MA20'] = df['Close'].rolling(window=20).mean() 
    df['MA12'] = df['Close'].rolling(window=12).mean() 
    
    # SMA Differences - different periods
    df['DIFF-MA200-MA50'] = df['MA200'] - df['MA50']
    df['DIFF-MA200-MA100'] = df['MA200'] - df['MA100']
    df['DIFF-MA200-CLOSE'] = df['MA200'] - df['Close']
    df['DIFF-MA100-CLOSE'] = df['MA100'] - df['Close']
    df['DIFF-MA50-CLOSE'] = df['MA50'] - df['Close']
    
    # Moving Averages on high, lows, and std - different periods
    df['MA200_low'] = df['Low'].rolling(window=200).min()
    df['MA14_low'] = df['Low'].rolling(window=14).min()
    df['MA200_high'] = df['High'].rolling(window=200).max()
    df['MA14_high'] = df['High'].rolling(window=14).max()
    df['MA20dSTD'] = df['Close'].rolling(window=20).std() 
    
    # Exponential Moving Averages (EMAS) - different periods
    df['EMA12'] = df['Close'].ewm(span=12, adjust=False).mean()
    df['EMA20'] = df['Close'].ewm(span=20, adjust=False).mean()
    df['EMA26'] = df['Close'].ewm(span=26, adjust=False).mean()
    df['EMA100'] = df['Close'].ewm(span=100, adjust=False).mean()
    df['EMA200'] = df['Close'].ewm(span=200, adjust=False).mean()

    # Shifted closing prices: the close of the following day and of two days ahead
    # (note: using these as input features would leak future information into the model)
    df['close_shift-1'] = df.shift(-1)['Close']
    df['close_shift-2'] = df.shift(-2)['Close']

    # Bollinger Bands
    df['Bollinger_Upper'] = df['MA20'] + (df['MA20dSTD'] * 2)
    df['Bollinger_Lower'] = df['MA20'] - (df['MA20dSTD'] * 2)
    
    # Stochastic Oscillator: %K ratio and its 3-day moving average (here labeled StochRSI)
    df['K-ratio'] = 100*((df['Close'] - df['MA14_low']) / (df['MA14_high'] - df['MA14_low']) )
    df['StochRSI'] = df['K-ratio'].rolling(window=3).mean() 

    # Moving Average Convergence/Divergence (MACD)
    df['MACD'] = df['EMA12'] - df['EMA26']
    
    # Replace NaNs with the last closing price
    nareplace = df.at[df.index.max(), 'Close']    
    df.fillna((nareplace), inplace=True)
    
    return df

Now that we have created several features, we will limit the selection. Then we create a plot that shows, as in a typical chart view, which features are used to train the model.

# List of considered Features
FEATURES = [
#             'High',
#             'Low',
#             'Open',
              'Close',
#             'Volume',
              'Date',
#             'Day',
#             'Month',
#             'Year',
#             'Adj Close',
#             'close_shift-1',
#             'close_shift-2',
#             'MACD',
#             'StochRSI',
#             'MA200',
#             'MA200_high',
#             'MA200_low',
              'Bollinger_Upper',
              'Bollinger_Lower',
#             'MA100',            
#             'MA50',
#             'MA26',
#             'MA14_low',
#             'MA14_high',
#             'MA12',
#             'EMA20',
#             'EMA100',
#             'EMA200',
#             'DIFF-MA200-MA50',
#             'DIFF-MA200-MA100',
#             'DIFF-MA200-CLOSE',
#             'DIFF-MA100-CLOSE',
#             'DIFF-MA50-CLOSE'
           ]

# Create the dataset with features
data = createFeatures(train_df)

# Skip the first 10 months, for which the long moving averages (e.g., MA200) contain no values
use_start_date = pd.to_datetime("2010-11-01")
data = data[data['Date'] > use_start_date].copy()

# Filter the data to the list of FEATURES
data_filtered = data[FEATURES]

# We add a prediction column and set dummy values to prepare the data for scaling
data_filtered_ext = data_filtered.copy()
data_filtered_ext['Prediction'] = data_filtered_ext['Close'] 
print(data_filtered_ext.tail().to_string())


# Remove the Date and Prediction columns before training
dfs = data_filtered_ext.copy()
del dfs['Date']
del dfs['Prediction']

# Register matplotlib converters
register_matplotlib_converters()

# Define plot parameters 
ncolumns = dfs.shape[1]
fig, ax = plt.subplots(figsize=(16, 8))
x = data_filtered_ext['Date']
assetname_list = []

# Plot each column
for i in range(ncolumns):
    assetname = dfs.columns[i]
    y = data_filtered_ext[assetname]
    ax.plot(x, y, label=assetname, linewidth=1.0)
    assetname_list.append(assetname)

# Configure and show the plot    
ax.set_title(stockname + ' price chart')
ax.legend()
ax.tick_params(axis="x", rotation=90, labelsize=10, length=0)   
plt.show()
NASDAQ Price Chart with upper and lower Bollinger Bands and the simple MA200

Step #4 Scaling and Transforming the Data

Before we can start training our model, we need to scale and transform the data. This step also includes dividing the data into training and test sets.

Most of the code used in this section stems from the previous article on multivariate time-series prediction, which covers the steps to transform the data in more detail.

# Calculate the number of rows in the data
nrows = dfs.shape[0]
np_data_unscaled = np.reshape(np.array(dfs), (nrows, -1))
print(np_data_unscaled.shape)

# Transform the data: the RobustScaler removes the median and scales each feature by its interquartile range
scaler = RobustScaler()
np_data = scaler.fit_transform(np_data_unscaled)

# Creating a separate scaler that works on a single column for scaling predictions
scaler_pred = RobustScaler()
df_Close = pd.DataFrame(data_filtered_ext['Close'])
np_Close_scaled = scaler_pred.fit_transform(df_Close)
Out: (2619, 6)

After we have scaled the data, we will split it into a train and a test set. x_train and x_test contain the data with our selected features. The two sets y_train and y_test contain the actual values, which our model will try to predict.

# Set the sequence length - the number of past time steps used to make a single prediction
sequence_length = 40

# Index of the Close column, which is the prediction target
index_Close = 0

# Split the data into train and test data sets
# As a first step, we get the number of rows to train the model on 80% of the data 
train_data_len = math.ceil(np_data.shape[0] * 0.8)

# Create the training and test data from the scaled multivariate data
# (np_Close_scaled contains only the Close column and would reduce the model to a univariate one)
train_data = np_data[0:train_data_len, :]
test_data = np_data[train_data_len - sequence_length:, :]

# The RNN needs data with the format of [samples, time steps, features]
# Here, we create N samples, with sequence_length time steps each and one column per selected feature
def partition_dataset(sequence_length, data):
    x, y = [], []
    data_len = data.shape[0]
    for i in range(sequence_length, data_len):
        x.append(data[i-sequence_length:i,:]) # contains sequence_length values for all feature columns
        y.append(data[i, index_Close]) # contains the prediction value (the Close column) for single-step prediction
    
    # Convert the x and y to numpy arrays
    x = np.array(x)
    y = np.array(y)
    return x, y

# Generate training data and test data
x_train, y_train = partition_dataset(sequence_length, train_data)
x_test, y_test = partition_dataset(sequence_length, test_data)

# Print the shapes: the result is: (rows, training_sequence, features) (prediction value, )
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)

# Validate that the prediction value and the input match up
# The last close price of the second input sample should equal the first prediction value
print(x_train[1][sequence_length-1][index_Close])
print(y_train[0])
Out:
(1914, 30, 1) (1914,)
(486, 30, 1) (486,)

Step #5 Train the Time Series Forecasting Model

Now that we have prepared the data, it’s time to train our time series forecasting model. For this purpose, we will use a recurrent neural network from the Keras library. The model architecture looks as follows:

  • An LSTM layer that receives mini-batches of input sequences and returns the sequence
  • A second LSTM layer with the same number of neurons that also returns the sequence
  • A third LSTM layer that does not return the sequence
  • A Dense layer with 32 neurons
  • A Dense layer with a single neuron that outputs the forecast

The architecture is not too complex and is suitable for experimenting with different features. If you are wondering how I arrived at this architecture, the answer is that I tried several.

During model training, the neural network processes several mini-batches. The shape of each input sample is defined by the number of time steps and the number of features. In the code below, we also multiply these two dimensions (number of features x number of time steps) to set the number of neurons in the LSTM layers.

The following code defines the model architecture, trains the model, and then prints the training loss curve:

# Configure the neural network model
model = Sequential()

# Configure the Neural Network Model with n neurons - input shape = (t time steps, f features)
n_neurons = x_train.shape[1] * x_train.shape[2]
print('timesteps: ' + str(x_train.shape[1]) + ',' + ' features:' + str(x_train.shape[2]))
model.add(LSTM(n_neurons, return_sequences=True, input_shape=(x_train.shape[1], x_train.shape[2]))) 
#model.add(Dropout(0.1))
model.add(LSTM(n_neurons, return_sequences=True))
#model.add(Dropout(0.1))
model.add(LSTM(n_neurons, return_sequences=False))
model.add(Dense(32))
model.add(Dense(1)) # linear output for the regression (a relu activation would clip negative scaled values)

# Set the training parameters
loss = 'mean_squared_error'; epochs = 100; batch_size = 32; patience = 8; learn_rate = 0.06
optimizer = Adam(lr=learn_rate) # pass the Adam instance to compile() so that the learning rate is actually applied
parameter_list = ['epochs ' + str(epochs), 'batch_size ' + str(batch_size), 'patience ' + str(patience), 'optimizer adam with learn rate ' + str(learn_rate), 'loss ' + str(loss)]
print('Parameters: ' + str(parameter_list))

# Compile and train the model
model.compile(optimizer=optimizer, loss=loss)
early_stop = EarlyStopping(monitor='loss', patience=patience, verbose=1)
history = model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, callbacks=[early_stop], shuffle = True,
                  validation_data=(x_test, y_test))

# Plot training & validation loss values
fig, ax = plt.subplots(figsize=(6, 6), sharex=True)
plt.plot(history.history["loss"])
plt.plot(history.history["val_loss"])
plt.title("Model loss")
plt.ylabel("Loss")
plt.xlabel("Epoch")
ax.xaxis.set_major_locator(plt.MaxNLocator(epochs))
plt.legend(["Train", "Test"], loc="upper left")
plt.grid()
plt.show()
Loss curve of our time series prediction model for stock market forecasting

The loss drops quickly, and the training process looks promising.

Step #6 Evaluate Model Performance

If we test a feature, we also want to know how it impacts the performance of our model. Feature engineering is therefore closely related to evaluating model performance. So, let’s check the prediction performance. For this purpose, we score the model on the test dataset (x_test). Then we visualize the predictions together with the actual values (y_test) in a line chart.

# Get the predicted values
y_pred = model.predict(x_test)

# Revert the scaling of the predictions and the test values
pred_unscaled = scaler_pred.inverse_transform(y_pred.reshape(-1, 1))
y_test_unscaled = scaler_pred.inverse_transform(y_test.reshape(-1, 1))

# Mean Absolute Percentage Error (MAPE)
MAPE = np.mean((np.abs(np.subtract(y_test_unscaled, pred_unscaled)/ y_test_unscaled))) * 100
print('Mean Absolute Percentage Error (MAPE): ' + str(np.round(MAPE, 2)) + ' %')

# Median Absolute Percentage Error (MDAPE)
MDAPE = np.median((np.abs(np.subtract(y_test_unscaled, pred_unscaled)/ y_test_unscaled)) ) * 100
print('Median Absolute Percentage Error (MDAPE): ' + str(np.round(MDAPE, 2)) + ' %')

# Mean Absolute Error (MAE)
print('Mean Absolute Error (MAE): ' + str(np.round(mean_absolute_error(y_test_unscaled, pred_unscaled), 4)))

# Mean Squared Error (MSE)
print('Mean Squared Error (MSE): ' + str(np.round(mean_squared_error(y_test_unscaled, pred_unscaled), 4)))

# The date from which on the data is displayed
display_start_date = pd.Timestamp('today') - timedelta(days=500)

# Add the date column
data_filtered_sub = data_filtered.copy()
# Filter the date index to the used timeframe
date_index = date_index_df[date_index_df['Date'] > use_start_date].copy()
data_filtered_sub['Date'] = date_index

# Add the difference between the valid and predicted prices
train = data_filtered_sub[:train_data_len + 1]
valid = data_filtered_sub[train_data_len:]
valid.insert(1, "Prediction", pred_unscaled.ravel(), True)
valid.insert(1, "Percentage Deviation", (valid["Prediction"] - valid["Close"]) * 100 / valid["Close"], True)

# Zoom in to a closer timeframe
valid = valid[valid['Date'] > display_start_date]
train = train[train['Date'] > display_start_date]

# Visualize the data
fig, ax1 = plt.subplots(figsize=(22, 10), sharex=True)
xt = train['Date']; yt = train[["Close"]]
xv = valid['Date']; yv = valid[["Close", "Prediction"]]
plt.title("Predictions vs Actual Values", fontsize=24)
plt.ylabel(stockname, fontsize=18)
plt.plot(xt, yt, color="#039dfc", linewidth=2.0)
plt.plot(xv, yv["Prediction"], color="#E91D9E", linewidth=1.0)
plt.plot(xv, yv["Close"], color="black", linewidth=1.0)
plt.legend(["Train", "Test Predictions", "Actual Values"], loc="upper left")

# Create the bar plot with the differences
x = valid['Date']
y = valid["Percentage Deviation"]

# Create custom color range for positive and negative differences
valid.loc[y >= 0, 'diff_color'] = "#2BC97A"
valid.loc[y < 0, 'diff_color'] = "#C92B2B"

#Configure Axis 1
ax1.set_ylim([0,max(valid["Close"])])

#Configure Axis 2
ax2 = ax1.twinx()  # instantiate a second axes that shares the same x-axis
ax2.set_ylabel('Prediction Error in %', color='tab:blue', fontsize=20)  # we already handled the x-label with ax1
ax2.tick_params(axis='y')
ax2.set_yticks(np.arange(-50, 50, 5.0))
ax2.set_ylim([-50,50])
plt.bar(x, y, width=0.5, color=valid['diff_color'])
#ax1.xaxis.set_major_locator(plt.MaxNLocator(x.count()))

ax1.annotate('features: ' + str(assetname_list) + ' --- performance: MAPE: ' + str(np.round(MAPE, 2)) + ', MDAPE: ' + str(np.round(MDAPE, 2)), xy=(0.07, .04), xycoords='figure fraction', 
    horizontalalignment='left', verticalalignment='bottom', fontsize=11)
ax1.annotate('hyperparameters: ' + str(parameter_list), xy=(0.07, .005), xycoords='figure fraction', 
    horizontalalignment='left', verticalalignment='bottom', fontsize=11)

#Plot the chart
ax1.xaxis.grid()
plt.show()
Mean Absolute Percentage Error (MAPE): 1.75 % 
Median Absolute Percentage Error (MDAPE): 1.08 % 
Mean Absolute Error (MAE): 138.8632 
Mean Squared Error (MSE): 40948.7719

On average, the predictions of our model deviate from the actual values by one to two percent. Although this may not sound like much, prediction errors can quickly accumulate to larger values over time.

Step #7 Overview of Selected Models

In the course of writing this article, I tested a variety of models based on different features. The neural network architecture remained unchanged. Likewise, except for the learning rate, I kept the hyperparameters the same. The results of these model variants are shown below:

Performance of different variations of the multivariate Keras neural network model for stock market forecasting


Step #8 Conclusions

It isn’t easy to estimate in advance which combination of indicators will lead to good results. So there is no way around testing different variants. More indicators do not necessarily lead to better results because they increase the model complexity and add data without predictive power. This so-called noise makes it harder for the model to separate important influencing factors from less important ones. Also, each additional indicator increases the time needed to train the model.

Besides the features, various hyperparameters such as the learning rate, the optimizer, the batch size, and the selected time frame of the data (sequence_length) impact the model’s performance. Tuning these hyperparameters can improve it further. Based on the results, we can draw several conclusions:

  • From the tested configurations, a learning rate of 0.05 achieves the best results.
  • Of all features, only the Bollinger bands had a positive effect on the model performance.
  • As expected, the performance tends to decrease with the number of features.
  • In our case, the hyperparameters seem to affect the performance of the models more than the choice of features.

Finally, note that we optimized only a single hyperparameter: we searched for optimal learning rates and left all other parameters, such as the optimizer, the neural network architecture, and the sequence_length, unchanged.

There is plenty of room for improvement and experimentation. With more time for experiments and more computational power, it will undoubtedly be possible to identify better features and model configurations. So, have fun experimenting! 🙂

Summary

This article has demonstrated feature engineering for multivariate time series models using the example of stock market forecasting. We have developed various features known from chart analysis, such as the RSI, moving averages, and Bollinger bands. We experimented with selections of these features to train different variants of a Keras recurrent neural network. Finally, we compared the performance of the model variations.

In general, we should choose features as sparingly as possible. Beyond that, however, the findings for our model cannot be generalized. Which features help a model recognize patterns depends on the time series data at hand. If you have understood the essential steps, you are well prepared to apply feature engineering to other multivariate time series forecasting problems.

I hope you found this article helpful. If you have any remaining thoughts or questions, let me know in the comments.

Author

  • Hi, I am Florian, a Zurich-based consultant for AI and Data. Since the completion of my Ph.D. in 2017, I have been working on the design and implementation of ML use cases in the Swiss financial sector. I started this blog in 2020 with the goal in mind to share my experiences and create a place where you can find key concepts of machine learning and materials that will allow you to kick-start your own Python projects.
