Tensorflow Archives - relataly.com

Image Classification with Convolutional Neural Networks – Classifying Cats and Dogs in Python

Florian Follonier — Sun, 13 Dec 2020 14:09:31 +0000

This tutorial shows how to use Convolutional Neural Networks (CNNs) with Python for image classification. CNNs belong to the field of deep learning, a subarea of machine learning, and have become a cornerstone to many exciting innovations. There are endless applications, from self-driving cars over biometric security to automated tagging in social media. And the importance of CNNs grows steadily! So there are plenty of reasons to understand how this technology works and how we can implement it.

This article proceeds as follows: The first part introduces the core concepts behind CNNs and explains their use in image classification. The second part is a hands-on tutorial in which you will build your own CNN to distinguish images of cats and dogs. This tutorial develops a model that achieves around 82% validation accuracy. We will work with TensorFlow and Python to integrate different layers, such as Convolution Layers, Dense layers, and MaxPooling. Furthermore, we will prevent the network from overfitting the training data by using Dropout between the layers. We will also load the model and make predictions on a fresh set of images. Finally, we analyze and illustrate the performance of our image classifier.

Also: Generating Detailed Images with OpenAI DALL-E and ChatGPT in Python: A Step-By-Step API Tutorial

Image Classification with Convolutional Neural Networks

The history of image recognition dates back to the mid-1960s when the first attempts were made to identify objects by coding their characteristic shapes and lines. However, this task turned out to be incredibly complex. Our human brain is trained so well to recognize things that one can easily forget how diverse the observation conditions can be. Here are some examples:

Fotos can be taken from various viewpoints
Living things can have multiple forms and poses
Objects come in different forms, colors, and sizes
The picture may hide parts of the things in the picture
The light conditions vary from image to image
There may be one or multiple objects in the same image

At the beginning of the 1990s, the focus of research shifted to statistical approaches and learning algorithms.

The idea of computer vision is inspired by the fact that the visual cortex has cells activated by specific shapes and their orientation in the visual field.

The Emergence of CNNs

The basic concept of a neural network in computer vision has existed since the 1980s. It goes back to research from Hubel and Wiesel on the emergence of a cat’s visual system. They found that the visual cortex has cells activated by specific shapes and their orientation in the visual field. Some of their findings inspired the development of crucial computer vision technologies, such as, for example, hierarchical features with different levels of abstraction [1, 2]. However, it took another three decades of research and the availability of faster computers before the emergence of modern CNNs.

The year 2012 was a defining moment for the use of CNNs in image recognition. This year, for the first time, CNN won the ILSVRC competition for computer vision. The challenge was classifying more than a hundred thousand images into 1000 object categories. With an error rate of only 15,3%, the succeeding model was a CNN called “AlexNet.”.

AlexNet was the first model to achieve more than 75% accuracy. In the same year, CNNs succeeded in several other competitions. For example, in 2015, the CNN ResNet exceeded human performance in the ILSVRC competition. Only a decade ago, this achievement was considered almost impossible. So how was this performance increase possible? To understand this surge in performance, let us first look at what a picture is.

Top-performing models in the ImageNet image classification challenge (Alyafeai & Ghouti, 2019)

What is an Image?

A digital image is a three-dimensional array of integer values. One dimension of this array represents the pixel width, and one dimension represents the height of the picture. The third dimension contains the color depth, defined by the image format. As shown below, we can thus represent the format of a digital image as “width x height x depth.” Next, let’s have a quick look at different image formats.

A digital image is a multidimensional integer array.

Overview of Different Image Formats

We can train CNNs with different image formats, but the input data are always multidimensional arrays of integer values. One of the most commonly used color formats in deep learning is “RGB.” RGB stands for the three color channels: “Red,” “Green,” and “Blue.” RGB images are divided into three layers of integer values, one layer for each color channel—the integer values of a 16-bit RGB image in each layer range from 1 to 255. Together, the three layers can reproduce 65,536 different colors.

In contrast to RGB images, grey-scale images only have a single color layer. This layer resembles the brightness of each pixel in the image. Consequently, the format of a grey-scale image is width x height x 1. Using grey-scale images or images with black and white shades instead of RGB images can speed up the training process because less data needs to be processed. However, image data with multiple color channels provide the model with more information, leading to better predictions. The RGB format is often a good choice between prediction quality and performance. Next, let’s look at how CNNs handle digital images in the learning process.

Convolutional Neural Networks

As mentioned before, a CNN is a specific form of an artificial neural network. The main difference between the CNN and the standard multi-layer perceptron is their convolutional layers. CNNs can have other layers, but the convolutions make a CNN so good at detecting objects. They allow the network to identify patterns based on features that work regardless of where in the image they occur. Let’s see how this works in more detail.

Convolutional Layers

Convolutional layers use a rasterizing technique that breaks down an image into smaller groups of pixels called filters. Filters act as feature detectors from the original image. The primary purpose is to extract meaningful features from the input images.

During the training, the CNN slides the filter over image locations and calculates the dot product for each feature at a time. The results of these calculations are stored in a so-called feature map (sometimes called an activation map). A feature map represents where in the image a particular feature was identified. Subsequently, the values from the feature map are transformed with an activation function (usually ReLu), and the algorithm uses them as input to the next layer.

Illustration of operations in the convolutional layers

Features become more complex with the increasing depth of the network. In the first layer of the network, convolutions will detect generic geometric forms and low-level features based on edges, corners, squares, or circles. The subsequent layers of the network will look at more sophisticated shapes and may, for example, include features that resemble the form of an eye of a cat or the nose of a dog. In this way, convolutions provide the network with features at different levels of detail that enable powerful detection patterns.

Exemplary convolutions of an image that contains the number “3.”

Pooling / Downsampling

A convolutional layer is usually followed by a pooling operation, which reduces the amount of data by filtering unnecessary information. This process is also called downsampling or subsampling. There are various forms of pooling. In the most common variant – max-pooling – only the highest value in a predefined grid (e.g., 2×2) is processed, and the remaining values are discarded. For example, imagine a 2×2 grid with values 0.1, 0.5, 0.4, and 0.8. The algorithm would only process the 0,8 further for this grid and use it as part of the input to the next layer. The advantages of pooling are reduced data and faster training times. Because pooling minimizes the complexity of the network, it allows for the construction of deeper architectures with more layers. In addition, pooling offers a certain protection against overfitting during training.

Dropout

Dropout is another technique that helps prevent the network from overfitting the training data. When we activate Dropout for a layer, the algorithm will remove a random number of neurons from the layer per training step. As a result, the network needs to learn patterns that give less weight to individual layers and thus generalize better. The dropout rate controls the percentage of switched-off neurons in each training iteration. We can configure Dropout for each layer separately.

CNNs with many layers and training epochs tend to overfit the training data. Especially here, Dropout is crucial to avoid overfitting and to achieve good prediction results with data that the network does not know yet. A typical value for the rate lies between 10% to 30%.

Multi-Layer Perceptron (MLP)

The CNN architecture ends with multiple dense layers that are fully connected. The layers are part of a Multilayer Perception (MLP), which has the task of dense down the results from the previous convolutions and outputting one of the multiple classes. Consequently, the number of neurons in the final dense layer usually corresponds to the number of different classes to be predicted. It is also possible to use a single neuron in the final layer for two-class prediction problems. In this case, the last neuron outputs a binary label of 0 or 1.

Building a CNN with Tensorflow that Classifies Cats and Dogs

Now that you are familiar with the basic concepts behind convolutional neural networks, we can commence with the practical part and build an image classifier. In the following, we will train a CNN to distinguish images of cats and dogs. We first define a CNN model and then feed it a few thousand photos from a public dataset with labeled images of cats and dogs.

Distinguishing cats and dogs may not sound difficult, but many challenges exist. Imagine the almost infinite circumstances in which animals can be photographed, not to mention the many forms a cat can take. These variations lead to the fact that even humans sometimes confuse a cat with a dog or vice versa. So don’t expect our model to be perfect right from the start. Our model will score around 82% accuracy on the validation dataset.

The code is available on the GitHub repository.

View on GitHub Relataly GitHub Repo

Cat or Dog? That’s what our CNN will predict.

Prerequisites

Before starting the coding part, make sure that you have set up your Python 3 environment and required packages. If you don’t have an environment, you can follow this tutorial to set up the Anaconda environment.

Also, make sure you install all required packages. In this tutorial, we will be working with the following standard packages:

In addition, we will be using Keras (2.0 or higher) with Tensorflow backend and the machine learning library Scikit-learn.

You can install packages using console commands:

pip install
conda install (if you are using the anaconda packet manager)

Download the Dataset

We will train our image classification model with a public dataset from Kaggle.com. The dataset contains more than 25.000 JPG pictures of cats and dogs. The images are uniformly named and numbered, for example, dog.1.jpg, dog.2.jpg, dog.3.jpg, cat.1.jpg, cat.2.jpg, and so on. You can download the picture set directly from Kaggle: cats-vs-dogs.

Setup the Folder Structure

There are different ways data can be structured and loaded during model training. One approach (1) is to split the images into classes and create a separate folder for each class, class_a, class_b, etc. Another method (2) is to put all images into a single folder and define a DataFrame that splits the data into test and train. Because the cats and dogs dataset files already contain the classes in their name, I decided to go for the second approach.

Before we begin with the coding part, we create a folder structure that looks as follows:

The folder structure of our cats and dogs prediction project

If you want to use the standard pathways given in the python tutorial, make sure that your notebook resides in the parent folder of the “data” folder.

After you have created the folder structure, open the cats-vs-dogs zip file. The ZIP file contains the folders “train,” “test,” and “sample.” Unzip the JPG files from the “train” (20.000 images) and the “test” folder (5.000 pictures) to the “train” folder of your project. Afterward, the train folder should contain 25.000 images. The sample folder is intended to include your sample images, for example, of your pet. We will later use the images from the sample folder to test the model on new real-world data.

We have fulfilled all requirements and can start with the coding part.

Step #1 Make Imports and Check Training Device

We begin by setting up the imports for this project. I have put the package imports at the beginning to give you a quick overview of the packages you need to install.

Using the GPU instead of the CPU allows for faster training times. However, setting up Tensorflow to work with the GPUs can cause problems. Not everyone has a GPU; in this case, TensorFlow should usually automatically run all code on the CPU. However, should you for any reason prefer to manually switch to CPU training, change [“CUDA_VISIBLE_DEVICES”]= “1” to “-1”. As a result, Tensorflow will run all code on the CPU and ignore all available GPUs.

import os
#os.environ["CUDA_VISIBLE_DEVICES"]="-1" 

import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Convolution2D, MaxPooling2D, ZeroPadding2D
from tensorflow.keras.layers import Conv2D, Activation, Dropout, Flatten, Dense, BatchNormalization
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.metrics import Accuracy
from tensorflow.keras import regularizers
from tensorflow.keras.optimizers import SGD, Adam
from tensorflow.python.client import device_lib
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, accuracy_score

tf.config.allow_growth = True
tf.config.per_process_gpu_memory_fraction = 0.9

from random import randint
import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import seaborn as sns
from PIL import Image
import random as rdn

Running the command below checks the TensorFlow version and the number of available GPUs in our system.

# check the tensorflow version
print('Tensorflow Version: ' + tf.__version__)

# check the number of available GPUs
physical_devices = tf.config.list_physical_devices('GPU')
print("Num GPUs:", len(physical_devices))

Tensorflow Version: 2.4.0-rc3
Num GPUs: 1

My GPU is an RTX 3080. When I wrote this article, the GPU was not yet supported by the standard TensorFlow release. I have therefore used the pre-release version of TensorFlow (2.4.0-rc3). I expect the following standard release (2.3) to work fine.

In my case, the GPU check returns one because I have a single GPU on my computer. If TensorFlow doesn’t recognize any GPU, this command will return 0. Tensorflow will then run on the CPU.

Step #2 Define the Prediction Classes

Next, we will define the path to the folders that contain our train and validation images. In addition, we will define a Dataframe “image_df,” which has all the pictures from the “train” folder. With the help of this Dataframe, we can later split the data simply by defining which images from the train folder contain the training dataset and which belong to the test dataset. Important note: the dataframe “image_df” only includes the names of the images and the classes, but not the photos themselves.

It’s good to check the distribution of classes in the training data set. For this purpose, we create a bar plot, which illustrates the number of both classes in the image data. And yes, I admit, I choose some custom colors to make it look fancy.

# set the directory for train and validation images
train_path = 'data/images/cats-and-dogs/train/'
#test_path = 'data/cats-and-dogs/test/'

# function to create a list of image labels 
def createImageDf(path):
    filenames = os.listdir(path)
    categories = []

    for fname in filenames:
        category = fname.split('.')[0]
        if category == 'dog':
            categories.append(1)
        else:
            categories.append(0)
    df = pd.DataFrame({
        'filename':filenames,
        'category':categories
    })
    return df

# display the header of the train_df dataset
image_df = createImageDf(train_path)
image_df.head(5)

sns.countplot(y='category', data=image_df, palette=['#2FE5C7',"#2F8AE5"], orient="h")

The number of images in the two classes is balanced, so we don’t need to rebalance the data. That’s nice!

Step #3 Plot Sample Images

I prefer not to jump directly into preprocessing and check that the data has been correctly loaded. We will do this by plotting some random images from the train folder. This step is not necessary, but it’s a best practice.

n_pictures = 16 # number of pictures to be shown
columns = int(n_pictures / 2)
rows = 2
plt.figure(figsize=(40, 12))
for i in range(n_pictures):
    num = i + 1
    ax = plt.subplot(rows, columns, i + 1)
    if i < columns:
        image_name = 'cat.' + str(rdn.randint(1, 1000)) + '.jpg'
    else: 
        image_name = 'dog.' + str(rdn.randint(1, 1000)) + '.jpg'
    plt.xlabel(image_name)    
    plt.imshow(load_img(train_path + image_name)) 

#if you get a deprecated warning, you can ignore it

I never expected to have so many pictures of cats and dogs one day, but I guess neither did you 🙂 Neural networks require a fixed input shape where each neuron corresponds to a pixel value.

As we can see from the sample images, the images in our dataset have different sizes and aspect ratios. For the images to fit into the input shape of our neural network, we need to put the images into a standard format. But before that, we split the data into two datasets for train and test.

Step #4 Split the Data

Image classification requires splitting the data into a train and a validation set. We define a split ratio of 1/5 so that 80% of the data goes into the training dataset and 20% goes into the validation dataframe. We shuffle the data to create two DataFrameswith a mix of random cat and dog pictures. In addition, we transform the classes of the images into categorical values 0->”cat” and 1->”dog”. The result is two new DataFrames: train_df (20.000 images) and validate_df (5.000 images).

image_df["category"] = image_df["category"].replace({0:'cat',1:'dog'})

train_df, validate_df = train_test_split(image_df, test_size=0.20, random_state=42)
train_df = train_df.reset_index(drop=True)
total_train = train_df.shape[0]

validate_df = validate_df.reset_index(drop=True)
total_validate = validate_df.shape[0]
train_df.head()

print(len(train_df), len(validate_df))

Output: 20000 5000

Step #5 Preprocess the Images

The next step is to define two data generators for these DataFrames, which use the names given in the train and validation DataFrames to feed the images from the “train” path into our neural network. The data generator has various configuration options. We will perform the following operations:

Rescale the image by dividing their RGB color values (1-255) by 255
Shuffle the images (again)
Bring the images into a uniform shape of 128 x 128 pixels
We define a batch size of 32, which processes the 32 images simultaneously.
The class mode is “binary” so our two prediction labels are encoded as float32 scalars with values 0 or 1. As a result, we will only have a single end neuron in our network.
We perform some data augmentation techniques on the training data (incl. horizontal flip, shearing, and zoom). In this way, the model never sees different variants of the images, which helps to prevent overfitting.

Some augmentation techniques

It is essential to mention that the input shape of the first layer of the neural network must correspond to the image shape of 128 x 128. The reason is that each pixel becomes an input to a neuron.

# set the dimensions to which we will convert the images
img_width, img_height = 128, 128
target_size = (img_width, img_height)
batch_size = 32
rescale=1.0/255

# configure the train data generator
print('Train data:')
train_datagen = ImageDataGenerator(rescale=rescale)
train_generator = train_datagen.flow_from_dataframe(
    train_df, 
    train_path,
    shear_range=0.2, #
    zoom_range=0.2, #
    horizontal_flip=True, # 
    shuffle=True, # shuffle the image data
    x_col='filename', y_col='category',
    classes=['dog', 'cat'],
    target_size=target_size,
    batch_size=batch_size,
    color_mode="rgb",
    class_mode='binary')

# configure test data generator
# only rescaling
print('Test data:')
validation_datagen = ImageDataGenerator(rescale=rescale)
validation_generator = validation_datagen.flow_from_dataframe(
    validate_df, 
    train_path,    
    shuffle=True,
    x_col='filename', y_col='category',
    classes=['dog', 'cat'],
    target_size=target_size,
    batch_size=batch_size,
    color_mode="rgb",
    class_mode='binary')

Train data:
Found 20000 validated image filenames belonging to 2 classes.
Test data:
Found 5000 validated image filenames belonging to 2 classes.

At this point, we have already completed the data preprocessing part. The next step is to define and compile the convolutional neural network.

Step #6 Define and Compile the Convolutional Neural Network

The architecture of our image classification CNN is inspired by the famous VGGNet. In this section, we will define and compile our CNN model. We do this by defining multiple layers and stacking them on top of each other. However, to lower the amount of time needed to train the network, I reduced the number of layers.

The initial layer of our network is the initial input layer, which receives the preprocessed images. As already noted, the shape of the input layer needs to match the shape of our images. Considering how we have defined the format of the images in our data generators, the input shape is defined as 128 x 128 x 3.

The subsequent layers are four convolutional layers. Each of these layers is followed by a pooling layer. In addition, we define a Dropoutrate of 20% for each convolutional layer.

Finally, a fully connected output layer with 128 neurons and a binary layer for the output complete the structure of the CNN.

3-dimensional Input Shape of our Neural Network

Additional Info

Loss function: measures model accuracy during training. We try to minimize this function to “steer” the model in the right direction. We use binary_crossentropy.
Optimizer: defines how the model weights are updated based on the data it sees and its loss function.
Metrics are used to monitor the steps during training and testing. The following example uses accuracy, which is the fraction of the correctly classified images.

# define the input format of the model
input_shape = (img_width, img_height, 3)
print(input_shape)

# define  model
model = Sequential()
model.add(Conv2D(32, (3, 3), strides=(1, 1), activation='relu', kernel_initializer='he_uniform', padding='same', input_shape=input_shape))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.20))
model.add(Conv2D(64, (3, 3), strides=(1, 1), activation='relu', kernel_initializer='he_uniform', padding='same'))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.20))
model.add(Conv2D(64, (3, 3), strides=(1, 1), activation='relu', kernel_initializer='he_uniform', padding='same'))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.20))
model.add(Conv2D(128, (3, 3),  strides=(1, 1),activation='relu', kernel_initializer='he_uniform', padding='same'))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.20))
model.add(Flatten())
model.add(Dense(128, activation='relu', kernel_initializer='he_uniform'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

# compile the model and print its architecture
opt = SGD(lr=0.001, momentum=0.9)
history = model.compile(optimizer=opt, loss='binary_crossentropy', metrics=['accuracy'])
print(model.summary())

input_shape: (100, 100, 3)
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 100, 100, 32)      896       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 50, 50, 32)        0         
_________________________________________________________________
dropout (Dropout)            (None, 50, 50, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 50, 50, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 25, 25, 64)        0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 25, 25, 64)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 25, 25, 64)        36928     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 12, 12, 64)        0         
_________________________________________________________________
dropout_2 (Dropout)          (None, 12, 12, 64)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 12, 12, 128)       73856     
_________________________________________________________________
...
Trainable params: 720,257
Non-trainable params: 0
_________________________________________________________________
None

At this point, we have defined and assembled our convolutional neural network. Next, it is time to train the model.

Step #7 Train the Model

Before we train the image classifier, we still have to choose the number of epochs. More epochs can improve the model performance and lead to longer training times. In addition, the risk increases that the model overfits. Finding the optimal number of epochs is difficult and often requires a trial-and-error approach. I typically start with a small number of 5 epochs and then increase this number until increases do not lead to significant improvements.

# train the model
epochs = 40
early_stop = EarlyStopping(monitor='loss', patience=6, verbose=1)

history = model.fit(
    train_generator,
    epochs=epochs,
    callbacks=[early_stop],
    steps_per_epoch=len(train_generator),
    verbose=1,
    validation_data=validation_generator,
    validation_steps=len(validation_generator))

Epoch 1/35
625/625 [==============================] - 121s 194ms/step - loss: 0.7050 - accuracy: 0.5282 - val_loss: 0.6902 - val_accuracy: 0.5824
Epoch 2/35
625/625 [==============================] - 115s 183ms/step - loss: 0.6853 - accuracy: 0.5469 - val_loss: 0.6856 - val_accuracy: 0.5806
Epoch 3/35
625/625 [==============================] - 115s 184ms/step - loss: 0.6744 - accuracy: 0.5752 - val_loss: 0.6746 - val_accuracy: 0.5806
Epoch 4/35
625/625 [==============================] - 112s 180ms/step - loss: 0.6569 - accuracy: 0.5987 - val_loss: 0.6593 - val_accuracy: 0.6110
Epoch 5/35
625/625 [==============================] - 115s 185ms/step - loss: 0.6423 - accuracy: 0.6194 - val_loss: 0.6474 - val_accuracy: 0.6134
Epoch 6/35
625/625 [==============================] - 116s 185ms/step - loss: 0.6309 - accuracy: 0.6370 - val_loss: 0.6386 - val_accuracy: 0.6260
Epoch 7/35
625/625 [==============================] - 115s 183ms/step - loss: 0.6139 - accuracy: 0.6539 - val_loss: 0.6082 - val_accuracy: 0.6682

A quick comment on the required time to train the model. Although the model is not overly complex and the size of the data is still moderate, training the model can take some time. I made two training runs – the first run on my GPU (Nvidia Geforce 3080 RTX) and the second on my CPU (AMD Ryzen 3700x). On the GPU, training took approximately 10 minutes. The CPU training was much slower and took about 30 minutes, three times longer than the GPU.

After training, you may want to save the classification model and load it at a later time. You can do this with the code below:
However, we need to define the model strictly as it was during training before loading.

# Safe the weights
model.save_weights('cats-and-dogs-weights-v1.h5')

# Define model as during training
# model architecture

# Loads the weights
model.load_weights('cats-and-dogs-weights-v1.h5')

Step #8 Visualize Model Performance

After training the model, we want to check the performance of our image classification model. For this purpose, we can apply the same performance measures as in traditional classification projects. The code below illustrates the performance of our image classifier on the validation dataset.

To learn more about measuring model performance, check out my previous post on Measuring Model Performance.

def plot_loss(history, value1, value2, title):
    fig, ax = plt.subplots(figsize=(15, 5), sharex=True)
    plt.plot(history.history[value1], 'b')
    plt.plot(history.history[value2], 'r')
    plt.title(title)
    plt.ylabel("Loss")
    plt.xlabel("Epoch")
    ax.xaxis.set_major_locator(plt.MaxNLocator(epochs))
    plt.legend(["Train", "Validation"], loc="upper left")
    plt.grid()
    plt.show()

# plot training & validation loss values
plot_loss(history, "loss", "val_loss", "Model loss")
# plot training & validation loss values
plot_loss(history, "accuracy", "val_accuracy", "Model accuracy")

Next, let’s print the accuracy and a confusion matrix on the predictions from the validation dataset.

# function that returns the label for a given probability
def getLabel(prob):
    if(prob > .5):
               return 'dog'
    else:
               return 'cat'

# get the predictions for the validation data
val_df = validate_df.copy()
val_df['pred'] = ""
val_pred_prob = model.predict(validation_generator)

for i in range(val_pred_prob.shape[0]):
    val_df['pred'][i] = getLabel(val_pred_prob[i])
          
# create a confusion matrix
y_val = val_df['category']
y_pred = val_df['pred']

print('Accuracy: {:.2f}'.format(accuracy_score(y_val, y_pred)))
cnf_matrix = confusion_matrix(y_val, y_pred)

# plot the confusion matrix in form of a heatmap

%matplotlib inline
class_names=[False, True] # name  of classes
fig, ax = plt.subplots(figsize=(8, 8))
tick_marks = np.arange(len(class_names))
plt.xticks(tick_marks, class_names)
plt.yticks(tick_marks, class_names)
sns.heatmap(pd.DataFrame(cnf_matrix), annot=True, cmap="YlGnBu", fmt='g')
plt.title('Confusion matrix')
plt.ylabel('Actual label')
plt.xlabel('Predicted label')

Accuracy: 0.82

Step #9 Image Classification on Sample Images

Now that we have trained the model, I bet you can’t wait to test the image classifier on some sample data. For this purpose, ensure that you have some sample images in the “sample” folder. Running the code below will feed the image classifier with the test dataset. Based on this dataset, the model will then predict the labels for the images from the sample folder. Finally, the code below prints the images in an image grid and the predicted labels.

# set the path to the sample images
sample_path = "data/images/cats-and-dogs/sample/"
sample_df = createImageDf(sample_path)
sample_df['category'] = sample_df['category'].replace({0:'cat',1:'dog'})
sample_df['pred'] = ""

# create an image data generator for the sample images - we will only rescale the images
test_datagen = ImageDataGenerator(rescale=1./255)
test_generator = test_datagen.flow_from_dataframe(
    sample_df, 
    sample_path,    
    shuffle=False,
    x_col='filename', y_col='category',
    target_size=target_size)

# make the predictions 
pred_prob = model.predict(test_generator)
image_number = pred_prob.shape[0]

# define the plot size
for i in range(pred_prob.shape[0]):
    sample_df['pred'][i] = getLabel(pred_prob[i])
    
print('Accuracy: {:.2f}'.format(accuracy_score(sample_df['category'], sample_df['pred'])))

nrows = 6
ncols = int(round(image_number / nrows, 0))
fig, axs = plt.subplots(nrows, ncols, figsize=(15, 15))
for i, ax in enumerate(fig.axes):
    if i < sample_df.shape[0]:
        filepath = sample_path + sample_df.at[i ,'filename']
        ax = ax
        img = Image.open(filepath).resize(target_size)
        ax.imshow(img)
        ax.set_title(sample_df.at[i ,'filename'] + '\n' + ' predicted: '  + str(sample_df.at[i ,'pred']))
        result = [True if sample_df.at[i ,'pred'] == sample_df.at[i ,'category'] else False]
        ax.set_xlabel(str(result))
        ax.set_xticks([]); ax.set_yticks([])

Our image classifier achieves an accuracy of around 83% on the validation set. The model is not perfect, but it should have labeled most images correctly. With deeper architectures, more data, and training runs, you can create classification models that achieve better results over 95%.

Summary

In this tutorial, you learned how to train an image classification model. We have prepared a dataset and performed several transformations to bring the data in shape for training. Finally, we have trained a convolutional neural network to distinguish between dogs and cats. You can now use this knowledge to train image classification models that determine other objects.

There are many other cool things that you can do with CNNs. For example, object localization in images and videos and even stock market prediction. But these are topics for further articles.

I am always happy to receive feedback. I hope you enjoyed the article and would be happy if you left a comment. Cheers

Sources and Further Reading

Andriy Burkov Machine Learning Engineering
Oliver Theobald (2020) Machine Learning For Absolute Beginners: A Plain English Introduction
Charu C. Aggarwal (2018) Neural Networks and Deep Learning
Aurélien Géron (2019) Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems
David Forsyth (2019) Applied Machine Learning Springer
[1] D. H. Hubel and T. N. Wiesel – Receptive Fields of Neurons in the Cat’s Striate Cortex, The Journal of physiology (1959)

The links above to Amazon are affiliate links. By buying through these links, you support the Relataly.com blog and help to cover the hosting costs. Using the links does not affect the price.

The post Image Classification with Convolutional Neural Networks – Classifying Cats and Dogs in Python appeared first on relataly.com.

Mastering Multivariate Stock Market Prediction with Python: A Guide to Effective Feature Engineering Techniques

Florian Follonier — Mon, 29 Jun 2020 21:47:28 +0000

Are you interested in learning how multivariate forecasting models can enhance the accuracy of stock market predictions? Look no further! While traditional time series data provides valuable insights into historical trends, multivariate forecasting models utilize additional features to identify patterns and predict future price movements. This process, known as “feature engineering,” is a crucial step in creating accurate stock market forecasts.

In this article, we dive into the world of feature engineering and demonstrate how it can improve stock market predictions. We explore popular financial analysis metrics, including Bollinger bands, RSI, and Moving Averages, and show how they can be used to create powerful forecasting models.

But we don’t just stop at theory. We provide a hands-on tutorial using Python to prepare and analyze time-series data for stock market forecasting. We leverage the power of recurrent neural networks with LSTM layers, based on the Keras library, to train and test different model variations with various feature combinations.

By the end of this article, you’ll have a thorough understanding of feature engineering and how it can improve the accuracy of stock market predictions. So, buckle up and get ready to discover how multivariate forecasting models can take your stock market analysis to the next level!

New to time series modeling?
Consider starting with the following tutorial on univariate time series models: Stock-market forecasting using Keras Recurrent Neural Networks and Python.

Disclaimer: This article does not constitute financial advice. Stock markets can be very volatile and are generally difficult to predict. Predictive models and other forms of analytics applied in this article only illustrate machine learning use cases.

Squirrels mastered the art of multivariate feature engineering for time series analysis a long time ago. You can do it too! Image generated with Midjourney

Feature Engineering for Stock Market Forecasting – Borrowing Features from Chart Analysis

The idea behind multivariate time series models is to feed the model with additional features that improve prediction quality. An example of such an additional feature is a “moving average.” Adding more features does not automatically improve predictive performance but increases the time needed to train the models. The challenge is to find the right combination of features and to create an input form that allows the model to recognize meaningful patterns. There is no way around conducting experiments and trying out feature combinations. This process of trial and error can be time-consuming. It is, therefore, helping to build upon established indicators.

In stock market forecasting, we can use indicators from chart analysis. This domain forecasts future prices by studying historical prices and trading volume. The underlying idea is that specific patterns or chart formations in the data can signal the timing of beneficial buying or selling decisions. We can borrow indicators from this discipline and use them as input features.

When we develop predictive machine learning models, the difference from chart analysis is that we do not aim to analyze the chart ourselves manually, but try to create a machine learning model, for example, a recurrent neural network, that does the job for us.

A multivariate time-series forecast, as we will create it in this article. Exemplary chart with technical indicators (Bollinger bands, RSI, and Double-EMA)

Stock Market Forecasting – Does this really Work?

It is essential to point out that the effectiveness of chart analysis and algorithmic trading is controversial. There is at least as much controversy about whether it is possible to predict the price of stock markets with neural networks. Various studies and researchers have examined the effectiveness of chart analysis with different results. One of the most significant points of criticism is that it cannot take external events into account. Nevertheless, many financial analysts consider financial indicators when making investment decisions, so a lot of money is moved simply because many people believe in statistical indicators.

So without knowing how well this will work, it is worth an attempt to feed a neural network with different financial indicators. But first and foremost, I see this as an excellent way to show how feature engineering works. Just make sure not to rely on the predictions of these models blindly.

Also: Stock Market Prediction using Univariate Recurrent Neural Networks

Selected Statistical Indicators

The following indicators are commonly used in chart analysis and may be helpful when creating forecasting models:

Relative Strength Index
Simple Moving Averages
Exponential Moving Averages
Bolliger Bands

Relative Strength Index (RSI)

The Relative Strength Index (RSI) is one of the most commonly used oscillating indicators. In 1978, Welles Wilder developed it to determine the momentum of price movements and compare the strength of price losses in a period with price gains. It can take percentage values between 0 and 100.

Reference lines determine how long an existing trend will last before expecting a trend reversal. In other words, when the price is heavily oversold or overbought, one should expect a trend reversal.

The reference line is at 40% (oversold) and 80% (overbought) with an upward trend.
The reference line is at 20% (oversold) and 60% (overbought) with a downtrend.

The formula for the RSI is as follows:

Calculate the sum of all positive and negative price changes in a period (e.g., 30 days):
We then calculate the mean value of the sums with the following formula:
Finally, we calculate the RSI with the following formula:

Simple Moving Averages (SMA)

Simple Moving Averages (SMA) is another technical indicator that financial analysts use to determine if a price trend will continue or reverse. The SMA is the average sum of all values within a certain period. Financial analysts pay close attention to the 200-day SMA (SMA-200). When the price crosses the SMA, this may signal a trend reversal. Furthermore, we often use SMAs for 50 (SMA-50) and 100 days (SMA-100) periods. In this regard, two popular trading patterns include the death cross and a golden cross.

A death cross occurs when the trend line of the SMA-50/100 crosses below the 200-day SMA. This suggests that a falling trend will likely accelerate downwards.
A golden cross occurs when the trend line of the SMA-50/100 crosses over the 200-day SMA, suggesting a rising trend will likely accelerate upwards.

We can use the SMA in the input shape of our model simply by measuring the distance between two trendlines.

Exponential Moving Averages (EMA)

The exponential moving average (EMA) is another lagging trend indicator. Like the SMA, the EMA measures the strength of a price trend. The difference between SMA and EMA is that the SMA assigns equal values to all price points, while the EMA uses a multiplier that weights recent prices higher.

The formula for the EMA is as follows: Calculating the EMA for a given data point requires past price values. For example, to calculate the SMA for today, based on 30 past values, we calculate the average price values for the past 30 days. We then multiply the result by a weighting factor that weighs the EMA. The formula for this multiplier is as follows: Smoothing factor / (1+ days)

It is common to use different smoothing factors. For a 30-day moving average, the multiplier would be [2/(30+1)]= 0.064.

As soon as we have calculated the EMA for the first data point, we can use the following formula to calculate the ema for all subsequent data points: EMA = Closing price x multiplier + EMA (previous day) x (1-multiplier)

Bollinger Bands

Bollinger Bands are a popular technical analysis tool used to identify market volatility and potential price movements in financial markets. They are named after their creator, John Bollinger.

Bollinger Bands consist of three lines that are plotted on a price chart. The middle line is a simple moving average (SMA) of the asset price over a specified period (typically 20 days). The upper and lower lines are calculated by adding and subtracting a multiple (usually two) of the standard deviation of the asset price from the middle line.

The upper band is calculated as: Middle band + (2 x Standard deviation) The lower band is calculated as: Middle band – (2 x Standard deviation)

The standard deviation is a measure of how much the asset price deviates from the average. When the asset price is more volatile, the bands widen, and when the price is less volatile, the bands narrow.

Traders use Bollinger Bands to identify potential buy or sell signals. When the price touches or crosses the upper band, it may be a sell signal, indicating that the asset is overbought. Conversely, when the price touches or crosses the lower band, it may be a buy signal, indicating that the asset is oversold.

Feature Engineering for Time Series Prediction Models in Python

In the following, this tutorial will guide you through the process of implementing a multivariate time series prediction model for the NASDAQ stock market index. Our aim is to equip you with the knowledge and practical skills required to create a powerful predictive model that can effectively forecast stock prices.

Throughout this tutorial, we will take you through a step-by-step approach to building a multivariate time series prediction model. You will learn how to implement and utilize different features to train and measure the performance of your model. Our goal is to ensure that you are not only able to understand the underlying concepts of multivariate time series prediction, but that you are also capable of applying these concepts in a practical setting.

The code is available on the GitHub repository.

View on GitHub Relataly GitHub Repo

Let’s do some feature engineering for machine learning!

" data-image-caption="

Let’s do some feature engineering for machine learning!

" data-large-file="https://www.relataly.com/wp-content/uploads/2023/03/robot-artificial-intelligence-colorful-midjourney-relataly-min.png" src="https://www.relataly.com/wp-content/uploads/2023/03/robot-artificial-intelligence-colorful-midjourney-relataly-min.png" alt="Let's do some feature engineering for machine learning!" class="wp-image-12671" srcset="https://www.relataly.com/wp-content/uploads/2023/03/robot-artificial-intelligence-colorful-midjourney-relataly-min.png 496w, https://www.relataly.com/wp-content/uploads/2023/03/robot-artificial-intelligence-colorful-midjourney-relataly-min.png 298w, https://www.relataly.com/wp-content/uploads/2023/03/robot-artificial-intelligence-colorful-midjourney-relataly-min.png 140w" sizes="(max-width: 496px) 100vw, 496px" />

Let’s do some feature engineering for machine learning!

Prerequisites

Before starting the coding part, make sure that you have set up your Python 3 environment and required packages. If you don’t have an environment, follow this tutorial to set up the Anaconda environment.

Also, make sure you install all required packages. In this tutorial, we will be working with the following standard packages:

In addition, we will be using Keras (2.0 or higher) with Tensorflow backend to train the neural network, the machine learning library scikit-learn, and the pandas-DataReader. You can install these packages using the following console commands:

pip install
conda install (if you are using the anaconda packet manager)

Step #1 Load the Data

Let’s start by setting up the imports and loading the data. Our Python project will use price data from the NASDAQ composite index (symbol: ^IXIC) from yahoo.finance.com.

# Time Series Forecasting - Feature Engineering For Multivariate Models (Stock Market Prediction Example)
# A tutorial for this file is available at www.relataly.com

import math # Mathematical functions  
import numpy as np # Fundamental package for scientific computing with Python 
import pandas as pd # Additional functions for analysing and manipulating data 
from datetime import date # Date Functions 
import matplotlib.pyplot as plt # Important package for visualization - we use this to plot the market data 
import matplotlib.dates as mdates # Formatting dates 
from sklearn.metrics import mean_absolute_error, mean_squared_error # Packages for measuring model performance / errors 
import tensorflow as tf
from tensorflow.keras.models import Sequential # Deep learning library, used for neural networks 
from tensorflow.keras.layers import LSTM, Dense, Dropout # Deep learning classes for recurrent and regular densely-connected layers 
from tensorflow.keras.callbacks import EarlyStopping # EarlyStopping during model training 
from sklearn.preprocessing import RobustScaler # This Scaler removes the median and scales the data according to the quantile range to normalize the price data  
#from keras.optimizers import Adam # For detailed configuration of the optimizer 
import seaborn as sns # Visualization
sns.set_style('white', { 'axes.spines.right': False, 'axes.spines.top': False})


# check the tensorflow version and the number of available GPUs
print('Tensorflow Version: ' + tf.__version__)
physical_devices = tf.config.list_physical_devices('GPU')
print("Num GPUs:", len(physical_devices))

# Setting the timeframe for the data extraction
end_date =  date.today().strftime("%Y-%m-%d")
start_date = '2010-01-01'

# Getting NASDAQ quotes
stockname = 'NASDAQ'
symbol = '^IXIC'

# You can either use webreader or yfinance to load the data from yahoo finance
# import pandas_datareader as webreader
# df = webreader.DataReader(symbol, start=start_date, end=end_date, data_source="yahoo")

import yfinance as yf #Alternative package if webreader does not work: pip install yfinance
df = yf.download(symbol, start=start_date, end=end_date)

# Quick overview of dataset
df.head()

Tensorflow Version: 2.5.0
Num GPUs: 1
[*********************100%***********************]  1 of 1 completed
			Open		High		Low	Close	Adj 		Close		Volume
Date						
2009-12-31	2292.919922	2293.590088	2269.110107	2269.149902	2269.149902	1237820000
2010-01-04	2294.409912	2311.149902	2294.409912	2308.419922	2308.419922	1931380000
2010-01-05	2307.270020	2313.729980	2295.620117	2308.709961	2308.709961	2367860000
2010-01-06	2307.709961	2314.070068	2295.679932	2301.090088	2301.090088	2253340000
2010-01-07	2298.090088	2301.300049	2285.219971	2300.050049	2300.050049	2270050000

Step #2 Explore the Data

Let’s take a quick look at the data by creating line charts for the columns of our data set.

# Plot line charts
df_plot = df.copy()

ncols = 2
nrows = int(round(df_plot.shape[1] / ncols, 0))

fig, ax = plt.subplots(nrows=nrows, ncols=ncols, sharex=True, figsize=(14, 7))
for i, ax in enumerate(fig.axes):
        sns.lineplot(data = df_plot.iloc[:, i], ax=ax)
        ax.tick_params(axis="x", rotation=30, labelsize=10, length=0)
        ax.xaxis.set_major_locator(mdates.AutoDateLocator())
fig.tight_layout()
plt.show()

Our initial dataset includes six features: High, Low, Open, Close, Volumen, and Adj Close.

Step #3 Feature Engineering

Now comes the exciting part – we will implement additional features. We use various indicators from chart analysis, such as averages for different periods and stochastic oscillators to measure price momentum.

# Indexing Batches
train_df = df.sort_values(by=['Date']).copy()

# Adding Month and Year in separate columns
d = pd.to_datetime(train_df.index)
train_df['Day'] = d.strftime("%d") 
train_df['Month'] = d.strftime("%m") 
train_df['Year'] = d.strftime("%Y") 
train_df

			Open		High		Low			Close		Adj Close	Volume		Day	Month	Year
Date									
2009-12-31	2292.919922	2293.590088	2269.110107	2269.149902	2269.149902	1237820000	31	12		2009
2010-01-04	2294.409912	2311.149902	2294.409912	2308.419922	2308.419922	1931380000	04	01		2010
2010-01-05	2307.270020	2313.729980	2295.620117	2308.709961	2308.709961	2367860000	05	01		2010
2010-01-06	2307.709961	2314.070068	2295.679932	2301.090088	2301.090088	2253340000	06	01		2010
2010-01-07	2298.090088	2301.300049	2285.219971	2300.050049	2300.050049	2270050000	07	01		2010

We create a set of indicators for the training data with the following code. However, we will make one more restriction in the next step since a model with all these indicators does not achieve good results and would take far too long to train on a local computer.

# Feature Engineering
def createFeatures(df):
    df = pd.DataFrame(df)

    
    df['Close_Diff'] = df['Adj Close'].diff()
        
    # Moving averages - different periods
    df['MA200'] = df['Close'].rolling(window=200).mean() 
    df['MA100'] = df['Close'].rolling(window=100).mean() 
    df['MA50'] = df['Close'].rolling(window=50).mean() 
    df['MA26'] = df['Close'].rolling(window=26).mean() 
    df['MA20'] = df['Close'].rolling(window=20).mean() 
    df['MA12'] = df['Close'].rolling(window=12).mean() 
    
    # SMA Differences - different periods
    df['DIFF-MA200-MA50'] = df['MA200'] - df['MA50']
    df['DIFF-MA200-MA100'] = df['MA200'] - df['MA100']
    df['DIFF-MA200-CLOSE'] = df['MA200'] - df['Close']
    df['DIFF-MA100-CLOSE'] = df['MA100'] - df['Close']
    df['DIFF-MA50-CLOSE'] = df['MA50'] - df['Close']
    
    # Moving Averages on high, lows, and std - different periods
    df['MA200_low'] = df['Low'].rolling(window=200).min()
    df['MA14_low'] = df['Low'].rolling(window=14).min()
    df['MA200_high'] = df['High'].rolling(window=200).max()
    df['MA14_high'] = df['High'].rolling(window=14).max()
    df['MA20dSTD'] = df['Close'].rolling(window=20).std() 
    
    # Exponential Moving Averages (EMAS) - different periods
    df['EMA12'] = df['Close'].ewm(span=12, adjust=False).mean()
    df['EMA20'] = df['Close'].ewm(span=20, adjust=False).mean()
    df['EMA26'] = df['Close'].ewm(span=26, adjust=False).mean()
    df['EMA100'] = df['Close'].ewm(span=100, adjust=False).mean()
    df['EMA200'] = df['Close'].ewm(span=200, adjust=False).mean()

    # Shifts (one day before and two days before)
    df['close_shift-1'] = df.shift(-1)['Close']
    df['close_shift-2'] = df.shift(-2)['Close']

    # Bollinger Bands
    df['Bollinger_Upper'] = df['MA20'] + (df['MA20dSTD'] * 2)
    df['Bollinger_Lower'] = df['MA20'] - (df['MA20dSTD'] * 2)
    
    # Relative Strength Index (RSI)
    df['K-ratio'] = 100*((df['Close'] - df['MA14_low']) / (df['MA14_high'] - df['MA14_low']) )
    df['RSI'] = df['K-ratio'].rolling(window=3).mean() 

    # Moving Average Convergence/Divergence (MACD)
    df['MACD'] = df['EMA12'] - df['EMA26']
    
    # Replace nas 
    nareplace = df.at[df.index.max(), 'Close']    
    df.fillna((nareplace), inplace=True)
    
    return df

Now that we have created several features, we will limit them. We can now choose from these features and test how different feature combinations affect model performance.

# List of considered Features
FEATURES = [
#             'High',
#             'Low',
#             'Open',
              'Close',
#             'Volume',
#             'Day',
#             'Month',
#             'Year',
#             'Adj Close',
#              'close_shift-1',
#              'close_shift-2',
#             'MACD',
#             'RSI',
#             'MA200',
#             'MA200_high',
#             'MA200_low',
            'Bollinger_Upper',
            'Bollinger_Lower',
#             'MA100',            
#             'MA50',
#             'MA26',
#             'MA14_low',
#             'MA14_high',
#             'MA12',
#             'EMA20',
#             'EMA100',
#             'EMA200',
#               'DIFF-MA200-MA50',
#               'DIFF-MA200-MA100',
#             'DIFF-MA200-CLOSE',
#             'DIFF-MA100-CLOSE',
#             'DIFF-MA50-CLOSE'
           ]

# Create the dataset with features
df_features = createFeatures(train_df)

# Shift the timeframe by 10 month
use_start_date = pd.to_datetime("2010-11-01" )
df_features = df_features[df_features.index > use_start_date].copy()

# Filter the data to the list of FEATURES
data_filtered_ext = df_features[FEATURES].copy()

# We add a prediction column and set dummy values to prepare the data for scaling
#data_filtered_ext['Prediction'] = data_filtered_ext['Close'] 
print(data_filtered_ext.tail().to_string())

# remove Date column before training
dfs = data_filtered_ext.copy()

# Create a list with the relevant columns
assetname_list = [dfs.columns[i-1] for i in range(dfs.shape[1])]

# Create the lineplot
fig, ax = plt.subplots(figsize=(16, 8))
sns.lineplot(data=data_filtered_ext[assetname_list], linewidth=1.0, dashes=False, palette='muted')

# Configure and show the plot    
ax.set_title(stockname + ' price chart')
ax.legend()
plt.show

 			Close  			Bollinger_Upper  Bollinger_Lower
Date                                                      
2022-05-18  11418.150391     13404.779247     11065.040772
2022-05-19  11388.500000     13285.741255     11005.463725
2022-05-20  11354.620117     13214.664450     10928.073538
2022-05-23  11535.269531     13075.594634     10920.185347
2022-05-24  11264.450195     13035.543222     10837.607755

Step #4 Scaling and Transforming the Data

Before training our model, we need to transform the data. This step includes scaling the data (to a range between 0 and 1) and dividing it into separate sets for training and testing the prediction model. Most of the code used in this section stems from the previous article on multivariate time-series prediction, which covers the steps to transform the data. So we don’t go into too much detail here.

# Calculate the number of rows in the data
nrows = dfs.shape[0]
np_data_unscaled = np.reshape(np.array(dfs), (nrows, -1))
print(np_data_unscaled.shape)

# Transform the data by scaling each feature to a range between 0 and 1
scaler = RobustScaler()
np_data = scaler.fit_transform(np_data_unscaled)

# Creating a separate scaler that works on a single column for scaling predictions
scaler_pred = RobustScaler()
df_Close = pd.DataFrame(data_filtered_ext['Close'])
np_Close_scaled = scaler_pred.fit_transform(df_Close)

Out: (2619, 6)

Once we have scaled the data, we will split the data into a train and test set. This step creates four datasets x_train and x_test, and y_train and y_test. x_train and x_test contain the data with our selected features. The two sets, y_train and y_test, have the actual values, which our model will try to predict.

# Set the sequence length - this is the timeframe used to make a single prediction
sequence_length = 50 # = number of neurons in the first layer of the neural network

# Split the training data into train and train data sets
# As a first step, we get the number of rows to train the model on 80% of the data 
train_data_len = math.ceil(np_data.shape[0] * 0.8)

# Create the training and test data
train_data = np_data[:train_data_len, :]
test_data = np_data[train_data_len - sequence_length:, :]

# The RNN needs data with the format of [samples, time steps, features]
# Here, we create N samples, sequence_length time steps per sample, and 6 features
def partition_dataset(sequence_length, data):
    x, y = [], []
    data_len = data.shape[0]

    for i in range(sequence_length, data_len):
        x.append(data[i-sequence_length:i,:]) #contains sequence_length values 0-sequence_length * columsn
        y.append(data[i, 0]) #contains the prediction values for validation,  for single-step prediction
    
    # Convert the x and y to numpy arrays
    x = np.array(x)
    y = np.array(y)
    return x, y

# Generate training data and test data
x_train, y_train = partition_dataset(sequence_length, train_data)
x_test, y_test = partition_dataset(sequence_length, test_data)

# Print the shapes: the result is: (rows, training_sequence, features) (prediction value, )
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)

# Validate that the prediction value and the input match up
# The last close price of the second input sample should equal the first prediction value
print(x_train[1][sequence_length-1][0])
print(y_train[0])

Out:
(1914, 30, 3) (1914,) 
(486, 30, 3) (486,)

Step #5 Train the Time Series Forecasting Model

Now that we have prepared the data, we can train our forecasting model. For this purpose, we will use a recurrent neural network from the Keras library. A recurrent neural network (RNN) is a type of artificial neural network that can process sequential data, such as text, audio, or time series data. Unlike traditional feedforward neural networks, in which data flows through the network in only one direction, RNNs have connections that form a directed cycle, allowing information to flow in multiple directions and be processed in a temporal manner.

The model architecture of our RNN looks as follows:

LSTM layer that receives a mini-batch as input.
LSTM layer that has the same number of neurons as the mini-batch
Another LSTM layer that does not return the sequence
Dense layer with 32 neurons
Dense layer with one neuron that outputs the forecast

The architecture is not too complex and is suitable for experimenting with different features. I arrived at this architecture by trying out different layers and configurations. However, I did not spend too much time fine-tuning the architecture since this tutorial focuses on feature engineering.

During model training, the neural network processes several mini-batches. The shape of the mini-batch is defined by the number of features and the period chosen. Multiplying these two dimensions (number of features x number of time steps) gives the input shape of our model.

The following code defines the model architecture, trains the model, and then prints the training loss curve:

# Configure the neural network model
model = Sequential()

# Configure the Neural Network Model with n Neurons - inputshape = t Timestamps x f Features
n_neurons = x_train.shape[1] * x_train.shape[2]
print('timesteps: ' + str(x_train.shape[1]) + ',' + ' features:' + str(x_train.shape[2]))
model.add(LSTM(n_neurons, return_sequences=True, input_shape=(x_train.shape[1], x_train.shape[2]))) 
#model.add(Dropout(0.1))
model.add(LSTM(n_neurons, return_sequences=True))
#model.add(Dropout(0.1))
model.add(LSTM(n_neurons, return_sequences=False))
model.add(Dense(32))
model.add(Dense(1, activation='relu'))


# Configure the Model   
optimizer='adam'; loss='mean_squared_error'; epochs = 100; batch_size = 32; patience = 8; 

# uncomment to customize the learning rate
learn_rate = "standard" # 0.05
# adam = Adam(learn_rate=learn_rate) 

parameter_list = ['epochs ' + str(epochs), 'batch_size ' + str(batch_size), 'patience ' + str(patience), 'optimizer ' + str(optimizer) + ' with learn rate ' + str(learn_rate), 'loss ' + str(loss)]
print('Parameters: ' + str(parameter_list))

# Compile and Training the model
model.compile(optimizer=optimizer, loss=loss)
early_stop = EarlyStopping(monitor='loss', patience=patience, verbose=1)
history = model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, callbacks=[early_stop], shuffle = True,
                  validation_data=(x_test, y_test))

# Plot training & validation loss values
fig, ax = plt.subplots(figsize=(12, 6), sharex=True)
plt.plot(history.history["loss"])
plt.title("Model loss")
plt.ylabel("Loss")
plt.xlabel("Epoch")
ax.xaxis.set_major_locator(plt.MaxNLocator(epochs))
plt.legend(["Train", "Test"], loc="upper left")
plt.show()

timesteps: 50, features:1
Parameters: ['epochs 100', 'batch_size 32', 'patience 8', 'optimizer adam with learn rate standard', 'loss mean_squared_error']
Epoch 1/100
72/72 [==============================] - 9s 55ms/step - loss: 0.0990 - val_loss: 0.2985
Epoch 2/100
72/72 [==============================] - 3s 37ms/step - loss: 0.0932 - val_loss: 0.1768
Epoch 3/100
72/72 [==============================] - 3s 39ms/step - loss: 0.0931 - val_loss: 0.1246
Epoch 4/100
72/72 [==============================] - 3s 37ms/step - loss: 0.0931 - val_loss: 0.0902
Epoch 5/100
72/72 [==============================] - 3s 38ms/step - loss: 0.0929 - val_loss: 0.0846
Epoch 6/100
72/72 [==============================] - 3s 38ms/step - loss: 0.0930 - val_loss: 0.0611
Epoch 7/100
72/72 [==============================] - 3s 38ms/step - loss: 0.0929 - val_loss: 0.0498
Epoch 8/100
72/72 [==============================] - 3s 37ms/step - loss: 0.0928 - val_loss: 0.0208
Epoch 9/100
72/72 [==============================] - 3s 38ms/step - loss: 0.0929 - val_loss: 0.0588
Epoch 10/100
72/72 [==============================] - 3s 37ms/step - loss: 0.0928 - val_loss: 0.0437
Epoch 11/100
72/72 [==============================] - 3s 36ms/step - loss: 0.0928 - val_loss: 0.0192
Epoch 12/100
...
72/72 [==============================] - 3s 38ms/step - loss: 0.0925 - val_loss: 0.0094
Epoch 46/100
72/72 [==============================] - 3s 37ms/step - loss: 0.0925 - val_loss: 0.0113
Epoch 00046: early stopping

The loss drops quickly, and the training process looks promising.

Step #6 Evaluate Model Performance

If we test a feature, we also want to know how it impacts the performance of our model. Feature Engineering is therefore closely related to evaluating model performance. So, let’s check the prediction performance. For this purpose, we score the model with the test data set (x_test). Then we can compare the predictions with the actual values (y_test) in a lineplot.

# Get the predicted values
y_pred_scaled = model.predict(x_test)

# Unscale the predicted values
y_pred = scaler_pred.inverse_transform(y_pred_scaled)
y_test_unscaled = scaler_pred.inverse_transform(y_test.reshape(-1, 1))
y_test_unscaled.shape

# Mean Absolute Error (MAE)
MAE = mean_absolute_error(y_test_unscaled, y_pred)
print(f'Median Absolute Error (MAE): {np.round(MAE, 2)}')

# Mean Absolute Percentage Error (MAPE)
MAPE = np.mean((np.abs(np.subtract(y_test_unscaled, y_pred)/ y_test_unscaled))) * 100
print(f'Mean Absolute Percentage Error (MAPE): {np.round(MAPE, 2)} %')

# Median Absolute Percentage Error (MDAPE)
MDAPE = np.median((np.abs(np.subtract(y_test_unscaled, y_pred)/ y_test_unscaled)) ) * 100
print(f'Median Absolute Percentage Error (MDAPE): {np.round(MDAPE, 2)} %')

# The date from which on the date is displayed
display_start_date = "2019-01-01" 

# Add the difference between the valid and predicted prices
train = pd.DataFrame(dfs['Close'][:train_data_len + 1]).rename(columns={'Close': 'y_train'})
valid = pd.DataFrame(dfs['Close'][train_data_len:]).rename(columns={'Close': 'y_test'})
valid.insert(1, "y_pred", y_pred, True)
valid.insert(1, "residuals", valid["y_pred"] - valid["y_test"], True)
df_union = pd.concat([train, valid])

# Zoom in to a closer timeframe
df_union_zoom = df_union[df_union.index > display_start_date]

# Create the lineplot
fig, ax1 = plt.subplots(figsize=(16, 8))
plt.title("y_pred vs y_test")
plt.ylabel(stockname, fontsize=18)
sns.set_palette(["#090364", "#1960EF", "#EF5919"])
sns.lineplot(data=df_union_zoom[['y_pred', 'y_train', 'y_test']], linewidth=1.0, dashes=False, ax=ax1)

# Create the barplot for the absolute errors
df_sub = ["#2BC97A" if x > 0 else "#C92B2B" for x in df_union_zoom["residuals"].dropna()]
ax1.bar(height=df_union_zoom['residuals'].dropna(), x=df_union_zoom['residuals'].dropna().index, width=3, label='absolute errors', color=df_sub)
plt.legend()
plt.show()

Median Absolute Error (MAE): 547.23 
Mean Absolute Percentage Error (MAPE): 4.04 % 
Median Absolute Percentage Error (MDAPE): 3.73 %

On average, the predictions of our model deviate from the actual values by about one percent. Although one percent may not sound like a lot, the prediction errors can quickly accumulate to larger values.

Step #7 Overview of Selected Models

In writing this article, I tested various models based on different features. The neural network architecture remained unchanged. Likewise, I kept the hyperparameters the same except for the learning rate. Below are the results of these model variants:

Step #8 Conclusions

Estimating which indicators will lead to good results in advance is difficult. More indicators do not necessarily lead to better results because they increase the model complexity and add data without predictive power. This so-called noise makes it harder for the model to separate important influencing factors from less important ones. Also, each additional indicator increases the time needed to train the model. So there is no way around testing different variants.

Besides the feature, various hyperparameters such as the learning rate, optimizer, batch size, and the selected time frame of the data (sequence_length) impact the model’s performance. Tuning these hyperparameters can further improve model performance.

A learning rate of 0.05 achieves the best results from the tested configurations.
Of all features, only the Bollinger bands positively affected the model’s performance.
As expected, the performance tends to decrease with the number of features.
In our case, the hyperparameters seem to affect the performance of the models more than the choice of features.

Finally, we have optimized only a single parameter. We searched for optimal learning rates while leaving all other parameters unchanged, such as the optimizer, the neural network architecture, or the sequence length. Based on the results, we can draw several conclusions:

There is plenty of room for improvement and experimentation. With more time for experiments and computational power, it will undoubtedly be possible to identify better features and model configurations. So, have fun experimenting! 🙂

Summary

In this tutorial, we have delved into the fascinating world of feature engineering for stock market forecasting using Python. By exploring various features from chart analysis, such as RSI, moving averages, and Bollinger bands, we have developed multiple variants of a recurrent neural network that produce distinct prediction models.

Our experiments have shown that the choice of features can have a significant impact on the performance of the prediction model. Therefore, it’s essential to carefully select features and consider their potential impact on the model. Additionally, keep in mind that the most effective features for recognizing patterns in historical data will vary depending on the specific time series data being analyzed.

By following the crucial steps outlined in this tutorial, you now have the knowledge and tools to apply feature engineering techniques to any multivariate time series forecasting problem. With further experimentation and testing, you can fine-tune your models to achieve the best possible results for your specific use case.

We hope you found this tutorial both informative and helpful. If you have any questions or comments, don’t hesitate to reach out and let us know.

And if you want to learn more about feature preparation and exploration, check out my recent article on Exploratory Feature Preparation for Regression Models.

Sources and Further Reading

The links above to Amazon are affiliate links. By buying through these links, you support the Relataly.com blog and help to cover the hosting costs. Using the links does not affect the price.

Books on Applied Machine Learning

The links above to Amazon are affiliate links. By buying through these links, you support the Relataly.com blog and help to cover the hosting costs. Using the links does not affect the price.

The post Mastering Multivariate Stock Market Prediction with Python: A Guide to Effective Feature Engineering Techniques appeared first on relataly.com.

Stock Market Prediction using Multivariate Time Series and Recurrent Neural Networks in Python

Florian Follonier — Mon, 01 Jun 2020 17:20:46 +0000

Regression models based on recurrent neural networks (RNN) can recognize patterns in time series data, making them an exciting technology for stock market forecasting. What distinguishes these RNNs from traditional neural networks is their architecture. It consists of multiple layers of long-term, short-term memory (LSTM). These LSTM layers allow the model to learn patterns in a time series that occur over different periods and are often difficult for human analysts to detect. We can train such models with one feature (univariate forecasting models) or multiple features (multivariate models). Multivariate Models can take more data into account, and if we provide them with relevant features, they can make better predictions. This tutorial uses Python and Keras to implement a multivariate RNN for stock price prediction. We define the architecture of our regression model and then train this model to predict the NASDAQ index.

The remainder of this tutorial proceeds in two parts: We start with a brief intro in which we compare modeling univariate and multivariate time series data. Then we turn to the hands-on part, in which we prepare the multivariate time series data and use it to train a neural network in Python. The model is a recurrent neural network with LSTM layers that forecasts the NASDAQ stock market index. Finally, we evaluate the performance of our model and make a forecast for the next day.

Disclaimer

This article does not constitute financial advice. Stock markets can be very volatile and are generally difficult to predict. Predictive models and other forms of analytics applied in this article only serve the purpose of illustrating machine learning use cases.

Stock market forecasting has become an exciting application for recurrent neural networks.

Univariate vs. Multivariate Time Series Models

Multivariate models and univariate models differ in the number of their input features. While univariate models consider only a single feature, multivariate models use several input variables (features). In stock market forecasting, we can create additional features from price history. Examples are performance indicators such as moving averages, the RSI, or the Sales Volume. We can also include features from other sources, for example, social media sentiment, weather forecasts, etc. Multivariate models that have additional relevant information available have a chance to outperform univariate models. However, this is only true if the features are relevant and are indicative of future price movements.

Preparing data for training univariate models is more straightforward than for multivariate models. If you are new to time series prediction, you might want to look at my earlier articles. These explain how to develop and evaluate univariate time series models:

Univariate Prediction Models

In time series regression, the standard approach is to train a model using past values from the time series that need to be predicted. The assumption is that the value of a time series at time t is closely related to the previous time steps t-1, t-2, t-3, and so on. This approach is similar to chart analysis, which involves identifying patterns in a price chart that can indicate future movements. Both approaches rely on the ability to identify recurring patterns in the data and make accurate predictions based on them. The performance of the model or analysis depends on the ability to identify these patterns and draw the right conclusions from them.

Several techniques can be used to improve the performance of time series regression models, including feature engineering, hyperparameter optimization, and ensemble methods. In addition to these techniques, it is also important to carefully evaluate the performance of the model using appropriate metrics, such as mean squared error or mean absolute error, and to continuously monitor the model’s performance to ensure it remains accurate over time.

Univariate Time Series Prediction

Multivariate Prediction Models

Predicting the price of a financial asset is a challenging task due to the numerous variables that can influence it, including economic cycles, political events, unforeseen occurrences, psychological factors, market sentiment, and even the weather. These variables are often interdependent, which makes statistical modeling even more complex. While multivariate models can take into account several factors, they are still a simplification of reality and may not fully capture the complexity of the market. On the other hand, univariate models only consider a single dependent variable, ignoring the other dimensions.

Even with good features, predicting financial prices can be difficult because patterns and market rules may change frequently. As a result, models may make mistakes. However, as Georg Box famously said, “All models are wrong, but some are useful.” Despite their limitations, multivariate models can provide a more detailed representation of reality compared to univariate models, and can still be useful in forecasting financial prices.

Multivariate Time Series Prediction

Implementing a Multivariate Time Series Prediction Model in Python

Now that we have a solid understanding of multivariate time series forecasting, it’s time to put our knowledge into practice by building a model using Python and TensorFlow. Specifically, we will create a multivariate recurrent neural network (RNN) to predict the NASDAQ stock market index. RNNs are well-suited for time series forecasting because they can process sequential data, considering the dependencies between past and future events.

To build our RNN model, we will need to go through several essential steps.

Creating features and scaling: This involves loading and preparing the time series data for modeling, including selecting the time period and relevant features and scaling the data.
Splitting the data: We split the data into train and test sets.
Sliding window approach: The time series data is sliced into mini-batches using the sliding window approach.
Model design and training: The appropriate architecture for the RNN model is chosen, and an optimization algorithm is used to adjust the model’s weights and biases to minimize prediction error.
Model validation and predictions: We evaluate the model’s performance by comparing the predicted values to the actual values of the NASDAQ index. In addition, we will use the model to make predictions about future events.
Unscaling the predictions: The predictions are unscaled to bring them back to their original scale.

The code is available on the GitHub repository.

View on GitHub Relataly Github Repo

Six Essential Steps for Developing a Multivariate Time Series Model

Prerequisites

Before starting the coding part, make sure that you have set up your Python 3 environment and required packages. If you don’t have a Python environment, follow the steps in this tutorial to set up the Anaconda environment.

Also, make sure you install all required packages. In this tutorial, we will be working with the following standard packages:

In addition, we will be using Keras (2.0 or higher) with Tensorflow backend, the machine learning library sci-kit-learn, and the pandas-DataReader.

You can install packages using console commands:

pip install
conda install (if you are using the anaconda packet manager)

Step #1 Load the Time Series Data

Let’s start by loading price data on the NASDAQ composite index (symbol: ^IXIC) from yahoo.finance.com into our Python project. To download the data, we use Pandas DataReader – a popular Python library that provides functions to extract data from various sources on the web. Alternatively, you can also use the “yfinance” library.

We provide the technical symbol for the NASDAQ index, “^IXIC.” Alternatively, you could use other asset symbols, for example, BTC-USD, to get price quotes for Bitcoin. In addition, we limit the data in the API request to the timeframe between 2010-01-01 and the current date.

Running the code below will load the data into a new DataFrame object. Be aware that input data and predictions will vary depending on when you execute the code.

# Time Series Forecasting - Multivariate Time Series Models for Stock Market Prediction

import math # Mathematical functions 
import numpy as np # Fundamental package for scientific computing with Python
import pandas as pd # Additional functions for analysing and manipulating data
from datetime import date, timedelta, datetime # Date Functions
from pandas.plotting import register_matplotlib_converters # This function adds plotting functions for calender dates
import matplotlib.pyplot as plt # Important package for visualization - we use this to plot the market data
import matplotlib.dates as mdates # Formatting dates
import tensorflow as tf
from sklearn.metrics import mean_absolute_error, mean_squared_error # Packages for measuring model performance / errors
from tensorflow.keras import Sequential # Deep learning library, used for neural networks
from tensorflow.keras.layers import LSTM, Dense, Dropout # Deep learning classes for recurrent and regular densely-connected layers
from tensorflow.keras.callbacks import EarlyStopping # EarlyStopping during model training
from sklearn.preprocessing import RobustScaler, MinMaxScaler # This Scaler removes the median and scales the data according to the quantile range to normalize the price data 
import seaborn as sns # Visualization
sns.set_style('white', { 'axes.spines.right': False, 'axes.spines.top': False})

# check the tensorflow version and the number of available GPUs
print('Tensorflow Version: ' + tf.__version__)
physical_devices = tf.config.list_physical_devices('GPU')
print("Num GPUs:", len(physical_devices))

# Setting the timeframe for the data extraction
end_date =  date.today().strftime("%Y-%m-%d")
start_date = '2010-01-01'

# Getting NASDAQ quotes
stockname = 'NASDAQ'
symbol = '^IXIC'

# You can either use webreader or yfinance to load the data from yahoo finance
# import pandas_datareader as webreader
# df = webreader.DataReader(symbol, start=start_date, end=end_date, data_source="yahoo")

import yfinance as yf #Alternative package if webreader does not work: pip install yfinance
df = yf.download(symbol, start=start_date, end=end_date)

# Create a quick overview of the dataset
df.head()

Tensorflow Version: 2.5.0
Num GPUs: 1
[*********************100%***********************]  1 of 1 completed
			Open		High		Low			Close		Adj Close	Volume
Date						
2009-12-31	2292.919922	2293.590088	2269.110107	2269.149902	2269.149902	1237820000
2010-01-04	2294.409912	2311.149902	2294.409912	2308.419922	2308.419922	1931380000
2010-01-05	2307.270020	2313.729980	2295.620117	2308.709961	2308.709961	2367860000
2010-01-06	2307.709961	2314.070068	2295.679932	2301.090088	2301.090088	2253340000
2010-01-07	2298.090088	2301.300049	2285.219971	2300.050049	2300.050049	2270050000

The data looks as expected and has the following columns:

High – the daily high
Low – the daily low
Open – the opening price
Close – the closing price
Volume – the daily trading volume
Adj Close – the adjacent closing price

Step #2 Explore the Data

Let’s first familiarize ourselves with the data before processing them further. Line plots are an excellent choice to gain a quick overview of time series data. By running the code below, we loop over the columns to plot a line chart for each column of the dataframe.

# Plot line charts
df_plot = df.copy()

ncols = 2
nrows = int(round(df_plot.shape[1] / ncols, 0))

fig, ax = plt.subplots(nrows=nrows, ncols=ncols, sharex=True, figsize=(14, 7))
for i, ax in enumerate(fig.axes):
        sns.lineplot(data = df_plot.iloc[:, i], ax=ax)
        ax.tick_params(axis="x", rotation=30, labelsize=10, length=0)
        ax.xaxis.set_major_locator(mdates.AutoDateLocator())
fig.tight_layout()
plt.show()

The line plots look as expected. We continue with preprocessing and feature engineering.

Step #3 Feature Selection and Scaling

Before we can train the neural network, we need to transform the data into a processable shape. In this section, we perform the following tasks:

Selecting features
Scaling the data to a standard value range

3.1 Selecting Features

First, we will select the features upon which we want to train our neural network. The selection and engineering of relevant feature variables is a complex topic. We could also create additional features such as moving averages, but I want to keep things simple. Therefore, we select features that are already present in our data. To learn more about feature engineering for stock market prediction, check out the relataly feature engineering tutorial.

Feature Selection of Multivariate Time Series Models

Running the code below selects the features. We add a dummy column to our record called “Predictions,” which will help us later when we need to reverse the scaling of our data.

# Indexing Batches
train_df = df.sort_values(by=['Date']).copy()

# List of considered Features
FEATURES = ['High', 'Low', 'Open', 'Close', 'Volume'
            #, 'Month', 'Year', 'Adj Close'
           ]

print('FEATURE LIST')
print([f for f in FEATURES])

# Create the dataset with features and filter the data to the list of FEATURES
data = pd.DataFrame(train_df)
data_filtered = data[FEATURES]

# We add a prediction column and set dummy values to prepare the data for scaling
data_filtered_ext = data_filtered.copy()
data_filtered_ext['Prediction'] = data_filtered_ext['Close']

# Print the tail of the dataframe
data_filtered_ext.tail()

FEATURE LIST
['High', 'Low', 'Open', 'Close', 'Volume']
			High			Low				Open			Close			Volume		Prediction
Date						
2022-05-09	11990.610352	11574.940430	11923.030273	11623.250000	5911380000	11623.250000
2022-05-10	11944.940430	11566.280273	11900.339844	11737.669922	6199090000	11737.669922
2022-05-11	11844.509766	11339.179688	11645.570312	11364.240234	6120860000	11364.240234
2022-05-12	11547.330078	11108.759766	11199.250000	11370.959961	6647400000	11370.959961
2022-05-13	11856.709961	11510.259766	11555.969727	11805.000000	5868610000	11805.000000

3.2 Scaling the Multivariate Input Data

Another necessary step in data preparation for neural networks is scaling the input data. Scaling will increase training times and improve model accuracy. The scikit-learn package offers different scaling approaches. We use the MinMaxScaler to scale the input data to a range between 0 and 1.

A model that is trained on scaled data will also produce scaled predictions. Therefore, when we make predictions later with our model, we must not forget to scale the predictions back. The scaler_model will adapt to the shape of the data (6-dimensional). However, our predictions will be one-dimensional. Because the scaler has a fixed input shape, we cannot simply reuse it for unscaling our model predictions. To unscale the predictions later, we create an additional scaler that works on a single feature column (scaler_pred).

# Get the number of rows in the data
nrows = data_filtered.shape[0]

# Convert the data to numpy values
np_data_unscaled = np.array(data_filtered)
np_data = np.reshape(np_data_unscaled, (nrows, -1))
print(np_data.shape)

# Transform the data by scaling each feature to a range between 0 and 1
scaler = MinMaxScaler()
np_data_scaled = scaler.fit_transform(np_data_unscaled)

# Creating a separate scaler that works on a single column for scaling predictions
scaler_pred = MinMaxScaler()
df_Close = pd.DataFrame(data_filtered_ext['Close'])
np_Close_scaled = scaler_pred.fit_transform(df_Close)

Out: (2619, 6)

Step #4 Transforming the Multivariate Data

Next, we train our multivariate regression model based on a three-dimensional data structure. The first dimension is the sequences, the second dimension is the time steps (mini-batches), and the third dimension is the features. The illustration below shows the steps to bring the multivariate data into a shape our neural model can process during training. We must keep this form and perform the same steps when using the model to create a forecast.

An essential step in the preparation process is slicing the data into multiple input data sequences with associated target values. We write a simple Python script that uses a “sliding window.” This approach moves a window through the time series data, adding a sequence of multiple data points to the input data with each step. The target value (e.g., Closing Price) follows this sequence, and we store it in a separate target dataset. Then we push the window one step further and repeat these activities. This process results in a data set with many input sequences (mini-batches), each with a corresponding target value in the target record. This process applies both to the training and the test data.

Sliding Window

We will apply the sliding window approach to our data. The result is a training set (x_train) containing 2258 input sequences, each with 50 steps and six features. The related target dataset (y_train) has 2258 target values.

# Set the sequence length - this is the timeframe used to make a single prediction
sequence_length = 50

# Prediction Index
index_Close = data.columns.get_loc("Close")

# Split the training data into train and train data sets
# As a first step, we get the number of rows to train the model on 80% of the data 
train_data_len = math.ceil(np_data_scaled.shape[0] * 0.8)

# Create the training and test data
train_data = np_data_scaled[0:train_data_len, :]
test_data = np_data_scaled[train_data_len - sequence_length:, :]

# The RNN needs data with the format of [samples, time steps, features]
# Here, we create N samples, sequence_length time steps per sample, and 6 features
def partition_dataset(sequence_length, data):
    x, y = [], []
    data_len = data.shape[0]
    for i in range(sequence_length, data_len):
        x.append(data[i-sequence_length:i,:]) #contains sequence_length values 0-sequence_length * columsn
        y.append(data[i, index_Close]) #contains the prediction values for validation,  for single-step prediction
    
    # Convert the x and y to numpy arrays
    x = np.array(x)
    y = np.array(y)
    return x, y

# Generate training data and test data
x_train, y_train = partition_dataset(sequence_length, train_data)
x_test, y_test = partition_dataset(sequence_length, test_data)

# Print the shapes: the result is: (rows, training_sequence, features) (prediction value, )
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)

# Validate that the prediction value and the input match up
# The last close price of the second input sample should equal the first prediction value
print(x_train[1][sequence_length-1][index_Close])
print(y_train[0])

(2474, 50, 5) (2474,)
(630, 50, 5) (630,)
0.02049456793614579
0.02049456793614579

Step #5 Train the Multivariate Prediction Model

Once we have the data prepared and ready, we can train our model. The architecture of our neural network consists of the following four layers:

An LSTM layer, which takes our mini-batches as input and returns the whole sequence
Another LSTM layer that takes the sequence from the previous layer but only returns five values
Dense layer with five neurons
A final dense layer that outputs the predicted value

The number of neurons in the first layer must equal the size of a minibatch of the input data. Each minibatch in our dataset consists of a matrix with 50 steps and six features. Thus, the input layer of our recurrent neural network consists of 300 neurons. Keeping this architecture in mind is essential because, later, we need to bring the data into the same shape when we want to predict a new dataset. Running the code below creates the model architecture and compiles the model.

# Configure the neural network model
model = Sequential()

# Model with n_neurons = inputshape Timestamps, each with x_train.shape[2] variables
n_neurons = x_train.shape[1] * x_train.shape[2]
print(n_neurons, x_train.shape[1], x_train.shape[2])
model.add(LSTM(n_neurons, return_sequences=True, input_shape=(x_train.shape[1], x_train.shape[2]))) 
model.add(LSTM(n_neurons, return_sequences=False))
model.add(Dense(5))
model.add(Dense(1))

# Compile the model
model.compile(optimizer='adam', loss='mse')

Running the code below starts the training process.

# Training the model
epochs = 50
batch_size = 16
early_stop = EarlyStopping(monitor='loss', patience=5, verbose=1)
history = model.fit(x_train, y_train, 
                    batch_size=batch_size, 
                    epochs=epochs,
                    validation_data=(x_test, y_test)
                   )
                    
                    #callbacks=[early_stop])

Output exceeds the size limit. Open the full output data in a text editor
Epoch 1/50
155/155 [==============================] - 6s 19ms/step - loss: 6.7374e-04 - val_loss: 0.0011
Epoch 2/50
155/155 [==============================] - 2s 13ms/step - loss: 6.6207e-05 - val_loss: 8.4700e-04
Epoch 3/50
155/155 [==============================] - 2s 14ms/step - loss: 5.0667e-05 - val_loss: 6.6467e-04
Epoch 4/50
155/155 [==============================] - 2s 14ms/step - loss: 5.8446e-05 - val_loss: 6.8575e-04
Epoch 5/50
155/155 [==============================] - 2s 14ms/step - loss: 4.8430e-05 - val_loss: 8.4892e-04
Epoch 6/50
155/155 [==============================] - 2s 14ms/step - loss: 7.1283e-05 - val_loss: 8.2255e-04
Epoch 7/50
155/155 [==============================] - 2s 15ms/step - loss: 6.0554e-05 - val_loss: 8.0583e-04
Epoch 8/50
155/155 [==============================] - 2s 15ms/step - loss: 5.5977e-05 - val_loss: 4.5830e-04
Epoch 9/50
155/155 [==============================] - 2s 15ms/step - loss: 4.2453e-05 - val_loss: 6.1866e-04
Epoch 10/50
155/155 [==============================] - 2s 14ms/step - loss: 3.5722e-05 - val_loss: 4.5288e-04
Epoch 11/50
155/155 [==============================] - 2s 14ms/step - loss: 4.1409e-05 - val_loss: 8.5975e-04
Epoch 12/50
155/155 [==============================] - 3s 18ms/step - loss: 7.0007e-05 - val_loss: 5.0300e-04
Epoch 13/50
...
Epoch 49/50
155/155 [==============================] - 2s 13ms/step - loss: 2.7064e-05 - val_loss: 2.8202e-04
Epoch 50/50
155/155 [==============================] - 2s 13ms/step - loss: 2.9009e-05 - val_loss: 2.5486e-04

Let’s take a quick look at the loss curve.

# Plot training & validation loss values
fig, ax = plt.subplots(figsize=(16, 5), sharex=True)
sns.lineplot(data=history.history["loss"])
plt.title("Model loss")
plt.ylabel("Loss")
plt.xlabel("Epoch")
ax.xaxis.set_major_locator(plt.MaxNLocator(epochs))
plt.legend(["Train", "Test"], loc="upper left")
plt.grid()
plt.show()

The loss drops quickly to a lower plateau, which signals that the model has improved throughout the training process.

Step #6 Evaluate Model Performance

Once we have trained the neural network regression model, we want to measure its performance. As mentioned in section 3, we first have to reverse the scaling of the predictions. Afterward, we calculate different error metrics, MAE, MAPE, and MDAPE. Then we will compare the predictions in a line plot with the actual values. For more information on measuring the performance of regression models, see this relataly article.

# Get the predicted values
y_pred_scaled = model.predict(x_test)

# Unscale the predicted values
y_pred = scaler_pred.inverse_transform(y_pred_scaled)
y_test_unscaled = scaler_pred.inverse_transform(y_test.reshape(-1, 1))

# Mean Absolute Error (MAE)
MAE = mean_absolute_error(y_test_unscaled, y_pred)
print(f'Median Absolute Error (MAE): {np.round(MAE, 2)}')

# Mean Absolute Percentage Error (MAPE)
MAPE = np.mean((np.abs(np.subtract(y_test_unscaled, y_pred)/ y_test_unscaled))) * 100
print(f'Mean Absolute Percentage Error (MAPE): {np.round(MAPE, 2)} %')

# Median Absolute Percentage Error (MDAPE)
MDAPE = np.median((np.abs(np.subtract(y_test_unscaled, y_pred)/ y_test_unscaled)) ) * 100
print(f'Median Absolute Percentage Error (MDAPE): {np.round(MDAPE, 2)} %')

Median Absolute Error (MAE): 175.28
Mean Absolute Percentage Error (MAPE): 1.48 %
Median Absolute Percentage Error (MDAPE): 1.16 %

The MAPE is 22.15, which means that the mean of our predictions deviates from the actual values by 3.12%. The MDAPE is 2.88 % and a bit lower than the mean, thus indicating there are some outliers among the prediction errors. 50% of the predictions deviate by more than 2.88%, and 50% differ by less than 2.88% from the actual values.

Next, we create a line plot showing the forecast and compare it to the actual values. Adding a bar plot to the chart helps highlight the deviations of the predictions from the actual values. Running the code below creates the line plot.

# The date from which on the date is displayed
display_start_date = "2019-01-01" 

# Add the difference between the valid and predicted prices
train = pd.DataFrame(data_filtered_ext['Close'][:train_data_len + 1]).rename(columns={'Close': 'y_train'})
valid = pd.DataFrame(data_filtered_ext['Close'][train_data_len:]).rename(columns={'Close': 'y_test'})
valid.insert(1, "y_pred", y_pred, True)
valid.insert(1, "residuals", valid["y_pred"] - valid["y_test"], True)
df_union = pd.concat([train, valid])

# Zoom in to a closer timeframe
df_union_zoom = df_union[df_union.index > display_start_date]

# Create the lineplot
fig, ax1 = plt.subplots(figsize=(16, 8))
plt.title("y_pred vs y_test")
plt.ylabel(stockname, fontsize=18)
sns.set_palette(["#090364", "#1960EF", "#EF5919"])
sns.lineplot(data=df_union_zoom[['y_pred', 'y_train', 'y_test']], linewidth=1.0, dashes=False, ax=ax1)

# Create the bar plot with the differences
df_sub = ["#2BC97A" if x > 0 else "#C92B2B" for x in df_union_zoom["residuals"].dropna()]
ax1.bar(height=df_union_zoom['residuals'].dropna(), x=df_union_zoom['residuals'].dropna().index, width=3, label='residuals', color=df_sub)
plt.legend()
plt.show()

The line plot shows that the forecast is close to the actual values but partially deviates from it. The deviations between actual values and predictions are called residuals. For our mode, they seem to be most significant during periods of increased market volatility and least during periods of steady market movement, which makes sense because sudden movements are generally more difficult to predict.

Step #7 Predict the Next Day’s Price

After training the neural network, we want to forecast the stock market for the next day. For this purpose, we extract a new dataset from the Yahoo-Finance API and preprocess it as we did for model training.

We trained our model with mini-batches of 50-steps and six features. Thus, we must also provide the model with 50-steps when making the forecast. As before, we transform the data into the shape of 1 x 50 x 6, whereby the last figure is the number of feature columns. After generating the forecast, we unscale the stock market predictions back to the original range of values.

df_temp = df[-sequence_length:]
new_df = df_temp.filter(FEATURES)

N = sequence_length

# Get the last N day closing price values and scale the data to be values between 0 and 1
last_N_days = new_df[-sequence_length:].values
last_N_days_scaled = scaler.transform(last_N_days)

# Create an empty list and Append past N days
X_test_new = []
X_test_new.append(last_N_days_scaled)

# Convert the X_test data set to a numpy array and reshape the data
pred_price_scaled = model.predict(np.array(X_test_new))
pred_price_unscaled = scaler_pred.inverse_transform(pred_price_scaled.reshape(-1, 1))

# Print last price and predicted price for the next day
price_today = np.round(new_df['Close'][-1], 2)
predicted_price = np.round(pred_price_unscaled.ravel()[0], 2)
change_percent = np.round(100 - (price_today * 100)/predicted_price, 2)

plus = '+'; minus = ''
print(f'The close price for {stockname} at {end_date} was {price_today}')
print(f'The predicted close price is {predicted_price} ({plus if change_percent > 0 else minus}{change_percent}%)')

The close price for NASDAQ on 2021-06-27 was 14360.39. The predicted closing price is 14232.8095703125 (-0.9%)

Summary

This tutorial has shown multivariate time series modeling for stock market prediction in Python. We trained a neural network regression model for predicting the NASDAQ index. Before training our model, we performed several steps to prepare the data. The steps included splitting the data and scaling them. In addition, we created and tested various new features from the original time series data to account for the multivariate modeling approach. You now have the knowledge and code to conduct further experiments with the features of your choice.

Multivariate time series forecasting is a complex topic. You might want to take the time to retrace the different steps. Especially the transformation of the data can be challenging. The best way to learn is to practice. Therefore I encourage you to develop more time series models and experiment with other data sources.

Another interesting approach to stock market prediction uses candlestick images and convolutional neural networks. If this topic interests you, check out the following article: Deep reinforcement learning stock market trading, utilizing a CNN with candlestick images

I am always trying to learn and improve. If you want to give feedback or have remarks, feel free to share them in the comments.

Stockmarket forecasting with a neural network is about identifying meaningful patterns. Be aware that there is no guarantee that these patterns are present in the data. Image created with Midjourney.

Sources and Further Reading

The links above to Amazon are affiliate links. By buying through these links, you support the Relataly.com blog and help to cover the hosting costs. Using the links does not affect the price.

The post Stock Market Prediction using Multivariate Time Series and Recurrent Neural Networks in Python appeared first on relataly.com.

Rolling Time Series Forecasting: Creating a Multi-Step Prediction for a Rising Sine Curve using Neural Networks in Python

Florian Follonier — Sun, 19 Apr 2020 09:59:49 +0000

Many time forecasting problems can be solved by predicting just one step into the future. However, some problems require a forecast for an extended period of time, which calls for a multi-step time series forecasting approach. This approach involves modeling the distribution of future values of a signal over a prediction horizon. In this article, we use the rising sine curve as an example to demonstrate how to apply a multi-step prediction approach using Keras neural networks with LSTM layers in Python. We create a rolling forecast for the sine curve by generating several single-step predictions and iteratively using them as input to predict further steps in the future.

The remainder of this article proceeds as follows: We begin by looking at the sine curve problem and provide a quick intro to recurrent neural networks. After this conceptual introduction, we turn to the hands-on part in Python. We generate synthetic sine curve data and preprocess the data to train univariate models using a Keras neural network. We thereby experiment with different architectures and hyperparameters. Then, we use these model variants to create rolling multi-step forecasts. Finally, we evaluate the model performance.

A multi-step time series forecast for a rising sine curve, as we will create in this article.

If you are just getting started with time-series forecasting, we have covered the single-step forecasting approach in two previous articles:

Predicting Stock Markets with Neural Networks – A Step-by-Step Guide
Stock Market Prediction – Adjusting Time Series Prediction Intervals in Python

The Problem of a Rising Sine Curve

The line plot below illustrates the sample of a rising sine curve. The goal is to use the ground truth values (blue) to forecast several points in this curve (purple).

Traditional mathematical methods can resolve this function by decomposing it into constituent parts. Thus, we could easily foresee the further course of the curve. But in practice, the function might not be precisely periodic and change over a more extended period. Therefore, recurrent neural networks can achieve better results than traditional mathematical approaches, especially when they train on extensive data.

Also: Stock Market Prediction using Univariate Recurrent Neural Networks (RNN) with Python

Application Domains with Similar Time Series Forecasting Problems

A rising sine wave may sound like an abstract problem at first. However, similar issues are widespread. Imagine you are the owner of an online shop. The users who visit your shop and buy something fluctuate depending on the time and the weekday. For instance, there are fewer visitors at night, and on weekends, the number of visitors rises sharply. At the same time, the overall number of users increases over a more extended period as the shop becomes known to a broader audience. To plan the number of goods to be held in stock, you need to see the number of orders at any point in time over several weeks. It is a typical multi-step time series problem, and similar problems exist in various domains:

Healthcare: e.g., forecasting of health signals such as heart, blood, or breathing signals
Network Security: e.g., analysis of network traffic in intrusion detection systems
Sales and Marketing: e.g., forecasting of market demand
Production demand forecasting: e.g., for power consumption and capacity planning
Prediction and filtering of sensor signals, e.g., of audio signals

A time series forecasting problem: predicting sine curve data

Recurrent Neural Networks for Time Series Forecasting

The model used in this article is a recurrent neural network with Long short-term memory (LSTM) layers. Unlike feedforward neural networks, recurrent networks with LSTM layers have loops to pass output values from one training instance to the next. LSTM layers are a powerful and widely-used tool for deep learning, and they work particularly well for time series data. By using LSTM layers, it is possible to train machine learning models that can make accurate predictions based on time series data, which can be useful for a wide range of applications, including finance, weather forecasting, and many others.

Also: Mastering Multivariate Stock Market Prediction with Python: A Guide to Effective Feature Engineering Techniques

The Training Process of a Recurrent Neural Network

When training neural networks, the process typically involves multiple epochs, where an epoch refers to a single pass through the entire training dataset. During each epoch, the neural network receives the entire training data and adjusts its weights accordingly through forward and backward propagation. The batch size, which determines the number of examples passed through the network at once, also affects how the weights are updated between neurons.

It’s important to note that one epoch is usually not enough for the model to learn the underlying patterns and relationships in the data, leading to underfitting and poor performance in prediction tasks. Hence, multiple epochs are often needed to fine-tune the network and improve its predictive capabilities. However, care must be taken not to overfit the model by choosing too many epochs. Overfitting occurs when the model becomes too complex and performs well on the training data but poorly on any other data, resulting in poor generalization.

To avoid overfitting, we can employ various techniques such as early stopping, where the training process is stopped when the validation loss begins to increase, indicating that the model is overfitting. Another method is to use regularization techniques such as L1 or L2 regularization, dropout, or data augmentation to prevent overfitting and improve the model’s generalization capabilities.

Functioning of an LSTM layer

An LSTM (Long Short-Term Memory) layer is a specialized type of recurrent neural network (RNN) layer that is specifically designed for processing and making predictions based on time series data. Unlike traditional RNNs, which suffer from the vanishing gradient problem, LSTM layers can maintain a memory of past events and selectively use this memory to make predictions about future events.

LSTM layers accomplish this by incorporating several unique mechanisms, such as gates and cell states. The gates control the flow of information into and out of the cell, while the cell state represents the memory of the network. These mechanisms work together to enable LSTM layers to learn and make predictions based on long-term dependencies in the data, which is a characteristic of many time series datasets.

One of the key advantages of using LSTM layers for time series forecasting is their ability to generate predictions for multiple timesteps. This is achieved by iteratively reusing the model outputs from the previous training run as input for the next prediction step. This approach, known as a rolling forecast, allows the model to make accurate predictions for multiple time steps into the future.

Functioning of an LSTM layer

LSTM Components

To understand how an LSTM layer works, it is helpful to think about the layer as consisting of several different components, each of which has a specific role in the overall operation of the layer. These components include the following:

Input gate: The input gate controls which information from the input data will be passed on to the cell state. The input gate uses a sigmoid activation function to determine which information should be retained and which should be discarded.
Forget gate: The forget gate controls which information from the cell state will be discarded. The forget gate uses a sigmoid activation function to determine which information should be forgotten and which should be retained.
Cell state: The cell state is a vector that contains the information that is retained by the LSTM layer. This information is updated at each time step based on the input from the input gate and the forget gate.
Output gate: The output gate controls which information from the cell state will be passed on to the output of the LSTM layer. The output gate uses a sigmoid activation function to determine which information should be retained and which should be discarded.

This article uses LSTM layers combined with a rolling forecast approach to predict the course of a sinus curve with a linear slope. The result is a multi-step time series forecast.

Creating a Rolling Multi-Step Time Series Forecast in Python

In this tutorial, we will explore the process of creating a rolling multi-step forecast using Python. Our dataset for this exercise will be a synthetically generated rising sine curve, which we will use to demonstrate the principles of multi-step time series forecasting.

By following the steps outlined in this tutorial, you will gain a comprehensive understanding of the process involved in multi-step time series forecasting. This will include techniques for data preprocessing, feature engineering, and model selection.

To get started, we have made the code available on our GitHub repository. You can easily access and follow along with the tutorial by downloading the code and running it on your local machine. This will enable you to experiment with different parameters and modify the code to suit your specific requirements.

View on GitHub Relataly GitHub Repo

Prerequisites

Before starting the coding part, make sure that you have set up your Python 3 environment and required packages. If you don’t have an environment yet, you can follow this tutorial to set up the Anaconda environment.

Also, make sure you install all required packages. In this tutorial, we will be working with the following standard packages:

In addition, we will be using Keras (2.0 or higher) with Tensorflow backend and the machine learning library Scikit-learn.

You can install packages using console commands:

pip install
conda install (if you are using the anaconda packet manager)

Step #1 Generating Synthetic Data

We begin by loading the required packages and creating a synthetic dataset. The synthetic data contains 300 values of the sinus function combined with a slight linear upward slope of 0.02. The code below creates the data and visualizes it in a line plot.

# A tutorial for this file is available at www.relataly.com

import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, TimeDistributed, Dropout, Activation
from sklearn.preprocessing import RobustScaler
from sklearn.metrics import mean_absolute_error
import tensorflow as tf
import seaborn as sns
sns.set_style('white', { 'axes.spines.right': False, 'axes.spines.top': False})

# check the tensorflow version and the number of available GPUs
print('Tensorflow Version: ' + tf.__version__)
physical_devices = tf.config.list_physical_devices('GPU')
print("Num GPUs:", len(physical_devices))

# Creating the synthetic sinus curve dataset
steps = 300
gradient = 0.02
list_a = []
for i in range(0, steps, 1):
    y = round(gradient * i + math.sin(math.pi * 0.125 * i), 5)
    list_a.append(y)
df = pd.DataFrame({"sine_curve": list_a}, columns=["sine_curve"])

# Visualizing the data
fig, ax1 = plt.subplots(figsize=(16, 4))
ax1.xaxis.set_major_locator(plt.MaxNLocator(30))
plt.title("Sinus Data")
sns.lineplot(data=df[["sine_curve"]], color="#039dfc")
plt.legend()
plt.show()

The signal curve oscillates and is steadily moving upward.

Step #2 Preprocessing

Next, we preprocess the data to bring it into the shape required by our neural network model. Running the code below will perform the following tasks:

Scale the data to a standard range.
Partition the data into multiple periods with a predefined sequence length. Each partition series contains 110 data points. The number of steps needs to match the number of neurons in the first layer of the neural network architecture.
Split the data into two separate datasets for training and testing.

Also: Mastering Multivariate Stock Market Prediction with Python: A Guide to Effective Feature Engineering Techniques

data = df.copy()

# Get the number of rows in the data
nrows = df.shape[0]

# Convert the data to numpy values
np_data_unscaled = np.array(df)
np_data_unscaled = np.reshape(np_data_unscaled, (nrows, -1))
print(np_data_unscaled.shape)

# Transform the data by scaling each feature to a range between 0 and 1
scaler = RobustScaler()
np_data_scaled = scaler.fit_transform(np_data_unscaled)

# Set the sequence length - this is the timeframe used to make a single prediction
sequence_length = 110

# Prediction Index
index_Close = 0

# Split the training data into train and train data sets
# As a first step, we get the number of rows to train the model on 80% of the data 
train_data_len = math.ceil(np_data_scaled.shape[0] * 0.8)

# Create the training and test data
train_data = np_data_scaled[0:train_data_len, :]
test_data = np_data_scaled[train_data_len - sequence_length:, :]

# The RNN needs data with the format of [samples, time steps, features]
# Here, we create N samples, sequence_length time steps per sample, and 6 features
def partition_dataset(sequence_length, data):
    x, y = [], []
    data_len = data.shape[0]
    for i in range(sequence_length, data_len):
        x.append(data[i-sequence_length:i,:]) #contains sequence_length values 0-sequence_length * columsn
        y.append(data[i, index_Close]) #contains the prediction values for validation (3rd column = Close),  for single-step prediction
    
    # Convert the x and y to numpy arrays
    x = np.array(x)
    y = np.array(y)
    return x, y

# Generate training data and test data
x_train, y_train = partition_dataset(sequence_length, train_data)
x_test, y_test = partition_dataset(sequence_length, test_data)

# Print the shapes: the result is: (rows, training_sequence, features) (prediction value, )
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)

# Validate that the prediction value and the input match up
# The last close price of the second input sample should equal the first prediction value
print(x_test[1][sequence_length-1][index_Close])
print(y_test[0])

(130, 110, 1) (130,)
(60, 110, 1) (60,)
0.599655121905751
0.599655121905751

Step #3 Preparing Data for the Multi-Step Time Series Forecasting Model

Now that we have prepared the synthetic data, we need to determine the architecture of our neural network. There are no clear guidelines available on how to configure a neural network. Most problems are unique, and the model settings depend on the nature and extent of the data. Thus, configuring LSTM layers can be a real challenge.

3.1 Overview of Model Parameters

Finding the optimal configuration is often a process of trial and error. Below you find a list of model parameters with which you can experiment:

Keras model parameters used in this tutorial
epoch: An epoch is an iteration over the entire x_train and y_train data provided.
Batch_size: Number of samples per gradient update. If unspecified, batch_size is 32.
Activation: Mathematical equations determine whether a neuron in the network should activate (“fired”) or not.
Input_shape: Tensor with shape: (batch_size, …, input_dim).
Input_len: the length of the generated input sequence.
Return_sequences: If set to false, the layer will return the final output in the output sequence. If true, the layer returns the entire series.
Loss: The loss function tells the model how to evaluate its performance so that the weights can be updated to reduce the loss on the following evaluation.
Optimizer: Every time a neural network finishes processing a batch through the network and generates prediction results, the optimizer decides how to adjust weights between the nodes.

Source: keras.io/layers/recurrent – view the Keras documentation for a complete list of parameters.

3.2 Choosing Model Parameters

Finding a suitable architecture for a neural network is not easy and requires a systematic approach. A common practice is to start with an established standard architecture. We can then change the model parameters slightly and record the configurations and outcomes. We can also automate configuring and testing the model (Hyperparameter Tuning). Still, because this is not the focus of this article, we will use a manual approach.

We start with five epochs, a batch_size of 1, and configure our recurrent model with one LSTM layer with 110 neurons, corresponding to a data slice of 110 input values. In addition, we add a dense layer that provides us with a single output value.

# Configure the neural network model
epochs = 12; batch_size = 1;

# Model with n_neurons = inputshape Timestamps, each with x_train.shape[2] variables
n_neurons = x_train.shape[1] * x_train.shape[2]
model = Sequential()
model.add(LSTM(n_neurons, return_sequences=True, input_shape=(x_train.shape[1], 1)))
model.add(LSTM(n_neurons, return_sequences=False))
model.add(Dense(5))
model.add(Dense(1))
model.compile(optimizer="adam", loss="mean_squared_error")

Step #3 Training the Prediction Model

Now that we have defined the architecture of our recurrent neural network, the next step is to train the model. Training the model involves using a training dataset to adjust the weights and biases of the network so that it can make accurate predictions on new data. This process is done using an optimization algorithm, such as gradient descent, to minimize the difference between the predicted values and the true values.

Training can be time-consuming, especially for large datasets or complex models, and the amount of time it takes may vary depending on the performance of your computer. The complexity of our model is relatively low, and we only use five training epochs. It should thus only take a couple of minutes to train the model.

It is important to monitor the training process to ensure that the model is learning effectively and to identify any potential issues or problems. Once the training is complete, we can test the model on a separate test dataset to evaluate its performance.

# Train the model
history = model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs)

Epoch 1/12
130/130 [==============================] - 7s 28ms/step - loss: 0.0544
Epoch 2/12
130/130 [==============================] - 4s 27ms/step - loss: 0.0041
Epoch 3/12
130/130 [==============================] - 4s 28ms/step - loss: 3.5105e-04
Epoch 4/12
130/130 [==============================] - 4s 28ms/step - loss: 4.3903e-05
Epoch 5/12
130/130 [==============================] - 4s 28ms/step - loss: 5.9422e-05
Epoch 6/12
130/130 [==============================] - 4s 28ms/step - loss: 5.7988e-05
Epoch 7/12
130/130 [==============================] - 4s 28ms/step - loss: 8.3814e-04
Epoch 8/12
130/130 [==============================] - 4s 28ms/step - loss: 1.9122e-04
Epoch 9/12
130/130 [==============================] - 4s 28ms/step - loss: 0.0014
Epoch 10/12
130/130 [==============================] - 4s 29ms/step - loss: 0.0114
Epoch 11/12
130/130 [==============================] - 4s 28ms/step - loss: 0.0013
Epoch 12/12
130/130 [==============================] - 4s 28ms/step - loss: 3.4571e-05

Step #4 Predicting a Single-step Ahead

We continue by making single-step predictions based on the training data. Then we calculate the mean squared error and the median error to measure the performance of our model.

# Reshape the data, so that we get an array with multiple test datasets
x_test_np = np.array(x_test)
x_test_reshape = np.reshape(x_test_np, (x_test_np.shape[0], x_test_np.shape[1], 1))

# Get the predicted values
y_pred = model.predict(x_test_reshape)
y_pred_unscaled = scaler.inverse_transform(y_pred)
y_test_unscaled = scaler.inverse_transform(y_test.reshape(-1, 1))

# Mean Absolute Error (MAE)
MAE = mean_absolute_error(y_test_unscaled, y_pred_unscaled)
print(f'Median Absolute Error (MAE): {np.round(MAE, 2)}')

# Mean Absolute Percentage Error (MAPE)
MAPE = np.mean((np.abs(np.subtract(y_test_unscaled, y_pred_unscaled)/ y_test_unscaled))) * 100
print(f'Mean Absolute Percentage Error (MAPE): {np.round(MAPE, 2)} %')

# Median Absolute Percentage Error (MDAPE)
MDAPE = np.median((np.abs(np.subtract(y_test_unscaled, y_pred_unscaled)/ y_test_unscaled)) ) * 100
print(f'Median Absolute Percentage Error (MDAPE): {np.round(MDAPE, 2)} %')

Out: me: 0.0223, rmse: 0.0206

Both the mean error and the squared mean error are pretty small. As a result, the values predicted by our model are close to the actual values of the ground truth. Even though it is unlikely because of the small number of epochs, it could still be that the model is over-fitting.

Step #5 Visualizing Predictions and Loss

Next, we look at the quality of the training predictions by plotting the training predictions and the actual values (i.e., ground truth).

# Visualize the data
#train = df[:train_data_len]
df_valid_pred = df[train_data_len:]
df_valid_pred.insert(1, "y_pred", y_pred_unscaled, True)

# Create the lineplot
fig, ax1 = plt.subplots(figsize=(32, 5), sharex=True)
ax1.tick_params(axis="x", rotation=0, labelsize=10, length=0)
plt.title("Predictions vs Ground Truth")
sns.lineplot(data=df_valid_pred)
plt.show()

The smaller the area between the two lines, the better our model predictions. So we can tell from the plot that the predictions are not entirely wrong. We also check the learning path of the regression model.

# Plot training & validation loss values
fig, ax = plt.subplots(figsize=(10, 5), sharex=True)
sns.lineplot(data=history.history["loss"])
plt.title("Model loss")
plt.ylabel("Loss")
plt.xlabel("Epoch")
ax.xaxis.set_major_locator(plt.MaxNLocator(epochs))
plt.legend(["Train"], loc="upper left")
plt.grid()
plt.show()

The loss drops quickly, and after five epochs, the model seems to have converged.

Step #6 Rolling Forecasting: Creating a Multi-step Time Series Forecast

Next, we will generate the rolling multi-step forecast. This approach is different from a single-step approach in that we predict several points of a signal within a prediction window and not just a single value. However, the prediction quality decreases over more extended periods because reusing predictions creates a feedback loop that amplifies potential errors over time.

Also: Measuring Regression Errors with Python

The forecasting process begins with an initial prediction for a single time step. After that, we add the predicted value to the input values for another projection, and so on. In this way, we create the rolling forecast with multiple time steps.

# Settings and Model Labels
rolling_forecast_range = 30
titletext = "Forecast Chart Model A"
ms = [
    ["epochs", epochs],
    ["batch_size", batch_size],
    ["lstm_neuron_number", n_neurons],
    ["rolling_forecast_range", rolling_forecast_range],
    ["layers", "LSTM, DENSE(1)"],
]
settings_text = ""
lms = len(ms)
for i in range(0, lms):
    settings_text += ms[i][0] + ": " + str(ms[i][1])
    
    if i < lms - 1:
        settings_text = settings_text + ",  "

# Making a Multi-Step Prediction
# Create the initial input data
new_df = df.filter(["sine_curve"])
for i in range(0, rolling_forecast_range):
    # Select the last sequence from the dataframe as input for the prediction model
    last_values = new_df[-n_neurons:].values
    
    # Scale the input data and bring it into shape
    last_values_scaled = scaler.transform(last_values)
    X_input = np.array(last_values_scaled).reshape([1, 110, 1])
    X_test = np.reshape(X_input, (X_input.shape[0], X_input.shape[1], 1))
    
    # Predict and unscale the predictions
    pred_value = model.predict(X_input)
    pred_value_unscaled = scaler.inverse_transform(pred_value)
    
    # Add the prediction to the next input dataframe
    new_df = pd.concat([new_df, pd.DataFrame({"sine_curve": pred_value_unscaled[0, 0]}, index=new_df.iloc[[-1]].index.values + 1)])
    new_df_length = new_df.size
forecast = new_df[new_df_length - rolling_forecast_range : new_df_length].rename(
    columns={"sine_curve": "Forecast"}
)

We can plot the forecast together with the ground truth.

#Visualize the results
validxs = df_valid_pred.copy()
dflen = new_df.size - 1
dfs = pd.concat([validxs, forecast], sort=False)
dfs.at[dflen, "y_pred_train"] = dfs.at[dflen, "y_pred"]
dfs

# Zoom in to a closer timeframe
df_zoom = dfs[dfs.index > 200]

# Visualize the data
fig, ax = plt.subplots(figsize=(16, 5), sharex=True)
ax.tick_params(axis="x", rotation=0, labelsize=10, length=0)
ax.xaxis.set_major_locator(plt.MaxNLocator(rolling_forecast_range))
plt.title('Forecast And Predictions (y_pred)')
sns.lineplot(data=df_zoom[["sine_curve", "y_pred"]], ax=ax)
sns.scatterplot(data=df_zoom[["Forecast"]], x=df_zoom.index, y='Forecast', linewidth=1.0, ax=ax)
plt.legend(["x_train", "y_pred", "y_test_rolling"], loc="lower right")
ax.annotate('ModelSettings: ' + settings_text, xy=(0.06, .015),  xycoords='figure fraction', horizontalalignment='left', verticalalignment='bottom', fontsize=10)
plt.show()

The model learned to predict the periodic movement of the curve and thus succeeds in modeling its further course. However, the amplitude of the predicted curve increases over time, which increases the prediction errors.

Step #7 Comparing Results for Different Parameters

There is plenty of room to improve the forecasting model further. So far, our model seems not to consider that the amplitude of the sinus curve is gradually increasing. Over time, errors are amplified, leading to a growing deviation from the ground truth signal.

We can improve the model by changing the model parameters and the model architecture. However, I would not recommend changing them one at a time.

I tested several model configurations with varying epochs and neuron numbers/sample sizes to further optimize the model. Parameters such as Batch_size (=1), the timeframe for the forecast (=30 timesteps), and the model architecture (=1 LSTM Layer, 1 DENSE Layer) were left unchanged. The illustrations below show the results:

Different designs of the rolling time series forecasting model

The model that looks most promising is model #6. This model considers both the periodic movements of the sinus curve and the steadily increasing slope. The configuration of this model uses 15 epochs and 115 neurons/sample values.

Errors are amplified over more extended periods when they enter a feedback loop. For this reason, we can see in the graph below that the prediction accuracy decreases over time.

Long-time multi-step forecast (model #6)

Summary

In this tutorial, we have created a rolling time-series forecast for a rising sine curve. A multi-step forecast helps better understand how a signal will develop over a more extended period. Finally, we have tested and compared different model variants and selected the best-performing model.

There are several ways to improve model accuracy further. For example, the current network architecture was kept simple and only had a single LSTM layer. Consequently, we could try to add additional layers. Another possibility is to experiment with different hyperparameters. For example, we could increase the training epochs and use dropout to prevent overfitting. In addition, we could experiment with other activation functions. Feel free to try it out.

I hope this article was helpful. If you have questions remaining, let me know in the comments.

Another forecasting approach is to train a neural network model that predicts multiple outputs per input batch. Another relataly article demonstrates how this multi-output regression approach works: Stock Market Prediction – Multi-output regression in Python.

Also, consider non-neural network approaches to time series forecasting, such as ARIMA, which can achieve great results too.

Sources and Further Reading

The links above to Amazon are affiliate links. By buying through these links, you support the Relataly.com blog and help to cover the hosting costs. Using the links does not affect the price.

The post Rolling Time Series Forecasting: Creating a Multi-Step Prediction for a Rising Sine Curve using Neural Networks in Python appeared first on relataly.com.

Stock Market Prediction – Adjusting Time Series Prediction Intervals in Python

Florian Follonier — Wed, 01 Apr 2020 16:37:56 +0000

Get ready to level up your time-series forecasting game! In this tutorial, we’re going to take things up a notch by showing you how to adjust prediction intervals using Keras recurrent neural networks and Python.

Now, you may remember our previous article on stock market forecasting where we made a forecast for the S&P500 stock market index using a prediction interval of just one day. However, other time-series prediction problems may require us to look further ahead – maybe several days, weeks, or even months. Fear not! We’ve got you covered.

By tweaking our data preparation and model architecture, we can modify the prediction interval and create a single-step forecast for a longer time frame. In this article, we’re going to show you exactly how to do that.

First, we’ll give you a quick rundown of different methods for adjusting the time series prediction interval. Then, we’ll dive into the practical stuff. We’ll use Python to train a simple neural network on stock market data and validate its performance. Once we’re happy with the model, we’ll prepare the data in a way that allows us to forecast a single but more extended step into the future.

Ready to take your time-series forecasting to the next level? Then let’s get started!

Disclaimer

Time Series Analysis Clocks

" data-image-caption="

Time Series Analysis Clocks

" data-large-file="https://www.relataly.com/wp-content/uploads/2020/04/Time-Series-Analysis-Clocks.png" src="https://www.relataly.com/wp-content/uploads/2020/04/Time-Series-Analysis-Clocks-512x287.png" alt="Time Series Analysis Clocks" class="wp-image-13471" srcset="https://www.relataly.com/wp-content/uploads/2020/04/Time-Series-Analysis-Clocks.png 512w, https://www.relataly.com/wp-content/uploads/2020/04/Time-Series-Analysis-Clocks.png 300w, https://www.relataly.com/wp-content/uploads/2020/04/Time-Series-Analysis-Clocks.png 768w, https://www.relataly.com/wp-content/uploads/2020/04/Time-Series-Analysis-Clocks.png 1456w" sizes="(max-width: 512px) 100vw, 512px" />

Time Series Analysis

Ways of Adjusting Prediction Intervals

When forecasting one step, the prediction interval is the point in time for which a prediction model will simulate the next value. There are three different ways to change the prediction interval:

Multi-Step Rolling Forecasting: Another way is to train the model on its output. We do this by maintaining the predictions and reusing them as input in the subsequent training run. In this way, the projections range one-time step further ahead with each iteration. After seven iterations, based on daily input time steps, the model will have provided the output for a weekly prediction. We have covered this rolling forecasting approach in a separate tutorial.
Deep Multi-Output Forecasting: A third option is to create a multi-output model that provides an entire series of predictions with multiple timesteps. We have covered this multi-output forecasting approach in a separate tutorial.
Single-step forecasting with bigger timesteps: In a single-stage forecasting approach, the input data defines the length of a time step. Changing the size of the input steps will change the output steps to the same extent. For example, a model that uses daily prices as input data will also provide day-to-day forecasts. We will cover this forecasting approach in the following section.

Predicting the Price of the S&P500 One Week Ahead

Let’s begin with the hands-on part. In this part, we will use Python to create a single-step forecasting model with more extended timesteps. Our model will make projections that reach one week ahead. For this purpose, we reuse most of the code from the previous article on univariate single-step daily forecasting. So we won’t go into all the details and will only speak about the areas in which we must adjust the code. Changes are necessary to data preparation and model architecture.

In the following, we develop a single-variate neural network model that forecasts the S&P500 stock market index. The code is available on the GitHub repository.

View on GitHub Relataly GitHub Repo

Prerequisites

Before starting the coding part, make sure that you have set up your Python 3 environment and required packages. If you don’t have an environment, you can follow these steps to set up the Anaconda environment.

Also, make sure you install all required packages. In this tutorial, we will be working with the following standard packages:

In addition, we will be using Keras (2.0 or higher) with Tensorflow backend, the machine learning library sci-kit-learn, and pandas DataReader to interact with the yahoo finance API.

You can install packages using console commands:

pip install
conda install (if you are using the anaconda packet manager)

Step #1 Load the Data

In the following, we will modify the prediction interval of the neural network model we developed in a previous post. As a result, the model will generate predictions for the market price of the S&P500 Index that range one week ahead.

As before, we start loading the stock market data via an API.

# A tutorial for this file is available at www.relataly.com

import math # Fundamental package for scientific computing with Python
import numpy as np # Additional functions for analysing and manipulating data
import pandas as pd # Date Functions
from datetime import date, timedelta # This function adds plotting functions for calender dates
from pandas.plotting import register_matplotlib_converters # Important package for visualization - we use this to plot the market data
import matplotlib.pyplot as plt # Formatting dates
import matplotlib.dates as mdates # Packages for measuring model performance / errors
from sklearn.metrics import mean_absolute_error, mean_squared_error # Tools for predictive data analysis. We will use the MinMaxScaler to normalize the price data 
from sklearn.preprocessing import MinMaxScaler # Deep learning library, used for neural networks
from tensorflow.keras.models import Sequential # Deep learning classes for recurrent and regular densely-connected layers
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.callbacks import EarlyStopping
import tensorflow as tf
import seaborn as sns
sns.set_style('white', { 'axes.spines.right': False, 'axes.spines.top': False})

# check the tensorflow version and the number of available GPUs
print('Tensorflow Version: ' + tf.__version__)
physical_devices = tf.config.list_physical_devices('GPU')
print("Num GPUs:", len(physical_devices))


# Setting the timeframe for the data extraction
end_date = date.today().strftime("%Y-%m-%d")
start_date = '2010-01-01'

# Getting S&P500 quotes
stockname = 'S&P500'
symbol = '^GSPC'

# You can either use webreader or yfinance to load the data from yahoo finance
# import pandas_datareader as webreader
# df = webreader.DataReader(symbol, start=start_date, end=end_date, data_source="yahoo")

import yfinance as yf #Alternative package if webreader does not work: pip install yfinance
df = yf.download(symbol, start=start_date, end=end_date)

df.head(5)

Tensorflow Version: 2.5.0
Num GPUs: 0
[*********************100%***********************]  1 of 1 completed
			Open		High		Low			Close		Adj Close	Volume
Date						
2009-12-31	1126.599976	1127.640015	1114.810059	1115.099976	1115.099976	2076990000
2010-01-04	1116.560059	1133.869995	1116.560059	1132.989990	1132.989990	3991400000
2010-01-05	1132.660034	1136.630005	1129.660034	1136.520020	1136.520020	2491020000
2010-01-06	1135.709961	1139.189941	1133.949951	1137.140015	1137.140015	4972660000
2010-01-07	1136.270020	1142.459961	1131.319946	1141.689941	1141.689941	5270680000

Step #2 Adjusting the Shape of the Input Data and Exploration

We have a DataFrame that contains the daily price quotes for the S&P 500. Next, we prepare the data to include the weekly price quotes. If we want our model to provide weekly price predictions, we need to change the data so that the input contains weekly price quotes. A simple way to achieve this is to iterate through the rows and only keep every 7th row.

# Changing the data structure to a dataframe with weekly price quotes
df["index1"] = range(1, len(df) + 1)
rownumber = df.shape[0]
lst = list(range(rownumber))
list_of_relevant_numbers = lst[0::7]
df_weekly = df[df["index1"].isin(list_of_relevant_numbers)]
df_weekly.head(5)

	Open	High	Low	Close	Adj Close	Volume	index1
Date							
2010-01-11	1145.959961	1149.739990	1142.020020	1146.979980	1146.979980	4255780000	7
2010-01-21	1138.680054	1141.579956	1114.839966	1116.479980	1116.479980	6874290000	14
2010-02-01	1073.890015	1089.380005	1073.890015	1089.189941	1089.189941	4077610000	21
2010-02-10	1069.680054	1073.670044	1059.339966	1068.130005	1068.130005	4251450000	28
2010-02-22	1110.000000	1112.290039	1105.380005	1108.010010	1108.010010	3814440000	35

After this, we quickly create a line plot to validate that everything looks as expected.

# Creating a Lineplot
years = mdates.YearLocator() 
fig, ax1 = plt.subplots(figsize=(16, 6))
ax1.xaxis.set_major_locator(years)
ax1.legend([stockname], fontsize=12)
plt.title(stockname + ' from '+ start_date + ' to ' + end_date)
sns.lineplot(data=df['Close'], label=stockname, linewidth=1.0)
plt.ylabel('S&P500 Points')
plt.show()

			Open		High		Low			Close		Adj Close	Volume		index1
Date							
2010-01-11	1145.959961	1149.739990	1142.020020	1146.979980	1146.979980	4255780000	7
2010-01-21	1138.680054	1141.579956	1114.839966	1116.479980	1116.479980	6874290000	14
2010-02-01	1073.890015	1089.380005	1073.890015	1089.189941	1089.189941	4077610000	21
2010-02-10	1069.680054	1073.670044	1059.339966	1068.130005	1068.130005	4251450000	28
2010-02-22	1110.000000	1112.290039	1105.380005	1108.010010	1108.010010	3814440000	35

Step #3 Preprocess the Data

Before we can train the neural network, we first need to define the shape of the training data. We use weekly price quotes and define an input_sequence_length of 50 weeks.

# Feature Selection - Only Close Data
train_df = df.filter(['Close'])
data_unscaled = df.values

# Transform features by scaling each feature to a range between 0 and 1
mmscaler = MinMaxScaler(feature_range=(0, 1))
np_data = mmscaler.fit_transform(data_unscaled)

# Creating a separate scaler that works on a single column for scaling predictions
scaler_pred = MinMaxScaler()
df_Close = pd.DataFrame(df['Close'])
np_Close_scaled = scaler_pred.fit_transform(df_Close)

# Set the sequence length - this is the timeframe used to make a single prediction
sequence_length = 25

# Prediction Index
index_Close = train_df.columns.get_loc("Close")

# Split the training data into train and train data sets
# As a first step, we get the number of rows to train the model on 80% of the data 
train_data_length = math.ceil(np_data.shape[0] * 0.8)

# Create the training and test data
train_data = np_data[0:train_data_length, :]
test_data = np_data[train_data_length - sequence_length:, :]

# The RNN needs data with the format of [samples, time steps, features]
# Here, we create N samples, sequence_length time steps per sample, and 6 features
def partition_dataset(sequence_length, train_df):
    x, y = [], []
    data_len = train_df.shape[0]
    for i in range(sequence_length, data_len):
        x.append(train_df[i-sequence_length:i,:]) #contains sequence_length values 0-sequence_length * columsn
        y.append(train_df[i, index_Close]) #contains the prediction values for validation (3rd column = Close),  for single-step prediction
    
    # Convert the x and y to numpy arrays
    x = np.array(x)
    y = np.array(y)
    return x, y

# Generate training data and test data
x_train, y_train = partition_dataset(sequence_length, train_data)
x_test, y_test = partition_dataset(sequence_length, test_data)

# Print the shapes: the result is: (rows, training_sequence, features) (prediction value, )
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)

# Validate that the prediction value and the input match up
# The last close price of the second input sample should equal the first prediction value
print(x_test[1][sequence_length-1][index_Close])
print(y_test[0])

(2465, 25, 7) (2465,)
(622, 25, 7) (622,)
0.5509444640253316
0.5509444640253316

Step #4 Building a Time Series Prediction Model

The first layer of neurons in our neural network needs to fit the input values from the data. Therefore, we need 50 neurons – one for each input price quote.

The model architecture of the recurrent neural network

We use the following input arguments for the model fit:

x_train: Vector, matrix, or array of training data. It can also be a list (as in our case) if the model has multiple inputs.
y_train: Vector, matrix, or array of target data. This is the labeled data the model tries to predict; in other words, these are the results of x_train.
Epochs: The integer value defines how often the model goes through the training set.
Batch size: Integer value that defines the number of samples that will be propagated through the network. After each propagation, the network adjusts the weights of the nodes in each layer.

# Configure the neural network model
model = Sequential()

# Model with n_neurons Neurons
n_neurons = x_train.shape[1] * x_train.shape[2]
print(n_neurons, x_train.shape[1], x_train.shape[2])
model.add(LSTM(n_neurons, return_sequences=True, input_shape=(x_train.shape[1], x_train.shape[2])))
model.add(LSTM(n_neurons, return_sequences=False))
model.add(Dense(25, activation="relu"))
model.add(Dense(1))

# Compile the model
model.compile(optimizer="adam", loss="mean_squared_error")

# Training the model
epochs = 10
early_stop = EarlyStopping(monitor='loss', patience=2, verbose=1)
history = model.fit(x_train, y_train, 
                    batch_size=16, 
                    epochs=epochs, 
                    callbacks=[early_stop])

Epoch 1/10
155/155 [==============================] - 7s 25ms/step - loss: 8.2791e-04
Epoch 2/10
155/155 [==============================] - 4s 24ms/step - loss: 1.3465e-04
Epoch 3/10
155/155 [==============================] - 4s 24ms/step - loss: 1.0998e-04
Epoch 4/10
155/155 [==============================] - 4s 25ms/step - loss: 1.0241e-04
Epoch 5/10
155/155 [==============================] - 4s 24ms/step - loss: 7.4277e-05
Epoch 6/10
155/155 [==============================] - 4s 24ms/step - loss: 6.5786e-05
Epoch 7/10
155/155 [==============================] - 4s 24ms/step - loss: 6.8482e-05
Epoch 8/10
155/155 [==============================] - 4s 24ms/step - loss: 5.0326e-05
Epoch 9/10
155/155 [==============================] - 4s 24ms/step - loss: 4.8574e-05
Epoch 10/10
155/155 [==============================] - 4s 25ms/step - loss: 4.1287e-05

Step #5 Evaluate Model Performance

Next, we validate the model by calculating our predictions’ mean-squared and root-mean-squared errors. However, in time series forecasting metrics can be misleading. It is good to double-check model results using illustrations. Therefore, we plot the input sequences and the forecast to see if our model can continue the time series in a plausible way.

# Get the predicted values
y_pred_scaled = model.predict(x_test)
y_pred = scaler_pred.inverse_transform(y_pred_scaled)
y_test_unscaled = scaler_pred.inverse_transform(y_test.reshape(-1, 1))

# Mean Absolute Error (MAE)
MAE = mean_absolute_error(y_test_unscaled, y_pred)
print(f'Median Absolute Error (MAE): {np.round(MAE, 2)}')

# Mean Absolute Percentage Error (MAPE)
MAPE = np.mean((np.abs(np.subtract(y_test_unscaled, y_pred)/ y_test_unscaled))) * 100
print(f'Mean Absolute Percentage Error (MAPE): {np.round(MAPE, 2)} %')

# Median Absolute Percentage Error (MDAPE)
MDAPE = np.median((np.abs(np.subtract(y_test_unscaled, y_pred)/ y_test_unscaled)) ) * 100
print(f'Median Absolute Percentage Error (MDAPE): {np.round(MDAPE, 2)} %')

# The date from which on the date is displayed
display_start_date = "2018-01-01" 

# Add the difference between the valid and predicted prices
train = pd.DataFrame(train_df[:train_data_length + 1]).rename(columns={'Close': 'x_train'})
valid = pd.DataFrame(train_df[train_data_length:]).rename(columns={'Close': 'y_test'})
valid.insert(1, "y_pred", y_pred, True)
valid.insert(1, "residuals", valid["y_pred"] - valid["y_test"], True)
df_union = pd.concat([train, valid])

# Zoom in to a closer timeframe
df_union_zoom = df_union[df_union.index > display_start_date]

# Create the lineplot
fig, ax1 = plt.subplots(figsize=(16, 8))
plt.title("Predictions vs Ground Truth")
plt.ylabel(stockname, fontsize=18)
sns.despine();
sns.lineplot(data=df_union_zoom, linewidth=1.0, palette='CMRmap', ax=ax1)
plt.show()

Median Absolute Error (MAE): 42.7
Mean Absolute Percentage Error (MAPE): 1.16 %
Median Absolute Percentage Error (MDAPE): 0.92 %

At the bottom, we can see the differences between predictions and valid data. Positive values signal that the projections were too optimistic. Negative values mean that the predictions were too pessimistic and that the actual value turned out to be higher than the prediction.

Step #6 Predicting for the Next Week

What can be more satisfying than to see a newly trained model at work? Let’s use our new model to predict next week’s price for the S&P500. We will create a fresh input_sequence with prices from the past N days. Then we scale this input sequence and include it as input in our call to the model.predict() function.

# Get fresh data until today and create a new dataframe with only the price data

new_df = df

N = sequence_length

# Get the last N steps closing price values and scale the data to be values between 0 and 1
last_N_steps = new_df[-sequence_length:].values
last_N_steps_scaled = mmscaler.transform(last_N_steps)

# Create an empty list and Append past N steps
X_test_new = []
X_test_new.append(last_N_steps_scaled)

# Convert the X_test data set to a numpy array and reshape the data
pred_price_scaled = model.predict(np.array(X_test_new))
pred_price_unscaled = scaler_pred.inverse_transform(pred_price_scaled.reshape(-1, 1))

# Print last price and predicted price for the next week
price_today = np.round(new_df['Close'][-1], 2)
predicted_price = np.round(pred_price_unscaled.ravel()[0], 2)
change_percent = np.round(100 - (price_today * 100)/predicted_price, 2)

plus = '+'; minus = ''
print(f'The close price for {stockname} at {end_date} was {price_today}')
print(f'The predicted close price is {predicted_price} ({plus if change_percent > 0 else minus}{change_percent}%)')

The close price for S&P500 at 2022-05-11 was 4001.05
The predicted close price is 4046.300048828125 (+1.12%)

So for the 9th of April 2020, the model predicts that the S&P500 will close at:

4046.300048828125

Considering today’s (2nd of April 2020) price is 2528 points, our model expects the S&P to gain roughly 124 points in the coming seven days. Of course, this is by no means financial advice. As we have seen before, our model is often wrong.

Summary

This article has shown how to adjust the prediction intervals for a time series forecasting model. We have created a neural network that predicts the price of the S&P500 one week in advance. Finally, we trained and validated the model and made a forecast for the next week.

Varying the input shape is a quick approach to changing the forecasting time steps. However, increasing the length of the time steps also reduces the amount of data we can use for training and testing. In our case, we still have enough data available. But in other cases, where less information is available, this can become a problem. The preferred method is to use a rolling forecast approach or create a multi-output forecast in such a case.

I hope this article was helpful. Should you have questions or remarks, let me know in the comments.

Sources and Further Readings

The links above to Amazon are affiliate links. By buying through these links, you support the Relataly.com blog and help to cover the hosting costs. Using the links does not affect the price.

The post Stock Market Prediction – Adjusting Time Series Prediction Intervals in Python appeared first on relataly.com.

Stock Market Prediction using Univariate Recurrent Neural Networks (RNN) with Python

Florian Follonier — Tue, 24 Mar 2020 00:38:39 +0000

Financial analysts have long been fascinated by the prospect of predicting the prices of financial assets. In recent years, there has been increasing interest in using machine learning and deep learning techniques to generate predictions, in addition to traditional methods such as technical and fundamental analysis. Python libraries like Keras and Scikit-Learn make it relatively straightforward for those with programming experience to build a neural network for stock market forecasting. This tutorial will guide you through the process of creating a univariate model using a Keras neural network with LSTM layers to forecast the S&P500 index. By the end of this tutorial, you will have a model that can make single-step predictions for the stock market.

The rest of this article proceeds in two parts: We briefly introduce univariate modeling and neural networks. Then we start with the coding part and go through all the steps to train a neural network, including data ingestion, data preprocessing, and the design, training, testing, and usage of a predictive neural network model.

Disclaimer

AI can help investors and traders to make more accurate investment decisions.

Single-Step Univariate Stock Market Prediction

The prediction approach described in this article is known as single-step single-variate time series forecasting. This approach is similar to technical chart analysis in that it assumes that predicting the price of an asset is fundamentally a time series problem. The goal is to identify patterns in a time series that indicate how the series will develop in the future.

This tutorial predicts the value for a single time step (1 day). In other words, we consider a single time series of data (single-variate). However, predicting multiple steps or increasing the time-step length would also be possible. In both cases, the predictions will range further into the future. I have covered this topic in a separate post on time series forecasting.

We will develop a univariate prediction model that predicts a single feature on historical prices for a specific period. More complex multivariate models use additional features such as moving averages, momentum indicators, or market sentiment. I have covered multivariate stock market prediction in a separate tutorial.

What are Recurrent Neural Networks?

Recurrent Neural Networks (RNNs) are particularly powerful for analyzing time series data because they use LSTM layers that allow information to flow back and forth through the network. This allows the RNN to learn patterns that may occur over long periods of time and potentially overlap, leading to more accurate predictions. RNNs differ from traditional feed-forward neural networks in that all information does not flow in a single direction from left to right. Instead, the output of the RNN is fed back into the network, allowing it to take into account past information when making predictions.

One of the primary advantages of RNNs is their ability to process sequential data, such as time series or natural language text. This is because the LSTM layers in an RNN allow the network to maintain a sense of memory and context, allowing it to understand the relationships between data points better and make more accurate predictions.

RNNs are commonly used in a wide range of applications, including language translation, speech recognition, and sentiment analysis. In these applications, the input data is often a sequence of words or other elements. The RNN can use its memory and context-aware processing to understand the meaning and relationships between the elements in the sequence.

Inspired by nature: Each layer of neurons extracts different features from the input data, and the output of one layer becomes the input for the next layer until the final output is generated.

Also: Stock Market Forecasting Neural Networks for Multi-Output Regression in Python

Creating a Univariate Forecasting Model using Keras Recurrent Neural Networks in Python

In this article, we will showcase the process of building a univariate forecast for the closing price of the S&P500 stock market index. To achieve this, we will use a Recurrent Neural Network (RNN) architecture with Long Short-Term Memory (LSTM) layers based on the popular Tensorflow library. In addition, we will employ various Python packages for data manipulation and analytics. The process of training and using the univariate model involves several key steps:

Loading the data
Exploring the data to identify trends and patterns
Scaling and splitting the data to prepare it for modeling
Creating the input shape to feed the data into the LSTM network
Training the model on the training data set
Forecasting future values using the trained model
Visualizing and interpreting the forecasted results

By following these steps, we will be able to create a robust forecast for the closing price of the S&P500 stock market index.

The Python code is available in the relataly GitHub repository.

View on GitHub Relataly GitHub Repo

Please note that you will need some programming experience and familiarity with Python to follow along with this article. Understanding Neural Networks in all depth is not a prerequisite for this tutorial. But if you want to learn more about their architecture and functioning, I can recommend this YouTube video.

Prerequisites

Before starting the coding part, make sure that you have set up your Python 3 environment and required packages. If you don’t have an environment yet, you can follow the steps in this article to set up the Anaconda environment. Also, make sure you install all required packages. In this tutorial, we will be working with the following standard packages:

In addition, we will be using Keras (2.0 or higher) with Tensorflow backend and the machine learning library scikit-learn.

You can install packages using console commands:

pip install
conda install (if you are using the anaconda packet manager)

Step #1 Load the Data

Let’s start by setting up the imports and loading the price data from yahoo.finance.com via an API. To extract the data, we’ll use the pandas DataReader package – a popular library that provides a function to extract data from various Internet sources into pandas DataFrames. Note that if pandas DataReader does not work, you can use the yfinance package.

The following code extracts the price data for the S&P500 index from yahoo finance. If you wonder what “^GSPC” means, this is the symbol for the S&P500, a stock market index of the 500 most extensive stocks listed in the US stock market. You can use the symbols of other assets, e.g., BTC-USD for Bitcoin. The data is limited to the timeframe between 2010-01-01 and the current date. So when you execute the code, the results will show a more significant period, as in this tutorial.

# A tutorial for this file is available at www.relataly.com
# Tested with Python 3.88

import math 
import numpy as np # Fundamental package for scientific computing with Python
import pandas as pd # For analysing and manipulating data
from datetime import date, timedelta # Date Functions
import matplotlib.pyplot as plt # For visualization
import matplotlib.dates as mdates # Formatting dates
from sklearn.metrics import mean_absolute_error, mean_squared_error # For measuring model performance / errors
from sklearn.preprocessing import MinMaxScaler #to normalize the price data 
from tensorflow.keras.models import Sequential # Deep learning library, used for neural networks
from tensorflow.keras.layers import LSTM, Dense # Deep learning classes for recurrent and regular densely-connected layers
import tensorflow as tf
import seaborn as sns
sns.set_style('white', { 'axes.spines.right': False, 'axes.spines.top': False})

# check the tensorflow version and the number of available GPUs
print('Tensorflow Version: ' + tf.__version__)
physical_devices = tf.config.list_physical_devices('GPU')
print("Num GPUs:", len(physical_devices))

# Setting the timeframe for the data extraction
today = date.today()
end_date = today.strftime("%Y-%m-%d")
start_date = '2010-01-01'

# Getting S&P500 quotes
stockname = 'S&P500'
symbol = '^GSPC'

# You can either use webreader or yfinance to load the data from yahoo finance
# import pandas_datareader as webreader
# df = webreader.DataReader(symbol, start=start_date, end=end_date, data_source="yahoo")

import yfinance as yf #Alternative package if webreader does not work: pip install yfinance
df = yf.download(symbol, start=start_date, end=end_date)

# Taking a look at the shape of the dataset
print(df.shape)
df.head(5)

Tensorflow Version: 2.6.0
Num GPUs: 1
[*********************100%***********************]  1 of 1 completed
(3193, 6)
			Open		High		Low			Close		Adj Close	Volume
Date						
2009-12-31	1126.599976	1127.640015	1114.810059	1115.099976	1115.099976	2076990000
2010-01-04	1116.560059	1133.869995	1116.560059	1132.989990	1132.989990	3991400000
2010-01-05	1132.660034	1136.630005	1129.660034	1136.520020	1136.520020	2491020000
2010-01-06	1135.709961	1139.189941	1133.949951	1137.140015	1137.140015	4972660000
2010-01-07	1136.270020	1142.459961	1131.319946	1141.689941	1141.689941	5270680000

Step #2 Explore the Data

When you load a new data set into your project, it is often a good idea to familiarize yourself with it before taking further steps. When working with time-series data, visually viewing the data in a line plot is the primary way. Use the following code to create the line plot for the S&P500 data.

# Creating a Lineplot
years = mdates.YearLocator() 
fig, ax1 = plt.subplots(figsize=(16, 6))
ax1.xaxis.set_major_locator(years)
ax1.legend([stockname], fontsize=12)
plt.title(stockname + ' from '+ start_date + ' to ' + end_date)
sns.lineplot(data=df['Close'], label=stockname, linewidth=1.0)
plt.ylabel('S&P500 Points')
plt.show()

If you follow the course of the S&P500 stock markets a little, the chart above might look familiar to you.

Step #3 Scaling the Data

It’s best to scale the data before training a neural network. We will use the MinMaxScaler to normalize the price values in our data to a range between 0 and 1.

# Feature Selection - Only Close Data
train_df = df.filter(['Close'])
data_unscaled = train_df.values

# Get the number of rows to train the model on 80% of the data 
train_data_length = math.ceil(len(data_unscaled) * 0.8)

# Transform features by scaling each feature to a range between 0 and 1
mmscaler = MinMaxScaler(feature_range=(0, 1))
np_data = mmscaler.fit_transform(data_unscaled)

Step #4 Creating the Input Shape

Before we can begin with the training of the NN, we need to split the data into separate test sets for training and validation and ensure that it is in the right shape. We will train the NN on a decade of market price data. Then we predict the price of the next day based on the last 50 days of market prices. As illustrated below, we will use 80% of the data as training data and keep 20% as test data to later evaluate the performance of our univariate model.

Splitting data into train and test

Our neural network will have two layers, an input layer, and an output layer. The input data shape must correspond with the number of neurons in the neural network’s input layer. Therefore, we must also decide on the neural network architecture before bringing our data in the right shape.

4.1 Designing the Input Shape

Next, we create the training data based on which we will train our neural network. We make multiple slices of the training data (x_train), so-called mini-batches. The neural network processes the mini-batch one by one during the training process and creates a separate forecast for each mini-batch. The illustration below shows the shape of the data:

The sample dataset for time series forecasting is split into several train batches.

Neural networks learn in an iterative process. The algorithm reduces the prediction errors by adjusting the connection strength between the neurons (weights) in this process. The model needs a second list (y_train) to evaluate the forecast quality, containing the valid price values from our ground truth. The model compares the predictions with the ground truth during training and calculates the training error to minimize it over time.

4.2 Data Preprocessing

The code block below will carry out the steps to prepare the data. It is a standard procedure that will split the data into several mini-batches. Each minibatch contains an input sequence and a corresponding output sequence, the target.

# Set the sequence length - this is the timeframe used to make a single prediction
sequence_length = 50

# Prediction Index
index_Close = train_df.columns.get_loc("Close")
print(index_Close)
# Split the training data into train and train data sets
# As a first step, we get the number of rows to train the model on 80% of the data 
train_data_len = math.ceil(np_data.shape[0] * 0.8)

# Create the training and test data
train_data = np_data[0:train_data_len, :]
test_data = np_data[train_data_len - sequence_length:, :]

# The RNN needs data with the format of [samples, time steps, features]
# Here, we create N samples, sequence_length time steps per sample, and 6 features
def partition_dataset(sequence_length, train_df):
    x, y = [], []
    data_len = train_df.shape[0]
    for i in range(sequence_length, data_len):
        x.append(train_df[i-sequence_length:i,:]) #contains sequence_length values 0-sequence_length * columsn
        y.append(train_df[i, index_Close]) #contains the prediction values for validation (3rd column = Close),  for single-step prediction
    
    # Convert the x and y to numpy arrays
    x = np.array(x)
    y = np.array(y)
    return x, y

# Generate training data and test data
x_train, y_train = partition_dataset(sequence_length, train_data)
x_test, y_test = partition_dataset(sequence_length, test_data)

# Print the shapes: the result is: (rows, training_sequence, features) (prediction value, )
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)

# Validate that the prediction value and the input match up
# The last close price of the second input sample should equal the first prediction value
print(x_test[1][sequence_length-1][index_Close])
print(y_test[0])

(1966, 50, 1) 
(1966,)

x_train contains 1966 mini-batches. Each contains a series of quotes for 50 dates. In y_train, we have 1966 validation values – one for each mini-batch. Be aware that numbers depend on the timeframe and vary when executing the code.

Step #5 Designing the Model Architecture

Before we can train the model, we first need to decide on the model’s architecture. Above all, the architecture comprises the type and number of layers and the number of neurons in each layer.

5.1 Choosing Layers

Determining the optimal number of layers for a neural network can be challenging and often requires trial and error. One approach is to try out different architectures and see which one performs best. In this case, we will use a fully connected network with four layers, consisting of two layers of the LSTM class and two layers of the Dense class from the Keras library. This architecture was chosen because it is relatively simple and a good starting point for addressing time series problems. The architecture and performance of the univariate model can then be tested and refined through multiple iterations.

The architecture of our recurrent Neural Network

5.2 Choosing the Number of Neurons

When selecting the number of neurons for a neural network layer, there are a few general guidelines to follow:

A larger number of neurons can allow the model to capture more complex patterns in the data, but it may also increase the risk of overfitting.
A smaller number of neurons can reduce the risk of overfitting, but it may also limit the model’s ability to capture complex patterns.
It is generally recommended to start with a smaller number of neurons and increase the number if necessary.

One approach to determining the number of neurons is to use the “rule of thumb” method, which suggests using the following formula: (number of input neurons + number of output neurons) * 2/3. However, the optimal number of neurons will depend on the specific problem being solved and the characteristics of the data.

For our specific example, the input data consists of values for 50 dates, so the input layer should have at least 50 neurons for each value. The second layer should have 35 neurons according to the formula ((50 + 1) * 2 / 3), and the last layer should have only one neuron, as the prediction will contain a single price point for a single time step.

# Configure the neural network model
model = Sequential()

neurons = sequence_length

# Model with sequence_length Neurons 
# inputshape = sequence_length Timestamps
model.add(LSTM(neurons, return_sequences=True, input_shape=(x_train.shape[1], 1))) 
model.add(LSTM(neurons, return_sequences=False))
model.add(Dense(35, activation='relu'))
model.add(Dense(1))

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

Step #6 Train the Univariate Model

Now that we have prepared our data and defined our model, it’s time to fit the model to the data and start training. Training a machine learning model involves adjusting the model’s parameters to minimize the error between the predicted and actual values. This process can take a varying amount of time, depending on the complexity of the model and the size of the dataset. For instance, the training time is usually a couple of minutes on my local notebook processor (Intel Core i7).

# Training the model
model.fit(x_train, y_train, batch_size=16, epochs=25)

Epoch 1/25
157/157 [==============================] - 8s 10ms/step - loss: 0.0025
Epoch 2/25
157/157 [==============================] - 1s 9ms/step - loss: 1.2553e-04
Epoch 3/25
157/157 [==============================] - 1s 9ms/step - loss: 1.2324e-04
Epoch 4/25
157/157 [==============================] - 1s 9ms/step - loss: 1.1924e-04
Epoch 5/25
157/157 [==============================] - 1s 9ms/step - loss: 1.1989e-04
Epoch 6/25
157/157 [==============================] - 1s 9ms/step - loss: 1.1564e-04
Epoch 7/25
157/157 [==============================] - 1s 9ms/step - loss: 1.0769e-04
Epoch 8/25
157/157 [==============================] - 1s 9ms/step - loss: 1.0273e-04
Epoch 9/25
157/157 [==============================] - 1s 9ms/step - loss: 9.0550e-05
Epoch 10/25
157/157 [==============================] - 1s 9ms/step - loss: 9.1702e-05
Epoch 11/25
157/157 [==============================] - 1s 8ms/step - loss: 8.8220e-05
Epoch 12/25
157/157 [==============================] - 1s 9ms/step - loss: 1.0555e-04
Epoch 13/25
...
Epoch 24/25
157/157 [==============================] - 1s 9ms/step - loss: 6.1625e-05
Epoch 25/25
157/157 [==============================] - 1s 9ms/step - loss: 4.8550e-05

We have fitted our model to the training data.

Step #7 Creating the Univariate Stock Market Forecasting

So how does our stock market prediction model perform? We need to feed the model with the test data to evaluate the model’s performance. For this purpose, we provide the test data (x_test) that we have generated in a previous step to the model to get some predictions. We must remember that we initially scaled the input data to 0 and 1. Therefore, before interpreting the results, we must inverse the MinMaxScaling from the predictions.

# Get the predicted values
y_pred_scaled = model.predict(x_test)
y_pred = mmscaler.inverse_transform(y_pred_scaled)
y_test_unscaled = mmscaler.inverse_transform(y_test.reshape(-1, 1))

Step #8 Evaluate Model Performance

Different indicators can help us to evaluate the performance of our model. We calculate the forecast error by subtracting valid test data (y_test) from predictions.

# Mean Absolute Error (MAE)
MAE = mean_absolute_error(y_test_unscaled, y_pred)
print(f'Median Absolute Error (MAE): {np.round(MAE, 2)}')
# Mean Absolute Percentage Error (MAPE)
MAPE = np.mean((np.abs(np.subtract(y_test_unscaled, y_pred)/ y_test_unscaled))) * 100
print(f'Mean Absolute Percentage Error (MAPE): {np.round(MAPE, 2)} %')
# Median Absolute Percentage Error (MDAPE)
MDAPE = np.median((np.abs(np.subtract(y_test_unscaled, y_pred)/ y_test_unscaled)) ) * 100
print(f'Median Absolute Percentage Error (MDAPE): {np.round(MDAPE, 2)} %')

MAE: 32.0 
RMSE: 17.6

The MAE can be negative or positive. If it is positive, our predictions lie below the valid values. For our model, the calculated MAE is (32.0). From the MAE, we can tell that our model generally tends to predict a bit too pessimistically.

The mean squared error (RMSE) is always positive. More significant errors tend to impact the RMSE as they are squared substantially. In our case, the RMSE is 17.6, indicating that the prediction error is relatively constant. In other words, the predictions are mostly not entirely wrong.

Visualizing test predictions helps in the process of evaluating the model. Therefore we will plot predicted and valid values.

# The date from which on the date is displayed
display_start_date = "2019-01-01" 

# Add the difference between the valid and predicted prices
train = pd.DataFrame(train_df[:train_data_length + 1]).rename(columns={'Close': 'y_train'})
valid = pd.DataFrame(train_df[train_data_length:]).rename(columns={'Close': 'y_test'})
valid.insert(1, "y_pred", y_pred, True)
valid.insert(1, "residuals", valid["y_pred"] - valid["y_test"], True)
df_union = pd.concat([train, valid])

# Zoom in to a closer timeframe
df_union_zoom = df_union[df_union.index > display_start_date]

# Create the lineplot
fig, ax1 = plt.subplots(figsize=(16, 8), sharex=True)
plt.title("Predictions vs Ground Truth")
sns.set_palette(["#090364", "#1960EF", "#EF5919"])
plt.ylabel(stockname, fontsize=18)
sns.lineplot(data=df_union_zoom[['y_train', 'y_pred', 'y_test']], linewidth=1.0, dashes=False, ax=ax1)

# Create the barplot for the absolute errors
df_sub = ["#2BC97A" if x > 0 else "#C92B2B" for x in df_union_zoom["residuals"].dropna()]
ax1.bar(height=df_union_zoom['residuals'].dropna(), x=df_union_zoom['residuals'].dropna().index, width=3, label='absolute errors', color=df_sub)
plt.legend()
plt.show()

We can see that the orange zone contains the test predictions. The grey area marks the difference between test predictions and ground truth. As indicated by the different performance measures, we can see that the predictions are typically near the ground truth.

We have also added the absolute errors on the bottom. Where the difference is negative, the predicted value was too optimistic. Where the difference is positive, the predictive value was too pessimistic.

Step #9 Stock Market Prediction – Predicting a Single Day Ahead

Now that we have tested our model, we can use it to make a prediction. We use a new data set as the input for our prediction model. The model returns a forecast for a single time the next day.

# Get fresh data
df_new = df.filter(['Close'])

# Get the last N day closing price values and scale the data to be values between 0 and 1
last_days_scaled = mmscaler.transform(df_new[-sequence_length:].values)

# Create an empty list and Append past n days
X_test = []
X_test.append(last_days_scaled)

# Convert the X_test data set to a numpy array and reshape the data
X_test = np.array(X_test)
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))

# Get the predicted scaled price, undo the scaling and output the predictions
pred_price = model.predict(X_test)
pred_price_unscaled = mmscaler.inverse_transform(pred_price)

# Print last price and predicted price for the next day
price_today = round(df_new['Close'][-1], 2)
predicted_price = round(pred_price_unscaled.ravel()[0], 2)
percent_change = round((predicted_price * 100)/price_today - 100, 2)

prefix = '+' if percent_change > 0 else ''
print(f'The close price for {stockname} at {today} was {price_today}')
print(f'The predicted close price for the next day is {predicted_price} ({prefix}{percent_change}%)')

The price for S&P500 on 2020-04-04 was: 2578.0
The predicted S&P500 price on the date 2020-04-05 is: 2600.0

So, the model predicts a value of 2600.0 for the S&P500 on 2020-04-07.

Summary

In this tutorial, you have learned to create, train and test a four-layered recurrent neural network for stock market prediction using Python and Keras. Finally, we have used this model to predict the S&P500 stock market index. You can easily create models for other assets by replacing the stock symbol with another stock code. A list of common symbols for stocks or stock indexes is available on yahoo.finance.com. Don’t forget to retrain the model with a fresh copy of the price data.

The model created in this post makes predictions for a single time step. If you want to learn how to make time-series predictions that range further, you might want to check out the part II of this tutorial series: Creating a Multistep Forecast in Python.

I hope you enjoyed this article. I am always trying to improve and learn from my audience. So, please let me know in the comments if you have questions or remarks!

The Bulls and the Bears – a never-ending struggle but in the long run, the bulls tend to win. Image created with Midjourney.

Sources and Further Reading

The links above to Amazon are affiliate links. By buying through these links, you support the Relataly.com blog and help to cover the hosting costs. Using the links does not affect the price.

If you want to learn about an alternative approach to univariate stock market forecasting, consider this article on Facebook Prophet.

The post Stock Market Prediction using Univariate Recurrent Neural Networks (RNN) with Python appeared first on relataly.com.