Recurrent Neural Networks Archives - relataly.com

Foundation Models Are Here: How Will They Impact Traditional Machine Learning?

Florian Follonier — Sat, 25 Feb 2023 09:03:16 +0000

Just a year ago, ChatGPT was launched, and it has since catalyzed a seismic shift in the AI landscape. Given the astonishing capabilities of generative AI and the rapid evolution of foundation models, one question has risen to the forefront of discussions: What will be the impact of foundation models on classic machine learning and so-called “narrow AI”?

The opinions on this topic are sharply divided. Some are convinced that foundation models are poised to completely overshadow traditional machine learning methods. Others remain steadfast in their belief that classic ML techniques have enduring value. As we’ll explore in this article, the reality is far more nuanced than a straightforward “yes” or “no.”

To dissect this complex issue, we’ll delve into a range of application domains, from classification and regression to natural language processing (NLP), clustering, and computer vision, among others. We’ll also discuss a few concepts and terms along the way. On top of that, we will examine additional considerations such as interpretability, computational demands, and reliability.

Having worked a few years classic ml and since the end of 2022 mostly on Generative AI, I would like to share a few thoughts. Please take them with some caution, as they are mostly based on personal experience in the field and my conversations with organizations on how they can benefit from generative AI.

The Beginnings Classic Machine Learning

When I started my career a few years ago in 2017 the world was an entirely different one. There were already some tools that would help with that but instead of Python, but most data scientists were still using R (yes, yes, its still being used). There was also no ChatGPT, and the closest thing available was BERT; which was introduced by Google in 2017 and technologially marked an important step towards todays foundation models.

If you you wanted to use AI in 2017, you would typically need to build your own model for your specific use case. You had to collect the data, prepare it (still today takes most of the time), and then train a model, test it and deploy it into production. Oh and then monitor it.

This is the classic data sciense process and it is complex, with a lot of steps that in many companies took several month. And there is a catch to it. At the end of the process, when you have built your model, it is what we call today narrow AI.

Why Narrow? Because the model is only there for a specific use case. If you have a slightly different use case, you would typically need to train a new model. As a result, companies that were successful in building ML models, quickly end up with a large number of models they need to run and monitor, which adds additional complexity.

Pretrained Models – Comodity AI

There are certain AI applications that we just see again and again and that stay mostly the same. Many of these applications are in the area of computer vision and audio recognition.

An example, is face recognition. Cloud providers soon recognized that the models they often had built for their own purpose, could also be useful to their customers. Face recognition, voice recognition, or information extraction from invoices have become commidities. These models are relatively complex to built from scratch but have a high degree of standardization. So customers who want to use these models often decide to not build them themselves but buy them as a service from an external provider.

Yet, these models are still narrow AI because they can only do one thing.

Foundation Models

With ChatGPT, LLMA2, Bard, etc. we are embarking from the world of narrow AI and enter the realm of foundation models. Its a paradigm shift not only in the way we create AI models but also in the way we interact with them.

Foundation models are fundamentally different from classic ML in the way they are trained. Instead of using the data from a specific problem, and training a model, foundation models are trained on large amounts of natural language – a major part of what is available today in the public internet in different languages, incl. wikipedia, computer code, etc. The training process takes up to several month, costs millions of dollar and results in extremely capable models.

Foundation models built on several technologies that were developed throughout the years. A few examples:

Machine learning: Instead of explicitly programming foundation models, they learn from the provided data and identify patterns.
Transformers: Were a milestone and part of the BERT model. It introduces an effective way of training word prediction models with a high degree of parallelization.
Supervised learning: foundation models are trained on labeled datasets.
To further improve the models, foundation models are trained using reinformcenent learning with human feedback.
Deep learning: Foundation models use large neural networks with a large number of layers and neurons.

The sheer size of these models make them possible to solve a whole lot of tasks. However, there are things that these models were so far struggling with, for example, solving math problems, halluzinations, etc.And regardless of the task you communicate and instruct foundation models via a prompt.

What can Foundation Models Do Compared to Classic ML?

Foundation models and classic machine learning (ML) techniques embody different approaches to problem-solving in the realm of artificial intelligence, each with its distinct advantages, disadvantages, and ideal use-cases. Here’s a nuanced comparison across various application domains and aspects.

The classic ML field is still in several application domains that have emerged over time. These areas can be roughly differentiated into:

classification (of structured inputs; two and multiclass)
regression & time series forecasting
clustering
outlier detection
recommender systems
natural language processing (NLP)
computer vision
audio processing

I belive these categories cover at least 90% of the use cases in the area of machine learning. Next let’s take a closer look at these categories. I will be mostly focusing on performance and discuss other aspects at the end of this article.

#1 Classification (on Tabular Datasets)

In the domain of classification, classic ML approaches like logistic regression, decision trees, and SVMs have been historically employed. If you have been following my blog then you know i have also explained a few of their applications (Classifying Purchase Intention of Online Shoppers with Python etc.).

Performance

These classic algorithms work well when the data is structured and the relationship between the input and the output is relatively straightforward. A classic example is the titantic dataset where the goal is to predict surval of passengers based on age, gender, cabin, etc. Foundation models, due to their ability to learn from vast amounts of data, not only take into account the structured variablesbut potetially also make sense of additional information such as names etc., that would otherwise be more difficult to use. In other cases, LLMs could also make sense of text snippets and extract additional information.

LLMs perform quite well on simpler classification tasks, but for more complex ones with many categories or high-dimensional data, custom trained and hyperparameter-tuned models outperform them. However, the potential in traditional algorithm for further improvements is rather low. On the other hand, foundation models are likely to improve further in the coming year and will thus likely see increasing adoption for more complex classification tasks.

Limitations

Indeed, LLMs offer unparalleled flexibility in this domain, especially with text-based tasks. Their ability to generalize over different types of data makes them highly versatile. Their auto-adjustment to data drift can save costs on retraining, a significant benefit.

For complex cases, the bgiggest limitation is the token limit. Even for LLM finetuning, you will have a hard time training the model with a large number (several hundrets) of input variables. As long as the token limit does not grow considerably, classification tasks are limited to cases with medium complexity. With better fine-tuning options and longer token limits, the use of foundation models will likely grow.

Outlook

Classic ML and foundation models will coexist, with foundation models gaining more traction as they overcome current limitations like token limits.

#2 Regression and Time Series Forecasting

Methods like Linear Regression, ARIMA (AutoRegressive Integrated Moving Average), and Prophet have long been the go-to choices for both regression and time series analysis. Their popularity stems from there interpretability and easy of implementation. Foundation models are not great with mathematics but can create solid forecasts. However, they lack the interpretability and reliability that is often key for forecasting use cases. In addition, for longer time series, the limited context window becomes a problem again, especially as floadting numbers take a lot of tokens. On the other hand, foundation models may be able to identify more complex patterns that traditional techniques would likely oversee.

Also: Stock Market Prediction using Multivariate Time Series and Recurrent Neural Networks in Python

Outlook: At the moment, regression tasks are a domain where classic machine learning has an edge over foundation models. It remainst o be seen how long this will last.

NLP

NLP comprises a variety of disciplines from recognition, text completion, text classification and information extraction, reasoning over text. But with the rise of BERT and ROBERTA this bastion of data science begun to crumble. Now with modern LLMs, it seems entirely lost to foundation models.

In all of these fields, LLMs like GPT-4 show superior performance. The large-scale training and vast data encompassed by foundation models make them highly efficient at generation tasks, often surpassing custom models. GPT-4 and ChatGPT were both able to pass numerous exams inlcuding the Bar exam.

Outlook: Foundation models are already ahead and likely to take the rest of the entire share.

Clustering

As LLMs generate meaningful embeddings, they indeed offer exciting prospects for clustering tasks. Traditional clustering algorithms might still be useful for specific use cases or where interpretability is crucial.

Embeddings work well for clustering techniques. Its the same technique that is based on the distance between objects and works both with numeric inputs as well as with text in different languages.

However, classic ml algorithms were always struggling with more complex patterns in data and as of now I have reason to believe that the situation is similar with LLMs. However, this assessment is only for the traditional LLMs but not for vision models. GPT-4 V was just released a few weeks and its image interpretation capabilities are crazy good. This model will also be able to interpret outliers and identify unusual geometric shapes in cluster diagrams that would be hard to detect with traditional models. Its a game changer for outlier detection and clastering alike.

Outlook: Foundation models taking an increasing spot

Outlier Detection

While outlier detection is performed differently from clustering in classic ml, foundation models perform a similar technique both for clustering and outlier detection.

While LLMs are good at detecting patterns, traditional ML has established algorithms for outlier detection that work exceptionally well in controlled settings, especially when we have a clear understanding of the data’s distribution.

When we are talking about outlier detection, classic ML models will give you a more fine-grained control over what is considered an outlier and here classic ml I believe will stay relevant for

Outlook: Classic ML models remain relevant, although foundation models will be used for certain tasks

Computer Vision

Your differentiation is spot on. While foundation models have made strides in general computer vision tasks, specialized tasks like medical imaging or self-driving cars require models built and trained for that specific purpose.

Outlook: The future will likely have a coexistence between custom trained special models for specific tasks like autononoums systems, and simpler more generic tasks like face detection, object detection and tagging. At the same time, foundation models have added a few new disciplines like image interpreation and image generation, where custom models have never been able to produce good results.

For specific task like getting exact coordinates, computer vision models will have certain limitations. Here traditional models and pretrained models will have an edge.

Outlook: Foundation models taking a large share of computer vision tasks.

Audio Processing

As foundation models expand their capabilities, they’re likely to dominate tasks like transcription or synthesis. However, specialized tasks such as voice biometrics or unique sound classifications might still rely on custom models.

Outlook: Foundation models taking the share entirely

Recommender Systems

Traditional recommendation systems, especially those utilizing collaborative filtering or matrix factorization, have been honed over years and work remarkably well. LLMs can augment these systems by providing context-rich content recommendations or by understanding nuanced user preferences.

In summary, while LLMs have certainly expanded the frontiers of what’s possible, especially in the realm of NLP, they won’t replace traditional ML/DL across the board. Instead, they complement existing systems, offering new capabilities and efficiencies. The choice between the two often boils down to the specifics of the task, the available data, and the desired outcomes. The future likely holds a symbiotic relationship where both coexist, each playing to its strengths.

Outlook: I am unsure. I could imagine that foundation are increasingly be used in combination with classic models.

Additional Aspects:

Interpretability:
- Classic ML: Often more interpretable, which is crucial for understanding model decisions in sensitive domains.
- Foundation Models: The “black box” nature makes them less interpretable, although efforts like LIME or SHAP are attempting to mitigate this issue.
Consistency and reliability:
- Classic ML results are still more predictable. In cases of high stakes, classic ML will be used more than generative AI.
Deployment and Maintenance:
- Classic ML: Easier to deploy and maintain due to their simplicity and lower resource requirements.
- Foundation Models: Can be resource-intensive to deploy and maintain, especially at the scale necessary for real-world applications.
Customization:
- Classic ML: Certain well-understood problems have commoditized solutions available.
- Foundation Models: Pushing the boundary of commoditization further by providing powerful, generalized solutions that can be fine-tuned for specific tasks.

Summary

Not long ago, if somebody would have asked me, if these models could entirely replace classic machine learning, i would have said, of course not, they will still have a place. There were just too many fields, where training a custom machine learning model makes just more sense. Computer vision is a great example, as soon as you want to count specific objects, you would typeically not come around training a custom model for that. Of course you would use existing libraries and maybe pretrained models, but still you would have to do a main part of the job.

Foundation models are undeniably transformative, especially in domains like NLP and Computer Vision, where they often outperform classical ML techniques. However, the necessity for interpretability, lower resource requirements, and specific problem-tailored solutions in certain scenarios ensures that classical ML will continue to hold its relevance. The evolution towards more capable foundation models doesn’t signify the end of classical ML, but rather presents an enriched tapestry of tools and methodologies that practitioners can draw upon to tackle complex problems in the AI landscape.

How does it look today? A few month later GPT-4 has been released, which is already much better at solcingf math problems than GPT3.5. Only a few weeks before, GPT-4 V has been released, which is extremely good at interperting images.

Sources and Further Reading

[2304.11633] Evaluating ChatGPT’s Information Extraction Capabilities: An Assessment of Performance, Explainability, Calibration, and Faithfulness (arxiv.org)

List: Here Are the Exams ChatGPT and GPT-4 Have Passed so Far (businessinsider.com)

The post Foundation Models Are Here: How Will They Impact Traditional Machine Learning? appeared first on relataly.com.

Stock Market Forecasting Neural Networks for Multi-Output Regression in Python

Florian Follonier — Tue, 13 Jul 2021 21:10:23 +0000

Multi-output time series regression can forecast several steps of a time series at once. The number of neurons in the final output layer determines how many steps the model can predict. Models with one output return single-step forecasts. Models with various outputs can return entire series of time steps and thus deliver a more detailed projection of how a time series could develop in the future. This article is a hands-on Python tutorial that shows how to design a neural network architecture with multiple outputs. The goal is to create a multi-output model for stock-price forecasting using Python and Keras. By the end of this tutorial, you will have learned how to design a multi-output model for stock price forecasting using Python and Keras. This knowledge can be applied to other types of time series forecasting tasks, such as weather forecasting or sales forecasting.

This article proceeds as follows: We briefly discuss the architecture of a multi-output neural network. After familiarizing ourselves with the model architecture, we develop a Keras neural network for multi-output regression. For data preparation, we perform various steps, including cleaning, splitting, selecting, and scaling the data. Afterward, we define a model architecture with multiple LSTM layers and ten output neurons in the last layer. This architecture enables the model to generate projections for ten consecutive steps. After configuring the model architecture, we train the model with the historical daily prices of the Apple stock. Finally, we use this model to generate a ten-day forecast.

Disclaimer

This article does not constitute financial advice. Stock markets can be very volatile and are generally difficult to predict. Predictive models and other forms of analytics applied in this article only serve the purpose of illustrating machine learning use cases.

Multi-Output Regression vs. Single-Output Regression

In time series regression, we train a statistical model on the past values of a time series to make statements about how the time series develops further. During model training, we feed the model with so-called mini-batches and the corresponding target values. The model then creates forecasts for all input batches and compares these predictions to the actual target values to calculate the residuals (prediction errors). In this way, the model can adjust its parameters iteratively and learn to make better predictions.

Multivariate forecasting models take into account multiple input variables, such as historical time series data and additional features like moving averages or momentum indicators, to improve the accuracy of their predictions. The idea is that these various variables can help the model identify patterns in the data that suggest future price movements.

An exemplary architecture of a neural network with five input neurons (blue) and four output neurons (red)

The Architecture of a Neural Network with Multiple Outputs

Next, we will discuss the architecture of a neural network with multiple outputs. The architecture consists of several layers, including an input layer, several hidden layers, and an output layer. The number of neurons in the first layer must match the input data, and the number of neurons in the output layer determines the period length of the predictions.

Models with a single neuron in the output layer are used to predict a single time step. It is possible to predict multiple price steps with a single-output model. It requires a rolling forecasting approach in which the outputs are iteratively reused to make further-reaching predictions. However, this way is somewhat cumbersome. A more elegant way is to train a multi-output model right away.

The inputs and outputs of a neural network for time series regression with five input neurons and four outputs

Training Neural Networks with Multiple Outputs

A model with multiple neurons in the output layer can predict numerous steps once per batch. Multi-output regression models train on many sequences of subsequent values, followed by the consecutive output sequence. The model architecture thus contains multiple neurons in the initial layer and various neurons in the output layer (as illustrated).

In a multi-output regression model, each neuron in the output layer is responsible for predicting a different time step in the future. To train such a model, you need to provide a sequence of input data followed by the corresponding sequence of output data. For example, if you want to predict the stock price for the next ten days, you would provide a sequence of input data containing the historical stock prices for the past 50 days, followed by a sequence of output data containing the stock prices for the next 10 days.

The model will then learn to map the input sequence to the output sequence so that it can make predictions for multiple time steps in the future based on the input data.

In the next part of this tutorial, we will walk through the process of developing a multi-output regression model in more detail.

Implementing a Neural Network Model for Multi-Output Multi-Step Regression in Python

Let’s get started with the hands-on Python part. In the following, we will develop a neural network with Keras and Tensorflow that forecasts the Apple stock price. To prepare the data for a neural network with multiple outputs in time series forecasting, we will spend the most time preparing it and bringing it into the right shape. Broadly this involves the following steps:

Load the time series data that we want to use as input and output for your model. We use historical price data that is available via the yahoo finance API.
Then we split our data into training and testing sets. We will use the training set to fit the model and the testing set to evaluate the model’s performance.
Preprocess the data: This includes scaling the data and selecting relevant features.
Reshape the data and bring them into a format that can be input into the neural network. This involves converting the data into a 3D array for time series data.
Finally, we will train our model and generate the forecasting.

The code is available on the GitHub repository.

View on GitHub Relataly GitHub Repo

Neural network architectures with multiple outputs allow for more potent solutions but are more complex to train. Image created with Midjourney.

Prerequisites

Before beginning the coding part, ensure that you have set up your Python 3 environment and required packages. If you don’t have a Python environment, consider Anaconda. To set it up, you can follow the steps in this tutorial.

Also, make sure you install all required packages. In this tutorial, we will be working with the following standard packages:

In addition, we will be using the machine learning libraries Keras, Scikit-learn, and Tensorflow. For visualization, we will be using the Seaborn package.

Please also have either the pandas_datareader or the yfinance package installed. You will use one of these packages to retrieve the historical stock quotes.

You can install these packages using console commands:

pip install
conda install (if you are using the anaconda packet manager)

Step #1: Load the Data

The Pandas DataReader library is our first choice for interacting with the yahoo finance API. If the library causes a problem (it sometimes does), you can also use the yfinance package, which should return the same data. We begin by loading historical price quotes of the Apple stock from the public yahoo finance API. Running the code below will load the data into a Pandas DataFrame.

# import pandas_datareader as webreader # Remote data access for pandas
import math # Mathematical functions 
import numpy as np # Fundamental package for scientific computing with Python
import pandas as pd # Additional functions for analysing and manipulating data
from datetime import date, timedelta, datetime # Date Functions
from pandas.plotting import register_matplotlib_converters # This function adds plotting functions for calender dates
import matplotlib.pyplot as plt # Important package for visualization - we use this to plot the market data
import matplotlib.dates as mdates # Formatting dates
from sklearn.metrics import mean_absolute_error, mean_squared_error # Packages for measuring model performance / errors
from keras.models import Sequential # Deep learning library, used for neural networks
from keras.layers import LSTM, Dense, Dropout # Deep learning classes for recurrent and regular densely-connected layers
from keras.callbacks import EarlyStopping # EarlyStopping during model training
from sklearn.preprocessing import RobustScaler, MinMaxScaler # This Scaler removes the median and scales the data according to the quantile range to normalize the price data 
import seaborn as sns

# from pandas_datareader.nasdaq_trader import get_nasdaq_symbols
# symbols = get_nasdaq_symbols()

# Setting the timeframe for the data extraction
today = date.today()
date_today = today.strftime("%Y-%m-%d")
date_start = '2010-01-01'

# Getting NASDAQ quotes
stockname = 'Apple'
symbol = 'AAPL'
# df = webreader.DataReader(
#     symbol, start=date_start, end=date_today, data_source="yahoo"
# )

import yfinance as yf #Alternative package if webreader does not work: pip install yfinance
df = yf.download(symbol, start=date_start, end=date_today)

# # Create a quick overview of the dataset
df.head()

Tensorflow Version: 2.6.0
Num GPUs: 1
[*********************100%***********************]  1 of 1 completed
			Open		High		Low			Close		Adj Close	Volume
Date						
2010-01-04	7.622500	7.660714	7.585000	7.643214	6.515213	493729600
2010-01-05	7.664286	7.699643	7.616071	7.656429	6.526477	601904800
2010-01-06	7.656429	7.686786	7.526786	7.534643	6.422666	552160000
2010-01-07	7.562500	7.571429	7.466071	7.520714	6.410791	477131200
2010-01-08	7.510714	7.571429	7.466429	7.570714	6.453413	447610800

The data should comprise the following columns:

Close
Open
High
Low
Adj Close
Volume

The target variable that we are trying to predict is the Closing price (Close).

Step #2: Explore the Data

Once we have loaded the data, we print a quick overview of the time-series data using different line graphs. The following code will plot a line chart for each column in df_plot using the seaborn library. The charts will be organized in a grid with nrows number of rows and ncols number of columns. The sharex parameter is set to True, which means that the x-axes of the subplots will be shared. The figsize parameter determines the size of the plot in inches.

# Plot line charts
df_plot = df.copy()

ncols = 2
nrows = int(round(df_plot.shape[1] / ncols, 0))

fig, ax = plt.subplots(nrows=nrows, ncols=ncols, sharex=True, figsize=(14, 7))
for i, ax in enumerate(fig.axes):
        sns.lineplot(data = df_plot.iloc[:, i], ax=ax)
        ax.tick_params(axis="x", rotation=30, labelsize=10, length=0)
        ax.xaxis.set_major_locator(mdates.AutoDateLocator())
fig.tight_layout()
plt.show()

The line plots look as expected and reflect the Apple stock price history. Because we are fetching daily data from an API, please note that the lineplots will look different depending on when you run the code.

Step #3: Preprocess the Data

Next, we prepare the data for the training process of our multi-output forecasting model. Preparing the data for multivariate forecasting involves several steps:

Selecting features for model training
Scaling and splitting the data into separate sets for training and testing
Slicing the time series into several shifted training batches

Remember that the steps are specific to our data and the use case. The steps required to prepare the data for a neural network with multiple outputs in time series forecasting will depend on the characteristics of your data and the requirements of your model. It is essential to consider these factors and tailor your data preparation accordingly and carefully.

3.1 Basic Preparations

We begin by creating a copy of the initial data and resetting the index.

# Indexing Batches
df_train = df.sort_values(by=['Date']).copy()

# We safe a copy of the dates index, before we need to reset it to numbers
date_index = df_train.index

# We reset the index, so we can convert the date-index to a number-index
df_train = df_train.reset_index(drop=True).copy()
df_train.head(5)

			Open		High		Low			Close		Adj Close	Volume
Date						
2022-11-29	144.289993	144.809998	140.350006	141.169998	141.169998	83763800
2022-11-30	141.399994	148.720001	140.550003	148.029999	148.029999	111224400
2022-12-01	148.210007	149.130005	146.610001	148.309998	148.309998	71250400
2022-12-02	145.960007	148.000000	145.649994	147.809998	147.809998	65421400
2022-12-05	147.770004	150.919998	145.770004	146.630005	146.630005	68732400

3.2 Feature Selection and Scaling

We proceed with feature selection. To keep things simple, we will use the features from the input data without any modifications. After selecting the features, we scale them to a range between 0 and 1. To ease unscaling the predictions after training, we create two different scalers: One for the training data, which takes five columns, and one for the output data that scales a single column (the Close Price). I have covered feature engineering in a separate article if you want to learn more about this topic.

def prepare_data(df):

    # List of considered Features
    FEATURES = ['Open', 'High', 'Low', 'Close', 'Volume']

    print('FEATURE LIST')
    print([f for f in FEATURES])

    # Create the dataset with features and filter the data to the list of FEATURES
    df_filter = df[FEATURES]
    
    # Convert the data to numpy values
    np_filter_unscaled = np.array(df_filter)
    #np_filter_unscaled = np.reshape(np_unscaled, (df_filter.shape[0], -1))
    print(np_filter_unscaled.shape)

    np_c_unscaled = np.array(df['Close']).reshape(-1, 1)
    
    return np_filter_unscaled, np_c_unscaled, df_filter
    
np_filter_unscaled, np_c_unscaled, df_filter = prepare_data(df_train)
                                          
# Creating a separate scaler that works on a single column for scaling predictions
# Scale each feature to a range between 0 and 1
scaler_train = MinMaxScaler()
np_scaled = scaler_train.fit_transform(np_filter_unscaled)
    
# Create a separate scaler for a single column
scaler_pred = MinMaxScaler()
np_scaled_c = scaler_pred.fit_transform(np_c_unscaled)

FEATURE LIST
['Open', 'High', 'Low', 'Close', 'Volume']
(3254, 5)

The final step of the data preparation is to create the structure for the input data. This structure needs to match the input layer of the model architecture.

3.3 Slicing the Data for a Model with Multiple In- and Outputs

The code below starts a sliding window process that cuts the initial time series data into multiple slices, i.e., mini-batches. Each batch is a smaller fraction of the initial time series shifted by a single step. Because we will feed our model with multivariate input data, the time series consists of five input columns/features. Each batch comprises a period of 50 steps from the time series and an output sequence of ten consecutive values. To validate that the batches have the right shape, we visualize mini-batches in a line graph with their consecutive target values.

# Set the input_sequence_length length - this is the timeframe used to make a single prediction
input_sequence_length = 50
# The output sequence length is the number of steps that the neural network predicts
output_sequence_length = 10 #

# Prediction Index
index_Close = df_train.columns.get_loc("Close")

# Split the training data into train and train data sets
# As a first step, we get the number of rows to train the model on 80% of the data 
train_data_length = math.ceil(np_scaled.shape[0] * 0.8)

# Create the training and test data
train_data = np_scaled[:train_data_length, :]
test_data = np_scaled[train_data_length - input_sequence_length:, :]

# The RNN needs data with the format of [samples, time steps, features]
# Here, we create N samples, input_sequence_length time steps per sample, and f features
def partition_dataset(input_sequence_length, output_sequence_length, data):
    x, y = [], []
    data_len = data.shape[0]
    for i in range(input_sequence_length, data_len - output_sequence_length):
        x.append(data[i-input_sequence_length:i,:]) #contains input_sequence_length values 0-input_sequence_length * columns
        y.append(data[i:i + output_sequence_length, index_Close]) #contains the prediction values for validation (3rd column = Close),  for single-step prediction
    
    # Convert the x and y to numpy arrays
    x = np.array(x)
    y = np.array(y)
    return x, y

# Generate training data and test data
x_train, y_train = partition_dataset(input_sequence_length, output_sequence_length, train_data)
x_test, y_test = partition_dataset(input_sequence_length, output_sequence_length, test_data)

# Print the shapes: the result is: (rows, training_sequence, features) (prediction value, )
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)

# Validate that the prediction value and the input match up
# The last close price of the second input sample should equal the first prediction value
nrows = 3 # number of shifted plots
fig, ax = plt.subplots(nrows=nrows, ncols=1, figsize=(16, 8))
for i, ax in enumerate(fig.axes):
    xtrain = pd.DataFrame(x_train[i][:,index_Close], columns={f'x_train_{i}'})
    ytrain = pd.DataFrame(y_train[i][:output_sequence_length-1], columns={f'y_train_{i}'})
    ytrain.index = np.arange(input_sequence_length, input_sequence_length + output_sequence_length-1)
    xtrain_ = pd.concat([xtrain, ytrain[:1].rename(columns={ytrain.columns[0]:xtrain.columns[0]})])
    df_merge = pd.concat([xtrain_, ytrain])
    sns.lineplot(data = df_merge, ax=ax)
plt.show

(2544, 50, 5) (2544, 10)
(640, 50, 5) (640, 10)

Step #4: Prepare the Neural Network Architecture and Train the Multi-Output Regression Model

Now that we have the training data prepared and ready, the next step is to configure the architecture of the multi-out neural network. Because we will be using multiple input series, our model is, in fact, a multivariate architecture so that it corresponds to the input training batches.

4.1 Configuring and Training the Model

We choose a comparably simple architecture with only two LSTM layers and two additional dense layers. The first dense layer has 20 neurons, and the second layer is the output layer, which has ten output neurons. If you wonder how I got to the number of neurons in the third layer, I conducted several experiments and found that this number leads to solid results.

To ensure that the architecture matches our input data’s structure, we reuse the variables for the previous code section (n_input_neurons, n_output_neurons. The input sequence length is 50, and the output sequence (the steps for the period we want to predict) is ten.

# Configure the neural network model
model = Sequential()
n_output_neurons = output_sequence_length

# Model with n_neurons = inputshape Timestamps, each with x_train.shape[2] variables
n_input_neurons = x_train.shape[1] * x_train.shape[2]
print(n_input_neurons, x_train.shape[1], x_train.shape[2])
model.add(LSTM(n_input_neurons, return_sequences=True, input_shape=(x_train.shape[1], x_train.shape[2]))) 
model.add(LSTM(n_input_neurons, return_sequences=False))
model.add(Dense(20))
model.add(Dense(n_output_neurons))

# Compile the model
model.compile(optimizer='adam', loss='mse')

After configuring the model architecture, we can initiate the training process and illustrate how the loss develops over the training epochs.

# Training the model
epochs = 10
batch_size = 16
early_stop = EarlyStopping(monitor='loss', patience=5, verbose=1)
history = model.fit(x_train, y_train, 
                    batch_size=batch_size, 
                    epochs=epochs,
                    validation_data=(x_test, y_test)
                   )
                    
                    #callbacks=[early_stop])

Epoch 1/5
159/159 [==============================] - 7s 14ms/step - loss: 0.0047 - val_loss: 0.0262
Epoch 2/5
159/159 [==============================] - 2s 11ms/step - loss: 3.6759e-04 - val_loss: 0.0097
Epoch 3/5
159/159 [==============================] - 2s 11ms/step - loss: 1.5222e-04 - val_loss: 0.0056
Epoch 4/5
159/159 [==============================] - 2s 11ms/step - loss: 1.0327e-04 - val_loss: 0.0031
Epoch 5/5
159/159 [==============================] - 2s 11ms/step - loss: 1.1690e-04 - val_loss: 0.0026

4.2 Loss Curve

Next, we plot the loss curve, which represents the amount of error between the model’s predicted values and the actual values in the training data. A lower loss value indicates that the model makes more accurate predictions on the training data.

# Plot training & validation loss values
fig, ax = plt.subplots(figsize=(10, 5), sharex=True)
plt.plot(history.history["loss"])
plt.title("Model loss")
plt.ylabel("Loss")
plt.xlabel("Epoch")
ax.xaxis.set_major_locator(plt.MaxNLocator(epochs))
plt.legend(["Train", "Test"], loc="upper left")
plt.grid()
plt.show()

As we can see, the loss curve drops quickly during training, which typically means that the model is quickly learning to make accurate predictions.

Step #5 Evaluate Model Performance

Now that we have trained the model, we can make forecasts on the test data and use traditional regression metrics such as the MAE, MAPE, or MDAPE to measure the performance of our model.

# Get the predicted values
y_pred_scaled = model.predict(x_test)

# Unscale the predicted values
y_pred = scaler_pred.inverse_transform(y_pred_scaled)
y_test_unscaled = scaler_pred.inverse_transform(y_test).reshape(-1, output_sequence_length)

# Mean Absolute Error (MAE)
MAE = mean_absolute_error(y_test_unscaled, y_pred)
print(f'Median Absolute Error (MAE): {np.round(MAE, 2)}')

# Mean Absolute Percentage Error (MAPE)
MAPE = np.mean((np.abs(np.subtract(y_test_unscaled, y_pred)/ y_test_unscaled))) * 100
print(f'Mean Absolute Percentage Error (MAPE): {np.round(MAPE, 2)} %')

# Median Absolute Percentage Error (MDAPE)
MDAPE = np.median((np.abs(np.subtract(y_test_unscaled, y_pred)/ y_test_unscaled)) ) * 100
print(f'Median Absolute Percentage Error (MDAPE): {np.round(MDAPE, 2)} %')


def prepare_df(i, x, y, y_pred_unscaled):
    # Undo the scaling on x, reshape the testset into a one-dimensional array, so that it fits to the pred scaler
    x_test_unscaled_df = pd.DataFrame(scaler_pred.inverse_transform((x[i]))[:,index_Close]).rename(columns={0:'x_test'})
    
    y_test_unscaled_df = []
    # Undo the scaling on y
    if type(y) == np.ndarray:
        y_test_unscaled_df = pd.DataFrame(scaler_pred.inverse_transform(y)[i]).rename(columns={0:'y_test'})

    # Create a dataframe for the y_pred at position i, y_pred is already unscaled
    y_pred_df = pd.DataFrame(y_pred_unscaled[i]).rename(columns={0:'y_pred'})
    return x_test_unscaled_df, y_pred_df, y_test_unscaled_df


def plot_multi_test_forecast(x_test_unscaled_df, y_test_unscaled_df, y_pred_df, title): 
    # Package y_pred_unscaled and y_test_unscaled into a dataframe with columns pred and true   
    if type(y_test_unscaled_df) == pd.core.frame.DataFrame:
        df_merge = y_pred_df.join(y_test_unscaled_df, how='left')
    else:
        df_merge = y_pred_df.copy()
    
    # Merge the dataframes 
    df_merge_ = pd.concat([x_test_unscaled_df, df_merge]).reset_index(drop=True)
    
    # Plot the linecharts
    fig, ax = plt.subplots(figsize=(20, 8))
    plt.title(title, fontsize=12)
    ax.set(ylabel = stockname + "_stock_price_quotes")
    sns.lineplot(data = df_merge_, linewidth=2.0, ax=ax)

# Creates a linechart for a specific test batch_number and corresponding test predictions
batch_number = 50
x_test_unscaled_df, y_pred_df, y_test_unscaled_df = prepare_df(i, x_test, y_test, y_pred)
title = f"Predictions vs y_test - test batch number {batch_number}"
plot_multi_test_forecast(x_test_unscaled_df, y_test_unscaled_df, y_pred_df, title)

The quality of the predictions is acceptable, considering that this tutorial aimed not to achieve excellent predictions but to demonstrate the process and architecture of training a multi-output regression. So, there is certainly room for improvement. Feel free to experiment with different features or try other hyperparameters and neural network layers.

Step #6 Create a New Forecast

Finally, let’s create a forecast on a new dataset. We take the scaled dataset from section 2 (np_scaled) and extract a series with the latest 50 values. The data is reshaped into a 3D array with shape (1, 50, 5) to match the expected input shape of the model. We use these values to generate a new prediction for the next ten days using the predict method. We store the result in the y_pred_scaled variable. In addition, we need to transform the predictions back to the original scale. We do this by using the inverse_transform method of the scaler_pred object, which was fit on the training data. Finally, we visualize the multi-step forecast in another line chart.

# Get the latest input batch from the test dataset, which is contains the price values for the last ten trading days
x_test_latest_batch = np_scaled[-50:,:].reshape(1,50,5)

# Predict on the batch
y_pred_scaled = model.predict(x_test_latest_batch)
y_pred_unscaled = scaler_pred.inverse_transform(y_pred_scaled)

# Prepare the data and plot the input data and the predictions
x_test_unscaled_df, y_test_unscaled_df, _ = prepare_df(0, x_test_latest_batch, '', y_pred_unscaled)
plot_multi_test_forecast(x_test_unscaled_df, '', y_test_unscaled_df, "x_new Vs. y_new_pred")

Summary

In this tutorial, we demonstrated how to use multiple output neural networks to make predictions at different time steps. We first discussed the architecture of a recurrent neural network and how it can be used to process sequential data. We then showed how to properly preprocess the data and split it into training and test sets for training a multi-output regression model.

Next, we trained a model to predict the stock price of Apple ten steps into the future using historical data. We also discussed how to use the trained model to make multi-step predictions on new data and how to visualize the results.

To further improve the performance of the model, you can experiment with different hyperparameters and adjust the model architecture. For example, adding more neurons to the output layers will increase the prediction horizon, but remember that prediction error will also increase as the horizon lengthens. You can also try using different activation functions or adding more layers to the model to see how it affects the performance.

I hope this article was helpful in understanding multi-output neural networks better. If you have any questions or comments, please let me know.

Sources and Further Reading

The links above to Amazon are affiliate links. By buying through these links, you support the Relataly.com blog and help to cover the hosting costs. Using the links does not affect the price.

If you want to learn about an alternative approach to univariate stock market forecasting, consider taking a look at Facebook Prophet or ARIMA models

The post Stock Market Forecasting Neural Networks for Multi-Output Regression in Python appeared first on relataly.com.

Mastering Multivariate Stock Market Prediction with Python: A Guide to Effective Feature Engineering Techniques

Florian Follonier — Mon, 29 Jun 2020 21:47:28 +0000

Are you interested in learning how multivariate forecasting models can enhance the accuracy of stock market predictions? Look no further! While traditional time series data provides valuable insights into historical trends, multivariate forecasting models utilize additional features to identify patterns and predict future price movements. This process, known as “feature engineering,” is a crucial step in creating accurate stock market forecasts.

In this article, we dive into the world of feature engineering and demonstrate how it can improve stock market predictions. We explore popular financial analysis metrics, including Bollinger bands, RSI, and Moving Averages, and show how they can be used to create powerful forecasting models.

But we don’t just stop at theory. We provide a hands-on tutorial using Python to prepare and analyze time-series data for stock market forecasting. We leverage the power of recurrent neural networks with LSTM layers, based on the Keras library, to train and test different model variations with various feature combinations.

By the end of this article, you’ll have a thorough understanding of feature engineering and how it can improve the accuracy of stock market predictions. So, buckle up and get ready to discover how multivariate forecasting models can take your stock market analysis to the next level!

New to time series modeling?
Consider starting with the following tutorial on univariate time series models: Stock-market forecasting using Keras Recurrent Neural Networks and Python.

Disclaimer: This article does not constitute financial advice. Stock markets can be very volatile and are generally difficult to predict. Predictive models and other forms of analytics applied in this article only illustrate machine learning use cases.

Squirrels mastered the art of multivariate feature engineering for time series analysis a long time ago. You can do it too! Image generated with Midjourney

Feature Engineering for Stock Market Forecasting – Borrowing Features from Chart Analysis

The idea behind multivariate time series models is to feed the model with additional features that improve prediction quality. An example of such an additional feature is a “moving average.” Adding more features does not automatically improve predictive performance but increases the time needed to train the models. The challenge is to find the right combination of features and to create an input form that allows the model to recognize meaningful patterns. There is no way around conducting experiments and trying out feature combinations. This process of trial and error can be time-consuming. It is, therefore, helping to build upon established indicators.

In stock market forecasting, we can use indicators from chart analysis. This domain forecasts future prices by studying historical prices and trading volume. The underlying idea is that specific patterns or chart formations in the data can signal the timing of beneficial buying or selling decisions. We can borrow indicators from this discipline and use them as input features.

When we develop predictive machine learning models, the difference from chart analysis is that we do not aim to analyze the chart ourselves manually, but try to create a machine learning model, for example, a recurrent neural network, that does the job for us.

A multivariate time-series forecast, as we will create it in this article. Exemplary chart with technical indicators (Bollinger bands, RSI, and Double-EMA)

Stock Market Forecasting – Does this really Work?

It is essential to point out that the effectiveness of chart analysis and algorithmic trading is controversial. There is at least as much controversy about whether it is possible to predict the price of stock markets with neural networks. Various studies and researchers have examined the effectiveness of chart analysis with different results. One of the most significant points of criticism is that it cannot take external events into account. Nevertheless, many financial analysts consider financial indicators when making investment decisions, so a lot of money is moved simply because many people believe in statistical indicators.

So without knowing how well this will work, it is worth an attempt to feed a neural network with different financial indicators. But first and foremost, I see this as an excellent way to show how feature engineering works. Just make sure not to rely on the predictions of these models blindly.

Also: Stock Market Prediction using Univariate Recurrent Neural Networks

Selected Statistical Indicators

The following indicators are commonly used in chart analysis and may be helpful when creating forecasting models:

Relative Strength Index
Simple Moving Averages
Exponential Moving Averages
Bolliger Bands

Relative Strength Index (RSI)

The Relative Strength Index (RSI) is one of the most commonly used oscillating indicators. In 1978, Welles Wilder developed it to determine the momentum of price movements and compare the strength of price losses in a period with price gains. It can take percentage values between 0 and 100.

Reference lines determine how long an existing trend will last before expecting a trend reversal. In other words, when the price is heavily oversold or overbought, one should expect a trend reversal.

The reference line is at 40% (oversold) and 80% (overbought) with an upward trend.
The reference line is at 20% (oversold) and 60% (overbought) with a downtrend.

The formula for the RSI is as follows:

Calculate the sum of all positive and negative price changes in a period (e.g., 30 days):
We then calculate the mean value of the sums with the following formula:
Finally, we calculate the RSI with the following formula:

Simple Moving Averages (SMA)

Simple Moving Averages (SMA) is another technical indicator that financial analysts use to determine if a price trend will continue or reverse. The SMA is the average sum of all values within a certain period. Financial analysts pay close attention to the 200-day SMA (SMA-200). When the price crosses the SMA, this may signal a trend reversal. Furthermore, we often use SMAs for 50 (SMA-50) and 100 days (SMA-100) periods. In this regard, two popular trading patterns include the death cross and a golden cross.

A death cross occurs when the trend line of the SMA-50/100 crosses below the 200-day SMA. This suggests that a falling trend will likely accelerate downwards.
A golden cross occurs when the trend line of the SMA-50/100 crosses over the 200-day SMA, suggesting a rising trend will likely accelerate upwards.

We can use the SMA in the input shape of our model simply by measuring the distance between two trendlines.

Exponential Moving Averages (EMA)

The exponential moving average (EMA) is another lagging trend indicator. Like the SMA, the EMA measures the strength of a price trend. The difference between SMA and EMA is that the SMA assigns equal values to all price points, while the EMA uses a multiplier that weights recent prices higher.

The formula for the EMA is as follows: Calculating the EMA for a given data point requires past price values. For example, to calculate the SMA for today, based on 30 past values, we calculate the average price values for the past 30 days. We then multiply the result by a weighting factor that weighs the EMA. The formula for this multiplier is as follows: Smoothing factor / (1+ days)

It is common to use different smoothing factors. For a 30-day moving average, the multiplier would be [2/(30+1)]= 0.064.

As soon as we have calculated the EMA for the first data point, we can use the following formula to calculate the ema for all subsequent data points: EMA = Closing price x multiplier + EMA (previous day) x (1-multiplier)

Bollinger Bands

Bollinger Bands are a popular technical analysis tool used to identify market volatility and potential price movements in financial markets. They are named after their creator, John Bollinger.

Bollinger Bands consist of three lines that are plotted on a price chart. The middle line is a simple moving average (SMA) of the asset price over a specified period (typically 20 days). The upper and lower lines are calculated by adding and subtracting a multiple (usually two) of the standard deviation of the asset price from the middle line.

The upper band is calculated as: Middle band + (2 x Standard deviation) The lower band is calculated as: Middle band – (2 x Standard deviation)

The standard deviation is a measure of how much the asset price deviates from the average. When the asset price is more volatile, the bands widen, and when the price is less volatile, the bands narrow.

Traders use Bollinger Bands to identify potential buy or sell signals. When the price touches or crosses the upper band, it may be a sell signal, indicating that the asset is overbought. Conversely, when the price touches or crosses the lower band, it may be a buy signal, indicating that the asset is oversold.

Feature Engineering for Time Series Prediction Models in Python

In the following, this tutorial will guide you through the process of implementing a multivariate time series prediction model for the NASDAQ stock market index. Our aim is to equip you with the knowledge and practical skills required to create a powerful predictive model that can effectively forecast stock prices.

Throughout this tutorial, we will take you through a step-by-step approach to building a multivariate time series prediction model. You will learn how to implement and utilize different features to train and measure the performance of your model. Our goal is to ensure that you are not only able to understand the underlying concepts of multivariate time series prediction, but that you are also capable of applying these concepts in a practical setting.

The code is available on the GitHub repository.

View on GitHub Relataly GitHub Repo

Let’s do some feature engineering for machine learning!

" data-image-caption="

Let’s do some feature engineering for machine learning!

" data-large-file="https://www.relataly.com/wp-content/uploads/2023/03/robot-artificial-intelligence-colorful-midjourney-relataly-min.png" src="https://www.relataly.com/wp-content/uploads/2023/03/robot-artificial-intelligence-colorful-midjourney-relataly-min.png" alt="Let's do some feature engineering for machine learning!" class="wp-image-12671" srcset="https://www.relataly.com/wp-content/uploads/2023/03/robot-artificial-intelligence-colorful-midjourney-relataly-min.png 496w, https://www.relataly.com/wp-content/uploads/2023/03/robot-artificial-intelligence-colorful-midjourney-relataly-min.png 298w, https://www.relataly.com/wp-content/uploads/2023/03/robot-artificial-intelligence-colorful-midjourney-relataly-min.png 140w" sizes="(max-width: 496px) 100vw, 496px" />

Let’s do some feature engineering for machine learning!

Prerequisites

Before starting the coding part, make sure that you have set up your Python 3 environment and required packages. If you don’t have an environment, follow this tutorial to set up the Anaconda environment.

Also, make sure you install all required packages. In this tutorial, we will be working with the following standard packages:

In addition, we will be using Keras (2.0 or higher) with Tensorflow backend to train the neural network, the machine learning library scikit-learn, and the pandas-DataReader. You can install these packages using the following console commands:

pip install
conda install (if you are using the anaconda packet manager)

Step #1 Load the Data

Let’s start by setting up the imports and loading the data. Our Python project will use price data from the NASDAQ composite index (symbol: ^IXIC) from yahoo.finance.com.

# Time Series Forecasting - Feature Engineering For Multivariate Models (Stock Market Prediction Example)
# A tutorial for this file is available at www.relataly.com

import math # Mathematical functions  
import numpy as np # Fundamental package for scientific computing with Python 
import pandas as pd # Additional functions for analysing and manipulating data 
from datetime import date # Date Functions 
import matplotlib.pyplot as plt # Important package for visualization - we use this to plot the market data 
import matplotlib.dates as mdates # Formatting dates 
from sklearn.metrics import mean_absolute_error, mean_squared_error # Packages for measuring model performance / errors 
import tensorflow as tf
from tensorflow.keras.models import Sequential # Deep learning library, used for neural networks 
from tensorflow.keras.layers import LSTM, Dense, Dropout # Deep learning classes for recurrent and regular densely-connected layers 
from tensorflow.keras.callbacks import EarlyStopping # EarlyStopping during model training 
from sklearn.preprocessing import RobustScaler # This Scaler removes the median and scales the data according to the quantile range to normalize the price data  
#from keras.optimizers import Adam # For detailed configuration of the optimizer 
import seaborn as sns # Visualization
sns.set_style('white', { 'axes.spines.right': False, 'axes.spines.top': False})


# check the tensorflow version and the number of available GPUs
print('Tensorflow Version: ' + tf.__version__)
physical_devices = tf.config.list_physical_devices('GPU')
print("Num GPUs:", len(physical_devices))

# Setting the timeframe for the data extraction
end_date =  date.today().strftime("%Y-%m-%d")
start_date = '2010-01-01'

# Getting NASDAQ quotes
stockname = 'NASDAQ'
symbol = '^IXIC'

# You can either use webreader or yfinance to load the data from yahoo finance
# import pandas_datareader as webreader
# df = webreader.DataReader(symbol, start=start_date, end=end_date, data_source="yahoo")

import yfinance as yf #Alternative package if webreader does not work: pip install yfinance
df = yf.download(symbol, start=start_date, end=end_date)

# Quick overview of dataset
df.head()

Tensorflow Version: 2.5.0
Num GPUs: 1
[*********************100%***********************]  1 of 1 completed
			Open		High		Low	Close	Adj 		Close		Volume
Date						
2009-12-31	2292.919922	2293.590088	2269.110107	2269.149902	2269.149902	1237820000
2010-01-04	2294.409912	2311.149902	2294.409912	2308.419922	2308.419922	1931380000
2010-01-05	2307.270020	2313.729980	2295.620117	2308.709961	2308.709961	2367860000
2010-01-06	2307.709961	2314.070068	2295.679932	2301.090088	2301.090088	2253340000
2010-01-07	2298.090088	2301.300049	2285.219971	2300.050049	2300.050049	2270050000

Step #2 Explore the Data

Let’s take a quick look at the data by creating line charts for the columns of our data set.

# Plot line charts
df_plot = df.copy()

ncols = 2
nrows = int(round(df_plot.shape[1] / ncols, 0))

fig, ax = plt.subplots(nrows=nrows, ncols=ncols, sharex=True, figsize=(14, 7))
for i, ax in enumerate(fig.axes):
        sns.lineplot(data = df_plot.iloc[:, i], ax=ax)
        ax.tick_params(axis="x", rotation=30, labelsize=10, length=0)
        ax.xaxis.set_major_locator(mdates.AutoDateLocator())
fig.tight_layout()
plt.show()

Our initial dataset includes six features: High, Low, Open, Close, Volumen, and Adj Close.

Step #3 Feature Engineering

Now comes the exciting part – we will implement additional features. We use various indicators from chart analysis, such as averages for different periods and stochastic oscillators to measure price momentum.

# Indexing Batches
train_df = df.sort_values(by=['Date']).copy()

# Adding Month and Year in separate columns
d = pd.to_datetime(train_df.index)
train_df['Day'] = d.strftime("%d") 
train_df['Month'] = d.strftime("%m") 
train_df['Year'] = d.strftime("%Y") 
train_df

			Open		High		Low			Close		Adj Close	Volume		Day	Month	Year
Date									
2009-12-31	2292.919922	2293.590088	2269.110107	2269.149902	2269.149902	1237820000	31	12		2009
2010-01-04	2294.409912	2311.149902	2294.409912	2308.419922	2308.419922	1931380000	04	01		2010
2010-01-05	2307.270020	2313.729980	2295.620117	2308.709961	2308.709961	2367860000	05	01		2010
2010-01-06	2307.709961	2314.070068	2295.679932	2301.090088	2301.090088	2253340000	06	01		2010
2010-01-07	2298.090088	2301.300049	2285.219971	2300.050049	2300.050049	2270050000	07	01		2010

We create a set of indicators for the training data with the following code. However, we will make one more restriction in the next step since a model with all these indicators does not achieve good results and would take far too long to train on a local computer.

# Feature Engineering
def createFeatures(df):
    df = pd.DataFrame(df)

    
    df['Close_Diff'] = df['Adj Close'].diff()
        
    # Moving averages - different periods
    df['MA200'] = df['Close'].rolling(window=200).mean() 
    df['MA100'] = df['Close'].rolling(window=100).mean() 
    df['MA50'] = df['Close'].rolling(window=50).mean() 
    df['MA26'] = df['Close'].rolling(window=26).mean() 
    df['MA20'] = df['Close'].rolling(window=20).mean() 
    df['MA12'] = df['Close'].rolling(window=12).mean() 
    
    # SMA Differences - different periods
    df['DIFF-MA200-MA50'] = df['MA200'] - df['MA50']
    df['DIFF-MA200-MA100'] = df['MA200'] - df['MA100']
    df['DIFF-MA200-CLOSE'] = df['MA200'] - df['Close']
    df['DIFF-MA100-CLOSE'] = df['MA100'] - df['Close']
    df['DIFF-MA50-CLOSE'] = df['MA50'] - df['Close']
    
    # Moving Averages on high, lows, and std - different periods
    df['MA200_low'] = df['Low'].rolling(window=200).min()
    df['MA14_low'] = df['Low'].rolling(window=14).min()
    df['MA200_high'] = df['High'].rolling(window=200).max()
    df['MA14_high'] = df['High'].rolling(window=14).max()
    df['MA20dSTD'] = df['Close'].rolling(window=20).std() 
    
    # Exponential Moving Averages (EMAS) - different periods
    df['EMA12'] = df['Close'].ewm(span=12, adjust=False).mean()
    df['EMA20'] = df['Close'].ewm(span=20, adjust=False).mean()
    df['EMA26'] = df['Close'].ewm(span=26, adjust=False).mean()
    df['EMA100'] = df['Close'].ewm(span=100, adjust=False).mean()
    df['EMA200'] = df['Close'].ewm(span=200, adjust=False).mean()

    # Shifts (one day before and two days before)
    df['close_shift-1'] = df.shift(-1)['Close']
    df['close_shift-2'] = df.shift(-2)['Close']

    # Bollinger Bands
    df['Bollinger_Upper'] = df['MA20'] + (df['MA20dSTD'] * 2)
    df['Bollinger_Lower'] = df['MA20'] - (df['MA20dSTD'] * 2)
    
    # Relative Strength Index (RSI)
    df['K-ratio'] = 100*((df['Close'] - df['MA14_low']) / (df['MA14_high'] - df['MA14_low']) )
    df['RSI'] = df['K-ratio'].rolling(window=3).mean() 

    # Moving Average Convergence/Divergence (MACD)
    df['MACD'] = df['EMA12'] - df['EMA26']
    
    # Replace nas 
    nareplace = df.at[df.index.max(), 'Close']    
    df.fillna((nareplace), inplace=True)
    
    return df

Now that we have created several features, we will limit them. We can now choose from these features and test how different feature combinations affect model performance.

# List of considered Features
FEATURES = [
#             'High',
#             'Low',
#             'Open',
              'Close',
#             'Volume',
#             'Day',
#             'Month',
#             'Year',
#             'Adj Close',
#              'close_shift-1',
#              'close_shift-2',
#             'MACD',
#             'RSI',
#             'MA200',
#             'MA200_high',
#             'MA200_low',
            'Bollinger_Upper',
            'Bollinger_Lower',
#             'MA100',            
#             'MA50',
#             'MA26',
#             'MA14_low',
#             'MA14_high',
#             'MA12',
#             'EMA20',
#             'EMA100',
#             'EMA200',
#               'DIFF-MA200-MA50',
#               'DIFF-MA200-MA100',
#             'DIFF-MA200-CLOSE',
#             'DIFF-MA100-CLOSE',
#             'DIFF-MA50-CLOSE'
           ]

# Create the dataset with features
df_features = createFeatures(train_df)

# Shift the timeframe by 10 month
use_start_date = pd.to_datetime("2010-11-01" )
df_features = df_features[df_features.index > use_start_date].copy()

# Filter the data to the list of FEATURES
data_filtered_ext = df_features[FEATURES].copy()

# We add a prediction column and set dummy values to prepare the data for scaling
#data_filtered_ext['Prediction'] = data_filtered_ext['Close'] 
print(data_filtered_ext.tail().to_string())

# remove Date column before training
dfs = data_filtered_ext.copy()

# Create a list with the relevant columns
assetname_list = [dfs.columns[i-1] for i in range(dfs.shape[1])]

# Create the lineplot
fig, ax = plt.subplots(figsize=(16, 8))
sns.lineplot(data=data_filtered_ext[assetname_list], linewidth=1.0, dashes=False, palette='muted')

# Configure and show the plot    
ax.set_title(stockname + ' price chart')
ax.legend()
plt.show

 			Close  			Bollinger_Upper  Bollinger_Lower
Date                                                      
2022-05-18  11418.150391     13404.779247     11065.040772
2022-05-19  11388.500000     13285.741255     11005.463725
2022-05-20  11354.620117     13214.664450     10928.073538
2022-05-23  11535.269531     13075.594634     10920.185347
2022-05-24  11264.450195     13035.543222     10837.607755

Step #4 Scaling and Transforming the Data

Before training our model, we need to transform the data. This step includes scaling the data (to a range between 0 and 1) and dividing it into separate sets for training and testing the prediction model. Most of the code used in this section stems from the previous article on multivariate time-series prediction, which covers the steps to transform the data. So we don’t go into too much detail here.

# Calculate the number of rows in the data
nrows = dfs.shape[0]
np_data_unscaled = np.reshape(np.array(dfs), (nrows, -1))
print(np_data_unscaled.shape)

# Transform the data by scaling each feature to a range between 0 and 1
scaler = RobustScaler()
np_data = scaler.fit_transform(np_data_unscaled)

# Creating a separate scaler that works on a single column for scaling predictions
scaler_pred = RobustScaler()
df_Close = pd.DataFrame(data_filtered_ext['Close'])
np_Close_scaled = scaler_pred.fit_transform(df_Close)

Out: (2619, 6)

Once we have scaled the data, we will split the data into a train and test set. This step creates four datasets x_train and x_test, and y_train and y_test. x_train and x_test contain the data with our selected features. The two sets, y_train and y_test, have the actual values, which our model will try to predict.

# Set the sequence length - this is the timeframe used to make a single prediction
sequence_length = 50 # = number of neurons in the first layer of the neural network

# Split the training data into train and train data sets
# As a first step, we get the number of rows to train the model on 80% of the data 
train_data_len = math.ceil(np_data.shape[0] * 0.8)

# Create the training and test data
train_data = np_data[:train_data_len, :]
test_data = np_data[train_data_len - sequence_length:, :]

# The RNN needs data with the format of [samples, time steps, features]
# Here, we create N samples, sequence_length time steps per sample, and 6 features
def partition_dataset(sequence_length, data):
    x, y = [], []
    data_len = data.shape[0]

    for i in range(sequence_length, data_len):
        x.append(data[i-sequence_length:i,:]) #contains sequence_length values 0-sequence_length * columsn
        y.append(data[i, 0]) #contains the prediction values for validation,  for single-step prediction
    
    # Convert the x and y to numpy arrays
    x = np.array(x)
    y = np.array(y)
    return x, y

# Generate training data and test data
x_train, y_train = partition_dataset(sequence_length, train_data)
x_test, y_test = partition_dataset(sequence_length, test_data)

# Print the shapes: the result is: (rows, training_sequence, features) (prediction value, )
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)

# Validate that the prediction value and the input match up
# The last close price of the second input sample should equal the first prediction value
print(x_train[1][sequence_length-1][0])
print(y_train[0])

Out:
(1914, 30, 3) (1914,) 
(486, 30, 3) (486,)

Step #5 Train the Time Series Forecasting Model

Now that we have prepared the data, we can train our forecasting model. For this purpose, we will use a recurrent neural network from the Keras library. A recurrent neural network (RNN) is a type of artificial neural network that can process sequential data, such as text, audio, or time series data. Unlike traditional feedforward neural networks, in which data flows through the network in only one direction, RNNs have connections that form a directed cycle, allowing information to flow in multiple directions and be processed in a temporal manner.

The model architecture of our RNN looks as follows:

LSTM layer that receives a mini-batch as input.
LSTM layer that has the same number of neurons as the mini-batch
Another LSTM layer that does not return the sequence
Dense layer with 32 neurons
Dense layer with one neuron that outputs the forecast

The architecture is not too complex and is suitable for experimenting with different features. I arrived at this architecture by trying out different layers and configurations. However, I did not spend too much time fine-tuning the architecture since this tutorial focuses on feature engineering.

During model training, the neural network processes several mini-batches. The shape of the mini-batch is defined by the number of features and the period chosen. Multiplying these two dimensions (number of features x number of time steps) gives the input shape of our model.

The following code defines the model architecture, trains the model, and then prints the training loss curve:

# Configure the neural network model
model = Sequential()

# Configure the Neural Network Model with n Neurons - inputshape = t Timestamps x f Features
n_neurons = x_train.shape[1] * x_train.shape[2]
print('timesteps: ' + str(x_train.shape[1]) + ',' + ' features:' + str(x_train.shape[2]))
model.add(LSTM(n_neurons, return_sequences=True, input_shape=(x_train.shape[1], x_train.shape[2]))) 
#model.add(Dropout(0.1))
model.add(LSTM(n_neurons, return_sequences=True))
#model.add(Dropout(0.1))
model.add(LSTM(n_neurons, return_sequences=False))
model.add(Dense(32))
model.add(Dense(1, activation='relu'))


# Configure the Model   
optimizer='adam'; loss='mean_squared_error'; epochs = 100; batch_size = 32; patience = 8; 

# uncomment to customize the learning rate
learn_rate = "standard" # 0.05
# adam = Adam(learn_rate=learn_rate) 

parameter_list = ['epochs ' + str(epochs), 'batch_size ' + str(batch_size), 'patience ' + str(patience), 'optimizer ' + str(optimizer) + ' with learn rate ' + str(learn_rate), 'loss ' + str(loss)]
print('Parameters: ' + str(parameter_list))

# Compile and Training the model
model.compile(optimizer=optimizer, loss=loss)
early_stop = EarlyStopping(monitor='loss', patience=patience, verbose=1)
history = model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, callbacks=[early_stop], shuffle = True,
                  validation_data=(x_test, y_test))

# Plot training & validation loss values
fig, ax = plt.subplots(figsize=(12, 6), sharex=True)
plt.plot(history.history["loss"])
plt.title("Model loss")
plt.ylabel("Loss")
plt.xlabel("Epoch")
ax.xaxis.set_major_locator(plt.MaxNLocator(epochs))
plt.legend(["Train", "Test"], loc="upper left")
plt.show()

timesteps: 50, features:1
Parameters: ['epochs 100', 'batch_size 32', 'patience 8', 'optimizer adam with learn rate standard', 'loss mean_squared_error']
Epoch 1/100
72/72 [==============================] - 9s 55ms/step - loss: 0.0990 - val_loss: 0.2985
Epoch 2/100
72/72 [==============================] - 3s 37ms/step - loss: 0.0932 - val_loss: 0.1768
Epoch 3/100
72/72 [==============================] - 3s 39ms/step - loss: 0.0931 - val_loss: 0.1246
Epoch 4/100
72/72 [==============================] - 3s 37ms/step - loss: 0.0931 - val_loss: 0.0902
Epoch 5/100
72/72 [==============================] - 3s 38ms/step - loss: 0.0929 - val_loss: 0.0846
Epoch 6/100
72/72 [==============================] - 3s 38ms/step - loss: 0.0930 - val_loss: 0.0611
Epoch 7/100
72/72 [==============================] - 3s 38ms/step - loss: 0.0929 - val_loss: 0.0498
Epoch 8/100
72/72 [==============================] - 3s 37ms/step - loss: 0.0928 - val_loss: 0.0208
Epoch 9/100
72/72 [==============================] - 3s 38ms/step - loss: 0.0929 - val_loss: 0.0588
Epoch 10/100
72/72 [==============================] - 3s 37ms/step - loss: 0.0928 - val_loss: 0.0437
Epoch 11/100
72/72 [==============================] - 3s 36ms/step - loss: 0.0928 - val_loss: 0.0192
Epoch 12/100
...
72/72 [==============================] - 3s 38ms/step - loss: 0.0925 - val_loss: 0.0094
Epoch 46/100
72/72 [==============================] - 3s 37ms/step - loss: 0.0925 - val_loss: 0.0113
Epoch 00046: early stopping

The loss drops quickly, and the training process looks promising.

Step #6 Evaluate Model Performance

If we test a feature, we also want to know how it impacts the performance of our model. Feature Engineering is therefore closely related to evaluating model performance. So, let’s check the prediction performance. For this purpose, we score the model with the test data set (x_test). Then we can compare the predictions with the actual values (y_test) in a lineplot.

# Get the predicted values
y_pred_scaled = model.predict(x_test)

# Unscale the predicted values
y_pred = scaler_pred.inverse_transform(y_pred_scaled)
y_test_unscaled = scaler_pred.inverse_transform(y_test.reshape(-1, 1))
y_test_unscaled.shape

# Mean Absolute Error (MAE)
MAE = mean_absolute_error(y_test_unscaled, y_pred)
print(f'Median Absolute Error (MAE): {np.round(MAE, 2)}')

# Mean Absolute Percentage Error (MAPE)
MAPE = np.mean((np.abs(np.subtract(y_test_unscaled, y_pred)/ y_test_unscaled))) * 100
print(f'Mean Absolute Percentage Error (MAPE): {np.round(MAPE, 2)} %')

# Median Absolute Percentage Error (MDAPE)
MDAPE = np.median((np.abs(np.subtract(y_test_unscaled, y_pred)/ y_test_unscaled)) ) * 100
print(f'Median Absolute Percentage Error (MDAPE): {np.round(MDAPE, 2)} %')

# The date from which on the date is displayed
display_start_date = "2019-01-01" 

# Add the difference between the valid and predicted prices
train = pd.DataFrame(dfs['Close'][:train_data_len + 1]).rename(columns={'Close': 'y_train'})
valid = pd.DataFrame(dfs['Close'][train_data_len:]).rename(columns={'Close': 'y_test'})
valid.insert(1, "y_pred", y_pred, True)
valid.insert(1, "residuals", valid["y_pred"] - valid["y_test"], True)
df_union = pd.concat([train, valid])

# Zoom in to a closer timeframe
df_union_zoom = df_union[df_union.index > display_start_date]

# Create the lineplot
fig, ax1 = plt.subplots(figsize=(16, 8))
plt.title("y_pred vs y_test")
plt.ylabel(stockname, fontsize=18)
sns.set_palette(["#090364", "#1960EF", "#EF5919"])
sns.lineplot(data=df_union_zoom[['y_pred', 'y_train', 'y_test']], linewidth=1.0, dashes=False, ax=ax1)

# Create the barplot for the absolute errors
df_sub = ["#2BC97A" if x > 0 else "#C92B2B" for x in df_union_zoom["residuals"].dropna()]
ax1.bar(height=df_union_zoom['residuals'].dropna(), x=df_union_zoom['residuals'].dropna().index, width=3, label='absolute errors', color=df_sub)
plt.legend()
plt.show()

Median Absolute Error (MAE): 547.23 
Mean Absolute Percentage Error (MAPE): 4.04 % 
Median Absolute Percentage Error (MDAPE): 3.73 %

On average, the predictions of our model deviate from the actual values by about one percent. Although one percent may not sound like a lot, the prediction errors can quickly accumulate to larger values.

Step #7 Overview of Selected Models

In writing this article, I tested various models based on different features. The neural network architecture remained unchanged. Likewise, I kept the hyperparameters the same except for the learning rate. Below are the results of these model variants:

Step #8 Conclusions

Estimating which indicators will lead to good results in advance is difficult. More indicators do not necessarily lead to better results because they increase the model complexity and add data without predictive power. This so-called noise makes it harder for the model to separate important influencing factors from less important ones. Also, each additional indicator increases the time needed to train the model. So there is no way around testing different variants.

Besides the feature, various hyperparameters such as the learning rate, optimizer, batch size, and the selected time frame of the data (sequence_length) impact the model’s performance. Tuning these hyperparameters can further improve model performance.

A learning rate of 0.05 achieves the best results from the tested configurations.
Of all features, only the Bollinger bands positively affected the model’s performance.
As expected, the performance tends to decrease with the number of features.
In our case, the hyperparameters seem to affect the performance of the models more than the choice of features.

Finally, we have optimized only a single parameter. We searched for optimal learning rates while leaving all other parameters unchanged, such as the optimizer, the neural network architecture, or the sequence length. Based on the results, we can draw several conclusions:

There is plenty of room for improvement and experimentation. With more time for experiments and computational power, it will undoubtedly be possible to identify better features and model configurations. So, have fun experimenting! 🙂

Summary

In this tutorial, we have delved into the fascinating world of feature engineering for stock market forecasting using Python. By exploring various features from chart analysis, such as RSI, moving averages, and Bollinger bands, we have developed multiple variants of a recurrent neural network that produce distinct prediction models.

Our experiments have shown that the choice of features can have a significant impact on the performance of the prediction model. Therefore, it’s essential to carefully select features and consider their potential impact on the model. Additionally, keep in mind that the most effective features for recognizing patterns in historical data will vary depending on the specific time series data being analyzed.

By following the crucial steps outlined in this tutorial, you now have the knowledge and tools to apply feature engineering techniques to any multivariate time series forecasting problem. With further experimentation and testing, you can fine-tune your models to achieve the best possible results for your specific use case.

We hope you found this tutorial both informative and helpful. If you have any questions or comments, don’t hesitate to reach out and let us know.

And if you want to learn more about feature preparation and exploration, check out my recent article on Exploratory Feature Preparation for Regression Models.

Sources and Further Reading

The links above to Amazon are affiliate links. By buying through these links, you support the Relataly.com blog and help to cover the hosting costs. Using the links does not affect the price.

Books on Applied Machine Learning

The links above to Amazon are affiliate links. By buying through these links, you support the Relataly.com blog and help to cover the hosting costs. Using the links does not affect the price.

The post Mastering Multivariate Stock Market Prediction with Python: A Guide to Effective Feature Engineering Techniques appeared first on relataly.com.

Stock Market Prediction using Multivariate Time Series and Recurrent Neural Networks in Python

Florian Follonier — Mon, 01 Jun 2020 17:20:46 +0000

Regression models based on recurrent neural networks (RNN) can recognize patterns in time series data, making them an exciting technology for stock market forecasting. What distinguishes these RNNs from traditional neural networks is their architecture. It consists of multiple layers of long-term, short-term memory (LSTM). These LSTM layers allow the model to learn patterns in a time series that occur over different periods and are often difficult for human analysts to detect. We can train such models with one feature (univariate forecasting models) or multiple features (multivariate models). Multivariate Models can take more data into account, and if we provide them with relevant features, they can make better predictions. This tutorial uses Python and Keras to implement a multivariate RNN for stock price prediction. We define the architecture of our regression model and then train this model to predict the NASDAQ index.

The remainder of this tutorial proceeds in two parts: We start with a brief intro in which we compare modeling univariate and multivariate time series data. Then we turn to the hands-on part, in which we prepare the multivariate time series data and use it to train a neural network in Python. The model is a recurrent neural network with LSTM layers that forecasts the NASDAQ stock market index. Finally, we evaluate the performance of our model and make a forecast for the next day.

Disclaimer

Stock market forecasting has become an exciting application for recurrent neural networks.

Univariate vs. Multivariate Time Series Models

Multivariate models and univariate models differ in the number of their input features. While univariate models consider only a single feature, multivariate models use several input variables (features). In stock market forecasting, we can create additional features from price history. Examples are performance indicators such as moving averages, the RSI, or the Sales Volume. We can also include features from other sources, for example, social media sentiment, weather forecasts, etc. Multivariate models that have additional relevant information available have a chance to outperform univariate models. However, this is only true if the features are relevant and are indicative of future price movements.

Preparing data for training univariate models is more straightforward than for multivariate models. If you are new to time series prediction, you might want to look at my earlier articles. These explain how to develop and evaluate univariate time series models:

Univariate Prediction Models

In time series regression, the standard approach is to train a model using past values from the time series that need to be predicted. The assumption is that the value of a time series at time t is closely related to the previous time steps t-1, t-2, t-3, and so on. This approach is similar to chart analysis, which involves identifying patterns in a price chart that can indicate future movements. Both approaches rely on the ability to identify recurring patterns in the data and make accurate predictions based on them. The performance of the model or analysis depends on the ability to identify these patterns and draw the right conclusions from them.

Several techniques can be used to improve the performance of time series regression models, including feature engineering, hyperparameter optimization, and ensemble methods. In addition to these techniques, it is also important to carefully evaluate the performance of the model using appropriate metrics, such as mean squared error or mean absolute error, and to continuously monitor the model’s performance to ensure it remains accurate over time.

Univariate Time Series Prediction

Multivariate Prediction Models

Predicting the price of a financial asset is a challenging task due to the numerous variables that can influence it, including economic cycles, political events, unforeseen occurrences, psychological factors, market sentiment, and even the weather. These variables are often interdependent, which makes statistical modeling even more complex. While multivariate models can take into account several factors, they are still a simplification of reality and may not fully capture the complexity of the market. On the other hand, univariate models only consider a single dependent variable, ignoring the other dimensions.

Even with good features, predicting financial prices can be difficult because patterns and market rules may change frequently. As a result, models may make mistakes. However, as Georg Box famously said, “All models are wrong, but some are useful.” Despite their limitations, multivariate models can provide a more detailed representation of reality compared to univariate models, and can still be useful in forecasting financial prices.

Multivariate Time Series Prediction

Implementing a Multivariate Time Series Prediction Model in Python

Now that we have a solid understanding of multivariate time series forecasting, it’s time to put our knowledge into practice by building a model using Python and TensorFlow. Specifically, we will create a multivariate recurrent neural network (RNN) to predict the NASDAQ stock market index. RNNs are well-suited for time series forecasting because they can process sequential data, considering the dependencies between past and future events.

To build our RNN model, we will need to go through several essential steps.

Creating features and scaling: This involves loading and preparing the time series data for modeling, including selecting the time period and relevant features and scaling the data.
Splitting the data: We split the data into train and test sets.
Sliding window approach: The time series data is sliced into mini-batches using the sliding window approach.
Model design and training: The appropriate architecture for the RNN model is chosen, and an optimization algorithm is used to adjust the model’s weights and biases to minimize prediction error.
Model validation and predictions: We evaluate the model’s performance by comparing the predicted values to the actual values of the NASDAQ index. In addition, we will use the model to make predictions about future events.
Unscaling the predictions: The predictions are unscaled to bring them back to their original scale.

The code is available on the GitHub repository.

View on GitHub Relataly Github Repo

Six Essential Steps for Developing a Multivariate Time Series Model

Prerequisites

Before starting the coding part, make sure that you have set up your Python 3 environment and required packages. If you don’t have a Python environment, follow the steps in this tutorial to set up the Anaconda environment.

Also, make sure you install all required packages. In this tutorial, we will be working with the following standard packages:

In addition, we will be using Keras (2.0 or higher) with Tensorflow backend, the machine learning library sci-kit-learn, and the pandas-DataReader.

You can install packages using console commands:

pip install
conda install (if you are using the anaconda packet manager)

Step #1 Load the Time Series Data

Let’s start by loading price data on the NASDAQ composite index (symbol: ^IXIC) from yahoo.finance.com into our Python project. To download the data, we use Pandas DataReader – a popular Python library that provides functions to extract data from various sources on the web. Alternatively, you can also use the “yfinance” library.

We provide the technical symbol for the NASDAQ index, “^IXIC.” Alternatively, you could use other asset symbols, for example, BTC-USD, to get price quotes for Bitcoin. In addition, we limit the data in the API request to the timeframe between 2010-01-01 and the current date.

Running the code below will load the data into a new DataFrame object. Be aware that input data and predictions will vary depending on when you execute the code.

# Time Series Forecasting - Multivariate Time Series Models for Stock Market Prediction

import math # Mathematical functions 
import numpy as np # Fundamental package for scientific computing with Python
import pandas as pd # Additional functions for analysing and manipulating data
from datetime import date, timedelta, datetime # Date Functions
from pandas.plotting import register_matplotlib_converters # This function adds plotting functions for calender dates
import matplotlib.pyplot as plt # Important package for visualization - we use this to plot the market data
import matplotlib.dates as mdates # Formatting dates
import tensorflow as tf
from sklearn.metrics import mean_absolute_error, mean_squared_error # Packages for measuring model performance / errors
from tensorflow.keras import Sequential # Deep learning library, used for neural networks
from tensorflow.keras.layers import LSTM, Dense, Dropout # Deep learning classes for recurrent and regular densely-connected layers
from tensorflow.keras.callbacks import EarlyStopping # EarlyStopping during model training
from sklearn.preprocessing import RobustScaler, MinMaxScaler # This Scaler removes the median and scales the data according to the quantile range to normalize the price data 
import seaborn as sns # Visualization
sns.set_style('white', { 'axes.spines.right': False, 'axes.spines.top': False})

# check the tensorflow version and the number of available GPUs
print('Tensorflow Version: ' + tf.__version__)
physical_devices = tf.config.list_physical_devices('GPU')
print("Num GPUs:", len(physical_devices))

# Setting the timeframe for the data extraction
end_date =  date.today().strftime("%Y-%m-%d")
start_date = '2010-01-01'

# Getting NASDAQ quotes
stockname = 'NASDAQ'
symbol = '^IXIC'

# You can either use webreader or yfinance to load the data from yahoo finance
# import pandas_datareader as webreader
# df = webreader.DataReader(symbol, start=start_date, end=end_date, data_source="yahoo")

import yfinance as yf #Alternative package if webreader does not work: pip install yfinance
df = yf.download(symbol, start=start_date, end=end_date)

# Create a quick overview of the dataset
df.head()

Tensorflow Version: 2.5.0
Num GPUs: 1
[*********************100%***********************]  1 of 1 completed
			Open		High		Low			Close		Adj Close	Volume
Date						
2009-12-31	2292.919922	2293.590088	2269.110107	2269.149902	2269.149902	1237820000
2010-01-04	2294.409912	2311.149902	2294.409912	2308.419922	2308.419922	1931380000
2010-01-05	2307.270020	2313.729980	2295.620117	2308.709961	2308.709961	2367860000
2010-01-06	2307.709961	2314.070068	2295.679932	2301.090088	2301.090088	2253340000
2010-01-07	2298.090088	2301.300049	2285.219971	2300.050049	2300.050049	2270050000

The data looks as expected and has the following columns:

High – the daily high
Low – the daily low
Open – the opening price
Close – the closing price
Volume – the daily trading volume
Adj Close – the adjacent closing price

Step #2 Explore the Data

Let’s first familiarize ourselves with the data before processing them further. Line plots are an excellent choice to gain a quick overview of time series data. By running the code below, we loop over the columns to plot a line chart for each column of the dataframe.

# Plot line charts
df_plot = df.copy()

ncols = 2
nrows = int(round(df_plot.shape[1] / ncols, 0))

fig, ax = plt.subplots(nrows=nrows, ncols=ncols, sharex=True, figsize=(14, 7))
for i, ax in enumerate(fig.axes):
        sns.lineplot(data = df_plot.iloc[:, i], ax=ax)
        ax.tick_params(axis="x", rotation=30, labelsize=10, length=0)
        ax.xaxis.set_major_locator(mdates.AutoDateLocator())
fig.tight_layout()
plt.show()

The line plots look as expected. We continue with preprocessing and feature engineering.

Step #3 Feature Selection and Scaling

Before we can train the neural network, we need to transform the data into a processable shape. In this section, we perform the following tasks:

Selecting features
Scaling the data to a standard value range

3.1 Selecting Features

First, we will select the features upon which we want to train our neural network. The selection and engineering of relevant feature variables is a complex topic. We could also create additional features such as moving averages, but I want to keep things simple. Therefore, we select features that are already present in our data. To learn more about feature engineering for stock market prediction, check out the relataly feature engineering tutorial.

Feature Selection of Multivariate Time Series Models

Running the code below selects the features. We add a dummy column to our record called “Predictions,” which will help us later when we need to reverse the scaling of our data.

# Indexing Batches
train_df = df.sort_values(by=['Date']).copy()

# List of considered Features
FEATURES = ['High', 'Low', 'Open', 'Close', 'Volume'
            #, 'Month', 'Year', 'Adj Close'
           ]

print('FEATURE LIST')
print([f for f in FEATURES])

# Create the dataset with features and filter the data to the list of FEATURES
data = pd.DataFrame(train_df)
data_filtered = data[FEATURES]

# We add a prediction column and set dummy values to prepare the data for scaling
data_filtered_ext = data_filtered.copy()
data_filtered_ext['Prediction'] = data_filtered_ext['Close']

# Print the tail of the dataframe
data_filtered_ext.tail()

FEATURE LIST
['High', 'Low', 'Open', 'Close', 'Volume']
			High			Low				Open			Close			Volume		Prediction
Date						
2022-05-09	11990.610352	11574.940430	11923.030273	11623.250000	5911380000	11623.250000
2022-05-10	11944.940430	11566.280273	11900.339844	11737.669922	6199090000	11737.669922
2022-05-11	11844.509766	11339.179688	11645.570312	11364.240234	6120860000	11364.240234
2022-05-12	11547.330078	11108.759766	11199.250000	11370.959961	6647400000	11370.959961
2022-05-13	11856.709961	11510.259766	11555.969727	11805.000000	5868610000	11805.000000

3.2 Scaling the Multivariate Input Data

Another necessary step in data preparation for neural networks is scaling the input data. Scaling will increase training times and improve model accuracy. The scikit-learn package offers different scaling approaches. We use the MinMaxScaler to scale the input data to a range between 0 and 1.

A model that is trained on scaled data will also produce scaled predictions. Therefore, when we make predictions later with our model, we must not forget to scale the predictions back. The scaler_model will adapt to the shape of the data (6-dimensional). However, our predictions will be one-dimensional. Because the scaler has a fixed input shape, we cannot simply reuse it for unscaling our model predictions. To unscale the predictions later, we create an additional scaler that works on a single feature column (scaler_pred).

# Get the number of rows in the data
nrows = data_filtered.shape[0]

# Convert the data to numpy values
np_data_unscaled = np.array(data_filtered)
np_data = np.reshape(np_data_unscaled, (nrows, -1))
print(np_data.shape)

# Transform the data by scaling each feature to a range between 0 and 1
scaler = MinMaxScaler()
np_data_scaled = scaler.fit_transform(np_data_unscaled)

# Creating a separate scaler that works on a single column for scaling predictions
scaler_pred = MinMaxScaler()
df_Close = pd.DataFrame(data_filtered_ext['Close'])
np_Close_scaled = scaler_pred.fit_transform(df_Close)

Out: (2619, 6)

Step #4 Transforming the Multivariate Data

Next, we train our multivariate regression model based on a three-dimensional data structure. The first dimension is the sequences, the second dimension is the time steps (mini-batches), and the third dimension is the features. The illustration below shows the steps to bring the multivariate data into a shape our neural model can process during training. We must keep this form and perform the same steps when using the model to create a forecast.

An essential step in the preparation process is slicing the data into multiple input data sequences with associated target values. We write a simple Python script that uses a “sliding window.” This approach moves a window through the time series data, adding a sequence of multiple data points to the input data with each step. The target value (e.g., Closing Price) follows this sequence, and we store it in a separate target dataset. Then we push the window one step further and repeat these activities. This process results in a data set with many input sequences (mini-batches), each with a corresponding target value in the target record. This process applies both to the training and the test data.

Sliding Window

We will apply the sliding window approach to our data. The result is a training set (x_train) containing 2258 input sequences, each with 50 steps and six features. The related target dataset (y_train) has 2258 target values.

# Set the sequence length - this is the timeframe used to make a single prediction
sequence_length = 50

# Prediction Index
index_Close = data.columns.get_loc("Close")

# Split the training data into train and train data sets
# As a first step, we get the number of rows to train the model on 80% of the data 
train_data_len = math.ceil(np_data_scaled.shape[0] * 0.8)

# Create the training and test data
train_data = np_data_scaled[0:train_data_len, :]
test_data = np_data_scaled[train_data_len - sequence_length:, :]

# The RNN needs data with the format of [samples, time steps, features]
# Here, we create N samples, sequence_length time steps per sample, and 6 features
def partition_dataset(sequence_length, data):
    x, y = [], []
    data_len = data.shape[0]
    for i in range(sequence_length, data_len):
        x.append(data[i-sequence_length:i,:]) #contains sequence_length values 0-sequence_length * columsn
        y.append(data[i, index_Close]) #contains the prediction values for validation,  for single-step prediction
    
    # Convert the x and y to numpy arrays
    x = np.array(x)
    y = np.array(y)
    return x, y

# Generate training data and test data
x_train, y_train = partition_dataset(sequence_length, train_data)
x_test, y_test = partition_dataset(sequence_length, test_data)

# Print the shapes: the result is: (rows, training_sequence, features) (prediction value, )
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)

# Validate that the prediction value and the input match up
# The last close price of the second input sample should equal the first prediction value
print(x_train[1][sequence_length-1][index_Close])
print(y_train[0])

(2474, 50, 5) (2474,)
(630, 50, 5) (630,)
0.02049456793614579
0.02049456793614579

Step #5 Train the Multivariate Prediction Model

Once we have the data prepared and ready, we can train our model. The architecture of our neural network consists of the following four layers:

An LSTM layer, which takes our mini-batches as input and returns the whole sequence
Another LSTM layer that takes the sequence from the previous layer but only returns five values
Dense layer with five neurons
A final dense layer that outputs the predicted value

The number of neurons in the first layer must equal the size of a minibatch of the input data. Each minibatch in our dataset consists of a matrix with 50 steps and six features. Thus, the input layer of our recurrent neural network consists of 300 neurons. Keeping this architecture in mind is essential because, later, we need to bring the data into the same shape when we want to predict a new dataset. Running the code below creates the model architecture and compiles the model.

# Configure the neural network model
model = Sequential()

# Model with n_neurons = inputshape Timestamps, each with x_train.shape[2] variables
n_neurons = x_train.shape[1] * x_train.shape[2]
print(n_neurons, x_train.shape[1], x_train.shape[2])
model.add(LSTM(n_neurons, return_sequences=True, input_shape=(x_train.shape[1], x_train.shape[2]))) 
model.add(LSTM(n_neurons, return_sequences=False))
model.add(Dense(5))
model.add(Dense(1))

# Compile the model
model.compile(optimizer='adam', loss='mse')

Running the code below starts the training process.

# Training the model
epochs = 50
batch_size = 16
early_stop = EarlyStopping(monitor='loss', patience=5, verbose=1)
history = model.fit(x_train, y_train, 
                    batch_size=batch_size, 
                    epochs=epochs,
                    validation_data=(x_test, y_test)
                   )
                    
                    #callbacks=[early_stop])

Output exceeds the size limit. Open the full output data in a text editor
Epoch 1/50
155/155 [==============================] - 6s 19ms/step - loss: 6.7374e-04 - val_loss: 0.0011
Epoch 2/50
155/155 [==============================] - 2s 13ms/step - loss: 6.6207e-05 - val_loss: 8.4700e-04
Epoch 3/50
155/155 [==============================] - 2s 14ms/step - loss: 5.0667e-05 - val_loss: 6.6467e-04
Epoch 4/50
155/155 [==============================] - 2s 14ms/step - loss: 5.8446e-05 - val_loss: 6.8575e-04
Epoch 5/50
155/155 [==============================] - 2s 14ms/step - loss: 4.8430e-05 - val_loss: 8.4892e-04
Epoch 6/50
155/155 [==============================] - 2s 14ms/step - loss: 7.1283e-05 - val_loss: 8.2255e-04
Epoch 7/50
155/155 [==============================] - 2s 15ms/step - loss: 6.0554e-05 - val_loss: 8.0583e-04
Epoch 8/50
155/155 [==============================] - 2s 15ms/step - loss: 5.5977e-05 - val_loss: 4.5830e-04
Epoch 9/50
155/155 [==============================] - 2s 15ms/step - loss: 4.2453e-05 - val_loss: 6.1866e-04
Epoch 10/50
155/155 [==============================] - 2s 14ms/step - loss: 3.5722e-05 - val_loss: 4.5288e-04
Epoch 11/50
155/155 [==============================] - 2s 14ms/step - loss: 4.1409e-05 - val_loss: 8.5975e-04
Epoch 12/50
155/155 [==============================] - 3s 18ms/step - loss: 7.0007e-05 - val_loss: 5.0300e-04
Epoch 13/50
...
Epoch 49/50
155/155 [==============================] - 2s 13ms/step - loss: 2.7064e-05 - val_loss: 2.8202e-04
Epoch 50/50
155/155 [==============================] - 2s 13ms/step - loss: 2.9009e-05 - val_loss: 2.5486e-04

Let’s take a quick look at the loss curve.

# Plot training & validation loss values
fig, ax = plt.subplots(figsize=(16, 5), sharex=True)
sns.lineplot(data=history.history["loss"])
plt.title("Model loss")
plt.ylabel("Loss")
plt.xlabel("Epoch")
ax.xaxis.set_major_locator(plt.MaxNLocator(epochs))
plt.legend(["Train", "Test"], loc="upper left")
plt.grid()
plt.show()

The loss drops quickly to a lower plateau, which signals that the model has improved throughout the training process.

Step #6 Evaluate Model Performance

Once we have trained the neural network regression model, we want to measure its performance. As mentioned in section 3, we first have to reverse the scaling of the predictions. Afterward, we calculate different error metrics, MAE, MAPE, and MDAPE. Then we will compare the predictions in a line plot with the actual values. For more information on measuring the performance of regression models, see this relataly article.

# Get the predicted values
y_pred_scaled = model.predict(x_test)

# Unscale the predicted values
y_pred = scaler_pred.inverse_transform(y_pred_scaled)
y_test_unscaled = scaler_pred.inverse_transform(y_test.reshape(-1, 1))

# Mean Absolute Error (MAE)
MAE = mean_absolute_error(y_test_unscaled, y_pred)
print(f'Median Absolute Error (MAE): {np.round(MAE, 2)}')

# Mean Absolute Percentage Error (MAPE)
MAPE = np.mean((np.abs(np.subtract(y_test_unscaled, y_pred)/ y_test_unscaled))) * 100
print(f'Mean Absolute Percentage Error (MAPE): {np.round(MAPE, 2)} %')

# Median Absolute Percentage Error (MDAPE)
MDAPE = np.median((np.abs(np.subtract(y_test_unscaled, y_pred)/ y_test_unscaled)) ) * 100
print(f'Median Absolute Percentage Error (MDAPE): {np.round(MDAPE, 2)} %')

Median Absolute Error (MAE): 175.28
Mean Absolute Percentage Error (MAPE): 1.48 %
Median Absolute Percentage Error (MDAPE): 1.16 %

The MAPE is 22.15, which means that the mean of our predictions deviates from the actual values by 3.12%. The MDAPE is 2.88 % and a bit lower than the mean, thus indicating there are some outliers among the prediction errors. 50% of the predictions deviate by more than 2.88%, and 50% differ by less than 2.88% from the actual values.

Next, we create a line plot showing the forecast and compare it to the actual values. Adding a bar plot to the chart helps highlight the deviations of the predictions from the actual values. Running the code below creates the line plot.

# The date from which on the date is displayed
display_start_date = "2019-01-01" 

# Add the difference between the valid and predicted prices
train = pd.DataFrame(data_filtered_ext['Close'][:train_data_len + 1]).rename(columns={'Close': 'y_train'})
valid = pd.DataFrame(data_filtered_ext['Close'][train_data_len:]).rename(columns={'Close': 'y_test'})
valid.insert(1, "y_pred", y_pred, True)
valid.insert(1, "residuals", valid["y_pred"] - valid["y_test"], True)
df_union = pd.concat([train, valid])

# Zoom in to a closer timeframe
df_union_zoom = df_union[df_union.index > display_start_date]

# Create the lineplot
fig, ax1 = plt.subplots(figsize=(16, 8))
plt.title("y_pred vs y_test")
plt.ylabel(stockname, fontsize=18)
sns.set_palette(["#090364", "#1960EF", "#EF5919"])
sns.lineplot(data=df_union_zoom[['y_pred', 'y_train', 'y_test']], linewidth=1.0, dashes=False, ax=ax1)

# Create the bar plot with the differences
df_sub = ["#2BC97A" if x > 0 else "#C92B2B" for x in df_union_zoom["residuals"].dropna()]
ax1.bar(height=df_union_zoom['residuals'].dropna(), x=df_union_zoom['residuals'].dropna().index, width=3, label='residuals', color=df_sub)
plt.legend()
plt.show()

The line plot shows that the forecast is close to the actual values but partially deviates from it. The deviations between actual values and predictions are called residuals. For our mode, they seem to be most significant during periods of increased market volatility and least during periods of steady market movement, which makes sense because sudden movements are generally more difficult to predict.

Step #7 Predict the Next Day’s Price

After training the neural network, we want to forecast the stock market for the next day. For this purpose, we extract a new dataset from the Yahoo-Finance API and preprocess it as we did for model training.

We trained our model with mini-batches of 50-steps and six features. Thus, we must also provide the model with 50-steps when making the forecast. As before, we transform the data into the shape of 1 x 50 x 6, whereby the last figure is the number of feature columns. After generating the forecast, we unscale the stock market predictions back to the original range of values.

df_temp = df[-sequence_length:]
new_df = df_temp.filter(FEATURES)

N = sequence_length

# Get the last N day closing price values and scale the data to be values between 0 and 1
last_N_days = new_df[-sequence_length:].values
last_N_days_scaled = scaler.transform(last_N_days)

# Create an empty list and Append past N days
X_test_new = []
X_test_new.append(last_N_days_scaled)

# Convert the X_test data set to a numpy array and reshape the data
pred_price_scaled = model.predict(np.array(X_test_new))
pred_price_unscaled = scaler_pred.inverse_transform(pred_price_scaled.reshape(-1, 1))

# Print last price and predicted price for the next day
price_today = np.round(new_df['Close'][-1], 2)
predicted_price = np.round(pred_price_unscaled.ravel()[0], 2)
change_percent = np.round(100 - (price_today * 100)/predicted_price, 2)

plus = '+'; minus = ''
print(f'The close price for {stockname} at {end_date} was {price_today}')
print(f'The predicted close price is {predicted_price} ({plus if change_percent > 0 else minus}{change_percent}%)')

The close price for NASDAQ on 2021-06-27 was 14360.39. The predicted closing price is 14232.8095703125 (-0.9%)

Summary

This tutorial has shown multivariate time series modeling for stock market prediction in Python. We trained a neural network regression model for predicting the NASDAQ index. Before training our model, we performed several steps to prepare the data. The steps included splitting the data and scaling them. In addition, we created and tested various new features from the original time series data to account for the multivariate modeling approach. You now have the knowledge and code to conduct further experiments with the features of your choice.

Multivariate time series forecasting is a complex topic. You might want to take the time to retrace the different steps. Especially the transformation of the data can be challenging. The best way to learn is to practice. Therefore I encourage you to develop more time series models and experiment with other data sources.

Another interesting approach to stock market prediction uses candlestick images and convolutional neural networks. If this topic interests you, check out the following article: Deep reinforcement learning stock market trading, utilizing a CNN with candlestick images

I am always trying to learn and improve. If you want to give feedback or have remarks, feel free to share them in the comments.

Stockmarket forecasting with a neural network is about identifying meaningful patterns. Be aware that there is no guarantee that these patterns are present in the data. Image created with Midjourney.

Sources and Further Reading

The links above to Amazon are affiliate links. By buying through these links, you support the Relataly.com blog and help to cover the hosting costs. Using the links does not affect the price.

The post Stock Market Prediction using Multivariate Time Series and Recurrent Neural Networks in Python appeared first on relataly.com.

Measuring Regression Errors with Python

Florian Follonier — Mon, 04 May 2020 09:34:09 +0000

Evaluating performance is a crucial step in developing regression models. Because regression models return continuous outputs, such models allow for different gradations of right or wrong. Therefore, we measure the deviation between predictions and actual values in numerical terms. However, a universal metric to measure the performance of regression models does not exist. Instead, there are several metrics, each with its advantages and disadvantages. None of these metrics is sufficient alone, and it is often necessary to use them in combination. This article presents six regression error metrics and explains how to implement them in Python with Scikit-learn.

The rest of this article proceeds in two parts. The first part is conceptual and introduces six error metrics for measuring regression performance. We look at formulas and discuss their pros and cons. The discussion is summarized in a cheat sheet. The second part is a hands-on Python tutorial in which we generate synthetic time series data and use them for training a prediction model. Then we implement the six regression error metrics and evaluate the performance of our model.

Note that this article deals with regression errors. If you are looking for an overview of classification error metrics, check out this tutorial on classification error metrics.

goal arrow shot archery machine learning error metrics

" data-image-caption="

goal arrow shot archery machine learning error metrics

" data-large-file="https://www.relataly.com/wp-content/uploads/2023/02/goal-arrow-shot-archery-machine-learning-error-metrics.png" src="https://www.relataly.com/wp-content/uploads/2023/02/goal-arrow-shot-archery-machine-learning-error-metrics.png" alt="goal arrow shot archery machine learning error metrics" class="wp-image-12494" srcset="https://www.relataly.com/wp-content/uploads/2023/02/goal-arrow-shot-archery-machine-learning-error-metrics.png 504w, https://www.relataly.com/wp-content/uploads/2023/02/goal-arrow-shot-archery-machine-learning-error-metrics.png 230w" sizes="(max-width: 504px) 100vw, 504px" />

goal arrow shot archery machine learning error metrics

Measuring Regression Errors

In general, we measure the performance of regression models by calculating the deviations between the predictions (y_pred) and the actual values (y_test). If the prediction value is below the actual value, the prediction error is positive. If the prediction lies above the real value, the prediction error is negative. However, in a sample of prediction values, the errors can vary greatly depending on the data point. Therefore, it is not enough to look at individual error values. Error metrics can inform us about the statistical distribution of errors in a prediction sample and, in this way, help us to measure the performance of regression models objectively.

Various metrics exist to measure regression errors. Each error metric can only cover a part of the overall picture. For instance, imagine you have developed a model to predict the consumption of a power plant. The model predictions are generally accurate, but the projections are wrong in a few cases. In other words, outliers among the prediction errors make it difficult to conclude the model performance. It is insufficient to calculate the average prediction error to understand this situation. Instead, a more robust measuring approach would combine different error metrics to conclude the probability that prediction errors lie within a specific range.

Predictions vs. Actual Values in Time Series Forecasting

Six Error Metrics for Measuring Regression Errors

The following six metrics help measure prediction errors. We can apply them to various regression problems, including time series forecasting.

Mean Absolute Error (MAE)
Mean Absolute Percentage Error (MAPE)
Median Absolute Error (MedAE)
Mean Squared Error (MSE)
Root Mean Squared Error (RMSE)
Median Absolute Percent Error (MdAPE)

Mean Absolute Error (MAE)

Mean Absolute Error (MAE) is a metric commonly used to measure the arithmetic average of deviations between predictions and actual values.

An MAE of “5” tells us that, on average, our predictions deviate from the actual values by 5. Whether this error is considered small or large will depend on the application case and the scale of the predictions. For instance, 5 nanometers in the case of a building might be small, but if it’s five nanometers in the case of a biological membrane, it might be significant. So when working with the MAE, mind the scale.

It is scale-dependent
The MAE considers the absolute values to take both positive and negative deviations from the actual.
The MAE is sensitive to outliers, as large values can substantially impact. For this reason, we should use the MAE in combination with additional metrics.
The MAE shares the same unit with the predictions.

\[MAE=\ \frac{\sum_{i=1}^{n}{|y_i-x_i|}}{n} \]

\[x_i\ = actual value \\ y_i = predictions \\ n = sample size \]

Mean Absolute Percentage Error (MAPE)

The mean absolute percentage error calculates the mean percentage deviation between predictions and actual values.

The mean absolute percentage error is scale-independent, making it easier to interpret.
We must not use the MAPE whenever a single value is zero
The MAPE puts a heavier penalty on negative errors

\[\mathrm{MAPE=\ }\frac{\mathrm{1}}{\mathrm{n}}\sum_{\mathrm{t=1}}^{\mathrm{n}}\left|\frac{\mathrm{A}_\mathrm{t}\mathrm{\ -\ } \mathrm{F}_\mathrm{t}}{\mathrm{A}_\mathrm{t}}\right|\mathrm{\ \ast100} \]

Mean Squared Error (MSE)

We can calculate the MSE by measuring the average squares of the differences between the estimated and actual values.

Since all values are squared, the MSE is very sensitive to outliers.
An MSE much larger than the MAE indicates strong outliers among the prediction errors. The formula of the MSE is:

\[MSE=\frac{\sum_{i=1}^{n}{|y_i-x_i|}^2}{n} \]

Median Absolute Error (MedAE)

The Median Absolute Error (MedAE) calculates the median deviation between predictions and actual values.

The MedAE has the same unit as the predictions.
A MedAE of value 10 means that 50% of the errors are greater than 10 and 50% are below this value.
The MedAE is resistant to outliers. Therefore, we often use it in combination with the MAE. A substantial deviation between MAE and MedAE is an indication that there are outliers among the errors. In other words, the prediction model deviates more from the actual value than on average.

\[MedAE=\frac{\sum_{i=1}^{n}{|y_i-{\hat{x}}_i|}}{n}\]

Root Mean Squared Error (RMSE)

The root-mean-squared error is another standard way to measure the performance of a forecasting model.

Has the same unit as the predictions
A good measure of how accurately the model predicts the response
Robust to outliers

\[RMSE=\frac{1}{n}\sqrt{\sum_{i=1}^{n}{|y_i-x_i|}^2}\]

Median Absolute Percentage Error (MdAPE)

The median absolute percentage error (MdAPE) measures the accuracy of a prediction model. It is similar to the median absolute percentage error but, as the name implies, calculates the median error for a set of forecasts. As a result, the MdAPE is more resilient to outliers than the MAPE. However, it is also less intuitive. A MdAPE of 5% means that half of the absolute percentage errors are less than 5%, and half are over 5%.

Scale-dependent
Not to be used whenever a single value is zero.
More robust to distortion from outliers than the MAPE

\[MdAPE= \ median(\left|\frac{A_t\ -\ F_t}{A_t}\right|)\ast100\]

Regression Error Cheat Sheet

The cheat sheet provides an overview of the six regression error metrics. It contains the mathematical formula for each regression metric, a short code sequence to implement in Python, and some hints for their interpretation.

Cheat-Sheet.pdf Download

Implementing Regression Error Metrics in Python: Time Series Prediction Example

Now that we have familiarized ourselves with standard regression error metrics, it’s time to see them in action. In the following, we will develop and test a regression model in Python. We begin by generating some synthetic time series data. Subsequently, we use the data to train a simple regression model based on a Keras neural network. The model will try to continue the time series and predicts a continious value for the next time step. We will use this model to predict a test dataset and measure the prediction performance using the error metrics.

The code of this Python example is available on the GitHub repository.

View on GitHub Relataly Github Repo

Prerequisites

Before starting the coding part, make sure that you have set up your Python 3 environment and required packages. If you don’t have an environment yet, you can follow the steps in this tutorial to set up the Anaconda environment.

Also, make sure you install all required packages. In this tutorial, we will be working with the following standard packages:

In addition, we will be using Keras (2.0 or higher) with Tensorflow backend and the machine learning library scikit-learn.

You can install packages using console commands:

pip install
conda install (if you are using the anaconda packet manager)

Step #1 Generate Synthetic Time Series Data

We begin by generating synthetic time series data. The script below creates the time series by multiplying different sine curves.

# A tutorial for this file is available at www.relataly.com

import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl 
from tensorflow.keras.models import Sequential
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.layers import LSTM, Dense
import seaborn as sns
sns.set_style('white', { 'axes.spines.right': False, 'axes.spines.top': False})

# Creating the sample sinus curve dataset
steps = 1000; gradient = 0.002
list_a = []
for i in range(0, steps, 1):
    y = 100 * round(math.sin(math.pi * i * 0.02 + 0.01), 4) * round(math.sin(math.pi * i * 0.005 + 0.01), 4) * round(math.sin(math.pi * i * 0.005 + 0.01), 4)
    list_a.append(y)
df = pd.DataFrame({"valid": list_a}, columns=["valid"])

# Visualizing the data
fig, ax1 = plt.subplots(figsize=(16, 4))
sns.lineplot(data=df)
ax1.xaxis.set_major_locator(plt.MaxNLocator(30))
plt.title("sine curve data")

Step #2 Data Preparation

Now that we have the synthetic data available, we can prepare it as inputs for training our regression model. Running the following code will scale and split the data and bring it into a shape that we can use as input batches to a neural network.

# Feature Selection - Only Close Data
train_df = df.copy()
data_unscaled = df.values

# Transform features by scaling each feature to a range between 0 and 1
mmscaler = MinMaxScaler(feature_range=(0, 1))
np_data = mmscaler.fit_transform(data_unscaled)

# Set the sequence length - this is the timeframe used to make a single prediction
sequence_length = 15

# Prediction Index
index_Close = 0

# Split the training data into train and train data sets
# As a first step, we get the number of rows to train the model on 80% of the data 
train_data_length = math.ceil(np_data.shape[0] * 0.8)

# Create the training and test data
train_data = np_data[0:train_data_length, :]
test_data = np_data[train_data_length - sequence_length:, :]

# The RNN needs data with the format of [samples, time steps, features]
# Here, we create N samples, sequence_length time steps per sample, and 6 features
def partition_dataset(sequence_length, train_df):
    x, y = [], []
    data_len = train_df.shape[0]
    for i in range(sequence_length, data_len):
        x.append(train_df[i-sequence_length:i,:]) #contains sequence_length values 0-sequence_length * columsn
        y.append(train_df[i, index_Close]) #contains the prediction values for validation (3rd column = Close),  for single-step prediction
    
    # Convert the x and y to numpy arrays
    x = np.array(x)
    y = np.array(y)
    return x, y

# Generate training data and test data
x_train, y_train = partition_dataset(sequence_length, train_data)
x_test, y_test = partition_dataset(sequence_length, test_data)

# Print the shapes: the result is: (rows, training_sequence, features) (prediction value, )
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)

# Validate that the prediction value and the input match up
# The last close price of the second input sample should equal the first prediction value
print(x_test[1][sequence_length-1][index_Close])
print(y_test[0])

Out: x_tain.shape: (584, 15, 1) -- y_tain.shape: (584,)

Step #3 Training a Time Series Regression Model

Once we have prepared the data, we can train the regression model. Our model uses a simple neural network architecture. Running the code below defines the model architecture and compiles the model.

# Settings
batch_size = 5
epochs = 4
n_features = 1

# Configure and compile the neural network model
# The number of input neurons is defined by the sequence length multiplied by the number of features
lstm_neuron_number = sequence_length * n_features

# Create the model
model = Sequential()
model.add(LSTM(lstm_neuron_number, return_sequences=False, input_shape=(x_train.shape[1], 1))
)
model.add(Dense(1))
model.compile(optimizer="adam", loss="mean_squared_error")

# Train the model
history = model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs)

Epoch 1/4 
584/584 [==============================] - 1s 2ms/step - loss: 0.1047 Epoch 2/4 
584/584 [==============================] - 1s 1ms/step - loss: 0.0153 Epoch 3/4 
584/584 [==============================] - 1s 1ms/step - loss: 0.0102 Epoch 4/4 
584/584 [==============================] - 1s 1ms/step - loss: 0.0064

Now that the model architecture is defined, you can run the code below to initiate the training process.

# Settings
batch_size = 5

# Train the model
history = model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs)

Step #4 Making Test Predictions

Let’s see how good or bad our model performs. We will make predictions on our test dataset by running the code below. We store the results in a new DataFrame called “predictions.”

# Get the predicted values
y_pred_scaled = model.predict(x_test)
y_pred = mmscaler.inverse_transform(y_pred_scaled)
y_test_unscaled = mmscaler.inverse_transform(y_test.reshape(-1, 1))

Next, we want to get an idea of how our model performs. We, therefore, create a line plot that shows the predictions and the actual values of the time series. We colorize the differences between predictions and actual values to highlight the prediction errors.

# Create the line plot
test_df = pd.DataFrame({'y_test': y_test_unscaled.flatten(), 'y_pred': y_pred.flatten()})
fig, ax1 = plt.subplots(figsize=(16, 8), sharex=True)
sns.lineplot(data=test_df)
ax1.tick_params(axis="x", rotation=0, labelsize=10, length=0)
plt.title("y_pred vs y_test Truth")
plt.legend(["y_pred", "y_test"], loc="upper left")

# Fill between plotlines
mpl.rc('hatch', color='k', linewidth=2)
ax1.fill_between(test_df.index, test_df["y_test"], test_df["y_pred"],  facecolor = 'white', alpha=.9) 
plt.show()

The plot shows that the prediction errors vary and are sometimes positive and sometimes negative.

Step #5 Calculating the Regression Error Metrics: Implementation and Evaluation

Now that we have predicted the test set, we calculate the six regression error metrics. In most cases, you won’t have to use all six regression error metrics to understand how well a model performs. In most cases, it is sufficient to use a combination of two or three of them.

# Mean Absolute Error (MAE)
MAE = np.mean(abs(y_pred - y_test_unscaled))
print('Mean Absolute Error (MAE): ' + str(np.round(MAE, 2)))

# Median Absolute Error (MedAE)
MEDAE = np.median(abs(y_pred - y_test_unscaled))
print('Median Absolute Error (MedAE): ' + str(np.round(MEDAE, 2)))

# Mean Squared Error (MSE)
MSE = np.square(np.subtract(y_pred, y_test_unscaled)).mean()
print('Mean Squared Error (MSE): ' + str(np.round(MSE, 2)))

# Root Mean Squarred Error (RMSE) 
RMSE = np.sqrt(np.mean(np.square(y_pred - y_test_unscaled)))
print('Root Mean Squared Error (RMSE): ' + str(np.round(RMSE, 2)))

# Mean Absolute Percentage Error (MAPE)
MAPE = np.mean((np.abs(np.subtract(y_test_unscaled, y_pred)/ y_test_unscaled))) * 100
print('Mean Absolute Percentage Error (MAPE): ' + str(np.round(MAPE, 2)) + ' %')

# Median Absolute Percentage Error (MDAPE)
MDAPE = np.median((np.abs(np.subtract(y_test_unscaled, y_pred)/ y_test_unscaled))) * 100
print('Median Absolute Percentage Error (MDAPE): ' + str(np.round(MDAPE, 2)) + ' %')

Mean Absolute Error (MAE): 6.95 
Median Absolute Error (MedAE): 5.05  
Mean Squared Error (MSE): 78.7 
Root Mean Squared Error (RMSE): 8.87  
Mean Absolute Percentage Error (MAPE): 10339.13 % 
Median Absolute Percentage Error (MDAPE): 26.8 %

Step #6 Interpreting the Regression Error Metrics

Let’s look at the regression error metrics, starting with the MAE and the MedAE. The MAE is 6.95, and the MedAE is 5.05. These values are close, indicating that our prediction errors are equally distributed but might include some outliers.

To better understand possible outliers, we look at the MSE. With a value of 78.7, the MAE is a little bit higher than the square of the MAE. The RMSE is slightly higher than the MAE, which is another indication that the prediction errors lie in a narrow range.

How much deviate the predictions of our model from the actual values in percentage terms? The MAPE is typically used as a starting point to answer this question. With 10339.13 percent, it is exceptionally high. So is our model very much mistaken? The answer is no – the MAPE is misleading. The problem is that several actual values are close to zero, e.g., 0.00001. While the predictions of our model are close to the actual values in absolute numbers, the MAPE divides the residual values by the actual values, e.g., 0.000001, and sums them up. Thus the MAPE becomes very large.

The Median of the MDAPE is 26.8%. So, 50% of our forecasting errors are higher than 26.8%, and 50% are lower. Consequently, we can assume that when our model makes a prediction, the probability that the deviation is 26.8% from the actual value is 50% – that is not as terrible as the MAPE would indicate. The plotlines of the predictions and actual values support these findings.

Summary

This article has presented six error metrics for evaluating regression errors. Remember that none of these metrics alone is sufficient to evaluate a model’s performance. Instead, we should use a combination of multiple metrics. We have discussed the advantages and disadvantages of the metrics. In the second part of this tutorial, we implemented a time series regression example. After training an exemplary regression model, we used the six regression metrics to evaluate the model performance.

It’s important to remember that different error metrics are suitable for different types of regression problems. For example, mean absolute error (MAE) is a good choice when you want to know how close the predictions are to the true values, but it is not very sensitive to large errors. Root mean squared error (RMSE) is a more sensitive metric that punishes large errors more heavily, but it can be difficult to interpret because it is in the same units as the original data.

I hope this article was helpful. If you have any remarks or questions remaining, write them in the comments.

Sources and Further Reading

The links above to Amazon are affiliate links. By buying through these links, you support the Relataly.com blog and help to cover the hosting costs. Using the links does not affect the price.

The post Measuring Regression Errors with Python appeared first on relataly.com.

Rolling Time Series Forecasting: Creating a Multi-Step Prediction for a Rising Sine Curve using Neural Networks in Python

Florian Follonier — Sun, 19 Apr 2020 09:59:49 +0000

Many time forecasting problems can be solved by predicting just one step into the future. However, some problems require a forecast for an extended period of time, which calls for a multi-step time series forecasting approach. This approach involves modeling the distribution of future values of a signal over a prediction horizon. In this article, we use the rising sine curve as an example to demonstrate how to apply a multi-step prediction approach using Keras neural networks with LSTM layers in Python. We create a rolling forecast for the sine curve by generating several single-step predictions and iteratively using them as input to predict further steps in the future.

The remainder of this article proceeds as follows: We begin by looking at the sine curve problem and provide a quick intro to recurrent neural networks. After this conceptual introduction, we turn to the hands-on part in Python. We generate synthetic sine curve data and preprocess the data to train univariate models using a Keras neural network. We thereby experiment with different architectures and hyperparameters. Then, we use these model variants to create rolling multi-step forecasts. Finally, we evaluate the model performance.

A multi-step time series forecast for a rising sine curve, as we will create in this article.

If you are just getting started with time-series forecasting, we have covered the single-step forecasting approach in two previous articles:

Predicting Stock Markets with Neural Networks – A Step-by-Step Guide
Stock Market Prediction – Adjusting Time Series Prediction Intervals in Python

The Problem of a Rising Sine Curve

The line plot below illustrates the sample of a rising sine curve. The goal is to use the ground truth values (blue) to forecast several points in this curve (purple).

Traditional mathematical methods can resolve this function by decomposing it into constituent parts. Thus, we could easily foresee the further course of the curve. But in practice, the function might not be precisely periodic and change over a more extended period. Therefore, recurrent neural networks can achieve better results than traditional mathematical approaches, especially when they train on extensive data.

Also: Stock Market Prediction using Univariate Recurrent Neural Networks (RNN) with Python

Application Domains with Similar Time Series Forecasting Problems

A rising sine wave may sound like an abstract problem at first. However, similar issues are widespread. Imagine you are the owner of an online shop. The users who visit your shop and buy something fluctuate depending on the time and the weekday. For instance, there are fewer visitors at night, and on weekends, the number of visitors rises sharply. At the same time, the overall number of users increases over a more extended period as the shop becomes known to a broader audience. To plan the number of goods to be held in stock, you need to see the number of orders at any point in time over several weeks. It is a typical multi-step time series problem, and similar problems exist in various domains:

Healthcare: e.g., forecasting of health signals such as heart, blood, or breathing signals
Network Security: e.g., analysis of network traffic in intrusion detection systems
Sales and Marketing: e.g., forecasting of market demand
Production demand forecasting: e.g., for power consumption and capacity planning
Prediction and filtering of sensor signals, e.g., of audio signals

A time series forecasting problem: predicting sine curve data

Recurrent Neural Networks for Time Series Forecasting

The model used in this article is a recurrent neural network with Long short-term memory (LSTM) layers. Unlike feedforward neural networks, recurrent networks with LSTM layers have loops to pass output values from one training instance to the next. LSTM layers are a powerful and widely-used tool for deep learning, and they work particularly well for time series data. By using LSTM layers, it is possible to train machine learning models that can make accurate predictions based on time series data, which can be useful for a wide range of applications, including finance, weather forecasting, and many others.

Also: Mastering Multivariate Stock Market Prediction with Python: A Guide to Effective Feature Engineering Techniques

The Training Process of a Recurrent Neural Network

When training neural networks, the process typically involves multiple epochs, where an epoch refers to a single pass through the entire training dataset. During each epoch, the neural network receives the entire training data and adjusts its weights accordingly through forward and backward propagation. The batch size, which determines the number of examples passed through the network at once, also affects how the weights are updated between neurons.

It’s important to note that one epoch is usually not enough for the model to learn the underlying patterns and relationships in the data, leading to underfitting and poor performance in prediction tasks. Hence, multiple epochs are often needed to fine-tune the network and improve its predictive capabilities. However, care must be taken not to overfit the model by choosing too many epochs. Overfitting occurs when the model becomes too complex and performs well on the training data but poorly on any other data, resulting in poor generalization.

To avoid overfitting, we can employ various techniques such as early stopping, where the training process is stopped when the validation loss begins to increase, indicating that the model is overfitting. Another method is to use regularization techniques such as L1 or L2 regularization, dropout, or data augmentation to prevent overfitting and improve the model’s generalization capabilities.

Functioning of an LSTM layer

An LSTM (Long Short-Term Memory) layer is a specialized type of recurrent neural network (RNN) layer that is specifically designed for processing and making predictions based on time series data. Unlike traditional RNNs, which suffer from the vanishing gradient problem, LSTM layers can maintain a memory of past events and selectively use this memory to make predictions about future events.

LSTM layers accomplish this by incorporating several unique mechanisms, such as gates and cell states. The gates control the flow of information into and out of the cell, while the cell state represents the memory of the network. These mechanisms work together to enable LSTM layers to learn and make predictions based on long-term dependencies in the data, which is a characteristic of many time series datasets.

One of the key advantages of using LSTM layers for time series forecasting is their ability to generate predictions for multiple timesteps. This is achieved by iteratively reusing the model outputs from the previous training run as input for the next prediction step. This approach, known as a rolling forecast, allows the model to make accurate predictions for multiple time steps into the future.

Functioning of an LSTM layer

LSTM Components

To understand how an LSTM layer works, it is helpful to think about the layer as consisting of several different components, each of which has a specific role in the overall operation of the layer. These components include the following:

Input gate: The input gate controls which information from the input data will be passed on to the cell state. The input gate uses a sigmoid activation function to determine which information should be retained and which should be discarded.
Forget gate: The forget gate controls which information from the cell state will be discarded. The forget gate uses a sigmoid activation function to determine which information should be forgotten and which should be retained.
Cell state: The cell state is a vector that contains the information that is retained by the LSTM layer. This information is updated at each time step based on the input from the input gate and the forget gate.
Output gate: The output gate controls which information from the cell state will be passed on to the output of the LSTM layer. The output gate uses a sigmoid activation function to determine which information should be retained and which should be discarded.

This article uses LSTM layers combined with a rolling forecast approach to predict the course of a sinus curve with a linear slope. The result is a multi-step time series forecast.

Creating a Rolling Multi-Step Time Series Forecast in Python

In this tutorial, we will explore the process of creating a rolling multi-step forecast using Python. Our dataset for this exercise will be a synthetically generated rising sine curve, which we will use to demonstrate the principles of multi-step time series forecasting.

By following the steps outlined in this tutorial, you will gain a comprehensive understanding of the process involved in multi-step time series forecasting. This will include techniques for data preprocessing, feature engineering, and model selection.

To get started, we have made the code available on our GitHub repository. You can easily access and follow along with the tutorial by downloading the code and running it on your local machine. This will enable you to experiment with different parameters and modify the code to suit your specific requirements.

View on GitHub Relataly GitHub Repo

Prerequisites

Before starting the coding part, make sure that you have set up your Python 3 environment and required packages. If you don’t have an environment yet, you can follow this tutorial to set up the Anaconda environment.

Also, make sure you install all required packages. In this tutorial, we will be working with the following standard packages:

In addition, we will be using Keras (2.0 or higher) with Tensorflow backend and the machine learning library Scikit-learn.

You can install packages using console commands:

pip install
conda install (if you are using the anaconda packet manager)

Step #1 Generating Synthetic Data

We begin by loading the required packages and creating a synthetic dataset. The synthetic data contains 300 values of the sinus function combined with a slight linear upward slope of 0.02. The code below creates the data and visualizes it in a line plot.

# A tutorial for this file is available at www.relataly.com

import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, TimeDistributed, Dropout, Activation
from sklearn.preprocessing import RobustScaler
from sklearn.metrics import mean_absolute_error
import tensorflow as tf
import seaborn as sns
sns.set_style('white', { 'axes.spines.right': False, 'axes.spines.top': False})

# check the tensorflow version and the number of available GPUs
print('Tensorflow Version: ' + tf.__version__)
physical_devices = tf.config.list_physical_devices('GPU')
print("Num GPUs:", len(physical_devices))

# Creating the synthetic sinus curve dataset
steps = 300
gradient = 0.02
list_a = []
for i in range(0, steps, 1):
    y = round(gradient * i + math.sin(math.pi * 0.125 * i), 5)
    list_a.append(y)
df = pd.DataFrame({"sine_curve": list_a}, columns=["sine_curve"])

# Visualizing the data
fig, ax1 = plt.subplots(figsize=(16, 4))
ax1.xaxis.set_major_locator(plt.MaxNLocator(30))
plt.title("Sinus Data")
sns.lineplot(data=df[["sine_curve"]], color="#039dfc")
plt.legend()
plt.show()

The signal curve oscillates and is steadily moving upward.

Step #2 Preprocessing

Next, we preprocess the data to bring it into the shape required by our neural network model. Running the code below will perform the following tasks:

Scale the data to a standard range.
Partition the data into multiple periods with a predefined sequence length. Each partition series contains 110 data points. The number of steps needs to match the number of neurons in the first layer of the neural network architecture.
Split the data into two separate datasets for training and testing.

Also: Mastering Multivariate Stock Market Prediction with Python: A Guide to Effective Feature Engineering Techniques

data = df.copy()

# Get the number of rows in the data
nrows = df.shape[0]

# Convert the data to numpy values
np_data_unscaled = np.array(df)
np_data_unscaled = np.reshape(np_data_unscaled, (nrows, -1))
print(np_data_unscaled.shape)

# Transform the data by scaling each feature to a range between 0 and 1
scaler = RobustScaler()
np_data_scaled = scaler.fit_transform(np_data_unscaled)

# Set the sequence length - this is the timeframe used to make a single prediction
sequence_length = 110

# Prediction Index
index_Close = 0

# Split the training data into train and train data sets
# As a first step, we get the number of rows to train the model on 80% of the data 
train_data_len = math.ceil(np_data_scaled.shape[0] * 0.8)

# Create the training and test data
train_data = np_data_scaled[0:train_data_len, :]
test_data = np_data_scaled[train_data_len - sequence_length:, :]

# The RNN needs data with the format of [samples, time steps, features]
# Here, we create N samples, sequence_length time steps per sample, and 6 features
def partition_dataset(sequence_length, data):
    x, y = [], []
    data_len = data.shape[0]
    for i in range(sequence_length, data_len):
        x.append(data[i-sequence_length:i,:]) #contains sequence_length values 0-sequence_length * columsn
        y.append(data[i, index_Close]) #contains the prediction values for validation (3rd column = Close),  for single-step prediction
    
    # Convert the x and y to numpy arrays
    x = np.array(x)
    y = np.array(y)
    return x, y

# Generate training data and test data
x_train, y_train = partition_dataset(sequence_length, train_data)
x_test, y_test = partition_dataset(sequence_length, test_data)

# Print the shapes: the result is: (rows, training_sequence, features) (prediction value, )
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)

# Validate that the prediction value and the input match up
# The last close price of the second input sample should equal the first prediction value
print(x_test[1][sequence_length-1][index_Close])
print(y_test[0])

(130, 110, 1) (130,)
(60, 110, 1) (60,)
0.599655121905751
0.599655121905751

Step #3 Preparing Data for the Multi-Step Time Series Forecasting Model

Now that we have prepared the synthetic data, we need to determine the architecture of our neural network. There are no clear guidelines available on how to configure a neural network. Most problems are unique, and the model settings depend on the nature and extent of the data. Thus, configuring LSTM layers can be a real challenge.

3.1 Overview of Model Parameters

Finding the optimal configuration is often a process of trial and error. Below you find a list of model parameters with which you can experiment:

Keras model parameters used in this tutorial
epoch: An epoch is an iteration over the entire x_train and y_train data provided.
Batch_size: Number of samples per gradient update. If unspecified, batch_size is 32.
Activation: Mathematical equations determine whether a neuron in the network should activate (“fired”) or not.
Input_shape: Tensor with shape: (batch_size, …, input_dim).
Input_len: the length of the generated input sequence.
Return_sequences: If set to false, the layer will return the final output in the output sequence. If true, the layer returns the entire series.
Loss: The loss function tells the model how to evaluate its performance so that the weights can be updated to reduce the loss on the following evaluation.
Optimizer: Every time a neural network finishes processing a batch through the network and generates prediction results, the optimizer decides how to adjust weights between the nodes.

Source: keras.io/layers/recurrent – view the Keras documentation for a complete list of parameters.

3.2 Choosing Model Parameters

Finding a suitable architecture for a neural network is not easy and requires a systematic approach. A common practice is to start with an established standard architecture. We can then change the model parameters slightly and record the configurations and outcomes. We can also automate configuring and testing the model (Hyperparameter Tuning). Still, because this is not the focus of this article, we will use a manual approach.

We start with five epochs, a batch_size of 1, and configure our recurrent model with one LSTM layer with 110 neurons, corresponding to a data slice of 110 input values. In addition, we add a dense layer that provides us with a single output value.

# Configure the neural network model
epochs = 12; batch_size = 1;

# Model with n_neurons = inputshape Timestamps, each with x_train.shape[2] variables
n_neurons = x_train.shape[1] * x_train.shape[2]
model = Sequential()
model.add(LSTM(n_neurons, return_sequences=True, input_shape=(x_train.shape[1], 1)))
model.add(LSTM(n_neurons, return_sequences=False))
model.add(Dense(5))
model.add(Dense(1))
model.compile(optimizer="adam", loss="mean_squared_error")

Step #3 Training the Prediction Model

Now that we have defined the architecture of our recurrent neural network, the next step is to train the model. Training the model involves using a training dataset to adjust the weights and biases of the network so that it can make accurate predictions on new data. This process is done using an optimization algorithm, such as gradient descent, to minimize the difference between the predicted values and the true values.

Training can be time-consuming, especially for large datasets or complex models, and the amount of time it takes may vary depending on the performance of your computer. The complexity of our model is relatively low, and we only use five training epochs. It should thus only take a couple of minutes to train the model.

It is important to monitor the training process to ensure that the model is learning effectively and to identify any potential issues or problems. Once the training is complete, we can test the model on a separate test dataset to evaluate its performance.

# Train the model
history = model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs)

Epoch 1/12
130/130 [==============================] - 7s 28ms/step - loss: 0.0544
Epoch 2/12
130/130 [==============================] - 4s 27ms/step - loss: 0.0041
Epoch 3/12
130/130 [==============================] - 4s 28ms/step - loss: 3.5105e-04
Epoch 4/12
130/130 [==============================] - 4s 28ms/step - loss: 4.3903e-05
Epoch 5/12
130/130 [==============================] - 4s 28ms/step - loss: 5.9422e-05
Epoch 6/12
130/130 [==============================] - 4s 28ms/step - loss: 5.7988e-05
Epoch 7/12
130/130 [==============================] - 4s 28ms/step - loss: 8.3814e-04
Epoch 8/12
130/130 [==============================] - 4s 28ms/step - loss: 1.9122e-04
Epoch 9/12
130/130 [==============================] - 4s 28ms/step - loss: 0.0014
Epoch 10/12
130/130 [==============================] - 4s 29ms/step - loss: 0.0114
Epoch 11/12
130/130 [==============================] - 4s 28ms/step - loss: 0.0013
Epoch 12/12
130/130 [==============================] - 4s 28ms/step - loss: 3.4571e-05

Step #4 Predicting a Single-step Ahead

We continue by making single-step predictions based on the training data. Then we calculate the mean squared error and the median error to measure the performance of our model.

# Reshape the data, so that we get an array with multiple test datasets
x_test_np = np.array(x_test)
x_test_reshape = np.reshape(x_test_np, (x_test_np.shape[0], x_test_np.shape[1], 1))

# Get the predicted values
y_pred = model.predict(x_test_reshape)
y_pred_unscaled = scaler.inverse_transform(y_pred)
y_test_unscaled = scaler.inverse_transform(y_test.reshape(-1, 1))

# Mean Absolute Error (MAE)
MAE = mean_absolute_error(y_test_unscaled, y_pred_unscaled)
print(f'Median Absolute Error (MAE): {np.round(MAE, 2)}')

# Mean Absolute Percentage Error (MAPE)
MAPE = np.mean((np.abs(np.subtract(y_test_unscaled, y_pred_unscaled)/ y_test_unscaled))) * 100
print(f'Mean Absolute Percentage Error (MAPE): {np.round(MAPE, 2)} %')

# Median Absolute Percentage Error (MDAPE)
MDAPE = np.median((np.abs(np.subtract(y_test_unscaled, y_pred_unscaled)/ y_test_unscaled)) ) * 100
print(f'Median Absolute Percentage Error (MDAPE): {np.round(MDAPE, 2)} %')

Out: me: 0.0223, rmse: 0.0206

Both the mean error and the squared mean error are pretty small. As a result, the values predicted by our model are close to the actual values of the ground truth. Even though it is unlikely because of the small number of epochs, it could still be that the model is over-fitting.

Step #5 Visualizing Predictions and Loss

Next, we look at the quality of the training predictions by plotting the training predictions and the actual values (i.e., ground truth).

# Visualize the data
#train = df[:train_data_len]
df_valid_pred = df[train_data_len:]
df_valid_pred.insert(1, "y_pred", y_pred_unscaled, True)

# Create the lineplot
fig, ax1 = plt.subplots(figsize=(32, 5), sharex=True)
ax1.tick_params(axis="x", rotation=0, labelsize=10, length=0)
plt.title("Predictions vs Ground Truth")
sns.lineplot(data=df_valid_pred)
plt.show()

The smaller the area between the two lines, the better our model predictions. So we can tell from the plot that the predictions are not entirely wrong. We also check the learning path of the regression model.

# Plot training & validation loss values
fig, ax = plt.subplots(figsize=(10, 5), sharex=True)
sns.lineplot(data=history.history["loss"])
plt.title("Model loss")
plt.ylabel("Loss")
plt.xlabel("Epoch")
ax.xaxis.set_major_locator(plt.MaxNLocator(epochs))
plt.legend(["Train"], loc="upper left")
plt.grid()
plt.show()

The loss drops quickly, and after five epochs, the model seems to have converged.

Step #6 Rolling Forecasting: Creating a Multi-step Time Series Forecast

Next, we will generate the rolling multi-step forecast. This approach is different from a single-step approach in that we predict several points of a signal within a prediction window and not just a single value. However, the prediction quality decreases over more extended periods because reusing predictions creates a feedback loop that amplifies potential errors over time.

Also: Measuring Regression Errors with Python

The forecasting process begins with an initial prediction for a single time step. After that, we add the predicted value to the input values for another projection, and so on. In this way, we create the rolling forecast with multiple time steps.

# Settings and Model Labels
rolling_forecast_range = 30
titletext = "Forecast Chart Model A"
ms = [
    ["epochs", epochs],
    ["batch_size", batch_size],
    ["lstm_neuron_number", n_neurons],
    ["rolling_forecast_range", rolling_forecast_range],
    ["layers", "LSTM, DENSE(1)"],
]
settings_text = ""
lms = len(ms)
for i in range(0, lms):
    settings_text += ms[i][0] + ": " + str(ms[i][1])
    
    if i < lms - 1:
        settings_text = settings_text + ",  "

# Making a Multi-Step Prediction
# Create the initial input data
new_df = df.filter(["sine_curve"])
for i in range(0, rolling_forecast_range):
    # Select the last sequence from the dataframe as input for the prediction model
    last_values = new_df[-n_neurons:].values
    
    # Scale the input data and bring it into shape
    last_values_scaled = scaler.transform(last_values)
    X_input = np.array(last_values_scaled).reshape([1, 110, 1])
    X_test = np.reshape(X_input, (X_input.shape[0], X_input.shape[1], 1))
    
    # Predict and unscale the predictions
    pred_value = model.predict(X_input)
    pred_value_unscaled = scaler.inverse_transform(pred_value)
    
    # Add the prediction to the next input dataframe
    new_df = pd.concat([new_df, pd.DataFrame({"sine_curve": pred_value_unscaled[0, 0]}, index=new_df.iloc[[-1]].index.values + 1)])
    new_df_length = new_df.size
forecast = new_df[new_df_length - rolling_forecast_range : new_df_length].rename(
    columns={"sine_curve": "Forecast"}
)

We can plot the forecast together with the ground truth.

#Visualize the results
validxs = df_valid_pred.copy()
dflen = new_df.size - 1
dfs = pd.concat([validxs, forecast], sort=False)
dfs.at[dflen, "y_pred_train"] = dfs.at[dflen, "y_pred"]
dfs

# Zoom in to a closer timeframe
df_zoom = dfs[dfs.index > 200]

# Visualize the data
fig, ax = plt.subplots(figsize=(16, 5), sharex=True)
ax.tick_params(axis="x", rotation=0, labelsize=10, length=0)
ax.xaxis.set_major_locator(plt.MaxNLocator(rolling_forecast_range))
plt.title('Forecast And Predictions (y_pred)')
sns.lineplot(data=df_zoom[["sine_curve", "y_pred"]], ax=ax)
sns.scatterplot(data=df_zoom[["Forecast"]], x=df_zoom.index, y='Forecast', linewidth=1.0, ax=ax)
plt.legend(["x_train", "y_pred", "y_test_rolling"], loc="lower right")
ax.annotate('ModelSettings: ' + settings_text, xy=(0.06, .015),  xycoords='figure fraction', horizontalalignment='left', verticalalignment='bottom', fontsize=10)
plt.show()

The model learned to predict the periodic movement of the curve and thus succeeds in modeling its further course. However, the amplitude of the predicted curve increases over time, which increases the prediction errors.

Step #7 Comparing Results for Different Parameters

There is plenty of room to improve the forecasting model further. So far, our model seems not to consider that the amplitude of the sinus curve is gradually increasing. Over time, errors are amplified, leading to a growing deviation from the ground truth signal.

We can improve the model by changing the model parameters and the model architecture. However, I would not recommend changing them one at a time.

I tested several model configurations with varying epochs and neuron numbers/sample sizes to further optimize the model. Parameters such as Batch_size (=1), the timeframe for the forecast (=30 timesteps), and the model architecture (=1 LSTM Layer, 1 DENSE Layer) were left unchanged. The illustrations below show the results:

Different designs of the rolling time series forecasting model

The model that looks most promising is model #6. This model considers both the periodic movements of the sinus curve and the steadily increasing slope. The configuration of this model uses 15 epochs and 115 neurons/sample values.

Errors are amplified over more extended periods when they enter a feedback loop. For this reason, we can see in the graph below that the prediction accuracy decreases over time.

Long-time multi-step forecast (model #6)

Summary

In this tutorial, we have created a rolling time-series forecast for a rising sine curve. A multi-step forecast helps better understand how a signal will develop over a more extended period. Finally, we have tested and compared different model variants and selected the best-performing model.

There are several ways to improve model accuracy further. For example, the current network architecture was kept simple and only had a single LSTM layer. Consequently, we could try to add additional layers. Another possibility is to experiment with different hyperparameters. For example, we could increase the training epochs and use dropout to prevent overfitting. In addition, we could experiment with other activation functions. Feel free to try it out.

I hope this article was helpful. If you have questions remaining, let me know in the comments.

Another forecasting approach is to train a neural network model that predicts multiple outputs per input batch. Another relataly article demonstrates how this multi-output regression approach works: Stock Market Prediction – Multi-output regression in Python.

Also, consider non-neural network approaches to time series forecasting, such as ARIMA, which can achieve great results too.

Sources and Further Reading

The links above to Amazon are affiliate links. By buying through these links, you support the Relataly.com blog and help to cover the hosting costs. Using the links does not affect the price.

The post Rolling Time Series Forecasting: Creating a Multi-Step Prediction for a Rising Sine Curve using Neural Networks in Python appeared first on relataly.com.

Stock Market Prediction – Adjusting Time Series Prediction Intervals in Python

Florian Follonier — Wed, 01 Apr 2020 16:37:56 +0000

Get ready to level up your time-series forecasting game! In this tutorial, we’re going to take things up a notch by showing you how to adjust prediction intervals using Keras recurrent neural networks and Python.

Now, you may remember our previous article on stock market forecasting where we made a forecast for the S&P500 stock market index using a prediction interval of just one day. However, other time-series prediction problems may require us to look further ahead – maybe several days, weeks, or even months. Fear not! We’ve got you covered.

By tweaking our data preparation and model architecture, we can modify the prediction interval and create a single-step forecast for a longer time frame. In this article, we’re going to show you exactly how to do that.

First, we’ll give you a quick rundown of different methods for adjusting the time series prediction interval. Then, we’ll dive into the practical stuff. We’ll use Python to train a simple neural network on stock market data and validate its performance. Once we’re happy with the model, we’ll prepare the data in a way that allows us to forecast a single but more extended step into the future.

Ready to take your time-series forecasting to the next level? Then let’s get started!

Disclaimer

Time Series Analysis Clocks

" data-image-caption="

Time Series Analysis Clocks

" data-large-file="https://www.relataly.com/wp-content/uploads/2020/04/Time-Series-Analysis-Clocks.png" src="https://www.relataly.com/wp-content/uploads/2020/04/Time-Series-Analysis-Clocks-512x287.png" alt="Time Series Analysis Clocks" class="wp-image-13471" srcset="https://www.relataly.com/wp-content/uploads/2020/04/Time-Series-Analysis-Clocks.png 512w, https://www.relataly.com/wp-content/uploads/2020/04/Time-Series-Analysis-Clocks.png 300w, https://www.relataly.com/wp-content/uploads/2020/04/Time-Series-Analysis-Clocks.png 768w, https://www.relataly.com/wp-content/uploads/2020/04/Time-Series-Analysis-Clocks.png 1456w" sizes="(max-width: 512px) 100vw, 512px" />

Time Series Analysis

Ways of Adjusting Prediction Intervals

When forecasting one step, the prediction interval is the point in time for which a prediction model will simulate the next value. There are three different ways to change the prediction interval:

Multi-Step Rolling Forecasting: Another way is to train the model on its output. We do this by maintaining the predictions and reusing them as input in the subsequent training run. In this way, the projections range one-time step further ahead with each iteration. After seven iterations, based on daily input time steps, the model will have provided the output for a weekly prediction. We have covered this rolling forecasting approach in a separate tutorial.
Deep Multi-Output Forecasting: A third option is to create a multi-output model that provides an entire series of predictions with multiple timesteps. We have covered this multi-output forecasting approach in a separate tutorial.
Single-step forecasting with bigger timesteps: In a single-stage forecasting approach, the input data defines the length of a time step. Changing the size of the input steps will change the output steps to the same extent. For example, a model that uses daily prices as input data will also provide day-to-day forecasts. We will cover this forecasting approach in the following section.

Predicting the Price of the S&P500 One Week Ahead

Let’s begin with the hands-on part. In this part, we will use Python to create a single-step forecasting model with more extended timesteps. Our model will make projections that reach one week ahead. For this purpose, we reuse most of the code from the previous article on univariate single-step daily forecasting. So we won’t go into all the details and will only speak about the areas in which we must adjust the code. Changes are necessary to data preparation and model architecture.

In the following, we develop a single-variate neural network model that forecasts the S&P500 stock market index. The code is available on the GitHub repository.

View on GitHub Relataly GitHub Repo

Prerequisites

Before starting the coding part, make sure that you have set up your Python 3 environment and required packages. If you don’t have an environment, you can follow these steps to set up the Anaconda environment.

Also, make sure you install all required packages. In this tutorial, we will be working with the following standard packages:

In addition, we will be using Keras (2.0 or higher) with Tensorflow backend, the machine learning library sci-kit-learn, and pandas DataReader to interact with the yahoo finance API.

You can install packages using console commands:

pip install
conda install (if you are using the anaconda packet manager)

Step #1 Load the Data

In the following, we will modify the prediction interval of the neural network model we developed in a previous post. As a result, the model will generate predictions for the market price of the S&P500 Index that range one week ahead.

As before, we start loading the stock market data via an API.

# A tutorial for this file is available at www.relataly.com

import math # Fundamental package for scientific computing with Python
import numpy as np # Additional functions for analysing and manipulating data
import pandas as pd # Date Functions
from datetime import date, timedelta # This function adds plotting functions for calender dates
from pandas.plotting import register_matplotlib_converters # Important package for visualization - we use this to plot the market data
import matplotlib.pyplot as plt # Formatting dates
import matplotlib.dates as mdates # Packages for measuring model performance / errors
from sklearn.metrics import mean_absolute_error, mean_squared_error # Tools for predictive data analysis. We will use the MinMaxScaler to normalize the price data 
from sklearn.preprocessing import MinMaxScaler # Deep learning library, used for neural networks
from tensorflow.keras.models import Sequential # Deep learning classes for recurrent and regular densely-connected layers
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.callbacks import EarlyStopping
import tensorflow as tf
import seaborn as sns
sns.set_style('white', { 'axes.spines.right': False, 'axes.spines.top': False})

# check the tensorflow version and the number of available GPUs
print('Tensorflow Version: ' + tf.__version__)
physical_devices = tf.config.list_physical_devices('GPU')
print("Num GPUs:", len(physical_devices))


# Setting the timeframe for the data extraction
end_date = date.today().strftime("%Y-%m-%d")
start_date = '2010-01-01'

# Getting S&P500 quotes
stockname = 'S&P500'
symbol = '^GSPC'

# You can either use webreader or yfinance to load the data from yahoo finance
# import pandas_datareader as webreader
# df = webreader.DataReader(symbol, start=start_date, end=end_date, data_source="yahoo")

import yfinance as yf #Alternative package if webreader does not work: pip install yfinance
df = yf.download(symbol, start=start_date, end=end_date)

df.head(5)

Tensorflow Version: 2.5.0
Num GPUs: 0
[*********************100%***********************]  1 of 1 completed
			Open		High		Low			Close		Adj Close	Volume
Date						
2009-12-31	1126.599976	1127.640015	1114.810059	1115.099976	1115.099976	2076990000
2010-01-04	1116.560059	1133.869995	1116.560059	1132.989990	1132.989990	3991400000
2010-01-05	1132.660034	1136.630005	1129.660034	1136.520020	1136.520020	2491020000
2010-01-06	1135.709961	1139.189941	1133.949951	1137.140015	1137.140015	4972660000
2010-01-07	1136.270020	1142.459961	1131.319946	1141.689941	1141.689941	5270680000

Step #2 Adjusting the Shape of the Input Data and Exploration

We have a DataFrame that contains the daily price quotes for the S&P 500. Next, we prepare the data to include the weekly price quotes. If we want our model to provide weekly price predictions, we need to change the data so that the input contains weekly price quotes. A simple way to achieve this is to iterate through the rows and only keep every 7th row.

# Changing the data structure to a dataframe with weekly price quotes
df["index1"] = range(1, len(df) + 1)
rownumber = df.shape[0]
lst = list(range(rownumber))
list_of_relevant_numbers = lst[0::7]
df_weekly = df[df["index1"].isin(list_of_relevant_numbers)]
df_weekly.head(5)

	Open	High	Low	Close	Adj Close	Volume	index1
Date							
2010-01-11	1145.959961	1149.739990	1142.020020	1146.979980	1146.979980	4255780000	7
2010-01-21	1138.680054	1141.579956	1114.839966	1116.479980	1116.479980	6874290000	14
2010-02-01	1073.890015	1089.380005	1073.890015	1089.189941	1089.189941	4077610000	21
2010-02-10	1069.680054	1073.670044	1059.339966	1068.130005	1068.130005	4251450000	28
2010-02-22	1110.000000	1112.290039	1105.380005	1108.010010	1108.010010	3814440000	35

After this, we quickly create a line plot to validate that everything looks as expected.

# Creating a Lineplot
years = mdates.YearLocator() 
fig, ax1 = plt.subplots(figsize=(16, 6))
ax1.xaxis.set_major_locator(years)
ax1.legend([stockname], fontsize=12)
plt.title(stockname + ' from '+ start_date + ' to ' + end_date)
sns.lineplot(data=df['Close'], label=stockname, linewidth=1.0)
plt.ylabel('S&P500 Points')
plt.show()

			Open		High		Low			Close		Adj Close	Volume		index1
Date							
2010-01-11	1145.959961	1149.739990	1142.020020	1146.979980	1146.979980	4255780000	7
2010-01-21	1138.680054	1141.579956	1114.839966	1116.479980	1116.479980	6874290000	14
2010-02-01	1073.890015	1089.380005	1073.890015	1089.189941	1089.189941	4077610000	21
2010-02-10	1069.680054	1073.670044	1059.339966	1068.130005	1068.130005	4251450000	28
2010-02-22	1110.000000	1112.290039	1105.380005	1108.010010	1108.010010	3814440000	35

Step #3 Preprocess the Data

Before we can train the neural network, we first need to define the shape of the training data. We use weekly price quotes and define an input_sequence_length of 50 weeks.

# Feature Selection - Only Close Data
train_df = df.filter(['Close'])
data_unscaled = df.values

# Transform features by scaling each feature to a range between 0 and 1
mmscaler = MinMaxScaler(feature_range=(0, 1))
np_data = mmscaler.fit_transform(data_unscaled)

# Creating a separate scaler that works on a single column for scaling predictions
scaler_pred = MinMaxScaler()
df_Close = pd.DataFrame(df['Close'])
np_Close_scaled = scaler_pred.fit_transform(df_Close)

# Set the sequence length - this is the timeframe used to make a single prediction
sequence_length = 25

# Prediction Index
index_Close = train_df.columns.get_loc("Close")

# Split the training data into train and train data sets
# As a first step, we get the number of rows to train the model on 80% of the data 
train_data_length = math.ceil(np_data.shape[0] * 0.8)

# Create the training and test data
train_data = np_data[0:train_data_length, :]
test_data = np_data[train_data_length - sequence_length:, :]

# The RNN needs data with the format of [samples, time steps, features]
# Here, we create N samples, sequence_length time steps per sample, and 6 features
def partition_dataset(sequence_length, train_df):
    x, y = [], []
    data_len = train_df.shape[0]
    for i in range(sequence_length, data_len):
        x.append(train_df[i-sequence_length:i,:]) #contains sequence_length values 0-sequence_length * columsn
        y.append(train_df[i, index_Close]) #contains the prediction values for validation (3rd column = Close),  for single-step prediction
    
    # Convert the x and y to numpy arrays
    x = np.array(x)
    y = np.array(y)
    return x, y

# Generate training data and test data
x_train, y_train = partition_dataset(sequence_length, train_data)
x_test, y_test = partition_dataset(sequence_length, test_data)

# Print the shapes: the result is: (rows, training_sequence, features) (prediction value, )
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)

# Validate that the prediction value and the input match up
# The last close price of the second input sample should equal the first prediction value
print(x_test[1][sequence_length-1][index_Close])
print(y_test[0])

(2465, 25, 7) (2465,)
(622, 25, 7) (622,)
0.5509444640253316
0.5509444640253316

Step #4 Building a Time Series Prediction Model

The first layer of neurons in our neural network needs to fit the input values from the data. Therefore, we need 50 neurons – one for each input price quote.

The model architecture of the recurrent neural network

We use the following input arguments for the model fit:

x_train: Vector, matrix, or array of training data. It can also be a list (as in our case) if the model has multiple inputs.
y_train: Vector, matrix, or array of target data. This is the labeled data the model tries to predict; in other words, these are the results of x_train.
Epochs: The integer value defines how often the model goes through the training set.
Batch size: Integer value that defines the number of samples that will be propagated through the network. After each propagation, the network adjusts the weights of the nodes in each layer.

# Configure the neural network model
model = Sequential()

# Model with n_neurons Neurons
n_neurons = x_train.shape[1] * x_train.shape[2]
print(n_neurons, x_train.shape[1], x_train.shape[2])
model.add(LSTM(n_neurons, return_sequences=True, input_shape=(x_train.shape[1], x_train.shape[2])))
model.add(LSTM(n_neurons, return_sequences=False))
model.add(Dense(25, activation="relu"))
model.add(Dense(1))

# Compile the model
model.compile(optimizer="adam", loss="mean_squared_error")

# Training the model
epochs = 10
early_stop = EarlyStopping(monitor='loss', patience=2, verbose=1)
history = model.fit(x_train, y_train, 
                    batch_size=16, 
                    epochs=epochs, 
                    callbacks=[early_stop])

Epoch 1/10
155/155 [==============================] - 7s 25ms/step - loss: 8.2791e-04
Epoch 2/10
155/155 [==============================] - 4s 24ms/step - loss: 1.3465e-04
Epoch 3/10
155/155 [==============================] - 4s 24ms/step - loss: 1.0998e-04
Epoch 4/10
155/155 [==============================] - 4s 25ms/step - loss: 1.0241e-04
Epoch 5/10
155/155 [==============================] - 4s 24ms/step - loss: 7.4277e-05
Epoch 6/10
155/155 [==============================] - 4s 24ms/step - loss: 6.5786e-05
Epoch 7/10
155/155 [==============================] - 4s 24ms/step - loss: 6.8482e-05
Epoch 8/10
155/155 [==============================] - 4s 24ms/step - loss: 5.0326e-05
Epoch 9/10
155/155 [==============================] - 4s 24ms/step - loss: 4.8574e-05
Epoch 10/10
155/155 [==============================] - 4s 25ms/step - loss: 4.1287e-05

Step #5 Evaluate Model Performance

Next, we validate the model by calculating our predictions’ mean-squared and root-mean-squared errors. However, in time series forecasting metrics can be misleading. It is good to double-check model results using illustrations. Therefore, we plot the input sequences and the forecast to see if our model can continue the time series in a plausible way.

# Get the predicted values
y_pred_scaled = model.predict(x_test)
y_pred = scaler_pred.inverse_transform(y_pred_scaled)
y_test_unscaled = scaler_pred.inverse_transform(y_test.reshape(-1, 1))

# Mean Absolute Error (MAE)
MAE = mean_absolute_error(y_test_unscaled, y_pred)
print(f'Median Absolute Error (MAE): {np.round(MAE, 2)}')

# Mean Absolute Percentage Error (MAPE)
MAPE = np.mean((np.abs(np.subtract(y_test_unscaled, y_pred)/ y_test_unscaled))) * 100
print(f'Mean Absolute Percentage Error (MAPE): {np.round(MAPE, 2)} %')

# Median Absolute Percentage Error (MDAPE)
MDAPE = np.median((np.abs(np.subtract(y_test_unscaled, y_pred)/ y_test_unscaled)) ) * 100
print(f'Median Absolute Percentage Error (MDAPE): {np.round(MDAPE, 2)} %')

# The date from which on the date is displayed
display_start_date = "2018-01-01" 

# Add the difference between the valid and predicted prices
train = pd.DataFrame(train_df[:train_data_length + 1]).rename(columns={'Close': 'x_train'})
valid = pd.DataFrame(train_df[train_data_length:]).rename(columns={'Close': 'y_test'})
valid.insert(1, "y_pred", y_pred, True)
valid.insert(1, "residuals", valid["y_pred"] - valid["y_test"], True)
df_union = pd.concat([train, valid])

# Zoom in to a closer timeframe
df_union_zoom = df_union[df_union.index > display_start_date]

# Create the lineplot
fig, ax1 = plt.subplots(figsize=(16, 8))
plt.title("Predictions vs Ground Truth")
plt.ylabel(stockname, fontsize=18)
sns.despine();
sns.lineplot(data=df_union_zoom, linewidth=1.0, palette='CMRmap', ax=ax1)
plt.show()

Median Absolute Error (MAE): 42.7
Mean Absolute Percentage Error (MAPE): 1.16 %
Median Absolute Percentage Error (MDAPE): 0.92 %

At the bottom, we can see the differences between predictions and valid data. Positive values signal that the projections were too optimistic. Negative values mean that the predictions were too pessimistic and that the actual value turned out to be higher than the prediction.

Step #6 Predicting for the Next Week

What can be more satisfying than to see a newly trained model at work? Let’s use our new model to predict next week’s price for the S&P500. We will create a fresh input_sequence with prices from the past N days. Then we scale this input sequence and include it as input in our call to the model.predict() function.

# Get fresh data until today and create a new dataframe with only the price data

new_df = df

N = sequence_length

# Get the last N steps closing price values and scale the data to be values between 0 and 1
last_N_steps = new_df[-sequence_length:].values
last_N_steps_scaled = mmscaler.transform(last_N_steps)

# Create an empty list and Append past N steps
X_test_new = []
X_test_new.append(last_N_steps_scaled)

# Convert the X_test data set to a numpy array and reshape the data
pred_price_scaled = model.predict(np.array(X_test_new))
pred_price_unscaled = scaler_pred.inverse_transform(pred_price_scaled.reshape(-1, 1))

# Print last price and predicted price for the next week
price_today = np.round(new_df['Close'][-1], 2)
predicted_price = np.round(pred_price_unscaled.ravel()[0], 2)
change_percent = np.round(100 - (price_today * 100)/predicted_price, 2)

plus = '+'; minus = ''
print(f'The close price for {stockname} at {end_date} was {price_today}')
print(f'The predicted close price is {predicted_price} ({plus if change_percent > 0 else minus}{change_percent}%)')

The close price for S&P500 at 2022-05-11 was 4001.05
The predicted close price is 4046.300048828125 (+1.12%)

So for the 9th of April 2020, the model predicts that the S&P500 will close at:

4046.300048828125

Considering today’s (2nd of April 2020) price is 2528 points, our model expects the S&P to gain roughly 124 points in the coming seven days. Of course, this is by no means financial advice. As we have seen before, our model is often wrong.

Summary

This article has shown how to adjust the prediction intervals for a time series forecasting model. We have created a neural network that predicts the price of the S&P500 one week in advance. Finally, we trained and validated the model and made a forecast for the next week.

Varying the input shape is a quick approach to changing the forecasting time steps. However, increasing the length of the time steps also reduces the amount of data we can use for training and testing. In our case, we still have enough data available. But in other cases, where less information is available, this can become a problem. The preferred method is to use a rolling forecast approach or create a multi-output forecast in such a case.

I hope this article was helpful. Should you have questions or remarks, let me know in the comments.

Sources and Further Readings

The links above to Amazon are affiliate links. By buying through these links, you support the Relataly.com blog and help to cover the hosting costs. Using the links does not affect the price.

The post Stock Market Prediction – Adjusting Time Series Prediction Intervals in Python appeared first on relataly.com.