<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Yahoo Finance API Archives - relataly.com</title>
	<atom:link href="https://www.relataly.com/category/rest-apis/yahoo-finance-api/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.relataly.com/category/rest-apis/yahoo-finance-api/</link>
	<description>The Business AI Blog</description>
	<lastBuildDate>Sat, 27 May 2023 10:26:34 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=7.0</generator>

<image>
	<url>https://www.relataly.com/wp-content/uploads/2023/04/cropped-AI-cat-Icon-White.png</url>
	<title>Yahoo Finance API Archives - relataly.com</title>
	<link>https://www.relataly.com/category/rest-apis/yahoo-finance-api/</link>
	<width>32</width>
	<height>32</height>
</image> 
<site xmlns="com-wordpress:feed-additions:1">175977316</site>	<item>
		<title>Predictive Maintenance: Predicting Machine Failure using Sensor Data with XGBoost and Python</title>
		<link>https://www.relataly.com/predictive-maintenance-predicting-machine-failure-with-python/10618/</link>
					<comments>https://www.relataly.com/predictive-maintenance-predicting-machine-failure-with-python/10618/#comments</comments>
		
		<dc:creator><![CDATA[Florian Follonier]]></dc:creator>
		<pubDate>Sun, 08 Jan 2023 20:34:44 +0000</pubDate>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Classification (multi-class)]]></category>
		<category><![CDATA[Cross-Validation]]></category>
		<category><![CDATA[Data Visualization]]></category>
		<category><![CDATA[Exploratory Data Analysis (EDA)]]></category>
		<category><![CDATA[Gradient Boosting]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Manufacturing]]></category>
		<category><![CDATA[Plotly]]></category>
		<category><![CDATA[Predictive Maintenance]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Scikit-Learn]]></category>
		<category><![CDATA[Seaborn]]></category>
		<category><![CDATA[Yahoo Finance API]]></category>
		<category><![CDATA[AI in Manufacturing]]></category>
		<category><![CDATA[Classic Machine Learning]]></category>
		<category><![CDATA[Intermediate Tutorials]]></category>
		<category><![CDATA[Multivariate Models]]></category>
		<guid isPermaLink="false">https://www.relataly.com/?p=10618</guid>

					<description><![CDATA[<p>Predictive maintenance is a game-changer for the modern industry. Still, it is based on a simple idea: By using machine learning algorithms, businesses can predict equipment failures before they happen. This approach can help businesses improve their operations by reducing the need for reactive, unplanned maintenance and by enabling them to schedule maintenance activities during ... <a title="Predictive Maintenance: Predicting Machine Failure using Sensor Data with XGBoost and Python" class="read-more" href="https://www.relataly.com/predictive-maintenance-predicting-machine-failure-with-python/10618/" aria-label="Read more about Predictive Maintenance: Predicting Machine Failure using Sensor Data with XGBoost and Python">Read more</a></p>
<p>The post <a href="https://www.relataly.com/predictive-maintenance-predicting-machine-failure-with-python/10618/">Predictive Maintenance: Predicting Machine Failure using Sensor Data with XGBoost and Python</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p class="wp-block-paragraph">Predictive maintenance is a game-changer for modern industry, yet it rests on a simple idea: by using machine learning algorithms, businesses can predict equipment failures before they happen. This approach helps businesses improve their operations by reducing the need for reactive, unplanned maintenance and by enabling them to schedule maintenance activities during planned downtime. In this article, we&#8217;ll explore how to predict machine failures with the robust XGBoost algorithm in Python. By the end of this tutorial, you&#8217;ll have the knowledge and skills to start implementing predictive maintenance in your organization. So, let&#8217;s get started!</p>



<p class="wp-block-paragraph">We begin by discussing the concept of predictive maintenance and different ways to implement it. Then we turn to the coding part in Python and implement a prediction model based on machine sensor data: we train an XGBoost classification model that predicts different types of machine failure.</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-full"><img fetchpriority="high" decoding="async" width="509" height="467" data-attachment-id="12909" data-permalink="https://www.relataly.com/robot-factory-machine-learning-predictive-maintenance-min/" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/03/robot-factory-machine-learning-predictive-maintenance-min.png" data-orig-size="509,467" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="robot factory machine learning predictive maintenance-min" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/03/robot-factory-machine-learning-predictive-maintenance-min.png" src="https://www.relataly.com/wp-content/uploads/2023/03/robot-factory-machine-learning-predictive-maintenance-min.png" alt="Predictive maintenance is a game-changer for the modern industry. Image generated with Midjourney." class="wp-image-12909" srcset="https://www.relataly.com/wp-content/uploads/2023/03/robot-factory-machine-learning-predictive-maintenance-min.png 509w, https://www.relataly.com/wp-content/uploads/2023/03/robot-factory-machine-learning-predictive-maintenance-min.png 300w" sizes="(max-width: 509px) 100vw, 509px" /><figcaption class="wp-element-caption">Predictive maintenance is a game-changer for the modern industry. Image generated with <a href="http://www.midjourney.com" target="_blank" rel="noreferrer noopener">Midjourney</a>.</figcaption></figure>
</div>
</div>



<h2 class="wp-block-heading">What is Predictive Maintenance?</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p class="wp-block-paragraph">Predictive maintenance is a data-driven approach that uses predictive modeling to assess the state of equipment and determine the optimal timing for maintenance activities. This technique is particularly beneficial in industries that heavily rely on equipment for their operations, such as manufacturing, transportation, energy, and healthcare. Depending on the requirements and challenges of an organization, predictive maintenance may contribute to one or several of the following goals:</p>



<ul class="wp-block-list">
<li><strong>Improve equipment reliability</strong>: By proactively identifying and addressing potential problems with equipment, predictive maintenance can help improve the reliability of the equipment, reducing the risk of unexpected downtime or failure.</li>



<li><strong>Increase efficiency</strong>: Predictive maintenance can help improve the efficiency of equipment by identifying and fixing problems before they cause equipment failure or downtime. This can help reduce maintenance costs and increase productivity.</li>



<li><strong>Improve safety:</strong> Predictive maintenance can help improve safety by identifying and addressing potential problems with equipment before they lead to failures. This can help prevent accidents and injuries caused by equipment failure.</li>



<li><strong>Reduce maintenance costs</strong>: By proactively identifying and fixing potential problems with equipment, predictive maintenance can help reduce the overall cost of maintenance by minimizing unscheduled downtime and emergency repairs.</li>



<li><strong>Improve asset management</strong>: Predictive maintenance can help improve asset management by providing data and insights into the condition and performance of equipment. This can help organizations decide when to replace or upgrade equipment.</li>
</ul>



<p class="wp-block-paragraph">Next, we look at the different ways organizations can implement predictive maintenance.</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-full"><img decoding="async" width="511" height="510" data-attachment-id="12380" data-permalink="https://www.relataly.com/predictive-maintenance-predicting-machine-failure-with-python/10618/monitoring-predictive-maintenance-safety-manufacturing-min/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/02/monitoring-predictive-maintenance-safety-manufacturing-min.png" data-orig-size="511,510" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="monitoring-predictive-maintenance-safety-manufacturing-min" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/02/monitoring-predictive-maintenance-safety-manufacturing-min.png" src="https://www.relataly.com/wp-content/uploads/2023/02/monitoring-predictive-maintenance-safety-manufacturing-min.png" alt="" class="wp-image-12380" srcset="https://www.relataly.com/wp-content/uploads/2023/02/monitoring-predictive-maintenance-safety-manufacturing-min.png 511w, https://www.relataly.com/wp-content/uploads/2023/02/monitoring-predictive-maintenance-safety-manufacturing-min.png 300w, https://www.relataly.com/wp-content/uploads/2023/02/monitoring-predictive-maintenance-safety-manufacturing-min.png 140w" sizes="(max-width: 511px) 100vw, 511px" /><figcaption class="wp-element-caption">Utilities and manufacturing are only two of the many industries that use predictive maintenance. Image generated with <a href="http://www.midjourney.com" target="_blank" rel="noreferrer noopener">Midjourney</a>.</figcaption></figure>



</div>
</div>
</div>
</div>



<h2 class="wp-block-heading">Approaches to Predictive Maintenance</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p class="wp-block-paragraph">There are several approaches to implementing a predictive maintenance solution, depending on the type of equipment being monitored and the resources available. These approaches include:</p>



<ul class="wp-block-list">
<li><strong>Condition-based monitoring:</strong> This involves continuously monitoring the condition of the equipment using sensors. When certain thresholds or conditions are met, an alert is triggered, or corrective measures are launched. The goal is to reduce the risk of failure. For example, if the temperature of a motor exceeds a certain level, this may indicate that the motor is about to fail.</li>



<li><strong>Predictive modeling:</strong> This approach uses machine learning algorithms to analyze historical lifetime data about the equipment and identify patterns that may indicate an impending failure. The data can come from sensors as well as from operational data and maintenance records. When historical failure data is limited or unavailable, a degradation model can instead estimate failure times based on when a monitored value is expected to cross a threshold.</li>



<li><strong>Prognostic algorithms: </strong>By using data from sensors and other sources, prognostic algorithms can predict the remaining useful life of a piece of equipment. This information can help organizations determine the likelihood of a breakdown and plan for replacements or maintenance activities. By understanding the equipment better, organizations can potentially extend maintenance cycles, which can reduce costs for replacements and maintenance.</li>
</ul>



<p class="wp-block-paragraph">It is important to choose an approach that is appropriate for the specific equipment and maintenance challenges faced by the organization. </p>
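<p class="wp-block-paragraph">The difference between condition-based monitoring and prognostics can be illustrated with a minimal Python sketch. It assumes a single motor temperature sensor read once per hour; the readings, both thresholds, and the helper functions <code>threshold_alerts</code> and <code>estimate_rul</code> are hypothetical. The remaining useful life (RUL) estimate fits a straight line to the degradation signal and extrapolates when that line reaches an assumed failure threshold. This is one of the simplest possible degradation models, not a production-grade prognostic algorithm.</p>

```python
import numpy as np

# Hypothetical hourly motor temperature readings in degrees Celsius
temperatures = np.array([71.0, 72.5, 74.0, 75.5, 77.0, 78.5, 80.0, 81.5])
hours = np.arange(len(temperatures))  # time index: 0, 1, 2, ...

ALERT_THRESHOLD = 80.0    # hypothetical alert level for condition-based monitoring
FAILURE_THRESHOLD = 90.0  # hypothetical level at which we assume the motor fails


def threshold_alerts(values, threshold):
    """Condition-based monitoring: return the indices where a reading reaches the threshold."""
    return np.flatnonzero(values >= threshold)


def estimate_rul(t, values, failure_threshold):
    """Prognostics sketch: fit a linear degradation trend and extrapolate
    the time remaining until the trend reaches the failure threshold."""
    slope, intercept = np.polyfit(t, values, deg=1)
    if slope <= 0:
        return float("inf")  # no upward trend, so no predicted failure
    t_fail = (failure_threshold - intercept) / slope
    return max(t_fail - t[-1], 0.0)


print("Alerts at hours:", threshold_alerts(temperatures, ALERT_THRESHOLD).tolist())
print("Estimated RUL (hours):", round(estimate_rul(hours, temperatures, FAILURE_THRESHOLD), 2))
```

<p class="wp-block-paragraph">With these sample readings, the threshold check flags hours 6 and 7, and the linear trend reaches the failure threshold roughly 5.7 hours after the last reading. Real degradation is rarely linear, so in practice the trend model would be replaced with one that fits the equipment.</p>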
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<h2 class="wp-block-heading">Data Requirements</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p class="wp-block-paragraph">When implementing predictive maintenance, it is important to consider that each approach comes with its own set of data requirements. Types of data include the following:</p>



<ul class="wp-block-list">
<li><strong>Current condition data</strong> includes information about the state of the equipment, such as its temperature, pressure, vibration, and other physical parameters.</li>



<li><strong>Operating data </strong>includes information about how the equipment is being used, such as its load, speed, and other operating parameters.</li>



<li><strong>Maintenance history data</strong> includes information about past maintenance activities that have been performed on the equipment.</li>



<li><strong>Failure history data</strong> includes information about past equipment failures, such as the date of the failure, the cause of the failure, and the impact on operations.</li>
</ul>



<p class="wp-block-paragraph">Collecting this data requires investment in sensors and other data collection infrastructure, as well as processes that ensure the data is recorded accurately and stored properly. By combining the different data types, organizations can build a comprehensive view of equipment condition and performance and use it to predict maintenance requirements.</p>



<p class="wp-block-paragraph">The specific types of data needed will depend on the implementation approach. Organizations must ensure they have access to the necessary data to implement the selected approach effectively. Some specific data requirements for each approach include the following:</p>



<figure class="wp-block-table"><table><thead><tr><th>Approach</th><th>Data Requirements</th></tr></thead><tbody><tr><td>Condition-based monitoring</td><td>Sensor data from the equipment being monitored. </td></tr><tr><td>Predictive modeling</td><td>A combination of sensor data, operational data, and maintenance records. </td></tr><tr><td>Prognostic algorithms</td><td>Sensor data, as well as data about past failures and maintenance events. </td></tr></tbody></table><figcaption class="wp-element-caption">Data requirements per implementation approach</figcaption></figure>
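<p class="wp-block-paragraph">As an illustration of combining data types, the following sketch uses pandas <code>merge_asof</code> to attach the most recent maintenance event to each sensor reading of the same machine. The machine IDs, timestamps, and column names are hypothetical; in practice, they would come from your sensor platform and your maintenance management system.</p>

```python
import pandas as pd

# Hypothetical sensor readings (condition data) for two machines
sensor = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2023-01-01 08:00", "2023-01-01 12:00", "2023-01-02 08:00",
        "2023-01-01 09:00", "2023-01-02 09:00",
    ]),
    "machine_id": ["A", "A", "A", "B", "B"],
    "vibration_mm_s": [2.1, 2.4, 3.9, 1.8, 1.9],
})

# Hypothetical maintenance history records
maintenance = pd.DataFrame({
    "timestamp": pd.to_datetime(["2023-01-01 10:00", "2023-01-01 07:00"]),
    "machine_id": ["A", "B"],
    "action": ["bearing replaced", "lubrication"],
})

# merge_asof requires both frames to be sorted by their time key
sensor = sensor.sort_values("timestamp")
maintenance = maintenance.sort_values("timestamp").rename(columns={"timestamp": "last_maintenance"})

# For each reading, attach the latest maintenance event on the same machine
# that happened at or before the reading (direction="backward" is the default)
combined = pd.merge_asof(
    sensor,
    maintenance,
    left_on="timestamp",
    right_on="last_maintenance",
    by="machine_id",
)

# Derived feature: time since the last maintenance (NaN if there was none yet)
combined["hours_since_maintenance"] = (
    combined["timestamp"] - combined["last_maintenance"]
).dt.total_seconds() / 3600
print(combined)
```

<p class="wp-block-paragraph">A feature such as the time since the last maintenance is one example of how maintenance records can enrich raw sensor data before it is fed into a predictive model.</p>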
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-full"><img decoding="async" width="1018" height="856" data-attachment-id="12379" data-permalink="https://www.relataly.com/predictive-maintenance-predicting-machine-failure-with-python/10618/jejimga_a_factory_using_technology_for_safety_efficiency_qualit_5daef8a5-5ab0-49d2-9821-4588049635a2-min/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/02/jejimga_a_factory_using_technology_for_safety_efficiency_qualit_5daef8a5-5ab0-49d2-9821-4588049635a2-min.png" data-orig-size="1018,856" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="jejimga_a_factory_using_technology_for_safety_efficiency_qualit_5daef8a5-5ab0-49d2-9821-4588049635a2-min" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/02/jejimga_a_factory_using_technology_for_safety_efficiency_qualit_5daef8a5-5ab0-49d2-9821-4588049635a2-min.png" src="https://www.relataly.com/wp-content/uploads/2023/02/jejimga_a_factory_using_technology_for_safety_efficiency_qualit_5daef8a5-5ab0-49d2-9821-4588049635a2-min.png" alt="Predictive maintenance - Machine learning can make maintenance cycles more cost-efficient. 
Image generated using Midjourney" class="wp-image-12379" srcset="https://www.relataly.com/wp-content/uploads/2023/02/jejimga_a_factory_using_technology_for_safety_efficiency_qualit_5daef8a5-5ab0-49d2-9821-4588049635a2-min.png 1018w, https://www.relataly.com/wp-content/uploads/2023/02/jejimga_a_factory_using_technology_for_safety_efficiency_qualit_5daef8a5-5ab0-49d2-9821-4588049635a2-min.png 300w, https://www.relataly.com/wp-content/uploads/2023/02/jejimga_a_factory_using_technology_for_safety_efficiency_qualit_5daef8a5-5ab0-49d2-9821-4588049635a2-min.png 768w" sizes="(max-width: 1018px) 100vw, 1018px" /><figcaption class="wp-element-caption">Predictive maintenance &#8211; Machine learning can make maintenance cycles more cost-efficient. Image generated using&nbsp;<a href="http://www.Midjourney.com" target="_blank" rel="noreferrer noopener">Midjourney</a></figcaption></figure>
</div>
</div>



<h2 class="wp-block-heading">Predicting Failures in Milling Machines using XGBoost in Python</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p class="wp-block-paragraph">Now that we have a basic understanding of predictive maintenance, it&#8217;s time to get hands-on with Python. We will use sensor data and machine learning to predict failures in milling machines. But why do these machines break down in the first place? Milling machines have many moving parts that wear over time, and improper maintenance can disrupt their operation and lead to costly damage. Because milling machines are subjected to varying loads, scheduling maintenance efficiently is challenging. By implementing a predictive maintenance solution with Python, we can identify and address issues proactively, prevent costly downtime, and keep our milling machines running smoothly. Our goal is to predict whether a machine fails and, if so, which of five failure types occurs, which corresponds to the predictive modeling approach. Let&#8217;s get started on building our predictive maintenance solution.</p>



<p class="wp-block-paragraph">The code is available on GitHub.</p>



<div class="wp-block-kadence-advancedbtn kb-buttons-wrap kb-btns_c6038a-a9"><a class="kb-button kt-button button kb-btn_614436-7b kt-btn-size-standard kt-btn-width-type-full kb-btn-global-inherit kt-btn-has-text-true kt-btn-has-svg-true wp-block-button__link wp-block-kadence-singlebtn" href="https://github.com/flo7up/relataly-public-python-tutorials/blob/master/02%20Classification/022%20Predicting%20Machine%20Malfunction%20of%20Milling%20Machines%20in%20Python.ipynb" target="_blank" rel="noreferrer noopener"><span class="kb-svg-icon-wrap kb-svg-icon-fe_eye kt-btn-icon-side-left"><svg viewBox="0 0 24 24"  fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"  aria-hidden="true"><path d="M1 12s4-8 11-8 11 8 11 8-4 8-11 8-11-8-11-8z"/><circle cx="12" cy="12" r="3"/></svg></span><span class="kt-btn-inner-text">View on GitHub </span></a>

<a class="kb-button kt-button button kb-btn_a119d2-89 kt-btn-size-standard kt-btn-width-type-full kb-btn-global-inherit kt-btn-has-text-true kt-btn-has-svg-true wp-block-button__link wp-block-kadence-singlebtn" href="https://github.com/flo7up/relataly-public-python-API-tutorials" target="_blank" rel="noreferrer noopener"><span class="kb-svg-icon-wrap kb-svg-icon-fa_github kt-btn-icon-side-left"><svg viewBox="0 0 496 512"  fill="currentColor" xmlns="http://www.w3.org/2000/svg"  aria-hidden="true"><path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"/></svg></span><span class="kt-btn-inner-text">Relataly GitHub Repo </span></a></div>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-full is-resized"><img decoding="async" data-attachment-id="12384" data-permalink="https://www.relataly.com/predictive-maintenance-predicting-machine-failure-with-python/10618/cnc_milling_machine_cyberpunk/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/02/cnc_milling_machine_cyberpunk.png" data-orig-size="253,253" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="cnc_milling_machine_cyberpunk" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/02/cnc_milling_machine_cyberpunk.png" src="https://www.relataly.com/wp-content/uploads/2023/02/cnc_milling_machine_cyberpunk.png" alt="Image of a CNC milling machine. Image created with Midjourney" class="wp-image-12384" width="375" height="375" srcset="https://www.relataly.com/wp-content/uploads/2023/02/cnc_milling_machine_cyberpunk.png 253w, https://www.relataly.com/wp-content/uploads/2023/02/cnc_milling_machine_cyberpunk.png 140w" sizes="(max-width: 375px) 100vw, 375px" /><figcaption class="wp-element-caption">Image of a CNC milling machine. Image created with <a href="http://www.midjourney.com" target="_blank" rel="noreferrer noopener">Midjourney</a></figcaption></figure>
</div>
</div>



<h3 class="wp-block-heading">Prerequisites</h3>



<p class="wp-block-paragraph">Before starting the coding part, make sure that you have set up your <a href="https://www.python.org/downloads/" target="_blank" rel="noreferrer noopener">Python 3</a> environment and required packages. </p>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<p class="wp-block-paragraph"><strong>Python Environment</strong></p>



<p class="wp-block-paragraph">Before diving into this tutorial, make sure that your Python environment is properly set up and that all required packages are installed. This prevents roadblocks caused by an improperly configured environment.</p>



<p class="wp-block-paragraph">If you don&#8217;t have an environment, follow&nbsp;<a href="https://www.relataly.com/anaconda-python-environment-machine-learning/1663/" target="_blank" rel="noreferrer noopener">this tutorial</a>&nbsp;to set up the&nbsp;<a href="https://www.anaconda.com/products/individual" target="_blank" rel="noreferrer noopener">Anaconda environment</a>.</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<p class="wp-block-paragraph"><strong>Python Packages</strong></p>



<p class="wp-block-paragraph">Make sure you install all required packages. In this tutorial, we will be working with the following packages:&nbsp;</p>



<ul class="wp-block-list">
<li>Pandas</li>



<li>NumPy</li>



<li>Matplotlib</li>



<li>Seaborn</li>



<li>Plotly</li>
</ul>



<p class="wp-block-paragraph">In addition, we will be using the machine learning library <strong><em>Scikit-learn</em></strong> and the XGBoost library, which is a popular library for training gradient-boosting models.</p>



<p class="wp-block-paragraph">You can install packages using console commands:</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">pip install &lt;package name&gt;
conda install &lt;package name&gt; (if you are using the Anaconda package manager)</pre></div>
</div>
</div>
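<p class="wp-block-paragraph">To confirm that everything is installed before you start, you can run a small check like the one below. It is a sketch that relies only on the standard library (<code>importlib.metadata</code>, available since Python 3.8) and looks up the installed distributions by their pip names; the list of names is our assumption based on the packages listed above.</p>

```python
from importlib.metadata import PackageNotFoundError, version

# pip distribution names of the packages used in this tutorial
REQUIRED = ["pandas", "numpy", "matplotlib", "seaborn", "plotly", "scikit-learn", "xgboost"]


def installed_versions(packages):
    """Return a dict mapping each package name to its installed version, or None if missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


versions = installed_versions(REQUIRED)
for name, ver in versions.items():
    print(f"{name:15s} {ver or 'NOT INSTALLED'}")

missing = [name for name, ver in versions.items() if ver is None]
if missing:
    print("Install the missing packages with: pip install " + " ".join(missing))
```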



<h3 class="wp-block-heading">About the Sensor Dataset</h3>



<p class="wp-block-paragraph">In this tutorial, we will work with a synthetic sensor dataset from the <a href="https://archive.ics.uci.edu/ml/datasets/AI4I+2020+Predictive+Maintenance+Dataset" target="_blank" rel="noreferrer noopener">UCI Machine Learning Repository</a> that simulates the typical life cycle of a milling machine.</p>



<p class="wp-block-paragraph">The dataset consists of 10,000 data points stored as rows. The original UCI version stores 14 features in columns; the Kaggle version used in this tutorial contains the following fields:</p>



<ul class="wp-block-list">
<li>UDI: unique identifier ranging from 1 to 10,000</li>



<li>Product ID: a letter L, M, or H for the low (50% of all products), medium (30%), and high (20%) product quality variants, followed by a variant-specific serial number. The quality variant is also stored separately in the Type column.</li>



<li>air temperature [K]</li>



<li>process temperature [K]</li>



<li>rotational speed [rpm]</li>



<li>torque [Nm]</li>



<li>tool wear [min]</li>



<li>Target (machine failure): a binary label that indicates whether the machine has failed</li>



<li>Failure Type (prediction label): either No Failure or one of five failure types: tool wear failure (TWF), heat dissipation failure (HDF), power failure (PWF), overstrain failure (OSF), and random failures (RNF)</li>
</ul>
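<p class="wp-block-paragraph">Because the Product ID combines the quality variant and a serial number, it can be split into two columns with pandas string methods. The following sketch uses the three product IDs that appear in the data preview of Step #1; the new column names <code>quality</code> and <code>serial</code> are our own choice. Since the dataset already stores the quality letter in its Type column, a split like this is mainly useful as a consistency check.</p>

```python
import pandas as pd

# Product IDs and types taken from the first rows of the dataset
df = pd.DataFrame({"Product ID": ["M14860", "L47181", "L47182"], "Type": ["M", "L", "L"]})

# Split each ID into its quality letter (L, M, H) and its numeric serial number
parts = df["Product ID"].str.extract(r"^(?P<quality>[LMH])(?P<serial>\d+)$")
df["quality"] = parts["quality"]
df["serial"] = parts["serial"].astype(int)

# The extracted quality letter should match the separate Type column
print(df)
print("Quality matches Type:", (df["quality"] == df["Type"]).all())
```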



<p class="wp-block-paragraph">Source: <a href="https://archive.ics.uci.edu/ml/datasets/AI4I+2020+Predictive+Maintenance+Dataset" target="_blank" rel="noreferrer noopener">UCI ML Repository</a></p>



<p class="wp-block-paragraph">You can download the dataset from <a href="https://www.kaggle.com/code/potongpasir/predicting-machine-malfunction/data" target="_blank" rel="noreferrer noopener">Kaggle.com</a>. Unzip the download and save the file predictive_maintenance.csv under the following path: &#8220;/data/iot/classification/&#8221;</p>



<h3 class="wp-block-heading">Step #1 Load the Data</h3>



<p class="wp-block-paragraph">We begin by importing the required libraries, including XGBoost. Then we load the dataset with pandas and define our target variable, Failure Type. The dataset contains a second target column, Target, which only indicates whether a machine failed. We drop this column because our goal is to predict the specific type of failure. Finally, we print the number of rows and the first three rows of the dataset.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># A tutorial for this file is available at www.relataly.com
# Tested with Python 3.9.13, Matplotlib 3.6.2, Scikit-learn 1.2, Seaborn 0.12.1, numpy 1.21.5, xgboost 1.7.2

import pandas as pd 
import matplotlib.pyplot as plt 
import numpy as np
import seaborn as sns
import plotly.express as px
from sklearn.metrics import classification_report, confusion_matrix, precision_recall_fscore_support as score, roc_curve
from sklearn.model_selection import cross_val_score, train_test_split, cross_validate
from sklearn.utils import compute_sample_weight
from xgboost import XGBClassifier

# use a clean plot style without top and right axis spines
sns.set_style('white', { 'axes.spines.right': False, 'axes.spines.top': False})

# load the dataset
path = '/data/iot/classification/'
df = pd.read_csv(path + &quot;predictive_maintenance.csv&quot;)

# define the target
target_name = 'Failure Type'

# drop the redundant binary target column (failure yes/no)
df.drop(columns=['Target'], inplace=True)

# print the number of rows and show the first three rows
print(df.shape[0])
df.head(3)</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">	UDI	Product ID	Type	Air temperature [K]	Process temperature [K]	Rotational speed [rpm]	Torque [Nm]	Tool wear [min]	Failure Type
0	1	M14860		M		298.1				308.6				1551						42.8		0				No Failure
1	2	L47181		L		298.2				308.7				1408						46.3		3				No Failure
2	3	L47182		L		298.1				308.5				1498						49.4		5				No Failure</pre></div>
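<p class="wp-block-paragraph">Before dropping &#8220;Target&#8221;, it can be worth checking how it relates to &#8220;Failure Type&#8221;. The following is a minimal sketch on a small hypothetical frame (the rows are made up, not taken from the dataset): pd.crosstab counts every combination of the two columns, and rows with Target == 1 but the label &#8220;No Failure&#8221; would point to inconsistent labels.</p>

```python
import pandas as pd

# Hypothetical toy rows; on the real data, run this on df before dropping 'Target'
toy = pd.DataFrame({
    'Target': [0, 0, 1, 1, 1, 0],
    'Failure Type': ['No Failure', 'No Failure', 'Power Failure',
                     'Tool Wear Failure', 'No Failure', 'No Failure'],
})

# count every (Target, Failure Type) combination
table = pd.crosstab(toy['Target'], toy['Failure Type'])

# rows flagged as failures (Target == 1) but labeled 'No Failure'
inconsistent = toy[(toy['Target'] == 1) & (toy['Failure Type'] == 'No Failure')]
print(table)
print(inconsistent)
```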



<h3 class="wp-block-heading">Step #2 Clean the Data</h3>



<p class="wp-block-paragraph">Next, we check the quality of our dataset. The following code block checks whether any values are missing. If so, it draws a bar plot with the number and percentage of missing values per column; otherwise, it prints &#8220;no missing values&#8221;.</p>



<p class="wp-block-paragraph">The code then drops all columns with more than 5% missing values and finally displays the names of the remaining columns. The same steps can be reused to identify and handle missing values in other datasets before training a machine learning model.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># check for missing values
def print_missing_values(df):
    null_df = pd.DataFrame(df.isna().sum(), columns=['null_values']).sort_values(['null_values'], ascending=False)
    fig, ax = plt.subplots(figsize=(16, 6))
    sns.barplot(data=null_df, x='null_values', y=null_df.index, color='royalblue', ax=ax)
    pct_values = [' {:g}'.format(elm) + ' ({:.1%})'.format(elm/len(df)) for elm in list(null_df['null_values'])]
    ax.set_title('Overview of missing values')
    ax.bar_label(container=ax.containers[0], labels=pct_values, size=12)

if df.isna().sum().sum() &gt; 0:
    print_missing_values(df)
else:
    print('no missing values')

# drop all columns with more than 5% missing values
for col_name in df.columns:
    if df[col_name].isna().sum()/df.shape[0] &gt; 0.05:
        df.drop(columns=[col_name], inplace=True) 

df.columns</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">no missing values
Index(['UDI', 'Product ID', 'Type', 'Air temperature [K]',
       'Process temperature [K]', 'Rotational speed [rpm]', 'Torque [Nm]',
       'Tool wear [min]', 'Failure Type'],
      dtype='object')</pre></div>
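<p class="wp-block-paragraph">The column loop above can also be written without an explicit loop. Here is a minimal sketch on a hypothetical toy frame: isna().mean() returns the share of missing values per column, and a boolean mask selects the columns above the 5% threshold used in this tutorial.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: column 'b' is 50% missing, 'c' is 10% missing
toy = pd.DataFrame({
    'a': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'b': [1, np.nan, 3, np.nan, 5, np.nan, 7, np.nan, 9, np.nan],
    'c': [1, 2, 3, 4, 5, 6, 7, 8, 9, np.nan],
    'd': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
})

# share of missing values per column (missing count / number of rows)
missing_share = toy.isna().mean()

# drop every column whose missing share exceeds the threshold
threshold = 0.05
toy_clean = toy.drop(columns=missing_share[missing_share > threshold].index)
print(missing_share)
print(toy_clean.columns.tolist())
```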



<p class="wp-block-paragraph">Next, we drop the two identifier columns (&#8220;UDI&#8221; and &#8220;Product ID&#8221;), which only identify individual records, and rename the sensor columns to make them easier to work with. The original names are long and contain square brackets for the units, which XGBoost does not accept in feature names. After renaming, we display the first rows of the updated DataFrame to verify the changes.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># drop id columns
df_base = df.drop(columns=['Product ID', 'UDI'])

# adjust column names
df_base.rename(columns={'Air temperature [K]': 'air_temperature', 
                        'Process temperature [K]': 'process_temperature', 
                        'Rotational speed [rpm]':'rotational_speed', 
                        'Torque [Nm]': 'torque', 
                        'Tool wear [min]': 'tool_wear'}, inplace=True)
df_base.head()</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">	Type	air_temperature	process_temperature	rotational_speed	torque	tool_wear	Failure Type
0	M		298.1			308.6				1551				42.8	0			No Failure
1	L		298.2			308.7				1408				46.3	3			No Failure
2	L		298.1			308.5				1498				49.4	5			No Failure
3	L		298.2			308.6				1433				39.5	7			No Failure
4	L		298.2			308.7				1408				40.0	9			No Failure</pre></div>



<p class="wp-block-paragraph">Everything looks as expected: our dataset contains six features and the target column, which distinguishes &#8220;No Failure&#8221; from five failure types.</p>
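<p class="wp-block-paragraph">The manual rename map can also be derived from the original column names. The following is a minimal sketch, assuming every unit appears as a bracketed suffix such as [K] or [rpm]; the helper to_snake_case is hypothetical and not part of pandas. On these names it produces the same mapping as the hand-written dictionary.</p>

```python
import re

# Original column names after dropping the ID columns
columns = ['Type', 'Air temperature [K]', 'Process temperature [K]',
           'Rotational speed [rpm]', 'Torque [Nm]', 'Tool wear [min]', 'Failure Type']


def to_snake_case(name):
    """Remove a bracketed unit such as ' [K]', then lowercase and join words with underscores."""
    without_unit = re.sub(r'\s*\[[^\]]*\]', '', name)
    return without_unit.strip().lower().replace(' ', '_')


# only rename columns that carry a bracketed unit; 'Type' and 'Failure Type' stay unchanged
rename_map = {col: to_snake_case(col) for col in columns if '[' in col}
print(rename_map)
```

<p class="wp-block-paragraph">The resulting dictionary can be passed to df.rename(columns=rename_map), which keeps the renaming in sync if further sensor columns with units are added.</p>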



<h3 class="wp-block-heading" id="h-step-3-explore-the-data">Step #3 Explore the Data</h3>



<p class="wp-block-paragraph">Next, let&#8217;s explore the dataset. </p>



<h4 class="wp-block-heading">Target Class Distribution</h4>



<p class="wp-block-paragraph">The following code uses plotly express to create a histogram of the &#8220;Failure Type&#8221; column in the DataFrame &#8220;df_base&#8221;. Each bar represents one class, and its length shows how many samples belong to that class, which makes any class imbalance immediately visible.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># display class distribution of the target variable
px.histogram(df_base, y=&quot;Failure Type&quot;, color=&quot;Failure Type&quot;) </pre></div>



<figure class="wp-block-image size-large is-resized"><img decoding="async" data-attachment-id="11828" data-permalink="https://www.relataly.com/predictive-maintenance-predicting-machine-failure-with-python/10618/newplot/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/01/newplot.png" data-orig-size="2042,450" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="newplot" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/01/newplot.png" src="https://www.relataly.com/wp-content/uploads/2023/01/newplot-1024x226.png" alt="Target class distribution in our predictive maintenance dataset" class="wp-image-11828" width="1115" height="246" srcset="https://www.relataly.com/wp-content/uploads/2023/01/newplot.png 1024w, https://www.relataly.com/wp-content/uploads/2023/01/newplot.png 300w, https://www.relataly.com/wp-content/uploads/2023/01/newplot.png 768w, https://www.relataly.com/wp-content/uploads/2023/01/newplot.png 1536w, https://www.relataly.com/wp-content/uploads/2023/01/newplot.png 2042w" sizes="(max-width: 1115px) 100vw, 1115px" /></figure>



<p class="wp-block-paragraph">Our dataset is highly imbalanced: the vast majority of samples are labeled &#8220;No Failure&#8221;. Such an imbalance can hurt model performance, because a model trained on it tends to be biased toward the majority class and may perform poorly on the minority classes, which in predictive maintenance are exactly the failures we want to detect. To counter this, we will later weight the training samples so that each class contributes equally to training. </p>
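<p class="wp-block-paragraph">To see why imbalance matters, consider a &#8220;classifier&#8221; that always predicts the majority class. The following is a minimal sketch using scikit-learn&#8217;s DummyClassifier on made-up labels (97% class 0, the rest split across two failure classes): accuracy looks excellent, while macro-averaged recall, which weights every class equally, reveals that not a single failure is detected.</p>

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical toy labels: 970 x class 0 ("No Failure"), 20 x class 1, 10 x class 2
y = np.array([0] * 970 + [1] * 20 + [2] * 10)
X = np.zeros((len(y), 1))  # features are irrelevant for this baseline

# share of each class in the labels
class_share = pd.Series(y).value_counts(normalize=True)

# a "model" that always predicts the most frequent class
baseline = DummyClassifier(strategy='most_frequent').fit(X, y)
y_pred = baseline.predict(X)

accuracy = accuracy_score(y, y_pred)                     # 970 / 1000 = 0.97
macro_recall = recall_score(y, y_pred, average='macro')  # (1 + 0 + 0) / 3
print(class_share, accuracy, macro_recall)
```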



<h4 class="wp-block-heading">Feature Pairplots</h4>



<p class="wp-block-paragraph">Next, let&#8217;s construct pair plots to explore feature relations with the target variable. Pair plots, also known as scatter plots, are a type of plot that shows the relationship between two variables. In the context of a predictive maintenance dataset, pair plots can be useful for exploring the relationships between different features and the target variable (e.g., the likelihood of a machine failure). By creating pair plots and visualizing the relationships between different features and the target variable, you can gain insights into which features might be most useful for building a predictive model.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># pairplots on failure type
sns.pairplot(df_base, height=2.5, hue='Failure Type')</pre></div>



<figure class="wp-block-image size-large is-resized"><img decoding="async" data-attachment-id="11829" data-permalink="https://www.relataly.com/predictive-maintenance-predicting-machine-failure-with-python/10618/image-3-2/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/01/image-3.png" data-orig-size="1476,1226" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-3" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/01/image-3.png" src="https://www.relataly.com/wp-content/uploads/2023/01/image-3-1024x851.png" alt="feature plot for our predictive maintenance dataset" class="wp-image-11829" width="874" height="726" srcset="https://www.relataly.com/wp-content/uploads/2023/01/image-3.png 1024w, https://www.relataly.com/wp-content/uploads/2023/01/image-3.png 300w, https://www.relataly.com/wp-content/uploads/2023/01/image-3.png 768w, https://www.relataly.com/wp-content/uploads/2023/01/image-3.png 1476w" sizes="(max-width: 874px) 100vw, 874px" /></figure>



<p class="wp-block-paragraph">The pair plots reveal patterns in our features that can inform the model&#8217;s predictions. For instance, power failures tend to occur at torque values near either end of the observed range, that is, at very high or very low torque. Patterns like this suggest that the model should be able to separate at least some failure types well. </p>
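<p class="wp-block-paragraph">A pattern read from a plot can also be checked numerically. The following is a minimal sketch on hypothetical toy values (not the real sensor readings): it flags rows whose torque lies in the bottom or top 10% of its distribution and computes, per failure type, the share of rows that fall into these tails. The 10% cut-offs are an arbitrary choice for illustration.</p>

```python
import pandas as pd

# Hypothetical toy data: torque values with their failure labels
toy = pd.DataFrame({
    'torque': [5.0, 30.0, 35.0, 38.0, 40.0, 42.0, 45.0, 50.0, 55.0, 75.0],
    'Failure Type': ['Power Failure'] + ['No Failure'] * 8 + ['Power Failure'],
})

# torque values at the 10th and 90th percentile
low, high = toy['torque'].quantile([0.1, 0.9])

# flag rows in the bottom or top tail of the torque distribution
toy['extreme_torque'] = (toy['torque'] <= low) | (toy['torque'] >= high)

# share of rows per failure type that fall into the tails
tail_share = toy.groupby('Failure Type')['extreme_torque'].mean()
print(low, high)
print(tail_share)
```

<p class="wp-block-paragraph">On the real data, the same groupby can be applied to df_base with the renamed &#8220;torque&#8221; column.</p>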



<h4 class="wp-block-heading">Feature Correlation</h4>



<p class="wp-block-paragraph">Next, we look at feature correlation. The following code block uses seaborn to plot a heatmap of the pairwise Pearson correlations between the numeric columns of the DataFrame &#8220;df_base&#8221;. Each cell is annotated with its correlation coefficient, which ranges from -1 (perfect negative correlation) to 1 (perfect positive correlation). Note that the sequential &#8220;Blues&#8221; colormap shades cells from light (low values) to dark (high values) and that vmax=0.8 caps the color scale, so strong negative correlations appear as the lightest cells; read the annotated values rather than relying on the colors alone. The heatmap shows at a glance which variables move together and how strongly, which helps to judge which features might be useful or redundant for a predictive model.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># correlation plot
plt.figure(figsize=(6,4))
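# numeric_only=True skips the non-numeric 'Type' and 'Failure Type' columns (required in pandas 2.0+)
sns.heatmap(df_base.corr(numeric_only=True), cbar=True, fmt='.1f', vmax=0.8, annot=True, cmap='Blues')</pre></div>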



<figure class="wp-block-image size-full is-resized"><img decoding="async" data-attachment-id="11830" data-permalink="https://www.relataly.com/predictive-maintenance-predicting-machine-failure-with-python/10618/image-4-2/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/01/image-4.png" data-orig-size="649,506" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-4" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/01/image-4.png" src="https://www.relataly.com/wp-content/uploads/2023/01/image-4.png" alt="Feature correlation for our predictive maintenance dataset" class="wp-image-11830" width="689" height="537" srcset="https://www.relataly.com/wp-content/uploads/2023/01/image-4.png 649w, https://www.relataly.com/wp-content/uploads/2023/01/image-4.png 300w" sizes="(max-width: 689px) 100vw, 689px" /></figure>



<p class="wp-block-paragraph">The heatmap shows a strong positive correlation between &#8220;air_temperature&#8221; and &#8220;process_temperature&#8221; (about 0.87). This is plausible, since the process and the air around the machine are thermally coupled. There is also a strong negative correlation between &#8220;rotational_speed&#8221; and &#8220;torque&#8221; (about -0.87). All other correlations are close to 0, indicating weak linear relationships.</p>



<p class="wp-block-paragraph">Understanding the correlations between different variables in a dataset can be helpful for building predictive models, as it can give you an idea of which features might be most important for predicting a given target. It can also help you identify any redundant features that might not add much value to your model. Since our dataset only contains six features, we will keep all of them. </p>
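<p class="wp-block-paragraph">To spot potentially redundant features without reading every cell of the heatmap, the correlation matrix can also be filtered programmatically. The following is a minimal sketch on hypothetical toy features; the 0.8 threshold is an assumption, not a fixed rule, and on this dataset the input would be df_base.corr(numeric_only=True).</p>

```python
import numpy as np
import pandas as pd

# Hypothetical toy features: 'b' tracks 'a' closely, 'c' moves against 'a', 'd' is noise
rng = np.random.default_rng(0)
a = rng.normal(size=200)
toy = pd.DataFrame({
    'a': a,
    'b': a + rng.normal(scale=0.1, size=200),
    'c': -a + rng.normal(scale=0.1, size=200),
    'd': rng.normal(size=200),
})

corr = toy.corr(numeric_only=True)

# keep only the upper triangle (k=1 excludes the diagonal) so each pair appears once
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack()

# feature pairs whose absolute Pearson correlation exceeds the threshold
threshold = 0.8
strong_pairs = pairs[pairs.abs() > threshold]
print(strong_pairs)
```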



<h4 class="wp-block-heading">Feature Boxplots</h4>



<p class="wp-block-paragraph">Box plots summarize the distribution of a numeric variable. The box spans from the first to the third quartile, with a line at the median; the whiskers typically extend to the most extreme values within 1.5 times the interquartile range (IQR) of the box, and points beyond them are shown as outliers. Drawing one box plot per category lets you compare the distributions across groups and spot differences that might be useful for building a predictive model.</p>



<p class="wp-block-paragraph">If the box plots differ markedly between categories, that is a good sign for building a predictive model. For example, if one failure type tends to have higher values for a particular feature than the others, the feature is likely related to the target and could help the model recognize that failure type.</p>
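<p class="wp-block-paragraph">To make the whisker and outlier rule concrete, here is a minimal sketch on hypothetical toy values. It computes the quartiles, the IQR and the 1.5 × IQR (Tukey) fences per group and counts the points beyond the fences. The helper box_stats is illustrative only. Plotting libraries may use slightly different quantile interpolation, so their exact numbers can differ from this calculation.</p>

```python
import pandas as pd

# Hypothetical toy readings for two groups; group A contains one extreme value (40)
toy = pd.DataFrame({
    'group': ['A'] * 8 + ['B'] * 8,
    'value': [10, 11, 12, 12, 13, 14, 15, 40,
              20, 21, 22, 23, 24, 25, 26, 27],
})


def box_stats(values):
    """Quartiles, IQR and number of points beyond the Tukey fences (1.5 * IQR outside the box)."""
    q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = values[(values < lower_fence) | (values > upper_fence)]
    return pd.Series({'q1': q1, 'median': median, 'q3': q3,
                      'iqr': iqr, 'n_outliers': len(outliers)})


# one row of statistics per group
stats = pd.DataFrame({name: box_stats(grp) for name, grp in toy.groupby('group')['value']}).T
print(stats)
```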



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># create a box plot of a feature column for each failure type
# (the function keeps its original name, although it draws box plots rather than histograms)
def create_histogram(column_name):
    return px.box(data_frame=df_base, y=column_name, color='Failure Type', points=&quot;all&quot;, width=1200)

create_histogram('air_temperature')</pre></div>



<figure class="wp-block-image size-large is-resized"><img decoding="async" data-attachment-id="11831" data-permalink="https://www.relataly.com/predictive-maintenance-predicting-machine-failure-with-python/10618/newplot-1/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/01/newplot-1.png" data-orig-size="1200,450" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="newplot-1" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/01/newplot-1.png" src="https://www.relataly.com/wp-content/uploads/2023/01/newplot-1-1024x384.png" alt="feature boxplot for different failure types in predictive maintenance dataset. feature: air temperature" class="wp-image-11831" width="1078" height="405" srcset="https://www.relataly.com/wp-content/uploads/2023/01/newplot-1.png 1024w, https://www.relataly.com/wp-content/uploads/2023/01/newplot-1.png 300w, https://www.relataly.com/wp-content/uploads/2023/01/newplot-1.png 768w, https://www.relataly.com/wp-content/uploads/2023/01/newplot-1.png 1200w" sizes="(max-width: 1078px) 100vw, 1078px" /></figure>



<p class="wp-block-paragraph">Feature boxplot for process_temperature.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}">create_histogram('process_temperature')</pre></div>



<figure class="wp-block-image size-large is-resized"><img decoding="async" data-attachment-id="11832" data-permalink="https://www.relataly.com/predictive-maintenance-predicting-machine-failure-with-python/10618/newplot-2/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/01/newplot-2.png" data-orig-size="1200,450" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="newplot-2" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/01/newplot-2.png" src="https://www.relataly.com/wp-content/uploads/2023/01/newplot-2-1024x384.png" alt="feature boxplot for different failure types in predictive maintenance dataset. feature: process temperature" class="wp-image-11832" width="1087" height="408" srcset="https://www.relataly.com/wp-content/uploads/2023/01/newplot-2.png 1024w, https://www.relataly.com/wp-content/uploads/2023/01/newplot-2.png 300w, https://www.relataly.com/wp-content/uploads/2023/01/newplot-2.png 768w, https://www.relataly.com/wp-content/uploads/2023/01/newplot-2.png 1200w" sizes="(max-width: 1087px) 100vw, 1087px" /></figure>



<p class="wp-block-paragraph">Feature boxplot for rotational speed.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}">create_histogram('rotational_speed')</pre></div>



<figure class="wp-block-image size-large is-resized"><img decoding="async" data-attachment-id="11833" data-permalink="https://www.relataly.com/predictive-maintenance-predicting-machine-failure-with-python/10618/newplot-3/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/01/newplot-3.png" data-orig-size="1200,450" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="newplot-3" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/01/newplot-3.png" src="https://www.relataly.com/wp-content/uploads/2023/01/newplot-3-1024x384.png" alt="feature boxplot for different failure types in predictive maintenance dataset. feature: rotational speed" class="wp-image-11833" width="1110" height="417" srcset="https://www.relataly.com/wp-content/uploads/2023/01/newplot-3.png 1024w, https://www.relataly.com/wp-content/uploads/2023/01/newplot-3.png 300w, https://www.relataly.com/wp-content/uploads/2023/01/newplot-3.png 768w, https://www.relataly.com/wp-content/uploads/2023/01/newplot-3.png 1200w" sizes="(max-width: 1110px) 100vw, 1110px" /></figure>



<p class="wp-block-paragraph">Feature boxplot for torque.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}">create_histogram('torque')</pre></div>



<figure class="wp-block-image size-large is-resized"><img decoding="async" data-attachment-id="11834" data-permalink="https://www.relataly.com/predictive-maintenance-predicting-machine-failure-with-python/10618/newplot-4/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/01/newplot-4.png" data-orig-size="1200,450" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="newplot-4" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/01/newplot-4.png" src="https://www.relataly.com/wp-content/uploads/2023/01/newplot-4-1024x384.png" alt="feature boxplot for different failure types in predictive maintenance dataset. feature: torque" class="wp-image-11834" width="1082" height="406" srcset="https://www.relataly.com/wp-content/uploads/2023/01/newplot-4.png 1024w, https://www.relataly.com/wp-content/uploads/2023/01/newplot-4.png 300w, https://www.relataly.com/wp-content/uploads/2023/01/newplot-4.png 768w, https://www.relataly.com/wp-content/uploads/2023/01/newplot-4.png 1200w" sizes="(max-width: 1082px) 100vw, 1082px" /></figure>



<p class="wp-block-paragraph">Feature boxplot for tool wear.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}">create_histogram('tool_wear')</pre></div>



<figure class="wp-block-image size-large is-resized"><img decoding="async" data-attachment-id="11835" data-permalink="https://www.relataly.com/predictive-maintenance-predicting-machine-failure-with-python/10618/newplot-5/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/01/newplot-5.png" data-orig-size="1200,450" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="newplot-5" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/01/newplot-5.png" src="https://www.relataly.com/wp-content/uploads/2023/01/newplot-5-1024x384.png" alt="feature boxplot for different failure types in predictive maintenance dataset. feature: tool wear" class="wp-image-11835" width="1097" height="411" srcset="https://www.relataly.com/wp-content/uploads/2023/01/newplot-5.png 1024w, https://www.relataly.com/wp-content/uploads/2023/01/newplot-5.png 300w, https://www.relataly.com/wp-content/uploads/2023/01/newplot-5.png 768w, https://www.relataly.com/wp-content/uploads/2023/01/newplot-5.png 1200w" sizes="(max-width: 1097px) 100vw, 1097px" /></figure>



<p class="wp-block-paragraph">Now that we have a good understanding of our dataset, we can prepare the data for model training. </p>



<h3 class="wp-block-heading" id="h-step-4-data-preparation">Step #4 Data Preparation</h3>



<p class="wp-block-paragraph">To prepare the data for model training, we need to encode the categorical columns and split the dataset into training and test sets. </p>



<p class="wp-block-paragraph">The following code block contains a reusable function called data_preparation, which prepares the data for building and evaluating machine learning models. It drops rows with missing values, encodes the failure types and the categorical &#8220;Type&#8221; column as integers, and splits the data into training (70%) and test (30%) sets. </p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}">def data_preparation(df_base, target_name):
    # drop rows with missing values; work on an explicit copy to avoid chained-assignment issues
    df = df_base.dropna().copy()

    # encode the failure types and the product type (L/M/H) as integers
    df['target_name_encoded'] = df[target_name].map({'No Failure': 0, 'Power Failure': 1, 'Tool Wear Failure': 2, 'Overstrain Failure': 3, 'Random Failures': 4, 'Heat Dissipation Failure': 5})
    df['Type'] = df['Type'].map({'L': 0, 'M': 1, 'H': 2})
    X = df.drop(columns=[target_name, 'target_name_encoded'])
    y = df['target_name_encoded'] # prediction label

    # split the data into training (70%) and test (30%) sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, random_state=0)

    # print the shapes: (rows, features) for X and (rows,) for y
    print('train: ', X_train.shape, y_train.shape)
    print('test: ', X_test.shape, y_test.shape)
    return X, y, X_train, X_test, y_train, y_test

# encode the data and split it into training and test sets
X, y, X_train, X_test, y_train, y_test = data_preparation(df_base, target_name)</pre></div>



<h3 class="wp-block-heading" id="h-step-5-model-training">Step #5 Model Training</h3>



<p class="wp-block-paragraph">Now that we have prepared the dataset, we can train the XGBoost classification model. XGBoost is an implementation of gradient boosting. It builds an ensemble of decision trees sequentially, and each new tree is fitted to correct the errors (more precisely, the gradients of the loss function) of the trees built so far. The final prediction combines the outputs of all trees. XGBoost also includes a number of additional techniques that help to improve the performance of the model, such as L1/L2 regularization of the trees, row and column subsampling, and built-in handling of missing values.</p>
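<p class="wp-block-paragraph">To illustrate the sequential idea behind boosting, here is a minimal NumPy-only sketch of plain gradient boosting for a regression problem with squared-error loss. It is not XGBoost&#8217;s implementation. Each round fits a one-split tree (a &#8220;stump&#8221;) to the current residuals, which are the negative gradients of the squared-error loss, and adds a scaled-down version of it to the ensemble. XGBoost builds on the same principle but also uses second-order gradient information and regularized trees. For multi-class problems like ours, it fits one tree per class in each round. The helper names and the toy sine data are illustrative assumptions.</p>

```python
import numpy as np

def fit_stump(x, residuals):
    """Fit a one-split regression tree on a 1-D feature by minimizing squared error."""
    best = None
    for threshold in np.unique(x)[:-1]:  # skip the max so both sides are non-empty
        left = x <= threshold
        left_value, right_value = residuals[left].mean(), residuals[~left].mean()
        sse = ((residuals[left] - left_value) ** 2).sum() + ((residuals[~left] - right_value) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    _, threshold, left_value, right_value = best
    return lambda z: np.where(z <= threshold, left_value, right_value)

def gradient_boost(x, y, n_rounds=50, learning_rate=0.1):
    """Plain gradient boosting for squared-error loss (an illustration, not XGBoost)."""
    base = y.mean()  # start from a constant prediction
    fitted = np.full(len(y), base)
    stumps = []
    for _ in range(n_rounds):
        residuals = y - fitted  # negative gradient of 0.5 * (y - fitted) ** 2
        stump = fit_stump(x, residuals)  # the new weak learner fits what is still unexplained
        fitted = fitted + learning_rate * stump(x)
        stumps.append(stump)

    def predict(z):
        out = np.full(len(z), base)
        for s in stumps:
            out = out + learning_rate * s(z)
        return out

    return predict, fitted

# toy regression data: a noisy sine wave
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + rng.normal(0, 0.1, 200)

for n_rounds in (1, 10, 100):
    _, fitted = gradient_boost(x, y, n_rounds=n_rounds)
    print(f"{n_rounds:>3} rounds -> training MSE {np.mean((y - fitted) ** 2):.4f}")
```

<p class="wp-block-paragraph">The training error falls as rounds are added. The learning rate trades off how quickly the ensemble fits the data against the risk of overfitting.</p>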



<p class="wp-block-paragraph">XGBoost provides several configuration options that we can use to fine-tune performance and adapt the training process to our dataset. For a complete list of hyperparameters, please see the <a href="https://xgboost.readthedocs.io/en/stable/python/index.html" target="_blank" rel="noreferrer noopener">library documentation</a>.</p>



<p class="wp-block-paragraph">Remember that our class labels are imbalanced. Therefore, we will provide the model with sample weights. The following code creates a weight array for the training and test sets using the &#8220;compute_sample_weight&#8221; function from scikit-learn in &#8220;balanced&#8221; mode. In this mode, each sample receives a weight that is inversely proportional to the frequency of its class (n_samples / (n_classes * class_count)), so every class contributes the same total weight. Rare failure types therefore count more during training, which helps to mitigate the effects of class imbalance on the model.</p>
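<p class="wp-block-paragraph">The following minimal NumPy sketch reproduces the &#8220;balanced&#8221; heuristic that scikit-learn documents for this mode (n_samples / (n_classes * class_count)). It shows what the weights look like on a toy label vector. The function balanced_sample_weight is a hypothetical helper for illustration. In the tutorial itself, we call compute_sample_weight directly.</p>

```python
import numpy as np

def balanced_sample_weight(y):
    """Weight each sample by n_samples / (n_classes * size of its class)."""
    y = np.asarray(y)
    classes, inverse, counts = np.unique(y, return_inverse=True, return_counts=True)
    class_weight = len(y) / (len(classes) * counts)  # one weight per class
    return class_weight[inverse]                      # look up the weight of each sample's class

# toy labels: 8 samples of class 0, 2 samples of class 1
y = np.array([0] * 8 + [1] * 2)
w = balanced_sample_weight(y)
print(w)                                 # class 0 -> 0.625, class 1 -> 2.5
print(w[y == 0].sum(), w[y == 1].sum())  # both classes carry a total weight of 5.0
```

<p class="wp-block-paragraph">Because compute_sample_weight is called separately on y_train and y_test, each set is balanced according to its own class counts.</p>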



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}">from sklearn.utils.class_weight import compute_sample_weight
from xgboost import XGBClassifier

# balanced sample weights: rare failure classes receive proportionally higher weights
weight_train = compute_sample_weight('balanced', y_train)
weight_test = compute_sample_weight('balanced', y_test)

# note: this configuration trains on a GPU. In XGBoost >= 2.0, tree_method='gpu_hist' is
# deprecated in favor of tree_method='hist', device='cuda'. On a CPU-only machine, use
# tree_method='hist' and remove sampling_method (gradient-based sampling requires a GPU).
xgb_clf = XGBClassifier(booster='gbtree', 
                        tree_method='gpu_hist', 
                        sampling_method='gradient_based', 
                        eval_metric='aucpr', 
                        objective='multi:softmax', 
                        num_class=6)
# fit the model to the training data, weighting each sample by its class weight
xgb_clf.fit(X_train, y_train, sample_weight=weight_train)</pre></div>



<figure class="wp-block-image size-full"><img decoding="async" width="842" height="270" data-attachment-id="11836" data-permalink="https://www.relataly.com/predictive-maintenance-predicting-machine-failure-with-python/10618/image-5-3/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/01/image-5.png" data-orig-size="842,270" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-5" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/01/image-5.png" src="https://www.relataly.com/wp-content/uploads/2023/01/image-5.png" alt="summary of our XGBoost classifier of our predictive maintenance solution" class="wp-image-11836" srcset="https://www.relataly.com/wp-content/uploads/2023/01/image-5.png 842w, https://www.relataly.com/wp-content/uploads/2023/01/image-5.png 300w, https://www.relataly.com/wp-content/uploads/2023/01/image-5.png 768w" sizes="(max-width: 842px) 100vw, 842px" /></figure>



<p class="wp-block-paragraph">We can see that the blue box summarizes the configuration of our model and indicates that the training process has been successful. Now that we have the classifier, we can use it to make predictions on new data.</p>



<h3 class="wp-block-heading" id="h-step-6-model-evaluation">Step #6 Model Evaluation</h3>



<p class="wp-block-paragraph">Finally, we will evaluate the model&#8217;s performance. This will involve three steps:</p>



<ul class="wp-block-list">
<li>Model scoring</li>



<li>Confusion matrix</li>



<li>Cross-validation</li>
</ul>



<h4 class="wp-block-heading">Model Scoring</h4>



<p class="wp-block-paragraph">First, we score the classifier on the test set using the &#8220;score&#8221; method. To account for the imbalance of class labels, we pass in the weight array for the test set as an additional parameter. As a result, the method returns a sample-weighted accuracy instead of the plain fraction of correct predictions. With balanced weights, every class contributes equally, so the score equals the average recall across the classes. Next, the code uses the classifier to make predictions on the test set using the &#8220;predict&#8221; method. It then generates a classification report using the &#8220;classification_report&#8221; function from scikit-learn. The report summarizes the model&#8217;s performance for each class in terms of precision, recall, and f1-score.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}">from sklearn.metrics import classification_report

# score the model with the test dataset (sample-weighted accuracy)
score = xgb_clf.score(X_test, y_test, sample_weight=weight_test)

# predict on the test dataset
y_pred = xgb_clf.predict(X_test)

# print a classification report
results_log = classification_report(y_test, y_pred)
print(results_log)</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">              precision    recall  f1-score   support

           0       0.99      0.98      0.99      2903
           1       0.64      0.88      0.74        24
           2       0.04      0.08      0.06        12
           3       0.77      0.89      0.83        27
           4       0.00      0.00      0.00         4
           5       0.76      0.97      0.85        30

    accuracy                           0.98      3000
   macro avg       0.53      0.63      0.58      3000
weighted avg       0.98      0.98      0.98      3000</pre></div>



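<p class="wp-block-paragraph">The classification report shows the performance of our XGBoost classifier on the test dataset. At first glance, the model appears to perform well, with an accuracy of 0.98 and a weighted average f1-score of 0.98. However, both metrics are dominated by the &#8220;No Failure&#8221; class, which accounts for 2903 of the 3000 test samples. The macro average f1-score of 0.58 weights all classes equally. It therefore gives a more realistic picture of how well the model detects the individual failure types. </p>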



<p class="wp-block-paragraph">However, there are a few classes where the model&#8217;s performance is not as strong. Class 1 (Power Failure) has a precision of 0.64, which means that about a third of its predictions are false alarms, and an f1-score of 0.74. Class 2 (Tool Wear Failure) has a very low precision of 0.04 and an f1-score of 0.06. Class 4 (Random Failures) has a precision, recall, and f1-score of 0.00, which means that the model did not make a single correct prediction for this class.</p>



<p class="wp-block-paragraph">It is also worth noting that the support, i.e., the number of test samples per class, varies widely. Class 0 has a support of 2903, class 1 has a support of 24, and class 4 has only 4 samples. Because the data was split randomly, the training set contains similarly few examples of the rare failure types, which gives the model little data to learn them from. In addition, metrics that are computed on only a handful of test samples are not very reliable.</p>
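<p class="wp-block-paragraph">Class imbalance also explains why the sample weights change what the &#8220;score&#8221; method measures. scikit-learn&#8217;s classifier score method computes the accuracy with the given sample_weight. With balanced weights, this equals the mean recall per class, which is what scikit-learn&#8217;s balanced_accuracy_score reports. The following toy example is a hypothetical NumPy-only sketch with made-up labels. It shows how far plain accuracy and this balanced view can diverge when one class dominates:</p>

```python
import numpy as np

def balanced_weights(y):
    # 'balanced' heuristic: n_samples / (n_classes * count of the sample's class)
    classes, inverse, counts = np.unique(y, return_inverse=True, return_counts=True)
    return (len(y) / (len(classes) * counts))[inverse]

def weighted_accuracy(y_true, y_pred, sample_weight):
    # share of the total weight that falls on correctly classified samples
    correct = (y_true == y_pred).astype(float)
    return float(np.sum(sample_weight * correct) / np.sum(sample_weight))

def mean_per_class_recall(y_true, y_pred):
    # recall of each class, averaged with equal weight per class (balanced accuracy)
    return float(np.mean([np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]))

# toy test set: 95 'No Failure' samples (all correct), 5 failures (only 1 detected)
y_true = np.array([0] * 95 + [1] * 5)
y_pred = y_true.copy()
y_pred[96:] = 0

print(f"plain accuracy:    {np.mean(y_true == y_pred):.2f}")
print(f"weighted accuracy: {weighted_accuracy(y_true, y_pred, balanced_weights(y_true)):.2f}")
print(f"mean class recall: {mean_per_class_recall(y_true, y_pred):.2f}")
```

<p class="wp-block-paragraph">Plain accuracy is 0.96 because the majority class is always predicted correctly. The weighted accuracy and the mean class recall both drop to 0.60 because the minority class is mostly missed.</p>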



<h4 class="wp-block-heading">Confusion Matrix</h4>



<p class="wp-block-paragraph">Next, we create a confusion matrix. We input the true labels of the test set (y_test) and the predicted labels produced by the model (y_pred) to generate the matrix. The matrix shows us the number of correct and incorrect predictions made by the model for each class. Each row corresponds to an actual class and each column to a predicted class, so the diagonal holds the correct predictions.</p>



<p class="wp-block-paragraph">We then create a DataFrame from the confusion matrix and use the seaborn library to visualize the matrix as a heatmap. The heatmap allows us to easily see which classes are being predicted correctly and which are being misclassified. </p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}">import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

# create predictions on the test dataset
y_pred = xgb_clf.predict(X_test)

# compute the multi-class confusion matrix (rows: actual class, columns: predicted class)
labels = np.unique(np.concatenate([y_test, y_pred]))
cnf_matrix = confusion_matrix(y_test, y_pred, labels=labels)
df_cm = pd.DataFrame(cnf_matrix, columns=labels, index=labels)
df_cm.index.name = 'Actual'
df_cm.columns.name = 'Predicted'

# plot the matrix as an annotated heatmap
plt.figure(figsize = (8, 5))
sns.set(font_scale=1.1) #for label size
sns.heatmap(df_cm, cbar=True, cmap= &quot;inferno&quot;, annot=True, fmt='.0f') </pre></div>



<figure class="wp-block-image size-full is-resized"><img decoding="async" data-attachment-id="11837" data-permalink="https://www.relataly.com/predictive-maintenance-predicting-machine-failure-with-python/10618/image-6-2/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/01/image-6.png" data-orig-size="668,456" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-6" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/01/image-6.png" src="https://www.relataly.com/wp-content/uploads/2023/01/image-6.png" alt="Evaluating the performance of our predictive maintenance solution using a confusion matrix" class="wp-image-11837" width="663" height="452" srcset="https://www.relataly.com/wp-content/uploads/2023/01/image-6.png 668w, https://www.relataly.com/wp-content/uploads/2023/01/image-6.png 300w" sizes="(max-width: 663px) 100vw, 663px" /></figure>



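<p class="wp-block-paragraph">The color scale of the heatmap indicates the magnitude of the values in the matrix. With the &#8220;inferno&#8221; color map, dark cells contain few predictions and bright (yellow) cells contain many. The &#8220;No Failure&#8221; cell is much larger than all other values, so the remaining cells all appear dark, and the annotated numbers are more informative than the colors. This visualization helps us to understand the performance of the model and identify areas for improvement. </p>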



<p class="wp-block-paragraph">Here are a few things that we can learn from this matrix:</p>



<ul class="wp-block-list">
<li>Overall, the model classified 2929 of the 3000 test samples correctly and misclassified 71.</li>



<li>Of the 2903 &#8220;No Failure&#8221; samples, the model classified 2854 correctly. The remaining 49 were incorrectly flagged as some type of failure (false alarms).</li>



<li>Of the 24 &#8220;Power Failure&#8221; samples, the model classified 21 correctly and misclassified 3. </li>



<li>Of the 12 &#8220;Tool Wear Failure&#8221; samples, the model classified only 1 correctly and misclassified 11. </li>



<li>Of the 27 &#8220;Overstrain Failure&#8221; samples, the model classified 24 correctly and misclassified 3. </li>



<li>Of the 4 &#8220;Random Failures&#8221; samples, the model did not classify a single one correctly. </li>



<li>Of the 30 &#8220;Heat Dissipation Failure&#8221; samples, the model classified 29 correctly and misclassified 1. </li>
</ul>



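<p class="wp-block-paragraph">Overall, the model performs well on the &#8220;No Failure&#8221; class and on power, overstrain, and heat dissipation failures. It struggles with tool wear failures and random failures, however. It misses most of these cases and raises many false alarms for tool wear failures (precision 0.04). </p>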
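<p class="wp-block-paragraph">The per-class numbers in the classification report can be derived directly from the confusion matrix. The code below is a minimal NumPy sketch with hypothetical helper names and toy labels. In the tutorial, we use scikit-learn&#8217;s confusion_matrix and classification_report instead. The sketch counts (actual, predicted) pairs, reads precision from the columns and recall from the rows, and row-normalizes the matrix. scikit-learn offers the same normalization via confusion_matrix(y_test, y_pred, normalize=&#8217;true&#8217;). This can make the heatmap easier to read when one class dominates the counts.</p>

```python
import numpy as np

def confusion_matrix_np(y_true, y_pred, n_classes):
    """Rows are actual classes, columns are predicted classes (scikit-learn's convention)."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)  # count each (actual, predicted) pair
    return cm

def precision_recall_from_cm(cm):
    correct = np.diag(cm)
    predicted = cm.sum(axis=0)  # column sums: how often each class was predicted
    actual = cm.sum(axis=1)     # row sums: support of each class
    precision = np.divide(correct, predicted, out=np.zeros(len(cm)), where=predicted > 0)
    recall = np.divide(correct, actual, out=np.zeros(len(cm)), where=actual > 0)
    return precision, recall

y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 0, 2, 1]
cm = confusion_matrix_np(y_true, y_pred, n_classes=3)
precision, recall = precision_recall_from_cm(cm)

# row-normalized matrix: each row sums to 1 and the diagonal holds the per-class recall
# (assumes every class occurs at least once in y_true, otherwise a row would be 0/0)
cm_normalized = cm / cm.sum(axis=1, keepdims=True)

print(cm)
print("precision:", precision.round(2))
print("recall:   ", recall.round(2))
print(cm_normalized.round(2))
```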



<h4 class="wp-block-heading">Cross Validation</h4>



<p class="wp-block-paragraph">Finally, we perform cross-validation on the training set using the &#8220;cross_validate&#8221; function from scikit-learn. Cross-validation is a technique for evaluating the performance of a machine learning model by training it on different subsets of the data and evaluating it on the remaining data. </p>



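<p class="wp-block-paragraph">In this case, we train and evaluate our model 10 times using different splits of the training data (specified by the &#8220;cv&#8221; parameter). Because xgb_clf is a classifier, scikit-learn uses stratified folds by default, so each fold preserves the class proportions of the training set. We also specify that the evaluation metric should be the weighted f1-score (specified by the &#8220;scoring&#8221; parameter). Finally, we pass the weight array for the training set on to the classifier&#8217;s fit method. scikit-learn selects the weights that belong to the training samples of each fold.</p>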



<p class="wp-block-paragraph">The &#8220;cross_validate&#8221; function returns a dictionary containing various evaluation metrics for each fold of the cross-validation. We will convert the dictionary to a DataFrame and create a bar plot using the plotly express library to visualize the results. This helps us to understand the consistency and stability of the model&#8217;s performance.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}">import pandas as pd
import plotly.express as px
from sklearn.model_selection import cross_validate

# 10-fold cross-validation on the training set, scored with the weighted f1-score;
# the sample weights are passed on to xgb_clf.fit in every fold
# (scikit-learn >= 1.4 uses params=...; older versions use fit_params=... instead)
scores = cross_validate(xgb_clf, X_train, y_train, cv=10, scoring=&quot;f1_weighted&quot;, params={&quot;sample_weight&quot;: weight_train})
scores_df = pd.DataFrame(scores)
px.bar(x=scores_df.index, y=scores_df.test_score, width=800)</pre></div>



<figure class="wp-block-image size-full is-resized"><img decoding="async" data-attachment-id="11838" data-permalink="https://www.relataly.com/predictive-maintenance-predicting-machine-failure-with-python/10618/newplot-6/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/01/newplot-6.png" data-orig-size="800,450" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="newplot-6" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/01/newplot-6.png" src="https://www.relataly.com/wp-content/uploads/2023/01/newplot-6.png" alt="Evaluation the performance of our predictive maintenance solution. cross validation scores for the XGBoost model. " class="wp-image-11838" width="644" height="362" srcset="https://www.relataly.com/wp-content/uploads/2023/01/newplot-6.png 800w, https://www.relataly.com/wp-content/uploads/2023/01/newplot-6.png 300w, https://www.relataly.com/wp-content/uploads/2023/01/newplot-6.png 768w" sizes="(max-width: 644px) 100vw, 644px" /></figure>



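<p class="wp-block-paragraph">The weighted f1-score remains consistent across all ten folds. This suggests that the model&#8217;s performance does not depend on a particular split of the data. Keep in mind, however, that the weighted f1-score is dominated by the &#8220;No Failure&#8221; class, so it says little about how well the rare failure types are detected. </p>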
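<p class="wp-block-paragraph">For classifiers, cross_validate with an integer cv uses stratified k-fold splitting by default. Every fold therefore keeps roughly the class proportions of the full training set, which matters here because some failure types have only a handful of samples. The sketch below is a simplified, hypothetical NumPy version of that idea and not scikit-learn&#8217;s exact StratifiedKFold algorithm. The samples of each class are shuffled and dealt round-robin across the folds.</p>

```python
import numpy as np

def stratified_kfold_indices(y, n_splits, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds preserve class proportions.
    Simplified illustration, not scikit-learn's exact StratifiedKFold algorithm."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(y), dtype=int)
    offset = 0
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)
        rng.shuffle(idx)
        # deal this class's samples round-robin over the folds, continuing where
        # the previous class stopped so that the fold sizes stay even
        fold_of[idx] = (offset + np.arange(len(idx))) % n_splits
        offset += len(idx)
    for k in range(n_splits):
        yield np.flatnonzero(fold_of != k), np.flatnonzero(fold_of == k)

# toy labels: 90 majority samples and 10 minority samples
y = np.array([0] * 90 + [1] * 10)
for train_idx, test_idx in stratified_kfold_indices(y, n_splits=10):
    print(len(train_idx), np.bincount(y[test_idx], minlength=2))  # every test fold: 9 + 1
```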



<h2 class="wp-block-heading">Summary</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p class="wp-block-paragraph">In this article, we have presented the concept of predictive maintenance and demonstrated how organizations can use this approach to improve their maintenance cycles. The second part of the article provided a hands-on tutorial showing how to implement a predictive maintenance solution for predicting different failure types of a milling machine. We trained a classification model using the XGBoost algorithm and sensor data from the machine. </p>



<p class="wp-block-paragraph">While the model performed well overall, it did not predict all classes equally well. It correctly classified only 1 of the 12 tool wear failures and none of the 4 random failures in the test set. This suggests that there are opportunities to improve the model&#8217;s performance. Besides the sample weights we already used, one potential approach is to balance the dataset by up- or down-sampling the data to achieve a more even distribution of classes. By doing so, we can further mitigate the effects of class imbalance and potentially improve the model&#8217;s predictions for the rare classes.</p>
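<p class="wp-block-paragraph">As a starting point for up-sampling, here is a minimal sketch of random oversampling with NumPy. It assumes that the features and labels fit in memory as arrays, and random_oversample is a hypothetical helper. Libraries such as imbalanced-learn provide a ready-made RandomOverSampler. Apply the resampling to the training set only, after the train/test split, so that duplicated rows cannot leak into the test set.</p>

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples of the smaller classes until every class
    has as many samples as the largest one. Apply to the training set only."""
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    selected = []
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        extra = rng.choice(idx, size=target - count, replace=True)  # sampled with replacement
        selected.append(np.concatenate([idx, extra]))
    selected = rng.permutation(np.concatenate(selected))  # shuffle the resampled rows
    return X[selected], y[selected]

X_toy = np.arange(20).reshape(10, 2)
y_toy = np.array([0] * 8 + [1] * 2)
X_res, y_res = random_oversample(X_toy, y_toy)
print(np.bincount(y_res))  # both classes now have 8 samples
```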



<p class="wp-block-paragraph">By implementing such a predictive maintenance approach, organizations can improve their operational efficiency and ensure the smooth running of their machinery.</p>



<p class="wp-block-paragraph">I hope this article was helpful. If you have any questions or feedback, let me know in the comments. </p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-full"><img decoding="async" width="497" height="493" data-attachment-id="12901" data-permalink="https://www.relataly.com/smart-factory-iot-sensors-relataly-midjourney-min/" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/03/smart-factory-iot-sensors-relataly-midjourney-min.png" data-orig-size="497,493" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="smart factory iot sensors relataly midjourney-min" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/03/smart-factory-iot-sensors-relataly-midjourney-min.png" src="https://www.relataly.com/wp-content/uploads/2023/03/smart-factory-iot-sensors-relataly-midjourney-min.png" alt="" class="wp-image-12901" srcset="https://www.relataly.com/wp-content/uploads/2023/03/smart-factory-iot-sensors-relataly-midjourney-min.png 497w, https://www.relataly.com/wp-content/uploads/2023/03/smart-factory-iot-sensors-relataly-midjourney-min.png 300w, https://www.relataly.com/wp-content/uploads/2023/03/smart-factory-iot-sensors-relataly-midjourney-min.png 140w" sizes="(max-width: 497px) 100vw, 497px" /><figcaption class="wp-element-caption">Predictive maintenance also plays an essential role in a smart factory. Image created with Midjourney. </figcaption></figure>
</div>
</div>






<h2 class="wp-block-heading">Sources and Further Reading</h2>



<p class="wp-block-paragraph">There are many books available on the topics of IoT and predictive maintenance. Here are a few recommendations:</p>



<ul class="wp-block-list">
<li><a href="https://amzn.to/3XgrX7L" target="_blank" rel="noreferrer noopener">An Introduction to Predictive Maintenance</a> by R Keith Mobley</li>



<li><a href="https://amzn.to/3CzYL3A" target="_blank" rel="noreferrer noopener">Predictive Analytics: The Secret to Predicting Future Events Using Big Data and Data Science Techniques Such as Data Mining, Predictive Modelling, Statistics, Data Analysis, and Machine</a> by Richard Hurley</li>



<li>Stephan Matzka, <a href="https://ieeexplore.ieee.org/document/9253083" target="_blank" rel="noreferrer noopener">Explainable Artificial Intelligence for Predictive Maintenance Applications</a>, Third International Conference on Artificial Intelligence for Industries (AI4I 2020)</li>



<li><a href="https://amzn.to/3TrBdDY" target="_blank" rel="noreferrer noopener">David Forsyth (2019) Applied Machine Learning Springer</a></li>



<li>ChatGPT was used to revise certain parts of this article</li>



<li>Images created using Midjourney and OpenAI Dall-E</li>
</ul>



<p class="has-contrast-2-color has-base-3-background-color has-text-color has-background wp-block-paragraph"><em>The links above to Amazon are affiliate links. By buying through these links, you support the Relataly.com blog and help to cover the hosting costs. Using the links does not affect the price.</em></p>
<p>The post <a href="https://www.relataly.com/predictive-maintenance-predicting-machine-failure-with-python/10618/">Predictive Maintenance: Predicting Machine Failure using Sensor Data with XGBoost and Python</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.relataly.com/predictive-maintenance-predicting-machine-failure-with-python/10618/feed/</wfw:commentRss>
			<slash:comments>3</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">10618</post-id>	</item>
		<item>
		<title>Univariate Stock Market Forecasting using Facebook Prophet in Python</title>
		<link>https://www.relataly.com/time-series-forecasting-using-facebook-prophet-in-python/10351/</link>
					<comments>https://www.relataly.com/time-series-forecasting-using-facebook-prophet-in-python/10351/#comments</comments>
		
		<dc:creator><![CDATA[Florian Follonier]]></dc:creator>
		<pubDate>Thu, 15 Dec 2022 22:54:34 +0000</pubDate>
				<category><![CDATA[CryptoCompare API]]></category>
		<category><![CDATA[Facebook Prophet]]></category>
		<category><![CDATA[Finance]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[REST APIs]]></category>
		<category><![CDATA[Seaborn]]></category>
		<category><![CDATA[Stock Market Forecasting]]></category>
		<category><![CDATA[Time Series Forecasting]]></category>
		<category><![CDATA[Use Cases]]></category>
		<category><![CDATA[Yahoo Finance API]]></category>
		<category><![CDATA[AI in Finance]]></category>
		<category><![CDATA[Intermediate Tutorials]]></category>
		<category><![CDATA[Supervised Learning]]></category>
		<guid isPermaLink="false">https://www.relataly.com/?p=10351</guid>

					<description><![CDATA[<p>Have you ever wondered how Facebook predicts the future? Meet Facebook Prophet, the open-source time series forecasting tool developed by Facebook&#8217;s Core Data Science team. Built on top of the PyStan library, Facebook Prophet offers a simple and intuitive interface for creating forecasts using historical data. What sets Facebook Prophet apart is its highly modular ... <a title="Univariate Stock Market Forecasting using Facebook Prophet in Python" class="read-more" href="https://www.relataly.com/time-series-forecasting-using-facebook-prophet-in-python/10351/" aria-label="Read more about Univariate Stock Market Forecasting using Facebook Prophet in Python">Read more</a></p>
<p>The post <a href="https://www.relataly.com/time-series-forecasting-using-facebook-prophet-in-python/10351/">Univariate Stock Market Forecasting using Facebook Prophet in Python</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p class="wp-block-paragraph">Have you ever wondered how Facebook predicts the future? Meet Facebook Prophet, the open-source time series forecasting tool developed by Facebook&#8217;s Core Data Science team. Built on top of the PyStan library, Facebook Prophet offers a simple and intuitive interface for creating forecasts using historical data. What sets Facebook Prophet apart is its highly modular design, allowing for a range of customizable components that can be combined to create a wide variety of forecasting models. This makes it perfect for modeling data with strong seasonal effects, like daily or weekly patterns, and it can handle missing data and outliers with ease. In this tutorial, we will take a closer look at the capabilities of Facebook Prophet and see how it can be used to make accurate predictions.</p>



<p class="wp-block-paragraph">We begin with a brief discussion of how Facebook Prophet decomposes a time series into different components. Then we turn to the hands-on part and show how you can use Prophet in Python to generate a stock market forecast. We will train our Facebook Prophet model using the historical price of the Coca-Cola stock. We will also cover different options to customize the model settings.</p>



<p class="has-accent-color has-text-color has-background wp-block-paragraph" style="background:linear-gradient(135deg,rgb(255,206,236) 68%,rgba(150,149,240,0.4) 100%)"><strong>Disclaimer</strong>: This article does not constitute financial advice. Stock markets can be very volatile and are generally difficult to predict. Predictive models and other forms of analytics applied in this article only illustrate machine learning use cases.</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-full is-resized"><img decoding="async" data-attachment-id="10371" data-permalink="https://www.relataly.com/time-series-forecasting-using-facebook-prophet-in-python/10351/image-31-7/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/12/image-31.png" data-orig-size="474,143" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-31" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/12/image-31.png" src="https://www.relataly.com/wp-content/uploads/2022/12/image-31.png" alt="Facebook Prophet - an open-source tool for univariate time series forecasting" class="wp-image-10371" width="380" height="113"/><figcaption class="wp-element-caption">Facebook Prophet &#8211; an open-source tool for time series forecasting</figcaption></figure>
</div>
</div>



<div style="height:34px" aria-hidden="true" class="wp-block-spacer"></div>



<h2 class="wp-block-heading">What is Facebook Prophet?</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p class="wp-block-paragraph">Facebook Prophet is a tool that can be used to make predictions about future events based on historical data. It was developed by <a href="https://peerj.com/preprints/3190/" target="_blank" rel="noreferrer noopener">Taylor and Letham, 2017</a>, who later made it available as an open-source project. The authors developed Facebook Prophet to solve various business forecasting problems without requiring much prior knowledge. In this way, the framework addresses a significant problem many companies face today. They have various prediction problems (e.g., capacity and demand forecasting) but face a skill gap when it comes to generating reliable forecasts with techniques such as ARIMA or neural networks. Compared to that, Facebook Prophet requires minimal fine-tuning and can deal with various challenges, including seasonality, outliers, and changing trend lines. This allows Facebook Prophet to handle a wide range of forecasting problems flexibly. Before we dive into the hands-on part, let&#8217;s gain a quick overview of how Facebook Prophet works.</p>



<p class="wp-block-paragraph">Also: <a href="https://www.relataly.com/stock-market-prediction-using-multivariate-time-series-in-python/1815/" target="_blank" rel="noreferrer noopener">Stock Market Prediction using Multivariate Time Series</a></p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-large"><img decoding="async" width="1024" height="1024" data-attachment-id="12356" data-permalink="https://www.relataly.com/an_ancient_prophet_looking_into_a_crystal_ball/" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/02/an_ancient_prophet_looking_into_a_crystal_ball.png" data-orig-size="1024,1024" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="an_ancient_prophet_looking_into_a_crystal_ball" data-image-description="&lt;p&gt;time series forecasting with facebook prophet python tutorial&lt;/p&gt;
" data-image-caption="&lt;p&gt;time series forecasting with facebook prophet python tutorial&lt;/p&gt;
" data-large-file="https://www.relataly.com/wp-content/uploads/2023/02/an_ancient_prophet_looking_into_a_crystal_ball.png" src="https://www.relataly.com/wp-content/uploads/2023/02/an_ancient_prophet_looking_into_a_crystal_ball-1024x1024.png" alt="time series forecasting with facebook prophet python tutorial" class="wp-image-12356" srcset="https://www.relataly.com/wp-content/uploads/2023/02/an_ancient_prophet_looking_into_a_crystal_ball.png 1024w, https://www.relataly.com/wp-content/uploads/2023/02/an_ancient_prophet_looking_into_a_crystal_ball.png 300w, https://www.relataly.com/wp-content/uploads/2023/02/an_ancient_prophet_looking_into_a_crystal_ball.png 140w, https://www.relataly.com/wp-content/uploads/2023/02/an_ancient_prophet_looking_into_a_crystal_ball.png 768w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption">Time-series forecasting with Facebook Prophet. Image generated with <a href="http://www.midjourney.com" target="_blank" rel="noreferrer noopener">Midjourney</a>.</figcaption></figure>
</div>
</div>



<h3 class="wp-block-heading">How Facebook Prophet Works</h3>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p class="wp-block-paragraph">Facebook Prophet uses a technique called additive regression to model time series data. This involves breaking the time series into a series of components:</p>



<ul class="wp-block-list">
<li>Trends</li>



<li>Seasonality</li>



<li>Holidays</li>
</ul>



<p class="wp-block-paragraph">Traditional time series <a href="https://www.relataly.com/category/machine-learning-algorithms/arima-models/" target="_blank" rel="noreferrer noopener">methods such as (S)ARIMA</a> base their prediction on a model that weights the linear sum of past observations or lags. Facebook&#8217;s Prophet takes a different approach: instead of regressing on lagged values, it treats forecasting as a curve-fitting problem over time. It models each component separately, using a combination of linear and non-linear functions of time, and then adds the components together to form the complete forecast model. Let&#8217;s take a closer look at these components and how Facebook Prophet handles them.</p>
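<p class="wp-block-paragraph">Formally, the original Prophet paper by Taylor and Letham writes the model as y(t) = g(t) + s(t) + h(t) + ε<sub>t</sub>, where g is the trend, s the seasonality, h the holiday effects, and ε the error term. The toy sketch below only illustrates this additive structure: the component functions and all numbers in it are made-up assumptions, not values Prophet would estimate from data.</p>

```python
import math

# Hypothetical, hand-picked components; Prophet would estimate these from data.
def trend(t):
    """g(t): a simple linear trend (level 20, slope 0.01 per day)."""
    return 20.0 + 0.01 * t

def seasonality(t):
    """s(t): a single yearly sine term with amplitude 1.5."""
    return 1.5 * math.sin(2.0 * math.pi * t / 365.25)

def holiday_effect(is_holiday):
    """h(t): a constant shift of -2.0 on holidays, 0 otherwise."""
    return -2.0 if is_holiday else 0.0

def additive_forecast(t, is_holiday=False):
    """Point forecast y_hat(t) = g(t) + s(t) + h(t); the error term is omitted."""
    return trend(t) + seasonality(t) + holiday_effect(is_holiday)

print(additive_forecast(100))
print(additive_forecast(100, is_holiday=True))
```

<p class="wp-block-paragraph">Because the components are simply added, each one can be inspected on its own, which is the idea behind Prophet&#8217;s component plots.</p>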



<h4 class="wp-block-heading">A) Dealing with Trends</h4>



<p class="wp-block-paragraph">Time series often have a trend. More often than not, however, a time series does not follow a single trend but consists of several trend segments separated by breakpoints, which Prophet calls changepoints. Facebook Prophet handles this by first identifying potential changepoints that divide the series into periods with different growth rates. It then fits a trend whose slope can change at each of these points and extrapolates the most recent trend to create the forecast. In addition, the trend does not have to be linear: Prophet also supports a saturating logistic growth model for series that approach a capacity limit. Changepoint detection happens automatically, but you can also specify the changepoints manually.</p>
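<p class="wp-block-paragraph">To make the changepoint idea concrete, here is a minimal sketch of a continuous piecewise-linear trend in the spirit of Prophet&#8217;s linear growth model: the growth rate k changes by δ<sub>j</sub> at each changepoint s<sub>j</sub>, and the offset is shifted by -s<sub>j</sub>·δ<sub>j</sub> so that the segments stay connected. The function name and the example numbers are illustrative assumptions. In Prophet itself, you would pass explicit dates via the <code>changepoints</code> argument of <code>Prophet()</code>, or steer the automatic detection with <code>n_changepoints</code>, <code>changepoint_range</code>, and <code>changepoint_prior_scale</code>.</p>

```python
def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Continuous piecewise-linear trend g(t).

    k: initial growth rate, m: initial offset,
    changepoints: times s_j at which the rate changes,
    deltas: rate adjustments delta_j that apply from s_j onwards.
    """
    rate, offset = k, m
    for s_j, delta_j in zip(changepoints, deltas):
        if t >= s_j:
            rate += delta_j
            # gamma_j = -s_j * delta_j keeps the line continuous at s_j
            offset += -s_j * delta_j
    return rate * t + offset

# Rising trend (slope 1) that flattens out after a changepoint at t = 5
for t in [0, 2.5, 5, 7.5, 10]:
    print(t, piecewise_linear_trend(t, k=1.0, m=0.0, changepoints=[5.0], deltas=[-1.0]))
```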



<h4 class="wp-block-heading">B) Seasonality</h4>



<p class="wp-block-paragraph">Facebook Prophet works very well when the data shows a strong seasonal pattern. It models daily, weekly, and yearly seasonality with Fourier series, that is, sums of sine and cosine terms at different frequencies. The model adapts to different kinds of data because you can adjust its seasonal components. By default, Prophet enables weekly and yearly seasonality for daily data with enough history (and daily seasonality for sub-daily data). If your data differs from this standard, for example, if you expect a monthly cycle, you can add a custom seasonality and adjust the number of Fourier terms (the Fourier order) accordingly.</p>
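<p class="wp-block-paragraph">The seasonal component is a truncated Fourier series, s(t) = Σ<sub>n=1..N</sub> [a<sub>n</sub> cos(2πnt/P) + b<sub>n</sub> sin(2πnt/P)], where P is the period in days (for example, 365.25 for yearly or 7 for weekly seasonality) and N is the Fourier order. The sketch below builds these features by hand with hypothetical coefficients; it is not Prophet&#8217;s internal code. In Prophet, you would add such a seasonality with something like <code>model.add_seasonality(name='monthly', period=30.5, fourier_order=5)</code> before fitting; a higher <code>fourier_order</code> allows faster-changing patterns at a higher risk of overfitting.</p>

```python
import math

def fourier_features(t, period, order):
    """Return [sin, cos] pairs of 2*pi*n*t/period for n = 1..order."""
    features = []
    for n in range(1, order + 1):
        angle = 2.0 * math.pi * n * t / period
        features.extend([math.sin(angle), math.cos(angle)])
    return features

def seasonal_component(t, period, coefficients):
    """Weighted sum of Fourier features; coefficients alternate (b_n, a_n) for sin and cos."""
    order = len(coefficients) // 2
    return sum(c * f for c, f in zip(coefficients, fourier_features(t, period, order)))

# Weekly pattern (P = 7 days) with Fourier order 3 and hypothetical coefficients
weekly_coefs = [0.8, -0.3, 0.2, 0.1, -0.05, 0.02]
print([round(seasonal_component(day, 7.0, weekly_coefs), 3) for day in range(7)])
```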



<h4 class="wp-block-heading">C) Holidays</h4>



<p class="wp-block-paragraph">Public holidays can cause strong deviations in a time series every year; for example, more people may visit the Facebook website on holidays, which drives up the demand for computing power. The Facebook Prophet model accounts for such special events by allowing us to specify binary indicators that mark whether a certain day is a holiday. If you have other recurring non-holiday events, you can use these indicators for the same purpose. Prophet is fairly robust to outliers, but it does not remove them automatically; its documentation recommends removing known outliers from the training data, since Prophet handles missing values well. If an unusual value falls on a day marked as a holiday, however, the holiday effect can absorb it, so the model adjusts to the event instead of distorting the trend. </p>
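<p class="wp-block-paragraph">Conceptually, each holiday becomes a binary regressor that is 1 on the affected days and 0 otherwise, and the model learns one effect per indicator. Prophet builds these columns for you from a holidays dataframe with the columns <code>holiday</code> and <code>ds</code> (optionally <code>lower_window</code> and <code>upper_window</code> to extend the effect to surrounding days), passed via <code>Prophet(holidays=...)</code>, or from built-in country calendars via <code>model.add_country_holidays(country_name='US')</code>. The standard-library sketch below only mimics how such an indicator with a window could be constructed; the function name and dates are illustrative.</p>

```python
from datetime import date, timedelta

def holiday_indicator(days, holiday_dates, lower_window=0, upper_window=0):
    """Return 1 for each day within [holiday + lower_window, holiday + upper_window], else 0.

    As in Prophet's holidays dataframe, lower_window is usually zero or negative
    (days before the holiday) and upper_window zero or positive (days after).
    """
    affected = set()
    for holiday in holiday_dates:
        for offset in range(lower_window, upper_window + 1):
            affected.add(holiday + timedelta(days=offset))
    return [1 if day in affected else 0 for day in days]

days = [date(2022, 12, 22) + timedelta(days=i) for i in range(7)]
christmas = [date(2022, 12, 25)]
print(holiday_indicator(days, christmas, lower_window=-1, upper_window=1))
```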
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<h3 class="wp-block-heading">Hyperparameter Tuning and Customization</h3>



<p class="wp-block-paragraph">Facebook Prophet fits its model parameters (the trend, seasonality, and holiday coefficients) with the Stan backend, by default as a maximum a posteriori (MAP) estimate. Its hyperparameters, however, such as <code>changepoint_prior_scale</code>, <code>seasonality_prior_scale</code>, or the Fourier order of each seasonality, are not tuned automatically. The defaults work reasonably well in many cases, and users with strong domain knowledge may prefer to set these parameters themselves. For systematic tuning, Prophet provides tools for model evaluation and diagnostics, such as time series cross-validation and performance metrics, which can be combined with a grid search. It also includes functions for visualizing the model and the input data.</p>
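<p class="wp-block-paragraph">Because Prophet does not search its hyperparameter space on its own, a common approach is a small grid search over values such as <code>changepoint_prior_scale</code> and <code>seasonality_prior_scale</code>, scoring each combination with cross-validation. The sketch below shows only the search loop; <code>evaluate</code> is a hypothetical callback you would supply, and the dummy scoring function in the example merely stands in for a real error metric.</p>

```python
import itertools

def grid_search(param_grid, evaluate):
    """Evaluate every parameter combination and return (best_params, best_score).

    param_grid: dict mapping parameter names to lists of candidate values.
    evaluate: hypothetical callback returning an error score (lower is better).
    """
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

param_grid = {
    "changepoint_prior_scale": [0.001, 0.01, 0.1, 0.5],
    "seasonality_prior_scale": [0.01, 0.1, 1.0, 10.0],
}

# Dummy stand-in for a real error metric such as cross-validated RMSE
def dummy_error(params):
    return abs(params["changepoint_prior_scale"] - 0.1) + abs(params["seasonality_prior_scale"] - 1.0)

print(grid_search(param_grid, dummy_error))
```

<p class="wp-block-paragraph">In a real setup, <code>evaluate</code> could fit <code>Prophet(**params)</code> on the training data, run <code>prophet.diagnostics.cross_validation</code> with a chosen <code>horizon</code>, and return the mean <code>rmse</code> reported by <code>performance_metrics</code>. Keep in mind that this fits one model per fold and parameter combination, so large grids quickly become expensive.</p>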



<p class="wp-block-paragraph">Also: <a href="https://www.relataly.com/using-random-search-to-tune-the-hyperparameters-of-a-random-decision-forest-with-python/6875/" target="_blank" rel="noreferrer noopener">Using Random Search to Tune the Hyperparameters of a Random Decision Forest with Python</a> </p>



<h3 class="wp-block-heading">Application Domains</h3>



<p class="wp-block-paragraph">Facebook Prophet was specifically designed to make forecasting easy. As mentioned, it is easy to use, can flexibly handle various forecasting problems, and requires relatively little preprocessing. As a result, it has been adopted in many application domains, including:</p>



<ul class="wp-block-list">
<li>Sales forecasting: Facebook Prophet can be used to predict future sales of a product or service, based on historical sales data. This can be useful for businesses to plan their inventory and staffing, and to make informed decisions about future investments and growth.</li>



<li>Financial forecasting: Facebook Prophet can be used to predict future stock prices, currency exchange rates, or other financial metrics. This can be useful for investors and financial analysts to make informed decisions about the market.</li>



<li>Traffic forecasting: Facebook Prophet can be used to predict future traffic on a website or mobile app based on historical data. This can be useful for businesses to plan for capacity and optimize their servers and infrastructure.</li>



<li>Energy consumption forecasting: Facebook Prophet can be used to predict future energy consumption based on historical data. This can be useful for utilities and energy companies to plan for demand and optimize their generation and distribution.</li>
</ul>



<h2 class="wp-block-heading">When to Use Facebook Prophet?</h2>



<p class="wp-block-paragraph">Although Facebook Prophet is applicable in any domain where time series data is available, it is most effective when certain conditions are met. These include univariate time series data with prominent seasonal effects and an extensive historical record spanning multiple seasons. Facebook Prophet is especially beneficial when dealing with large quantities of historical data that require efficient analysis and quick, accurate predictions of future trends.</p>



<p class="wp-block-paragraph">Also: <a href="https://www.relataly.com/multi-step-time-series-forecasting-a-step-by-step-guide/275/" target="_blank" rel="noreferrer noopener">Rolling Time Series Forecasting: Creating a Multi-Step Prediction</a></p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<div style="height:29px" aria-hidden="true" class="wp-block-spacer"></div>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<h2 class="wp-block-heading" id="h-using-facebook-prophet-to-forecast-the-coca-cola-stock-price-in-python">Using Facebook Prophet to Forecast the Coca-Cola Stock Price in Python</h2>



<p class="wp-block-paragraph">In this hands-on tutorial, we&#8217;ll use Facebook Prophet and Python to create a forecast for the Coca-Cola stock price. We have chosen Coca-Cola as an example because its price chart shows recurring cyclical movements, distinct periods, and varying trend lines. We train our model on historical price data and then predict the price one year ahead. In addition, we will discuss how we could fine-tune our model to further improve the accuracy of the predictions. This involves the following steps:</p>



<ol class="wp-block-list">
<li>Collect historical stock data for Coca-Cola and familiarize ourselves with the data.</li>



<li>Use Facebook Prophet to fit a model to the data.</li>



<li>Use the model to make predictions about the future stock price of Coca-Cola.</li>



<li>Visualize model components and predictions.</li>



<li>Manually adjust the model to improve the model fit.</li>
</ol>



<p class="wp-block-paragraph">By following these steps, we will try to gain insights into the future performance of Coca-Cola stock. Let&#8217;s get started!</p>



<p class="wp-block-paragraph">As always, you can find the code of this tutorial on the GitHub repository.</p>



<div class="wp-block-kadence-advancedbtn kb-buttons-wrap kb-btns_883036-a5"><a class="kb-button kt-button button kb-btn_c85f7c-32 kt-btn-size-standard kt-btn-width-type-full kb-btn-global-inherit kt-btn-has-text-true kt-btn-has-svg-true wp-block-button__link wp-block-kadence-singlebtn" href="https://github.com/flo7up/relataly-public-python-tutorials/blob/master/01%20Time%20Series%20Forecasting%20%26%20Regression/011%20Time%20Series%20Forecasting%20using%20Facebooks&#039;%20Prophet.ipynb" target="_blank" rel="noreferrer noopener"><span class="kb-svg-icon-wrap kb-svg-icon-fe_eye kt-btn-icon-side-left"><svg viewBox="0 0 24 24"  fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"  aria-hidden="true"><path d="M1 12s4-8 11-8 11 8 11 8-4 8-11 8-11-8-11-8z"/><circle cx="12" cy="12" r="3"/></svg></span><span class="kt-btn-inner-text">View on GitHub </span></a>

<a class="kb-button kt-button button kb-btn_db3037-b2 kt-btn-size-standard kt-btn-width-type-full kb-btn-global-inherit kt-btn-has-text-true kt-btn-has-svg-true wp-block-button__link wp-block-kadence-singlebtn" href="https://github.com/flo7up/relataly-public-python-tutorials" target="_blank" rel="noreferrer noopener"><span class="kb-svg-icon-wrap kb-svg-icon-fa_github kt-btn-icon-side-left"><svg viewBox="0 0 496 512"  fill="currentColor" xmlns="http://www.w3.org/2000/svg"  aria-hidden="true"><path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"/></svg></span><span class="kt-btn-inner-text">Relataly GitHub Repo </span></a></div>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<h3 class="wp-block-heading" id="h-prerequisites">Prerequisites</h3>



<p class="wp-block-paragraph">Before you proceed, ensure that you have set up your&nbsp;Python&nbsp;environment (3.8 or higher) and the required packages. If you don’t have an environment, consider following&nbsp;<a href="https://www.relataly.com/anaconda-python-environment-machine-learning/1663/" target="_blank" rel="noreferrer noopener">this tutorial</a>&nbsp;to set up the&nbsp;<a href="https://www.anaconda.com/products/individual" target="_blank" rel="noreferrer noopener">Anaconda environment</a>. </p>



<p class="wp-block-paragraph">Also, make sure you install all required Python packages. We will be working with the following standard Python packages:&nbsp;</p>



<ul class="wp-block-list">
<li>pandas</li>



<li>seaborn</li>



<li>matplotlib</li>



<li>numpy</li>



<li>yfinance (to download the price data from Yahoo Finance)</li>
</ul>



<p class="wp-block-paragraph">In addition, we will use the Facebook Prophet library that goes by the library name &#8220;prophet.&#8221; You can install these packages using the following commands:</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">pip install &lt;package name&gt;
conda install -c conda-forge &lt;package name&gt; (if you are using the Anaconda package manager)</pre></div>



<h3 class="wp-block-heading">Step #1 Loading Packages and Data</h3>



<p class="wp-block-paragraph">Let&#8217;s begin by loading the required Python packages and historical price quotes for the Coca-Cola stock. We will obtain the data from the yahoo finance API. Note that the API will return several columns of data, including, opening, average, and closing prices. We will only use the closing price. </p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Tested with Python 3.8.8, Matplotlib 3.5, Seaborn 0.11.1, numpy 1.19.5, plotly 4.1.1, cufflinks 0.17.3, prophet 1.1.1, CmdStan 2.31.0
import pandas as pd 
import matplotlib.pyplot as plt 
import numpy as np 
from math import log, exp 
from datetime import date, timedelta, datetime
import seaborn as sns
sns.set_style('white', {'axes.spines.right': False, 'axes.spines.top': False})
from scipy.stats import norm
from prophet import Prophet
from prophet.plot import add_changepoints_to_plot
import yfinance as yf  # pip install yfinance
# prophet 1.1 and later ships with a precompiled Stan backend, so installing CmdStan is usually unnecessary.
# If the backend is missing, uncomment the following lines
# (on Windows, pass compiler=True to also install the required C++ toolchain):
# import cmdstanpy
# cmdstanpy.install_cmdstan()
# Setting the timeframe for the data extraction
end_date = date.today().strftime(&quot;%Y-%m-%d&quot;)
start_date = '2010-01-01'
# Getting quotes
stockname = 'Coca Cola'
symbol = 'KO'
# Load daily quotes from Yahoo Finance; auto_adjust=False keeps both the Close and Adj Close columns
df = yf.download(symbol, start=start_date, end=end_date, auto_adjust=False)
# Newer yfinance versions return MultiIndex columns such as ('Close', 'KO'); flatten them to single names
if isinstance(df.columns, pd.MultiIndex):
    df.columns = df.columns.get_level_values(0)
# Quick overview of dataset
print(df.head())</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">[*********************100%***********************]  1 of 1 completed
                 Open       High        Low      Close  Adj Close    Volume
Date                                                                       
2010-01-04  28.580000  28.610001  28.450001  28.520000  19.081614  13870400
2010-01-05  28.424999  28.495001  28.070000  28.174999  18.850786  23172400
2010-01-06  28.174999  28.219999  27.990000  28.165001  18.844103  19264600
2010-01-07  28.165001  28.184999  27.875000  28.094999  18.797268  13234600
2010-01-08  27.730000  27.820000  27.375000  27.575001  18.449350  28712400</pre></div>



<p class="wp-block-paragraph">Once we have downloaded the data, we create a line plot of the closing price to familiarize ourselves with the time series. Note that Prophet models a single target series; it can take additional regressors, but we will not use any here. Our target will be the closing price. For illustration purposes, we add a moving average to the chart. While the moving average makes it easier to spot trends and seasonal patterns, it will not be used to fit the model. </p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Visualize the original time series
rolling_window = 25
# Moving average of the closing price (for illustration only)
y_a_add_ma = df['Close'].rolling(window=rolling_window).mean()
fig, ax = plt.subplots(figsize=(20,5))
sns.lineplot(data=df, x=df.index, y='Close', ax=ax, color='skyblue', linewidth=0.5, label='Close')
sns.lineplot(data=df, x=df.index, y=y_a_add_ma, ax=ax,
    linewidth=1.0, color='royalblue', linestyle='--', label=f'{rolling_window}-Day MA')
plt.show()</pre></div>



<figure class="wp-block-image size-large"><img decoding="async" width="1024" height="284" data-attachment-id="10876" data-permalink="https://www.relataly.com/time-series-forecasting-using-facebook-prophet-in-python/10351/image-10-15/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/12/image-10.png" data-orig-size="1614,448" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-10" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/12/image-10.png" src="https://www.relataly.com/wp-content/uploads/2022/12/image-10-1024x284.png" alt="lineplot with historical price quotes of the Coca-cola stock since 2010" class="wp-image-10876" srcset="https://www.relataly.com/wp-content/uploads/2022/12/image-10.png 1024w, https://www.relataly.com/wp-content/uploads/2022/12/image-10.png 300w, https://www.relataly.com/wp-content/uploads/2022/12/image-10.png 768w, https://www.relataly.com/wp-content/uploads/2022/12/image-10.png 1536w, https://www.relataly.com/wp-content/uploads/2022/12/image-10.png 1614w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<p class="wp-block-paragraph">The chart shows a long-term upward trend interrupted by phases of downturns. In addition, between 2010 and 2018, we can see some cyclical movements. At some points, we can spot clear breakpoints, for example, in 2019 and mid-2020. </p>



<h3 class="wp-block-heading"><strong>Step #2 Preparing the Data</strong></h3>



<p class="wp-block-paragraph">Next, we prepare our data for model training. Prophet has a strict convention for its input: it expects a dataframe with a timestamp column and a value column rather than a timestamp index. In addition, the column names must follow this naming convention:</p>



<ul class="wp-block-list">
<li><strong>ds </strong>for the timestamp</li>



<li><strong>y </strong>for the metric column (the value to forecast), which in our case is the closing price</li>
</ul>



<p class="wp-block-paragraph">So before we proceed, we must rename the columns of our dataframe. In addition, we will move the dates from the index into the ds column, reset the index, and drop NA values. </p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Keep only the closing price and rename it to y
df_x = df[['Close']].copy()
df_x.rename(columns={'Close': 'y'}, inplace=True)
# Move the dates from the index into the ds column
df_x['ds'] = df.index
# Prophet does not accept timezone-aware timestamps, so strip any timezone information
if df_x['ds'].dt.tz is not None:
    df_x['ds'] = df_x['ds'].dt.tz_localize(None)
df_x.reset_index(inplace=True, drop=True)
df_x.dropna(inplace=True)
df_x.tail(3)</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">		y			ds
3257	63.139999	2022-12-09
3258	63.970001	2022-12-12
3259	63.990002	2022-12-13</pre></div>



<p class="wp-block-paragraph">Now we have a simple dataframe with ds and y as the only variables.</p>



<h3 class="wp-block-heading" id="h-step-3-model-fitting-and-forecasting"><strong>Step #3 Model Fitting and Forecasting</strong></h3>



<p class="wp-block-paragraph">Next, let&#8217;s fit our forecasting model to the time series data. Afterward, we can make predictions about future values in the series. However, before we do this, we need to define our prediction interval. </p>



<h4 class="wp-block-heading">3.1 Setting the Prediction Interval</h4>



<p class="wp-block-paragraph">The prediction interval is a measure of uncertainty in a forecast made with Facebook Prophet. It indicates the range within which the true value of the forecasted quantity is expected to fall a certain percentage of the time. For example, a 95% prediction interval means that the true value of the forecasted quantity is expected to fall within the given range 95% of the time. </p>



<p class="wp-block-paragraph">In Facebook Prophet, the prediction interval is controlled by the interval_width parameter, which is set when instantiating the model with Prophet(). The default value for interval_width is 0.80, which means that the true value of the forecasted quantity is expected to fall within the prediction interval 80% of the time. We can adjust interval_width to make the prediction interval wider or narrower as desired. In the example below, we use a prediction interval of 0.85.</p>
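<p class="wp-block-paragraph">Under the hood, Prophet estimates these bounds by simulation: it draws a number of possible future paths (controlled by <code>uncertainty_samples</code>, 1000 by default) and reports the percentiles at (1 - interval_width)/2 and (1 + interval_width)/2 as <code>yhat_lower</code> and <code>yhat_upper</code>. The sketch below reproduces only that last percentile step on hypothetical, normally distributed samples, using a simple nearest-rank rule rather than Prophet&#8217;s exact percentile interpolation.</p>

```python
import random

def interval_from_samples(samples, interval_width):
    """Nearest-rank lower/upper percentiles that enclose `interval_width` of the samples."""
    ordered = sorted(samples)
    last = len(ordered) - 1
    lower_q = (1.0 - interval_width) / 2.0
    upper_q = (1.0 + interval_width) / 2.0
    return ordered[round(lower_q * last)], ordered[round(upper_q * last)]

rng = random.Random(42)
# Hypothetical simulated prices for a single future date
simulated = [rng.gauss(60.0, 2.0) for _ in range(1000)]
for width in (0.80, 0.85, 0.95):
    lower, upper = interval_from_samples(simulated, width)
    print(f"{width:.0%} interval: [{lower:.2f}, {upper:.2f}]")
```

<p class="wp-block-paragraph">A wider interval_width moves both percentiles further into the tails, which is why the band grows as the width approaches 1.</p>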



<h4 class="wp-block-heading">3.2 Fit the Model</h4>



<p class="wp-block-paragraph">Next, let&#8217;s fit our model and generate a one-year forecast. First, we need to instantiate our model with by calling Prophet(). Then we use model.fit(df) to fit this model to the historical price quotes of the Coca-Cola stock. Once, we have done that, we use the model instance model.make_future_dataframe() to create an extended dataframe (future_df). This dataframe has been extended with records for a one-year period. The records are empty dummy values ready to be filled with the real forecast. We then pass this dummy dataframe to the model.predict(df) function, Facebook Prophet creates the forecast and fills up the dummy dataframe with the forecast values.  </p>



<p class="wp-block-paragraph">For the sake of reusability, I have encapsulated the entire process into a wrapper function. This will allow us to run quick experiments with different parameter values.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># This function fits the prophet model to the input data and generates a forecast
def fit_and_forecast(df, periods, interval_width, changepoint_range=0.8):
    # Instantiate the model with the desired uncertainty interval and changepoint range
    model = Prophet(interval_width=interval_width, changepoint_range=changepoint_range)
    # Fit the model
    model.fit(df)
    # Extend the history by the given number of future dates (daily frequency by default)
    future_df = model.make_future_dataframe(periods=periods)
    # Generate a forecast for the given dates
    forecast_df = model.predict(future_df)
    return forecast_df, model, future_df
# Forecast 365 days ahead using the full history and an 85% prediction interval
forecast_df, model, future_df = fit_and_forecast(df_x, 365, 0.85)
print(forecast_df.columns)
forecast_df[['yhat_lower', 'yhat_upper', 'yhat']].head(5)</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">Index(['ds', 'trend', 'yhat_lower', 'yhat_upper', 'trend_lower', 'trend_upper',
       'additive_terms', 'additive_terms_lower', 'additive_terms_upper',
       'weekly', 'weekly_lower', 'weekly_upper', 'yearly', 'yearly_lower',
       'yearly_upper', 'multiplicative_terms', 'multiplicative_terms_lower',
       'multiplicative_terms_upper', 'yhat'],
      dtype='object')
	yhat_lower	yhat_upper	yhat
0	24.468273	28.944286	26.691615
1	24.496074	29.146425	26.706924
2	24.513424	28.829159	26.682213
3	24.358048	28.767209	26.667476
4	24.487963	28.839966	26.666242</pre></div>



<p class="wp-block-paragraph">Voilà, we have generated a one-year forecast. </p>
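<p class="wp-block-paragraph">One detail worth knowing: <code>make_future_dataframe</code> uses a daily frequency (<code>freq='D'</code>) by default, so the 365 added rows include weekends even though the stock only trades on business days. Since the frequency is passed on to pandas&#8217; date generation, <code>model.make_future_dataframe(periods=261, freq='B')</code> is one way to cover roughly one calendar year of weekdays instead (exchange holidays are still included). The standard-library sketch below, with a hypothetical <code>future_dates</code> helper, illustrates the difference between the two date grids.</p>

```python
from datetime import date, timedelta

def future_dates(last_date, periods, business_days_only=False):
    """Return the next `periods` dates after last_date, optionally skipping weekends."""
    dates = []
    current = last_date
    while len(dates) < periods:
        current += timedelta(days=1)
        if business_days_only and current.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
            continue
        dates.append(current)
    return dates

last_observed = date(2022, 12, 13)  # last date in the training data shown above
calendar_days = future_dates(last_observed, 365)
weekdays = future_dates(last_observed, 261, business_days_only=True)
print(calendar_days[-1], weekdays[-1])
```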



<h3 class="wp-block-heading">Step #4 Analyzing the Forecast</h3>



<p class="wp-block-paragraph">Next, let&#8217;s visualize our forecast and discuss what we see. The most simple way is to create the plot with a standard Facebook Prophet function.</p>



<p class="wp-block-paragraph">Also: <a href="https://www.relataly.com/regression-error-metrics-python/923/" target="_blank" rel="noreferrer noopener">Measuring Regression Errors with Python</a> </p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}">model.plot(forecast_df, uncertainty=True)</pre></div>



<figure class="wp-block-image size-full is-resized"><img decoding="async" data-attachment-id="10880" data-permalink="https://www.relataly.com/time-series-forecasting-using-facebook-prophet-in-python/10351/image-35-4/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/12/image-35.png" data-orig-size="989,590" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-35" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/12/image-35.png" src="https://www.relataly.com/wp-content/uploads/2022/12/image-35.png" alt="Prophet forecast for the coca-cola stock" class="wp-image-10880" width="867" height="517" srcset="https://www.relataly.com/wp-content/uploads/2022/12/image-35.png 989w, https://www.relataly.com/wp-content/uploads/2022/12/image-35.png 300w, https://www.relataly.com/wp-content/uploads/2022/12/image-35.png 768w" sizes="(max-width: 867px) 100vw, 867px" /></figure>



<p class="wp-block-paragraph">So what do we see? The forecast shows that our model does not simply predict a straight line. Instead, it has generated a more nuanced forecast that displays an upward cyclical trend with higher highs and higher lows. </p>



<ul class="wp-block-list">
<li>The black dots are the data points from the historical data to which we have fit our model. </li>



<li>The dark blue line is the predicted value (yhat), that is, the model&#8217;s point forecast. </li>



<li>The light blue area marks the upper and lower boundaries of the prediction interval. We have set the prediction interval to 0.85, which means the model expects about 85% of the actual values to fall within this range. </li>



<li>Overall, the model expects the price of Coca-Cola stock to rise over the next year (no financial advice). However, as we will see later, the forecast depends heavily on where the model places the changepoints.</li>
</ul>
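<p class="wp-block-paragraph">Whether the interval really covers about the stated share of observations can be checked empirically by counting how many actual values fall between <code>yhat_lower</code> and <code>yhat_upper</code>. The helper below is a plain-Python sketch of that count with hypothetical example values; with the dataframes from this tutorial you could, for instance, merge <code>forecast_df</code> with <code>df_x</code> on <code>ds</code> and pass the <code>y</code>, <code>yhat_lower</code>, and <code>yhat_upper</code> columns as lists. In-sample coverage tends to be optimistic, so the coverage column reported by Prophet&#8217;s <code>performance_metrics</code> on cross-validation results is the more informative check.</p>

```python
def interval_coverage(actuals, lowers, uppers):
    """Return the share of actual values that lie within [lower, upper]."""
    if not actuals:
        raise ValueError("at least one observation is required")
    inside = sum(1 for y, low, high in zip(actuals, lowers, uppers) if low <= y <= high)
    return inside / len(actuals)

# Hypothetical example values
actual = [62.1, 63.4, 61.8, 64.9]
lower = [60.0, 61.0, 62.0, 62.5]
upper = [64.0, 65.0, 66.0, 64.0]
print(interval_coverage(actual, lower, upper))
```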



<p class="wp-block-paragraph">If you want to create a custom plot, you can use the function below. </p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Visualize the Forecast
def visualize_the_forecast(df_f, df_o):
    rolling_window = 20
    # Smooth the predicted path with a rolling mean for readability
    yhat_mean = df_f['yhat'].rolling(window=rolling_window).mean()
    # Plot the smoothed forecast together with the ground truth
    fig, ax = plt.subplots(figsize=[20,7])
    sns.lineplot(data=df_f, x=df_f.ds, y=yhat_mean, ax=ax, label='predicted path', color='blue')
    sns.lineplot(data=df_o, x=df_o.ds, y='y', ax=ax, label='ground_truth', color='orange')
    # Shade the prediction interval between yhat_lower and yhat_upper
    ax.fill_between(df_f.ds, df_f.yhat_lower, df_f.yhat_upper, color='lightgreen')
    ax.legend(framealpha=0)
    ax.set(ylabel=stockname + &quot; stock price&quot;)
    ax.set(xlabel=None)
visualize_the_forecast(forecast_df, df_x)</pre></div>



<figure class="wp-block-image size-large"><img decoding="async" width="1024" height="369" data-attachment-id="10879" data-permalink="https://www.relataly.com/time-series-forecasting-using-facebook-prophet-in-python/10351/image-11-12/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/12/image-11.png" data-orig-size="1614,582" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-11" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/12/image-11.png" src="https://www.relataly.com/wp-content/uploads/2022/12/image-11-1024x369.png" alt="time series forecast generated with Facebook prophet for the coca cola stock: ground truth and predicted path" class="wp-image-10879" srcset="https://www.relataly.com/wp-content/uploads/2022/12/image-11.png 1024w, https://www.relataly.com/wp-content/uploads/2022/12/image-11.png 300w, https://www.relataly.com/wp-content/uploads/2022/12/image-11.png 768w, https://www.relataly.com/wp-content/uploads/2022/12/image-11.png 1536w, https://www.relataly.com/wp-content/uploads/2022/12/image-11.png 1614w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>






<h3 class="wp-block-heading"><strong>Step #5 Analyzing Model Components</strong></h3>



<p class="wp-block-paragraph">We can gain a better understanding of the individual model components with the model&#8217;s plot_components method. It plots the trend of the forecast, its weekly and yearly seasonality, and any additional user-defined seasonalities. This helps us understand the underlying patterns in the data and diagnose potential issues with the model.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}">model.plot_components(forecast_df)</pre></div>



<figure class="wp-block-image size-full"><img decoding="async" width="897" height="890" data-attachment-id="11036" data-permalink="https://www.relataly.com/time-series-forecasting-using-facebook-prophet-in-python/10351/image-41-6/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/12/image-41.png" data-orig-size="897,890" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-41" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/12/image-41.png" src="https://www.relataly.com/wp-content/uploads/2022/12/image-41.png" alt="Illustration of the three components of our prophet model" class="wp-image-11036" srcset="https://www.relataly.com/wp-content/uploads/2022/12/image-41.png 897w, https://www.relataly.com/wp-content/uploads/2022/12/image-41.png 300w, https://www.relataly.com/wp-content/uploads/2022/12/image-41.png 140w, https://www.relataly.com/wp-content/uploads/2022/12/image-41.png 768w" sizes="(max-width: 897px) 100vw, 897px" /></figure>



<p class="wp-block-paragraph">The first chart shows the trend that the model has fitted for different periods. The trend segments are separated by changepoints, which we will discuss in the next section. The second chart shows the weekly seasonality: there are no price changes over the weekend, which is plausible because stock markets are closed on Saturdays and Sundays. The third chart is the most interesting, as it shows that the model has recognized a yearly seasonality with two peaks in April and August and lows in March and October.</p>
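<p class="wp-block-paragraph">To check the seasonal peaks and lows numerically rather than reading them off the chart, we can aggregate the yearly component that Prophet stores in the forecast DataFrame. This is a minimal sketch, assuming that forecast_df contains Prophet&#8217;s ds and yearly columns (the yearly column is only present when yearly seasonality is enabled). The function name yearly_profile_by_month is hypothetical.</p>

```python
import pandas as pd


def yearly_profile_by_month(forecast_df: pd.DataFrame) -> pd.Series:
    """Average the 'yearly' seasonal component for each calendar month (1 = January).

    Hypothetical helper: expects Prophet-style 'ds' and 'yearly' columns.
    """
    # Map every row to its calendar month and average the yearly effect per month
    months = pd.to_datetime(forecast_df['ds']).dt.month.rename('month')
    return forecast_df.groupby(months)['yearly'].mean()
```

<p class="wp-block-paragraph">With profile = yearly_profile_by_month(forecast_df), profile.idxmax() and profile.idxmin() return the months with the highest and lowest average seasonal effect. Averaging over whole months can smooth out short peaks that are visible in the chart.</p>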



<h3 class="wp-block-heading">Step #6 Adjusting the Changepoints of our Facebook Prophet Model</h3>



<p class="wp-block-paragraph">Let&#8217;s take a closer look at the changepoints in our model. Changepoints are the points in time where the trend of the time series is expected to change, and Facebook Prophet&#8217;s algorithm automatically detects these points and adapts the model accordingly. Changepoints are important to Facebook Prophet because they allow the model to capture gradual changes or shifts in the data. By identifying and incorporating changepoints into the forecasting model, Facebook Prophet can make more accurate predictions. Changepoints can also help to identify potential outliers in the data.</p>



<h4 class="wp-block-heading">6.1 Checking Current Changepoints</h4>



<p class="wp-block-paragraph">We can illustrate the changepoints in our model with the add_changepoints_to_plot function from Prophet&#8217;s plot module. It adds vertical lines to an existing forecast plot at the locations of the significant changepoints and overlays the fitted trend. Seeing the changepoints on the chart lets us identify when the trend changes and diagnose potential issues with our model.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Plotting the changepoints of our model
# (older releases: from fbprophet.plot import add_changepoints_to_plot)
from prophet.plot import add_changepoints_to_plot

forecast_df, model, future_df = fit_and_forecast(df_x, 365, 1.0)
# Draw the forecast, then add the changepoints and the trend to its axes
fig = model.plot(forecast_df)
axislist = add_changepoints_to_plot(fig.gca(), model, forecast_df)</pre></div>



<figure class="wp-block-image size-full"><img decoding="async" width="989" height="589" data-attachment-id="11038" data-permalink="https://www.relataly.com/time-series-forecasting-using-facebook-prophet-in-python/10351/image-42-5/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/12/image-42.png" data-orig-size="989,589" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-42" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/12/image-42.png" src="https://www.relataly.com/wp-content/uploads/2022/12/image-42.png" alt="Changepoints in a chart showing a Prophet forecast for the coca-cola stock. Changepoint_range = 0.8" class="wp-image-11038" srcset="https://www.relataly.com/wp-content/uploads/2022/12/image-42.png 989w, https://www.relataly.com/wp-content/uploads/2022/12/image-42.png 300w, https://www.relataly.com/wp-content/uploads/2022/12/image-42.png 768w" sizes="(max-width: 989px) 100vw, 989px" /></figure>



<p class="wp-block-paragraph">The chart above shows that our model has identified several changepoints in the historical data. However, it has only searched for changepoints within the first 80% of the time series. As a result, the algorithm hasn&#8217;t identified any changepoints in the most recent years after 2020. We can change this behavior with the changepoint_range parameter (default = 0.8), which is what we will do in the next section. </p>
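<p class="wp-block-paragraph">Instead of reading the changepoints off the chart, we can also list them. As far as we can tell from the Prophet source code, add_changepoints_to_plot only marks candidate changepoints whose average rate change (stored in model.params['delta']) is at least 0.01 in absolute value. The sketch below reproduces that filter under this assumption. The helper name significant_changepoints is hypothetical, and the deltas refer to Prophet&#8217;s internally scaled values, not to price units.</p>

```python
import numpy as np
import pandas as pd


def significant_changepoints(changepoints, deltas, threshold=0.01) -> pd.Series:
    """Return the changepoints whose mean rate adjustment is at least `threshold` in magnitude.

    changepoints : candidate changepoint dates (e.g. model.changepoints)
    deltas       : rate adjustments with shape (n_samples, n_changepoints)
                   (e.g. model.params['delta']; a MAP fit has a single row)
    """
    deltas = np.atleast_2d(np.asarray(deltas, dtype=float))
    # Average the rate adjustments over samples: one value per changepoint
    mean_delta = np.nanmean(deltas, axis=0)
    # Keep only changepoints with a large enough change in growth rate
    mask = np.abs(mean_delta) >= threshold
    dates = pd.DatetimeIndex(changepoints)[mask]
    return pd.Series(mean_delta[mask], index=dates, name='delta')
```

<p class="wp-block-paragraph">For a fitted model, this would be called as significant_changepoints(model.changepoints, model.params['delta']). Negative values indicate a slowdown of the trend, positive values an acceleration.</p>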



<h4 class="wp-block-heading">6.2 Adjusting Changepoints</h4>



<p class="wp-block-paragraph">We can adjust the range within which Facebook Prophet looks for changepoints with the &#8220;changepoint_range&#8221; parameter. It is specified as a fraction of the time series history. For example, if changepoint_range is set to 0.8 and the time series spans 10 years, the algorithm will look for changepoints within the first 8 years of the series.</p>



<p class="wp-block-paragraph">By default, changepoint_range is set to 0.8, which means that the algorithm looks for changepoints only within the first 80% of the time series. We can adjust this value depending on the characteristics of our data and on how flexible we want the model to be.</p>



<p class="wp-block-paragraph">Increasing the value of changepoint_range allows the algorithm to place changepoints closer to the end of the series, so that the trend can react to recent shifts. This may improve the fit of the model, but it also increases the risk of overfitting to the most recent data, right where the forecast begins. Conversely, decreasing changepoint_range excludes more of the recent history from changepoint detection. This may help the model generalize, but the forecast can then miss recent changes in the trend. Note that the number of potential changepoints is set separately with the n_changepoints parameter.</p>
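<p class="wp-block-paragraph">The effect of changepoint_range on where candidate changepoints can appear can be illustrated with a short sketch. It loosely follows the placement logic we understand Prophet to use in its set_changepoints method: n_changepoints candidates are spread evenly over the first changepoint_range share of the observations. The function name candidate_changepoints and its details are our own simplification, not Prophet&#8217;s API.</p>

```python
import numpy as np
import pandas as pd


def candidate_changepoints(ds, n_changepoints=25, changepoint_range=0.8):
    """Spread potential changepoints evenly over the first `changepoint_range` of the history."""
    ds = pd.DatetimeIndex(ds).sort_values()
    # Only the first changepoint_range share of the observations is eligible
    hist_size = int(np.floor(len(ds) * changepoint_range))
    # We cannot place more changepoints than there are eligible observations
    n_changepoints = min(n_changepoints, hist_size - 1)
    if n_changepoints > 0:
        # Evenly spaced row positions; the first one (start of the series) is dropped
        idx = np.linspace(0, hist_size - 1, n_changepoints + 1).round().astype(int)
        return ds[idx[1:]]
    return pd.DatetimeIndex([])
```

<p class="wp-block-paragraph">With roughly 10 years of data and the default range of 0.8, no candidate lies in the last two years. With changepoint_range=1.0, the same number of candidates is stretched over the whole history, so the last one falls on the final observation.</p>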



<p class="wp-block-paragraph">Let&#8217;s fit our model again, but this time we let Facebook Prophet search for changepoints within the entire time series (changepoint_range=1.0).</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Adjusting the changepoints of our model
from prophet.plot import add_changepoints_to_plot

# Refit with changepoint_range = 1.0 (fourth argument of our fit_and_forecast helper)
forecast_df, model, future_df = fit_and_forecast(df_x, 365, 1.0, 1.0)
fig = model.plot(forecast_df)
axislist = add_changepoints_to_plot(fig.gca(), model, forecast_df)</pre></div>



<figure class="wp-block-image size-full"><img decoding="async" width="989" height="590" data-attachment-id="11043" data-permalink="https://www.relataly.com/time-series-forecasting-using-facebook-prophet-in-python/10351/image-43-5/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/12/image-43.png" data-orig-size="989,590" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-43" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/12/image-43.png" src="https://www.relataly.com/wp-content/uploads/2022/12/image-43.png" alt="Changepoints in a chart showing a Prophet forecast for the coca-cola stock. Changepoint_range = 1.0" class="wp-image-11043" srcset="https://www.relataly.com/wp-content/uploads/2022/12/image-43.png 989w, https://www.relataly.com/wp-content/uploads/2022/12/image-43.png 300w, https://www.relataly.com/wp-content/uploads/2022/12/image-43.png 768w" sizes="(max-width: 989px) 100vw, 989px" /></figure>



<p class="wp-block-paragraph">The plot above shows that Facebook Prophet has now identified several additional changepoints in the time series. As a result, the forecast has become rather pessimistic, because the trend now reflects the most recent changes in the data.</p>



<p class="wp-block-paragraph">Finally, it is worth mentioning that you can also set changepoints manually for specific dates. To do so, pass a list of dates to the changepoints parameter when creating the model, for example Prophet(changepoints=['2020-03-01']). Prophet then uses only these changepoints instead of placing them automatically. </p>



<h2 class="wp-block-heading" id="h-summary">Summary</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p class="wp-block-paragraph">In this article, we dove into the world of stock market prediction with Facebook Prophet! We showed how to use this tool to forecast time series data, using Coca-Cola&#8217;s stock as an example. We walked through fitting a model to univariate time series data and saw how adjusting the changepoints changes the trend and, with it, the forecast. Thanks to Facebook Prophet&#8217;s automatic changepoint detection, the model can adapt to changes in the data over time.</p>



<p class="wp-block-paragraph">Also: <a href="https://www.relataly.com/feature-engineering-for-multivariate-time-series-models-with-python/1813/" target="_blank" rel="noreferrer noopener">Mastering Multivariate Stock Market Prediction with Python</a> </p>



<p class="wp-block-paragraph">As a data scientist, you&#8217;ll appreciate how easy Facebook Prophet is to use. With its straightforward interface, it is a useful addition to your forecasting toolkit, although, as the changepoint experiment showed, its forecasts can change considerably with its settings. We&#8217;re always looking for feedback from our audience, so let us know what you think! We&#8217;re committed to improving our content to provide the best learning experience possible.</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<h2 class="wp-block-heading">Sources and Further Reading</h2>



<ol class="wp-block-list">
<li><a href="https://peerj.com/preprints/3190/" target="_blank" rel="noreferrer noopener">Taylor and Letham, 2017, Forecasting at scale</a></li>



<li><a href="https://facebook.github.io/prophet/docs/quick_start.html" target="_blank" rel="noreferrer noopener">Prophet documentation, Quick Start</a></li>



<li><a href="https://amzn.to/3EKidwE" target="_blank" rel="noreferrer noopener">David Forsyth, 2019, Applied Machine Learning, Springer</a></li>
</ol>



<p class="has-contrast-2-color has-base-3-background-color has-text-color has-background wp-block-paragraph"><em>The links above to Amazon are affiliate links. By buying through these links, you support the Relataly.com blog and help to cover the hosting costs. Using the links does not affect the price.</em></p>



<p class="wp-block-paragraph">Other Methods for Time Series Forecasting</p>



<ul class="wp-block-list">
<li><a href="https://www.relataly.com/univariate-stock-market-forecasting-using-a-recurrent-neural-network/122/" target="_blank" rel="noreferrer noopener">Univariate time series forecasting with Recurrent Neural Networks</a></li>



<li><a href="https://www.relataly.com/stock-price-prediction-multi-output-regression-using-neural-networks-in-python/5800/" target="_blank" rel="noreferrer noopener">Multivariate time series forecasting with Recurrent Neural Networks</a></li>



<li><a href="https://www.relataly.com/forecasting-beer-sales-with-arima-in-python/2884/" target="_blank" rel="noreferrer noopener">Forecasting sales data with ARIMA models</a></li>
</ul>
<p>The post <a href="https://www.relataly.com/time-series-forecasting-using-facebook-prophet-in-python/10351/">Univariate Stock Market Forecasting using Facebook Prophet in Python</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.relataly.com/time-series-forecasting-using-facebook-prophet-in-python/10351/feed/</wfw:commentRss>
			<slash:comments>3</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">10351</post-id>	</item>
		<item>
		<title>Using Pandas DataReader to Access Online Data Sources in Python</title>
		<link>https://www.relataly.com/using-pandas-datareader-in-python/10934/</link>
					<comments>https://www.relataly.com/using-pandas-datareader-in-python/10934/#respond</comments>
		
		<dc:creator><![CDATA[Florian Follonier]]></dc:creator>
		<pubDate>Sat, 15 Oct 2022 20:14:00 +0000</pubDate>
				<category><![CDATA[Data Science]]></category>
		<category><![CDATA[Data Sources]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Tech]]></category>
		<category><![CDATA[Yahoo Finance API]]></category>
		<category><![CDATA[API Tutorials]]></category>
		<category><![CDATA[Beginner Tutorials]]></category>
		<category><![CDATA[Requesting Data via REST APIs]]></category>
		<guid isPermaLink="false">https://www.relataly.com/?p=10934</guid>

					<description><![CDATA[<p>Pandas DataReader is a library that allows data scientists to easily read data from a variety of sources into a Pandas DataFrame. This is especially useful for accessing data that resides outside of their local development environment and needs to be accessed via APIs. The Pandas DataReader provides functions for loading data from various online ... <a title="Using Pandas DataReader to Access Online Data Sources in Python" class="read-more" href="https://www.relataly.com/using-pandas-datareader-in-python/10934/" aria-label="Read more about Using Pandas DataReader to Access Online Data Sources in Python">Read more</a></p>
<p>The post <a href="https://www.relataly.com/using-pandas-datareader-in-python/10934/">Using Pandas DataReader to Access Online Data Sources in Python</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p class="wp-block-paragraph">Pandas DataReader is a library that allows data scientists to easily read data from a variety of sources into a Pandas DataFrame. This is especially useful for accessing data that resides outside of their local development environment and needs to be accessed via APIs. The Pandas DataReader provides functions for loading data from various online sources, including Yahoo Finance and the NASDAQ. This can be incredibly helpful for tasks such as financial analysis, data visualization, and machine learning. In this tutorial, we will give a brief overview of the library and show how to use it in Python to access financial data from the Yahoo Finance API.</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<h2 class="wp-block-heading">What is Pandas Data Reader?</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p class="wp-block-paragraph">The Pandas DataReader library provides functions that extract data from various Internet sources into a pandas DataFrame. It supports several remote data providers, including <a href="https://pandas-datareader.readthedocs.io/en/latest/remote_data.html#remote-data-alphavantage" target="_blank" rel="noreferrer noopener">Alpha Vantage</a>, <a href="https://pandas-datareader.readthedocs.io/en/latest/remote_data.html#remote-data-wb" target="_blank" rel="noreferrer noopener">World Bank</a>, <a href="https://pandas-datareader.readthedocs.io/en/latest/remote_data.html#eurostat" target="_blank" rel="noreferrer noopener">Eurostat</a>, the <a href="https://pandas-datareader.readthedocs.io/en/latest/remote_data.html#oecd" target="_blank" rel="noreferrer noopener">OECD</a>, and several financial data sources such as <a href="https://pandas-datareader.readthedocs.io/en/latest/remote_data.html#nasdaq-trader-symbol-definitions" target="_blank" rel="noreferrer noopener">NASDAQ</a>, <a href="https://pandas-datareader.readthedocs.io/en/latest/remote_data.html#remote-data-yahoo" target="_blank" rel="noreferrer noopener">Yahoo Finance</a>, and <a href="https://pandas-datareader.readthedocs.io/en/latest/remote_data.html#remote-data-naver" target="_blank" rel="noreferrer noopener">Naver Finance</a>. A complete list of available sources can be found in the pandas DataReader <a href="https://pandas-datareader.readthedocs.io/en/latest/remote_data.html" target="_blank" rel="noreferrer noopener">API documentation</a>.</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-large"><img decoding="async" width="1024" height="252" data-attachment-id="11792" data-permalink="https://www.relataly.com/using-pandas-datareader-in-python/10934/image-10/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/01/image.png" data-orig-size="1246,307" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/01/image.png" src="https://www.relataly.com/wp-content/uploads/2023/01/image-1024x252.png" alt="Pandas datareader is a useful Python library for accessing remote data via an API" class="wp-image-11792" srcset="https://www.relataly.com/wp-content/uploads/2023/01/image.png 1024w, https://www.relataly.com/wp-content/uploads/2023/01/image.png 300w, https://www.relataly.com/wp-content/uploads/2023/01/image.png 768w, https://www.relataly.com/wp-content/uploads/2023/01/image.png 1246w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption">Pandas DataReader is a useful Python library for accessing remote data via an API</figcaption></figure>
</div>
</div>






<div style="height:24px" aria-hidden="true" class="wp-block-spacer"></div>



<h2 class="wp-block-heading" id="h-access-financial-data-using-pandas-datareader-and-the-yahoo-finance-rest-api-in-python">Access Financial Data using Pandas DataReader and the Yahoo Finance REST API in Python</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p class="wp-block-paragraph">In this tutorial, we will learn how to retrieve data for the German stock market index DAX from the Yahoo Finance API. Specifically, we will use the pandas_datareader package, which provides a convenient interface for accessing various online data sources and returns the data as a pandas DataFrame. We will carry out the following steps:</p>



<ol class="wp-block-list">
<li>Install the pandas_datareader package.</li>



<li>Import the necessary libraries in our Python script.</li>



<li>Use the data.DataReader function to request data for the DAX index from the Yahoo Finance API, specifying the start and end dates of the data to retrieve. The returned data is stored in a pandas DataFrame. </li>



<li>Finally, use the plot() method to visualize the data with matplotlib.</li>
</ol>



<p class="wp-block-paragraph">The code is available in the relataly GitHub repository.</p>



<div class="wp-block-kadence-advancedbtn kb-buttons-wrap kb-btns_368b60-d7"><a class="kb-button kt-button button kb-btn_1809f5-70 kt-btn-size-standard kt-btn-width-type-full kb-btn-global-inherit kt-btn-has-text-true kt-btn-has-svg-true wp-block-button__link wp-block-kadence-singlebtn" href="https://github.com/flo7up/relataly-public-python-API-tutorials/blob/main/101%20Pulling%20COVID-19%20Data%20via%20the%20Statworx%20API%20to%20a%20DataFrame.ipynb" target="_blank" rel="noreferrer noopener"><span class="kb-svg-icon-wrap kb-svg-icon-fe_eye kt-btn-icon-side-left"><svg viewBox="0 0 24 24"  fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"  aria-hidden="true"><path d="M1 12s4-8 11-8 11 8 11 8-4 8-11 8-11-8-11-8z"/><circle cx="12" cy="12" r="3"/></svg></span><span class="kt-btn-inner-text">View on GitHub </span></a>

<a class="kb-button kt-button button kb-btn_ad4d40-fa kt-btn-size-standard kt-btn-width-type-full kb-btn-global-inherit kt-btn-has-text-true kt-btn-has-svg-true wp-block-button__link wp-block-kadence-singlebtn" href="https://github.com/flo7up/relataly-public-python-API-tutorials" target="_blank" rel="noreferrer noopener"><span class="kb-svg-icon-wrap kb-svg-icon-fa_github kt-btn-icon-side-left"><svg viewBox="0 0 496 512"  fill="currentColor" xmlns="http://www.w3.org/2000/svg"  aria-hidden="true"><path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"/></svg></span><span class="kt-btn-inner-text">Relataly GitHub Repo </span></a></div>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-large is-resized"><img decoding="async" data-attachment-id="11808" data-permalink="https://www.relataly.com/using-pandas-datareader-in-python/10934/image-2-4/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/01/image-2.png" data-orig-size="1442,1216" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-2" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/01/image-2.png" src="https://www.relataly.com/wp-content/uploads/2023/01/image-2-1024x864.png" alt="Pandas DataReader provides access to a wide range of public data sources. " class="wp-image-11808" width="382" height="323" srcset="https://www.relataly.com/wp-content/uploads/2023/01/image-2.png 1024w, https://www.relataly.com/wp-content/uploads/2023/01/image-2.png 300w, https://www.relataly.com/wp-content/uploads/2023/01/image-2.png 768w, https://www.relataly.com/wp-content/uploads/2023/01/image-2.png 1442w" sizes="(max-width: 382px) 100vw, 382px" /><figcaption class="wp-element-caption">Pandas DataReader provides access to a wide range of public data sources.</figcaption></figure>
</div>
</div>



<div style="height:29px" aria-hidden="true" class="wp-block-spacer"></div>



<h3 class="wp-block-heading" id="h-prerequisites">Prerequisites</h3>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p class="wp-block-paragraph">Before starting the coding part, ensure you have set up your <a href="https://www.python.org/downloads/" target="_blank" rel="noreferrer noopener">Python 3</a> environment and required libraries. If you don&#8217;t have an environment, consider following&nbsp;the steps in <a href="https://www.relataly.com/anaconda-python-environment-machine-learning/1663/" target="_blank" rel="noreferrer noopener">this tutorial</a>&nbsp;to set up the&nbsp;<a href="https://www.anaconda.com/products/individual" target="_blank" rel="noreferrer noopener">Anaconda environment</a>.</p>



<p class="wp-block-paragraph">Also, make sure you install all required packages. In this tutorial, we will be working with the following standard packages:&nbsp;</p>



<ul class="wp-block-list">
<li><em><a href="https://pandas.pydata.org/" target="_blank" rel="noreferrer noopener">pandas</a></em></li>



<li><em><a href="https://numpy.org/" target="_blank" rel="noreferrer noopener">NumPy</a></em></li>



<li><a href="https://docs.python.org/3/library/math.html" target="_blank" rel="noreferrer noopener">math</a></li>



<li><em><a href="https://matplotlib.org/" target="_blank" rel="noreferrer noopener">matplotlib</a></em></li>
</ul>



<p class="wp-block-paragraph">You can install packages using console commands:</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">pip install &lt;package name&gt;
conda install &lt;package name&gt; (if you are using the Anaconda package manager)</pre></div>



<p class="wp-block-paragraph">In addition, we will be using the pandas DataReader library, which you can install with the following command:</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;disableCopy&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">pip install pandas-datareader</pre></div>
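<p class="wp-block-paragraph">Before moving on, it can help to confirm that these packages are importable from the environment you will run the tutorial in. The following is a minimal sketch that uses only the standard library's importlib module. The helper name check_packages is our own, and note that the pandas-datareader package is imported under the module name pandas_datareader.</p>

```python
import importlib.util


def check_packages(module_names):
    """Return a dict mapping each module name to True if it can be imported."""
    # find_spec returns None when a top-level module cannot be located
    return {name: importlib.util.find_spec(name) is not None for name in module_names}


if __name__ == "__main__":
    # Module names used in this tutorial (pandas-datareader installs as pandas_datareader)
    required = ["pandas", "numpy", "math", "matplotlib", "pandas_datareader"]
    for name, available in check_packages(required).items():
        status = "OK" if available else "missing, install it with pip or conda"
        print(f"{name}: {status}")
```

<p class="wp-block-paragraph">Any module reported as missing can be installed with the pip or conda commands shown above.</p>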
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<div style="height:24px" aria-hidden="true" class="wp-block-spacer"></div>



<h3 class="wp-block-heading" id="h-step-1-define-the-api-request-parameters">Step #1: Define the API Request Parameters</h3>



<p class="wp-block-paragraph">We begin by setting up imports and adjusting the request parameters. The parameters in an API request will depend on the API and the library used for making the request. Also, some parameters may be optional, while others are mandatory.</p>



<p class="wp-block-paragraph">For example, the Yahoo Finance API lets us limit the period for which we retrieve price data. This is an optional parameter. In addition, we need to define the ticker symbol of the financial instrument whose price data we want to request. This parameter is mandatory.</p>



<p class="wp-block-paragraph">The ticker symbol for the German stock market index (DAX) is <strong>^GDAXI</strong>. If you want to retrieve price data for other stocks or indices, you can search for the respective ticker symbols on <a href="https://de.finance.yahoo.com/quote/%5EGDAXI/?guccounter=1&amp;guce_referrer=aHR0cHM6Ly93d3cuZ29vZ2xlLmNvbS8&amp;guce_referrer_sig=AQAAAJH4Onl0hOq1uOzUs7uiZWttbmU1Lw3XWqfXzhqMIUNypCiocw3d_hbUWI92G9TMZn3_M9q4RnaoNYjbWte3RM2iyGc1U_iPquEwan_ezsgKxiLDidFUB2R3zuF46IOvGIqueLikt8Znl-4yDCn_o_50qCUmCr3uZTJ8p8Eaf-MI" target="_blank" rel="noreferrer noopener">Yahoo Finance</a>.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}">import pandas_datareader as webreader
import pandas as pd
import matplotlib.pyplot as plt

# Set the data source (the API that DataReader should query)
data_source = &quot;yahoo&quot;

# Set the API parameters
date_today = &quot;2020-01-01&quot; # period end date
date_start = &quot;2010-01-01&quot; # period start date
symbol = &quot;^GDAXI&quot; # asset symbol - for more symbols, check finance.yahoo.com</pre></div>
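<p class="wp-block-paragraph">Since the start and end dates are plain strings, they are easy to mix up. Below is a minimal sketch of a pre-flight check that fails early on a missing symbol, a malformed date, or a start date after the end date. The function build_request_params is hypothetical and not part of pandas-datareader; it simply collects the keyword arguments (name, data_source, start, end) that DataReader accepts in the next step.</p>

```python
from datetime import date


def build_request_params(symbol, start=None, end=None, data_source="yahoo"):
    """Validate request parameters and return them as a dict (hypothetical helper).

    symbol is mandatory; start and end are optional ISO dates (YYYY-MM-DD).
    """
    if not symbol:
        raise ValueError("A ticker symbol is required, e.g. '^GDAXI'.")
    # Parse the optional dates so that typos fail here instead of at request time
    start_date = date.fromisoformat(start) if start else None
    end_date = date.fromisoformat(end) if end else None
    if start_date and end_date and start_date > end_date:
        raise ValueError(f"Start date {start} lies after end date {end}.")
    return {"name": symbol, "data_source": data_source, "start": start_date, "end": end_date}


params = build_request_params("^GDAXI", start="2010-01-01", end="2020-01-01")
print(params)
```

<p class="wp-block-paragraph">The resulting dict could then be unpacked into the request, for example webreader.DataReader(**params).</p>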



<h3 class="wp-block-heading" id="h-step-2-send-the-request-to-the-rest-api-endpoint">Step #2: Send the Request to the REST API Endpoint</h3>



<p class="wp-block-paragraph">Once we have defined the request parameters, we can send the request via the DataReader function and print the result. REST APIs typically return their responses in JSON format. However, DataReader converts the API response directly into a pandas DataFrame, which makes working with the API much simpler.</p>



<p class="wp-block-paragraph">The following code retrieves the historical prices of the DAX stock market index from Yahoo Finance and displays the first five rows of the resulting DataFrame. The start and end parameters restrict the request to the period defined in the previous step.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Send the request to the yahoo finance api endpoint
df = webreader.DataReader(symbol, start=date_start, end=date_today, data_source=data_source)
df.head(5)</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">                   High          Low         Open        Close       Volume    Adj Close
Date
2010-01-04  6048.299805  5974.430176  5975.520020  6048.299805  104344400.0  6048.299805
2010-01-05  6058.020020  6015.669922  6043.939941  6031.859863  117572100.0  6031.859863
2010-01-06  6047.569824  5997.089844  6032.390137  6034.330078  108742400.0  6034.330078
2010-01-07  6037.569824  5961.250000  6016.799805  6019.359863  133704300.0  6019.359863
2010-01-08  6053.040039  5972.240234  6028.620117  6037.609863  126099000.0  6037.609863</pre></div>



<p class="wp-block-paragraph">DataFrame with the price data from Yahoo Finance.</p>
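<p class="wp-block-paragraph">To illustrate the conversion that DataReader takes off our hands, the sketch below turns a JSON payload into a date-indexed DataFrame with pandas. The payload layout (a list of records with date, close, and volume fields) is an assumption made for illustration only; it is not the actual format of Yahoo Finance responses.</p>

```python
import json

import pandas as pd

# Hypothetical JSON response body; real APIs use their own field names and nesting
raw_response = """
[
  {"date": "2010-01-04", "close": 6048.299805, "volume": 104344400},
  {"date": "2010-01-05", "close": 6031.859863, "volume": 117572100},
  {"date": "2010-01-06", "close": 6034.330078, "volume": 108742400}
]
"""


def json_to_frame(payload):
    """Parse a JSON list of price records into a DataFrame indexed by date."""
    records = json.loads(payload)
    frame = pd.DataFrame.from_records(records)
    # Convert the date strings to timestamps and use them as the sorted index
    frame["date"] = pd.to_datetime(frame["date"])
    frame = frame.set_index("date").sort_index()
    # Use capitalized column names similar to the DataReader output above
    frame = frame.rename(columns={"close": "Close", "volume": "Volume"})
    frame.index.name = "Date"
    return frame


df_manual = json_to_frame(raw_response)
print(df_manual)
```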



<h3 class="wp-block-heading" id="h-step-3-plot-the-data">Step #3: Plot the Data</h3>



<p class="wp-block-paragraph">Let&#8217;s quickly print the data to check if everything looks ok.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Plot the closing prices
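fig, ax1 = plt.subplots(figsize=(12, 8))
ax1.plot(df.index, df[&quot;Close&quot;])
ax1.set_title(symbol)
ax1.set_xlabel(&quot;Date&quot;)
ax1.set_ylabel(&quot;Close&quot;)
plt.show()</pre></div>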



<figure class="wp-block-image size-full"><img decoding="async" width="1003" height="659" data-attachment-id="10950" data-permalink="https://www.relataly.com/using-pandas-datareader-in-python/10934/image-38-3/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/12/image-38.png" data-orig-size="1003,659" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-38" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/12/image-38.png" src="https://www.relataly.com/wp-content/uploads/2022/12/image-38.png" alt="plot for financial data requested via the pandas datareader python library" class="wp-image-10950" srcset="https://www.relataly.com/wp-content/uploads/2022/12/image-38.png 1003w, https://www.relataly.com/wp-content/uploads/2022/12/image-38.png 300w, https://www.relataly.com/wp-content/uploads/2022/12/image-38.png 768w" sizes="(max-width: 1003px) 100vw, 1003px" /></figure>
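<p class="wp-block-paragraph">Besides eyeballing the chart, a few programmatic checks can catch gaps or inconsistent rows. Here is a minimal sketch, assuming the DataFrame has the High, Low, Open, Close, and Volume columns shown above. The function name check_price_data and the specific rules are our own choices, and the small sample frame is made up for illustration.</p>

```python
import pandas as pd


def check_price_data(frame):
    """Return a list of human-readable problems found in an OHLCV DataFrame."""
    if frame.empty:
        return ["DataFrame is empty"]
    problems = []
    if not frame.index.is_monotonic_increasing:
        problems.append("dates are not sorted in ascending order")
    if frame.index.has_duplicates:
        problems.append("duplicate dates found")
    missing = frame.isna().sum()
    for column, count in missing[missing.gt(0)].items():
        problems.append(f"{count} missing value(s) in column {column}")
    # The daily high should never be below the low, open, or close
    for column in ["Low", "Open", "Close"]:
        if frame[column].gt(frame["High"]).any():
            problems.append(f"{column} exceeds High on some days")
    return problems


# Small synthetic example: the last row has a missing volume and Low above High
sample = pd.DataFrame(
    {
        "High": [6048.3, 6058.0, 6047.6],
        "Low": [5974.4, 6015.7, 6050.0],
        "Open": [5975.5, 6043.9, 6032.4],
        "Close": [6048.3, 6031.9, 6034.3],
        "Volume": [104344400, 117572100, None],
    },
    index=pd.to_datetime(["2010-01-04", "2010-01-05", "2010-01-06"]),
)
print(check_price_data(sample))
```

<p class="wp-block-paragraph">Calling check_price_data(df) on the downloaded data returns an empty list if none of these checks flags anything.</p>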



<p class="wp-block-paragraph">Everything looks good, so let&#8217;s proceed.</p>



<h3 class="wp-block-heading" id="h-step-4-save-the-data-to-a-csv-file">Step #4: Save the Data to a CSV File</h3>



<p class="wp-block-paragraph">To save the data from a pandas DataFrame to a CSV file, you can use the to_csv method. The to_csv method takes a few optional arguments that you can use to customize the output. For example, you can use the &#8220;sep&#8221; argument to specify a different delimiter for the CSV file, or the &#8220;index&#8221; argument to include or exclude the DataFrame&#8217;s index in the output.</p>



<p class="wp-block-paragraph">Here&#8217;s an example of how you can use the to_csv method with the index parameter set to False:</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Save the data to a CSV file
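# The dates live in the index, so move them into a regular column first;
# otherwise index=False would drop the dates from the file
df.reset_index().to_csv(&quot;price_quotes.csv&quot;, index=False)</pre></div>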
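<p class="wp-block-paragraph">To see what the sep and index arguments change without touching the disk, you can call to_csv without a path, in which case pandas returns the CSV content as a string. The tiny DataFrame in this sketch is made up for illustration and only mimics the date index of the Yahoo Finance data.</p>

```python
import pandas as pd

# Tiny stand-in for the price data, indexed by date like the DataReader result
prices = pd.DataFrame(
    {"Close": [6048.3, 6031.86]},
    index=pd.DatetimeIndex(["2010-01-04", "2010-01-05"], name="Date"),
)

# Default: comma-separated, with the index written as the first column
with_index = prices.to_csv()
# Semicolon as delimiter (common in locales that use a decimal comma)
semicolon = prices.to_csv(sep=";")
# index=False drops the Date index, so the dates are lost
without_index = prices.to_csv(index=False)

for label, text in [("default", with_index), ("sep=';'", semicolon), ("index=False", without_index)]:
    print(label)
    print(text)
```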



<p class="wp-block-paragraph">Now you have the data on your local machine and can load it later. So unless you need more recent data, there is no need to call the API again.</p>



<figure class="wp-block-image size-full is-resized"><img decoding="async" data-attachment-id="11794" data-permalink="https://www.relataly.com/using-pandas-datareader-in-python/10934/image-1-4/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/01/image-1.png" data-orig-size="902,220" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-1" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/01/image-1.png" src="https://www.relataly.com/wp-content/uploads/2023/01/image-1.png" alt="price quotes csv file downloaded with Pandas DataReader library for Python" class="wp-image-11794" width="918" height="224" srcset="https://www.relataly.com/wp-content/uploads/2023/01/image-1.png 902w, https://www.relataly.com/wp-content/uploads/2023/01/image-1.png 300w, https://www.relataly.com/wp-content/uploads/2023/01/image-1.png 768w" sizes="(max-width: 918px) 100vw, 918px" /></figure>
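<p class="wp-block-paragraph">To avoid calling the API on every run, you can check for the CSV file first and only request data when the file is missing. This is a minimal sketch: load_or_fetch and fake_fetch are hypothetical names, and fetch stands in for any function that returns a date-indexed DataFrame. The sketch writes the date index to the file and restores it with the index_col and parse_dates arguments of pd.read_csv.</p>

```python
import os
import tempfile

import pandas as pd


def load_or_fetch(path, fetch):
    """Load price data from a CSV cache, calling fetch() only if the file is missing."""
    if os.path.exists(path):
        # Restore the Date column as a DatetimeIndex
        return pd.read_csv(path, index_col="Date", parse_dates=True)
    frame = fetch()
    frame.to_csv(path)  # index=True (the default) keeps the dates in the file
    return frame


calls = []


def fake_fetch():
    """Stand-in for an API request; records each call so we can count them."""
    calls.append(1)
    return pd.DataFrame(
        {"Close": [6048.3, 6031.86]},
        index=pd.DatetimeIndex(["2010-01-04", "2010-01-05"], name="Date"),
    )


with tempfile.TemporaryDirectory() as tmp:
    cache_file = os.path.join(tmp, "price_quotes.csv")
    first = load_or_fetch(cache_file, fake_fetch)   # calls the API stand-in
    second = load_or_fetch(cache_file, fake_fetch)  # reads from the CSV file instead
    print(len(calls), second.index.dtype)
```

<p class="wp-block-paragraph">In this tutorial, fetch could be a small function or lambda that calls webreader.DataReader(symbol, start=date_start, end=date_today, data_source=data_source).</p>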



<div style="height:29px" aria-hidden="true" class="wp-block-spacer"></div>



<h2 class="wp-block-heading" id="h-summary">Summary</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p class="wp-block-paragraph">This article has shown how to use the Pandas DataReader library. We learned how to use the library to request data from the Yahoo Finance API and load the data into a pandas DataFrame. The Pandas DataReader library is a helpful tool for importing financial data into a pandas DataFrame and working with it in Python. You can use it to retrieve data from a wide range of sources, including stock prices from major stock exchanges, economic data from the Federal Reserve, and cryptocurrency prices. Once you have the data in a DataFrame, you can use the methods and functions provided by pandas to analyze and manipulate it, and save the results to a CSV file using the to_csv method.</p>



<p class="wp-block-paragraph">I hope this post was helpful. If you have any remarks or questions, let me know.</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-full"><img decoding="async" width="504" height="502" data-attachment-id="12632" data-permalink="https://www.relataly.com/pandas-data-library-panda-midjourney-python-relataly-tutorial-min/" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/03/pandas-data-library-panda-midjourney-python-relataly-tutorial-min.png" data-orig-size="504,502" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="pandas data library panda midjourney python relataly tutorial-min" data-image-description="&lt;p&gt;This panda just loaded a lot of data into his python project. &lt;/p&gt;
" data-image-caption="&lt;p&gt;This panda just loaded a lot of data into his python project. &lt;/p&gt;
" data-large-file="https://www.relataly.com/wp-content/uploads/2023/03/pandas-data-library-panda-midjourney-python-relataly-tutorial-min.png" src="https://www.relataly.com/wp-content/uploads/2023/03/pandas-data-library-panda-midjourney-python-relataly-tutorial-min.png" alt="This panda just loaded a lot of data into its Python project." class="wp-image-12632" srcset="https://www.relataly.com/wp-content/uploads/2023/03/pandas-data-library-panda-midjourney-python-relataly-tutorial-min.png 504w, https://www.relataly.com/wp-content/uploads/2023/03/pandas-data-library-panda-midjourney-python-relataly-tutorial-min.png 300w, https://www.relataly.com/wp-content/uploads/2023/03/pandas-data-library-panda-midjourney-python-relataly-tutorial-min.png 140w" sizes="(max-width: 504px) 100vw, 504px" /><figcaption class="wp-element-caption">This panda looks happy because it just loaded data into its Python project. Image created with <a href="http://www.midjourney.com" target="_blank" rel="noreferrer noopener">Midjourney</a>.</figcaption></figure>
</div>
</div>






<h2 class="wp-block-heading">Sources and Further Reading</h2>



<p class="wp-block-paragraph"><a href="https://pandas-datareader.readthedocs.io/en/latest/index.html" target="_blank" rel="noreferrer noopener">pandas-datareader.readthedocs.io/</a></p>



<p class="wp-block-paragraph">Images created with Midjourney AI</p>



<h3 class="wp-block-heading">Further API Tutorials</h3>


<ul class="wp-block-kadence-posts kb-posts kadence-posts-list kb-posts-id-_69433d-4d content-wrap grid-cols kb-posts-style-boxed grid-sm-col-1 grid-lg-col-3 item-image-style-above"><li class="kb-post-list-item">
	<article class="entry content-bg loop-entry post-12143 post type-post status-publish format-standard has-post-thumbnail hentry category-language-generation category-machine-learning-marketing-automation category-natural-language-processing-nlp category-openai category-python-programming category-rest-apis tag-api-tutorials tag-beginner-tutorials tag-deep-learning">
				<a aria-hidden="true" tabindex="-1" role="presentation" class="post-thumbnail kadence-thumbnail-ratio-2-3" href="https://www.relataly.com/automated-prompt-generation-for-dall-e-using-chatgpt-in-python-a-step-by-step-api-tutorial/12143/" aria-label="Generating Detailed Images with OpenAI DALL-E and ChatGPT in Python: A Step-By-Step API Tutorial">
			<div class="post-thumbnail-inner">
				<img decoding="async" width="768" height="382" src="https://www.relataly.com/wp-content/uploads/2023/01/Flo7up_a_robot_painting_a_picture_with_data_technology_and_ai_i_5e7ffa5e-06c3-436b-b4fa-3fc5af58e546-Copy-min.png" class="attachment-medium_large size-medium_large wp-post-image" alt="OpenAI Dall-E ChatGPT Prompt Design Detailed Images Combining ChatGPT and Dall-E Midjourney" srcset="https://www.relataly.com/wp-content/uploads/2023/01/Flo7up_a_robot_painting_a_picture_with_data_technology_and_ai_i_5e7ffa5e-06c3-436b-b4fa-3fc5af58e546-Copy-min.png 1530w, https://www.relataly.com/wp-content/uploads/2023/01/Flo7up_a_robot_painting_a_picture_with_data_technology_and_ai_i_5e7ffa5e-06c3-436b-b4fa-3fc5af58e546-Copy-min.png 300w, https://www.relataly.com/wp-content/uploads/2023/01/Flo7up_a_robot_painting_a_picture_with_data_technology_and_ai_i_5e7ffa5e-06c3-436b-b4fa-3fc5af58e546-Copy-min.png 512w, https://www.relataly.com/wp-content/uploads/2023/01/Flo7up_a_robot_painting_a_picture_with_data_technology_and_ai_i_5e7ffa5e-06c3-436b-b4fa-3fc5af58e546-Copy-min.png 768w" sizes="(max-width: 768px) 100vw, 768px" data-attachment-id="13511" data-permalink="https://www.relataly.com/automated-prompt-generation-for-dall-e-using-chatgpt-in-python-a-step-by-step-api-tutorial/12143/flo7up_a_robot_painting_a_picture_with_data_technology_and_ai_i_5e7ffa5e-06c3-436b-b4fa-3fc5af58e546-copy-min/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/01/Flo7up_a_robot_painting_a_picture_with_data_technology_and_ai_i_5e7ffa5e-06c3-436b-b4fa-3fc5af58e546-Copy-min.png" data-orig-size="1530,762" data-comments-opened="1" 
data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="Flo7up_a_robot_painting_a_picture_with_data_technology_and_ai_i_5e7ffa5e-06c3-436b-b4fa-3fc5af58e546 &amp;#8211; Copy-min" data-image-description="&lt;p&gt;OpenAI Dall-E ChatGPT Prompt Design Detailed Images Combining ChatGPT and Dall-E Midjourney&lt;/p&gt;
" data-image-caption="&lt;p&gt;OpenAI Dall-E ChatGPT Prompt Design Detailed Images Combining ChatGPT and Dall-E Midjourney&lt;/p&gt;
" data-large-file="https://www.relataly.com/wp-content/uploads/2023/01/Flo7up_a_robot_painting_a_picture_with_data_technology_and_ai_i_5e7ffa5e-06c3-436b-b4fa-3fc5af58e546-Copy-min.png" />			</div>
		</a><!-- .post-thumbnail -->
				<div class="entry-content-wrap">
			<header class="entry-header">
	<h2 class="entry-title"><a href="https://www.relataly.com/automated-prompt-generation-for-dall-e-using-chatgpt-in-python-a-step-by-step-api-tutorial/12143/" rel="bookmark">Generating Detailed Images with OpenAI DALL-E and ChatGPT in Python: A Step-By-Step API Tutorial</a></h2></header><!-- .entry-header -->
<footer class="entry-footer">
	</footer><!-- .entry-footer -->		</div>
	</article>
</li>
<li class="kb-post-list-item">
	<article class="entry content-bg loop-entry post-12068 post type-post status-publish format-standard has-post-thumbnail hentry category-natural-language-processing-nlp category-openai category-rest-apis tag-api-tutorials tag-beginner-tutorials tag-chatgpt tag-deep-learning">
				<a aria-hidden="true" tabindex="-1" role="presentation" class="post-thumbnail kadence-thumbnail-ratio-2-3" href="https://www.relataly.com/using-chatgpt-and-other-openai-models-via-apis-in-python/12068/" aria-label="Unleashing the Power of ChatGPT and Other OpenAI GPT Language Models in Python A Guide to Using APIs">
			<div class="post-thumbnail-inner">
				<img decoding="async" width="768" height="300" src="https://www.relataly.com/wp-content/uploads/2023/02/unleashing-the-power-of-openai-super-hero-robot-gpt-python-ai-min.png" class="attachment-medium_large size-medium_large wp-post-image" alt="unleashing the power of openai super hero robot gpt python ai value proposition chatgpt" srcset="https://www.relataly.com/wp-content/uploads/2023/02/unleashing-the-power-of-openai-super-hero-robot-gpt-python-ai-min.png 1614w, https://www.relataly.com/wp-content/uploads/2023/02/unleashing-the-power-of-openai-super-hero-robot-gpt-python-ai-min.png 300w, https://www.relataly.com/wp-content/uploads/2023/02/unleashing-the-power-of-openai-super-hero-robot-gpt-python-ai-min.png 512w, https://www.relataly.com/wp-content/uploads/2023/02/unleashing-the-power-of-openai-super-hero-robot-gpt-python-ai-min.png 768w, https://www.relataly.com/wp-content/uploads/2023/02/unleashing-the-power-of-openai-super-hero-robot-gpt-python-ai-min.png 1536w" sizes="(max-width: 768px) 100vw, 768px" data-attachment-id="13197" data-permalink="https://www.relataly.com/openai-gpt-chatgpt-in-a-business-context-whats-the-value-proposition/12282/unleashing-the-power-of-openai-super-hero-robot-gpt-python-ai-min/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/02/unleashing-the-power-of-openai-super-hero-robot-gpt-python-ai-min.png" data-orig-size="1614,631" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="unleashing the power of openai super hero robot gpt python ai-min" data-image-description="&lt;p&gt;unleashing the power of openai super 
hero robot gpt python ai value proposition chatgpt&lt;/p&gt;
" data-image-caption="&lt;p&gt;unleashing the power of openai super hero robot gpt python ai value proposition chatgpt&lt;/p&gt;
" data-large-file="https://www.relataly.com/wp-content/uploads/2023/02/unleashing-the-power-of-openai-super-hero-robot-gpt-python-ai-min.png" />			</div>
		</a><!-- .post-thumbnail -->
				<div class="entry-content-wrap">
			<header class="entry-header">
	<h2 class="entry-title"><a href="https://www.relataly.com/using-chatgpt-and-other-openai-models-via-apis-in-python/12068/" rel="bookmark">Unleashing the Power of ChatGPT and Other OpenAI GPT Language Models in Python A Guide to Using APIs</a></h2></header><!-- .entry-header -->
<footer class="entry-footer">
	</footer><!-- .entry-footer -->		</div>
	</article>
</li>
<li class="kb-post-list-item">
	<article class="entry content-bg loop-entry post-10351 post type-post status-publish format-standard has-post-thumbnail hentry category-cryptocompare-api category-facebook-prophet category-finance category-python-programming category-rest-apis category-seaborn category-stock-market-forecasting category-time-series-forecasting category-use-case category-yahoo-finance-api tag-ai-in-finance tag-intermediate-tutorials tag-supervised-learning">
				<a aria-hidden="true" tabindex="-1" role="presentation" class="post-thumbnail kadence-thumbnail-ratio-2-3" href="https://www.relataly.com/time-series-forecasting-using-facebook-prophet-in-python/10351/" aria-label="Univariate Stock Market Forecasting using Facebook Prophet in Python">
			<div class="post-thumbnail-inner">
				<img decoding="async" width="768" height="307" src="https://www.relataly.com/wp-content/uploads/2023/03/stock-market-forecasting-python-relataly-midjourney-3-min.png" class="attachment-medium_large size-medium_large wp-post-image" alt="Univariate Stock Market Forecasting using Facebook Prophet in Python" srcset="https://www.relataly.com/wp-content/uploads/2023/03/stock-market-forecasting-python-relataly-midjourney-3-min.png 1455w, https://www.relataly.com/wp-content/uploads/2023/03/stock-market-forecasting-python-relataly-midjourney-3-min.png 300w, https://www.relataly.com/wp-content/uploads/2023/03/stock-market-forecasting-python-relataly-midjourney-3-min.png 512w, https://www.relataly.com/wp-content/uploads/2023/03/stock-market-forecasting-python-relataly-midjourney-3-min.png 768w" sizes="(max-width: 768px) 100vw, 768px" data-attachment-id="13377" data-permalink="https://www.relataly.com/stock-market-forecasting-python-relataly-midjourney-3-min/" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/03/stock-market-forecasting-python-relataly-midjourney-3-min.png" data-orig-size="1455,582" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="stock market forecasting python relataly midjourney 3-min" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/03/stock-market-forecasting-python-relataly-midjourney-3-min.png" />			</div>
		</a><!-- .post-thumbnail -->
				<div class="entry-content-wrap">
			<header class="entry-header">
	<h2 class="entry-title"><a href="https://www.relataly.com/time-series-forecasting-using-facebook-prophet-in-python/10351/" rel="bookmark">Univariate Stock Market Forecasting using Facebook Prophet in Python</a></h2></header><!-- .entry-header -->
<footer class="entry-footer">
	</footer><!-- .entry-footer -->		</div>
	</article>
</li>
<li class="kb-post-list-item">
	<article class="entry content-bg loop-entry post-10098 post type-post status-publish format-standard has-post-thumbnail hentry category-blockchain-crypto-analytics category-correlation-machine-learning category-crypto-exchange-apis category-cryptocompare-api category-data-science category-finance category-python-programming category-rest-apis category-seaborn category-use-case tag-ai-in-finance tag-intermediate-tutorials">
				<a aria-hidden="true" tabindex="-1" role="presentation" class="post-thumbnail kadence-thumbnail-ratio-2-3" href="https://www.relataly.com/seven-metrics-for-on-chain-analysis-in-python/10098/" aria-label="On-Chain Analytics: Metrics for Analyzing Blockchains in Python">
			<div class="post-thumbnail-inner">
				<img decoding="async" width="768" height="314" src="https://www.relataly.com/wp-content/uploads/2023/02/blockchain-analysis-python-min.png" class="attachment-medium_large size-medium_large wp-post-image" alt="onchain-analysis - tutorial blockchain data in python CryptoCompare api" srcset="https://www.relataly.com/wp-content/uploads/2023/02/blockchain-analysis-python-min.png 1262w, https://www.relataly.com/wp-content/uploads/2023/02/blockchain-analysis-python-min.png 300w, https://www.relataly.com/wp-content/uploads/2023/02/blockchain-analysis-python-min.png 1024w, https://www.relataly.com/wp-content/uploads/2023/02/blockchain-analysis-python-min.png 768w" sizes="(max-width: 768px) 100vw, 768px" data-attachment-id="12339" data-permalink="https://www.relataly.com/blockchain-analysis-python-min/" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/02/blockchain-analysis-python-min.png" data-orig-size="1262,516" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="blockchain analysis python-min" data-image-description="&lt;p&gt;onchain-analysis &amp;#8211; tutorial blockchain data in python  CryptoCompare api&lt;/p&gt;
" data-image-caption="&lt;p&gt;onchain-analysis &amp;#8211; tutorial blockchain data in python  CryptoCompare api&lt;/p&gt;
" data-large-file="https://www.relataly.com/wp-content/uploads/2023/02/blockchain-analysis-python-min.png" />			</div>
		</a><!-- .post-thumbnail -->
				<div class="entry-content-wrap">
			<header class="entry-header">
	<h2 class="entry-title"><a href="https://www.relataly.com/seven-metrics-for-on-chain-analysis-in-python/10098/" rel="bookmark">On-Chain Analytics: Metrics for Analyzing Blockchains in Python</a></h2></header><!-- .entry-header -->
<footer class="entry-footer">
	</footer><!-- .entry-footer -->		</div>
	</article>
</li>
<li class="kb-post-list-item">
	<article class="entry content-bg loop-entry post-3982 post type-post status-publish format-standard has-post-thumbnail hentry category-finance category-gate-io-api category-python-programming category-rest-apis tag-ai-in-finance tag-api-tutorials tag-beginner-tutorials tag-bitcoin tag-cryptocurrencies">
				<a aria-hidden="true" tabindex="-1" role="presentation" class="post-thumbnail kadence-thumbnail-ratio-2-3" href="https://www.relataly.com/streaming-crypto-prices-via-the-gate-io-api-with-python/3982/" aria-label="Requesting Crypto Price Data from the Gate.io REST API in Python">
			<div class="post-thumbnail-inner">
				<img decoding="async" width="768" height="305" src="https://www.relataly.com/wp-content/uploads/2021/05/gatio-cryptocurrency-data-api-midjourney-relataly-min.png" class="attachment-medium_large size-medium_large wp-post-image" alt="gatio cryptocurrency data api midjourney relataly-min" srcset="https://www.relataly.com/wp-content/uploads/2021/05/gatio-cryptocurrency-data-api-midjourney-relataly-min.png 1358w, https://www.relataly.com/wp-content/uploads/2021/05/gatio-cryptocurrency-data-api-midjourney-relataly-min.png 300w, https://www.relataly.com/wp-content/uploads/2021/05/gatio-cryptocurrency-data-api-midjourney-relataly-min.png 512w, https://www.relataly.com/wp-content/uploads/2021/05/gatio-cryptocurrency-data-api-midjourney-relataly-min.png 768w" sizes="(max-width: 768px) 100vw, 768px" data-attachment-id="12769" data-permalink="https://www.relataly.com/streaming-crypto-prices-via-the-gate-io-api-with-python/3982/gatio-cryptocurrency-data-api-midjourney-relataly-min/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2021/05/gatio-cryptocurrency-data-api-midjourney-relataly-min.png" data-orig-size="1358,540" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="gatio cryptocurrency data api midjourney relataly-min" data-image-description="&lt;p&gt;gatio cryptocurrency data api midjourney relataly-min&lt;/p&gt;
" data-image-caption="&lt;p&gt;gatio cryptocurrency data api midjourney relataly-min&lt;/p&gt;
" data-large-file="https://www.relataly.com/wp-content/uploads/2021/05/gatio-cryptocurrency-data-api-midjourney-relataly-min.png" />			</div>
		</a><!-- .post-thumbnail -->
				<div class="entry-content-wrap">
			<header class="entry-header">
	<h2 class="entry-title"><a href="https://www.relataly.com/streaming-crypto-prices-via-the-gate-io-api-with-python/3982/" rel="bookmark">Requesting Crypto Price Data from the Gate.io REST API in Python</a></h2></header><!-- .entry-header -->
<footer class="entry-footer">
	</footer><!-- .entry-footer -->		</div>
	</article>
</li>
<li class="kb-post-list-item">
	<article class="entry content-bg loop-entry post-3925 post type-post status-publish format-standard has-post-thumbnail hentry category-rest-apis category-twitter-api tag-ai-in-e-commerce tag-api-tutorials tag-automated-twitter-posts tag-beginner-tutorials tag-social-media-data tag-tweepy">
				<a aria-hidden="true" tabindex="-1" role="presentation" class="post-thumbnail kadence-thumbnail-ratio-2-3" href="https://www.relataly.com/posting-tweets-on-twitter-using-python-and-tweepy/3925/" aria-label="Posting Tweets On Twitter using Python and Tweepy">
			<div class="post-thumbnail-inner">
				<img decoding="async" width="768" height="306" src="https://www.relataly.com/wp-content/uploads/2023/03/twitter-api-gate-to-social-mediadata-relataly-tutorial-python-min.png" class="attachment-medium_large size-medium_large wp-post-image" alt="twitter api gate to social mediadata relataly tutorial python" srcset="https://www.relataly.com/wp-content/uploads/2023/03/twitter-api-gate-to-social-mediadata-relataly-tutorial-python-min.png 1390w, https://www.relataly.com/wp-content/uploads/2023/03/twitter-api-gate-to-social-mediadata-relataly-tutorial-python-min.png 300w, https://www.relataly.com/wp-content/uploads/2023/03/twitter-api-gate-to-social-mediadata-relataly-tutorial-python-min.png 512w, https://www.relataly.com/wp-content/uploads/2023/03/twitter-api-gate-to-social-mediadata-relataly-tutorial-python-min.png 768w" sizes="(max-width: 768px) 100vw, 768px" data-attachment-id="12599" data-permalink="https://www.relataly.com/twitter-api-gate-to-social-mediadata-relataly-tutorial-python-min/" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/03/twitter-api-gate-to-social-mediadata-relataly-tutorial-python-min.png" data-orig-size="1390,554" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="twitter api gate to social mediadata relataly tutorial python" data-image-description="&lt;p&gt;twitter api gate to social mediadata relataly tutorial python&lt;/p&gt;
" data-image-caption="&lt;p&gt;twitter api gate to social mediadata relataly tutorial python&lt;/p&gt;
" data-large-file="https://www.relataly.com/wp-content/uploads/2023/03/twitter-api-gate-to-social-mediadata-relataly-tutorial-python-min.png" />			</div>
		</a><!-- .post-thumbnail -->
				<div class="entry-content-wrap">
			<header class="entry-header">
	<h2 class="entry-title"><a href="https://www.relataly.com/posting-tweets-on-twitter-using-python-and-tweepy/3925/" rel="bookmark">Posting Tweets On Twitter using Python and Tweepy</a></h2></header><!-- .entry-header -->
<footer class="entry-footer">
	</footer><!-- .entry-footer -->		</div>
	</article>
</li>
</ul><p>The post <a href="https://www.relataly.com/using-pandas-datareader-in-python/10934/">Using Pandas DataReader to Access Online Data Sources in Python</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.relataly.com/using-pandas-datareader-in-python/10934/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">10934</post-id>	</item>
		<item>
		<title>Stock Market Forecasting Neural Networks for Multi-Output Regression in Python</title>
		<link>https://www.relataly.com/stock-price-prediction-multi-output-regression-using-neural-networks-in-python/5800/</link>
					<comments>https://www.relataly.com/stock-price-prediction-multi-output-regression-using-neural-networks-in-python/5800/#comments</comments>
		
		<dc:creator><![CDATA[Florian Follonier]]></dc:creator>
		<pubDate>Tue, 13 Jul 2021 21:10:23 +0000</pubDate>
				<category><![CDATA[Finance]]></category>
		<category><![CDATA[Keras]]></category>
		<category><![CDATA[Neural Networks]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Recurrent Neural Networks]]></category>
		<category><![CDATA[Scikit-Learn]]></category>
		<category><![CDATA[Seaborn]]></category>
		<category><![CDATA[Stock Market Forecasting]]></category>
		<category><![CDATA[Time Series Forecasting]]></category>
		<category><![CDATA[Yahoo Finance API]]></category>
		<category><![CDATA[AI in Finance]]></category>
		<category><![CDATA[Deep Learning]]></category>
		<category><![CDATA[Intermediate Tutorials]]></category>
		<category><![CDATA[Multi-output Neural Network]]></category>
		<category><![CDATA[Multi-Step Time Series Forecasting]]></category>
		<category><![CDATA[Multivariate Models]]></category>
		<category><![CDATA[Stock Market Prediction]]></category>
		<category><![CDATA[Supervised Learning]]></category>
		<guid isPermaLink="false">https://www.relataly.com/?p=5800</guid>

					<description><![CDATA[<p>Multi-output time series regression can forecast several steps of a time series at once. The number of neurons in the final output layer determines how many steps the model can predict. Models with one output return single-step forecasts. Models with various outputs can return entire series of time steps and thus deliver a more detailed ... <a title="Stock Market Forecasting Neural Networks for Multi-Output Regression in Python" class="read-more" href="https://www.relataly.com/stock-price-prediction-multi-output-regression-using-neural-networks-in-python/5800/" aria-label="Read more about Stock Market Forecasting Neural Networks for Multi-Output Regression in Python">Read more</a></p>
<p>The post <a href="https://www.relataly.com/stock-price-prediction-multi-output-regression-using-neural-networks-in-python/5800/">Stock Market Forecasting Neural Networks for Multi-Output Regression in Python</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p class="wp-block-paragraph">Multi-output time series regression can forecast several steps of a time series at once. The number of neurons in the final output layer determines how many steps the model can predict. Models with one output return single-step forecasts. Models with multiple outputs can return an entire sequence of future time steps and thus give a more detailed projection of how a time series could develop. This article is a hands-on Python tutorial that shows how to design a neural network architecture with multiple outputs. The goal is to create a multi-output model for stock price forecasting using Python and Keras. The same approach carries over to other time series forecasting tasks, such as weather or sales forecasting.</p>



<p class="wp-block-paragraph">This article proceeds as follows: We briefly discuss the architecture of a multi-output neural network. After familiarizing ourselves with the model architecture, we develop a Keras neural network for multi-output regression. For data preparation, we perform various steps, including cleaning, splitting, selecting, and scaling the data. Afterward, we define a model architecture with multiple LSTM layers and ten output neurons in the last layer. This architecture enables the model to generate projections for ten consecutive steps. After configuring the model architecture, we train the model with the historical daily prices of the Apple stock. Finally, we use this model to generate a ten-day forecast.</p>



<div class="wp-block-group"><div class="wp-block-group__inner-container is-layout-constrained wp-block-group-is-layout-constrained">
<div class="wp-block-kadence-infobox kt-info-box_317393-a1"><span class="kt-blocks-info-box-link-wrap info-box-link kt-blocks-info-box-media-align-top kt-info-halign-left"><div class="kt-infobox-textcontent"><h2 class="kt-blocks-info-box-title">Disclaimer</h2><p class="kt-blocks-info-box-text">This article does not constitute financial advice. Stock markets can be very volatile and are generally difficult to predict. Predictive models and other forms of analytics applied in this article only serve the purpose of illustrating machine learning use cases.</p></div></span></div>
</div></div>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<div style="height:5px" aria-hidden="true" class="wp-block-spacer"></div>



<h2 class="wp-block-heading" id="h-multi-output-regression-vs-single-output-regression">Multi-Output Regression vs. Single-Output Regression</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p class="wp-block-paragraph">In time series regression, we train a statistical model on the past values of a time series to predict how the series will develop. During training, we feed the model mini-batches of input sequences together with their target values. The model generates predictions for each mini-batch and compares them with the actual target values to calculate the residuals (prediction errors). Based on these errors, the model adjusts its parameters iteratively and learns to make better predictions.</p>
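<p class="wp-block-paragraph">The training loop described above can be sketched in a few lines of NumPy. The sketch is illustrative only: it assumes a plain linear model instead of a neural network, and the function name <code>train_linear_multi_output</code> is hypothetical. It shuffles the samples, iterates over mini-batches, computes the residuals, and adjusts the parameters by gradient descent on the squared error. Keras follows the same basic pattern when <code>fit()</code> is called with a <code>batch_size</code>, although with a far larger model and more elaborate optimizers.</p>

```python
import numpy as np

def train_linear_multi_output(X, Y, batch_size=32, epochs=200, lr=0.05, seed=0):
    """Fit Y ~ X @ W + b with mini-batch gradient descent on the squared error."""
    rng = np.random.default_rng(seed)
    n_samples, n_inputs = X.shape
    n_outputs = Y.shape[1]
    W = np.zeros((n_inputs, n_outputs))  # one column of weights per output value
    b = np.zeros(n_outputs)
    for _ in range(epochs):
        order = rng.permutation(n_samples)  # reshuffle the samples in every epoch
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]
            X_batch, Y_batch = X[idx], Y[idx]
            residuals = X_batch @ W + b - Y_batch  # prediction errors for this mini-batch
            # Gradients of the per-sample summed squared error, averaged over the batch
            grad_W = 2 * X_batch.T @ residuals / len(idx)
            grad_b = 2 * residuals.mean(axis=0)
            W -= lr * grad_W  # move the parameters against the gradient
            b -= lr * grad_b
    return W, b

# Synthetic, noise-free data: 5 input features, 3 target values per sample
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))
W_true = rng.normal(size=(5, 3))
Y = X @ W_true + 1.0
W, b = train_linear_multi_output(X, Y)
print(np.round(b, 3))
```

<p class="wp-block-paragraph">Because the synthetic targets contain no noise, the learned weights should end up very close to <code>W_true</code> and the bias close to 1.</p>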



<p class="wp-block-paragraph">Multivariate forecasting models take multiple input variables into account, such as the historical time series itself and additional features like moving averages or momentum indicators, to improve the accuracy of their predictions. The idea is that these additional variables help the model identify patterns in the data that may indicate future price movements.</p>
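<p class="wp-block-paragraph">As a minimal sketch of such additional inputs, the following pandas snippet derives two moving averages and a simple momentum indicator from a closing price column. The window lengths (5, 20, and 10 days), the column names, and the helper name <code>add_indicator_features</code> are assumptions for illustration, not necessarily the features used later in this tutorial.</p>

```python
import numpy as np
import pandas as pd

def add_indicator_features(df, price_col="Close"):
    """Return a copy of df with simple moving-average and momentum columns added."""
    out = df.copy()
    out["MA_5"] = out[price_col].rolling(window=5).mean()    # 5-day simple moving average
    out["MA_20"] = out[price_col].rolling(window=20).mean()  # 20-day simple moving average
    out["Momentum_10"] = out[price_col] - out[price_col].shift(10)  # price change over 10 days
    return out.dropna()  # the first rows lack a complete window, so drop them

# Synthetic, steadily rising prices for illustration
prices = pd.DataFrame({"Close": np.arange(1.0, 31.0)})
features = add_indicator_features(prices)
print(features.head(3))
```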
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"><div class="wp-block-image">
<figure class="alignright size-large is-resized"><img decoding="async" data-attachment-id="7569" data-permalink="https://www.relataly.com/stock-price-prediction-multi-output-regression-using-neural-networks-in-python/5800/multi-output-neural-networks-time-series-regression-architecture/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/04/multi-output-neural-networks-time-series-regression-architecture.png" data-orig-size="2017,1342" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="multi-output-neural-networks-time-series-regression-architecture" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/04/multi-output-neural-networks-time-series-regression-architecture.png" src="https://www.relataly.com/wp-content/uploads/2022/04/multi-output-neural-networks-time-series-regression-architecture-1024x681.png" alt="An exemplary architecture of a neural network with five input neurons (blue) and four output neurons (red), keras, python, tutorial, stock market prediction" class="wp-image-7569" width="371" height="247" srcset="https://www.relataly.com/wp-content/uploads/2022/04/multi-output-neural-networks-time-series-regression-architecture.png 1024w, https://www.relataly.com/wp-content/uploads/2022/04/multi-output-neural-networks-time-series-regression-architecture.png 300w, https://www.relataly.com/wp-content/uploads/2022/04/multi-output-neural-networks-time-series-regression-architecture.png 768w, 
https://www.relataly.com/wp-content/uploads/2022/04/multi-output-neural-networks-time-series-regression-architecture.png 1536w, https://www.relataly.com/wp-content/uploads/2022/04/multi-output-neural-networks-time-series-regression-architecture.png 2017w" sizes="(max-width: 371px) 100vw, 371px" /><figcaption class="wp-element-caption">An exemplary architecture of a neural network with five input neurons (blue) and four output neurons (red)</figcaption></figure>
</div></div>
</div>



<h2 class="wp-block-heading">The Architecture of a Neural Network with Multiple Outputs</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p class="wp-block-paragraph">Next, we discuss the architecture of a neural network with multiple outputs. It consists of an input layer, several hidden layers, and an output layer. The input layer must match the shape of the input data (the number of time steps and features per sample), and the number of neurons in the output layer determines how many future time steps the model predicts. </p>
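<p class="wp-block-paragraph">The following NumPy sketch illustrates how the layer sizes relate to the data. It uses an untrained network with random weights, and the sizes (50 input steps, 32 hidden units, 10 outputs) are assumptions chosen to match the example used later in this article. The first weight matrix only accepts input sequences of the expected length, and the width of the last layer fixes how many forecast steps come out. In Keras, the corresponding output layer would be a <code>Dense(10)</code> layer.</p>

```python
import numpy as np

rng = np.random.default_rng(42)

n_input_steps = 50   # length of each input sequence (assumed)
n_hidden = 32        # size of a single hidden layer (assumed)
n_output_steps = 10  # one output neuron per future time step

# Randomly initialized weights of a tiny fully connected network (not trained)
W1 = rng.normal(scale=0.1, size=(n_input_steps, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, n_output_steps))
b2 = np.zeros(n_output_steps)

def forward(x):
    """Map input sequences (batch, n_input_steps) to forecasts (batch, n_output_steps)."""
    hidden = np.maximum(0.0, x @ W1 + b1)  # ReLU hidden layer
    return hidden @ W2 + b2                # linear output layer: one value per forecast step

batch = rng.normal(size=(4, n_input_steps))  # four example input sequences
print(forward(batch).shape)  # (4, 10)
```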



<p class="wp-block-paragraph">Models with a single neuron in the output layer predict a single time step. Such a single-output model can still predict multiple steps, but this requires a <a href="https://www.relataly.com/multi-step-time-series-forecasting-a-step-by-step-guide/275/" target="_blank" rel="noreferrer noopener">rolling forecasting approach</a> in which the outputs are iteratively fed back as inputs to make further-reaching predictions. This approach is somewhat cumbersome, and errors from early steps carry over into later ones. A more elegant way is to train a multi-output model that predicts all steps at once.</p>
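<p class="wp-block-paragraph">To illustrate the rolling approach, here is a minimal sketch of a recursive forecast built on any single-step model. The function names are hypothetical, and <code>naive_trend_model</code> is a stand-in that simply extrapolates the last change rather than a trained network. Every additional step requires another model call, and each prediction becomes part of the next input, so early errors propagate. A multi-output model instead returns all steps from a single forward pass.</p>

```python
import numpy as np

def recursive_forecast(predict_next, history, n_steps, window):
    """Rolling forecast: predict one step, append it to the input, and repeat."""
    values = list(history[-window:])
    forecast = []
    for _ in range(n_steps):
        next_value = predict_next(np.array(values[-window:]))
        forecast.append(next_value)
        values.append(next_value)  # the prediction becomes part of the next input window
    return np.array(forecast)

# Hypothetical single-step model: extrapolate the last change in the window
def naive_trend_model(window_values):
    return window_values[-1] + (window_values[-1] - window_values[-2])

history = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(recursive_forecast(naive_trend_model, history, n_steps=3, window=3))  # [6. 7. 8.]
```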



<figure class="wp-block-image size-large is-resized is-style-default"><img decoding="async" data-attachment-id="7586" data-permalink="https://www.relataly.com/stock-price-prediction-multi-output-regression-using-neural-networks-in-python/5800/architecture-neural-network-multi-output-regression-model/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/04/architecture-neural-network-multi-output-regression-model.png" data-orig-size="3395,1503" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="architecture-neural-network-multi-output-regression-model" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/04/architecture-neural-network-multi-output-regression-model.png" src="https://www.relataly.com/wp-content/uploads/2022/04/architecture-neural-network-multi-output-regression-model-1024x453.png" alt="The inputs and outputs of a neural network for time series regression with five input neurons and four outputs. 
Stock market forecasting" class="wp-image-7586" width="755" height="334" srcset="https://www.relataly.com/wp-content/uploads/2022/04/architecture-neural-network-multi-output-regression-model.png 1024w, https://www.relataly.com/wp-content/uploads/2022/04/architecture-neural-network-multi-output-regression-model.png 300w, https://www.relataly.com/wp-content/uploads/2022/04/architecture-neural-network-multi-output-regression-model.png 768w, https://www.relataly.com/wp-content/uploads/2022/04/architecture-neural-network-multi-output-regression-model.png 1536w, https://www.relataly.com/wp-content/uploads/2022/04/architecture-neural-network-multi-output-regression-model.png 2048w, https://www.relataly.com/wp-content/uploads/2022/04/architecture-neural-network-multi-output-regression-model.png 2475w" sizes="(max-width: 755px) 100vw, 755px" /><figcaption class="wp-element-caption">The inputs and outputs of a neural network for time series regression with five input neurons and four outputs</figcaption></figure>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>






<h2 class="wp-block-heading">Training Neural Networks with Multiple Outputs</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p class="wp-block-paragraph">A model with multiple neurons in the output layer can predict several steps at once for each input sequence. Multi-output regression models are trained on many input sequences of consecutive values, each paired with the sequence of values that follows it. The model architecture therefore contains multiple neurons in the input layer and multiple neurons in the output layer (as illustrated). </p>



<p class="wp-block-paragraph">In a multi-output regression model, each neuron in the output layer is responsible for predicting a different time step in the future. To train such a model, you need to provide a sequence of input data followed by the corresponding sequence of output data. For example, if you want to predict the stock price for the next ten days, you would provide a sequence of input data containing the historical stock prices for the past 50 days, followed by a sequence of output data containing the stock prices for the next 10 days.</p>
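<p class="wp-block-paragraph">A minimal sketch of how such input/output pairs could be generated with a sliding window, assuming a univariate series and the 50/10 split from the example above. The helper name <code>make_sequences</code> is hypothetical, and the tutorial's own preparation code may differ, for example by using several input features.</p>

```python
import numpy as np

def make_sequences(series, n_input_steps=50, n_output_steps=10):
    """Slice a 1-D series into (input window, following output window) training pairs."""
    X, y = [], []
    for start in range(len(series) - n_input_steps - n_output_steps + 1):
        end_input = start + n_input_steps
        X.append(series[start:end_input])                       # the past 50 values
        y.append(series[end_input:end_input + n_output_steps])  # the next 10 values
    X = np.array(X)[..., np.newaxis]  # shape (samples, time steps, features=1)
    y = np.array(y)                   # shape (samples, output steps)
    return X, y

series = np.arange(100, dtype=float)
X, y = make_sequences(series)
print(X.shape, y.shape)  # (41, 50, 1) (41, 10)
```

<p class="wp-block-paragraph">The trailing feature axis of <code>X</code> gives the 3D shape (samples, time steps, features) that Keras LSTM layers expect as input.</p>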



<p class="wp-block-paragraph">The model will then learn to map the input sequence to the output sequence so that it can make predictions for multiple time steps in the future based on the input data. </p>



<p class="wp-block-paragraph">In the next part of this tutorial, we will walk through the process of developing a multi-output regression model in more detail.</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<h2 class="wp-block-heading has-contrast-color has-text-color" id="h-implementing-a-neural-network-model-for-multi-output-multi-step-regression-in-python">Implementing a Neural Network Model for Multi-Output Multi-Step Regression in Python</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p class="wp-block-paragraph">Let&#8217;s get started with the hands-on Python part. In the following, we will develop a neural network with Keras and Tensorflow that forecasts the Apple stock price. To prepare the data for a neural network with multiple outputs in time series forecasting, we will spend the most time preparing it and bringing it into the right shape. Broadly this involves the following steps:</p>



<ol class="wp-block-list">
<li>Load the time series data that we want to use as input and output for the model. We use historical price data that is available via the Yahoo Finance API.</li>



<li>Split the data into training and testing sets. We use the training set to fit the model and the testing set to evaluate its performance.</li>



<li>Preprocess the data. This includes selecting relevant features and scaling the data.</li>



<li>Reshape the data into a format that the neural network can process. For time series data, this means converting it into a 3D array of shape (samples, time steps, features).</li>



<li>Finally, train the model and generate the forecast.</li>
</ol>
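<p class="wp-block-paragraph">The splitting and scaling steps can be sketched with NumPy as follows, assuming an 80/20 chronological split and min-max scaling. The helper name <code>split_and_scale</code> is hypothetical; scikit-learn's <code>MinMaxScaler</code> performs the same transformation. The important design choice is that the data is not shuffled before splitting and that the scaling statistics come from the training period only, so no information from the test period leaks into training.</p>

```python
import numpy as np

def split_and_scale(data, train_fraction=0.8):
    """Split a (time steps, features) array chronologically and min-max scale it.

    The scaling statistics come from the training part only, so no information
    from the test period leaks into the training data.
    """
    split = int(np.ceil(len(data) * train_fraction))
    train, test = data[:split], data[split:]
    data_min = train.min(axis=0)
    data_range = train.max(axis=0) - data_min
    data_range = np.where(data_range == 0, 1.0, data_range)  # guard against constant columns
    train_scaled = (train - data_min) / data_range
    test_scaled = (test - data_min) / data_range
    return train_scaled, test_scaled, data_min, data_range

# Ten synthetic daily prices with a single feature column
prices = np.arange(10.0, 30.0, 2.0).reshape(-1, 1)
train_scaled, test_scaled, data_min, data_range = split_and_scale(prices)
print(train_scaled.ravel())
print(test_scaled.ravel())  # test values may fall outside [0, 1]

# Scaled predictions can be mapped back to prices with the same statistics
restored = test_scaled * data_range + data_min
```

<p class="wp-block-paragraph">Test values above 1 are expected here because the synthetic prices keep rising after the training period. The last line shows how scaled forecasts can be converted back into prices.</p>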



<p class="wp-block-paragraph">The code is available in the relataly GitHub repository.</p>



<div class="wp-block-kadence-advancedbtn kb-buttons-wrap kb-btns_e5fd46-d1"><a class="kb-button kt-button button kb-btn_25b4a6-dd kt-btn-size-standard kt-btn-width-type-full kb-btn-global-inherit kt-btn-has-text-true kt-btn-has-svg-true wp-block-button__link wp-block-kadence-singlebtn" href="https://github.com/flo7up/relataly-public-python-tutorials/blob/master/01%20Time%20Series%20Forecasting%20%26%20Regression/006%20Multi-Output%20Regression.ipynb" target="_blank" rel="noreferrer noopener"><span class="kb-svg-icon-wrap kb-svg-icon-fe_eye kt-btn-icon-side-left"><svg viewBox="0 0 24 24"  fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"  aria-hidden="true"><path d="M1 12s4-8 11-8 11 8 11 8-4 8-11 8-11-8-11-8z"/><circle cx="12" cy="12" r="3"/></svg></span><span class="kt-btn-inner-text">View on GitHub </span></a>

<a class="kb-button kt-button button kb-btn_3ee95e-9c kt-btn-size-standard kt-btn-width-type-full kb-btn-global-inherit kt-btn-has-text-true kt-btn-has-svg-true wp-block-button__link wp-block-kadence-singlebtn" href="https://github.com/flo7up/relataly-public-python-API-tutorials" target="_blank" rel="noreferrer noopener"><span class="kb-svg-icon-wrap kb-svg-icon-fa_github kt-btn-icon-side-left"><svg viewBox="0 0 496 512"  fill="currentColor" xmlns="http://www.w3.org/2000/svg"  aria-hidden="true"><path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"/></svg></span><span class="kt-btn-inner-text">Relataly GitHub Repo </span></a></div>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-large"><img decoding="async" width="512" height="337" data-attachment-id="12797" data-permalink="https://www.relataly.com/stock-price-prediction-multi-output-regression-using-neural-networks-in-python/5800/multiple_waterfalls_coming_from_a_single_waterfall/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/03/multiple_waterfalls_coming_from_a_single_waterfall.png" data-orig-size="768,505" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="multiple_waterfalls_coming_from_a_single_waterfall" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/03/multiple_waterfalls_coming_from_a_single_waterfall.png" src="https://www.relataly.com/wp-content/uploads/2023/03/multiple_waterfalls_coming_from_a_single_waterfall-512x337.png" alt="Neural network architectures with multiple outputs allow for more potent solutions but are more complex to train. Image created with Midjourney. 
Stock market forecasting, multi-output multi-step  regression, python" class="wp-image-12797" srcset="https://www.relataly.com/wp-content/uploads/2023/03/multiple_waterfalls_coming_from_a_single_waterfall.png 512w, https://www.relataly.com/wp-content/uploads/2023/03/multiple_waterfalls_coming_from_a_single_waterfall.png 300w, https://www.relataly.com/wp-content/uploads/2023/03/multiple_waterfalls_coming_from_a_single_waterfall.png 768w" sizes="(max-width: 512px) 100vw, 512px" /><figcaption class="wp-element-caption">Neural network architectures with multiple outputs allow for more potent solutions but are more complex to train. Image created with <a href="http://www.midjourney.com" target="_blank" rel="noreferrer noopener">Midjourney</a>.</figcaption></figure>



</div>
</div>



<h3 class="wp-block-heading" id="h-prerequisites">Prerequisites</h3>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p class="wp-block-paragraph">Before beginning the coding part, ensure that you have set up your Python 3 environment and required packages. If you don&#8217;t have a Python environment, consider <a href="https://www.anaconda.com/products/individual" target="_blank" rel="noreferrer noopener">Anaconda</a>. To set it up, you can follow the steps in&nbsp;<a href="https://www.relataly.com/category/data-science/setup-anaconda-environment/" target="_blank" rel="noreferrer noopener">this tutorial</a>.</p>



<p class="wp-block-paragraph">Also, make sure you install all required packages. In this tutorial, we will be working with the following standard packages:&nbsp;</p>



<ul class="wp-block-list">
<li><em><a href="https://pandas.pydata.org/" target="_blank" rel="noreferrer noopener">pandas</a></em></li>



<li><em><a href="https://numpy.org/" target="_blank" rel="noreferrer noopener">NumPy</a></em></li>



<li><em><a href="https://matplotlib.org/" target="_blank" rel="noreferrer noopener">matplotlib</a></em></li>
</ul>



<p class="wp-block-paragraph">In addition, we will be using the machine learning libraries Keras, Scikit-learn, and TensorFlow. For visualization, we will be using the Seaborn package.</p>



<p class="wp-block-paragraph">Please also have either the <a href="https://pandas-datareader.readthedocs.io/en/latest/" target="_blank" rel="noreferrer noopener">pandas_datareader</a> or the <a href="https://pypi.org/project/yfinance/" target="_blank" rel="noreferrer noopener">yfinance</a> package installed. You will use one of these packages to retrieve the historical stock quotes.</p>



<p class="wp-block-paragraph">You can install these packages using console commands:</p>



<ul class="wp-block-list">
<li><em>pip install &lt;package name&gt;</em></li>



<li><em>conda install &lt;package name&gt;</em>&nbsp;(if you are using the Anaconda package manager)</li>
</ul>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<h3 class="wp-block-heading" id="h-step-1-load-the-data">Step #1: Load the Data</h3>



<p class="wp-block-paragraph">We begin by loading historical price quotes of the Apple stock from the public Yahoo Finance API into a Pandas DataFrame. The Pandas DataReader library can interact with this API, but it sometimes causes problems. For this reason, the DataReader call in the code below is commented out, and the data is loaded with the yfinance package instead, which should return the same data.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># import pandas_datareader as webreader # Remote data access for pandas
import math # Mathematical functions
import numpy as np # Fundamental package for scientific computing with Python
import pandas as pd # Additional functions for analysing and manipulating data
from datetime import date, timedelta, datetime # Date Functions
from pandas.plotting import register_matplotlib_converters # This function adds plotting functions for calendar dates
import matplotlib.pyplot as plt # Important package for visualization - we use this to plot the market data
import matplotlib.dates as mdates # Formatting dates
from sklearn.metrics import mean_absolute_error, mean_squared_error # Packages for measuring model performance / errors
import tensorflow as tf # Backend for Keras; used here to check the version and available GPUs
from keras.models import Sequential # Deep learning library, used for neural networks
from keras.layers import LSTM, Dense, Dropout # Deep learning classes for recurrent and regular densely-connected layers
from keras.callbacks import EarlyStopping # EarlyStopping during model training
from sklearn.preprocessing import RobustScaler, MinMaxScaler # Scalers for normalizing the price data (RobustScaler uses the median and quantile range)
import seaborn as sns # Statistical data visualization
import yfinance as yf # Alternative package if webreader does not work: pip install yfinance

# Check the TensorFlow version and the number of available GPUs
print('Tensorflow Version: ' + tf.__version__)
print('Num GPUs:', len(tf.config.list_physical_devices('GPU')))

# from pandas_datareader.nasdaq_trader import get_nasdaq_symbols
# symbols = get_nasdaq_symbols()

# Setting the timeframe for the data extraction
today = date.today()
date_today = today.strftime(&quot;%Y-%m-%d&quot;)
date_start = '2010-01-01'

# Getting NASDAQ quotes
stockname = 'Apple'
symbol = 'AAPL'
# df = webreader.DataReader(
#     symbol, start=date_start, end=date_today, data_source=&quot;yahoo&quot;
# )

# auto_adjust=False keeps the unadjusted prices and the Adj Close column
df = yf.download(symbol, start=date_start, end=date_today, auto_adjust=False)

# Create a quick overview of the dataset
df.head()</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">Tensorflow Version: 2.6.0
Num GPUs: 1
[*********************100%***********************]  1 of 1 completed
			Open		High		Low			Close		Adj Close	Volume
Date						
2010-01-04	7.622500	7.660714	7.585000	7.643214	6.515213	493729600
2010-01-05	7.664286	7.699643	7.616071	7.656429	6.526477	601904800
2010-01-06	7.656429	7.686786	7.526786	7.534643	6.422666	552160000
2010-01-07	7.562500	7.571429	7.466071	7.520714	6.410791	477131200
2010-01-08	7.510714	7.571429	7.466429	7.570714	6.453413	447610800</pre></div>
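<p class="wp-block-paragraph">Depending on the installed yfinance version, <code>yf.download</code> may return a two-level column index (price field and ticker) even for a single symbol, and newer versions may adjust prices by default so that the <code>Adj Close</code> column is missing unless <code>auto_adjust=False</code> is passed. If your output looks different from the table above, a small helper like the following can flatten the columns. The helper name is hypothetical, and the example frame reuses values from the output above.</p>

```python
import pandas as pd

def flatten_price_columns(df):
    """Drop the ticker level if the download returned (Price, Ticker) MultiIndex columns."""
    if isinstance(df.columns, pd.MultiIndex):
        df = df.copy()
        df.columns = df.columns.get_level_values(0)  # keep only the price field names
    return df

# Frame mimicking a single-ticker download with two-level columns
columns = pd.MultiIndex.from_product([["Close", "Volume"], ["AAPL"]], names=["Price", "Ticker"])
raw = pd.DataFrame([[7.643214, 493729600], [7.656429, 601904800]], columns=columns)
print(list(flatten_price_columns(raw).columns))  # ['Close', 'Volume']
```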



<p class="wp-block-paragraph">The data should comprise the following columns:</p>



<ul class="wp-block-list">
<li>Close</li>



<li>Open</li>



<li>High</li>



<li>Low</li>



<li>Adj Close</li>



<li>Volume</li>
</ul>



<p class="wp-block-paragraph">The target variable we want to predict is the closing price (<code>Close</code>).</p>



<h3 class="wp-block-heading">Step #2: Explore the Data</h3>



<p class="wp-block-paragraph">Once the data is loaded, we get a quick overview of the time series by plotting each column as a line graph. The following code uses the <code>seaborn</code> library to draw one line chart per column of <code>df_plot</code>. The charts are arranged in a grid with <code>nrows</code> rows and <code>ncols</code> columns. Setting <code>sharex=True</code> makes all subplots share the same x-axis, and <code>figsize</code> sets the size of the figure in inches.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Plot one line chart per column
df_plot = df.copy()

ncols = 2
# Round up so that every column gets its own subplot
nrows = math.ceil(df_plot.shape[1] / ncols)

fig, axes = plt.subplots(nrows=nrows, ncols=ncols, sharex=True, figsize=(14, 7))
for i, ax in enumerate(fig.axes[:df_plot.shape[1]]):
    sns.lineplot(data=df_plot.iloc[:, i], ax=ax)
    ax.tick_params(axis=&quot;x&quot;, rotation=30, labelsize=10, length=0)
    ax.xaxis.set_major_locator(mdates.AutoDateLocator())
fig.tight_layout()
plt.show()</pre></div>
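<p class="wp-block-paragraph">A note on the grid size: Python's <code>round()</code> uses banker's rounding, so <code>round(5 / 2)</code> returns 2, and a DataFrame with five columns would get only four subplots. The small sketch below (the helper name <code>grid_shape</code> is hypothetical, not part of the tutorial) computes the number of rows with a ceiling division instead, so that every column gets its own subplot.</p>

```python
import math


def grid_shape(n_series, ncols=2):
    # Hypothetical helper: round up so that n_series subplots always fit
    nrows = math.ceil(n_series / ncols)
    return nrows, ncols


# Six columns (Open, High, Low, Close, Adj Close, Volume) fit into a 3 x 2 grid
print(grid_shape(6))
# Five columns need 3 rows, whereas round(5 / 2) would give only 2
print(grid_shape(5), round(5 / 2))
```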



<figure class="wp-block-image size-large is-resized"><img decoding="async" data-attachment-id="5805" data-permalink="https://www.relataly.com/stock-price-prediction-multi-output-regression-using-neural-networks-in-python/5800/image-5-14/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2021/07/image-5.png" data-orig-size="999,496" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-5" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2021/07/image-5.png" src="https://www.relataly.com/wp-content/uploads/2021/07/image-5.png" alt="The Apple Stock's historical price data, including quotes, highs, lows, and volume" class="wp-image-5805" width="817" height="406" srcset="https://www.relataly.com/wp-content/uploads/2021/07/image-5.png 999w, https://www.relataly.com/wp-content/uploads/2021/07/image-5.png 300w, https://www.relataly.com/wp-content/uploads/2021/07/image-5.png 768w" sizes="(max-width: 817px) 100vw, 817px" /></figure>



<p class="wp-block-paragraph">The line plots look as expected and reflect Apple's stock price history. Because the code fetches daily data from an API, the charts will look different depending on when you run it.</p>



<h3 class="wp-block-heading" id="h-step-3-preprocess-the-data">Step #3: Preprocess the Data</h3>



<p class="wp-block-paragraph">Next, we prepare the data for the training process of our multi-output forecasting model. Preparing the data for multivariate forecasting involves several steps: </p>



<ul class="wp-block-list">
<li>Selecting features for model training</li>



<li>Scaling and splitting the data into separate sets for training and testing</li>



<li>Slicing the time series into several shifted training batches</li>
</ul>



<p class="wp-block-paragraph">Keep in mind that these steps are specific to our data and use case. How you prepare data for a multi-output time series model depends on the characteristics of your data and the requirements of your model, so tailor the preparation steps carefully to both.</p>



<h4 class="wp-block-heading">3.1 Basic Preparations</h4>



<p class="wp-block-paragraph">We begin by creating a copy of the initial data and resetting the index.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Sort the data chronologically and work on a copy
df_train = df.sort_values(by=['Date']).copy()

# Save a copy of the date index before we replace it with a numeric index
date_index = df_train.index

# Reset the index, so that the date index becomes a numeric index
df_train = df_train.reset_index(drop=True).copy()
df_train.head(5)</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">			Open		High		Low			Close		Adj Close	Volume
Date						
2022-11-29	144.289993	144.809998	140.350006	141.169998	141.169998	83763800
2022-11-30	141.399994	148.720001	140.550003	148.029999	148.029999	111224400
2022-12-01	148.210007	149.130005	146.610001	148.309998	148.309998	71250400
2022-12-02	145.960007	148.000000	145.649994	147.809998	147.809998	65421400
2022-12-05	147.770004	150.919998	145.770004	146.630005	146.630005	68732400</pre></div>



<h4 class="wp-block-heading" id="h-3-2-feature-selection-and-scaling">3.2 Feature Selection and Scaling</h4>



<p class="wp-block-paragraph">Next, we select the features. To keep things simple, we use the features from the input data without any modifications. After selecting the features, we scale them to a range between 0 and 1. To make it easier to unscale the predictions after training, we create two scalers: one for the training data, which covers five columns, and one for the output data, which scales a single column (the closing price). If you want to learn more about this topic, I have covered <a href="https://www.relataly.com/feature-engineering-for-multivariate-time-series-models-with-python/1813/" target="_blank" rel="noreferrer noopener">feature engineering in a separate article</a>.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}">def prepare_data(df):

    # List of considered features
    FEATURES = ['Open', 'High', 'Low', 'Close', 'Volume']

    print('FEATURE LIST')
    print(FEATURES)

    # Filter the dataset to the list of FEATURES
    df_filter = df[FEATURES]

    # Convert the filtered data to a numpy array of shape (rows, features)
    np_filter_unscaled = np.array(df_filter)
    print(np_filter_unscaled.shape)

    # The target column (Close) as a single-column array of shape (rows, 1)
    np_c_unscaled = np.array(df['Close']).reshape(-1, 1)

    return np_filter_unscaled, np_c_unscaled, df_filter

np_filter_unscaled, np_c_unscaled, df_filter = prepare_data(df_train)

# Scale each feature to a range between 0 and 1
scaler_train = MinMaxScaler()
np_scaled = scaler_train.fit_transform(np_filter_unscaled)

# Create a separate scaler that works on a single column (Close) for unscaling the predictions
scaler_pred = MinMaxScaler()
np_scaled_c = scaler_pred.fit_transform(np_c_unscaled)</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">FEATURE LIST
['Open', 'High', 'Low', 'Close', 'Volume']
(3254, 5)</pre></div>
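<p class="wp-block-paragraph">To make the scaling step more transparent, here is a minimal NumPy sketch of what a min-max scaler computes, <code>x_scaled = (x - min) / (max - min)</code>, and how a separate single-column version inverts it for the target. It uses hypothetical toy data and, unlike the tutorial, fits the scaling parameters on the first 80% of the rows only. Fitting a scaler on the full history, as above, lets information about the price range of the test period leak into training; with scikit-learn, one way to avoid this is to call <code>fit</code> on the training rows only and then <code>transform</code> on all rows. Treat this as an optional variation, not as the author's method.</p>

```python
import numpy as np

# Hypothetical toy data: 10 time steps x 2 features; column 1 plays the role of Close
data = np.column_stack([np.arange(10, 20, dtype=float), np.linspace(100.0, 190.0, 10)])
close_col = 1
train_len = 8  # assumption: the first 80% of the rows form the training split

# Fit the min-max parameters on the training rows only
col_min = data[:train_len].min(axis=0)
col_max = data[:train_len].max(axis=0)

# Apply the same parameters to all rows; test rows can fall outside [0, 1]
scaled = (data - col_min) / (col_max - col_min)


def unscale_close(values):
    # Single-column inverse transform for the target: x = x_scaled * (max - min) + min
    return values * (col_max[close_col] - col_min[close_col]) + col_min[close_col]


print(np.allclose(unscale_close(scaled[:, close_col]), data[:, close_col]))  # True
print(scaled[train_len:, close_col])
```

<p class="wp-block-paragraph">With this setup, test values above the training maximum are scaled to values greater than 1, which is expected when the scaler never sees the test period.</p>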



<p class="wp-block-paragraph">The final step of the data preparation is to create the structure for the input data. This structure needs to match the input layer of the model architecture.</p>



<h4 class="wp-block-heading">3.3 Slicing the Data for a Model with Multiple In- and Outputs</h4>



<p class="wp-block-paragraph">The code below uses a sliding window to cut the time series into many overlapping slices, which we refer to as batches. Each batch is shifted by a single time step relative to the previous one. Because we feed the model multivariate input data, every batch contains five input columns (features). Each batch consists of an input sequence of 50 time steps and an output sequence of the ten consecutive Close values that follow it. To validate that the batches have the right shape, we plot a few of them in a line graph together with their target values.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Set the input_sequence_length - this is the timeframe used to make a single prediction
input_sequence_length = 50
# The output sequence length is the number of steps that the neural network predicts
output_sequence_length = 10

# Position of the prediction target (Close) in the filtered feature array
index_Close = df_filter.columns.get_loc(&quot;Close&quot;)

# Split the data into train and test data sets
# As a first step, we get the number of rows to train the model on 80% of the data
train_data_length = math.ceil(np_scaled.shape[0] * 0.8)

# Create the training and test data; the test data starts input_sequence_length rows earlier
# so that the first test sample has a complete input window
train_data = np_scaled[:train_data_length, :]
test_data = np_scaled[train_data_length - input_sequence_length:, :]

# The RNN needs data with the format of [samples, time steps, features]
# Here, we create N samples, input_sequence_length time steps per sample, and f features
def partition_dataset(input_sequence_length, output_sequence_length, data):
    x, y = [], []
    data_len = data.shape[0]
    for i in range(input_sequence_length, data_len - output_sequence_length):
        x.append(data[i-input_sequence_length:i, :]) # input window: input_sequence_length rows x all feature columns
        y.append(data[i:i + output_sequence_length, index_Close]) # target: the next output_sequence_length Close values

    # Convert x and y to numpy arrays
    x = np.array(x)
    y = np.array(y)
    return x, y

# Generate training data and test data
x_train, y_train = partition_dataset(input_sequence_length, output_sequence_length, train_data)
x_test, y_test = partition_dataset(input_sequence_length, output_sequence_length, test_data)

# Print the shapes: (samples, input_sequence_length, features) (samples, output_sequence_length)
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)

# Validate that the prediction value and the input match up
# The last close price of the second input sample should equal the first prediction value
nrows = 3 # number of shifted plots
fig, ax = plt.subplots(nrows=nrows, ncols=1, figsize=(16, 8))
for i, ax in enumerate(fig.axes):
    xtrain = pd.DataFrame(x_train[i][:, index_Close], columns=[f'x_train_{i}'])
    ytrain = pd.DataFrame(y_train[i][:output_sequence_length-1], columns=[f'y_train_{i}'])
    ytrain.index = np.arange(input_sequence_length, input_sequence_length + output_sequence_length-1)
    xtrain_ = pd.concat([xtrain, ytrain[:1].rename(columns={ytrain.columns[0]:xtrain.columns[0]})])
    df_merge = pd.concat([xtrain_, ytrain])
    sns.lineplot(data=df_merge, ax=ax)
plt.show()</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">(2544, 50, 5) (2544, 10)
(640, 50, 5) (640, 10)</pre></div>



<figure class="wp-block-image size-full is-resized"><img decoding="async" data-attachment-id="8670" data-permalink="https://www.relataly.com/stock-price-prediction-multi-output-regression-using-neural-networks-in-python/5800/batch-test-visualizations-deep-neural-networks-for-multi-output-stock-market-forecasting/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/05/batch-test-visualizations-deep-neural-networks-for-multi-output-stock-market-forecasting.png" data-orig-size="939,465" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="batch-test-visualizations-deep-neural-networks-for-multi-output-stock-market-forecasting" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/05/batch-test-visualizations-deep-neural-networks-for-multi-output-stock-market-forecasting.png" src="https://www.relataly.com/wp-content/uploads/2022/05/batch-test-visualizations-deep-neural-networks-for-multi-output-stock-market-forecasting.png" alt="batch test visualizations, deep neural networks for multi-output stock market forecasting" class="wp-image-8670" width="891" height="441" srcset="https://www.relataly.com/wp-content/uploads/2022/05/batch-test-visualizations-deep-neural-networks-for-multi-output-stock-market-forecasting.png 939w, https://www.relataly.com/wp-content/uploads/2022/05/batch-test-visualizations-deep-neural-networks-for-multi-output-stock-market-forecasting.png 300w, 
https://www.relataly.com/wp-content/uploads/2022/05/batch-test-visualizations-deep-neural-networks-for-multi-output-stock-market-forecasting.png 768w" sizes="(max-width: 891px) 100vw, 891px" /></figure>
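<p class="wp-block-paragraph">The sliding-window logic can also be illustrated on a small hypothetical array. This NumPy sketch (the function <code>make_windows</code> is illustrative, not part of the tutorial) cuts a (time steps, features) array into inputs of shape (samples, n_in, features) and multi-step targets of shape (samples, n_out). It counts windows so that the last complete one is included; the tutorial's <code>partition_dataset</code> loop stops one window earlier, which only drops a single sample.</p>

```python
import numpy as np


def make_windows(data, n_in, n_out, target_col):
    # Number of complete (input, target) windows that fit into the series
    n_samples = data.shape[0] - n_in - n_out + 1
    x = np.stack([data[i:i + n_in, :] for i in range(n_samples)])
    y = np.stack([data[i + n_in:i + n_in + n_out, target_col] for i in range(n_samples)])
    return x, y


# Hypothetical series: 100 time steps, 5 features, target in column 3
series = np.arange(500, dtype=float).reshape(100, 5)
x, y = make_windows(series, n_in=50, n_out=10, target_col=3)
print(x.shape, y.shape)  # (41, 50, 5) (41, 10)

# Windows are shifted by one step: the last input value of window k + 1
# equals the first target value of window k
print(x[1][-1, 3] == y[0][0])
```

<p class="wp-block-paragraph">The final check confirms the property visualized in the plots above.</p>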



<h3 class="wp-block-heading" id="h-step-4-prepare-the-neural-network-architecture-and-train-the-multi-output-regression-model">Step #4: Prepare the Neural Network Architecture and Train the Multi-Output Regression Model</h3>



<p class="wp-block-paragraph">Now that the training data is ready, the next step is to configure the architecture of the multi-output neural network. Because we use multiple input series, the model is multivariate: its input layer must match the shape of the training batches (50 time steps with five features each).</p>



<h4 class="wp-block-heading" id="h-4-1-configuring-and-training-the-model">4.1 Configuring and Training the Model</h4>



<p class="wp-block-paragraph">We choose a comparatively simple architecture with two LSTM layers followed by two dense layers. The first dense layer has 20 neurons, and the second dense layer is the output layer with ten neurons, one for each predicted time step. If you wonder how I arrived at 20 neurons for the first dense layer (the third layer overall), I ran several experiments and found that this number leads to solid results.</p>



<p class="wp-block-paragraph">To ensure that the architecture matches the structure of our input data, we derive the layer sizes from the variables and shapes of the previous code section: <code>n_input_neurons</code> is the input sequence length (50) multiplied by the number of features (5), and <code>n_output_neurons</code> equals the output sequence length, i.e., the ten steps we want to predict.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Configure the neural network model
model = Sequential()
# One output neuron per predicted time step
n_output_neurons = output_sequence_length

# Number of LSTM units: input time steps x features per time step (50 x 5 = 250)
n_input_neurons = x_train.shape[1] * x_train.shape[2]
print(n_input_neurons, x_train.shape[1], x_train.shape[2])
model.add(LSTM(n_input_neurons, return_sequences=True, input_shape=(x_train.shape[1], x_train.shape[2])))
model.add(LSTM(n_input_neurons, return_sequences=False))
model.add(Dense(20))
model.add(Dense(n_output_neurons))

# Compile the model
model.compile(optimizer='adam', loss='mse')</pre></div>
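<p class="wp-block-paragraph">As a sanity check on the architecture, the number of trainable parameters can be worked out by hand. The sketch below assumes Keras' default layer settings (biases enabled) and the standard LSTM parameter formula of four gates, each with input weights, recurrent weights, and a bias: <code>4 * (units * (input_dim + units) + units)</code>. The helper names are hypothetical; under these assumptions, the total should agree with what <code>model.summary()</code> reports for the model above.</p>

```python
def lstm_params(units, input_dim):
    # Four gates, each with input weights, recurrent weights and one bias vector
    return 4 * (units * (input_dim + units) + units)


def dense_params(units, input_dim):
    # Weight matrix plus one bias per neuron
    return units * input_dim + units


n_timesteps, n_features = 50, 5
units = n_timesteps * n_features  # 250, like n_input_neurons in the tutorial

params = {
    'lstm_1': lstm_params(units, n_features),  # input: 5 features per time step
    'lstm_2': lstm_params(units, units),       # input: 250 outputs of lstm_1
    'dense_1': dense_params(20, units),
    'dense_out': dense_params(10, 20),         # one neuron per predicted step
}
print(params)
print(sum(params.values()))  # 762230
```

<p class="wp-block-paragraph">Note that the number of time steps does not change the parameter count; it only enters the model through <code>input_shape</code>.</p>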



<p class="wp-block-paragraph">After configuring the model architecture, we can initiate the training process and illustrate how the loss develops over the training epochs. </p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Training the model
epochs = 5
batch_size = 16
# Optional early stopping; uncomment the callbacks argument below to enable it
early_stop = EarlyStopping(monitor='loss', patience=5, verbose=1)
history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    validation_data=(x_test, y_test),
                    # callbacks=[early_stop],
                    )</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">Epoch 1/5
159/159 [==============================] - 7s 14ms/step - loss: 0.0047 - val_loss: 0.0262
Epoch 2/5
159/159 [==============================] - 2s 11ms/step - loss: 3.6759e-04 - val_loss: 0.0097
Epoch 3/5
159/159 [==============================] - 2s 11ms/step - loss: 1.5222e-04 - val_loss: 0.0056
Epoch 4/5
159/159 [==============================] - 2s 11ms/step - loss: 1.0327e-04 - val_loss: 0.0031
Epoch 5/5
159/159 [==============================] - 2s 11ms/step - loss: 1.1690e-04 - val_loss: 0.0026</pre></div>
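<p class="wp-block-paragraph">The <code>159/159</code> in the log is the number of gradient steps per epoch: the number of training samples divided by the batch size, rounded up. A quick check with the shapes printed in Step #3:</p>

```python
import math

n_train_samples = 2544  # x_train.shape[0] from the output of Step #3
batch_size = 16

# Keras processes ceil(samples / batch_size) batches per epoch
steps_per_epoch = math.ceil(n_train_samples / batch_size)
print(steps_per_epoch)  # 159
```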



<h4 class="wp-block-heading" id="h-4-2-loss-curve">4.2 Loss Curve</h4>



<p class="wp-block-paragraph">Next, we plot the loss curve, which shows how the error between the model's predictions and the actual values develops over the training epochs. A lower loss value indicates more accurate predictions.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Plot training &amp; validation loss values
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(history.history[&quot;loss&quot;], label=&quot;Train&quot;)
ax.plot(history.history[&quot;val_loss&quot;], label=&quot;Test&quot;)
ax.set_title(&quot;Model loss&quot;)
ax.set_ylabel(&quot;Loss&quot;)
ax.set_xlabel(&quot;Epoch&quot;)
ax.xaxis.set_major_locator(plt.MaxNLocator(epochs))
ax.legend(loc=&quot;upper left&quot;)
ax.grid()
plt.show()</pre></div>



<figure class="wp-block-image size-full is-resized is-style-default"><img decoding="async" data-attachment-id="5856" data-permalink="https://www.relataly.com/stock-price-prediction-multi-output-regression-using-neural-networks-in-python/5800/image-4-16/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2021/08/image-4.png" data-orig-size="628,333" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-4" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2021/08/image-4.png" src="https://www.relataly.com/wp-content/uploads/2021/08/image-4.png" alt="loss curve after training the multi-output neural network, keras, multi-step time series regression" class="wp-image-5856" width="720" height="381" srcset="https://www.relataly.com/wp-content/uploads/2021/08/image-4.png 628w, https://www.relataly.com/wp-content/uploads/2021/08/image-4.png 300w" sizes="(max-width: 720px) 100vw, 720px" /></figure>



<p class="wp-block-paragraph">As we can see, the loss drops quickly during training, which indicates that the model quickly learns to fit the training data.</p>



<h3 class="wp-block-heading" id="h-step-5-evaluate-model-performance">Step #5: Evaluate Model Performance</h3>



<p class="wp-block-paragraph">Now that we have trained the model, we can make forecasts on the test data and measure the model's performance with common regression metrics such as the mean absolute error (MAE), the mean absolute percentage error (MAPE), and the median absolute percentage error (MDAPE).</p>
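<p class="wp-block-paragraph">To make the three metrics concrete, the following minimal NumPy sketch computes them on a few hypothetical price values (the helper <code>regression_metrics</code> is illustrative, not part of the tutorial). MAE is expressed in the unit of the price, while MAPE and MDAPE are relative errors in percent. Because MDAPE takes the median instead of the mean, a few large outliers affect it less than MAPE. The sketch assumes that no actual value is zero, since the percentage errors divide by it.</p>

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    abs_err = np.abs(y_true - y_pred)
    ape = abs_err / np.abs(y_true)  # assumes no zero values in y_true
    return {
        'MAE': abs_err.mean(),
        'MAPE': ape.mean() * 100,
        'MDAPE': np.median(ape) * 100,
    }


# Hypothetical actual and predicted prices
metrics = regression_metrics([100, 200, 400, 50], [110, 190, 400, 40])
# Approx. MAE 7.5, MAPE 8.75 %, MDAPE 7.5 % (up to floating-point rounding)
print(metrics)
```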



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Get the predicted values
y_pred_scaled = model.predict(x_test)

# Unscale the predicted values
y_pred = scaler_pred.inverse_transform(y_pred_scaled)
y_test_unscaled = scaler_pred.inverse_transform(y_test).reshape(-1, output_sequence_length)

# Mean Absolute Error (MAE)
MAE = mean_absolute_error(y_test_unscaled, y_pred)
print(f'Mean Absolute Error (MAE): {np.round(MAE, 2)}')

# Mean Absolute Percentage Error (MAPE)
MAPE = np.mean((np.abs(np.subtract(y_test_unscaled, y_pred)/ y_test_unscaled))) * 100
print(f'Mean Absolute Percentage Error (MAPE): {np.round(MAPE, 2)} %')

# Median Absolute Percentage Error (MDAPE)
MDAPE = np.median((np.abs(np.subtract(y_test_unscaled, y_pred)/ y_test_unscaled))) * 100
print(f'Median Absolute Percentage Error (MDAPE): {np.round(MDAPE, 2)} %')


def prepare_df(i, x, y, y_pred_unscaled):
    # Undo the scaling on x. scaler_pred was fit on the Close column only, so only the
    # Close column of the result (index_Close) is unscaled correctly, and only that column is kept
    x_test_unscaled_df = pd.DataFrame(scaler_pred.inverse_transform(x[i])[:, index_Close]).rename(columns={0:'x_test'})

    y_test_unscaled_df = None
    # Undo the scaling on y (only if test values are available)
    if isinstance(y, np.ndarray):
        y_test_unscaled_df = pd.DataFrame(scaler_pred.inverse_transform(y)[i]).rename(columns={0:'y_test'})

    # Create a dataframe for the y_pred at position i, y_pred is already unscaled
    y_pred_df = pd.DataFrame(y_pred_unscaled[i]).rename(columns={0:'y_pred'})
    return x_test_unscaled_df, y_pred_df, y_test_unscaled_df


def plot_multi_test_forecast(x_test_unscaled_df, y_test_unscaled_df, y_pred_df, title):
    # Package y_pred and y_test (if available) into one dataframe with the columns y_pred and y_test
    if isinstance(y_test_unscaled_df, pd.DataFrame):
        df_merge = y_pred_df.join(y_test_unscaled_df, how='left')
    else:
        df_merge = y_pred_df.copy()

    # Append the forecast period to the input window
    df_merge_ = pd.concat([x_test_unscaled_df, df_merge]).reset_index(drop=True)

    # Plot the line charts
    fig, ax = plt.subplots(figsize=(20, 8))
    plt.title(title, fontsize=12)
    ax.set(ylabel = stockname + &quot;_stock_price_quotes&quot;)
    sns.lineplot(data = df_merge_, linewidth=2.0, ax=ax)

# Create a line chart for a specific test batch_number and the corresponding test predictions
batch_number = 50
x_test_unscaled_df, y_pred_df, y_test_unscaled_df = prepare_df(batch_number, x_test, y_test, y_pred)
title = f&quot;Predictions vs y_test - test batch number {batch_number}&quot;
plot_multi_test_forecast(x_test_unscaled_df, y_test_unscaled_df, y_pred_df, title)</pre></div>



<figure class="wp-block-image size-large is-style-default"><img decoding="async" width="1024" height="419" data-attachment-id="8667" data-permalink="https://www.relataly.com/stock-price-prediction-multi-output-regression-using-neural-networks-in-python/5800/multioutput-regression-deep-neural-networks-test-predictions/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/05/multioutput-regression-deep-neural-networks-test-predictions.png" data-orig-size="1170,479" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="multioutput-regression-deep-neural-networks-test-predictions" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/05/multioutput-regression-deep-neural-networks-test-predictions.png" src="https://www.relataly.com/wp-content/uploads/2022/05/multioutput-regression-deep-neural-networks-test-predictions-1024x419.png" alt="A line chart showing predictions and test data, multioutput regression deep neural networks test predictions" class="wp-image-8667" srcset="https://www.relataly.com/wp-content/uploads/2022/05/multioutput-regression-deep-neural-networks-test-predictions.png 1024w, https://www.relataly.com/wp-content/uploads/2022/05/multioutput-regression-deep-neural-networks-test-predictions.png 300w, https://www.relataly.com/wp-content/uploads/2022/05/multioutput-regression-deep-neural-networks-test-predictions.png 768w, https://www.relataly.com/wp-content/uploads/2022/05/multioutput-regression-deep-neural-networks-test-predictions.png 1170w" sizes="(max-width: 1024px) 100vw, 1024px" 
/></figure>



<p class="wp-block-paragraph">The quality of the predictions is acceptable, considering that the goal of this tutorial was not to achieve excellent predictions but to demonstrate the process and architecture of training a multi-output regression model. There is certainly room for improvement, so feel free to experiment with different features, hyperparameters, and neural network layers.</p>



<h3 class="wp-block-heading" id="h-step-6-create-a-new-forecast">Step #6: Create a New Forecast</h3>



<p class="wp-block-paragraph">Finally, let's create a forecast on new data. We take the scaled dataset from Step #3 (<code>np_scaled</code>) and extract the latest 50 time steps. We reshape this slice into a 3D array with the shape (1, 50, 5) to match the input shape the model expects. We then pass it to the <code>predict</code> method to generate a prediction for the next ten trading days and store the result in the <code>y_pred_scaled</code> variable. Next, we transform the predictions back to the original price scale with the <code>inverse_transform</code> method of <code>scaler_pred</code>, the scaler we fit on the Close column. Finally, we visualize the multi-step forecast in another line chart.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Get the latest input batch from the test dataset, which is contains the price values for the last ten trading days
x_test_latest_batch = np_scaled[-50:,:].reshape(1,50,5)

# Predict on the batch
y_pred_scaled = model.predict(x_test_latest_batch)
y_pred_unscaled = scaler_pred.inverse_transform(y_pred_scaled)

# Prepare the data and plot the input data and the predictions
x_test_unscaled_df, y_test_unscaled_df, _ = prepare_df(0, x_test_latest_batch, '', y_pred_unscaled)
plot_multi_test_forecast(x_test_unscaled_df, '', y_test_unscaled_df, &quot;x_new Vs. y_new_pred&quot;)</pre></div>
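<p class="wp-block-paragraph">The two shape operations in this step are easy to get wrong, so here is a minimal, self-contained sketch of them. It does not use the trained model: the random data, the assumption that column 3 holds the Close price, and the stand-in prediction array are placeholders for illustration only. It shows how the latest window becomes a batch of shape (1, 50, 5) and how a (1, 10) prediction can be mapped back to price units with a scaler that was fit on a single column.</p>

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in data: 300 trading days x 5 features
rng = np.random.default_rng(0)
raw = rng.uniform(100, 200, size=(300, 5))
close = raw[:, [3]]  # assumption: column 3 holds the Close price, kept 2D as (300, 1)

scaler = MinMaxScaler().fit(raw)         # scales all five input features
scaler_pred = MinMaxScaler().fit(close)  # scales only the prediction target
np_scaled = scaler.transform(raw)

sequence_length, n_features, horizon = 50, 5, 10

# Latest 50 time steps with all features, plus a leading batch axis -> (1, 50, 5)
x_test_latest_batch = np_scaled[-sequence_length:, :].reshape(1, sequence_length, n_features)

# Stand-in for model.predict(x_test_latest_batch): one row of 10 scaled values
y_pred_scaled = np.linspace(0.2, 0.8, horizon).reshape(1, horizon)

# Put the 10 values into a single column (the layout scaler_pred was fit on),
# undo the scaling, and restore the (1, horizon) layout
y_pred_unscaled = scaler_pred.inverse_transform(y_pred_scaled.reshape(-1, 1)).reshape(1, horizon)

print(x_test_latest_batch.shape, y_pred_unscaled.shape)  # (1, 50, 5) (1, 10)
```

<p class="wp-block-paragraph">Reshaping the predictions to a single column makes the one-feature layout of the target scaler explicit, so the sketch does not depend on how a scaler fit on one column treats a (1, 10) array.</p>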



<figure class="wp-block-image size-large is-style-default"><img decoding="async" width="1024" height="420" data-attachment-id="8666" data-permalink="https://www.relataly.com/stock-price-prediction-multi-output-regression-using-neural-networks-in-python/5800/multioutput-regression-deep-neural-networks/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/05/multioutput-regression-deep-neural-networks.png" data-orig-size="1167,479" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="multioutput-regression-deep-neural-networks" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/05/multioutput-regression-deep-neural-networks.png" src="https://www.relataly.com/wp-content/uploads/2022/05/multioutput-regression-deep-neural-networks-1024x420.png" alt="multioutput regression deep neural networks - x_test and y_pred" class="wp-image-8666" srcset="https://www.relataly.com/wp-content/uploads/2022/05/multioutput-regression-deep-neural-networks.png 1024w, https://www.relataly.com/wp-content/uploads/2022/05/multioutput-regression-deep-neural-networks.png 300w, https://www.relataly.com/wp-content/uploads/2022/05/multioutput-regression-deep-neural-networks.png 768w, https://www.relataly.com/wp-content/uploads/2022/05/multioutput-regression-deep-neural-networks.png 1167w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>






<h2 class="wp-block-heading" id="h-summary">Summary</h2>



<p class="wp-block-paragraph">In this tutorial, we demonstrated how to use multi-output neural networks to make predictions for several future time steps at once. We first discussed the architecture of a recurrent neural network and how it processes sequential data. We then showed how to preprocess the data and split it into training and test sets for training a multi-output regression model.</p>



<p class="wp-block-paragraph">Next, we trained a model to predict the stock price of Apple ten steps into the future using historical data. We also discussed how to use the trained model to make multi-step predictions on new data and how to visualize the results.</p>



<p class="wp-block-paragraph">To further improve the performance of the model, you can experiment with different hyperparameters and adjust the model architecture. For example, adding neurons to the output layer extends the prediction horizon, but keep in mind that prediction errors tend to grow as the horizon lengthens. You can also try different activation functions or add more layers to the model and observe how this affects performance.</p>



<p class="wp-block-paragraph">I hope this article was helpful in understanding multi-output neural networks better. If you have any questions or comments, please let me know.</p>



<h2 class="wp-block-heading" id="h-sources-and-further-reading">Sources and Further Reading</h2>



<ol class="wp-block-list">
<li><a href="https://amzn.to/3MyU6Tj" target="_blank" rel="noreferrer noopener">Charu C. Aggarwal (2018) Neural Networks and Deep Learning</a></li>



<li><a href="https://amzn.to/3yIQdWi" target="_blank" rel="noreferrer noopener">Stefan Jansen (2020) Machine Learning for Algorithmic Trading: Predictive models to extract signals from market and alternative data for systematic trading strategies with Python</a></li>



<li><a href="https://amzn.to/3S9Nfkl" target="_blank" rel="noreferrer noopener">Aurélien Géron (2019) Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems </a></li>



<li><a href="https://amzn.to/3EKidwE" target="_blank" rel="noreferrer noopener">David Forsyth (2019) Applied Machine Learning, Springer</a></li>



<li><a href="https://amzn.to/3MAy8j5" target="_blank" rel="noreferrer noopener">Andriy Burkov (2020) Machine Learning Engineering</a></li>
</ol>



<p class="has-contrast-2-color has-base-3-background-color has-text-color has-background wp-block-paragraph"><em>The links above to Amazon are affiliate links. By buying through these links, you support the Relataly.com blog and help to cover the hosting costs. Using the links does not affect the price.</em></p>



<p class="wp-block-paragraph">If you want to learn about alternative approaches to univariate stock market forecasting, consider taking a look at <a href="https://www.relataly.com/time-series-forecasting-using-facebook-prophet-in-python/10351/" target="_blank" rel="noreferrer noopener">Facebook Prophet</a> or <a href="https://www.relataly.com/forecasting-beer-sales-with-arima-in-python/2884/" target="_blank" rel="noreferrer noopener">ARIMA models</a>.</p>
<p>The post <a href="https://www.relataly.com/stock-price-prediction-multi-output-regression-using-neural-networks-in-python/5800/">Stock Market Forecasting Neural Networks for Multi-Output Regression in Python</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.relataly.com/stock-price-prediction-multi-output-regression-using-neural-networks-in-python/5800/feed/</wfw:commentRss>
			<slash:comments>31</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">5800</post-id>	</item>
		<item>
		<title>Mastering Multivariate Stock Market Prediction with Python: A Guide to Effective Feature Engineering Techniques</title>
		<link>https://www.relataly.com/feature-engineering-for-multivariate-time-series-models-with-python/1813/</link>
					<comments>https://www.relataly.com/feature-engineering-for-multivariate-time-series-models-with-python/1813/#comments</comments>
		
		<dc:creator><![CDATA[Florian Follonier]]></dc:creator>
		<pubDate>Mon, 29 Jun 2020 21:47:28 +0000</pubDate>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Feature Engineering]]></category>
		<category><![CDATA[Finance]]></category>
		<category><![CDATA[Keras]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Neural Networks]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Recurrent Neural Networks]]></category>
		<category><![CDATA[Stock Market Forecasting]]></category>
		<category><![CDATA[Tensorflow]]></category>
		<category><![CDATA[Time Series Forecasting]]></category>
		<category><![CDATA[Use Cases]]></category>
		<category><![CDATA[Yahoo Finance API]]></category>
		<category><![CDATA[Deep Learning]]></category>
		<category><![CDATA[Feature Engineering for Time Series Forecasting]]></category>
		<category><![CDATA[Intermediate Tutorials]]></category>
		<category><![CDATA[Supervised Learning]]></category>
		<guid isPermaLink="false">https://www.relataly.com/?p=1813</guid>

					<description><![CDATA[<p>Are you interested in learning how multivariate forecasting models can enhance the accuracy of stock market predictions? Look no further! While traditional time series data provides valuable insights into historical trends, multivariate forecasting models utilize additional features to identify patterns and predict future price movements. This process, known as &#8220;feature engineering,&#8221; is a crucial step ... <a title="Mastering Multivariate Stock Market Prediction with Python: A Guide to Effective Feature Engineering Techniques" class="read-more" href="https://www.relataly.com/feature-engineering-for-multivariate-time-series-models-with-python/1813/" aria-label="Read more about Mastering Multivariate Stock Market Prediction with Python: A Guide to Effective Feature Engineering Techniques">Read more</a></p>
<p>The post <a href="https://www.relataly.com/feature-engineering-for-multivariate-time-series-models-with-python/1813/">Mastering Multivariate Stock Market Prediction with Python: A Guide to Effective Feature Engineering Techniques</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:60%">
<p class="wp-block-paragraph">Are you interested in learning how multivariate forecasting models can enhance the accuracy of stock market predictions? Look no further! While traditional time series data provides valuable insights into historical trends, multivariate forecasting models utilize additional features to identify patterns and predict future price movements. This process, known as &#8220;feature engineering,&#8221; is a crucial step in creating accurate stock market forecasts.</p>



<p class="wp-block-paragraph">In this article, we dive into the world of feature engineering and demonstrate how it can improve stock market predictions. We explore popular financial analysis metrics, including Bollinger bands, RSI, and Moving Averages, and show how they can be used to create powerful forecasting models.</p>



<p class="wp-block-paragraph">But we don&#8217;t just stop at theory. We provide a hands-on tutorial using Python to prepare and analyze time-series data for stock market forecasting. We leverage the power of recurrent neural networks with LSTM layers, based on the Keras library, to train and test different model variations with various feature combinations.</p>



<p class="wp-block-paragraph">By the end of this article, you&#8217;ll have a thorough understanding of feature engineering and how it can improve the accuracy of stock market predictions. So, buckle up and get ready to discover how multivariate forecasting models can take your stock market analysis to the next level!</p>



<p class="wp-block-paragraph"><strong>New to time series modeling?</strong><br>Consider starting with the following tutorial on univariate time series models: <a href="https://www.relataly.com/stock-market-prediction-with-multivariate-time-series-in-python/1815/" target="_blank" rel="noreferrer noopener">Stock-market forecasting using Keras Recurrent Neural Networks and Python</a>. </p>



<p class="has-accent-color has-blush-light-purple-gradient-background has-text-color has-background wp-block-paragraph"><strong>Disclaimer</strong>: This article does not constitute financial advice. Stock markets can be very volatile and are generally difficult to predict. Predictive models and other forms of analytics applied in this article only illustrate machine learning use cases.</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<figure class="wp-block-image size-full"><img decoding="async" width="503" height="503" data-attachment-id="13113" data-permalink="https://www.relataly.com/feature-engineering-for-multivariate-time-series-models-with-python/1813/multivariate-engineering-for-time-series-analysis-in-python-min/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/03/multivariate-engineering-for-time-series-analysis-in-python-min.png" data-orig-size="503,503" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="multivariate-engineering-for-time-series-analysis-in-python-min" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/03/multivariate-engineering-for-time-series-analysis-in-python-min.png" src="https://www.relataly.com/wp-content/uploads/2023/03/multivariate-engineering-for-time-series-analysis-in-python-min.png" alt="A cartoon-style illustration of a cute animal, possibly a raccoon, sitting at a desk and working on a laptop. The animal is wearing glasses and appears to be focused on a screen displaying graphs and charts related to time series analysis. In the background, there are books, a clock, and other office supplies. This image represents the concept of feature engineering, a process of selecting and transforming data features to improve machine learning models, particularly for time series data. 
Midjourney" class="wp-image-13113" srcset="https://www.relataly.com/wp-content/uploads/2023/03/multivariate-engineering-for-time-series-analysis-in-python-min.png 503w, https://www.relataly.com/wp-content/uploads/2023/03/multivariate-engineering-for-time-series-analysis-in-python-min.png 300w, https://www.relataly.com/wp-content/uploads/2023/03/multivariate-engineering-for-time-series-analysis-in-python-min.png 140w" sizes="(max-width: 503px) 100vw, 503px" /><figcaption class="wp-element-caption">Squirrels mastered the art of multivariate feature engineering for time series analysis a long time ago. You can do it too! Image generated with <a href="http://www.midjourney.com" target="_blank" rel="noreferrer noopener">Midjourney</a></figcaption></figure>
</div>
</div>



<h2 class="wp-block-heading" id="h-feature-engineering-for-stock-market-forecasting-borrowing-features-from-chart-analysis">Feature Engineering for Stock Market Forecasting &#8211; Borrowing Features from Chart Analysis</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p class="wp-block-paragraph">The idea behind multivariate time series models is to feed the model with additional features that improve prediction quality. An example of such an additional feature is a &#8220;moving average.&#8221; Adding more features does not automatically improve predictive performance, but it does increase the time needed to train the models. The challenge is to find the right combination of features and to create an input representation that allows the model to recognize meaningful patterns. There is no way around conducting experiments and trying out feature combinations, and this process of trial and error can be time-consuming. It therefore helps to build upon established indicators.</p>
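<p class="wp-block-paragraph">To get a feeling for how quickly these experiments add up, the following sketch enumerates every combination of a few candidate indicators on top of the closing price. The feature names are hypothetical placeholders, and training and evaluating a model for each subset is left out.</p>

```python
from itertools import combinations

# Hypothetical feature names: 'Close' is always used, the indicators are optional
base_features = ["Close"]
candidate_features = ["RSI", "SMA-50", "EMA", "Bollinger"]

# One experiment per subset of the candidates (including the empty subset)
experiments = [
    base_features + list(combo)
    for size in range(len(candidate_features) + 1)
    for combo in combinations(candidate_features, size)
]

print(len(experiments))  # 16 feature sets (2**4), each needing its own training run
```

<p class="wp-block-paragraph">Each additional candidate doubles the number of runs, which is why starting from established indicators rather than arbitrary features saves time.</p>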



<p class="wp-block-paragraph">In stock market forecasting, we can use indicators from chart analysis. This domain forecasts future prices by studying historical prices and trading volume. The underlying idea is that specific patterns or chart formations in the data can signal the timing of beneficial buying or selling decisions. We can borrow indicators from this discipline and use them as input features. </p>



<p class="wp-block-paragraph">The difference between developing predictive machine learning models and classic chart analysis is that we do not analyze the chart manually. Instead, we train a machine learning model, for example, a recurrent neural network, to do the job for us.</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image is-resized"><img decoding="async" src="https://www.relataly.com/wp-content/uploads/2020/06/image-28.png" alt="Feature engineering for multivariate stock market prediction - A multivariate time series forecast. Keras, Scikit-Learn, Python, Tutorial" width="383" height="200"/><figcaption class="wp-element-caption">A multivariate time-series forecast, as we will create it in this article. Exemplary chart with technical indicators (Bollinger bands, RSI, and Double-EMA)</figcaption></figure>
</div>
</div>



<h2 class="wp-block-heading" id="h-stock-market-forecasting-does-this-really-work">Stock Market Forecasting &#8211; Does This Really Work?</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p class="wp-block-paragraph">It is essential to point out that the effectiveness of chart analysis and algorithmic trading is controversial. There is at least as much controversy about whether it is possible to predict the price of stock markets with neural networks. Various studies and researchers have examined the effectiveness of chart analysis with different results. One of the most significant points of criticism is that it cannot take external events into account. Nevertheless, many financial analysts consider financial indicators when making investment decisions, so a lot of money is moved simply because many people believe in statistical indicators. </p>



<p class="wp-block-paragraph">So without knowing how well this will work, it is worth an attempt to feed a neural network with different financial indicators. But first and foremost, I see this as an excellent way to show how feature engineering works. Just make sure not to rely on the predictions of these models blindly. </p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<p class="wp-block-paragraph">Also: <a href="https://www.relataly.com/univariate-stock-market-forecasting-using-a-recurrent-neural-network/122/" target="_blank" rel="noreferrer noopener">Stock Market Prediction using Univariate Recurrent Neural Networks</a></p>



<h2 class="wp-block-heading" id="h-selected-statistical-indicators">Selected Statistical Indicators</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p class="wp-block-paragraph">The following indicators are commonly used in chart analysis and may be helpful when creating forecasting models:</p>



<ul class="wp-block-list">
<li>Relative Strength Index</li>



<li>Simple Moving Averages</li>



<li>Exponential Moving Averages</li>



<li>Bollinger Bands</li>
</ul>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<h3 class="wp-block-heading" id="h-relative-strength-index-rsi">Relative Strength Index (RSI)</h3>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p class="wp-block-paragraph">The Relative Strength Index (RSI) is one of the most commonly used oscillators. J. Welles Wilder Jr. introduced it in 1978 to measure the momentum of price movements by comparing the magnitude of recent price gains with that of recent price losses. The RSI takes values between 0 and 100.</p>



<p class="wp-block-paragraph">Reference lines help to estimate how long an existing trend will last before a reversal becomes likely. In other words, when the price is heavily oversold or overbought, one should expect a trend reversal.</p>



<ul class="wp-block-list">
<li>In an uptrend, the reference lines are at 40 (oversold) and 80 (overbought).</li>



<li>In a downtrend, the reference lines are at 20 (oversold) and 60 (overbought).</li>
</ul>
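<p class="wp-block-paragraph">The trend-dependent reference lines can be expressed as a small lookup. This is a minimal sketch: the function name <code>rsi_zone</code> is hypothetical, the thresholds are the ones listed above, treating the boundary values as inclusive is an assumption, and deciding whether the market is in an uptrend is left to the caller.</p>

```python
def rsi_zone(rsi_value: float, uptrend: bool) -> str:
    """Classify an RSI reading with trend-dependent reference lines.

    Assumed thresholds: 40/80 in an uptrend, 20/60 in a downtrend (inclusive).
    """
    oversold, overbought = (40, 80) if uptrend else (20, 60)
    if rsi_value <= oversold:
        return "oversold"
    if rsi_value >= overbought:
        return "overbought"
    return "neutral"


print(rsi_zone(35, uptrend=True))   # oversold
print(rsi_zone(65, uptrend=False))  # overbought
```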
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<p class="wp-block-paragraph">The formula for the RSI is as follows:</p>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:25.5%">
<ul class="wp-block-list">
<li>Calculate the sum of all positive and negative price changes in a period (e.g., 30 days):</li>



<li>We then calculate the mean value of the sums with the following formula:</li>



<li>Finally, we calculate the RSI with the following formula:</li>
</ul>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:55.9%">
<figure class="wp-block-image is-resized"><img decoding="async" src="https://wikimedia.org/api/rest_v1/media/math/render/svg/4fc95bb85e82212cece770cb561766b2f4b2b579" alt="feature engineering for stock price prediction:  formula for the rsi, Keras, Scikit-Learn, Python, Tutorial" width="321" height="84"/></figure>



<figure class="wp-block-image is-resized"><img decoding="async" src="https://wikimedia.org/api/rest_v1/media/math/render/svg/91d80c46471846096df7dec9be671572c7b7e064" alt="feature engineering for stock price prediction: formula for the rsi, Keras, Scikit-Learn, Python, Tutorial" width="144" height="68"/></figure>



<figure class="wp-block-image is-resized"><img decoding="async" src="https://wikimedia.org/api/rest_v1/media/math/render/svg/bd24e0da167456b367d13ec0327eca724feecc58" alt="feature engineering for stock price prediction:  formula for the rsi,Keras, Scikit-Learn, Python, Tutorial" width="148" height="36"/></figure>



</div>
</div>
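<p class="wp-block-paragraph">The three steps above can be sketched in pandas as follows. This minimal sketch follows the text and averages gains and losses with a simple rolling mean; Wilder&#8217;s original RSI smooths them recursively instead, which could be approximated with <code>ewm(alpha=1 / period, adjust=False)</code>. The function name <code>rsi</code> is hypothetical, and the 14-day default is an assumption (the period Wilder proposed); the text uses 30 days as an example.</p>

```python
import numpy as np
import pandas as pd


def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """Relative Strength Index on a scale from 0 to 100 (simple-average variant)."""
    delta = close.diff()
    gains = delta.clip(lower=0)    # positive price changes, 0 otherwise
    losses = -delta.clip(upper=0)  # size of negative price changes, 0 otherwise
    avg_gain = gains.rolling(period).mean()
    avg_loss = losses.rolling(period).mean()
    rs = avg_gain / avg_loss       # relative strength; inf if the window has no losses
    return 100 - 100 / (1 + rs)


rising = pd.Series(np.arange(1, 31, dtype=float))
print(rsi(rising).iloc[-1])  # 100.0: the window contains only gains
```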



<h3 class="wp-block-heading" id="h-simple-moving-averages-sma">Simple Moving Averages (SMA)</h3>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p class="wp-block-paragraph">The simple moving average (SMA) is another technical indicator that financial analysts use to judge whether a price trend will continue or reverse. The SMA is the arithmetic mean of the prices within a fixed period. Analysts pay close attention to the 200-day SMA (SMA-200): when the price crosses it, this may signal a trend reversal. Shorter periods of 50 days (SMA-50) and 100 days (SMA-100) are also common. Two popular trading patterns built on these averages are the death cross and the golden cross.</p>



<ul class="wp-block-list">
<li>A death cross occurs when the SMA-50 or SMA-100 crosses below the SMA-200. This suggests that a falling trend will likely accelerate downwards.</li>



<li>A golden cross occurs when the SMA-50 or SMA-100 crosses above the SMA-200, suggesting that a rising trend will likely accelerate upwards.</li>
</ul>



<p class="wp-block-paragraph">We can feed the SMA into our model simply by measuring the distance between two trendlines, for instance, between the SMA-50 and the SMA-200.</p>
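<p class="wp-block-paragraph">A minimal pandas sketch of this idea follows. The function name <code>sma_features</code> and its column names are hypothetical. It computes a short and a long SMA, their distance as a model feature, and flags the points where the distance changes sign: a golden cross when it turns positive and a death cross when it turns negative.</p>

```python
import numpy as np
import pandas as pd


def sma_features(close: pd.Series, short: int = 50, long: int = 200) -> pd.DataFrame:
    """Short/long SMAs, their distance, and golden/death cross flags."""
    sma_short = close.rolling(short).mean()
    sma_long = close.rolling(long).mean()
    distance = sma_short - sma_long  # > 0: short SMA above long SMA
    prev = distance.shift(1)         # comparisons with NaN evaluate to False
    return pd.DataFrame({
        f"SMA-{short}": sma_short,
        f"SMA-{long}": sma_long,
        "distance": distance,
        "golden_cross": (distance > 0) & (prev <= 0),
        "death_cross": (distance < 0) & (prev >= 0),
    })


# Toy series that falls and then recovers, with small windows to keep it readable
toy = pd.Series([10, 9, 8, 7, 6, 7, 8, 9, 10], dtype=float)
features = sma_features(toy, short=2, long=4)
print(features[features["golden_cross"]].index.tolist())  # [6]
```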
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<h3 class="wp-block-heading" id="h-exponential-moving-averages-ema">Exponential Moving Averages (EMA)</h3>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p class="wp-block-paragraph">The exponential moving average (EMA) is another lagging trend indicator. Like the SMA, the EMA measures the strength of a price trend. The difference is that the SMA assigns equal weights to all price points, while the EMA uses a multiplier that weights recent prices more heavily.</p>



<p class="wp-block-paragraph">Calculating the EMA for a given data point requires past price values. To start the calculation, we use the SMA over the chosen period as the first EMA value. For example, for a 30-day EMA, we average the closing prices of the past 30 days. We then need a weighting multiplier, which is calculated as follows: Multiplier = Smoothing factor / (1 + days)</p>



<p class="wp-block-paragraph">The most common choice for the smoothing factor is 2. For a 30-day EMA, the multiplier is therefore 2 / (30 + 1) &#8776; 0.0645.</p>



<p class="wp-block-paragraph">Once we have the EMA for the first data point, we can calculate the EMA for all subsequent data points with the following formula: EMA = Closing price x multiplier + EMA (previous day) x (1 &#8211; multiplier)</p>
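<p class="wp-block-paragraph">The procedure above can be sketched with NumPy and pandas as follows. The function name <code>ema</code> and its defaults are assumptions; it seeds the recursion with the SMA of the first <code>period</code> prices and then applies the multiplier smoothing / (1 + period). For comparison, pandas&#8217; built-in <code>close.ewm(span=period, adjust=False).mean()</code> uses the same multiplier 2 / (period + 1) but starts the recursion from the first price instead of an SMA, so its early values differ.</p>

```python
import numpy as np
import pandas as pd


def ema(close: pd.Series, period: int = 30, smoothing: float = 2.0) -> pd.Series:
    """EMA seeded with the SMA of the first `period` prices."""
    k = smoothing / (1 + period)  # weighting multiplier, e.g. 2 / 31 for 30 days
    values = close.to_numpy(dtype=float)
    out = np.full(len(values), np.nan)
    if len(values) < period:
        return pd.Series(out, index=close.index)  # not enough data for a seed
    out[period - 1] = values[:period].mean()      # seed: SMA of the first window
    for t in range(period, len(values)):
        # EMA = closing price x multiplier + previous EMA x (1 - multiplier)
        out[t] = values[t] * k + out[t - 1] * (1 - k)
    return pd.Series(out, index=close.index)


print(ema(pd.Series([1, 2, 3, 4, 5], dtype=float), period=3).tolist())  # [nan, nan, 2.0, 3.0, 4.0]
```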
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<h3 class="wp-block-heading">Bollinger Bands</h3>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p class="wp-block-paragraph">Bollinger Bands are a popular technical analysis tool used to identify market volatility and potential price movements in financial markets. They are named after their creator, John Bollinger.</p>



<p class="wp-block-paragraph">Bollinger Bands consist of three lines that are plotted on a price chart. The middle line is a simple moving average (SMA) of the asset price over a specified period (typically 20 days). The upper and lower lines are calculated by adding and subtracting a multiple (usually two) of the standard deviation of the asset price from the middle line.</p>



<p class="wp-block-paragraph">The upper band is calculated as middle band + (2 x standard deviation), and the lower band as middle band &#8211; (2 x standard deviation).</p>



<p class="wp-block-paragraph">The standard deviation is a measure of how much the asset price deviates from the average. When the asset price is more volatile, the bands widen, and when the price is less volatile, the bands narrow.</p>



<p class="wp-block-paragraph">Traders use Bollinger Bands to identify potential buy or sell signals. When the price touches or crosses the upper band, it may be a sell signal, indicating that the asset is overbought. Conversely, when the price touches or crosses the lower band, it may be a buy signal, indicating that the asset is oversold.</p>
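<p class="wp-block-paragraph">The band formulas translate directly into rolling-window operations in pandas. In this minimal sketch, the function name <code>bollinger_bands</code> is hypothetical and the defaults (20 days, two standard deviations) follow the text. It uses the population standard deviation (<code>ddof=0</code>), a common convention for Bollinger Bands; pandas&#8217; rolling <code>std</code> defaults to the sample standard deviation (<code>ddof=1</code>), which gives slightly wider bands.</p>

```python
import numpy as np
import pandas as pd


def bollinger_bands(close: pd.Series, window: int = 20, num_std: float = 2.0) -> pd.DataFrame:
    """Middle, upper, and lower Bollinger Band for a closing-price series."""
    middle = close.rolling(window).mean()       # SMA over the window
    std = close.rolling(window).std(ddof=0)     # population standard deviation
    return pd.DataFrame({
        "middle": middle,
        "upper": middle + num_std * std,
        "lower": middle - num_std * std,
    })


bands = bollinger_bands(pd.Series([1.0, 2.0, 3.0, 4.0, 5.0]), window=5)
print(bands.iloc[-1])  # middle 3.0, upper about 5.83, lower about 0.17
```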
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<h2 class="wp-block-heading" id="h-feature-engineering-for-time-series-prediction-models-in-python">Feature Engineering for Time Series Prediction Models in Python</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p class="wp-block-paragraph">In the following, this tutorial will guide you through the process of implementing a multivariate time series prediction model for the NASDAQ stock market index. Our aim is to equip you with the knowledge and practical skills required to create a powerful predictive model that can effectively forecast stock prices.</p>



<p class="wp-block-paragraph">Throughout this tutorial, we will take you through a step-by-step approach to building a multivariate time series prediction model. You will learn how to implement and utilize different features to train and measure the performance of your model. Our goal is to ensure that you are not only able to understand the underlying concepts of multivariate time series prediction, but that you are also capable of applying these concepts in a practical setting.</p>



<p class="wp-block-paragraph">The code is available on the GitHub repository.</p>



<div class="wp-block-kadence-advancedbtn kb-buttons-wrap kb-btns_f47875-58"><a class="kb-button kt-button button kb-btn_a01882-be kt-btn-size-standard kt-btn-width-type-full kb-btn-global-inherit kt-btn-has-text-true kt-btn-has-svg-true wp-block-button__link wp-block-kadence-singlebtn" href="https://github.com/flo7up/relataly-public-python-tutorials/blob/master/01%20Time%20Series%20Forecasting%20%26%20Regression/008%20Feature%20Engineering%20for%20Multivariate%20Models.ipynb" target="_blank" rel="noreferrer noopener"><span class="kb-svg-icon-wrap kb-svg-icon-fe_eye kt-btn-icon-side-left"><svg viewBox="0 0 24 24"  fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"  aria-hidden="true"><path d="M1 12s4-8 11-8 11 8 11 8-4 8-11 8-11-8-11-8z"/><circle cx="12" cy="12" r="3"/></svg></span><span class="kt-btn-inner-text">View on GitHub </span></a>

<a class="kb-button kt-button button kb-btn_35290c-df kt-btn-size-standard kt-btn-width-type-full kb-btn-global-inherit kt-btn-has-text-true kt-btn-has-svg-true wp-block-button__link wp-block-kadence-singlebtn" href="https://github.com/flo7up/relataly-public-python-API-tutorials" target="_blank" rel="noreferrer noopener"><span class="kb-svg-icon-wrap kb-svg-icon-fa_github kt-btn-icon-side-left"><svg viewBox="0 0 496 512"  fill="currentColor" xmlns="http://www.w3.org/2000/svg"  aria-hidden="true"><path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"/></svg></span><span class="kt-btn-inner-text">Relataly GitHub Repo </span></a></div>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-full"><img decoding="async" width="496" height="500" data-attachment-id="12671" data-permalink="https://www.relataly.com/robot-artificial-intelligence-colorful-midjourney-relataly-min/" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/03/robot-artificial-intelligence-colorful-midjourney-relataly-min.png" data-orig-size="496,500" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="robot artificial intelligence colorful midjourney relataly-min" data-image-description="&lt;p&gt;Let&amp;#8217;s do some feature engineering for machine learning!&lt;/p&gt;
" data-image-caption="&lt;p&gt;Let&amp;#8217;s do some feature engineering for machine learning!&lt;/p&gt;
" data-large-file="https://www.relataly.com/wp-content/uploads/2023/03/robot-artificial-intelligence-colorful-midjourney-relataly-min.png" src="https://www.relataly.com/wp-content/uploads/2023/03/robot-artificial-intelligence-colorful-midjourney-relataly-min.png" alt="Let's do some feature engineering for machine learning!" class="wp-image-12671" srcset="https://www.relataly.com/wp-content/uploads/2023/03/robot-artificial-intelligence-colorful-midjourney-relataly-min.png 496w, https://www.relataly.com/wp-content/uploads/2023/03/robot-artificial-intelligence-colorful-midjourney-relataly-min.png 298w, https://www.relataly.com/wp-content/uploads/2023/03/robot-artificial-intelligence-colorful-midjourney-relataly-min.png 140w" sizes="(max-width: 496px) 100vw, 496px" /><figcaption class="wp-element-caption">Let&#8217;s do some feature engineering for machine learning!</figcaption></figure>
</div>
</div>



<h3 class="wp-block-heading" id="h-prerequisites">Prerequisites</h3>



<p class="wp-block-paragraph">Before starting the coding part, make sure that you have set up your <a href="https://www.python.org/downloads/" target="_blank" rel="noreferrer noopener">Python 3</a> environment and required packages. If you don&#8217;t have an environment, follow&nbsp;<a href="https://www.relataly.com/anaconda-python-environment-machine-learning/1663/" target="_blank" rel="noreferrer noopener">this tutorial</a>&nbsp;to set up the&nbsp;<a href="https://www.anaconda.com/products/individual" target="_blank" rel="noreferrer noopener">Anaconda environment</a>.</p>



<p class="wp-block-paragraph">Also, make sure you install all required packages. In this tutorial, we will be working with the following standard packages:&nbsp;</p>



<ul class="wp-block-list">
<li><em><a href="https://pandas.pydata.org/" target="_blank" rel="noreferrer noopener">pandas</a></em></li>



<li><em><a href="https://numpy.org/" target="_blank" rel="noreferrer noopener">NumPy</a></em></li>



<li><a href="https://docs.python.org/3/library/math.html" target="_blank" rel="noreferrer noopener"><em>math</em></a></li>



<li><em><a href="https://matplotlib.org/" target="_blank" rel="noreferrer noopener">matplotlib</a></em></li>



<li><em><a href="https://seaborn.pydata.org/" target="_blank" rel="noreferrer noopener">Seaborn</a></em></li>
</ul>



<p class="wp-block-paragraph">In addition, we will use <em><a href="https://keras.io/" target="_blank" rel="noreferrer noopener">Keras</a></em>&nbsp;(2.0 or higher) with the TensorFlow backend to train the neural network, the machine learning library scikit-learn, and yfinance to load the price data (the <a href="https://pandas-datareader.readthedocs.io/en/latest/" target="_blank" rel="noreferrer noopener">pandas-DataReader</a> is an alternative). You can install these packages using the following console commands:</p>



<ul class="wp-block-list">
<li><em>pip install &lt;package name&gt;</em></li>



<li><em>conda install &lt;package name&gt;</em>&nbsp;(if you are using the Anaconda package manager)</li>
</ul>
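<p class="wp-block-paragraph">After installing the packages, you can check which versions are available in your environment. The following sketch uses Python's standard <code>importlib.metadata</code> module. The list of distribution names (for example <code>scikit-learn</code> and <code>tensorflow</code>) is an assumption about how you installed the packages and may need adjusting.</p>

```python
# Print the installed versions of the packages used in this tutorial.
# The distribution names below are assumptions; adjust them to your installation.
from importlib.metadata import version, PackageNotFoundError

PACKAGES = ["pandas", "numpy", "matplotlib", "seaborn", "scikit-learn", "tensorflow", "yfinance"]


def installed_versions(packages):
    """Return a dict mapping each distribution name to its version, or None if it is missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name:14s} {ver or 'NOT INSTALLED'}")
```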



<h3 class="wp-block-heading" id="h-step-1-load-the-data">Step #1 Load the Data</h3>



<p class="wp-block-paragraph">Let&#8217;s start by setting up the imports and loading the data. Our Python project will use price data from the&nbsp;<a href="https://en.wikipedia.org/wiki/Nasdaq" target="_blank" rel="noreferrer noopener">NASDAQ</a>&nbsp;composite index&nbsp;<strong>(symbol: ^IXIC)</strong>&nbsp;from yahoo.finance.com.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Time Series Forecasting - Feature Engineering For Multivariate Models (Stock Market Prediction Example)
# A tutorial for this file is available at www.relataly.com

import math # Mathematical functions  
import numpy as np # Fundamental package for scientific computing with Python 
import pandas as pd # Additional functions for analysing and manipulating data 
from datetime import date # Date Functions 
import matplotlib.pyplot as plt # Important package for visualization - we use this to plot the market data 
import matplotlib.dates as mdates # Formatting dates 
from sklearn.metrics import mean_absolute_error, mean_squared_error # Packages for measuring model performance / errors 
import tensorflow as tf
from tensorflow.keras.models import Sequential # Deep learning library, used for neural networks 
from tensorflow.keras.layers import LSTM, Dense, Dropout # Deep learning classes for recurrent and regular densely-connected layers 
from tensorflow.keras.callbacks import EarlyStopping # EarlyStopping during model training 
from sklearn.preprocessing import RobustScaler # This Scaler removes the median and scales the data according to the quantile range to normalize the price data  
#from tensorflow.keras.optimizers import Adam # For detailed configuration of the optimizer 
import seaborn as sns # Visualization
sns.set_style('white', { 'axes.spines.right': False, 'axes.spines.top': False})


# check the tensorflow version and the number of available GPUs
print('Tensorflow Version: ' + tf.__version__)
physical_devices = tf.config.list_physical_devices('GPU')
print(&quot;Num GPUs:&quot;, len(physical_devices))

# Setting the timeframe for the data extraction
end_date =  date.today().strftime(&quot;%Y-%m-%d&quot;)
start_date = '2010-01-01'

# Getting NASDAQ quotes
stockname = 'NASDAQ'
symbol = '^IXIC'

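# Load the data from Yahoo Finance with yfinance (pip install yfinance)
# Alternatively, pandas_datareader can be used, although its Yahoo source may no longer work:
# import pandas_datareader as webreader
# df = webreader.DataReader(symbol, start=start_date, end=end_date, data_source=&quot;yahoo&quot;)

import yfinance as yf
# auto_adjust=False keeps the 'Adj Close' column, which the feature engineering step uses
df = yf.download(symbol, start=start_date, end=end_date, auto_adjust=False)

# Recent yfinance versions return MultiIndex columns (price field, ticker); keep only the price field
if isinstance(df.columns, pd.MultiIndex):
    df.columns = df.columns.get_level_values(0)

# Quick overview of dataset
df.head()</pre></div>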



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">Tensorflow Version: 2.5.0
Num GPUs: 1
[*********************100%***********************]  1 of 1 completed
			Open		High		Low			Close		Adj Close	Volume
Date						
2009-12-31	2292.919922	2293.590088	2269.110107	2269.149902	2269.149902	1237820000
2010-01-04	2294.409912	2311.149902	2294.409912	2308.419922	2308.419922	1931380000
2010-01-05	2307.270020	2313.729980	2295.620117	2308.709961	2308.709961	2367860000
2010-01-06	2307.709961	2314.070068	2295.679932	2301.090088	2301.090088	2253340000
2010-01-07	2298.090088	2301.300049	2285.219971	2300.050049	2300.050049	2270050000</pre></div>



<h3 class="wp-block-heading" id="h-step-2-explore-the-data">Step #2 Explore the Data</h3>



<p class="wp-block-paragraph">Let&#8217;s take a quick look at the data by creating line charts for the columns of our data set.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Plot line charts
df_plot = df.copy()

ncols = 2
nrows = math.ceil(df_plot.shape[1] / ncols)  # enough rows for all columns

fig, ax = plt.subplots(nrows=nrows, ncols=ncols, sharex=True, figsize=(14, 7))
for i, ax in enumerate(fig.axes):
    # Hide a spare subplot if the number of columns is odd
    if i &gt;= df_plot.shape[1]:
        ax.set_visible(False)
        continue
    sns.lineplot(data=df_plot.iloc[:, i], ax=ax)
    ax.tick_params(axis=&quot;x&quot;, rotation=30, labelsize=10, length=0)
    ax.xaxis.set_major_locator(mdates.AutoDateLocator())
fig.tight_layout()
plt.show()</pre></div>



<figure class="wp-block-image size-full"><img decoding="async" width="1000" height="496" data-attachment-id="8645" data-permalink="https://www.relataly.com/feature-engineering-for-multivariate-time-series-models-with-python/1813/line-plots-feature-engineering-1/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/05/line-plots-feature-engineering-1.png" data-orig-size="1000,496" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="line-plots-feature-engineering-1" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/05/line-plots-feature-engineering-1.png" src="https://www.relataly.com/wp-content/uploads/2022/05/line-plots-feature-engineering-1.png" alt="feature engineering stock market prediction, python tutorial, keras, scikit-learn" class="wp-image-8645" srcset="https://www.relataly.com/wp-content/uploads/2022/05/line-plots-feature-engineering-1.png 1000w, https://www.relataly.com/wp-content/uploads/2022/05/line-plots-feature-engineering-1.png 300w, https://www.relataly.com/wp-content/uploads/2022/05/line-plots-feature-engineering-1.png 768w" sizes="(max-width: 1000px) 100vw, 1000px" /></figure>



<p class="wp-block-paragraph">Our initial dataset includes six features: Open, High, Low, Close, Adj Close, and Volume.</p>



<h3 class="wp-block-heading" id="h-step-3-feature-engineering">Step #3 Feature Engineering</h3>



<p class="wp-block-paragraph">Now comes the exciting part: we will implement additional features. We use various indicators from chart analysis, such as moving averages over different periods and stochastic oscillators that measure price momentum.</p>
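<p class="wp-block-paragraph">The stochastic oscillator compares the current close with the trading range of the last n periods: %K = 100 x (close - lowest low) / (highest high - lowest low), and %D is a short moving average of %K. The minimal sketch below computes both on a few made-up prices. The function name <code>stochastic_oscillator</code> is hypothetical; the formula corresponds to the 14-day 'K-ratio' and its 3-day average that the feature engineering code in this step computes.</p>

```python
import pandas as pd


def stochastic_oscillator(high, low, close, window=14, smooth=3):
    """Return %K and %D of the stochastic oscillator.

    %K = 100 * (close - lowest low) / (highest high - lowest low) over `window` periods
    %D = simple moving average of %K over `smooth` periods
    """
    lowest_low = low.rolling(window=window).min()
    highest_high = high.rolling(window=window).max()
    k = 100 * (close - lowest_low) / (highest_high - lowest_low)
    d = k.rolling(window=smooth).mean()
    return k, d


# Made-up prices for five days; a 3-day window keeps the example short
high = pd.Series([10.0, 11.0, 12.0, 12.0, 11.0])
low = pd.Series([9.0, 9.5, 10.0, 10.5, 9.0])
close = pd.Series([9.5, 10.5, 11.5, 11.0, 9.5])

k, d = stochastic_oscillator(high, low, close, window=3, smooth=2)
print(k.round(2).tolist())
print(d.round(2).tolist())
```

A value near 100 means the close is at the top of the recent range, a value near 0 means it is at the bottom.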



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Sort the data chronologically by its Date index and work on a copy
train_df = df.sort_index().copy()

# Add the day, month, and year as separate columns (zero-padded strings)
d = pd.to_datetime(train_df.index)
train_df['Day'] = d.strftime(&quot;%d&quot;) 
train_df['Month'] = d.strftime(&quot;%m&quot;) 
train_df['Year'] = d.strftime(&quot;%Y&quot;) 
train_df</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">			Open		High		Low			Close		Adj Close	Volume		Day	Month	Year
Date									
2009-12-31	2292.919922	2293.590088	2269.110107	2269.149902	2269.149902	1237820000	31	12		2009
2010-01-04	2294.409912	2311.149902	2294.409912	2308.419922	2308.419922	1931380000	04	01		2010
2010-01-05	2307.270020	2313.729980	2295.620117	2308.709961	2308.709961	2367860000	05	01		2010
2010-01-06	2307.709961	2314.070068	2295.679932	2301.090088	2301.090088	2253340000	06	01		2010
2010-01-07	2298.090088	2301.300049	2285.219971	2300.050049	2300.050049	2270050000	07	01		2010</pre></div>
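<p class="wp-block-paragraph">Note that <code>strftime</code> returns strings such as '04'. If you want to use the calendar columns as model inputs, integer columns are more convenient. A minimal sketch of this alternative, using the <code>day</code>, <code>month</code>, and <code>year</code> attributes of a pandas <code>DatetimeIndex</code> on three example dates:</p>

```python
import pandas as pd

# Three example dates; in the tutorial this would be pd.to_datetime(train_df.index)
idx = pd.DatetimeIndex(["2009-12-31", "2010-01-04", "2010-01-05"], name="Date")

# Integer calendar features instead of zero-padded strings
calendar = pd.DataFrame(
    {"Day": idx.day, "Month": idx.month, "Year": idx.year},
    index=idx,
)
print(calendar)
print(calendar.dtypes)
```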



<p class="wp-block-paragraph">The following code creates a set of indicators for the training data. However, in the next step we will select only a subset of them, because a model trained on all of these indicators did not achieve good results and would take far too long to train on a local computer.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Feature Engineering
def createFeatures(df):
    # Work on a copy so that the input DataFrame is not modified
    df = df.copy()

    # Day-over-day change of the adjusted closing price
    df['Close_Diff'] = df['Adj Close'].diff()

    # Simple moving averages (SMAs) - different periods
    df['MA200'] = df['Close'].rolling(window=200).mean()
    df['MA100'] = df['Close'].rolling(window=100).mean()
    df['MA50'] = df['Close'].rolling(window=50).mean()
    df['MA26'] = df['Close'].rolling(window=26).mean()
    df['MA20'] = df['Close'].rolling(window=20).mean()
    df['MA12'] = df['Close'].rolling(window=12).mean()

    # Differences between SMAs and between SMAs and the closing price
    df['DIFF-MA200-MA50'] = df['MA200'] - df['MA50']
    df['DIFF-MA200-MA100'] = df['MA200'] - df['MA100']
    df['DIFF-MA200-CLOSE'] = df['MA200'] - df['Close']
    df['DIFF-MA100-CLOSE'] = df['MA100'] - df['Close']
    df['DIFF-MA50-CLOSE'] = df['MA50'] - df['Close']

    # Rolling lows, highs, and standard deviation - different periods
    # (despite the MA prefix, the *_low and *_high columns are rolling minima/maxima, not averages)
    df['MA200_low'] = df['Low'].rolling(window=200).min()
    df['MA14_low'] = df['Low'].rolling(window=14).min()
    df['MA200_high'] = df['High'].rolling(window=200).max()
    df['MA14_high'] = df['High'].rolling(window=14).max()
    df['MA20dSTD'] = df['Close'].rolling(window=20).std()

    # Exponential moving averages (EMAs) - different periods
    df['EMA12'] = df['Close'].ewm(span=12, adjust=False).mean()
    df['EMA20'] = df['Close'].ewm(span=20, adjust=False).mean()
    df['EMA26'] = df['Close'].ewm(span=26, adjust=False).mean()
    df['EMA100'] = df['Close'].ewm(span=100, adjust=False).mean()
    df['EMA200'] = df['Close'].ewm(span=200, adjust=False).mean()

    # Lagged closing prices (one and two trading days before)
    # shift(1) looks back in time; shift(-1) would use the next day's price and leak future information
    df['close_shift-1'] = df['Close'].shift(1)
    df['close_shift-2'] = df['Close'].shift(2)

    # Bollinger Bands (20-day SMA plus/minus two standard deviations)
    df['Bollinger_Upper'] = df['MA20'] + (df['MA20dSTD'] * 2)
    df['Bollinger_Lower'] = df['MA20'] - (df['MA20dSTD'] * 2)

    # Stochastic oscillator: 14-day %K ('K-ratio') and its 3-day average %D
    # Note: the %D line is stored as 'RSI', although it is not the Relative Strength Index
    df['K-ratio'] = 100*((df['Close'] - df['MA14_low']) / (df['MA14_high'] - df['MA14_low']))
    df['RSI'] = df['K-ratio'].rolling(window=3).mean()

    # Moving Average Convergence/Divergence (MACD)
    df['MACD'] = df['EMA12'] - df['EMA26']

    # Replace remaining NaNs (mainly the warm-up rows of the rolling windows) with the last closing price
    nareplace = df.at[df.index.max(), 'Close']
    df.fillna(nareplace, inplace=True)

    return df</pre></div>
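<p class="wp-block-paragraph">The column named 'RSI' in <code>createFeatures</code> is the 3-day average of the stochastic %K line, not the Relative Strength Index. If you want to experiment with an actual RSI as an additional feature, the sketch below computes it from average gains and losses. It uses an exponential moving average with alpha = 1/period, a common approximation of Wilder's smoothing; the function name <code>wilder_rsi</code> is hypothetical.</p>

```python
import pandas as pd


def wilder_rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """Relative Strength Index with Wilder-style smoothing (alpha = 1 / period)."""
    delta = close.diff()
    gains = delta.clip(lower=0)     # positive price changes, 0 otherwise
    losses = -delta.clip(upper=0)   # magnitude of negative price changes, 0 otherwise
    avg_gain = gains.ewm(alpha=1 / period, adjust=False, min_periods=period).mean()
    avg_loss = losses.ewm(alpha=1 / period, adjust=False, min_periods=period).mean()
    rs = avg_gain / avg_loss
    rsi = 100 - 100 / (1 + rs)
    # Without any losses, RS is infinite and the RSI is 100 (warm-up NaNs are kept)
    return rsi.where(avg_loss != 0, 100.0)


# Sanity checks on artificial series: a steady rise gives 100, a steady fall gives 0
rising = pd.Series([float(v) for v in range(1, 31)])
falling = pd.Series([float(v) for v in range(30, 0, -1)])
print(wilder_rsi(rising).tail(3).tolist())
print(wilder_rsi(falling).tail(3).tolist())
```

Inside <code>createFeatures</code>, this could be added as a new (hypothetical) column such as <code>df['RSI_Wilder'] = wilder_rsi(df['Close'])</code> before the NaN replacement.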



<p class="wp-block-paragraph">Now that we have created several features, we select a subset of them. By commenting entries of the FEATURES list in or out, you can test how different feature combinations affect model performance.</p>
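<p class="wp-block-paragraph">Instead of editing the list by hand for every experiment, you could generate the candidate combinations. The sketch below is a hypothetical helper, not part of the tutorial code. It keeps 'Close' as the first entry of every list because <code>partition_dataset</code> uses column 0 as the prediction target; each resulting list would then go through the filtering, scaling, and training steps.</p>

```python
from itertools import combinations


def feature_subsets(candidates, max_extra=2, target="Close"):
    """Yield feature lists that always start with the target column.

    The target must stay in column 0, because the training windows take
    column 0 of each row as the value to predict.
    """
    pool = [c for c in candidates if c != target]
    for k in range(max_extra + 1):
        for extra in combinations(pool, k):
            yield [target, *extra]


candidates = ["Close", "Bollinger_Upper", "Bollinger_Lower", "MACD"]
for features in feature_subsets(candidates, max_extra=2):
    print(features)
```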



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># List of considered Features
FEATURES = [
#             'High',
#             'Low',
#             'Open',
              'Close',
#             'Volume',
#             'Day',
#             'Month',
#             'Year',
#             'Adj Close',
#              'close_shift-1',
#              'close_shift-2',
#             'MACD',
#             'RSI',
#             'MA200',
#             'MA200_high',
#             'MA200_low',
            'Bollinger_Upper',
            'Bollinger_Lower',
#             'MA100',            
#             'MA50',
#             'MA26',
#             'MA14_low',
#             'MA14_high',
#             'MA12',
#             'EMA20',
#             'EMA100',
#             'EMA200',
#               'DIFF-MA200-MA50',
#               'DIFF-MA200-MA100',
#             'DIFF-MA200-CLOSE',
#             'DIFF-MA100-CLOSE',
#             'DIFF-MA50-CLOSE'
           ]

# Create the dataset with features
df_features = createFeatures(train_df)

# Skip the first ten months so that the warm-up rows of the long rolling windows (e.g., MA200) are excluded
use_start_date = pd.to_datetime(&quot;2010-11-01&quot;)
df_features = df_features[df_features.index &gt; use_start_date].copy()

# Filter the data to the list of FEATURES
data_filtered_ext = df_features[FEATURES].copy()

# We add a prediction column and set dummy values to prepare the data for scaling
#data_filtered_ext['Prediction'] = data_filtered_ext['Close'] 
print(data_filtered_ext.tail().to_string())

# Copy of the selected features used for scaling and training (Date is the index, not a column)
dfs = data_filtered_ext.copy()

# Create a list with the relevant columns
assetname_list = list(dfs.columns)

# Create the lineplot
fig, ax = plt.subplots(figsize=(16, 8))
sns.lineplot(data=data_filtered_ext[assetname_list], linewidth=1.0, dashes=False, palette='muted', ax=ax)

# Configure and show the plot
ax.set_title(stockname + ' price chart')
ax.legend()
plt.show()</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">                   Close  Bollinger_Upper  Bollinger_Lower
Date                                                      
2022-05-18  11418.150391     13404.779247     11065.040772
2022-05-19  11388.500000     13285.741255     11005.463725
2022-05-20  11354.620117     13214.664450     10928.073538
2022-05-23  11535.269531     13075.594634     10920.185347
2022-05-24  11264.450195     13035.543222     10837.607755</pre></div>



<figure class="wp-block-image size-full is-resized"><img decoding="async" data-attachment-id="11456" data-permalink="https://www.relataly.com/feature-engineering-for-multivariate-time-series-models-with-python/1813/image-7/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/12/image.png" data-orig-size="942,492" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/12/image.png" src="https://www.relataly.com/wp-content/uploads/2022/12/image.png" alt="Price chart for the nasdaq price index with bollinger bands" class="wp-image-11456" width="1124" height="586" srcset="https://www.relataly.com/wp-content/uploads/2022/12/image.png 942w, https://www.relataly.com/wp-content/uploads/2022/12/image.png 300w, https://www.relataly.com/wp-content/uploads/2022/12/image.png 768w" sizes="(max-width: 1124px) 100vw, 1124px" /></figure>



<h3 class="wp-block-heading" id="h-step-4-scaling-and-transforming-the-data">Step #4 Scaling and Transforming the Data</h3>



<p class="wp-block-paragraph">Before training our model, we need to transform the data. This step includes scaling the data with scikit-learn&#8217;s RobustScaler, which centers each feature on its median and divides it by its interquartile range (so the scaled values are not confined to a range between 0 and 1), and dividing the data into separate sets for training and testing the prediction model. Most of the code in this section stems from the previous article on <a href="https://www.relataly.com/stock-market-prediction-with-multivariate-time-series-in-python/1815/" target="_blank" rel="noreferrer noopener">multivariate time-series prediction</a>, which explains these transformation steps, so we keep the explanations short here. </p>
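<p class="wp-block-paragraph">To see how RobustScaler differs from a 0-to-1 scaling, the following sketch applies it and scikit-learn&#8217;s MinMaxScaler to a small made-up column with one outlier. RobustScaler subtracts the median and divides by the interquartile range, so its output can be negative and can exceed 1; MinMaxScaler maps the minimum to 0 and the maximum to 1.</p>

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler

# Made-up column with one outlier (not market data)
x = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

robust = RobustScaler().fit_transform(x)  # (x - median) / (Q3 - Q1) = (x - 3) / 2
minmax = MinMaxScaler().fit_transform(x)  # (x - min) / (max - min) = (x - 1) / 99

print(robust.ravel())
print(minmax.ravel())
```

The outlier barely affects the robust scaling of the other values, while MinMaxScaler squeezes them into a narrow band near 0.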



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Calculate the number of rows in the data
nrows = dfs.shape[0]
np_data_unscaled = np.reshape(np.array(dfs), (nrows, -1))
print(np_data_unscaled.shape)

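# Scale each feature with RobustScaler: subtract the median and divide by the interquartile range
# Note: fitting on the full dataset lets test-period statistics influence the scaling;
# fit on the training rows only for a stricter evaluation
scaler = RobustScaler()
np_data = scaler.fit_transform(np_data_unscaled)

# Create a separate scaler fitted on the Close column only, so that predictions can be transformed back to prices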
scaler_pred = RobustScaler()
df_Close = pd.DataFrame(data_filtered_ext['Close'])
np_Close_scaled = scaler_pred.fit_transform(df_Close)</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">Out: (2619, 6)</pre></div>



<p class="wp-block-paragraph">Once we have scaled the data, we split it into a training and a test set. This step creates four datasets: x_train and x_test contain the input sequences with our selected features, and y_train and y_test contain the actual values (the scaled closing price of the following day) that our model will try to predict.</p>
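<p class="wp-block-paragraph">The windowing logic is easiest to follow on toy data. The sketch below repeats the <code>partition_dataset</code> logic from the code that follows and applies it to ten rows with two features, where column 0 stands in for 'Close'. It also shows why the test data starts <code>sequence_length</code> rows before the split point: the test targets continue exactly where the training targets end.</p>

```python
import math
import numpy as np


def partition_dataset(sequence_length, data):
    # Same windowing as in the tutorial: the inputs are the previous
    # sequence_length rows, the target is column 0 of the current row
    x, y = [], []
    for i in range(sequence_length, data.shape[0]):
        x.append(data[i - sequence_length:i, :])
        y.append(data[i, 0])
    return np.array(x), np.array(y)


# Toy data: 10 time steps and 2 features; column 0 plays the role of 'Close'
data = np.column_stack([np.arange(10.0), np.arange(10.0) * 10])
sequence_length = 3

# 80/20 split; the test part starts sequence_length rows before the split
train_len = math.ceil(data.shape[0] * 0.8)
x_train, y_train = partition_dataset(sequence_length, data[:train_len, :])
x_test, y_test = partition_dataset(sequence_length, data[train_len - sequence_length:, :])

print(x_train.shape, y_train.tolist())  # (5, 3, 2) [3.0, 4.0, 5.0, 6.0, 7.0]
print(x_test.shape, y_test.tolist())    # (2, 3, 2) [8.0, 9.0]
```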



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Set the sequence length - this is the timeframe used to make a single prediction
sequence_length = 50 # number of past time steps (trading days) in each input sequence

# Split the data into training and test sets
# As a first step, we get the number of rows to train the model on 80% of the data 
train_data_len = math.ceil(np_data.shape[0] * 0.8)

# Create the training and test data
# The test data starts sequence_length rows before the split so that the first test sample has a full input sequence
train_data = np_data[:train_data_len, :]
test_data = np_data[train_data_len - sequence_length:, :]

# The RNN needs data with the format of [samples, time steps, features]
# Here, we create N samples, sequence_length time steps per sample, and one column per selected feature
def partition_dataset(sequence_length, data):
    x, y = [], []
    data_len = data.shape[0]

    for i in range(sequence_length, data_len):
        x.append(data[i-sequence_length:i,:]) # the previous sequence_length rows with all feature columns
        y.append(data[i, 0]) # target: column 0 (Close) of the current row, for single-step prediction
    
    # Convert the x and y to numpy arrays
    x = np.array(x)
    y = np.array(y)
    return x, y

# Generate training data and test data
x_train, y_train = partition_dataset(sequence_length, train_data)
x_test, y_test = partition_dataset(sequence_length, test_data)

# Print the shapes: the result is: (rows, training_sequence, features) (prediction value, )
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)

# Validate that the prediction value and the input match up
# The last close price of the second input sample should equal the first prediction value
print(x_train[1][sequence_length-1][0])
print(y_train[0])</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;disableCopy&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">Out:
(1914, 30, 3) (1914,) 
(486, 30, 3) (486,)</pre></div>



<h3 class="wp-block-heading" id="h-step-5-train-the-time-series-forecasting-model">Step #5 Train the Time Series Forecasting Model</h3>



<p class="wp-block-paragraph">Now that we have prepared the data, we can train our forecasting model. For this purpose, we build a recurrent neural network with Keras. A recurrent neural network (RNN) is a type of artificial neural network that can process sequential data, such as text, audio, or time series. Unlike traditional feedforward neural networks, in which data flows through the network in only one direction, RNNs contain recurrent connections that feed a layer&#8217;s hidden state from one time step into the next. This lets the network carry information from earlier time steps forward while it processes a sequence. The LSTM (long short-term memory) layers we use are a type of RNN layer with gates that control which information is kept or forgotten.</p>
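<p class="wp-block-paragraph">The recurrence itself can be written in a few lines of NumPy. The sketch below is a plain (vanilla) RNN with random, untrained weights, not an LSTM and not how Keras implements its layers: each hidden state is computed from the current input and the previous hidden state, h_t = tanh(W_x x_t + W_h h_(t-1) + b). Changing only the first input therefore still changes the final state.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_hidden, n_steps = 3, 4, 5

# Random, untrained weights (for illustration only)
W_x = rng.normal(scale=0.5, size=(n_hidden, n_features))
W_h = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)


def simple_rnn(sequence):
    """Vanilla RNN forward pass: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    h = np.zeros(n_hidden)
    states = []
    for x_t in sequence:
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return np.array(states)


sequence = rng.normal(size=(n_steps, n_features))
states = simple_rnn(sequence)

# Perturb only the first time step: the last hidden state changes as well,
# because the hidden state carries information forward through the sequence
perturbed = sequence.copy()
perturbed[0] += 1.0
states_perturbed = simple_rnn(perturbed)

print(states.shape)
print(np.abs(states[-1] - states_perturbed[-1]).max())
```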



<p class="wp-block-paragraph">The model architecture of our RNN looks as follows:</p>



<ul class="wp-block-list">
<li>An LSTM layer with (time steps x features) neurons that receives the input sequences and returns the full output sequence</li>



<li>A second LSTM layer with the same number of neurons that also returns the full sequence</li>



<li>A third LSTM layer that returns only its output for the last time step</li>



<li>A dense layer with 32 neurons</li>



<li>A dense layer with one neuron that outputs the forecast</li>
</ul>
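<p class="wp-block-paragraph">Because every LSTM layer has (time steps x features) neurons, the model grows quickly with the sequence length and the number of features. The following sketch estimates the number of trainable parameters per layer with the standard formulas for an LSTM layer (four gates, each with input weights, recurrent weights, and a bias) and a dense layer. The values of 50 time steps and 3 features are assumptions; in the tutorial they come from <code>x_train.shape[1]</code> and <code>x_train.shape[2]</code>.</p>

```python
def lstm_params(units, input_dim):
    # 4 gates, each with input weights, recurrent weights, and a bias vector
    return 4 * (units * (input_dim + units) + units)


def dense_params(units, input_dim):
    # Weight matrix plus bias vector
    return units * input_dim + units


timesteps, features = 50, 3   # assumed values
n = timesteps * features      # neurons per LSTM layer, as in the tutorial code

layers = [
    ("LSTM 1", lstm_params(n, features)),
    ("LSTM 2", lstm_params(n, n)),
    ("LSTM 3", lstm_params(n, n)),
    ("Dense 32", dense_params(32, n)),
    ("Dense 1", dense_params(1, 32)),
]
for name, count in layers:
    print(f"{name:10s} {count:10,d}")
print("total", sum(count for _, count in layers))
```

Calling <code>model.summary()</code> in Keras should report the same per-layer counts for this architecture.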



<p class="wp-block-paragraph">The architecture is not too complex and is suitable for experimenting with different features. I arrived at this architecture by trying out different layers and configurations. However, I did not spend too much time fine-tuning the architecture since this tutorial focuses on feature engineering.</p>



<p class="wp-block-paragraph">During model training, the neural network processes the training data in mini-batches. Each input sample has the shape (number of time steps, number of features), and a mini-batch stacks batch_size of these samples, so the input shape of our model is (time steps, features). In our code, the product of these two dimensions (number of time steps x number of features) also sets the number of neurons in each LSTM layer.</p>
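<p class="wp-block-paragraph">The relationship between samples, batches, and training steps can be illustrated with NumPy. In the sketch below, 100 placeholder samples with an assumed shape of 50 time steps x 3 features are split into batches of 32, which yields ceil(100 / 32) = 4 steps per epoch with a smaller final batch. The step counter in the Keras training log (e.g. 72/72) counts these batches.</p>

```python
import math
import numpy as np

# Assumed sizes for illustration
n_samples, timesteps, features, batch_size = 100, 50, 3, 32
x = np.zeros((n_samples, timesteps, features))  # placeholder inputs

steps_per_epoch = math.ceil(n_samples / batch_size)
batches = [x[i:i + batch_size] for i in range(0, n_samples, batch_size)]

print("input shape per sample:", x.shape[1:])   # (timesteps, features)
print("steps per epoch:", steps_per_epoch)
print("batch shapes:", [b.shape for b in batches])
```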



<p class="wp-block-paragraph">The following code defines the model architecture, trains the model, and then plots the model loss per epoch:</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Configure the neural network model
model = Sequential()

# Configure the neural network model with n neurons per LSTM layer: n = t time steps x f features
n_neurons = x_train.shape[1] * x_train.shape[2]
print('timesteps: ' + str(x_train.shape[1]) + ',' + ' features:' + str(x_train.shape[2]))
model.add(LSTM(n_neurons, return_sequences=True, input_shape=(x_train.shape[1], x_train.shape[2]))) 
#model.add(Dropout(0.1))
model.add(LSTM(n_neurons, return_sequences=True))
#model.add(Dropout(0.1))
model.add(LSTM(n_neurons, return_sequences=False))
model.add(Dense(32))
model.add(Dense(1)) # linear output: RobustScaler-scaled targets can be negative, which a ReLU output could not produce


# Configure the Model   
optimizer='adam'; loss='mean_squared_error'; epochs = 100; batch_size = 32; patience = 8; 

# Uncomment to customize the learning rate (requires the Adam import at the top; then pass optimizer=adam to model.compile)
learn_rate = &quot;standard&quot; # 0.05
# adam = Adam(learning_rate=learn_rate) 

parameter_list = ['epochs ' + str(epochs), 'batch_size ' + str(batch_size), 'patience ' + str(patience), 'optimizer ' + str(optimizer) + ' with learn rate ' + str(learn_rate), 'loss ' + str(loss)]
print('Parameters: ' + str(parameter_list))

# Compile and train the model
model.compile(optimizer=optimizer, loss=loss)
early_stop = EarlyStopping(monitor='loss', patience=patience, verbose=1)
history = model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, callbacks=[early_stop], shuffle = True,
                  validation_data=(x_test, y_test))

# Plot training &amp; validation loss values
fig, ax = plt.subplots(figsize=(12, 6), sharex=True)
plt.plot(history.history[&quot;loss&quot;])
plt.plot(history.history[&quot;val_loss&quot;])
plt.title(&quot;Model loss&quot;)
plt.ylabel(&quot;Loss&quot;)
plt.xlabel(&quot;Epoch&quot;)
ax.xaxis.set_major_locator(plt.MaxNLocator(epochs))
plt.legend([&quot;Train&quot;, &quot;Test&quot;], loc=&quot;upper left&quot;)
plt.show()</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">timesteps: 50, features:1
Parameters: ['epochs 100', 'batch_size 32', 'patience 8', 'optimizer adam with learn rate standard', 'loss mean_squared_error']
Epoch 1/100
72/72 [==============================] - 9s 55ms/step - loss: 0.0990 - val_loss: 0.2985
Epoch 2/100
72/72 [==============================] - 3s 37ms/step - loss: 0.0932 - val_loss: 0.1768
Epoch 3/100
72/72 [==============================] - 3s 39ms/step - loss: 0.0931 - val_loss: 0.1246
Epoch 4/100
72/72 [==============================] - 3s 37ms/step - loss: 0.0931 - val_loss: 0.0902
Epoch 5/100
72/72 [==============================] - 3s 38ms/step - loss: 0.0929 - val_loss: 0.0846
Epoch 6/100
72/72 [==============================] - 3s 38ms/step - loss: 0.0930 - val_loss: 0.0611
Epoch 7/100
72/72 [==============================] - 3s 38ms/step - loss: 0.0929 - val_loss: 0.0498
Epoch 8/100
72/72 [==============================] - 3s 37ms/step - loss: 0.0928 - val_loss: 0.0208
Epoch 9/100
72/72 [==============================] - 3s 38ms/step - loss: 0.0929 - val_loss: 0.0588
Epoch 10/100
72/72 [==============================] - 3s 37ms/step - loss: 0.0928 - val_loss: 0.0437
Epoch 11/100
72/72 [==============================] - 3s 36ms/step - loss: 0.0928 - val_loss: 0.0192
Epoch 12/100
...
72/72 [==============================] - 3s 38ms/step - loss: 0.0925 - val_loss: 0.0094
Epoch 46/100
72/72 [==============================] - 3s 37ms/step - loss: 0.0925 - val_loss: 0.0113
Epoch 00046: early stopping</pre></div>



<figure class="wp-block-image size-full is-resized"><img decoding="async" data-attachment-id="8656" data-permalink="https://www.relataly.com/feature-engineering-for-multivariate-time-series-models-with-python/1813/loss-function-feature-engineering-neural-networks/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/05/loss-function-feature-engineering-neural-networks.png" data-orig-size="729,383" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="loss-function-feature-engineering-neural-networks" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/05/loss-function-feature-engineering-neural-networks.png" src="https://www.relataly.com/wp-content/uploads/2022/05/loss-function-feature-engineering-neural-networks.png" alt="loss curve of our time series prediction model for stock market forecasting" class="wp-image-8656" width="775" height="407" srcset="https://www.relataly.com/wp-content/uploads/2022/05/loss-function-feature-engineering-neural-networks.png 729w, https://www.relataly.com/wp-content/uploads/2022/05/loss-function-feature-engineering-neural-networks.png 300w" sizes="(max-width: 775px) 100vw, 775px" /></figure>



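<p class="wp-block-paragraph">The validation loss drops quickly over the first epochs, while the training loss decreases only slightly. Training ends after epoch 46, when the early stopping callback no longer sees an improvement in the training loss.</p>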
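<p class="wp-block-paragraph">The log above ends with "Epoch 00046: early stopping". The following hypothetical re-implementation in plain Python shows how a patience-based stopping rule reaches such a decision. It is a simplified sketch of the idea behind Keras' EarlyStopping callback with its default min_delta of 0, not the callback's actual source code. The loss values are made up.</p>

```python
def early_stopping_epoch(losses, patience):
    """Return the 1-based epoch after which training stops, or None if it never stops.

    Simplified patience logic (hypothetical helper): training stops once the
    monitored loss has not improved on its best value for `patience`
    consecutive epochs.
    """
    best = float("inf")
    wait = 0  # number of consecutive epochs without improvement
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical loss values, not taken from the training run above
losses = [0.10, 0.09, 0.085, 0.086, 0.087, 0.088, 0.084]
print(early_stopping_epoch(losses, patience=3))
print(early_stopping_epoch(losses, patience=4))
```

<p class="wp-block-paragraph">With patience=3, the three non-improving epochs 4 to 6 trigger the stop. With patience=4, the improvement in epoch 7 resets the counter before the limit is reached.</p>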



<h3 class="wp-block-heading" id="h-step-6-evaluate-model-performance">Step #6 Evaluate Model Performance</h3>



<p class="wp-block-paragraph">When we test a feature, we also want to know how it affects the performance of our model, so feature engineering goes hand in hand with evaluating model performance. To check the prediction performance, we score the model on the test data set (x_test), unscale the predictions, and calculate the MAE, MAPE, and MDAPE. Then we compare the predictions with the actual values (y_test) in a line plot.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Get the predicted values
y_pred_scaled = model.predict(x_test)

# Unscale the predicted values
y_pred = scaler_pred.inverse_transform(y_pred_scaled)
y_test_unscaled = scaler_pred.inverse_transform(y_test.reshape(-1, 1))

# Mean Absolute Error (MAE)
MAE = mean_absolute_error(y_test_unscaled, y_pred)
print(f'Mean Absolute Error (MAE): {np.round(MAE, 2)}')

# Mean Absolute Percentage Error (MAPE)
MAPE = np.mean(np.abs(np.subtract(y_test_unscaled, y_pred) / y_test_unscaled)) * 100
print(f'Mean Absolute Percentage Error (MAPE): {np.round(MAPE, 2)} %')

# Median Absolute Percentage Error (MDAPE)
MDAPE = np.median(np.abs(np.subtract(y_test_unscaled, y_pred) / y_test_unscaled)) * 100
print(f'Median Absolute Percentage Error (MDAPE): {np.round(MDAPE, 2)} %')

# Only data after this date is displayed in the plot
display_start_date = &quot;2019-01-01&quot;

# Combine train and test data and add the residuals (predicted minus actual prices)
train = pd.DataFrame(dfs['Close'][:train_data_len + 1]).rename(columns={'Close': 'y_train'})
valid = pd.DataFrame(dfs['Close'][train_data_len:]).rename(columns={'Close': 'y_test'})
valid.insert(1, &quot;y_pred&quot;, y_pred, True)
valid.insert(1, &quot;residuals&quot;, valid[&quot;y_pred&quot;] - valid[&quot;y_test&quot;], True)
df_union = pd.concat([train, valid])

# Zoom in to a closer timeframe
df_union_zoom = df_union[df_union.index &gt; display_start_date]

# Create the line plot
fig, ax1 = plt.subplots(figsize=(16, 8))
plt.title(&quot;y_pred vs y_test&quot;)
plt.ylabel(stockname, fontsize=18)
sns.set_palette([&quot;#090364&quot;, &quot;#1960EF&quot;, &quot;#EF5919&quot;])
sns.lineplot(data=df_union_zoom[['y_pred', 'y_train', 'y_test']], linewidth=1.0, dashes=False, ax=ax1)

# Create the bar plot for the residuals (green: prediction above actual value, red: below)
df_sub = [&quot;#2BC97A&quot; if x &gt; 0 else &quot;#C92B2B&quot; for x in df_union_zoom[&quot;residuals&quot;].dropna()]
ax1.bar(height=df_union_zoom['residuals'].dropna(), x=df_union_zoom['residuals'].dropna().index, width=3, label='residuals', color=df_sub)
plt.legend()
plt.show()</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">Mean Absolute Error (MAE): 547.23
Mean Absolute Percentage Error (MAPE): 4.04 %
Median Absolute Percentage Error (MDAPE): 3.73 %</pre></div>



<figure class="wp-block-image size-full is-resized"><img decoding="async" data-attachment-id="8654" data-permalink="https://www.relataly.com/feature-engineering-for-multivariate-time-series-models-with-python/1813/lineplot-nasdaq-feature-engineering-1/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/05/lineplot-nasdaq-feature-engineering-1.png" data-orig-size="942,492" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="lineplot-nasdaq-feature-engineering-1" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/05/lineplot-nasdaq-feature-engineering-1.png" src="https://www.relataly.com/wp-content/uploads/2022/05/lineplot-nasdaq-feature-engineering-1.png" alt="multivariate feature engineering, prediction results" class="wp-image-8654" width="1164" height="607" srcset="https://www.relataly.com/wp-content/uploads/2022/05/lineplot-nasdaq-feature-engineering-1.png 942w, https://www.relataly.com/wp-content/uploads/2022/05/lineplot-nasdaq-feature-engineering-1.png 300w, https://www.relataly.com/wp-content/uploads/2022/05/lineplot-nasdaq-feature-engineering-1.png 768w" sizes="(max-width: 1164px) 100vw, 1164px" /></figure>



<p class="wp-block-paragraph">On average, the predictions of our model deviate from the actual values by about four percent (MAPE of 4.04 %), and the median deviation is about 3.7 percent (MDAPE). Although a few percent may not sound like a lot, the prediction errors can quickly accumulate to larger values.</p>
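<p class="wp-block-paragraph">The following worked example makes the three metrics more tangible. It uses four hypothetical prices, not model output. The same absolute error of 10 counts as 20 % on a price of 50 but only 5 % on a price of 200. The median (MDAPE) is also less affected than the mean (MAPE) by a single large percentage error.</p>

```python
import numpy as np

# Hypothetical actual and predicted prices (not results from the model above)
y_true = np.array([100.0, 200.0, 400.0, 50.0])
y_pred = np.array([110.0, 190.0, 404.0, 60.0])

abs_errors = np.abs(y_true - y_pred)             # [10, 10, 4, 10]
pct_errors = abs_errors / np.abs(y_true) * 100   # [10, 5, 1, 20] percent

mae = abs_errors.mean()          # (10 + 10 + 4 + 10) / 4 = 8.5
mape = pct_errors.mean()         # (10 + 5 + 1 + 20) / 4 = 9.0 %
mdape = np.median(pct_errors)    # median of [1, 5, 10, 20] = 7.5 %
print(f"MAE: {mae:.2f}, MAPE: {mape:.2f} %, MDAPE: {mdape:.2f} %")
```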



<h3 class="wp-block-heading" id="h-step-7-overview-of-selected-models">Step #7 Overview of Selected Models</h3>



<p class="wp-block-paragraph">In writing this article, I tested various models based on different features. The neural network architecture remained unchanged. Likewise, I kept the hyperparameters the same except for the learning rate. Below are the results of these model variants:</p>



<figure class="wp-block-image size-large is-resized"><img decoding="async" data-attachment-id="3605" data-permalink="https://www.relataly.com/feature-engineering-for-multivariate-time-series-models-with-python/1813/image-33-4/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2021/04/image-33.png" data-orig-size="753,944" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-33" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2021/04/image-33.png" src="https://www.relataly.com/wp-content/uploads/2021/04/image-33.png" alt="performance of different variations of the multivariate keras neural network model for stock market forecasting" class="wp-image-3605" width="587" height="736" srcset="https://www.relataly.com/wp-content/uploads/2021/04/image-33.png 753w, https://www.relataly.com/wp-content/uploads/2021/04/image-33.png 239w" sizes="(max-width: 587px) 100vw, 587px" /></figure>






<h3 class="wp-block-heading" id="h-step-8-conclusions">Step #8 Conclusions</h3>



<p class="wp-block-paragraph">Estimating which indicators will lead to good results in advance is difficult. More indicators do not necessarily lead to better results because they increase the model complexity and add data without predictive power. This so-called noise makes it harder for the model to separate important influencing factors from less important ones. Also, each additional indicator increases the time needed to train the model. So there is no way around testing different variants.</p>
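<p class="wp-block-paragraph">A simple loop over feature combinations can automate the testing of variants. The sketch below is a hypothetical illustration on synthetic data, and evaluate_variant is a helper introduced here. To keep the example fast, a least-squares linear model stands in for the LSTM. In practice, you would retrain the neural network for each variant and compare its test error.</p>

```python
import itertools

import numpy as np

# Synthetic stand-in data: only feature "a" carries signal, "b" and "c" are noise
rng = np.random.default_rng(seed=42)
n = 400
features = {
    "a": rng.normal(size=n),
    "b": rng.normal(size=n),
    "c": rng.normal(size=n),
}
target = 2.0 * features["a"] + rng.normal(scale=0.5, size=n)

# Chronological split, as in a time series setting: first 80 % train, rest test
split = int(n * 0.8)


def evaluate_variant(names):
    """Fit a least-squares linear model on the given features and return the test MAE.

    Hypothetical helper: the linear model is a lightweight stand-in for the LSTM.
    """
    X = np.column_stack([features[name] for name in names] + [np.ones(n)])
    coef, *_ = np.linalg.lstsq(X[:split], target[:split], rcond=None)
    y_pred = X[split:] @ coef
    return float(np.mean(np.abs(target[split:] - y_pred)))


# Evaluate every non-empty combination of candidate features
results = {}
for k in range(1, len(features) + 1):
    for combo in itertools.combinations(features, k):
        results[combo] = evaluate_variant(combo)

# Print the variants from best to worst test MAE
for combo, mae in sorted(results.items(), key=lambda item: item[1]):
    print(f"{'+'.join(combo):7s} MAE = {mae:.3f}")

best_variant = min(results, key=results.get)
```

<p class="wp-block-paragraph">The number of combinations grows exponentially with the number of candidate features. For larger feature sets, a greedy forward or backward selection is usually more practical than an exhaustive search.</p>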



<p class="wp-block-paragraph">Besides the features, various hyperparameters such as the learning rate, the optimizer, the batch size, and the length of the input sequences (sequence_length) affect the model&#8217;s performance. Tuning these hyperparameters can further improve the model. Based on the results of the tested variants, we can draw several conclusions:</p>



<ul class="wp-block-list">
<li>A learning rate of 0.05 achieved the best results among the tested configurations.</li>



<li>Of all features, only the Bollinger bands had a positive effect on the model&#8217;s performance.</li>



<li>As expected, the performance tended to decrease as the number of features increased.</li>



<li>In our case, the hyperparameters seem to affect the performance of the models more than the choice of features.</li>
</ul>



<p class="wp-block-paragraph">Keep in mind that we optimized only a single hyperparameter. We searched for the best learning rate and left all other settings unchanged, including the optimizer, the neural network architecture, and the sequence length.</p>



<p class="wp-block-paragraph">There is plenty of room for improvement and experimentation. With more time for experiments and more computational power, it is likely possible to identify better features and model configurations. So, have fun experimenting! 🙂</p>



<h2 class="wp-block-heading" id="h-summary">Summary</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p class="wp-block-paragraph">In this tutorial, we have explored feature engineering for stock market forecasting with Python. Using features from chart analysis, such as the RSI, moving averages, and Bollinger bands, we trained several variants of a recurrent neural network, each with a different set of input features.</p>



<p class="wp-block-paragraph">Our experiments have shown that the choice of features affects the performance of the prediction model, although in our tests the learning rate had an even larger effect. Therefore, select features carefully and consider their potential impact on the model. Also keep in mind that the most effective features for recognizing patterns in historical data vary with the specific time series being analyzed.</p>



<p class="wp-block-paragraph">The steps outlined in this tutorial give you a starting point for applying feature engineering techniques to other multivariate time series forecasting problems. With further experimentation and testing, you can fine-tune your models for your specific use case.</p>



<p class="wp-block-paragraph">We hope you found this tutorial both informative and helpful. If you have any questions or comments, don&#8217;t hesitate to reach out and let us know. </p>



<p class="wp-block-paragraph">And if you want to learn more about feature preparation and exploration, check out my recent article on <a href="https://www.relataly.com/exploratory-feature-preparation-for-regression-with-python-and-scikit-learn/8832/" target="_blank" rel="noreferrer noopener">Exploratory Feature Preparation for Regression Models</a>.</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>






<p>The post <a href="https://www.relataly.com/feature-engineering-for-multivariate-time-series-models-with-python/1813/">Mastering Multivariate Stock Market Prediction with Python: A Guide to Effective Feature Engineering Techniques</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.relataly.com/feature-engineering-for-multivariate-time-series-models-with-python/1813/feed/</wfw:commentRss>
			<slash:comments>8</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1813</post-id>	</item>
		<item>
		<title>Correlation Matrix in Python: How Correlated are COVID-19 Cases and Different Financial Assets?</title>
		<link>https://www.relataly.com/stock-market-correlation-matrix-in-python/103/</link>
					<comments>https://www.relataly.com/stock-market-correlation-matrix-in-python/103/#comments</comments>
		
		<dc:creator><![CDATA[Florian Follonier]]></dc:creator>
		<pubDate>Sun, 05 Apr 2020 16:08:00 +0000</pubDate>
				<category><![CDATA[Correlation]]></category>
		<category><![CDATA[Data Science]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Seaborn]]></category>
		<category><![CDATA[Stock Market Forecasting]]></category>
		<category><![CDATA[Yahoo Finance API]]></category>
		<category><![CDATA[Beginner Tutorials]]></category>
		<category><![CDATA[Bitcoin]]></category>
		<category><![CDATA[Correlation Matrix]]></category>
		<category><![CDATA[Covid-19 Analytics]]></category>
		<category><![CDATA[Cryptocurrencies]]></category>
		<category><![CDATA[Pearson Correlation]]></category>
		<category><![CDATA[Time Series Correlation]]></category>
		<guid isPermaLink="false">https://www.relataly.com/?p=103</guid>

					<description><![CDATA[<p>Correlation analysis is a powerful tool in financial market analysis, helping investors to better understand the interdependence of different assets. But what happens when an unprecedented global pandemic like COVID-19 shakes up the market? In this tutorial, we will show you how to create a correlation matrix in Python that will help you visualize the ... <a title="Correlation Matrix in Python: How Correlated are COVID-19 Cases and Different Financial Assets?" class="read-more" href="https://www.relataly.com/stock-market-correlation-matrix-in-python/103/" aria-label="Read more about Correlation Matrix in Python: How Correlated are COVID-19 Cases and Different Financial Assets?">Read more</a></p>
<p>The post <a href="https://www.relataly.com/stock-market-correlation-matrix-in-python/103/">Correlation Matrix in Python: How Correlated are COVID-19 Cases and Different Financial Assets?</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<div style="height:31px" aria-hidden="true" class="wp-block-spacer"></div>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p class="wp-block-paragraph">Correlation analysis is a powerful tool in financial market analysis, helping investors to better understand the interdependence of different assets. But what happens when an unprecedented global pandemic like COVID-19 shakes up the market? In this tutorial, we will show you how to create a correlation matrix in Python that will help you visualize the relationship between COVID-19 and various financial assets.</p>



<p class="wp-block-paragraph">First, we will look at correlation coefficients and how to interpret them. We&#8217;ll focus specifically on the Pearson correlation coefficient, a popular measure of the strength and direction of the linear relationship between two variables.</p>



<p class="wp-block-paragraph">Next, we&#8217;ll move on to the practical part of this tutorial and create a stock market correlation matrix in Python. Our matrix will measure the correlation between COVID-19 cases and various financial assets such as gold, Bitcoin, and other popular investments. With this matrix, investors can see how closely different asset classes moved together with COVID-19 case numbers and use this information to make more informed investment decisions.</p>



<p class="wp-block-paragraph">Whether you&#8217;re a seasoned investor or just starting out, this tutorial will equip you with the knowledge and tools you need to analyze the correlation between COVID-19 and financial markets. So, let&#8217;s dive in and start exploring the fascinating world of correlation analysis!</p>



<p class="wp-block-paragraph">Also: <a href="https://www.relataly.com/cryptocurrency-price-charts-with-color-overlay-python/2820/" target="_blank" rel="noreferrer noopener">Color-coded Cryptocurrency Price Charts with Python</a> </p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image is-resized"><img decoding="async" src="https://www.relataly.com/wp-content/uploads/2020/05/image-45.png" alt="A correlation matrix, as we will create it in this article" width="376" height="335"/><figcaption class="wp-element-caption">A correlation matrix, as we will create it in this article.</figcaption></figure>
</div>
</div>



<h2 class="wp-block-heading">Different Types of Correlation Coefficients</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p class="wp-block-paragraph">There are various types of correlation coefficients that measure the strength and direction of the relationship between two variables. The most common one is the Pearson correlation coefficient, which measures the linear relationship between two variables. It is the coefficient we focus on in this article. However, if the relationship between two variables is not linear or the data violate Pearson&#8217;s assumptions, other coefficients are better suited for the analysis.</p>



<p class="wp-block-paragraph">For example, when the data is not normally distributed or contains outliers, the Spearman correlation coefficient is a common choice. It measures the relationship between two variables using the ranks of the values instead of the values themselves, which is why it is also known as Spearman&#8217;s rank correlation coefficient. As a result, it captures any monotonic relationship, not just linear ones. The Kendall correlation coefficient (Kendall&#8217;s tau) is another rank-based measure that is often used for ordinal data. It compares all pairs of data points and counts how many pairs are ordered the same way in both variables (concordant pairs) versus in opposite ways (discordant pairs).</p>



<p class="wp-block-paragraph">Finally, the point-biserial correlation coefficient is used when one variable is dichotomous and the other is continuous. The related biserial correlation coefficient applies when the dichotomous variable is assumed to result from splitting an underlying continuous variable, for example, pass/fail derived from a test score.</p>
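<p class="wp-block-paragraph">A small numeric sketch shows how these coefficients differ. It uses NumPy only, and the helper functions are introduced here for illustration. They assume there are no tied values. For the monotonic but non-linear relation y = x³, the rank-based Spearman and Kendall coefficients reach 1, while Pearson stays below 1. The point-biserial coefficient is simply Pearson's r computed against a 0/1-coded variable.</p>

```python
import numpy as np


def pearson(x, y):
    """Pearson r: covariance divided by the product of standard deviations."""
    return float(np.corrcoef(x, y)[0, 1])


def spearman(x, y):
    """Spearman rho as the Pearson r of the ranks (assumes no tied values)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return pearson(rank_x, rank_y)


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs / number of pairs (no ties)."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# A monotonic but non-linear relationship
x = np.arange(1, 11, dtype=float)
y = x ** 3

print(f"Pearson:  {pearson(x, y):.3f}")
print(f"Spearman: {spearman(x, y):.3f}")
print(f"Kendall:  {kendall_tau(x, y):.3f}")

# Point-biserial: Pearson r between a 0/1 variable and a continuous one
group = (x > 5).astype(float)
print(f"Point-biserial: {pearson(group, y):.3f}")
```

<p class="wp-block-paragraph">In practice, you would use the library versions instead. In pandas, DataFrame.corr() accepts method="pearson", "spearman", or "kendall". SciPy provides scipy.stats.spearmanr, scipy.stats.kendalltau, and scipy.stats.pointbiserialr, which also handle ties.</p>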



<p class="wp-block-paragraph">Let&#8217;s take a closer look at Pearson Correlation. </p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<h2 class="wp-block-heading" id="h-pearson-correlation">Pearson Correlation</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p class="wp-block-paragraph">The Pearson correlation coefficient r is a standard measure for quantifying a linear relationship between two variables. In other words, r is a measure of how strongly two continuous variables (for example, price or volume) tend to make similar changes. For the Pearson correlation coefficient to return a meaningful value, the following conditions must be met:</p>



<ul class="wp-block-list">
<li>Both variables, x and y, are metrically scaled and continuous.</li>



<li>The relationship between the two variables is approximately linear.</li>



<li>The observations, i.e., the pairs (x_i, y_i), are independent of each other.</li>
</ul>



<p class="wp-block-paragraph">Correlation measures how strongly two variables are associated. The Pearson correlation is calculated by dividing the covariance of two variables (x, y) by the product of their standard deviations.</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<div class="wp-block-mathml-mathmlblock">\[r = \frac{s_{xy}}{s_x s_y} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sqrt{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}\,\sqrt{\sum_{i=1}^{n} y_i^2 - n\bar{y}^2}}\]<script id="wp-hooks-js" src="https://www.relataly.com/wp-includes/js/dist/hooks.min.js?ver=7496969728ca0f95732d"></script>
<script id="wp-i18n-js" src="https://www.relataly.com/wp-includes/js/dist/i18n.min.js?ver=781d11515ad3d91786ec"></script>
<script id="wp-i18n-js-after">
wp.i18n.setLocaleData( { 'text direction\u0004ltr': [ 'ltr' ] } );
//# sourceURL=wp-i18n-js-after
</script>
<script  async id="mathjax-js" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/MathJax.js?config=TeX-MML-AM_CHTML"></script>
</div>
</div>
</div>
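<p class="wp-block-paragraph">The following sketch checks numerically that both forms of the Pearson formula agree with NumPy's np.corrcoef. The two helper functions and the example values are hypothetical. The forms agree because the sample covariance and the sample variances each carry the same factor 1/(n-1), which cancels in the ratio.</p>

```python
import numpy as np


def pearson_from_formula(x, y):
    """Pearson r via the computational formula (sums of products and squares)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    x_bar, y_bar = x.mean(), y.mean()
    numerator = np.sum(x * y) - n * x_bar * y_bar
    denominator = np.sqrt(np.sum(x ** 2) - n * x_bar ** 2) * np.sqrt(np.sum(y ** 2) - n * y_bar ** 2)
    return numerator / denominator


def pearson_from_covariance(x, y):
    """Pearson r as sample covariance divided by the product of sample standard deviations."""
    s_xy = np.cov(x, y, ddof=1)[0, 1]
    return s_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))


# Hypothetical example values (e.g., two short price series)
x = [2.0, 4.0, 5.0, 7.0, 9.0]
y = [1.0, 3.0, 4.0, 4.5, 8.0]

print(pearson_from_formula(x, y))
print(pearson_from_covariance(x, y))
print(np.corrcoef(x, y)[0, 1])
```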



<h2 class="wp-block-heading" id="h-interpreting-the-pearson-correlation-coefficient">Interpreting the Pearson Correlation Coefficient</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
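<p class="wp-block-paragraph">The value of r always lies between -1 and 1. A positive r indicates that y tends to increase as x increases, whereas a negative r indicates that y tends to decrease. Interpreting r requires us to distinguish the following cases:</p>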



<ul class="wp-block-list">
<li>The closer |r| is to 1, the stronger the linear relationship, and the better the points (x_i, y_i) fit the regression line.</li>



<li>The closer r is to 0, the weaker the linear relationship, and the more widely the points are spread around the regression line.</li>



<li>The extreme cases r = 1 and r = -1 occur only if the relationship is exactly linear, i.e., if it can be described by an equation y = a + b*x (with b &gt; 0 for r = 1 and b &lt; 0 for r = -1). In this case, all points (x_i, y_i) lie on the regression line.</li>
</ul>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>
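<p class="wp-block-paragraph">The cases above can be reproduced with synthetic data. This sketch uses made-up data with a fixed random seed and computes r for four series: exact linear relations with a positive and a negative slope, a noisy linear trend, and unrelated random values.</p>

```python
import numpy as np

rng = np.random.default_rng(seed=0)
x = np.linspace(0, 10, 200)

# Exact linear relations y = a + b*x
y_pos = 2.0 + 3.0 * x   # b > 0 -> r = 1 (up to rounding)
y_neg = 2.0 - 3.0 * x   # b < 0 -> r = -1 (up to rounding)
# Linear trend plus noise -> r clearly positive but below 1
y_noisy = 2.0 + 3.0 * x + rng.normal(scale=10.0, size=x.size)
# Unrelated random values -> r close to 0
y_random = rng.normal(size=x.size)

for name, y in [("y_pos", y_pos), ("y_neg", y_neg), ("y_noisy", y_noisy), ("y_random", y_random)]:
    r = np.corrcoef(x, y)[0, 1]
    print(f"{name:9s} r = {r:+.3f}")
```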



<figure class="wp-block-image size-large is-resized"><img decoding="async" data-attachment-id="1205" data-permalink="https://www.relataly.com/stock-market-correlation-matrix-in-python/103/correlation-representation/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2020/05/correlation-representation.png" data-orig-size="1919,439" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="correlation-representation" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2020/05/correlation-representation.png" src="https://www.relataly.com/wp-content/uploads/2020/05/correlation-representation-1024x234.png" alt="correlation matrix, pearson correlation, Python" class="wp-image-1205" width="792" height="181" srcset="https://www.relataly.com/wp-content/uploads/2020/05/correlation-representation.png 1024w, https://www.relataly.com/wp-content/uploads/2020/05/correlation-representation.png 300w, https://www.relataly.com/wp-content/uploads/2020/05/correlation-representation.png 768w, https://www.relataly.com/wp-content/uploads/2020/05/correlation-representation.png 1536w, https://www.relataly.com/wp-content/uploads/2020/05/correlation-representation.png 1919w" sizes="(max-width: 792px) 100vw, 792px" /><figcaption class="wp-element-caption">Graphical representation of different correlation coefficients</figcaption></figure>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p class="wp-block-paragraph">Be aware that the correlation coefficient is often misinterpreted. For example, an empirical correlation coefficient &gt; 0 merely indicates that the two variables are positively associated in the sample. It does not explain why this relationship exists, nor does it imply that one variable causes the other. Conversely, r ≈ 0 does not mean that the two variables are independent. It only means that we cannot detect a linear relationship between them.</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>
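<p class="wp-block-paragraph">The last point is easy to demonstrate. In this sketch, y is fully determined by x, yet Pearson's r is numerically zero. The relationship is a parabola that is symmetric around zero, so the positive and negative linear co-movements cancel out.</p>

```python
import numpy as np

# x is symmetric around zero, and y is completely determined by x
x = np.linspace(-5, 5, 101)
y = x ** 2

r = np.corrcoef(x, y)[0, 1]
print(f"Pearson r between x and x**2: {r:.3f}")
```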



<h2 class="wp-block-heading" id="h-implementing-a-correlation-matrix-in-python">Implementing a Correlation Matrix in Python</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p class="wp-block-paragraph">In the following, we&#8217;ll dig into the data and analyze the spread of COVID-19 cases and casualties. To create the correlation matrix, we&#8217;ll use the Pandas library, a powerful tool for data analysis that lets us work with data in a variety of formats.</p>



<p class="wp-block-paragraph">First, we&#8217;ll load our data into a Pandas DataFrame, allowing us to manipulate and calculate correlations with ease. We&#8217;ll then use the corr() method to compute the correlation coefficients between the different asset classes and COVID-19. This generates a matrix that provides a clear view of the correlations between our variables.</p>



<p class="wp-block-paragraph">To make this information more visually appealing, we&#8217;ll create a heatmap using the Seaborn library. This heatmap will enable us to easily identify which asset classes are strongly correlated with COVID-19 and which are not.</p>



<p class="wp-block-paragraph">By creating a correlation matrix in Python, we can gain useful insights into the relationship between COVID-19 and the financial markets. Identifying such patterns and trends may help us make more informed investment decisions. So let&#8217;s dive in and create a correlation matrix that reveals the connection between COVID-19 and the financial markets!</p>



<p class="wp-block-paragraph">The code is available on the GitHub repository.</p>



<div class="wp-block-kadence-advancedbtn kb-buttons-wrap kb-btns_9ba519-0b"><a class="kb-button kt-button button kb-btn_1acaaa-9e kt-btn-size-standard kt-btn-width-type-full kb-btn-global-inherit kt-btn-has-text-true kt-btn-has-svg-true wp-block-button__link wp-block-kadence-singlebtn" href="https://github.com/flo7up/relataly-public-python-tutorials/blob/master/00%20Data%20Visualization/112%20Correlation%20Matrix%20-%20COVID-19%20and%20Financial%20Assets.ipynb" target="_blank" rel="noreferrer noopener"><span class="kb-svg-icon-wrap kb-svg-icon-fe_eye kt-btn-icon-side-left"><svg viewBox="0 0 24 24"  fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"  aria-hidden="true"><path d="M1 12s4-8 11-8 11 8 11 8-4 8-11 8-11-8-11-8z"/><circle cx="12" cy="12" r="3"/></svg></span><span class="kt-btn-inner-text">View on GitHub </span></a>

<a class="kb-button kt-button button kb-btn_1bc280-ef kt-btn-size-standard kt-btn-width-type-full kb-btn-global-inherit kt-btn-has-text-true kt-btn-has-svg-true wp-block-button__link wp-block-kadence-singlebtn" href="https://github.com/flo7up/relataly-public-python-API-tutorials" target="_blank" rel="noreferrer noopener"><span class="kb-svg-icon-wrap kb-svg-icon-fa_github kt-btn-icon-side-left"><svg viewBox="0 0 496 512"  fill="currentColor" xmlns="http://www.w3.org/2000/svg"  aria-hidden="true"><path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"/></svg></span><span class="kt-btn-inner-text">Relataly GitHub Repo </span></a></div>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<h3 class="wp-block-heading" id="h-prerequisites">Prerequisites</h3>



<p class="wp-block-paragraph">Before starting the coding part, make sure that you have set up your <a href="https://www.python.org/downloads/" target="_blank" rel="noreferrer noopener">Python 3</a> environment and installed the required packages. If you don’t have an environment set up yet, you can follow <a href="https://www.relataly.com/anaconda-python-environment-machine-learning/1663/" target="_blank" rel="noreferrer noopener">this tutorial</a> to set up the <a href="https://www.anaconda.com/products/individual" target="_blank" rel="noreferrer noopener">Anaconda environment</a>. In this tutorial, we will be working with the following standard packages:</p>



<ul class="wp-block-list">
<li><em><a href="https://pandas.pydata.org/" target="_blank" rel="noreferrer noopener">pandas</a></em></li>



<li><em><a href="https://numpy.org/" target="_blank" rel="noreferrer noopener">NumPy</a></em></li>



<li><a href="https://docs.python.org/3/library/math.html" target="_blank" rel="noreferrer noopener">math</a></li>



<li><em><a href="https://matplotlib.org/" target="_blank" rel="noreferrer noopener">matplotlib</a></em></li>
</ul>



<p class="wp-block-paragraph">In addition, we will use the <a href="https://pandas-datareader.readthedocs.io/en/latest/" target="_blank" rel="noreferrer noopener">pandas-datareader</a> package to download price data from Yahoo Finance, the requests package to query the COVID-19 API, and <a href="https://seaborn.pydata.org/" target="_blank" rel="noreferrer noopener">Seaborn</a> for visualization. You can install packages using the following console commands:</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">pip install &lt;package name&gt;
conda install &lt;package name&gt; (if you are using conda, the Anaconda package manager)</pre></div>



<h3 class="wp-block-heading" id="h-step-1-load-data">Step #1 Load Data</h3>



<p class="wp-block-paragraph">We begin by loading historical data on COVID-19 cases and price information on different financial assets.</p>



<h4 class="wp-block-heading" id="h-1-1-load-historic-covid-19-data">1.1 Load Historic COVID-19 Data</h4>



<p class="wp-block-paragraph">We start by downloading the COVID-19 data. For this purpose, we will use the Statworx API, which provides historical time series data on the number of COVID-19 cases and deaths in different countries. If you are not yet familiar with APIs, consider my recent <a href="https://www.relataly.com/access-data-sources-using-apis/278/" target="_blank" rel="noreferrer noopener">tutorial on working with APIs in Python.</a></p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># A tutorial for this file is available at www.relataly.com

# Imports
import pandas as pd
import pandas_datareader as web
import numpy as np
from datetime import datetime
import seaborn as sns
from matplotlib import pyplot as plt
import matplotlib.dates as mdates
import requests
import json
from pandas.plotting import register_matplotlib_converters

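# Load the COVID-19 data for all countries from the Statworx API
payload = {&quot;code&quot;: &quot;ALL&quot;}
URL = &quot;https://api.statworx.com/covid&quot;
response = requests.post(url=URL, data=json.dumps(payload))
response.raise_for_status()  # stop early if the API request failed
df_covid = pd.DataFrame.from_dict(json.loads(response.text))
# Optional: restrict the analysis to a single country, e.g., the US
# df_covid = df_covid[df_covid['code'] == 'US']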

# Add the date column as variable
df_covid[&quot;Date&quot;] = pd.to_datetime(df_covid[&quot;date&quot;])

# Delete some columns that we won't use
df_covid.drop(
    [&quot;day&quot;, &quot;month&quot;, &quot;year&quot;, &quot;country&quot;, &quot;code&quot;, &quot;population&quot;, &quot;date&quot;],
    axis=1,
    inplace=True,
)

# Summarize cases over all countries
df_covid = df_covid.groupby([&quot;Date&quot;]).sum()
df_covid.head()</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">			cases	deaths	cases_cum	deaths_cum
Date				
2019-12-31	27		0		27			0
2020-01-01	0		0		27			0
2020-01-02	0		0		27			0
2020-01-03	17		0		44			0
2020-01-04	0		0		44			0</pre></div>
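<p class="wp-block-paragraph">The groupby(["Date"]).sum() call collapses the per-country rows into a single row per day by adding up the figures of all countries. A minimal sketch of this aggregation on made-up numbers (two hypothetical countries A and B, not real COVID-19 data):</p>

```python
import pandas as pd

# Made-up per-country rows for two dates (not real COVID-19 figures)
rows = pd.DataFrame({
    "Date": pd.to_datetime(["2020-03-01", "2020-03-01", "2020-03-02", "2020-03-02"]),
    "country": ["A", "B", "A", "B"],
    "cases": [10, 5, 12, 7],
    "deaths": [1, 0, 2, 1],
})

# Drop the non-numeric country column, then sum the figures of all countries per date
daily = rows.drop(columns=["country"]).groupby("Date").sum()
print(daily)
```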



<h4 class="wp-block-heading" id="h-1-2-loading-data-on-selected-financial-assets">1.2 Loading Data on Selected Financial Assets</h4>



<p class="wp-block-paragraph">We continue by downloading historical price data on different financial assets. For this purpose, we use the Yahoo Finance API. We limit the period to the time after the first documented COVID-19 cases, starting on January 1, 2020. When you execute the code of this tutorial as is, you will receive price information for the following financial assets: </p>



<div class="wp-block-columns are-vertically-aligned-top is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-vertically-aligned-top is-layout-flow wp-block-column-is-layout-flow">
<p class="wp-block-paragraph"><strong>Stock Market Indexes</strong></p>



<ul class="wp-block-list">
<li>S&amp;P500</li>



<li>DAX </li>



<li>Nikkei 225 (N225)</li>



<li>S&amp;P500 Futures </li>
</ul>
</div>



<div class="wp-block-column is-vertically-aligned-top is-layout-flow wp-block-column-is-layout-flow">
<p class="wp-block-paragraph"><strong>Stocks: Online Services</strong></p>



<ul class="wp-block-list">
<li>Amazon</li>



<li>Netflix</li>



<li>Apple</li>



<li>Google</li>



<li>Microsoft</li>
</ul>



<p class="wp-block-paragraph"><strong>Stocks: Airlines</strong></p>



<ul class="wp-block-list">
<li>Lufthansa Stock</li>



<li>American Airlines</li>
</ul>
</div>



<div class="wp-block-column is-vertically-aligned-top is-layout-flow wp-block-column-is-layout-flow">
<p class="wp-block-paragraph"><strong>Resource</strong> <strong>Futures</strong></p>



<ul class="wp-block-list">
<li>Crude Oil Price</li>



<li>Gold </li>



<li>Soybean Price (commented out in the code below)</li>
</ul>






<p class="wp-block-paragraph"><strong>Treasury Bonds</strong> <strong>Futures</strong></p>



<ul class="wp-block-list">
<li>US Treasury Bonds</li>
</ul>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<p class="wp-block-paragraph"><strong>Exchange Rates</strong></p>



<ul class="wp-block-list">
<li>EUR-USD</li>



<li>CHF-EUR</li>



<li>GBP-USD</li>



<li>GBP-EUR</li>



</ul>



<p class="wp-block-paragraph"><strong>Cryptocurrencies</strong></p>



<ul class="wp-block-list">
<li>BTC-USD</li>



<li>ETH-USD</li>
</ul>
</div>
</div>



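<p class="wp-block-paragraph">Be aware that stock symbols can change from time to time. If the API does not find a specific symbol, look up the current symbol on Yahoo Finance. In addition, changes on the Yahoo Finance side have broken the Yahoo reader of pandas-datareader in the past; if the download fails in your environment, the yfinance package is a commonly used alternative.</p>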



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}">df_covid_new = df_covid.copy()

# Read the data for different assets
today_date = datetime.today().strftime(&quot;%Y-%m-%d&quot;)
start_date = &quot;2020-01-01&quot;
asset_dict = {
    &quot;^GSPC&quot;: &quot;SP500&quot;,
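    # &quot;DAX&quot; on Yahoo Finance is a US-listed ETF tracking the DAX; the index itself is &quot;^GDAXI&quot;
    &quot;DAX&quot;: &quot;DAX&quot;,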
    &quot;^N225&quot;: &quot;N225&quot;,
    &quot;ES=F&quot;: &quot;SP500FutJune20&quot;,
    &quot;LHA.DE&quot;: &quot;Lufthansa&quot;,
    &quot;AAL&quot;: &quot;AmericanAirlines&quot;,
    &quot;NFLX&quot;: &quot;Netflix&quot;,
    &quot;AMZN&quot;: &quot;Amazon&quot;,
    &quot;AAPL&quot;: &quot;Apple&quot;,
    &quot;MSFT&quot;: &quot;Microsoft&quot;,
    &quot;GOOG&quot;: &quot;Google&quot;,
    &quot;BTC-USD&quot;: &quot;BTCUSD&quot;,
    &quot;ETH-USD&quot;: &quot;ETHUSD&quot;,
    &quot;CL=F&quot;: &quot;Oil&quot;,
    &quot;GC=F&quot;: &quot;Gold&quot;,
    #&quot;SM=F&quot;: &quot;Soybean&quot;,
    &quot;ZB=F&quot;: &quot;UsTreasuryBond&quot;,
    &quot;GBPEUR=X&quot;: &quot;GBPEUR&quot;,
    &quot;EURUSD=X&quot;: &quot;EURUSD&quot;,
    &quot;CHFEUR=X&quot;: &quot;CHFEUR&quot;,
    &quot;GBPUSD=X&quot;: &quot;GBPUSD&quot;}

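col_list = []  # names of the asset columns that were loaded successfully
# Download the closing prices of each asset and join them with the COVID-19 data
for key, value in asset_dict.items():
    print(key, value)
    try:
        df_temp = web.DataReader(
            key, start=start_date, end=today_date, data_source=&quot;yahoo&quot;)
    except Exception as err:
        # Skip symbols that cannot be loaded instead of reusing the previous df_temp
        print(f&quot;{key}: symbol could not be loaded ({err})&quot;)
        continue
    # Convert the index to datetime and keep only the closing price, named after the asset
    df_temp.index = pd.to_datetime(df_temp.index)
    close = df_temp[&quot;Close&quot;].rename(value)
    # The inner join keeps only dates for which both COVID-19 and price data exist
    df_covid_new = pd.merge(
        left=df_covid_new,
        right=close,
        how=&quot;inner&quot;,
        left_index=True, right_index=True)
    col_list.append(value)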

df_covid_new.head()</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">	cases	deaths	cases_cum	deaths_cum	SP500	DAX			N225		SP500FutJune20	Lufthansa		AmericanAirlines	...	Google		BTCUSD		ETHUSD		Oil	Gold	UsTreasuryBond	GBPEUR	EURUSD	CHFEUR	GBPUSD
Date																					
2020-01-06	0		0			59			0		3246.280029	28.004999	23204.859375	3243.50	15.340	27.320000			...	1394.209961	7769.219238	144.304153	63.270000	1566.199951		157.84375	1.17169	1.116196	0.922110	1.308010
2020-01-07	0		0			59			0		3237.179932	27.955000	23575.720703	3235.25	15.365	27.219999			...	1393.339966	8163.692383	143.543991	62.700001	1571.800049		157.40625	1.17635	1.119799	0.922212	1.317003
2020-01-08	0		0			59			0		3253.050049	28.260000	23204.759766	3260.25	15.540	27.840000			...	1404.319946	8079.862793	141.258133	59.610001	1557.400024		156.37500	1.17551	1.115474	0.925181	1.311372
2020-01-09	0		0			59			0		3274.699951	28.450001	23739.869141	3276.00	16.160	27.950001			...	1419.829956	7879.071289	138.979202	59.560001	1551.699951		156.81250	1.17912	1.111321	0.924505	1.310513
2020-01-10	0		0			59			0		3265.350098	28.500000	23850.570312	3264.75	15.815	27.320000			...	1429.729980	8166.554199	143.963776	59.040001	1557.500000		157.62500	1.17620	1.111111	0.924796	1.307019
5 rows × 24 columns</pre></div>
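<p class="wp-block-paragraph">Because each merge uses how="inner", only dates that appear in both tables survive, so days on which any of the exchanges was closed (weekends and holidays) drop out of the combined dataset. This is presumably also why the output above starts on 2020-01-06 rather than 2020-01-01. A minimal sketch with made-up values shows the effect: January 4 and 5, 2020 fall on a weekend, so they are missing from the hypothetical stock series and therefore from the merged result:</p>

```python
import pandas as pd

# Made-up daily COVID-19 counts, including a weekend (Jan 4-5, 2020)
covid = pd.DataFrame(
    {"cases": [3, 4, 5, 6]},
    index=pd.to_datetime(["2020-01-03", "2020-01-04", "2020-01-05", "2020-01-06"]),
)

# Made-up closing prices of a hypothetical stock: no trading on the weekend
close = pd.Series(
    [100.0, 101.5],
    index=pd.to_datetime(["2020-01-03", "2020-01-06"]),
    name="Stock",
)

# An inner join on the index keeps only dates present in both objects
merged = pd.merge(left=covid, right=close, how="inner", left_index=True, right_index=True)
print(merged)
```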



<p class="wp-block-paragraph">You can add assets of your choice to the asset list if you want. You can find the respective symbols on <a href="https://finance.yahoo.com/" target="_blank" rel="noreferrer noopener">finance.yahoo.com</a>. </p>



<h3 class="wp-block-heading" id="h-step-2-exploring-the-data">Step #2 Exploring the Data</h3>



<p class="wp-block-paragraph">Next, we will visualize the historical data using line charts.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Create lineplots
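import math

list_length = df_covid_new.shape[1]
ncols = 6
# Round up so that every column gets its own subplot
nrows = math.ceil(list_length / ncols)
height = list_length / 3 if list_length &gt; 30 else 16

fig, axs = plt.subplots(nrows=nrows, ncols=ncols, sharex=True, figsize=(20, height))

for i, ax in enumerate(fig.axes):
    if i &lt; list_length:
        # One line chart per column of the combined dataset
        sns.lineplot(x=df_covid_new.index, y=df_covid_new.iloc[:, i], ax=ax)
        ax.set_title(df_covid_new.columns[i])
        ax.tick_params(axis=&quot;x&quot;, labelrotation=45)
    else:
        # Hide surplus axes when the number of columns is not a multiple of ncols
        ax.set_visible(False)

plt.tight_layout()
plt.show()</pre></div>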



<figure class="wp-block-image size-large"><img decoding="async" width="1024" height="826" data-attachment-id="8516" data-permalink="https://www.relataly.com/stock-market-correlation-matrix-in-python/103/lineplots-1/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/05/lineplots-1.png" data-orig-size="1186,957" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="lineplots-1" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/05/lineplots-1.png" src="https://www.relataly.com/wp-content/uploads/2022/05/lineplots-1-1024x826.png" alt="lineplots, correlation matrix python" class="wp-image-8516" srcset="https://www.relataly.com/wp-content/uploads/2022/05/lineplots-1.png 1024w, https://www.relataly.com/wp-content/uploads/2022/05/lineplots-1.png 300w, https://www.relataly.com/wp-content/uploads/2022/05/lineplots-1.png 768w, https://www.relataly.com/wp-content/uploads/2022/05/lineplots-1.png 1186w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<p class="wp-block-paragraph">We can easily spot pairs that seem to have developed similarly. However, a visual impression does not tell us how strong the linear relationship between two series actually is, and it says nothing about why the series move together.</p>



<h3 class="wp-block-heading" id="h-step-3-correlation-matrix">Step #3 Correlation Matrix </h3>



<p class="wp-block-paragraph">Next, we will calculate the correlation matrix. Python libraries make this an easy task that only requires a few lines of code. We will use the corr() method of the pandas DataFrame, which by default computes the pairwise Pearson correlation coefficients of all columns.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Plotting a diagonal correlation matrix
sns.set(style=&quot;white&quot;)

# Compute the pairwise Pearson correlation matrix of all columns
corr = df_covid_new.corr()
corr</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">					cases	deaths	cases_cum	deaths_cum	SP500	DAX	N225	SP500FutJune20	Lufthansa	AmericanAirlines	...	Google	BTCUSD	ETHUSD	Oil	Gold	UsTreasuryBond	GBPEUR	EURUSD	CHFEUR	GBPUSD
cases				1.000000	0.853512	0.972691	0.966481	0.663638	0.519676	0.660547	0.659832	-0.451801	-0.413463	...	0.796671	0.898456	0.899876	0.073393	0.719520	0.147347	-0.566227	0.843788	-0.538949	0.513913
deaths				0.853512	1.000000	0.778833	0.804270	0.399756	0.259080	0.400126	0.395697	-0.590251	-0.589090	...	0.567708	0.705201	0.718329	-0.228573	0.664476	0.399694	-0.574079	0.628463	-0.291254	0.245614
cases_cum			0.972691	0.778833	1.000000	0.974553	0.714616	0.571317	0.711905	0.711552	-0.379420	-0.325739	...	0.812816	0.922179	0.932026	0.142586	0.682001	0.059693	-0.516654	0.865846	-0.584541	0.584691
deaths_cum			0.966481	0.804270	0.974553	1.000000	0.712595	0.587606	0.681964	0.709312	-0.498761	-0.441631	...	0.808086	0.875724	0.925765	0.097746	0.805602	0.193165	-0.626253	0.902159	-0.603867	0.529622
SP500				0.663638	0.399756	0.714616	0.712595	1.000000	0.960100	0.956142	0.999766	0.140961	0.205127	...	0.944084	0.806056	0.801970	0.623960	0.553991	-0.359058	-0.043646	0.738902	-0.791377	0.853893
DAX					0.519676	0.259080	0.571317	0.587606	0.960100	1.000000	0.934535	0.960816	0.246646	0.304234	...	0.860881	0.678125	0.688038	0.715992	0.500840	-0.387279	-0.002362	0.685518	-0.844509	0.826270
N225				0.660547	0.400126	0.711905	0.681964	0.956142	0.934535	1.000000	0.956710	0.240638	0.281306	...	0.922091	0.829050	0.761729	0.655562	0.425364	-0.436453	-0.005655	0.673853	-0.790071	0.810057
SP500FutJune20		0.659832	0.395697	0.711552	0.709312	0.999766	0.960816	0.956710	1.000000	0.147155	0.211133	...	0.943475	0.804529	0.799886	0.627447	0.549565	-0.363198	-0.039701	0.736997	-0.792258	0.855152
Lufthansa			-0.451801	-0.590251	-0.379420	-0.498761	0.140961	0.246646	0.240638	0.147155	1.000000	0.964624	...	-0.006089	-0.135931	-0.296115	0.629831	-0.665533	-0.853762	0.815127	-0.388975	-0.107357	0.262015
AmericanAirlines	-0.413463	-0.589090	-0.325739	-0.441631	0.205127	0.304234	0.281306	0.211133	0.964624	1.000000	...	0.026610	-0.115151	-0.245080	0.658176	-0.603162	-0.877327	0.790366	-0.312451	-0.143469	0.330665
Netflix				0.750950	0.701806	0.721492	0.840104	0.601819	0.523924	0.493603	0.596449	-0.637187	-0.578967	...	0.672056	0.614683	0.749042	-0.027917	0.914606	0.438247	-0.652950	0.766065	-0.460004	0.338608
Amazon				0.801935	0.710040	0.776041	0.887487	0.669833	0.597223	0.564001	0.665996	-0.591990	-0.528531	...	0.732905	0.672651	0.809639	0.049571	0.936733	0.365580	-0.664869	0.848771	-0.562987	0.428907
Apple				0.840178	0.631516	0.862322	0.917166	0.843786	0.765495	0.750124	0.841533	-0.357089	-0.275023	...	0.860493	0.800416	0.906042	0.295665	0.851025	0.081060	-0.499164	0.927081	-0.719334	0.673724
Microsoft			0.772067	0.647593	0.751721	0.849898	0.792196	0.723458	0.689305	0.788468	-0.416892	-0.354098	...	0.833358	0.723236	0.819949	0.206249	0.871319	0.209342	-0.496853	0.807662	-0.598330	0.529434
Google				0.796671	0.567708	0.812816	0.808086	0.944084	0.860881	0.922091	0.943475	-0.006089	0.026610	...	1.000000	0.902355	0.866670	0.492750	0.593879	-0.219884	-0.174525	0.765421	-0.713271	0.770065
BTCUSD				0.898456	0.705201	0.922179	0.875724	0.806056	0.678125	0.829050	0.804529	-0.135931	-0.115151	...	0.902355	1.000000	0.942019	0.315591	0.568836	-0.099474	-0.285379	0.777073	-0.620303	0.685506
ETHUSD				0.899876	0.718329	0.932026	0.925765	0.801970	0.688038	0.761729	0.799886	-0.296115	-0.245080	...	0.866670	0.942019	1.000000	0.242502	0.740186	0.068097	-0.419289	0.886153	-0.644605	0.696074
Oil					0.073393	-0.228573	0.142586	0.097746	0.623960	0.715992	0.655562	0.627447	0.629831	0.658176	...	0.492750	0.315591	0.242502	1.000000	-0.035808	-0.685471	0.344647	0.261168	-0.615400	0.626496
Gold				0.719520	0.664476	0.682001	0.805602	0.553991	0.500840	0.425364	0.549565	-0.665533	-0.603162	...	0.593879	0.568836	0.740186	-0.035808	1.000000	0.485554	-0.672429	0.815864	-0.489188	0.381673
UsTreasuryBond		0.147347	0.399694	0.059693	0.193165	-0.359058	-0.387279	-0.436453	-0.363198	-0.853762	-0.877327	...	-0.219884	-0.099474	0.068097	-0.685471	0.485554	1.000000	-0.667468	0.154001	0.278546	-0.412731
GBPEUR				-0.566227	-0.574079	-0.516654	-0.626253	-0.043646	-0.002362	-0.005655	-0.039701	0.815127	0.790366	...	-0.174525	-0.285379	-0.419289	0.344647	-0.672429	-0.667468	1.000000	-0.586152	0.230223	0.187170
EURUSD				0.843788	0.628463	0.865846	0.902159	0.738902	0.685518	0.673853	0.736997	-0.388975	-0.312451	...	0.765421	0.777073	0.886153	0.261168	0.815864	0.154001	-0.586152	1.000000	-0.756216	0.686032
CHFEUR				-0.538949	-0.291254	-0.584541	-0.603867	-0.791377	-0.844509	-0.790071	-0.792258	-0.107357	-0.143469	...	-0.713271	-0.620303	-0.644605	-0.615400	-0.489188	0.278546	0.230223	-0.756216	1.000000	-0.711504
GBPUSD				0.513913	0.245614	0.584691	0.529622	0.853893	0.826270	0.810057	0.855152	0.262015	0.330665	...	0.770065	0.685506	0.696074	0.626496	0.381673	-0.412731	0.187170	0.686032	-0.711504	1.000000
24 rows × 24 columns</pre></div>



<p class="wp-block-paragraph">The matrix shows the Pearson correlation coefficients of all the pairs (X, Y) in our dataset.</p>
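<p class="wp-block-paragraph">Each cell of the matrix is the sample Pearson coefficient r = Σ(x<sub>i</sub> − x̄)(y<sub>i</sub> − ȳ) / √(Σ(x<sub>i</sub> − x̄)² · Σ(y<sub>i</sub> − ȳ)²). As a sanity check, the following sketch computes r directly from this definition for two made-up series and compares it with the value returned by DataFrame.corr() (the helper pearson_r is a hypothetical name, not part of pandas):</p>

```python
import numpy as np
import pandas as pd

def pearson_r(x, y):
    """Pearson correlation coefficient computed from its definition."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Made-up values for two series (not taken from the dataset above)
pair_df = pd.DataFrame({
    "X": [1.0, 2.0, 4.0, 3.0, 6.0],
    "Y": [2.0, 1.5, 4.5, 3.0, 5.0],
})

manual = pearson_r(pair_df["X"], pair_df["Y"])
from_pandas = pair_df.corr().loc["X", "Y"]
print(manual, from_pandas)
```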



<h3 class="wp-block-heading" id="h-step-4-visualizing-the-correlation-matrix-in-a-heatmap">Step #4 Visualizing the Correlation Matrix in a Heatmap </h3>



<p class="wp-block-paragraph">Heatmaps are an excellent choice for visualizing a correlation matrix. The heatmap applies a color palette to represent numeric values on a scale in different colors. This makes it easier to capture differences and similarities among the correlation coefficients. In Python, we can create heatmaps using the <a href="https://seaborn.pydata.org/" target="_blank" rel="noreferrer noopener">Seaborn package</a>. </p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Generate a mask for the upper triangle
# np.bool was removed in NumPy 1.24, so use the builtin bool
mask = np.triu(np.ones_like(corr, dtype=bool))

# Set up the matplotlib figure
f, ax = plt.subplots(figsize=(11, 9))

# Use the built-in diverging colormap &quot;RdBu&quot; (red = negative, blue = positive)
cmap = &quot;RdBu&quot;

# Draw the heatmap with the mask, a fixed color scale from -1 to 1, and square cells
sns.heatmap(
    corr,
    mask=mask,
    cmap=cmap,
    vmin=-1,
    vmax=1,
    center=0,
    square=True,
    linewidths=0.5,
    cbar_kws={&quot;shrink&quot;: 0.5},
)
plt.show()</pre></div>



<figure class="wp-block-image size-large"><img decoding="async" width="672" height="598" data-attachment-id="1240" data-permalink="https://www.relataly.com/stock-market-correlation-matrix-in-python/103/image-45-2/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2020/05/image-45.png" data-orig-size="672,598" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-45" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2020/05/image-45.png" src="https://www.relataly.com/wp-content/uploads/2020/05/image-45.png" alt="Visualization of the Correlation Matrix in Form of a Heatmap" class="wp-image-1240" srcset="https://www.relataly.com/wp-content/uploads/2020/05/image-45.png 672w, https://www.relataly.com/wp-content/uploads/2020/05/image-45.png 300w" sizes="(max-width: 672px) 100vw, 672px" /><figcaption class="wp-element-caption">Visualization of the Correlation Matrix in the form of a Heatmap</figcaption></figure>



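<p class="wp-block-paragraph">The correlation matrix is symmetric because the correlation between a pair of variables X and Y is the same as between Y and X. For this reason, the mask hides the upper triangle, including the diagonal of ones, which would only repeat information.</p>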
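<p class="wp-block-paragraph">The following minimal sketch uses a made-up 3x3 matrix to check the symmetry numerically and to show what the mask from Step #4 looks like: np.triu keeps the upper triangle including the main diagonal, and Seaborn hides every cell where the mask is True, so only the lower triangle stays visible.</p>

```python
import numpy as np
import pandas as pd

# A made-up 3x3 correlation matrix (not the COVID-19 results)
example_corr = pd.DataFrame(
    [[1.0, 0.8, -0.3],
     [0.8, 1.0, 0.1],
     [-0.3, 0.1, 1.0]],
    index=["A", "B", "C"],
    columns=["A", "B", "C"],
)

# The matrix equals its transpose, so the upper triangle repeats the lower one
print(np.allclose(example_corr.values, example_corr.values.T))

# True marks cells that the heatmap hides (upper triangle plus diagonal)
mask = np.triu(np.ones_like(example_corr, dtype=bool))
print(mask)
```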



<h3 class="wp-block-heading" id="h-step-5-interpretation">Step #5 Interpretation</h3>



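<p class="wp-block-paragraph">The heatmap uses a diverging color palette that ranges from blue (positive correlation) over white (no correlation) to red (negative correlation). The intensity of the color indicates the strength of the correlation, which lets us distinguish between positively correlated, uncorrelated, and negatively correlated pairs. Note that the interpretation below refers to the heatmap shown above. Because the code downloads price data up to the current date, the coefficients depend on when you run it and may differ considerably; the printed matrix in Step #3, for instance, shows a positive correlation between COVID-19 cases and the stock market indices. In the following, we compare the different asset classes step by step.</p>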
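<p class="wp-block-paragraph">With 24 variables, it can help to list the strongest pairs explicitly instead of reading them off the colors. The sketch below defines a small helper, top_pairs (a hypothetical name, not a pandas function), that takes each pair once from the lower triangle and ranks the pairs by the absolute value of their coefficient while keeping the sign. It is demonstrated on a made-up matrix:</p>

```python
import pandas as pd

def top_pairs(corr: pd.DataFrame, n: int = 5) -> pd.Series:
    """Return the n variable pairs with the largest absolute correlation."""
    cols = corr.columns
    # Collect each pair once from the strict lower triangle (j < i), skipping the diagonal
    pairs = pd.Series(
        {(cols[i], cols[j]): corr.iloc[i, j] for i in range(len(cols)) for j in range(i)}
    )
    # Rank by absolute strength but keep the sign of each coefficient
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(n)

# Made-up correlation matrix for illustration (not the COVID-19 results)
example_corr = pd.DataFrame(
    [[1.0, 0.9, -0.7],
     [0.9, 1.0, 0.2],
     [-0.7, 0.2, 1.0]],
    index=["A", "B", "C"],
    columns=["A", "B", "C"],
)

print(top_pairs(example_corr, n=2))
```

<p class="wp-block-paragraph">Applied to the matrix from Step #3, a call such as top_pairs(corr, n=10) would list the ten most strongly correlated pairs, positive or negative.</p>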



<h4 class="wp-block-heading" id="h-5-1-stock-market-indices-covid-19">5.1 Stock Market Indices / COVID-19</h4>



<p class="wp-block-paragraph">Let us start with the pairs of stock market indices and COVID-19 data. The heatmap signals a negative correlation between the indices (DAX, S&amp;P500, N225) and COVID-19. In other words, when the number of cases rises, stock market indices tend to fall in value. On closer inspection, the number of new cases (cases) appears more strongly correlated with the indices than the cumulative number of cases (cases_cum) or deaths (deaths_cum). In addition, the stock market indices are strongly correlated with each other.</p>



<h4 class="wp-block-heading" id="h-5-2-stock-market-indices-online-service-provider-stocks">5.2 Stock Market Indices / Online Service Provider Stocks</h4>



<p class="wp-block-paragraph">The picture is mixed when we compare the stock markets with the shares of online service providers. The shares of Microsoft and Google are positively correlated with the overall development of the markets, whereas the shares of Netflix, Amazon, and Apple are hardly correlated with it.</p>



<h4 class="wp-block-heading" id="h-5-3-stock-market-indices-airline-stocks">5.3 Stock Market Indices / Airline Stocks </h4>



<p class="wp-block-paragraph">Airlines have been hit particularly hard by the pandemic, so it is plausible that their stocks moved together with the general market during the crash. Accordingly, we observe a strong positive correlation between airline stocks and the general stock market indices.</p>



<h4 class="wp-block-heading" id="h-5-4-stock-market-indices-crypto-currencies">5.4 Stock Market Indices / Crypto-Currencies</h4>



<p class="wp-block-paragraph">Next, we compare cryptocurrencies with the stock market indices, and the results are surprising: BTC-USD shows a strong positive correlation with the general development of the stock markets, whereas the correlation between ETH-USD and the markets is only slightly positive.</p>



<h4 class="wp-block-heading" id="h-5-5-covid-19-currency-exchange-rates">5.5 COVID-19 / Currency Exchange Rates</h4>



<p class="wp-block-paragraph">The correlation between exchange rates and COVID-19 cases is relatively weak. Only GBP/EUR, EUR/USD, and GBP/USD show a slightly negative correlation. The exception is CHF/EUR, which correlates positively with the number of COVID-19 cases.</p>



<h4 class="wp-block-heading" id="h-5-6-treasury-bonds-resources">5.6 Treasury Bonds / Resources</h4>



<p class="wp-block-paragraph">Looking at the coefficients of resources and US Treasury Bonds, we can observe a strong negative correlation between COVID-19 cases and the oil price and a strong positive correlation with the gold price.</p>



<h4 class="wp-block-heading" id="h-5-7-crypto-currencies-resources">5.7 Crypto-Currencies / Resources</h4>



<p class="wp-block-paragraph">Finally, let us consider the coefficients of resources and cryptocurrencies. Notably, BTC-USD correlates with the oil price. Since BTC-USD shows no correlation with gold, one might conclude that it does not serve as a crisis asset comparable to gold. The correlation between the market indices and other cryptocurrencies such as ETH-USD is relatively low, which suggests that these were less affected by the recent market slump.</p>



<p class="wp-block-paragraph">Also: <a href="https://www.relataly.com/stock-market-prediction-using-multivariate-time-series-in-python/1815/" target="_blank" rel="noreferrer noopener">Stock Market Prediction using Multivariate Data</a> </p>



<h2 class="wp-block-heading" id="h-summary">Summary</h2>



<p class="wp-block-paragraph">Congratulations, you have reached the end of this tutorial! In this article, we have loaded data on COVID-19 and financial assets via an API. We have created a correlation matrix in Python that shows the linear correlation between financial assets and COVID-19 cases. Finally, we have visualized the matrix in a heatmap and interpreted the correlations of different asset pairs. Keep in mind that a linear correlation coefficient only captures linear relationships, so potential non-linear dependencies between the assets may remain undetected.</p>
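<p class="wp-block-paragraph">To illustrate this caveat, the sketch below compares the Pearson coefficient with Spearman&#8217;s rank correlation on a made-up relationship that is monotonic but strongly non-linear. Spearman&#8217;s coefficient is computed here as the Pearson correlation of the ranks; pandas also accepts method="spearman" in DataFrame.corr(). This is an illustrative addition, not part of the analysis above.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical example: y rises monotonically but exponentially with x
x = np.arange(1, 51, dtype=float)
data = pd.DataFrame({"x": x, "y": np.exp(x / 5)})

# Pearson measures linear association only
pearson = data["x"].corr(data["y"])

# Spearman = Pearson correlation of the ranks, so it captures any monotonic trend
ranks = data.rank()
spearman = ranks["x"].corr(ranks["y"])

print(f"Pearson:  {pearson:.2f}")
print(f"Spearman: {spearman:.2f}")
```

<p class="wp-block-paragraph">Pearson stays well below 1 although y is fully determined by x, while Spearman reaches 1. Spearman still only detects monotonic relationships, though; a U-shaped dependency can yield a coefficient near zero for both measures.</p>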



<p class="wp-block-paragraph">Please show your appreciation by leaving a like or comment if you found this article helpful. </p>



<p class="wp-block-paragraph">And if you are interested in learning more about an advanced use case for correlation analysis, please take a look at <a href="https://www.relataly.com/crypto-market-cluster-analysis-using-affinity-propagation-python/8114/" target="_blank" rel="noreferrer noopener">this article on clustering cryptocurrencies</a>.</p>



<h2 class="wp-block-heading" id="h-sources-and-further-reading"><strong>Sources and Further Reading</strong></h2>



<ul class="wp-block-list">
<li><a href="https://www.youtube.com/watch?v=qtaqvPAeEJY&amp;t=117s" target="_blank" rel="noreferrer noopener">YouTube tutorial that explains the math behind the correlation</a></li>



<li><a href="https://amzn.to/3MAy8j5" target="_blank" rel="noreferrer noopener">Andriy Burkov (2020) Machine Learning Engineering</a></li>



<li><a href="https://www.relataly.com/crypto-market-cluster-analysis-using-affinity-propagation-python/8114/" target="_blank" rel="noreferrer noopener">Uncover Hidden Patterns in Financial Markets using Affinity Propagation Clustering in Python</a></li>
</ul>



<p class="wp-block-paragraph">Images created with Midjourney.</p>
<p>The post <a href="https://www.relataly.com/stock-market-correlation-matrix-in-python/103/">Correlation Matrix in Python: How Correlated are COVID-19 Cases and Different Financial Assets?</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.relataly.com/stock-market-correlation-matrix-in-python/103/feed/</wfw:commentRss>
			<slash:comments>3</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">103</post-id>	</item>
		<item>
		<title>Accessing Remote Data Sources via REST APIs in Python</title>
		<link>https://www.relataly.com/access-remote-data-sources-using-rest-apis-in-python/278/</link>
					<comments>https://www.relataly.com/access-remote-data-sources-using-rest-apis-in-python/278/#respond</comments>
		
		<dc:creator><![CDATA[Florian Follonier]]></dc:creator>
		<pubDate>Sun, 01 Mar 2020 13:15:00 +0000</pubDate>
				<category><![CDATA[Data Science]]></category>
		<category><![CDATA[Data Sources]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Statworx COVID-19 API]]></category>
		<category><![CDATA[Tech]]></category>
		<category><![CDATA[Yahoo Finance API]]></category>
		<category><![CDATA[API Tutorials]]></category>
		<category><![CDATA[Beginner Tutorials]]></category>
		<category><![CDATA[Covid-19 Analytics]]></category>
		<category><![CDATA[Requesting Data via REST APIs]]></category>
		<category><![CDATA[Transaction Data]]></category>
		<guid isPermaLink="false">https://www.relataly.com/?p=278</guid>

					<description><![CDATA[<p>REST APIs provide straightforward access to remote data sources. Data scientists should learn about REST APIs because APIs (Application Programming Interfaces) are an important way for data scientists to access data from other sources. By using REST APIs, data scientists can access data from a wide range of sources, including databases, web services, and other ... <a title="Accessing Remote Data Sources via REST APIs in Python" class="read-more" href="https://www.relataly.com/access-remote-data-sources-using-rest-apis-in-python/278/" aria-label="Read more about Accessing Remote Data Sources via REST APIs in Python">Read more</a></p>
<p>The post <a href="https://www.relataly.com/access-remote-data-sources-using-rest-apis-in-python/278/">Accessing Remote Data Sources via REST APIs in Python</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></description>
										<content:encoded><![CDATA[



<p class="wp-block-paragraph">REST APIs provide straightforward access to remote data sources. APIs (Application Programming Interfaces) are an important way for data scientists to obtain data from other systems, including databases, web services, and other applications. This makes it easy to integrate data from different sources into analyses and models. Knowing how REST APIs work is also useful when data scientists want to share their own data or models with others by building APIs that other applications can call. This article briefly describes how to access remote data sources via REST APIs in Python.</p>



<p class="wp-block-paragraph">The article proceeds as follows: We begin with a brief introduction to RESTful APIs. Then, we will look at two examples of how to access remote APIs. We will define API requests, learn about parameters, and request data from an API. Finally, we will handle the response and save it to a dataframe or a local CSV file.</p>



<p class="wp-block-paragraph">Also: <a href="https://www.relataly.com/accessing-twitter-data-via-the-twitter-rest-api/1976/" target="_blank" rel="noreferrer noopener">Streaming Tweets and Images via the Twitter API in Python</a></p>



<h2 class="wp-block-heading" id="h-what-is-an-api">What is an API?</h2>



<p class="wp-block-paragraph">In data science, REST APIs are often used as a modern shortcut to consume remote data sources and services on the internet or to make them available to others. Whether these sources are publicly available or reside in corporate networks, data scientists need to understand how to work with APIs. In the broader sense, an API is a contract between the provider and the consumer of a web service, who communicate and exchange data. The contract is necessary because communication can lead to misunderstandings, for example, when one party sends information in a form the other party does not expect. It avoids such misunderstandings by defining standards for what the parties communicate and how they communicate it.</p>



<h2 class="wp-block-heading" id="h-what-are-restful-apis">What are RESTful APIs?</h2>



<p class="wp-block-paragraph">A popular architectural style for designing APIs is Representational State Transfer (REST). REST has become very popular in recent years and is often considered a more straightforward and modern alternative to the traditional Simple Object Access Protocol (SOAP). Both SOAP and REST rely on established rules, and compliance with these rules is the basis for automated information exchange. In data science, however, REST is now the most common.</p>


<div class="wp-block-image">
<figure class="aligncenter size-large is-resized"><img decoding="async" data-attachment-id="1145" data-permalink="https://www.relataly.com/access-remote-data-sources-using-rest-apis-in-python/278/rest-apis/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2020/05/REST-APIs.png" data-orig-size="1241,689" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="REST-APIs" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2020/05/REST-APIs.png" src="https://www.relataly.com/wp-content/uploads/2020/05/REST-APIs-1024x569.png" alt="Working with REST APIs in Python " class="wp-image-1145" width="609" height="337" srcset="https://www.relataly.com/wp-content/uploads/2020/05/REST-APIs.png 1024w, https://www.relataly.com/wp-content/uploads/2020/05/REST-APIs.png 300w, https://www.relataly.com/wp-content/uploads/2020/05/REST-APIs.png 768w, https://www.relataly.com/wp-content/uploads/2020/05/REST-APIs.png 1241w" sizes="(max-width: 609px) 100vw, 609px" /><figcaption class="wp-element-caption">Communication via an API</figcaption></figure>
</div>


<p class="wp-block-paragraph">If you work with public APIs in Python, you will most likely deal with REST APIs. A service provider that operates a REST API exposes a URL that receives requests. Requests to this resource URL can carry a payload in JSON, HTML, or XML format. A REST API typically returns its response in JSON format, but other formats, such as comma-separated values (CSV), are also possible. REST APIs use the standard HTTP methods (GET, HEAD, POST, PUT, PATCH, DELETE, CONNECT, OPTIONS, and TRACE). However, the most important ones are:</p>



<ul class="wp-block-list">
<li><strong>GET: </strong>Used to request data from a data provider.</li>



<li><strong>POST</strong>: Typically used to send new data to the service provider. Some APIs also use POST requests to pass the parameters that define which data the API returns.</li>



<li><strong>PUT: </strong>Used to update data at the service provider.</li>
</ul>
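<p class="wp-block-paragraph">To see how the three methods differ in practice, the sketch below builds (but does not send) one request of each kind with the Request and PreparedRequest objects of the requests library, so it runs without network access. The endpoint https://api.example.com/items, the ID 42, and the field names are hypothetical placeholders.</p>

```python
import requests

BASE_URL = "https://api.example.com/items"  # hypothetical endpoint

# GET: read data; parameters are appended to the URL, there is no request body
get_req = requests.Request("GET", BASE_URL, params={"category": "books"}).prepare()

# POST: create a new resource; requests serializes the json argument into the body
post_req = requests.Request("POST", BASE_URL, json={"name": "New item"}).prepare()

# PUT: update an existing resource, addressed here by a hypothetical ID in the URL
put_req = requests.Request("PUT", f"{BASE_URL}/42", json={"name": "Renamed item"}).prepare()

for req in (get_req, post_req, put_req):
    print(req.method, req.url, req.body)
```

<p class="wp-block-paragraph">In everyday code you would usually call the shortcuts requests.get(), requests.post(), and requests.put() directly, which build and send such a request in one step.</p>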



<p class="wp-block-paragraph">Next, let&#8217;s look at how we can interact with RET APIs in Python. </p>



<h2 class="wp-block-heading">How can we interact with REST APIs in Python?</h2>



<p class="wp-block-paragraph">There are several ways to access REST APIs in Python, including:</p>



<ol class="wp-block-list">
<li>Using the requests library: The requests library is a popular Python library for making HTTP requests. It provides a simple, intuitive API for sending and receiving requests, and supports a wide range of HTTP methods, such as GET, POST, and DELETE.</li>



<li>Using the urllib library: urllib is part of the Python standard library and provides modules for working with URLs and HTTP requests. Its API is lower-level than that of the requests library, but it is still relatively easy to use, needs no additional installation, and gives you detailed control over your requests.</li>



<li>Using the http.client module: http.client is the low-level HTTP client in the Python standard library, which urllib.request builds on. It gives you the most direct control over the HTTP connection but requires more programming effort than the other options.</li>



<li>Service-specific libraries: Many popular online services can be accessed through Python libraries with built-in functionality for their REST APIs. For example, there are libraries for Twitter, Reddit, and Yahoo Finance (several of them maintained by third parties rather than by the providers themselves) that make interacting with these APIs much easier.</li>
</ol>
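<p class="wp-block-paragraph">For comparison with the requests library, the sketch below shows the lower-level standard-library route with urllib.request: you encode the JSON body to bytes and set the Content-Type header yourself. It reuses the statworx URL and the {"code": "ALL"} payload from the tutorial, but it only builds the request, so it runs without network access. The helper name build_post_request is hypothetical.</p>

```python
import json
import urllib.request


def build_post_request(url: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request using only the standard library."""
    body = json.dumps(payload).encode("utf-8")  # urllib expects the body as bytes
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_post_request("https://api.statworx.com/covid", {"code": "ALL"})
print(req.get_method(), req.full_url, req.data)
```

<p class="wp-block-paragraph">To actually send it, you would pass the request to urllib.request.urlopen(req, timeout=30) and decode the returned bytes with json.loads(). With requests, the same steps collapse into a single requests.post(url, json=payload) call.</p>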



<p class="wp-block-paragraph">The choice of which Python library to use for accessing REST APIs will depend on your specific needs and preferences. You may want to try out a few different options and see which one works best for your project. In the following tutorial, we will work with the requests library.</p>



<div style="height:14px" aria-hidden="true" class="wp-block-spacer"></div>



<h3 class="wp-block-heading" id="h-requesting-historical-covid-19-data-from-the-statworx-api-with-the-request-library">Requesting historical COVID-19 Data from the Statworx API with the Requests Library</h3>



<p class="wp-block-paragraph">There are many APIs out there that provide more or less reliable data on COVID-19 cases. A good one to use is <a href="https://api.statworx.com/covid" target="_blank" rel="noreferrer noopener">api.statworx.com/covid</a>. This API offers historical data on the number of COVID-19 cases. The API is very accessible since it does not require an authentication key. </p>



<p class="wp-block-paragraph">In the following, we will go through two different cases of requesting data from REST APIs in Python. First, we use the <a href="https://pypi.org/project/requests/" target="_blank" rel="noreferrer noopener">Requests library</a>, a widely used general-purpose package for interacting with APIs in Python, to request COVID-19 data from the statworx API. An alternative is to use a Python package that provides API-specific functions, which we will look at in the second example.</p>



<p class="wp-block-paragraph">The code is available in the relataly GitHub repository.</p>



<div class="wp-block-kadence-advancedbtn kb-buttons-wrap kb-btns_8e4103-8f"><a class="kb-button kt-button button kb-btn_640363-e3 kt-btn-size-standard kt-btn-width-type-full kb-btn-global-inherit kt-btn-has-text-true kt-btn-has-svg-true wp-block-button__link wp-block-kadence-singlebtn" href="https://github.com/flo7up/relataly-public-python-API-tutorials/blob/main/101%20Pulling%20COVID-19%20Data%20via%20the%20Statworx%20API%20to%20a%20DataFrame.ipynb" target="_blank" rel="noreferrer noopener"><span class="kb-svg-icon-wrap kb-svg-icon-fe_eye kt-btn-icon-side-left"><svg viewBox="0 0 24 24"  fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"  aria-hidden="true"><path d="M1 12s4-8 11-8 11 8 11 8-4 8-11 8-11-8-11-8z"/><circle cx="12" cy="12" r="3"/></svg></span><span class="kt-btn-inner-text">View on GitHub </span></a>

<a class="kb-button kt-button button kb-btn_43b657-b3 kt-btn-size-standard kt-btn-width-type-full kb-btn-global-inherit kt-btn-has-text-true kt-btn-has-svg-true wp-block-button__link wp-block-kadence-singlebtn" href="https://github.com/flo7up/relataly-public-python-API-tutorials" target="_blank" rel="noreferrer noopener"><span class="kb-svg-icon-wrap kb-svg-icon-fa_github kt-btn-icon-side-left"><svg viewBox="0 0 496 512"  fill="currentColor" xmlns="http://www.w3.org/2000/svg"  aria-hidden="true"><path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"/></svg></span><span class="kt-btn-inner-text">Relataly GitHub Repo </span></a></div>



<h3 class="wp-block-heading" id="h-prerequisites">Prerequisites</h3>



<p class="wp-block-paragraph">Before starting the coding part, ensure you have set up your <a href="https://www.python.org/downloads/" target="_blank" rel="noreferrer noopener">Python 3</a> environment and required libraries. If you don&#8217;t have an environment, follow&nbsp;the steps in <a href="https://www.relataly.com/anaconda-python-environment-machine-learning/1663/" target="_blank" rel="noreferrer noopener">this tutorial</a>&nbsp;to set up the&nbsp;<a href="https://www.anaconda.com/products/individual" target="_blank" rel="noreferrer noopener">Anaconda environment</a>.</p>



<p class="wp-block-paragraph">Also, make sure you install all required packages. In this tutorial, we will be working with the following packages:</p>



<ul class="wp-block-list">
<li><em><a href="https://pandas.pydata.org/" target="_blank" rel="noreferrer noopener">pandas</a></em></li>



<li><em><a href="https://numpy.org/" target="_blank" rel="noreferrer noopener">NumPy</a></em></li>



<li><a href="https://docs.python.org/3/library/math.html" target="_blank" rel="noreferrer noopener">math</a></li>



<li><em><a href="https://matplotlib.org/" target="_blank" rel="noreferrer noopener">matplotlib</a></em></li>
</ul>



<p class="wp-block-paragraph">You can install packages using console commands:</p>



<ul class="wp-block-list">
<li><em>pip install &lt;package name&gt;</em></li>



<li><em>conda install &lt;package name&gt;</em>&nbsp;(if you are using the conda package manager from Anaconda)</li>
</ul>



<p class="wp-block-paragraph">In addition, we will be working with the <a href="https://github.com/psf/requests" target="_blank" rel="noreferrer noopener">Requests library</a>, a popular HTTP library for interacting with REST APIs. It provides functionality to send HTTP requests to an API and receive the responses. Requests is not part of the Python standard library, but it comes preinstalled with the Anaconda distribution, so you usually won&#8217;t need to install it there. Otherwise, you can install it with pip install requests.</p>



<h4 class="wp-block-heading" id="h-step-1-define-the-payload">Step #1 Define the Payload</h4>



<p class="wp-block-paragraph">In order to make a request to an API, it is necessary to specify the URL that the request should be sent to. This URL is typically provided by the API documentation and may include certain parameters or queries to specify the specific data that you want to retrieve. For example, you might want to specify a specific date range or filter the results by a certain category.</p>



<p class="wp-block-paragraph">In the following, we send a POST request to the statworx.com API and get back COVID-19 data in JSON format, which we then convert into a dataframe. The first code block imports the required libraries and defines the payload and the URL of the API endpoint:</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># The data is provided by the European Centre for Disease Prevention and Control (ECDC)
import json

import matplotlib.pyplot as plt  # used for the plot in step 5
import pandas as pd
import requests

# Define the payload that will be sent to the API endpoint, and the endpoint URL.
# &quot;code&quot; selects the countries: &quot;ALL&quot; returns data for all countries,
# a country code (as in the code column of the response, e.g. {&quot;code&quot;: &quot;DE&quot;})
# restricts the response to a single country.
payload = {&quot;code&quot;: &quot;ALL&quot;}
URL = &quot;https://api.statworx.com/covid&quot;</pre></div>



<p class="wp-block-paragraph">We can change the data returned in the response by altering the parameters in our request. In the code above, we set the code to &#8220;ALL&#8221;, so the response contains COVID-19 data for all countries. If we only need data for a single country, we can set the code accordingly, for example to &#8220;US&#8221; for the United States.</p>
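<p class="wp-block-paragraph">Not every API takes its parameters in a POST body. Many GET-based APIs expect filters such as a country or a date range in the query string of the URL. The sketch below shows how such a URL is assembled with the standard library. The endpoint and the parameter names country, from, and to are hypothetical; each API defines its own names in its documentation.</p>

```python
from urllib.parse import urlencode


def build_query_url(base_url: str, params: dict) -> str:
    """Append URL-encoded query parameters to a base URL."""
    return f"{base_url}?{urlencode(params)}"


url = build_query_url(
    "https://api.example.com/covid",  # hypothetical endpoint
    {"country": "Germany", "from": "2020-03-01", "to": "2020-06-30"},
)
print(url)
```

<p class="wp-block-paragraph">With requests, you don&#8217;t need to build this string yourself: requests.get(base_url, params=params) encodes the parameters and appends them to the URL automatically.</p>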



<h4 class="wp-block-heading" id="h-step-2-call-the-rest-api-endpoint">Step #2 Call the REST API Endpoint</h4>



<p class="wp-block-paragraph">Once you have defined the payload and the URL, you can use the requests library to send an HTTP request to the API and retrieve the data. Since we send the payload in the body of a POST request, we use the &#8220;post&#8221; function, which sends the request to the specified URL and returns a response object containing the data returned by the API. For APIs that are queried via GET requests, the &#8220;get&#8221; function works the same way.</p>



<p class="wp-block-paragraph">It is also important to consider any authentication or authorization requirements that the API may have. The statworx API does not require authentication. However, some APIs may require you to provide a valid API key or other credentials in order to access the data. These requirements will also be specified in the API documentation.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># call the api
response = requests.post(url=URL, data=json.dumps(payload))
response # if the request was successful, you should see a response code 200</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">&lt;Response [200]&gt;</pre></div>



<h4 class="wp-block-heading" id="h-step-3-convert-the-data-to-a-dataframe">Step #3 Convert the Data to a DataFrame</h4>



<p class="wp-block-paragraph">When making an API request using Python, the API will typically return a response in JSON format. JSON (JavaScript Object Notation) is a lightweight data interchange format that is easy for humans to read and write and easy for machines to parse and generate. It is often used as a default data format in REST APIs because it is easy to work with and can be easily converted into other data formats such as CSV or Pandas DataFrames.</p>



<p class="wp-block-paragraph">A common way to handle the response from an API is to convert it into a Pandas DataFrame, for example with pd.DataFrame.from_dict(), as in the code below, or with pd.json_normalize() for nested JSON records. Once the data is in a DataFrame, it can be easily processed and analyzed using the various tools and functions provided by Pandas.</p>



<p class="wp-block-paragraph">It is also possible to parse the JSON response directly, using Python&#8217;s built-in &#8220;json&#8221; module or the response object&#8217;s &#8220;json()&#8221; method. This can be useful if you want to extract specific pieces of data from the response or if you want to perform additional processing on the data before converting it into a DataFrame.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># convert the response data to a data frame
df = pd.DataFrame.from_dict(json.loads(response.text))
df</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">	date		day	month	year	cases	deaths	country		code	population	continent	cases_cum	deaths_cum
0	2019-12-31	31	12		2019	0		0		Afghanistan	AF		38041757.0	Asia		0			0
1	2020-01-01	01	01		2020	0		0		Afghanistan	AF		38041757.0	Asia		0			0
2	2020-01-02	02	01		2020	0		0		Afghanistan	AF		38041757.0	Asia		0			0
3	2020-01-03	03	01		2020	0		0		Afghanistan	AF		38041757.0	Asia		0			0
4	2020-01-04	04	01		2020	0		0		Afghanistan	AF		38041757.0	Asia		0			0
...	...			...	...		...		...		...		...			...		...			...			...			...</pre></div>
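<p class="wp-block-paragraph">If an API returns nested JSON records instead of flat columns, pd.json_normalize flattens the nested fields into dotted column names. The records below are made-up examples whose shape is an assumption; they do not reflect the structure of the statworx response.</p>

```python
import json

import pandas as pd

# Hypothetical API response text with one nested object per record
response_text = json.dumps([
    {"date": "2020-03-01", "country": {"name": "Country A", "code": "AA"}, "cases": 5},
    {"date": "2020-03-02", "country": {"name": "Country A", "code": "AA"}, "cases": 7},
])

records = json.loads(response_text)  # with requests: records = response.json()
df_flat = pd.json_normalize(records)  # nested keys become "country.name", "country.code"
print(df_flat)
```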



<h4 class="wp-block-heading" id="h-step-4-filter-the-data">Step #4 Filter the Data</h4>



<p class="wp-block-paragraph">Now let&#8217;s make something out of the data and create a simple plot. Our goal is to create a lineplot that shows how cases have developed for different countries. As part of the data preprocessing, we will reduce the data to include a smaller number of selected countries. </p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># convert date column to date format
df[&quot;date&quot;] = pd.to_datetime(df[&quot;date&quot;], format=&quot;%Y-%m-%d&quot;)

# filter specific countries
list_of_countries = [&quot;Germany&quot;, &quot;Switzerland&quot;, &quot;France&quot;, &quot;Spain&quot;, &quot;Canada&quot;]
df_1 = df[df[&quot;country&quot;].isin(list_of_countries)]

# filter the data to a specific timeframe
df_new = df_1[df_1[&quot;date&quot;] &gt; &quot;2020-01-15&quot;]
df_new.head()</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">		date		day	month	year	cases	deaths	country	code	population	continent	cases_cum	deaths_cum
10332	2020-01-16	16	01		2020	0		0		Canada	CA		37411038.0	America		0			0
10333	2020-01-17	17	01		2020	0		0		Canada	CA		37411038.0	America		0			0
10334	2020-01-18	18	01		2020	0		0		Canada	CA		37411038.0	America		0			0
10335	2020-01-19	19	01		2020	0		0		Canada	CA		37411038.0	America		0			0
10336	2020-01-20	20	01		2020	0		0		Canada	CA		37411038.0	America		0			0</pre></div>



<h4 class="wp-block-heading" id="h-step-5-plot-the-case-counts">Step #5 Plot the Case Counts</h4>



<p class="wp-block-paragraph">Finally, we use Matplotlib to create a line plot that visualizes the data we have received from the API.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}">import matplotlib.pyplot as plt

# create a separate plot line for each country in the dataset
fig, ax1 = plt.subplots(figsize=(12, 8))
ax1.set_ylabel(&quot;Total Cases&quot;, fontsize=20, color=&quot;black&quot;)
for countryname in list_of_countries:
    df_country = df_new[df_new[&quot;country&quot;] == countryname]
    ax1.plot(df_country[&quot;date&quot;], df_country[&quot;cases_cum&quot;], label=countryname)

# the legend entries come from the label of each line
ax1.legend(loc=&quot;upper left&quot;)
plt.show()</pre></div>



<figure class="wp-block-image size-large"><img decoding="async" width="729" height="477" data-attachment-id="3817" data-permalink="https://www.relataly.com/access-remote-data-sources-using-rest-apis-in-python/278/image-44-3/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2021/04/image-44.png" data-orig-size="729,477" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-44" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2021/04/image-44.png" src="https://www.relataly.com/wp-content/uploads/2021/04/image-44.png" alt="" class="wp-image-3817" srcset="https://www.relataly.com/wp-content/uploads/2021/04/image-44.png 729w, https://www.relataly.com/wp-content/uploads/2021/04/image-44.png 300w" sizes="(max-width: 729px) 100vw, 729px" /></figure>



<p class="wp-block-paragraph">If your chart looks similar to the one above, you have most likely loaded the data into your project successfully.</p>
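<p class="wp-block-paragraph">Besides checking the chart visually, a few programmatic checks can catch loading problems early. The sketch below assumes the <code>country</code>, <code>date</code> and <code>cases_cum</code> columns used in the plotting code. The function name <code>find_loading_problems</code> is hypothetical. Published cumulative counts are occasionally revised downward, so a decreasing series is reported as something to inspect rather than treated as a hard error.</p>

```python
import pandas as pd

# columns used by the plotting code in this tutorial
REQUIRED_COLUMNS = ("country", "date", "cases_cum")


def find_loading_problems(df, countries):
    """Return a list of human-readable issues found in the loaded data."""
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        # without these columns the remaining checks cannot run
        return [f"missing columns: {missing}"]
    if df.empty:
        return ["the DataFrame contains no rows"]

    problems = []
    # numbers delivered as strings would plot incorrectly
    if not pd.api.types.is_numeric_dtype(df["cases_cum"]):
        problems.append("cases_cum is not numeric")
    present = set(df["country"].unique())
    for country in countries:
        if country not in present:
            problems.append(f"no rows for {country}")
            continue
        # sort by date so the cumulative series is in chronological order
        series = df[df["country"] == country].sort_values("date")["cases_cum"]
        # cumulative counts should normally never decrease over time
        if not series.is_monotonic_increasing:
            problems.append(f"cumulative cases decrease for {country}")
    return problems
```

<p class="wp-block-paragraph">An empty list means that none of these checks found a problem. For example, <code>find_loading_problems(df_new, list_of_countries)</code> could be run right after loading.</p>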



<div style="height:16px" aria-hidden="true" class="wp-block-spacer"></div>



<h2 class="wp-block-heading" id="h-summary">Summary</h2>



<p class="wp-block-paragraph">This tutorial has demonstrated how to use REST APIs in Python to retrieve data from remote sources. We used the statworx API to fetch COVID-19 data, converted the response into a pandas DataFrame, and created a line chart of the cumulative number of COVID-19 cases in several countries.</p>
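<p class="wp-block-paragraph">The fetch-and-convert step summarized above can be condensed into a small function. This is an illustrative sketch, not the exact code used earlier in the tutorial. The function name <code>fetch_as_dataframe</code> is hypothetical. The sketch assumes that the API answers a JSON POST request with either a dict of columns or a list of records, and <code>pandas.DataFrame</code> accepts both shapes. The HTTP session is passed in as a parameter so that the parsing logic can be tested without network access.</p>

```python
import pandas as pd


def fetch_as_dataframe(session, url, payload, timeout=30):
    """POST a JSON payload to a REST endpoint and return the reply as a DataFrame.

    `session` is expected to behave like requests.Session: it needs a post()
    method whose response offers raise_for_status() and json().
    """
    # json= serializes the payload and sets the JSON content type
    response = session.post(url, json=payload, timeout=timeout)
    # fail loudly on HTTP 4xx/5xx instead of parsing an error body
    response.raise_for_status()
    df = pd.DataFrame(response.json())
    if "date" in df.columns:
        # assumption: dates arrive as strings such as "2020-03-01"
        df["date"] = pd.to_datetime(df["date"])
    return df
```

<p class="wp-block-paragraph">In practice, you would pass a real session such as <code>requests.Session()</code>, together with the endpoint URL and payload used earlier in this tutorial. Injecting the session keeps the function easy to test and lets several requests reuse one connection.</p>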



<p class="wp-block-paragraph">With these steps, you should be able to use Python to retrieve data from other REST APIs as well. This gives data scientists and analysts a straightforward way to pull data from many external sources into their analyses, gain insights, and make informed decisions.</p>
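<p class="wp-block-paragraph">Many other REST APIs return nested JSON rather than flat columns. <code>pandas.json_normalize</code> can flatten such responses into the same kind of long-format DataFrame used in this tutorial. The response structure below is invented purely for illustration. Real APIs will differ, so <code>record_path</code> and <code>meta</code> have to be adjusted to the schema each API documents.</p>

```python
import pandas as pd

# hypothetical response body: one entry per country with a nested daily timeline
response_body = {
    "results": [
        {
            "country": "A",
            "timeline": [
                {"date": "2020-03-01", "cases_cum": 1},
                {"date": "2020-03-02", "cases_cum": 3},
            ],
        },
        {
            "country": "B",
            "timeline": [{"date": "2020-03-01", "cases_cum": 2}],
        },
    ]
}

# record_path walks into each nested "timeline" list; meta copies the parent's
# "country" value onto every row produced from that list
df = pd.json_normalize(response_body["results"], record_path="timeline", meta=["country"])
print(df)
```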



<p class="wp-block-paragraph">I hope this post was helpful. If you have any remarks or questions, let me know.</p>



<h2 class="wp-block-heading">Sources and Further Reading</h2>



<ol class="wp-block-list">
<li><a href="https://www.statworx.com/en/content-hub/blog/making-of-a-free-api-for-covid-19-data/" target="_blank" rel="noreferrer noopener">statworx COVID-19 API</a></li>



<li><a href="https://pypi.org/project/requests/" target="_blank" rel="noreferrer noopener">requests library (PyPI)</a></li>
</ol>



<p class="wp-block-paragraph"><strong>Relataly API Tutorials</strong></p>


<ul class="wp-block-kadence-posts kb-posts kadence-posts-list kb-posts-id-_91f1a0-a4 content-wrap grid-cols kb-posts-style-boxed grid-sm-col-1 grid-lg-col-3 item-image-style-above"><li class="kb-post-list-item">
	<article class="entry content-bg loop-entry post-3474 post type-post status-publish format-standard has-post-thumbnail hentry category-coinmarketcap-api category-rest-apis category-sqlite category-stock-market-forecasting tag-api-tutorials tag-beginner-tutorials tag-bitcoin tag-cryptocurrencies tag-peewee">
				<a aria-hidden="true" tabindex="-1" role="presentation" class="post-thumbnail kadence-thumbnail-ratio-inherit" href="https://www.relataly.com/requesting-crypto-price-data-from-the-coinmarketcap-api-using-python/3474/" aria-label="Requesting Crypto Prices from the Coinmarketcap API using Python">
			<div class="post-thumbnail-inner">
				<img decoding="async" width="768" height="303" src="https://www.relataly.com/wp-content/uploads/2023/03/gateio-api-tutorial-relataly-machine-learning-gate-to-cryptocurrency-data-min.png" class="attachment-medium_large size-medium_large wp-post-image" alt="coinmarketcap rest api python tutorial relataly midjourney" srcset="https://www.relataly.com/wp-content/uploads/2023/03/gateio-api-tutorial-relataly-machine-learning-gate-to-cryptocurrency-data-min.png 1363w, https://www.relataly.com/wp-content/uploads/2023/03/gateio-api-tutorial-relataly-machine-learning-gate-to-cryptocurrency-data-min.png 300w, https://www.relataly.com/wp-content/uploads/2023/03/gateio-api-tutorial-relataly-machine-learning-gate-to-cryptocurrency-data-min.png 512w, https://www.relataly.com/wp-content/uploads/2023/03/gateio-api-tutorial-relataly-machine-learning-gate-to-cryptocurrency-data-min.png 768w" sizes="(max-width: 768px) 100vw, 768px" data-attachment-id="12600" data-permalink="https://www.relataly.com/gateio-api-tutorial-relataly-machine-learning-gate-to-cryptocurrency-data-min/" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/03/gateio-api-tutorial-relataly-machine-learning-gate-to-cryptocurrency-data-min.png" data-orig-size="1363,538" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="coinmarketcap rest api python tutorial relataly midjourney" data-image-description="&lt;p&gt;coinmarketcap rest api python tutorial relataly midjourney&lt;/p&gt;
" data-image-caption="&lt;p&gt;coinmarketcap rest api python tutorial relataly midjourney&lt;/p&gt;
" data-large-file="https://www.relataly.com/wp-content/uploads/2023/03/gateio-api-tutorial-relataly-machine-learning-gate-to-cryptocurrency-data-min.png" />			</div>
		</a><!-- .post-thumbnail -->
				<div class="entry-content-wrap">
			<header class="entry-header">
	<h2 class="entry-title"><a href="https://www.relataly.com/requesting-crypto-price-data-from-the-coinmarketcap-api-using-python/3474/" rel="bookmark">Requesting Crypto Prices from the Coinmarketcap API using Python</a></h2></header><!-- .entry-header -->
<footer class="entry-footer">
	</footer><!-- .entry-footer -->		</div>
	</article>
</li>
</ul><p>The post <a href="https://www.relataly.com/access-remote-data-sources-using-rest-apis-in-python/278/">Accessing Remote Data Sources via REST APIs in Python</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.relataly.com/access-remote-data-sources-using-rest-apis-in-python/278/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">278</post-id>	</item>
	</channel>
</rss>
