<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Supervised Learning Archives - relataly.com</title>
	<atom:link href="https://www.relataly.com/tag/supervised-learning/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.relataly.com/tag/supervised-learning/</link>
	<description>The Business AI Blog</description>
	<lastBuildDate>Thu, 13 Jul 2023 12:10:12 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://www.relataly.com/wp-content/uploads/2023/04/cropped-AI-cat-Icon-White.png</url>
	<title>Supervised Learning Archives - relataly.com</title>
	<link>https://www.relataly.com/tag/supervised-learning/</link>
	<width>32</width>
	<height>32</height>
</image> 
<site xmlns="com-wordpress:feed-additions:1">175977316</site>	<item>
		<title>Univariate Stock Market Forecasting using Facebook Prophet in Python</title>
		<link>https://www.relataly.com/time-series-forecasting-using-facebook-prophet-in-python/10351/</link>
					<comments>https://www.relataly.com/time-series-forecasting-using-facebook-prophet-in-python/10351/#comments</comments>
		
		<dc:creator><![CDATA[Florian Follonier]]></dc:creator>
		<pubDate>Thu, 15 Dec 2022 22:54:34 +0000</pubDate>
				<category><![CDATA[CryptoCompare API]]></category>
		<category><![CDATA[Facebook Prophet]]></category>
		<category><![CDATA[Finance]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[REST APIs]]></category>
		<category><![CDATA[Seaborn]]></category>
		<category><![CDATA[Stock Market Forecasting]]></category>
		<category><![CDATA[Time Series Forecasting]]></category>
		<category><![CDATA[Use Cases]]></category>
		<category><![CDATA[Yahoo Finance API]]></category>
		<category><![CDATA[AI in Finance]]></category>
		<category><![CDATA[Intermediate Tutorials]]></category>
		<category><![CDATA[Supervised Learning]]></category>
		<guid isPermaLink="false">https://www.relataly.com/?p=10351</guid>

					<description><![CDATA[<p>Have you ever wondered how Facebook predicts the future? Meet Facebook Prophet, the open-source time series forecasting tool developed by Facebook&#8217;s Core Data Science team. Built on top of the PyStan library, Facebook Prophet offers a simple and intuitive interface for creating forecasts using historical data. What sets Facebook Prophet apart is its highly modular ... <a title="Univariate Stock Market Forecasting using Facebook Prophet in Python" class="read-more" href="https://www.relataly.com/time-series-forecasting-using-facebook-prophet-in-python/10351/" aria-label="Read more about Univariate Stock Market Forecasting using Facebook Prophet in Python">Read more</a></p>
<p>The post <a href="https://www.relataly.com/time-series-forecasting-using-facebook-prophet-in-python/10351/">Univariate Stock Market Forecasting using Facebook Prophet in Python</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>Have you ever wondered how Facebook predicts the future? Meet Facebook Prophet, the open-source time series forecasting tool developed by Facebook&#8217;s Core Data Science team. Built on top of the PyStan library, Facebook Prophet offers a simple and intuitive interface for creating forecasts using historical data. What sets Facebook Prophet apart is its highly modular design, allowing for a range of customizable components that can be combined to create a wide variety of forecasting models. This makes it perfect for modeling data with strong seasonal effects, like daily or weekly patterns, and it can handle missing data and outliers with ease. In this tutorial, we will take a closer look at the capabilities of Facebook Prophet and see how it can be used to make accurate predictions.</p>



<p>We begin with a brief discussion of how Facebook Prophet decomposes a time series into different components. Then we turn to the hands-on part, in which we use the model in Python to generate a stock market forecast. We will train our Facebook Prophet model on the historical price of the Coca-Cola stock. We will also cover different options to customize the model settings.</p>



<p class="has-accent-color has-text-color has-background" style="background:linear-gradient(135deg,rgb(255,206,236) 68%,rgba(150,149,240,0.4) 100%)"><strong>Disclaimer</strong>: This article does not constitute financial advice. Stock markets can be very volatile and are generally difficult to predict. Predictive models and other forms of analytics applied in this article only illustrate machine learning use cases.</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-full is-resized"><img decoding="async" data-attachment-id="10371" data-permalink="https://www.relataly.com/time-series-forecasting-using-facebook-prophet-in-python/10351/image-31-7/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/12/image-31.png" data-orig-size="474,143" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-31" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/12/image-31.png" src="https://www.relataly.com/wp-content/uploads/2022/12/image-31.png" alt="Facebook Prophet - an open-source tool for univariate time series forecasting" class="wp-image-10371" width="380" height="113"/><figcaption class="wp-element-caption">Facebook Prophet &#8211; an open-source tool for time series forecasting</figcaption></figure>
</div>
</div>



<div style="height:34px" aria-hidden="true" class="wp-block-spacer"></div>



<h2 class="wp-block-heading">What is Facebook Prophet?</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>Facebook Prophet is a tool that can be used to make predictions about future events based on historical data. It was developed by <a href="https://peerj.com/preprints/3190/" target="_blank" rel="noreferrer noopener">Taylor and Letham, 2017</a>, who later made it available as an open-source project. The authors developed Facebook Prophet to solve various business forecasting problems without requiring much prior knowledge. In this way, the framework addresses a significant problem many companies face today. They have various prediction problems (e.g., capacity and demand forecasting) but face a skill gap when it comes to generating reliable forecasts with techniques such as ARIMA or neural networks. Compared to that, Facebook Prophet requires minimal fine-tuning and can deal with various challenges, including seasonality, outliers, and changing trend lines. This allows Facebook Prophet to handle a wide range of forecasting problems flexibly. Before we dive into the hands-on part, let&#8217;s gain a quick overview of how Facebook Prophet works.</p>



<p>Also: <a href="https://www.relataly.com/stock-market-prediction-using-multivariate-time-series-in-python/1815/" target="_blank" rel="noreferrer noopener">Stock Market Prediction using Multivariate Time Series</a></p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-large"><img fetchpriority="high" decoding="async" width="1024" height="1024" data-attachment-id="12356" data-permalink="https://www.relataly.com/an_ancient_prophet_looking_into_a_crystal_ball/" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/02/an_ancient_prophet_looking_into_a_crystal_ball.png" data-orig-size="1024,1024" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="an_ancient_prophet_looking_into_a_crystal_ball" data-image-description="&lt;p&gt;time series forecasting with facebook prophet python tutorial&lt;/p&gt;
" data-image-caption="&lt;p&gt;time series forecasting with facebook prophet python tutorial&lt;/p&gt;
" data-large-file="https://www.relataly.com/wp-content/uploads/2023/02/an_ancient_prophet_looking_into_a_crystal_ball.png" src="https://www.relataly.com/wp-content/uploads/2023/02/an_ancient_prophet_looking_into_a_crystal_ball-1024x1024.png" alt="time series forecasting with facebook prophet python tutorial" class="wp-image-12356" srcset="https://www.relataly.com/wp-content/uploads/2023/02/an_ancient_prophet_looking_into_a_crystal_ball.png 1024w, https://www.relataly.com/wp-content/uploads/2023/02/an_ancient_prophet_looking_into_a_crystal_ball.png 300w, https://www.relataly.com/wp-content/uploads/2023/02/an_ancient_prophet_looking_into_a_crystal_ball.png 140w, https://www.relataly.com/wp-content/uploads/2023/02/an_ancient_prophet_looking_into_a_crystal_ball.png 768w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption">Time-series forecasting with Facebook Prophet. Image generated with <a href="http://www.midjourney.com" target="_blank" rel="noreferrer noopener">Midjourney</a>.</figcaption></figure>
</div>
</div>



<h3 class="wp-block-heading">How Facebook Prophet Works</h3>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>Facebook Prophet uses a technique called additive regression to model time series data. This involves breaking the time series into a series of components:</p>



<ul class="wp-block-list">
<li>Trends</li>



<li>Seasonality</li>



<li>Holiday</li>
</ul>



<p>Traditional time series <a href="https://www.relataly.com/category/machine-learning-algorithms/arima-models/" target="_blank" rel="noreferrer noopener">methods such as (S)ARIMA</a> base their prediction on a model that weights the linear sum of past observations or lags. Facebook&#8217;s Prophet is similar in that it uses a decreasing weight for past observations. This means current observations have a higher significance for the model than those that date back a long time. It then models each component separately using a combination of linear and non-linear functions. Finally, Facebook Prophet combines these components to form the complete forecast model. Let&#8217;s take a closer look at these components and how Facebook Prophet handles them.</p>
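To make the additive idea concrete, here is a minimal, self-contained sketch (plain numpy, not Prophet itself; all values are invented for illustration) that builds a toy series as trend + seasonality + noise, the same structural form Prophet fits to real data:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(365)                        # one year of daily observations

trend = 50 + 0.05 * t                     # g(t): slowly rising linear trend
weekly = 2.0 * np.sin(2 * np.pi * t / 7)  # s(t): weekly seasonal component
noise = rng.normal(0, 0.5, size=t.size)   # irregular remainder

y = trend + weekly + noise                # additive model: y(t) = g(t) + s(t) + noise

# Because the model is additive, subtracting one component recovers the others
detrended = y - trend
print(detrended[:7].round(1))
```

Prophet estimates each of these components from the data; the point here is only that the final forecast is their sum.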



<h4 class="wp-block-heading">A) Dealing with Trends</h4>



<p>Time series often have a trendline. More often than not, however, a series does not follow a single trend but consists of several trend segments separated by breakpoints. Facebook Prophet handles such trends in several ways. First, the model tries to identify the breakpoints (changepoints) in the time series that separate periods with different trendlines. Facebook Prophet then uses these inflection points between periods to fit the model to the data and create the forecast. In addition, trends do not have to be linear: Prophet also supports saturating (logistic) growth curves. All of this happens automatically, but it is also possible to specify changepoints manually.</p>
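The breakpoint idea can be illustrated without Prophet: a piecewise-linear trend is simply a linear trend whose slope changes at known changepoints. A minimal numpy sketch (the changepoint location, slope, and offset are made-up values for illustration):

```python
import numpy as np

t = np.arange(100, dtype=float)
changepoint = 60                 # hypothetical breakpoint separating two trend regimes

k, m = 0.5, 10.0                 # base slope and offset
delta = -0.8                     # slope adjustment that takes effect after the changepoint

# Piecewise-linear trend: slope k before the changepoint, k + delta after it
trend = m + k * t + delta * np.maximum(t - changepoint, 0)

slope_before = trend[50] - trend[49]   # 0.5  (= k)
slope_after = trend[90] - trend[89]    # -0.3 (= k + delta)
print(slope_before, slope_after)
```

Prophet fits many such slope adjustments at candidate changepoints and regularizes them, so most end up near zero and only genuine trend breaks remain.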



<h4 class="wp-block-heading">B) Seasonality</h4>



<p>Facebook Prophet works very well when the data shows a strong seasonal pattern. It uses Fourier series (sums of sine and cosine terms at different frequencies) to account for daily, weekly, and yearly seasonality. Facebook Prophet is flexible about the type of data you have and allows you to adjust the seasonal components. By default, Facebook Prophet assumes daily data with weekly and yearly seasonal effects. If your data differs from this standard, for example, weekly data with monthly seasonality, you need to adjust the seasonal terms accordingly.</p>
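Under the hood, a seasonal component is a weighted sum of sine and cosine harmonics of the seasonal period. The sketch below (plain numpy, mirroring the idea rather than calling Prophet's internals) builds such Fourier features for a yearly seasonality with three harmonics:

```python
import numpy as np

def fourier_features(t, period, order):
    """Return one sin and one cos column for each of `order` harmonics of `period`."""
    cols = []
    for k in range(1, order + 1):
        cols.append(np.sin(2 * np.pi * k * t / period))
        cols.append(np.cos(2 * np.pi * k * t / period))
    return np.column_stack(cols)

t = np.arange(730)                        # two years of daily timestamps
X = fourier_features(t, period=365.25, order=3)
print(X.shape)                            # 2 columns (sin, cos) per harmonic
```

A higher order lets the seasonal curve take more complex shapes at the risk of overfitting; a regression on these columns yields the smooth seasonal pattern.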



<h4 class="wp-block-heading">C) Holiday</h4>



<p>Every year, public holidays can lead to strong deviations in a time series; think, for example, of the extra load when more people visit the Facebook website on certain days. The Facebook Prophet model accounts for such special events by letting us specify binary indicators that mark whether a given day is a public holiday. If you have other non-holiday events that recur yearly, you can use this indicator for the same purpose. Usually, Facebook Prophet will automatically discount outliers in the data. But if an outlier occurs on a day flagged as a public holiday, Facebook Prophet will adjust its model accordingly. </p>
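Such a binary holiday indicator is easy to construct with pandas. In the sketch below, the dates are arbitrary examples and the column name is our own choice, not a Prophet requirement:

```python
import pandas as pd

# Daily timestamps spanning the turn of the year
df = pd.DataFrame({"ds": pd.date_range("2022-12-20", periods=14, freq="D")})

# Hypothetical list of public holidays falling inside the date range
holidays = pd.to_datetime(["2022-12-25", "2022-12-26", "2023-01-01"])

# Binary indicator: 1 if the day is a holiday, 0 otherwise
df["is_holiday"] = df["ds"].isin(holidays).astype(int)
print(df[df["is_holiday"] == 1])
```

Prophet accepts this kind of information as a dataframe of event dates, so deviations on those days are attributed to the event rather than treated as noise.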
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<h3 class="wp-block-heading">Hyperparameter Tuning and Customization</h3>



<p>Facebook Prophet fits its model parameters automatically and ships with sensible defaults for hyperparameters such as the flexibility of the trend and the strength of the seasonal components, so it often produces usable forecasts out of the box. Once the model is trained, it can be used to predict future values in the time series. However, users with strong domain knowledge may prefer to tweak these parameters themselves, and Facebook Prophet provides several functions for this purpose. It also includes a range of tools for model evaluation and diagnostics, as well as for visualizing the model and the input data.</p>



<p>Also: <a href="https://www.relataly.com/using-random-search-to-tune-the-hyperparameters-of-a-random-decision-forest-with-python/6875/" target="_blank" rel="noreferrer noopener">Using Random Search to Tune the Hyperparameters of a Random Decision Forest with Python</a> </p>



<h3 class="wp-block-heading">Application Domains</h3>



<p>Facebook Prophet is a powerful forecasting tool that has been specifically designed to make forecasting easy. As mentioned, Prophet is easy to use and can flexibly handle various forecasting problems. In addition, it requires very little preprocessing to generate accurate forecasts. As a result of these advantages, Facebook Prophet has been adopted by various application domains. Some possible application domains for Facebook Prophet include:</p>



<ul class="wp-block-list">
<li>Sales forecasting: Facebook Prophet can be used to predict future sales of a product or service, based on historical sales data. This can be useful for businesses to plan their inventory and staffing, and to make informed decisions about future investments and growth.</li>



<li>Financial forecasting: Facebook Prophet can be used to predict future stock prices, currency exchange rates, or other financial metrics. This can be useful for investors and financial analysts to make informed decisions about the market.</li>



<li>Traffic forecasting: Facebook Prophet can be used to predict future traffic on a website or mobile app based on historical data. This can be useful for businesses to plan for capacity and optimize their servers and infrastructure.</li>



<li>Energy consumption forecasting: Facebook Prophet can be used to predict future energy consumption based on historical data. This can be useful for utilities and energy companies to plan for demand and optimize their generation and distribution.</li>
</ul>



<h2 class="wp-block-heading">When to Use Facebook Prophet?</h2>



<p>Although Facebook Prophet is applicable in any domain where time series data is available, it is most effective when certain conditions are met. These include univariate time series data with prominent seasonal effects and an extensive historical record spanning multiple seasons. Facebook Prophet is especially beneficial when dealing with large quantities of historical data that require efficient analysis and quick, accurate predictions of future trends.</p>



<p>Also: <a href="https://www.relataly.com/multi-step-time-series-forecasting-a-step-by-step-guide/275/" target="_blank" rel="noreferrer noopener">Rolling Time Series Forecasting: Creating a Multi-Step Prediction</a></p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<div style="height:29px" aria-hidden="true" class="wp-block-spacer"></div>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<h2 class="wp-block-heading" id="h-using-facebook-prophet-to-forecast-the-coca-cola-stock-price-in-python">Using Facebook Prophet to Forecast the Coca-Cola Stock Price in Python</h2>



<p>In this hands-on tutorial, we&#8217;ll use Facebook Prophet and Python to create a forecast for the Coca-Cola stock price. We have chosen Coca-Cola as an example because the Coca-Cola share is known to be a cyclical stock. As such, its chart reflects a seasonal pattern, different periods, and varying trend lines. We train our model on historical price data and then predict the price half a year in advance. In addition, we will discuss how we could fine-tune our model to improve the accuracy of the predictions further. This involves the following steps:</p>



<ol class="wp-block-list">
<li>Collect historical stock data for Coca-Cola and familiarize ourselves with the data.</li>



<li>Use Facebook Prophet to fit a model to the data.</li>



<li>Use the model to make predictions about the future stock price of Coca-Cola.</li>



<li>Visualize model components and predictions.</li>



<li>Manually adjust the model to improve the model fit.</li>
</ol>



<p>By following these steps, we will try to gain insights into the future performance of Coca-Cola stock. Let&#8217;s get started!</p>



<p>As always, you can find the code of this tutorial on the GitHub repository.</p>



<div class="wp-block-kadence-advancedbtn kb-buttons-wrap kb-btns_883036-a5"><a class="kb-button kt-button button kb-btn_c85f7c-32 kt-btn-size-standard kt-btn-width-type-full kb-btn-global-inherit  kt-btn-has-text-true kt-btn-has-svg-true  wp-block-button__link wp-block-kadence-singlebtn" href="https://github.com/flo7up/relataly-public-python-tutorials/blob/master/01%20Time%20Series%20Forecasting%20%26%20Regression/011%20Time%20Series%20Forecasting%20using%20Facebooks&#039;%20Prophet.ipynb" target="_blank" rel="noreferrer noopener"><span class="kb-svg-icon-wrap kb-svg-icon-fe_eye kt-btn-icon-side-left"><svg viewBox="0 0 24 24"  fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"  aria-hidden="true"><path d="M1 12s4-8 11-8 11 8 11 8-4 8-11 8-11-8-11-8z"/><circle cx="12" cy="12" r="3"/></svg></span><span class="kt-btn-inner-text">View on GitHub </span></a>

<a class="kb-button kt-button button kb-btn_db3037-b2 kt-btn-size-standard kt-btn-width-type-full kb-btn-global-inherit  kt-btn-has-text-true kt-btn-has-svg-true  wp-block-button__link wp-block-kadence-singlebtn" href="https://github.com/flo7up/relataly-public-python-tutorials" target="_blank" rel="noreferrer noopener"><span class="kb-svg-icon-wrap kb-svg-icon-fa_github kt-btn-icon-side-left"><svg viewBox="0 0 496 512"  fill="currentColor" xmlns="http://www.w3.org/2000/svg"  aria-hidden="true"><path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"/></svg></span><span class="kt-btn-inner-text">Relataly GitHub Repo </span></a></div>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<h3 class="wp-block-heading" id="h-prerequisites">Prerequisites</h3>



<p>Before you proceed, ensure that you have set up your&nbsp;Python&nbsp;environment (3.8 or higher) and the required packages. If you don’t have an environment, consider following&nbsp;<a href="https://www.relataly.com/anaconda-python-environment-machine-learning/1663/" target="_blank" rel="noreferrer noopener">this tutorial</a>&nbsp;to set up the&nbsp;<a href="https://www.anaconda.com/products/individual" target="_blank" rel="noreferrer noopener">Anaconda environment</a>. </p>



<p>Also, make sure you install all required Python packages. We will be working with the following standard Python packages:&nbsp;</p>



<ul class="wp-block-list">
<li>pandas</li>



<li>seaborn</li>



<li>matplotlib</li>
</ul>



<p>In addition, we will use the Facebook Prophet library that goes by the library name &#8220;prophet.&#8221; You can install these packages using the following commands:</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">pip install &lt;package name&gt;
conda install &lt;package name&gt; (if you are using the anaconda packet manager)</pre></div>



<h3 class="wp-block-heading">Step #1 Loading Packages and API Key</h3>



<p>Let&#8217;s begin by loading the required Python packages and historical price quotes for the Coca-Cola stock. We will obtain the data from the Yahoo Finance API. Note that the API returns several columns of data, including opening, high, low, and closing prices. We will only use the closing price. </p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Tested with Python 3.8.8, Matplotlib 3.5, Seaborn 0.11.1, numpy 1.19.5, plotly 4.1.1, cufflinks 0.17.3, prophet 1.1.1, CmdStan 2.31.0
import pandas as pd 
import matplotlib.pyplot as plt 
import numpy as np 
from math import log, exp 
from datetime import date, timedelta, datetime
import seaborn as sns
sns.set_style('white', {'axes.spines.right': False, 'axes.spines.top': False})
from scipy.stats import norm
from prophet import Prophet
from prophet.plot import add_changepoints_to_plot
import cmdstanpy
cmdstanpy.install_cmdstan() # one-time CmdStan setup; add compiler=True on Windows to also install a C++ toolchain
# Setting the timeframe for the data extraction
end_date =  date.today().strftime(&quot;%Y-%m-%d&quot;)
start_date = '2010-01-01'
# Getting quotes
stockname = 'Coca Cola'
symbol = 'KO'
# You can either use webreader or yfinance to load the data from yahoo finance
# import pandas_datareader as webreader
# df = webreader.DataReader(symbol, start=start_date, end=end_date, data_source=&quot;yahoo&quot;)
import yfinance as yf #Alternative package if webreader does not work: pip install yfinance
df = yf.download(symbol, start=start_date, end=end_date)
# Quick overview of dataset
print(df.head())</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">[*********************100%***********************]  1 of 1 completed
                 Open       High        Low      Close  Adj Close    Volume
Date                                                                       
2010-01-04  28.580000  28.610001  28.450001  28.520000  19.081614  13870400
2010-01-05  28.424999  28.495001  28.070000  28.174999  18.850786  23172400
2010-01-06  28.174999  28.219999  27.990000  28.165001  18.844103  19264600
2010-01-07  28.165001  28.184999  27.875000  28.094999  18.797268  13234600
2010-01-08  27.730000  27.820000  27.375000  27.575001  18.449350  28712400</pre></div>



<p>Once we have downloaded the data, we create a line plot of the closing price to familiarize ourselves with the time series data. Note that Facebook Prophet works on a single input signal only (univariate data). This input will be the closing price. For illustration purposes, we add a moving average to the chart. While the moving average makes it easier to spot trends and seasonal patterns, it will not be used to fit the model. </p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Visualize the original time series
rolling_window=25
y_a_add_ma = df['Close'].rolling(window=rolling_window).mean() 
fig, ax = plt.subplots(figsize=(20,5))
sns.lineplot(data=df, x=df.index, y='Close', color='skyblue', linewidth=0.5, label='Close')
sns.lineplot(data=df, x=df.index, y=y_a_add_ma, 
    linewidth=1.0, color='royalblue', linestyle='--', label=f'{rolling_window}-Day MA')</pre></div>



<figure class="wp-block-image size-large"><img decoding="async" width="1024" height="284" data-attachment-id="10876" data-permalink="https://www.relataly.com/time-series-forecasting-using-facebook-prophet-in-python/10351/image-10-15/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/12/image-10.png" data-orig-size="1614,448" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-10" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/12/image-10.png" src="https://www.relataly.com/wp-content/uploads/2022/12/image-10-1024x284.png" alt="lineplot with historical price quotes of the Coca-cola stock since 2010" class="wp-image-10876" srcset="https://www.relataly.com/wp-content/uploads/2022/12/image-10.png 1024w, https://www.relataly.com/wp-content/uploads/2022/12/image-10.png 300w, https://www.relataly.com/wp-content/uploads/2022/12/image-10.png 768w, https://www.relataly.com/wp-content/uploads/2022/12/image-10.png 1536w, https://www.relataly.com/wp-content/uploads/2022/12/image-10.png 1614w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<p>The chart shows a long-term upward trend interrupted by phases of downturns. In addition, between 2010 and 2018, we can see some cyclical movements. At some points, we can spot clear breakpoints, for example, in 2019 and mid-2020. </p>



<h3 class="wp-block-heading"><strong><strong><strong>Step #2 Preparing the Data</strong></strong></strong></h3>



<p>Next, we prepare our data for model training. Prophet has a strict requirement on how the input columns must be named. To use Facebook Prophet, your data needs to be a dataframe with two columns, one holding the timestamps and one the values. In addition, the column names need to adhere to the following naming convention:</p>



<ul class="wp-block-list">
<li><strong>ds </strong>for the timestamp</li>



<li><strong>y </strong>for the metric column, which in our case is the closing price</li>
</ul>



<p>So before we proceed, we must rename the columns in our dataframe. In addition, we will remove the index and drop NA values. </p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}">df_x = df[['Close']].copy()
df_x['ds'] = df.index.copy()
df_x.rename(columns={'Close': 'y'}, inplace=True)
df_x.reset_index(inplace=True, drop=True)
df_x.dropna(inplace=True)
df_x.tail(9)</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">		y			ds
3257	63.139999	2022-12-09
3258	63.970001	2022-12-12
3259	63.990002	2022-12-13</pre></div>



<p>Now we have a simple dataframe with ds and y as the only variables.</p>



<h3 class="wp-block-heading" id="h-step-3-model-fitting-and-forecasting"><strong>Step #3 Model Fitting and Forecasting</strong></h3>



<p>Next, let&#8217;s fit our forecasting model to the time series data. Afterward, we can make predictions about future values in the series. However, before we do this, we need to define our prediction interval. </p>



<h4 class="wp-block-heading">3.1 Setting the Prediction Interval</h4>



<p>The prediction interval is a measure of uncertainty in a forecast made with Facebook Prophet. It indicates the range within which the true value of the forecasted quantity is expected to fall a certain percentage of the time. For example, a 95% prediction interval means that the true value of the forecasted quantity is expected to fall within the given range 95% of the time. </p>



<p>In Facebook Prophet, the prediction interval is controlled by the interval_width parameter, which is set when instantiating the Prophet model. The default value for interval_width is 0.80, which means that the true value of the forecasted quantity is expected to fall within the prediction interval 80% of the time. We can adjust interval_width to widen or narrow the prediction interval as desired. In the example below, we use a prediction interval of 0.95.</p>
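<p>To build intuition for what interval_width means, consider a simplified model in which forecast errors are roughly Gaussian: the half-width of a central prediction interval then grows with the chosen coverage level. The snippet below is an illustrative sketch under that assumption (Prophet itself estimates its intervals via simulation, not this closed form):</p>

```python
from statistics import NormalDist

def interval_halfwidth(interval_width, sigma=1.0):
    """Half-width of a central Gaussian prediction interval
    with coverage `interval_width` and error scale `sigma`."""
    z = NormalDist().inv_cdf(0.5 + interval_width / 2)
    return z * sigma

print(round(interval_halfwidth(0.80), 2))  # 1.28 -> the default band
print(round(interval_halfwidth(0.95), 2))  # 1.96 -> a noticeably wider band
```

<p>Widening the coverage from 80% to 95% thus inflates the band by roughly 50% under these assumptions.</p>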



<h4 class="wp-block-heading">3.2 Fit the Model</h4>



<p>Next, let&#8217;s fit our model and generate a one-year forecast. First, we instantiate our model by calling Prophet(). Then we use model.fit(df) to fit this model to the historical price quotes of the Coca-Cola stock. Once we have done that, we call model.make_future_dataframe() to create an extended dataframe (future_df) that appends one year of additional dates to the historical records. These new rows are empty placeholders ready to be filled with the forecast. Finally, we pass this dataframe to model.predict(), and Facebook Prophet fills the placeholder rows with the forecast values.  </p>



<p>For the sake of reusability, I have encapsulated the entire process into a wrapper function. This will allow us to run quick experiments with different parameter values.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># This function fits the prophet model to the input data and generates a forecast
def fit_and_forecast(df, periods, interval_width, changepoint_range=0.8):
    # Instantiate the model with the uncertainty interval and changepoint range
    model = Prophet(interval_width=interval_width, changepoint_range=changepoint_range)
    # Fit the model
    model.fit(df)
    # Create a dataframe with a given number of dates
    future_df = model.make_future_dataframe(periods=periods)
    # Generate a forecast for the given dates
    forecast_df = model.predict(future_df)
    #print(forecast_df.head())
    return forecast_df, model, future_df
# Forecast for 365 days with full data
forecast_df, model, future_df = fit_and_forecast(df_x, 365, 0.95)
print(forecast_df.columns)
forecast_df[['yhat_lower', 'yhat_upper', 'yhat']].head(5)</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">Index(['ds', 'trend', 'yhat_lower', 'yhat_upper', 'trend_lower', 'trend_upper',
       'additive_terms', 'additive_terms_lower', 'additive_terms_upper',
       'weekly', 'weekly_lower', 'weekly_upper', 'yearly', 'yearly_lower',
       'yearly_upper', 'multiplicative_terms', 'multiplicative_terms_lower',
       'multiplicative_terms_upper', 'yhat'],
      dtype='object')
	yhat_lower	yhat_upper	yhat
0	24.468273	28.944286	26.691615
1	24.496074	29.146425	26.706924
2	24.513424	28.829159	26.682213
3	24.358048	28.767209	26.667476
4	24.487963	28.839966	26.666242</pre></div>



<p>Voila, we have generated a one-year forecast. </p>
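<p>As a quick sanity check, you can measure how often the actual closing prices fall inside the predicted interval over the historical period. The helper below is a minimal sketch in plain Python; in our case, you would feed it the y column of df_x together with yhat_lower and yhat_upper from forecast_df, aligned on ds:</p>

```python
def empirical_coverage(y_true, yhat_lower, yhat_upper):
    """Fraction of actual values that fall inside [yhat_lower, yhat_upper]."""
    hits = sum(lo <= y <= hi for y, lo, hi in zip(y_true, yhat_lower, yhat_upper))
    return hits / len(y_true)

# Toy example: two of the three actuals fall inside their intervals
print(empirical_coverage([26.7, 63.9, 30.0], [24.4, 62.0, 31.0], [28.9, 65.0, 33.0]))
```

<p>If the empirical coverage is far below the configured interval_width, the model is overconfident on this series.</p>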



<h3 class="wp-block-heading">Step #4 Analyzing the Forecast</h3>



<p>Next, let&#8217;s visualize our forecast and discuss what we see. The simplest way is to create the plot with Prophet&#8217;s built-in plot function.</p>



<p>Also: <a href="https://www.relataly.com/regression-error-metrics-python/923/" target="_blank" rel="noreferrer noopener">Measuring Regression Errors with Python</a> </p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}">model.plot(forecast_df, uncertainty=True)</pre></div>



<figure class="wp-block-image size-full is-resized"><img decoding="async" data-attachment-id="10880" data-permalink="https://www.relataly.com/time-series-forecasting-using-facebook-prophet-in-python/10351/image-35-4/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/12/image-35.png" data-orig-size="989,590" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-35" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/12/image-35.png" src="https://www.relataly.com/wp-content/uploads/2022/12/image-35.png" alt="Prophet forecast for the coca-cola stock" class="wp-image-10880" width="867" height="517" srcset="https://www.relataly.com/wp-content/uploads/2022/12/image-35.png 989w, https://www.relataly.com/wp-content/uploads/2022/12/image-35.png 300w, https://www.relataly.com/wp-content/uploads/2022/12/image-35.png 768w" sizes="(max-width: 867px) 100vw, 867px" /></figure>



<p>So what do we see? The forecast shows that our model does not simply predict a straight line and instead has generated a more sophisticated forecast that displays an upward cyclical trend with higher highs and higher lows. </p>



<ul class="wp-block-list">
<li>The black dots are the data points from the historical data to which we have fit our model. </li>



<li>The dark blue line is the most likely path. </li>



<li>The light blue lines are the upper and lower boundaries of the prediction interval. We have set the prediction interval to 0.95, which means there is a 95% probability that the actual values will fall within this range. </li>



<li>Overall, the model seems confident that the price of Coca-Cola stock will rise within the next year (not financial advice). However, as we will see later, the forecast depends on where the model places its changepoints.</li>
</ul>



<p>In case you want to create a custom plot, you can use the function below. </p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Visualize the Forecast
def visualize_the_forecast(df_f, df_o):
    rolling_window = 20
    yhat_mean = df_f['yhat'].rolling(window=rolling_window).mean() 
    # Optionally thin out the ground truth here; we plot the full series
    df_lim = df_o
    # Print the Forecast
    fig, ax = plt.subplots(figsize=[20,7])
    sns.lineplot(data=df_f, x=df_f.ds, y=yhat_mean, ax=ax, label='predicted path', color='blue')
    sns.lineplot(data=df_lim, x=df_lim.ds, y='y', ax=ax, label='ground_truth', color='orange')
    #sns.lineplot(data=df_f, x=df_f.ds, y='yhat_lower', ax=ax, label='yhat_lower', color='skyblue', linewidth=1.0)
    #sns.lineplot(data=df_f, x=df_f.ds, y='yhat_upper', ax=ax, label='yhat_upper', color='coral', linewidth=1.0)
    plt.fill_between(df_f.ds, df_f.yhat_lower, df_f.yhat_upper, color='lightgreen')
    plt.legend(framealpha=0)
    ax.set(ylabel=stockname + &quot; stock price&quot;)
    ax.set(xlabel=None)
visualize_the_forecast(forecast_df, df_x)</pre></div>



<figure class="wp-block-image size-large"><img decoding="async" width="1024" height="369" data-attachment-id="10879" data-permalink="https://www.relataly.com/time-series-forecasting-using-facebook-prophet-in-python/10351/image-11-12/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/12/image-11.png" data-orig-size="1614,582" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-11" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/12/image-11.png" src="https://www.relataly.com/wp-content/uploads/2022/12/image-11-1024x369.png" alt="time series forecast generated with Facebook prophet for the coca cola stock: ground truth and predicted path" class="wp-image-10879" srcset="https://www.relataly.com/wp-content/uploads/2022/12/image-11.png 1024w, https://www.relataly.com/wp-content/uploads/2022/12/image-11.png 300w, https://www.relataly.com/wp-content/uploads/2022/12/image-11.png 768w, https://www.relataly.com/wp-content/uploads/2022/12/image-11.png 1536w, https://www.relataly.com/wp-content/uploads/2022/12/image-11.png 1614w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<p></p>



<h3 class="wp-block-heading"><strong>Step #5 Analyzing Model Components</strong></h3>



<p>We can gain a better understanding of different model components by using the plot_components function. This method creates a plot showing the trend, weekly and yearly seasonality, and any additional user-defined seasonalities of the forecast. This can be useful for understanding the underlying patterns in the data and for diagnosing potential issues with the model.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}">model.plot_components(forecast_df)</pre></div>



<figure class="wp-block-image size-full"><img decoding="async" width="897" height="890" data-attachment-id="11036" data-permalink="https://www.relataly.com/time-series-forecasting-using-facebook-prophet-in-python/10351/image-41-6/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/12/image-41.png" data-orig-size="897,890" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-41" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/12/image-41.png" src="https://www.relataly.com/wp-content/uploads/2022/12/image-41.png" alt="Illustration of the three components of our prophet model" class="wp-image-11036" srcset="https://www.relataly.com/wp-content/uploads/2022/12/image-41.png 897w, https://www.relataly.com/wp-content/uploads/2022/12/image-41.png 300w, https://www.relataly.com/wp-content/uploads/2022/12/image-41.png 140w, https://www.relataly.com/wp-content/uploads/2022/12/image-41.png 768w" sizes="(max-width: 897px) 100vw, 897px" /></figure>



<p>The first chart shows the trendlines that the model has fitted within different periods. The trendlines are separated by changepoints, which we will discuss in the next section. The second plot shows no price changes during the weekend, which is plausible, considering that the stock markets are closed then. The third chart is the most interesting, as it shows that the model has recognized a yearly seasonality with two peaks in April and August, as well as lows in March and October.</p>



<h3 class="wp-block-heading">Step #6 Adjusting the Changepoints of our Facebook Prophet Model</h3>



<p>Let&#8217;s take a closer look at the changepoints in our model. Changepoints are the points in time where the trend of the time series is expected to change, and Facebook Prophet&#8217;s algorithm automatically detects these points and adapts the model accordingly. Changepoints are important to Facebook Prophet because they allow the model to capture gradual changes or shifts in the data. By identifying and incorporating changepoints into the forecasting model, Facebook Prophet can make more accurate predictions. Changepoints can also help to identify potential outliers in the data.</p>
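<p>Under the hood, Prophet&#8217;s trend is piecewise linear: it has a base slope, and each changepoint contributes a slope adjustment from that point onward, with the offset corrected so the line stays continuous. The following sketch mimics that formula with hypothetical numbers (the names k, m, and delta follow the Prophet paper, not the library API):</p>

```python
def piecewise_trend(t, k, m, changepoints, deltas):
    """Piecewise-linear trend: base slope k, offset m, and a slope
    adjustment delta applied at each changepoint, kept continuous."""
    slope = k + sum(d for cp, d in zip(changepoints, deltas) if t >= cp)
    offset = m - sum(cp * d for cp, d in zip(changepoints, deltas) if t >= cp)
    return slope * t + offset

# Slope 1.0 until t=5, then the trend flattens to slope 0.5
print(piecewise_trend(4, 1.0, 0.0, [5], [-0.5]))  # 4.0
print(piecewise_trend(8, 1.0, 0.0, [5], [-0.5]))  # 6.5
```

<p>This is why changepoint placement matters so much: the slope active at the end of the history is the one Prophet extrapolates into the future.</p>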



<h4 class="wp-block-heading">6.1 Checking Current Changepoints</h4>



<p>We can illustrate the changepoints in our model with the add_changepoints_to_plot method. The method adds vertical lines to a plot to indicate the locations of the changepoints in the data. By plotting the changepoints on a graph, we can visually identify when these changes in trend occur and potentially diagnose any issues with our model.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Printing the ChangePoints of our Model
forecast_df, model, future_df = fit_and_forecast(df_x, 365, 1.0)
axislist = add_changepoints_to_plot(model.plot(forecast_df).gca(), model, forecast_df)</pre></div>



<figure class="wp-block-image size-full"><img decoding="async" width="989" height="589" data-attachment-id="11038" data-permalink="https://www.relataly.com/time-series-forecasting-using-facebook-prophet-in-python/10351/image-42-5/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/12/image-42.png" data-orig-size="989,589" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-42" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/12/image-42.png" src="https://www.relataly.com/wp-content/uploads/2022/12/image-42.png" alt="Changepoints in a chart showing a Prophet forecast for the coca-cola stock. Changepoint_range = 0.8" class="wp-image-11038" srcset="https://www.relataly.com/wp-content/uploads/2022/12/image-42.png 989w, https://www.relataly.com/wp-content/uploads/2022/12/image-42.png 300w, https://www.relataly.com/wp-content/uploads/2022/12/image-42.png 768w" sizes="(max-width: 989px) 100vw, 989px" /></figure>



<p>The chart above shows that our model has identified several changepoints in the historical data. However, it has only searched for changepoints within the first 80% of the time series. As a result, the algorithm hasn&#8217;t identified any changepoints in the most recent years after 2020. We can adjust this behavior with the changepoint_range parameter (default = 0.8). This is what we will do in the next section. </p>



<h4 class="wp-block-heading">6.2 Adjusting Changepoints</h4>



<p>We can adjust the range within which Facebook Prophet looks for changepoints with the &#8220;changepoint_range&#8221; parameter. It is specified as a fraction of the total duration of the time series. For example, if changepoint_range is set to 0.8 and the time series spans 10 years, the algorithm will look for changepoints within the first 8 years of the series.</p>



<p>By default, changepoint_range is set to 0.8, which means that the algorithm will look for changepoints within the first 80% of the time series. We can adjust this value depending on the characteristics of our data and our desired level of flexibility in the model.</p>



<p>Increasing the value of changepoint_range will allow the algorithm to identify more changepoints and potentially improve the fit of the model, but it may also increase the risk of overfitting. Conversely, decreasing the value of changepoint_range will reduce the number of changepoints detected and may improve the model&#8217;s ability to generalize to new data, but it may also reduce the accuracy of the forecast.</p>



<p>Let&#8217;s fit our model again, but this time we let Facebook Prophet search for changepoints within the entire time series (changepoint_range=1.0).</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Adjusting ChangePoints of our Model
forecast_df, model, future_df = fit_and_forecast(df_x, 365, 1.0, 1.0)
axislist = add_changepoints_to_plot(model.plot(forecast_df).gca(), model, forecast_df)</pre></div>



<figure class="wp-block-image size-full"><img decoding="async" width="989" height="590" data-attachment-id="11043" data-permalink="https://www.relataly.com/time-series-forecasting-using-facebook-prophet-in-python/10351/image-43-5/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/12/image-43.png" data-orig-size="989,590" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-43" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/12/image-43.png" src="https://www.relataly.com/wp-content/uploads/2022/12/image-43.png" alt="Changepoints in a chart showing a Prophet forecast for the coca-cola stock. Changepoint_range = 1.0" class="wp-image-11043" srcset="https://www.relataly.com/wp-content/uploads/2022/12/image-43.png 989w, https://www.relataly.com/wp-content/uploads/2022/12/image-43.png 300w, https://www.relataly.com/wp-content/uploads/2022/12/image-43.png 768w" sizes="(max-width: 989px) 100vw, 989px" /></figure>



<p>The plot above shows that Facebook Prophet has now identified several additional breakpoints in the time series. As a result, the forecast has become rather pessimistic, as Facebook Prophet gave more weight to recent changes.</p>



<p>Finally, it is worth mentioning that it is possible to set changepoints for specific dates manually. To do this, pass a list or series of timestamps to the changepoints parameter when instantiating the model, e.g., Prophet(changepoints=[...]). Automatic changepoint detection is then skipped. </p>
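<p>A minimal sketch of how that could look for our series (the dates are hypothetical examples, and the Prophet call is shown commented out since it mirrors the fitting code above):</p>

```python
import pandas as pd

# Dates where we believe the trend shifted, e.g., around the 2020 market crash
manual_changepoints = pd.to_datetime(["2019-01-02", "2020-03-16"])

# With an explicit list, Prophet skips automatic changepoint detection:
# model = Prophet(changepoints=manual_changepoints)
# model.fit(df_x)
print(list(manual_changepoints.strftime("%Y-%m-%d")))
```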



<h2 class="wp-block-heading" id="h-summary">Summary</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>Get ready to dive into the world of stock market prediction with Facebook Prophet! In this article, we&#8217;ll show you how to leverage the power of this amazing tool to forecast time series data, using Coca-Cola&#8217;s stock as an example. We&#8217;ll guide you through the process of fitting a curve to univariate time series data and fine-tuning the initial breakpoints and trendlines to enhance model performance. With Facebook Prophet&#8217;s automatic trend identification algorithm, you&#8217;ll be able to easily adapt to changes in the data over time.</p>



<p>Also: <a href="https://www.relataly.com/feature-engineering-for-multivariate-time-series-models-with-python/1813/" target="_blank" rel="noreferrer noopener">Mastering Multivariate Stock Market Prediction with Python</a> </p>



<p>As a data scientist, you&#8217;ll appreciate how easy Facebook Prophet is to use and how strong a baseline it provides compared to more complex models. With its straightforward interface and solid accuracy, this tool is a valuable addition to your forecasting toolkit. And we&#8217;re always looking for feedback from our audience, so let us know what you think! We&#8217;re committed to improving our content to provide the best learning experience possible.</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<h2 class="wp-block-heading">Sources and Further Reading</h2>



<ol class="wp-block-list">
<li><a href="https://peerj.com/preprints/3190/" target="_blank" rel="noreferrer noopener">Taylor and Letham, 2017, Forecasting at scale</a></li>



<li><a href="https://facebook.github.io/prophet/docs/quick_start.html" target="_blank" rel="noreferrer noopener">github.io/prophet/docs/quick_start.html</a></li>



<li><a href="https://amzn.to/3EKidwE" target="_blank" rel="noreferrer noopener">David Forsyth (2019) Applied Machine Learning Springer</a></li>
</ol>



<p class="has-contrast-2-color has-base-3-background-color has-text-color has-background"><em>The links above to Amazon are affiliate links. By buying through these links, you support the Relataly.com blog and help to cover the hosting costs. Using the links does not affect the price.</em></p>



<p>Other Methods for Time Series Forecasting</p>



<ul class="wp-block-list">
<li><a href="https://www.relataly.com/univariate-stock-market-forecasting-using-a-recurrent-neural-network/122/" target="_blank" rel="noreferrer noopener">Univariate time series forecasting with Recurrent Neural Networks</a></li>



<li><a href="https://www.relataly.com/stock-price-prediction-multi-output-regression-using-neural-networks-in-python/5800/" target="_blank" rel="noreferrer noopener">Multivariate time series forecasting with Recurrent Neural Networks</a></li>



<li><a href="https://www.relataly.com/forecasting-beer-sales-with-arima-in-python/2884/" target="_blank" rel="noreferrer noopener">Forecasting sales data with ARIMA models</a></li>
</ul>
<p>The post <a href="https://www.relataly.com/time-series-forecasting-using-facebook-prophet-in-python/10351/">Univariate Stock Market Forecasting using Facebook Prophet in Python</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.relataly.com/time-series-forecasting-using-facebook-prophet-in-python/10351/feed/</wfw:commentRss>
			<slash:comments>3</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">10351</post-id>	</item>
		<item>
		<title>Create a Personalized Movie Recommendation Engine using Content-based Filtering in Python</title>
		<link>https://www.relataly.com/content-based-movie-recommender-using-python/4294/</link>
					<comments>https://www.relataly.com/content-based-movie-recommender-using-python/4294/#respond</comments>
		
		<dc:creator><![CDATA[Florian Follonier]]></dc:creator>
		<pubDate>Mon, 25 Jul 2022 11:29:00 +0000</pubDate>
				<category><![CDATA[Content-based Filtering]]></category>
		<category><![CDATA[Correlation]]></category>
		<category><![CDATA[Cosine Similarity]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[nltk]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Recommender Systems]]></category>
		<category><![CDATA[Retail]]></category>
		<category><![CDATA[Scikit-Learn]]></category>
		<category><![CDATA[Seaborn]]></category>
		<category><![CDATA[AI in E-Commerce]]></category>
		<category><![CDATA[Intermediate Tutorials]]></category>
		<category><![CDATA[Supervised Learning]]></category>
		<guid isPermaLink="false">https://www.relataly.com/?p=4294</guid>

					<description><![CDATA[<p>Content-based recommender systems are a popular type of machine learning algorithm that recommends relevant articles based on what a user has previously consumed or liked. This approach aims to identify items with certain keywords, understand what the customer likes, and then identify other items that are similar to items the user has previously consumed or ... <a title="Create a Personalized Movie Recommendation Engine using Content-based Filtering in Python" class="read-more" href="https://www.relataly.com/content-based-movie-recommender-using-python/4294/" aria-label="Read more about Create a Personalized Movie Recommendation Engine using Content-based Filtering in Python">Read more</a></p>
<p>The post <a href="https://www.relataly.com/content-based-movie-recommender-using-python/4294/">Create a Personalized Movie Recommendation Engine using Content-based Filtering in Python</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>Content-based recommender systems are a popular type of machine learning algorithm that recommends relevant articles based on what a user has previously consumed or liked. This approach aims to identify items with certain keywords, understand what the customer likes, and then identify other items that are similar to items the user has previously consumed or rated. The recommendations are based on the similarity of the items, represented by similarity scores in a vector matrix. The attributes used to describe an item are called &#8220;content.&#8221; For example, in the case of movie recommendations, content could be the genre, actors, director, year of release, etc. A well-designed content-based recommendation service will suggest movies of the same genre, actors, or keywords. This tutorial will implement a content-based recommendation service for movies using Python and Scikit-learn. </p>



<p>The rest of this tutorial proceeds as follows: After a brief introduction to content-based recommenders, we will work with a database that contains several thousand <a href="https://www.imdb.com/" target="_blank" rel="noreferrer noopener">IMDB </a>movie titles and create a feature model that uses actors, release year, and a short description for each movie. You will also learn how to deal with some challenges of building a content-based recommender. For example, we will look at how to engineer features from textual content and reduce the dimensionality of our model. Finally, we use our model to generate some sample predictions.</p>



<p>Note: Another popular type of recommender system that I have covered in a previous article is <a href="https://www.relataly.com/building-a-movie-recommender-using-collaborative-filtering/4376/" target="_blank" rel="noreferrer noopener">collaborative filtering</a>.</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-full"><img decoding="async" width="508" height="508" data-attachment-id="12673" data-permalink="https://www.relataly.com/signpost-decisions-recommender-system-machine-learning-python-tutorial-midjourney-2-min/" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/03/signpost-decisions-recommender-system-machine-learning-python-tutorial-midjourney-2-min.png" data-orig-size="508,508" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="signpost decisions recommender system machine learning python tutorial midjourney (2)-min" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/03/signpost-decisions-recommender-system-machine-learning-python-tutorial-midjourney-2-min.png" src="https://www.relataly.com/wp-content/uploads/2023/03/signpost-decisions-recommender-system-machine-learning-python-tutorial-midjourney-2-min.png" alt="Recommendation systems can ease decision-making. Image created with Midjourney." 
class="wp-image-12673" srcset="https://www.relataly.com/wp-content/uploads/2023/03/signpost-decisions-recommender-system-machine-learning-python-tutorial-midjourney-2-min.png 508w, https://www.relataly.com/wp-content/uploads/2023/03/signpost-decisions-recommender-system-machine-learning-python-tutorial-midjourney-2-min.png 300w, https://www.relataly.com/wp-content/uploads/2023/03/signpost-decisions-recommender-system-machine-learning-python-tutorial-midjourney-2-min.png 140w" sizes="(max-width: 508px) 100vw, 508px" /><figcaption class="wp-element-caption">Recommendation systems can ease decision-making. Image created with <a href="http://www.midjourney.com" target="_blank" rel="noreferrer noopener">Midjourney</a>.</figcaption></figure>
</div>
</div>
</div>
</div>



<h2 class="wp-block-heading" id="h-what-is-content-based-filtering">What is Content-Based Filtering?</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>The idea behind content-based recommenders is to generate recommendations based on a user&#8217;s preferences and tastes. These preferences are derived from past user choices, for example, the number of times a user has watched a movie, purchased an item, or clicked on a link. </p>



<p>Content-based filtering uses domain-specific item features to measure the similarity between items. Given the user preferences, the algorithm will recommend items similar to what the user has consumed or liked before. For movie recommendations, this content can be the genre, actors, release year, director, film length, or keywords used to describe the movies. This approach works particularly well for domains with a lot of textual metadata, such as movies and videos, books, or products.</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-kadence-image kb-image_efc597-b6 size-large"><img decoding="async" width="512" height="500" data-attachment-id="12668" data-permalink="https://www.relataly.com/polaroid-film-movie-recommender-machine-learning-python-tutorial-min/" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/03/polaroid-film-movie-recommender-machine-learning-python-tutorial-min.png" data-orig-size="518,506" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="polaroid film movie recommender machine learning python tutorial-min" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/03/polaroid-film-movie-recommender-machine-learning-python-tutorial-min.png" src="https://www.relataly.com/wp-content/uploads/2023/03/polaroid-film-movie-recommender-machine-learning-python-tutorial-min-512x500.png" alt="" class="kb-img wp-image-12668" srcset="https://www.relataly.com/wp-content/uploads/2023/03/polaroid-film-movie-recommender-machine-learning-python-tutorial-min.png 512w, https://www.relataly.com/wp-content/uploads/2023/03/polaroid-film-movie-recommender-machine-learning-python-tutorial-min.png 300w, https://www.relataly.com/wp-content/uploads/2023/03/polaroid-film-movie-recommender-machine-learning-python-tutorial-min.png 518w" sizes="(max-width: 512px) 100vw, 512px" /><figcaption>Content-based movie recommendations will suggest more of the same, for example, actors, genres, stories, and directors.<br></figcaption></figure>
</div>
</div>



<p></p>



<h3 class="wp-block-heading">Basic Steps to Building a Content-based Recommender System</h3>



<p>The approach to building a content-based recommender involves four essential steps:</p>



<ol class="wp-block-list">
<li>The first step is to create a so-called &#8216;bag of words&#8217; model from the input data, which is a list of words used to characterize the items. This step involves selecting content that is useful for describing and differentiating the items: the more precise the information, the better the recommendations will be. </li>



<li>The next step is to turn the bag of words into a feature vector. Different algorithms can be used for this step, for example, the TF-IDF vectorizer or the count vectorizer. The result is a vector matrix with items as records and features as columns. This step often also includes applying techniques for dimensionality reduction. </li>



<li>Content-based recommendation rests on measuring item similarity. Similarity scores are assigned through pairwise comparison. Here again, we can choose between different measures, e.g., the dot product or cosine similarity. </li>



<li>Once you have the similarity scores, you can return the most similar items by sorting the data by similarity scores. Given user preferences (single or multiple items a user consumed or liked), the algorithm will then recommend the most similar items. </li>
</ol>
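These four steps can be sketched end to end with scikit-learn. The toy catalog and its tags below are made up for illustration; the tutorial later applies the same idea to the real IMDB data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Step 1: a tiny, made-up catalog with a bag of words per item
items = {
    "Toy Story": "animation comedy family toys friendship",
    "Jumanji":   "adventure fantasy family boardgame jungle",
    "Heat":      "crime thriller heist bank robbery",
}

# Step 2: turn each bag of words into a feature vector
matrix = CountVectorizer().fit_transform(items.values())

# Step 3: pairwise similarity scores between all items
sim = cosine_similarity(matrix)

# Step 4: sort by similarity to recommend items for "Toy Story"
titles = list(items.keys())
ranked = sorted(zip(titles, sim[titles.index("Toy Story")]),
                key=lambda pair: pair[1], reverse=True)
print(ranked)
```

Here &#8220;Jumanji&#8221; ranks above &#8220;Heat&#8221; because it shares the word &#8220;family&#8221; with &#8220;Toy Story&#8221;, while &#8220;Heat&#8221; shares no words at all.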



<figure class="wp-block-image size-large is-resized"><img decoding="async" data-attachment-id="8937" data-permalink="https://www.relataly.com/content-based-movie-recommender-using-python/4294/image-11-5/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/07/image-11.png" data-orig-size="1911,1227" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-11" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/07/image-11.png" src="https://www.relataly.com/wp-content/uploads/2022/07/image-11-1024x657.png" alt="" class="wp-image-8937" width="1169" height="751" srcset="https://www.relataly.com/wp-content/uploads/2022/07/image-11.png 1024w, https://www.relataly.com/wp-content/uploads/2022/07/image-11.png 300w, https://www.relataly.com/wp-content/uploads/2022/07/image-11.png 768w, https://www.relataly.com/wp-content/uploads/2022/07/image-11.png 1536w, https://www.relataly.com/wp-content/uploads/2022/07/image-11.png 1911w" sizes="(max-width: 1169px) 100vw, 1169px" /><figcaption class="wp-element-caption">Approach to Building a Content-based Recommender System</figcaption></figure>



<h3 class="wp-block-heading">Similarity Scoring</h3>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>The quality of the content-based recommendations is significantly influenced by how well the algorithm succeeds in measuring the similarity of the items. There are different techniques to calculate similarity, including Cosine Similarity, Pearson Similarity, Dot Product, and Euclidean Distance. They have in common that they use numerical characteristics of the text to calculate the distance between text vectors in an n-dimensional vector space.</p>



<p>It is worth noting that these techniques only measure word-level similarity: the algorithms compare items word for word without considering the semantic meaning of the sentences. In some instances, this can lead to errors. For example, how similar are <em>&#8220;now that they were sitting on a bank, he noticed she stole his heart, and he was in love&#8221;</em> and <em>&#8220;They are gangsters who love to steal from a large bank&#8221;</em>? Judging by the words alone, the two sentences appear similar because their vocabulary overlaps considerably.</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>
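To see this limitation in numbers, we can score the two &#8220;bank&#8221; sentences from above with a plain word-count model; despite having unrelated meanings, they receive a clearly non-zero similarity.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

sentences = [
    "now that they were sitting on a bank, he noticed she stole his heart, and he was in love",
    "They are gangsters who love to steal from a large bank",
]

# word-level vectors know nothing about semantics
vectors = CountVectorizer().fit_transform(sentences)
score = cosine_similarity(vectors)[0, 1]
print(f"word-level cosine similarity: {score:.2f}")
```

The shared words &#8220;bank&#8221;, &#8220;love&#8221;, and &#8220;they&#8221; alone push the score to roughly 0.2, even though the sentences describe entirely different situations.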



<p></p>



<h3 class="wp-block-heading">Pros and Cons of Content-based Filtering</h3>



<p>Like most machine learning algorithms, content-based recommenders have their strengths and weaknesses.</p>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<p><strong>Advantages</strong></p>



<ul class="wp-block-list">
<li>Content-based filtering is good at capturing a user&#8217;s specific interests and will recommend more of the same (for example, genre, actors, directors, etc.). It will also recommend niche items if they match the user preferences, even if these items draw little attention.</li>



<li>Another advantage is that the model can generate recommendations for a specific user without the knowledge of other users. This is particularly helpful if you want to generate predictions for many users.</li>
</ul>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<p><strong>Disadvantages</strong></p>



<ul class="wp-block-list">
<li>On the other hand, there are also a couple of downsides. The feature representation of the items has to be created manually to a certain extent, and the prediction quality strongly depends on how thoroughly the items are described. Content-based filtering therefore requires a good deal of domain expertise.</li>



<li>In addition, recommendations are based on the user&#8217;s previous interests and are unlikely to go beyond them into areas (e.g., genres) that are still unknown to the user. Content-based models thus tend to develop a kind of tunnel vision, recommending more and more of the same.</li>
</ul>
</div>
</div>



<div style="height:54px" aria-hidden="true" class="wp-block-spacer"></div>



<h2 class="wp-block-heading" id="h-implementing-a-content-based-movie-recommender-in-python">Implementing a Content-based Movie Recommender in Python</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>In the following, we will implement a content-based movie recommender using Python and Scikit-learn, carrying out all the steps necessary to create a content-based recommender. The data comes from an IMDB dataset containing more than 40,000 films released between 1996 and 2018. Based on this data, we define the features we want to use for recommending movies, such as the genre, director, main actors, plot keywords, and other metadata associated with the movies. We then preprocess the data to extract these features and create a feature matrix. The feature matrix becomes the foundation for a similarity matrix that measures the similarity between items based on their feature vectors. Finally, we use the similarity matrix to generate recommendations for a given item. </p>



<p>By the end of this Python tutorial, you will have learned how to implement a content-based recommendation system for movies using Python and Scikit-learn. This knowledge can be applied to other types of recommendations, such as articles, products, or songs.</p>



<p>The code is available in the relataly GitHub repository. </p>



<div class="wp-block-kadence-advancedbtn kb-buttons-wrap kb-btns_c5cd86-dc"><a class="kb-button kt-button button kb-btn_a1ac8b-37 kt-btn-size-standard kt-btn-width-type-full kb-btn-global-inherit  kt-btn-has-text-true kt-btn-has-svg-true  wp-block-button__link wp-block-kadence-singlebtn" href="https://github.com/flo7up/relataly-public-python-tutorials/blob/master/05%20Recommender%20Systems/032%20Movie%20Recommender%20using%20Content-based%20Filtering.ipynb" target="_blank" rel="noreferrer noopener"><span class="kb-svg-icon-wrap kb-svg-icon-fe_eye kt-btn-icon-side-left"><svg viewBox="0 0 24 24"  fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"  aria-hidden="true"><path d="M1 12s4-8 11-8 11 8 11 8-4 8-11 8-11-8-11-8z"/><circle cx="12" cy="12" r="3"/></svg></span><span class="kt-btn-inner-text">View on GitHub </span></a>

<a class="kb-button kt-button button kb-btn_f7ca5b-85 kt-btn-size-standard kt-btn-width-type-full kb-btn-global-inherit  kt-btn-has-text-true kt-btn-has-svg-true  wp-block-button__link wp-block-kadence-singlebtn" href="https://github.com/flo7up/relataly-public-python-API-tutorials" target="_blank" rel="noreferrer noopener"><span class="kb-svg-icon-wrap kb-svg-icon-fa_github kt-btn-icon-side-left"><svg viewBox="0 0 496 512"  fill="currentColor" xmlns="http://www.w3.org/2000/svg"  aria-hidden="true"><path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"/></svg></span><span class="kt-btn-inner-text">Relataly GitHub Repo </span></a></div>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-full"><img decoding="async" src="https://www.relataly.com/wp-content/uploads/2023/01/DALL·E-2023-01-12-19.26.12-A-plush-robot-watching-a-movie-on-tv-min.png" alt="This Robot doesn't know what to watch on tv. Let's build a recommender system for him! Image generated using DALL-E 2 by OpenAI." class="wp-image-11993"/><figcaption class="wp-element-caption">This Robot doesn&#8217;t know what to watch on tv. Let&#8217;s build a recommender system for him! Image generated using <a href="https://openai.com/dall-e-2/" target="_blank" rel="noreferrer noopener">DALL-E 2 by OpenAI</a>.</figcaption></figure>
</div>
</div>



<h3 class="wp-block-heading" id="h-prerequisites">Prerequisites</h3>



<p>Before you start with the coding part, ensure you have set up your Python 3 environment and required packages. If you don&#8217;t have an environment, consider the&nbsp;<a href="https://www.anaconda.com/products/individual" target="_blank" rel="noreferrer noopener">Anaconda Python environment</a>. Follow <a href="https://www.relataly.com/category/data-science/setup-anaconda-environment/" target="_blank" rel="noreferrer noopener">this tutorial</a> to set it up.</p>



<p>Also, make sure you install all required packages. In this tutorial, we will be working with the following standard packages:&nbsp;</p>



<ul class="wp-block-list">
<li><em><a href="https://pandas.pydata.org/" target="_blank" rel="noreferrer noopener">pandas</a></em></li>



<li><em><a href="https://numpy.org/" target="_blank" rel="noreferrer noopener">NumPy</a></em></li>



<li><em><a href="https://matplotlib.org/" target="_blank" rel="noreferrer noopener">matplotlib</a></em></li>
</ul>



<p>In addition, we will be using <a href="https://seaborn.pydata.org/" target="_blank" rel="noreferrer noopener">Seaborn </a>for visualization and the natural language processing library <a href="https://www.nltk.org/" target="_blank" rel="noreferrer noopener">nltk</a>. </p>



<p>You can install these packages by using one of the following commands: </p>



<ul class="wp-block-list">
<li><em>pip install &lt;package name&gt;</em></li>



<li><em>conda install &lt;package name&gt;</em>&nbsp;(if you are using the anaconda packet manager)</li>
</ul>



<h3 class="wp-block-heading" id="h-about-the-imdb-movies-dataset">About the IMDB Movies Dataset</h3>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>We will train our movie recommender on the popular Movies Dataset (you can download it from <a href="https://grouplens.org/datasets/movielens/latest/" target="_blank" rel="noreferrer noopener">grouplens.org</a>). The MovieLens recommendation service collected the data from 610 users between 1996 and 2018. Unpack the data into the working folder of your project.</p>



<p>The full dataset contains metadata for over 45,000 movies and 26 million ratings from over 270,000 users. It includes the following files (source of the data description: <a href="https://www.kaggle.com/rounakbanik/the-movies-dataset" target="_blank" rel="noreferrer noopener">Kaggle.com</a>):</p>



<ul class="wp-block-list">
<li><strong>movies_metadata.csv:</strong>&nbsp;The main Movies Metadata file contains information on 45,000 movies featured in the Full MovieLens Dataset. Features include posters, backdrops, budget, revenue, release dates, languages, production countries, and companies.</li>



<li><strong>ratings_small.csv:</strong>&nbsp;A subset of 100,000 ratings from 700 users on 9,000 movies. Each line corresponds to a single movie rating on a 5-star scale with half-star increments (0.5 &#8211; 5.0 stars).</li>



<li><strong>keywords.csv:</strong>&nbsp;Contains the movie plot keywords for our MovieLens movies. Available in the form of a stringified JSON Object.</li>



<li><strong>credits.csv:</strong>&nbsp;Consists of Cast and Crew Information for all our films. Available in the form of a stringified JSON Object.</li>
</ul>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-full is-resized"><img decoding="async" data-attachment-id="7128" data-permalink="https://www.relataly.com/building-a-movie-recommender-using-collaborative-filtering/4376/mdb-movie-database/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/04/MDB-Movie-Database.png" data-orig-size="1024,537" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="recomender systems collaborative filtering imdb movies" data-image-description="&lt;p&gt;recomender systems collaborative filtering imdb movies&lt;/p&gt;
" data-image-caption="&lt;p&gt;recomender systems collaborative filtering imdb movies&lt;/p&gt;
" data-large-file="https://www.relataly.com/wp-content/uploads/2022/04/MDB-Movie-Database.png" src="https://www.relataly.com/wp-content/uploads/2022/04/MDB-Movie-Database.png" alt="MDB Movie Database Recommender Systems Collaborative Filtering " class="wp-image-7128" width="366" height="192" srcset="https://www.relataly.com/wp-content/uploads/2022/04/MDB-Movie-Database.png 1024w, https://www.relataly.com/wp-content/uploads/2022/04/MDB-Movie-Database.png 300w, https://www.relataly.com/wp-content/uploads/2022/04/MDB-Movie-Database.png 768w" sizes="(max-width: 366px) 100vw, 366px" /><figcaption class="wp-element-caption">IMDB Movie Database </figcaption></figure>
</div>
</div>



<p>Several other files are included that we won&#8217;t use, including ratings_small, links_small, and links.</p>



<p>You can download it <a href="https://grouplens.org/datasets/movielens/latest/" target="_blank" rel="noreferrer noopener">here</a> or from <a href="https://www.kaggle.com/rounakbanik/the-movies-dataset" target="_blank" rel="noreferrer noopener">Kaggle</a>.</p>



<h3 class="wp-block-heading" id="h-step-1-load-the-data">Step #1: Load the Data</h3>



<p>Our goal is to create a content-based recommender system for movie recommendations. In this case, the content is meta information on the movies, such as the genre, actors, and description.</p>



<p>We begin by making imports and loading the data from three files:</p>



<ul class="wp-block-list">
<li>movies_metadata.csv</li>



<li>credits.csv</li>



<li>keywords.csv</li>
</ul>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}">import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_style('white', { 'axes.spines.right': False, 'axes.spines.top': False})
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.decomposition import TruncatedSVD
from nltk.corpus import stopwords

# the IMDB movies data is available on Kaggle.com
# https://www.kaggle.com/datasets/rounakbanik/the-movies-dataset

# in case you have placed the files outside of your working directory, you need to specify the path
path = 'data/movie_recommendations/' 

# load the movie metadata
df_meta=pd.read_csv(path + 'movies_metadata.csv', low_memory=False, encoding='UTF-8') 

# some records have invalid ids, which is why we remove them
df_meta = df_meta.drop([19730, 29503, 35587])

# convert the id to type int and set id as index
df_meta = df_meta.set_index(df_meta['id'].str.strip().str.replace(',', '', regex=False).astype(int))
pd.set_option('display.max_colwidth', 20)
df_meta.head(2)</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">		adult	belongs_to_collection			budget		genres				homepage			id		imdb_id		original_language	original_title	overview			...	release_date	revenue		runtime	spoken_languages	status		tagline	title	video	vote_average	vote_count
id																					
862		False	{'id': 10194, 'n...				30000000	[{'id': 16, 'nam...	http://toystory....	862		tt0114709	en					Toy Story		Led by Woody, An...	...	1995-10-30		373554033.0	81.0	[{'iso_639_1': '...	Released	NaN	Toy Story	False	7.7	5415.0
8844	False	NaN								65000000	[{'id': 12, 'nam...	NaN					8844	tt0113497	en				Jumanji				When siblings Ju...	...	1995-12-15		262797249.0	104.0	[{'iso_639_1': '...	Released	Roll the dice an...	Jumanji	False	6.9	2413.0</pre></div>



<p>After we have loaded credits and keywords, we will combine the data into a single dataframe. Now we have various input fields available. However, we will only use keywords, cast, year of release, genres, and overview. If you like, you can enhance the data with additional inputs, for example, budget, running time, or film language. </p>



<p>Once we have gathered our data in a single dataframe, we print out the first rows to gain an overview of the data. </p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># load the movie credits
df_credits = pd.read_csv(path + 'credits.csv', encoding='UTF-8')
df_credits = df_credits.set_index('id')

# load the movie keywords
df_keywords=pd.read_csv(path + 'keywords.csv', low_memory=False, encoding='UTF-8') 
df_keywords = df_keywords.set_index('id')

# merge everything into a single dataframe 
df_k_c = df_keywords.merge(df_credits, left_index=True, right_on='id')
df = df_k_c.merge(df_meta[['release_date','genres','overview','title']], left_index=True, right_on='id')
df.head(3)</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">		keywords			cast				crew				release_date	genres				overview			title
id							
862		[{'id': 931, 'na...	[{'cast_id': 14,...	[{'credit_id': '...	1995-10-30		[{'id': 16, 'nam...	Led by Woody, An...	Toy Story
8844	[{'id': 10090, '...	[{'cast_id': 1, ...	[{'credit_id': '...	1995-12-15		[{'id': 12, 'nam...	When siblings Ju...	Jumanji
15602	[{'id': 1495, 'n...	[{'cast_id': 2, ...	[{'credit_id': '...	1995-12-22		[{'id': 10749, '...	A family wedding...	Grumpier Old Men</pre></div>



<div style="height:15px" aria-hidden="true" class="wp-block-spacer"></div>



<p>We can see cast, crew, and genres have a dictionary-like structure. To create a cosine similarity matrix, we need to extract the keywords from these columns and gather them in a single column. This is what we will do in the next step. </p>
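As a preview of that extraction: each cell holds a stringified list of dictionaries, which can be parsed safely with Python's ast.literal_eval. The sample string below is a made-up value that mimics the structure of the genres column.

```python
import ast

# a made-up cell value with the same structure as the 'genres' column
raw = "[{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}]"

# literal_eval parses Python literals without executing arbitrary code
genres = [entry['name'] for entry in ast.literal_eval(raw)]
print(genres)  # ['Animation', 'Comedy']
```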



<h3 class="wp-block-heading" id="h-step-2-feature-engineering-and-data-cleaning">Step #2: Feature Engineering and Data Cleaning</h3>



<p>A problem with modeling text is that machine learning algorithms have difficulty processing text directly. An essential step in creating content-based recommenders is bringing the text into a machine-readable form. This is what we call feature engineering. </p>



<h4 class="wp-block-heading">2.1 Creating a Bag-of-Words Model</h4>



<p>We begin the feature engineering by creating the bag of words. As mentioned, a bag of words is a list of the words that are relevant for describing and differentiating the items in a dataset, such as films. Creating a bag of words removes stopwords but preserves multiplicity, so that words can occur multiple times in the concatenated text. Later, each word can be used as a feature in calculating cosine similarities. </p>
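Stopword removal with preserved multiplicity can be sketched in a few lines. This example uses scikit-learn's built-in English stop-word list instead of nltk's (which requires a one-time nltk.download('stopwords')); the one-line description is invented.

```python
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

# an invented one-line movie description
text = "led by woody the toys live happily until the new toy arrives"

# drop stopwords but keep every remaining occurrence (multiplicity)
bag = [word for word in text.split() if word not in ENGLISH_STOP_WORDS]
print(bag)
```

Note that "toy" and "toys" survive as separate features; stemming or lemmatization would be needed to merge them.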



<p>The input for a bag of words does not necessarily come from a single column. We will use keywords, genres, cast, and overview and merge them into a new single column that we call tags. Make sure the features capture the nature of each text field: we keep first and last names together rather than splitting them, unlike the words from the overview column, which are treated individually. The result of this process is our bag of words.</p>



<p>In addition, we add the movie title and a new index (id), which will later ease working with the similarity matrix. Finally, we print the first rows of our feature dataframe. </p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># create an empty DataFrame
df_movies = pd.DataFrame()

# extract the keywords
df_movies['keywords'] = df['keywords'].apply(lambda x: [i['name'] for i in eval(x)])
df_movies['keywords'] = df_movies['keywords'].apply(lambda x: ' '.join([i.replace(&quot; &quot;, &quot;&quot;) for i in x]))

# extract the overview
df_movies['overview'] = df['overview'].fillna('')

# extract the release year 
df_movies['release_date'] = pd.to_datetime(df['release_date'], errors='coerce').apply(lambda x: str(x).split('-')[0] if x != np.nan else np.nan)

# extract the actors
df_movies['cast'] = df['cast'].apply(lambda x: [i['name'] for i in eval(x)])
df_movies['cast'] = df_movies['cast'].apply(lambda x: ' '.join([i.replace(&quot; &quot;, &quot;&quot;) for i in x]))

# extract genres
df_movies['genres'] = df['genres'].apply(lambda x: [i['name'] for i in eval(x)])
df_movies['genres'] = df_movies['genres'].apply(lambda x: ' '.join([i.replace(&quot; &quot;, &quot;&quot;) for i in x]))

# add the title
df_movies['title'] = df['title']

# merge fields into a tag field
df_movies['tags'] = df_movies['keywords'] + ' ' + df_movies['cast'] + ' ' + df_movies['genres'] + ' ' + df_movies['release_date']

# drop records with empty tags and duplicates
df_movies.drop(df_movies[df_movies['tags']==''].index, inplace=True)
df_movies.drop_duplicates(inplace=True)

# add a fresh index to the dataframe, which we will later use when referring to items in a vector matrix
df_movies['new_id'] = range(0, len(df_movies))

# Reduce the data to relevant columns
df_movies = df_movies[['new_id', 'title', 'tags']]

# display the data
pd.set_option('display.max_colwidth', 500)
pd.set_option('display.expand_frame_repr', False)
print(df_movies.shape)
df_movies.head(5)</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">		new_id	title							tags
id			
862		0		Toy Story						jealousy toy boy friendship friends rivalry boynextdoor newtoy toycomestolife TomHanks TimAllen DonRickles JimVarney WallaceShawn JohnRatzenberger AnniePotts JohnMorris ErikvonDetten LaurieMetcalf R.LeeErmey SarahFreeman PennJillette Animation Comedy Family 1995
8844	1		Jumanji							boardgame disappearance basedonchildren'sbook newhome recluse giantinsect RobinWilliams JonathanHyde KirstenDunst BradleyPierce BonnieHunt BebeNeuwirth DavidAlanGrier PatriciaClarkson AdamHann-Byrd LauraBellBundy JamesHandy GillianBarber BrandonObray CyrusThiedeke GaryJosephThorup LeonardZola LloydBerry MalcolmStewart AnnabelKershaw DarrylHenriques RobynDriscoll PeterBryant SarahGilson FloricaVlad JuneLion BrendaLockmuller Adventure Fantasy Family 1995
15602	2		Grumpier Old Men				fishing bestfriend duringcreditsstinger oldmen WalterMatthau JackLemmon Ann-Margret SophiaLoren DarylHannah BurgessMeredith KevinPollak Romance Comedy 1995
31357	3		Waiting to Exhale				basedonnovel interracialrelationship singlemother divorce chickflick WhitneyHouston AngelaBassett LorettaDevine LelaRochon GregoryHines DennisHaysbert MichaelBeach MykeltiWilliamson LamontJohnson WesleySnipes Comedy Drama Romance 1995
11862	4		Father of the Bride Part II		baby midlifecrisis confidence aging daughter motherdaughterrelationship pregnancy contraception gynecologist SteveMartin DianeKeaton MartinShort KimberlyWilliams-Paisley GeorgeNewbern KieranCulkin BDWong PeterMichaelGoetz KateMcGregor-Stewart JaneAdams EugeneLevy LoriAlan Comedy 1995</pre></div>



<h4 class="wp-block-heading" id="h-2-2-visualizing-text-length">2.2 Visualizing Text Length</h4>



<p>We can use a histogram to illustrate the length of each movie&#8217;s tag text. This gives us an idea of how detailed the movie descriptions are. Items with short descriptions have, in principle, a lower chance of being recommended later. Recommenders produce better results when the description lengths are reasonably balanced.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># add the tag length to the movies df
df_movies['tag_len'] = df_movies['tags'].apply(lambda x: len(x))

# illustrate the tag text length
sns.displot(data=df_movies.dropna(), bins=list(range(0, 2000, 25)), height=5, x='tag_len', aspect=3, kde=True)
plt.title('Distribution of tag text length')
plt.xlim([0, 2500])</pre></div>



<figure class="wp-block-image size-large"><img decoding="async" width="1024" height="346" data-attachment-id="8866" data-permalink="https://www.relataly.com/content-based-movie-recommender-using-python/4294/tags-text-length-visualization-python-content-based-recommender-system-2/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/07/tags-text-length-visualization-python-content-based-recommender-system-2.png" data-orig-size="1083,366" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="tags-text-length-visualization-python-content-based-recommender-system-2" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/07/tags-text-length-visualization-python-content-based-recommender-system-2.png" src="https://www.relataly.com/wp-content/uploads/2022/07/tags-text-length-visualization-python-content-based-recommender-system-2-1024x346.png" alt="content-based recommender system - illustration of the distribution of word length in our bag-of-word model" class="wp-image-8866" srcset="https://www.relataly.com/wp-content/uploads/2022/07/tags-text-length-visualization-python-content-based-recommender-system-2.png 1024w, https://www.relataly.com/wp-content/uploads/2022/07/tags-text-length-visualization-python-content-based-recommender-system-2.png 300w, https://www.relataly.com/wp-content/uploads/2022/07/tags-text-length-visualization-python-content-based-recommender-system-2.png 768w, 
https://www.relataly.com/wp-content/uploads/2022/07/tags-text-length-visualization-python-content-based-recommender-system-2.png 1083w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<h3 class="wp-block-heading" id="h-step-3-vectorization-using-tfidfvectorizer">Step #3: Vectorization using TfidfVectorizer</h3>



<p>The next step is to create a vector matrix from the Bag of Words model. Each column from the matrix represents a word feature. This step is the basis for determining the similarity of the movies afterward. Before the vectorization, we will remove stop words from the text (e.g., and, it, that, or, why, where, etc.). In addition, I limited the number of features in the matrix to 5000 to reduce training time.</p>



<p>A simple vectorization approach is to determine the word frequency for each movie using a count vectorizer. However, a frequently mentioned disadvantage of this approach is that it does not consider how common a word is across the entire corpus. Some words may appear in almost all items, while others are prevalent in a few items but rare in general. We can therefore argue that observing a rare word in an item is more informative than observing a common one. Instead of a count vectorizer, we will use a more practical approach called TfidfVectorizer from the <a href="https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html" target="_blank" rel="noreferrer noopener">scikit-learn package</a>. </p>



<p>Tf-idf stands for term frequency-inverse document frequency. Compared to a count vectorizer, the tf-idf vectorizer takes the overall word frequencies into account and weights the general importance of each word when spanning the vectors. This way, tf-idf can determine which words are more important than others, reducing the model&#8217;s complexity and improving performance. This <a href="https://medium.com/@cmukesh8688/tf-idf-vectorizer-scikit-learn-dbc0244a911a" target="_blank" rel="noreferrer noopener">medium article</a> explains the math behind tf-idf vectorization in more detail.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># set a custom stop list from nltk
stop = list(stopwords.words('english'))

# create the tf-idf vectorizer; alternatively, you could also use CountVectorizer
tfidf =  TfidfVectorizer(max_features=5000, analyzer = 'word', stop_words=set(stop))
vectorized_data = tfidf.fit_transform(df_movies['tags'])
count_matrix = pd.DataFrame(vectorized_data.toarray(), index=df_movies['tags'].index.tolist())
print(count_matrix)

</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">			0     1     2     3     4     5     6     7     8     9     ...  4990  4991  4992  4993  4994  4995  4996  4997  4998  4999
862      	0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0  ...   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
8844     	0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0  ...   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
15602    	0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0  ...   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
31357    	0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0  ...   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
11862    	0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0  ...   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
...      	...   ...   ...   ...   ...   ...   ...   ...   ...   ...  ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...
439050   	0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0  ...   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
111109   	0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0  ...   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
67758    	0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0  ...   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
227506   	0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0  ...   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
461257   	0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0  ...   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0

[45432 rows x 5000 columns]</pre></div>



<p>The vectorization process results in a feature matrix in which each feature corresponds to a word from the bag of words. </p>



<p>We can display the features with the get_feature_names_out function of the tf-idf vectorizer.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># print feature names
print(tfidf.get_feature_names_out()[940:990])</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">['climbing' 'clinteastwood' 'clinthoward' 'clive' 'cliveowen'
 'cliverevill' 'cliverussell' 'clone' 'clorisleachman' 'cloviscornillac'
 'clown' 'clugulager' 'clydekusatsu' 'co' 'coach' 'cobb' 'cocaine' 'code'
 'coffin' 'cohen' 'coldwar' 'cole' 'colehauser' 'coleman' 'colinfarrell'
 'colinfirth' 'colinhanks' 'colinkenny' 'colinsalmon' 'colleencamp'
 'college' 'colmfeore' 'colmmeaney' 'coma' 'combat' 'comedian' 'comedy'
 'comicbook' 'comingofage' 'comingout' 'common' 'communism' 'communist'
 'company' 'competition' 'composer' 'computer' 'con' 'concentrationcamp'
 'concert']</pre></div>



<p>As you can see, the features are individual words: a mix of keywords, concatenated actor names, and genres.</p>



<h3 class="wp-block-heading" id="h-step-4-dimensionality-reduction-and-calculate-consine-similarities">Step #4: Dimensionality Reduction and Cosine Similarity Calculation</h3>



<p>In the previous section, we created a vector matrix that contains movies and features. This matrix is the foundation for calculating similarity scores for all movies. Before we calculate these scores, we will apply dimensionality reduction.</p>



<h4 class="wp-block-heading">4.1 Dimensionality Reduction using SVD</h4>



<p>The matrix spans a high-dimensional vector space with 5000 feature columns. Do we need all of these features? Most likely not. Many words in the matrix probably occur only once or twice, while others occur in almost all movies. How can we deal with this issue?</p>



<p>By reducing this high-dimensional space to fewer, more essential features, we can save time training our recommender model. We will use TruncatedSVD from the scikit-learn package, a popular algorithm for dimensionality reduction. The algorithm approximates the matrix in a lower-dimensional space, thereby reducing noise and model complexity.</p>



<p>This way, we will reduce the vector space from 5000 to 3000 features. </p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># reduce dimensionality for improved performance
svd = TruncatedSVD(n_components=3000)
reduced_data = svd.fit_transform(count_matrix)</pre></div>
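<p>After fitting, it can be worth checking how much of the original variance the retained components preserve via the fitted explained_variance_ratio_ attribute of TruncatedSVD. Here is a small sketch with random stand-in data (the array shapes are illustrative, not the tutorial&#8217;s actual matrix):</p>

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# random stand-in for the tf-idf matrix: 100 samples, 20 features
rng = np.random.default_rng(0)
X = rng.random((100, 20))

# reduce to 10 components and check how much variance survives
svd = TruncatedSVD(n_components=10, random_state=0)
X_reduced = svd.fit_transform(X)

print(X_reduced.shape)                      # (100, 10)
print(svd.explained_variance_ratio_.sum())  # fraction of variance retained
```

<p>If the retained fraction is low, you may want to keep more components; if it is close to 1, you can often reduce further.</p>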



<h4 class="wp-block-heading">4.2 Calculate Text Similarity Scores for all Movies</h4>



<p>Now that we have reduced the complexity of our vector matrix, we can calculate the similarity scores for all movies. In this process, we assign a similarity score to all item pairs that measure content closeness according to the position of the items in the vector space. </p>



<p>We use the cosine function to calculate the similarity of the movies. Cosine similarity measures the angle between two vectors; in our case, the vectors represent the movie descriptions. The cosine similarity function uses these feature vectors to compare each movie to every other movie and assigns each pair a similarity value. </p>



<ul class="wp-block-list">
<li>A similarity value of -1 means that the two feature vectors point in opposite directions, and the movies are entirely different. </li>



<li>A value of 1 means that the two movies are identical. </li>



<li>A value of 0 lies in between and means that the feature vectors are orthogonal, i.e., the movies have nothing in common.</li>
</ul>
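<p>These three cases are easy to verify with a toy example (the vectors below are made up for illustration):</p>

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

a = np.array([[1.0, 1.0]])
b = np.array([[2.0, 2.0]])    # same direction as a
c = np.array([[1.0, -1.0]])   # orthogonal to a
d = np.array([[-1.0, -1.0]])  # opposite direction to a

print(cosine_similarity(a, b))  # [[1.]]  -> identical
print(cosine_similarity(a, c))  # [[0.]]  -> nothing in common
print(cosine_similarity(a, d))  # [[-1.]] -> entirely different
```

<p>Note that cosine similarity only depends on the direction of the vectors, not their length, which is why a and b count as identical.</p>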



<p>The cosine similarity function will calculate pairwise similarities for all movies in our vector matrix. We can approximate the number of pairwise comparisons with the formula k²/2, where k is the number of items in the vector matrix. In our case, k is about 45,000 movies, which means the cosine similarity function must calculate roughly one billion similarity scores. So don&#8217;t worry if the process takes some time to complete. </p>
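<p>Before running the computation, a quick back-of-the-envelope check (assuming roughly 45,000 movies, as in the tutorial dataset) shows why this step is expensive in both time and memory:</p>

```python
# rough cost estimate for the pairwise similarity step
k = 45_000  # approximate number of movies

# about k^2 / 2 unique pairwise comparisons
n_pairs = k ** 2 // 2
print(f'{n_pairs:,} comparisons')  # about one billion

# the dense k x k float64 similarity matrix needs k^2 * 8 bytes
mem_gb = k ** 2 * 8 / 1e9
print(f'~{mem_gb:.1f} GB of memory')
```

<p>If memory becomes a problem on your machine, computing similarities in chunks against a subset of query movies is a common workaround.</p>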



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># compute the cosine similarity matrix
similarity = cosine_similarity(reduced_data)
similarity</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">array([[ 1.00000000e+00,  9.75542082e-02,  6.00755620e-02, ...,
        -3.03965235e-04,  0.00000000e+00,  5.81243547e-05],
       [ 9.75542082e-02,  1.00000000e+00,  5.92929339e-02, ...,
        -2.97565163e-03,  0.00000000e+00,  4.57945869e-05],
       [ 6.00755620e-02,  5.92929339e-02,  1.00000000e+00, ...,
         9.40459504e-03,  0.00000000e+00, -2.22415551e-04],
       ...,
       [-3.03965235e-04, -2.97565163e-03,  9.40459504e-03, ...,
         1.00000000e+00,  0.00000000e+00, -2.60823346e-04],
       [ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00, ...,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
       [ 5.81243547e-05,  4.57945869e-05, -2.22415551e-04, ...,
        -2.60823346e-04,  0.00000000e+00,  1.00000000e+00]])</pre></div>



<div style="height:38px" aria-hidden="true" class="wp-block-spacer"></div>



<h3 class="wp-block-heading" id="h-step-5-generate-content-based-movie-recommendations">Step #5: Generate Content-based Movie Recommendations </h3>



<p>Once you have created the similarity matrix, it&#8217;s time to generate some recommendations. We begin by generating recommendations based on a single movie. In the cosine similarity matrix, the most similar movies have the highest similarity scores. Once we have the films with the highest scores, we can visualize the results in a bar chart that shows their cosine similarity scores. </p>



<p>The example below displays the results of the movie &#8220;The Matrix.&#8221; Oh, how I love this movie 🙂</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># create a function that takes in movie title as input and returns a list of the most similar movies
def get_recommendations(title, n, cosine_sim=similarity):
    
    # get the index of the movie that matches the title
    movie_index = df_movies[df_movies.title==title].new_id.values[0]
    print(movie_index, title)
    
    # get the pairwise similarity scores of all movies with that movie and sort the movies based on the similarity scores
    sim_scores_all = sorted(list(enumerate(cosine_sim[movie_index])), key=lambda x: x[1], reverse=True)
    
    # checks if recommendations are limited
    if n &gt; 0:
        sim_scores_all = sim_scores_all[1:n+1]
        
    # get the movie indices of the top similar movies
    movie_indices = [i[0] for i in sim_scores_all]
    scores = [i[1] for i in sim_scores_all]
    
    # return the top n most similar movies from the movies df
    top_titles_df = pd.DataFrame(df_movies.iloc[movie_indices]['title'])
    top_titles_df['sim_scores'] = scores
    top_titles_df['ranking'] = range(1, len(top_titles_df) + 1)
    
    return top_titles_df, sim_scores_all

# generate a list of recommendations for a specific movie title
movie_name = 'The Matrix'
number_of_recommendations = 15
top_titles_df, _ = get_recommendations(movie_name, number_of_recommendations)
 
# visualize the results
def show_results(movie_name, top_titles_df):
    fig, ax = plt.subplots(figsize=(11, 5))
    sns.barplot(data=top_titles_df, y='title', x= 'sim_scores', color='blue')
    plt.xlim((0,1))
    plt.title(f'Top {len(top_titles_df)} recommendations for {movie_name}')
    pct_values = ['{:.2}'.format(elm) for elm in list(top_titles_df['sim_scores'])]
    ax.bar_label(container=ax.containers[0], labels=pct_values, size=12)

show_results(movie_name, top_titles_df)</pre></div>



<p>Here is an example for the movies &#8220;Spectre&#8221; and &#8220;The Lion King&#8221;:</p>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<figure class="wp-block-image size-full"><img decoding="async" width="830" height="329" data-attachment-id="9748" data-permalink="https://www.relataly.com/content-based-movie-recommender-using-python/4294/image-2-13/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/10/image-2.png" data-orig-size="830,329" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-2" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/10/image-2.png" src="https://www.relataly.com/wp-content/uploads/2022/10/image-2.png" alt="" class="wp-image-9748" srcset="https://www.relataly.com/wp-content/uploads/2022/10/image-2.png 830w, https://www.relataly.com/wp-content/uploads/2022/10/image-2.png 300w, https://www.relataly.com/wp-content/uploads/2022/10/image-2.png 768w" sizes="(max-width: 830px) 100vw, 830px" /></figure>
</div>
</div>



<h3 class="wp-block-heading">Step #6: Generate Recommendations Based on a User&#8217;s Watch History</h3>



<p>But what if you want to generate recommendations for a user who has already seen several movies? For this, we can aggregate the similarity scores of all films the user has seen. This way, we create a new dataframe that sums up the similarity scores. To return the top-recommended movies, we sort this dataframe by the aggregated scores and return the top elements. </p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># list of movies a user has seen
movie_list = ['The Lion King', 'Seven', 'RoboCop 3', 'Blade Runner', 'Quantum of Solace', 'Casino Royale', 'Skyfall']

# create a copy of the movie dataframe and add a column in which we aggregated the scores
user_scores = pd.DataFrame(df_movies['title'])
user_scores['sim_scores'] = 0.0

# top number of scores to be considered for each movie
number_of_recommendations = 10000
for movie_name in movie_list:
    top_titles_df, _ = get_recommendations(movie_name, number_of_recommendations)
    # aggregate the scores
    user_scores = pd.concat([user_scores, top_titles_df[['title', 'sim_scores']]]).groupby(['title'], as_index=False).sum({'sim_scores'})
# sort and print the aggregated scores
user_scores.sort_values(by='sim_scores', ascending=False)[1:20]</pre></div>



<figure class="wp-block-image size-large is-resized"><img decoding="async" data-attachment-id="8888" data-permalink="https://www.relataly.com/content-based-movie-recommender-using-python/4294/image-8-12/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/07/image-8.png" data-orig-size="1375,583" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-8" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/07/image-8.png" src="https://www.relataly.com/wp-content/uploads/2022/07/image-8-1024x434.png" alt="Content-based movie recommdations for a user that has previously watched The Lion King, Seven, RoboCop 3, Blade Runner, Quantum of Solace, Casino Royale, Skyfall" class="wp-image-8888" width="805" height="341" srcset="https://www.relataly.com/wp-content/uploads/2022/07/image-8.png 1024w, https://www.relataly.com/wp-content/uploads/2022/07/image-8.png 300w, https://www.relataly.com/wp-content/uploads/2022/07/image-8.png 768w, https://www.relataly.com/wp-content/uploads/2022/07/image-8.png 1375w" sizes="(max-width: 805px) 100vw, 805px" /></figure>



<div style="height:65px" aria-hidden="true" class="wp-block-spacer"></div>



<h2 class="wp-block-heading" id="h-summary">Summary</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>In this tutorial, you have learned to implement a simple content-based recommender system for movie recommendations in Python.  We have used several movie-specific details to calculate a similarity matrix for all movies in our dataset. Finally, we have used this model to generate recommendations for two cases:</p>



<ul class="wp-block-list">
<li>Films that are similar to a specific movie</li>



<li>Films that are recommended based on the watchlist of a particular user.</li>
</ul>



<p>A downside of content-based recommenders is that you cannot test their performance unless you know how users perceived the recommendations. This is because content-based recommenders can only determine which items in a dataset are similar. To understand how well the suggestions work, you must include additional data about actual user preferences. </p>



<p>More advanced recommenders will combine content-based recommendations with user-item interactions (<a href="https://www.relataly.com/building-a-movie-recommender-using-collaborative-filtering/4376/" target="_blank" rel="noreferrer noopener">e.g., collaborative filtering</a>). Such models are called hybrid recommenders, but this is something for another article.</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-large"><img decoding="async" width="512" height="472" data-attachment-id="12683" data-permalink="https://www.relataly.com/dog-with-popcorn-machine-learning-movie-recommender-python-tutorial-relataly-midjourney-ai-min/" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/03/dog-with-popcorn-machine-learning-movie-recommender-python-tutorial-relataly-midjourney-ai-min.png" data-orig-size="532,490" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="dog with popcorn machine learning movie recommender python tutorial relataly midjourney ai-min" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/03/dog-with-popcorn-machine-learning-movie-recommender-python-tutorial-relataly-midjourney-ai-min.png" src="https://www.relataly.com/wp-content/uploads/2023/03/dog-with-popcorn-machine-learning-movie-recommender-python-tutorial-relataly-midjourney-ai-min-512x472.png" alt="" class="wp-image-12683" srcset="https://www.relataly.com/wp-content/uploads/2023/03/dog-with-popcorn-machine-learning-movie-recommender-python-tutorial-relataly-midjourney-ai-min.png 512w, https://www.relataly.com/wp-content/uploads/2023/03/dog-with-popcorn-machine-learning-movie-recommender-python-tutorial-relataly-midjourney-ai-min.png 300w, https://www.relataly.com/wp-content/uploads/2023/03/dog-with-popcorn-machine-learning-movie-recommender-python-tutorial-relataly-midjourney-ai-min.png 532w" sizes="(max-width: 512px) 100vw, 512px" /><figcaption class="wp-element-caption">Image created with <a 
href="http://www.midjourney.com" target="_blank" rel="noreferrer noopener">Midjourney</a></figcaption></figure>
</div>
</div>






<h2 class="wp-block-heading">Sources and Further Reading</h2>



<p>Below are some resources for further reading on recommender systems and content-based models.</p>



<p><strong>Books</strong></p>



<ol class="wp-block-list"><li><a href="https://amzn.to/3T3Sl2V" target="_blank" rel="noreferrer noopener">Charu C. Aggarwal (2016) Recommender Systems: The Textbook</a></li><li><a href="https://amzn.to/3D0P8eX" target="_blank" rel="noreferrer noopener">Kin Falk (2019) Practical Recommender Systems</a></li><li><a href="https://amzn.to/3MAy8j5" target="_blank" rel="noreferrer noopener">Andriy Burkov (2020) Machine Learning Engineering</a></li><li><a href="https://amzn.to/3D0gB0e" target="_blank" rel="noreferrer noopener">Oliver Theobald (2020) Machine Learning For Absolute Beginners: A Plain English Introduction</a></li><li><a href="https://amzn.to/3S9Nfkl" target="_blank" rel="noreferrer noopener">Aurélien Géron (2019) Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems </a></li><li><a href="https://amzn.to/3EKidwE" target="_blank" rel="noreferrer noopener">David Forsyth (2019) Applied Machine Learning Springer</a></li></ol>



<p class="has-contrast-2-color has-base-3-background-color has-text-color has-background"><em>The links above to Amazon are affiliate links. By buying through these links, you support the Relataly.com blog and help to cover the hosting costs. Using the links does not affect the price.</em></p>



<p><strong>Articles</strong></p>



<ol class="wp-block-list"><li><a href="https://surprise.readthedocs.io/en/stable/getting_started.html" target="_blank" rel="noreferrer noopener">Getting started with Suprise</a></li><li><a href="https://onespire.hu/sap-news-en/history-of-recommender-systems/" target="_blank" rel="noreferrer noopener">About the history of Recommender Systems</a></li><li><a href="https://www.freecodecamp.org/news/singular-value-decomposition-vs-matrix-factorization-in-recommender-systems-b1e99bc73599/" target="_blank" rel="noreferrer noopener"></a><a href="https://www.freecodecamp.org/news/singular-value-decomposition-vs-matrix-factorization-in-recommender-systems-b1e99bc73599/" target="_blank" rel="noreferrer noopener">Singular value decomposition vs. matrix factorization</a></li><li><a href="https://proceedings.neurips.cc/paper/2007/file/d7322ed717dedf1eb4e6e52a37ea7bcd-Paper.pdf" target="_blank" rel="noreferrer noopener">Probabilistic Matrix Factorization</a></li></ol>
<p>The post <a href="https://www.relataly.com/content-based-movie-recommender-using-python/4294/">Create a Personalized Movie Recommendation Engine using Content-based Filtering in Python</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.relataly.com/content-based-movie-recommender-using-python/4294/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">4294</post-id>	</item>
		<item>
		<title>Using Random Search to Tune the Hyperparameters of a Random Decision Forest with Python</title>
		<link>https://www.relataly.com/using-random-search-to-tune-the-hyperparameters-of-a-random-decision-forest-with-python/6875/</link>
					<comments>https://www.relataly.com/using-random-search-to-tune-the-hyperparameters-of-a-random-decision-forest-with-python/6875/#respond</comments>
		
		<dc:creator><![CDATA[Florian Follonier]]></dc:creator>
		<pubDate>Thu, 07 Apr 2022 17:55:36 +0000</pubDate>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Classification (two-class)]]></category>
		<category><![CDATA[Cross-Validation]]></category>
		<category><![CDATA[Finance]]></category>
		<category><![CDATA[Hyperparameter Tuning]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Random Decision Forests]]></category>
		<category><![CDATA[Sales Forecasting]]></category>
		<category><![CDATA[Scikit-Learn]]></category>
		<category><![CDATA[Seaborn]]></category>
		<category><![CDATA[Use Cases]]></category>
		<category><![CDATA[Classic Machine Learning]]></category>
		<category><![CDATA[Intermediate Tutorials]]></category>
		<category><![CDATA[Multivariate Models]]></category>
		<category><![CDATA[Random Forest Regression]]></category>
		<category><![CDATA[Random Search]]></category>
		<category><![CDATA[Supervised Learning]]></category>
		<category><![CDATA[Tuning Random Decision Forests]]></category>
		<guid isPermaLink="false">https://www.relataly.com/?p=6875</guid>

					<description><![CDATA[<p>Perfecting your machine learning model&#8217;s hyperparameters can often feel like hunting for a proverbial needle in a haystack. But with the Random Search algorithm, this intricate process of hyperparameter tuning can be efficiently automated, saving you valuable time and effort. Hyperparameters are properties intrinsic to your model, like the number of estimators in an ensemble ... <a title="Using Random Search to Tune the Hyperparameters of a Random Decision Forest with Python" class="read-more" href="https://www.relataly.com/using-random-search-to-tune-the-hyperparameters-of-a-random-decision-forest-with-python/6875/" aria-label="Read more about Using Random Search to Tune the Hyperparameters of a Random Decision Forest with Python">Read more</a></p>
<p>The post <a href="https://www.relataly.com/using-random-search-to-tune-the-hyperparameters-of-a-random-decision-forest-with-python/6875/">Using Random Search to Tune the Hyperparameters of a Random Decision Forest with Python</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>Perfecting your machine learning model&#8217;s hyperparameters can often feel like hunting for a proverbial needle in a haystack. But with the Random Search algorithm, this intricate process of hyperparameter tuning can be efficiently automated, saving you valuable time and effort. Hyperparameters are properties intrinsic to your model, like the number of estimators in an ensemble model, and heavily influence its performance. Unlike model parameters, which are discovered during training by the machine learning algorithm, hyperparameters require pre-specification.</p>



<p>In this comprehensive Python tutorial, we&#8217;ll guide you on how to harness the power of Random Search to optimize a regression model&#8217;s hyperparameters. Our illustrative example utilizes a Random Decision Forest for predicting house prices. However, the fundamental principles you&#8217;ll learn can be seamlessly applied to any model. So why painstakingly fine-tune hyperparameters manually when Random Search can handle the task efficiently?</p>



<p>Here&#8217;s a preview of what this Python tutorial entails:</p>



<ol class="wp-block-list">
<li>A brief overview of how Random Search operates and instances where it might be preferable to Grid Search.</li>



<li>A hands-on Python tutorial featuring a public house price dataset from Kaggle.com. The aim here is to train a regression model capable of predicting US house prices based on various properties.</li>



<li>Training a &#8216;best-guess&#8217; model in Python, followed by using Random Search to discover a model with enhanced performance.</li>



<li>Finally, we&#8217;ll implement cross-validation to validate our models&#8217; performance.</li>
</ol>



<p>By the end of this tutorial, you&#8217;ll be well-equipped to let Random Search efficiently fine-tune your model&#8217;s hyperparameters, freeing up your time for other crucial tasks.</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<h2 class="wp-block-heading" id="h-hyperparameter-tuning">Hyperparameter Tuning </h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>Hyperparameters are configuration options that allow us to customize machine learning models and improve their performance. While normal parameters are the internal coefficients that the model learns during training, hyperparameters must be specified before training. It is usually impossible to find the best configuration without testing several different ones. </p>



<p>Searching for a suitable model configuration is called &#8220;hyperparameter tuning&#8221; or &#8220;hyperparameter optimization.&#8221; Machine learning algorithms differ in their hyperparameters and possible parameter values. For example, a random decision forest allows us to configure parameters such as the number of trees, the maximum tree depth, and the minimum number of samples required to split a node. </p>



<p>The hyperparameters and the range of possible parameter values span a search space in which we seek to identify the best configuration. The larger the search space, the more difficult it becomes to find an optimal model. We can use random search to automate this process.</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-full"><img decoding="async" width="506" height="501" data-attachment-id="12416" data-permalink="https://www.relataly.com/using-random-search-to-tune-the-hyperparameters-of-a-random-decision-forest-with-python/6875/random-hyperparameter-tuning-machine-learning/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/02/random-hyperparameter-tuning-machine-learning.png" data-orig-size="506,501" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="random-hyperparameter-tuning-machine-learning" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/02/random-hyperparameter-tuning-machine-learning.png" src="https://www.relataly.com/wp-content/uploads/2023/02/random-hyperparameter-tuning-machine-learning.png" alt="Random search can be an efficient way to tune the hyperparameters of a machine learning model. Image generated with Midjourney. Image of exploding dices with different colors. " class="wp-image-12416" srcset="https://www.relataly.com/wp-content/uploads/2023/02/random-hyperparameter-tuning-machine-learning.png 506w, https://www.relataly.com/wp-content/uploads/2023/02/random-hyperparameter-tuning-machine-learning.png 300w, https://www.relataly.com/wp-content/uploads/2023/02/random-hyperparameter-tuning-machine-learning.png 140w" sizes="(max-width: 506px) 100vw, 506px" /><figcaption class="wp-element-caption">Random search can be an efficient way to tune the hyperparameters of a machine learning model. 
Image generated with <a href="http://www.midjourney.com" target="_blank" rel="noreferrer noopener">Midjourney</a></figcaption></figure>
</div>
</div>
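<p>To make this concrete, here is a minimal sketch that sets the random decision forest hyperparameters mentioned above before training. The parameter values are arbitrary examples for illustration, not tuned settings:</p>

```python
# Illustrative sketch only: a random decision forest regressor with the
# hyperparameters named above set explicitly before training.
# The values are arbitrary examples, not tuned settings.
from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor(
    n_estimators=200,     # number of trees in the forest
    max_depth=10,         # maximum depth of each tree
    min_samples_split=4,  # minimum samples required to split a node
    random_state=42,
)
print(model.get_params()["n_estimators"])  # → 200
```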



<h2 class="wp-block-heading">Techniques for Tuning Hyperparameters</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>Hyperparameter tuning is the process of adjusting the hyperparameters of a machine learning algorithm to optimize its performance on a specific dataset or task. Several techniques can be used for hyperparameter tuning, including:</p>



<ol class="wp-block-list">
<li><strong><a href="https://www.relataly.com/hyperparameter-tuning-with-grid-search/2261/" target="_blank" rel="noreferrer noopener">Grid Search</a>:</strong> Grid search is a brute-force search algorithm that systematically evaluates a given set of hyperparameter values by training and evaluating a model for each combination of values. It is a simple and effective technique, but it can be computationally expensive, especially for large or complex datasets.</li>



<li><strong>Random Search: </strong>As mentioned, random search is an alternative to grid search that randomly samples a given set of hyperparameter values rather than evaluating all possible combinations. It can be more efficient than grid search, but it may not find the optimal set of hyperparameters.</li>



<li><strong>Bayesian Optimization:</strong> Bayesian optimization is a probabilistic approach to hyperparameter tuning that uses Bayesian inference to model which hyperparameter values are likely to produce good performance. It can be more efficient and effective than grid search or random search, but it is more challenging to implement and interpret.</li>



<li><strong>Genetic Algorithms:</strong> Genetic algorithms are optimization algorithms inspired by the principles of natural selection and genetics. They evolve a population of candidate solutions, which are iteratively selected based on their fitness or performance, to find the optimal set of hyperparameters.</li>
</ol>



<p>In this article, we specifically look at the Random Search technique.</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-full"><img decoding="async" width="892" height="512" data-attachment-id="12417" data-permalink="https://www.relataly.com/using-random-search-to-tune-the-hyperparameters-of-a-random-decision-forest-with-python/6875/image-of-an-old-mechanic-tuning-a-car-hyperparameter-tuning-machine-learning-python-tutorial-scikit-learn/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/02/Image-of-an-old-mechanic-tuning-a-car.-Hyperparameter-Tuning-Machine-Learning-Python-Tutorial-Scikit-Learn.png" data-orig-size="892,512" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="Image-of-an-old-mechanic-tuning-a-car.-Hyperparameter-Tuning-Machine-Learning-Python-Tutorial-Scikit-Learn" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/02/Image-of-an-old-mechanic-tuning-a-car.-Hyperparameter-Tuning-Machine-Learning-Python-Tutorial-Scikit-Learn.png" src="https://www.relataly.com/wp-content/uploads/2023/02/Image-of-an-old-mechanic-tuning-a-car.-Hyperparameter-Tuning-Machine-Learning-Python-Tutorial-Scikit-Learn.png" alt="You can spend much time tuning a machine learning model. Image generated with Midjourney. portrait of an old mechanic working on a car. 
relataly.com" class="wp-image-12417" srcset="https://www.relataly.com/wp-content/uploads/2023/02/Image-of-an-old-mechanic-tuning-a-car.-Hyperparameter-Tuning-Machine-Learning-Python-Tutorial-Scikit-Learn.png 892w, https://www.relataly.com/wp-content/uploads/2023/02/Image-of-an-old-mechanic-tuning-a-car.-Hyperparameter-Tuning-Machine-Learning-Python-Tutorial-Scikit-Learn.png 300w, https://www.relataly.com/wp-content/uploads/2023/02/Image-of-an-old-mechanic-tuning-a-car.-Hyperparameter-Tuning-Machine-Learning-Python-Tutorial-Scikit-Learn.png 768w" sizes="(max-width: 892px) 100vw, 892px" /><figcaption class="wp-element-caption">You can spend much time tuning a machine learning model. Image generated with Midjourney.</figcaption></figure>
</div>
</div>



<h2 class="wp-block-heading">What is Random Search?</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>The random search algorithm builds models from hyperparameter combinations randomly sampled from a grid of parameter values. The idea behind the randomized approach is that testing random configurations is an efficient way to identify a good model. We can use random search for both regression and classification models.</p>



<p>Random Search and Grid Search are the most popular techniques for hyperparameter tuning, and the two methods are often compared. Unlike random search, grid search covers the search space exhaustively by trying all possible combinations. The technique works well for testing a small number of configurations already known to work well. </p>



<p>As long as both the search space and the training time are small, grid search is excellent for finding the best model. However, the number of model variants grows exponentially with the size of the search space. For large search spaces or complex models, random search is often more efficient.</p>



<p>Since random search does not exhaustively cover the search space, it does not necessarily yield the best model. However, it is much faster than grid search and usually delivers a suitable model in a short time.</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"><div class="wp-block-image">
<figure class="alignright size-full is-resized"><img decoding="async" data-attachment-id="6924" data-permalink="https://www.relataly.com/using-random-search-to-tune-the-hyperparameters-of-a-random-decision-forest-with-python/6875/image-16-9/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/04/image-16.png" data-orig-size="640,469" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-16" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/04/image-16.png" src="https://www.relataly.com/wp-content/uploads/2022/04/image-16.png" alt="random decision forest python,
hyperparameter tuning,
comparison between random search and grid search" class="wp-image-6924" width="409" height="301" srcset="https://www.relataly.com/wp-content/uploads/2022/04/image-16.png 640w, https://www.relataly.com/wp-content/uploads/2022/04/image-16.png 300w" sizes="(max-width: 409px) 100vw, 409px" /><figcaption class="wp-element-caption">Random Search vs. Exhaustive Grid Search</figcaption></figure>
</div></div>
</div>
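<p>As an illustration of the behavior described above, here is a minimal, self-contained sketch using Scikit-learn&#8217;s <em>RandomizedSearchCV</em>. The synthetic dataset and parameter ranges are arbitrary placeholders, not the tutorial&#8217;s data:</p>

```python
# Sketch of random search with Scikit-learn (synthetic placeholder data).
# The grid below spans 3 * 4 * 3 = 36 combinations; random search samples
# only n_iter of them, whereas grid search would evaluate all 36.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [4, 8, 16, None],
    "min_samples_split": [2, 4, 8],
}

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=10,  # evaluate only 10 random combinations
    cv=3,       # 3-fold cross-validation per combination
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```

Increasing <em>n_iter</em> trades runtime for a more thorough exploration of the search space.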



<div style="height:48px" aria-hidden="true" class="wp-block-spacer"></div>



<h2 class="wp-block-heading" id="h-tuning-the-hyperparameters-of-a-random-decision-forest-regressor-in-python-using-random-search">Tuning the Hyperparameters of a Random Decision Forest Regressor in Python using Random Search</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>In this tutorial, we delve into the use of the Random Search algorithm in Python, specifically for predicting house prices. We&#8217;ll be using a dataset rich in diverse house characteristics. Various elements, such as data quality and quantity, model intricacy, the selection of machine learning algorithms, and housing market stability, significantly influence the accuracy of house price predictions.</p>



<p>Our initial model employs a Random Decision Forest algorithm, which we&#8217;ll optimize using a random search approach for hyperparameter tuning. By identifying and implementing a more advantageous configuration, we aim to improve our model&#8217;s performance.</p>



<p>Here&#8217;s a concise outline of the steps we&#8217;ll undertake:</p>



<ol class="wp-block-list">
<li>Loading the house price dataset</li>



<li>Exploring the dataset intricacies</li>



<li>Preparing the data for modeling</li>



<li>Training a baseline Random Decision Forest model</li>



<li>Implementing a random search approach for model optimization</li>



<li>Measuring and evaluating the performance of our optimized model</li>
</ol>



<p>Through this step-by-step guide, you&#8217;ll learn to enhance model performance, further refining your understanding of Random Search algorithm implementation in Python.</p>



<p>The Python code is available in the relataly GitHub repository. </p>



<div class="wp-block-kadence-advancedbtn kb-buttons-wrap kb-btns_f9d778-26"><a class="kb-button kt-button button kb-btn_b5d394-00 kt-btn-size-standard kt-btn-width-type-full kb-btn-global-inherit  kt-btn-has-text-true kt-btn-has-svg-true  wp-block-button__link wp-block-kadence-singlebtn" href="https://github.com/flo7up/relataly-public-python-tutorials/blob/master/11%20Hyperparamter%20Tuning/016%20Hyperparameter%20Tuning%20of%20Random%20Decision%20Forests%20using%20Grid%20Search.ipynb" target="_blank" rel="noreferrer noopener"><span class="kb-svg-icon-wrap kb-svg-icon-fe_eye kt-btn-icon-side-left"><svg viewBox="0 0 24 24"  fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"  aria-hidden="true"><path d="M1 12s4-8 11-8 11 8 11 8-4 8-11 8-11-8-11-8z"/><circle cx="12" cy="12" r="3"/></svg></span><span class="kt-btn-inner-text">View on GitHub </span></a>

<a class="kb-button kt-button button kb-btn_ce738a-fc kt-btn-size-standard kt-btn-width-type-full kb-btn-global-inherit  kt-btn-has-text-true kt-btn-has-svg-true  wp-block-button__link wp-block-kadence-singlebtn" href="https://github.com/flo7up/relataly-public-python-API-tutorials" target="_blank" rel="noreferrer noopener"><span class="kb-svg-icon-wrap kb-svg-icon-fa_github kt-btn-icon-side-left"><svg viewBox="0 0 496 512"  fill="currentColor" xmlns="http://www.w3.org/2000/svg"  aria-hidden="true"><path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"/></svg></span><span class="kt-btn-inner-text">Relataly Github Repo </span></a></div>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-full"><img decoding="async" width="505" height="504" data-attachment-id="12422" data-permalink="https://www.relataly.com/using-random-search-to-tune-the-hyperparameters-of-a-random-decision-forest-with-python/6875/isometric-house-min/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/02/isometric-house-min.png" data-orig-size="505,504" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="isometric-house-min" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/02/isometric-house-min.png" src="https://www.relataly.com/wp-content/uploads/2023/02/isometric-house-min.png" alt="Now that we have trained a house price prediction model, we can use it to assess the price of new houses. Image generated with Midjourney. Python machine learning tutorial. relataly.com" class="wp-image-12422" srcset="https://www.relataly.com/wp-content/uploads/2023/02/isometric-house-min.png 505w, https://www.relataly.com/wp-content/uploads/2023/02/isometric-house-min.png 300w, https://www.relataly.com/wp-content/uploads/2023/02/isometric-house-min.png 140w" sizes="(max-width: 505px) 100vw, 505px" /><figcaption class="wp-element-caption">Once we have trained a house price prediction model, we can use it to assess the price of new houses. Image generated with <a href="http://www.midjourney.com" target="_blank" rel="noreferrer noopener">Midjourney</a>.</figcaption></figure>
</div>
</div>



<h3 class="wp-block-heading" id="h-prerequisites">Prerequisites</h3>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>Before starting the coding part, ensure that you have set up your Python (3.8 or higher) environment and required packages. If you don&#8217;t have an environment, follow&nbsp;<a href="https://www.relataly.com/anaconda-python-environment-machine-learning/1663/" target="_blank" rel="noreferrer noopener">this tutorial</a>&nbsp;to set up the&nbsp;<a href="https://www.anaconda.com/products/individual" target="_blank" rel="noreferrer noopener">Anaconda environment</a>.</p>



<p>Also, make sure you install all required packages. In this tutorial, we will be working with the following standard packages:&nbsp;</p>



<ul class="wp-block-list">
<li><em><a href="https://pandas.pydata.org/" target="_blank" rel="noreferrer noopener">pandas</a></em></li>



<li><em><a href="https://numpy.org/" target="_blank" rel="noreferrer noopener">NumPy</a></em></li>



<li><a href="https://docs.python.org/3/library/math.html" target="_blank" rel="noreferrer noopener">math</a></li>



<li><em><a href="https://matplotlib.org/" target="_blank" rel="noreferrer noopener">matplotlib</a></em></li>
</ul>



<p>In addition, we will be using the Python machine learning library <a href="https://scikit-learn.org/stable/" target="_blank" rel="noreferrer noopener">Scikit-learn</a> to implement the random forest and the random search technique. </p>



<p>You can install packages using console commands:</p>



<ul class="wp-block-list">
<li><em>pip install &lt;package name&gt;</em></li>



<li><em>conda install &lt;package name&gt;</em>&nbsp;(if you are using the Anaconda package manager)</li>
</ul>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<h3 class="wp-block-heading" id="h-house-price-prediction-about-the-use-case-and-the-data">House Price Prediction: About the Use Case and the Data</h3>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>House price prediction is the process of using statistical and machine learning techniques to predict the future value of a house. This can be useful for a variety of applications, such as helping homeowners and real estate professionals to make informed decisions about buying and selling properties. In order to make accurate predictions, it is important to have access to high-quality data about the housing market.</p>



<p>In this tutorial, we will work with a house price dataset from the <a href="//www.kaggle.com/c/house-prices-advanced-regression-techniques" target="_blank" rel="noreferrer noopener">house price regression challenge on Kaggle.com</a>. The dataset is available via a GitHub repository. It contains information on 1,460 houses sold between 2006 and 2010 in Ames, Iowa. The data includes the sale price and 79 house characteristics, such as:</p>



<ul class="wp-block-list">
<li>YearBuilt &#8211; the year of construction</li>



<li>YrSold &#8211; the year in which the house was sold</li>



<li>LotArea &#8211; the lot size in square feet</li>



<li>OverallQual &#8211; the overall quality of the house from one (lowest) to ten (highest)</li>



<li>Street &#8211; the type of road access, e.g., paved or gravel</li>



<li>Utilities &#8211; the type of utilities available</li>



<li>GarageArea &#8211; the garage size in square feet</li>



<li>TotRmsAbvGrd &#8211; the number of rooms above ground</li>
</ul>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-full"><img decoding="async" width="505" height="505" data-attachment-id="12421" data-permalink="https://www.relataly.com/using-random-search-to-tune-the-hyperparameters-of-a-random-decision-forest-with-python/6875/isometric-house-collection-python-machine-learning-min/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/02/isometric-house-collection-python-machine-learning-min.png" data-orig-size="505,505" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="isometric-house-collection-python-machine-learning-min" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/02/isometric-house-collection-python-machine-learning-min.png" src="https://www.relataly.com/wp-content/uploads/2023/02/isometric-house-collection-python-machine-learning-min.png" alt="Predicting house prices with machine learning. Image generated with Midjourney. Isometric view of houses. relataly.com" class="wp-image-12421" srcset="https://www.relataly.com/wp-content/uploads/2023/02/isometric-house-collection-python-machine-learning-min.png 505w, https://www.relataly.com/wp-content/uploads/2023/02/isometric-house-collection-python-machine-learning-min.png 300w, https://www.relataly.com/wp-content/uploads/2023/02/isometric-house-collection-python-machine-learning-min.png 140w" sizes="(max-width: 505px) 100vw, 505px" /><figcaption class="wp-element-caption">Predicting house prices with machine learning. 
Image generated with <a href="http://www.midjourney.com" target="_blank" rel="noreferrer noopener">Midjourney</a>.</figcaption></figure>
</div>
</div>



<h3 class="wp-block-heading" id="h-step-1-load-the-data">Step #1 Load the Data</h3>



<p>We begin by loading the house price data from the relataly GitHub repository. A separate download is not required.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># A tutorial for this file is available at www.relataly.com

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error
from sklearn import svm

# Source: 
# https://www.kaggle.com/c/house-prices-advanced-regression-techniques

# Load train and test datasets
path = &quot;https://raw.githubusercontent.com/flo7up/relataly_data/main/house_prices/train.csv&quot;
df = pd.read_csv(path)
print(df.columns)
df.head()</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">Index(['Id', 'MSSubClass', 'MSZoning', 'LotFrontage', 'LotArea', 'Street',
       'Alley', 'LotShape', 'LandContour', 'Utilities', 'LotConfig',
       'LandSlope', 'Neighborhood', 'Condition1', 'Condition2', 'BldgType',
       'HouseStyle', 'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemodAdd',
       'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd', 'MasVnrType',
       'MasVnrArea', 'ExterQual', 'ExterCond', 'Foundation', 'BsmtQual',
       'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinSF1',
       'BsmtFinType2', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', 'Heating',
       'HeatingQC', 'CentralAir', 'Electrical', '1stFlrSF', '2ndFlrSF',
       'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath',
       'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'KitchenQual',
       'TotRmsAbvGrd', 'Functional', 'Fireplaces', 'FireplaceQu', 'GarageType',
       'GarageYrBlt', 'GarageFinish', 'GarageCars', 'GarageArea', 'GarageQual',
       'GarageCond', 'PavedDrive', 'WoodDeckSF', 'OpenPorchSF',
       'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'PoolQC',
       'Fence', 'MiscFeature', 'MiscVal', 'MoSold', 'YrSold', 'SaleType',
       'SaleCondition', 'SalePrice'],
      dtype='object')
	Id	MSSubClass	MSZoning	LotFrontage	LotArea	Street	Alley	LotShape	LandContour	Utilities	...	PoolArea	PoolQC	Fence	MiscFeature	MiscVal	MoSold	YrSold	SaleType	SaleCondition	SalePrice
0	1	60			RL			65.0		8450	Pave	NaN		Reg			Lvl			AllPub		...	0			NaN		NaN		NaN			0		2		2008	WD			Normal			208500
1	2	20			RL			80.0		9600	Pave	NaN		Reg			Lvl			AllPub		...	0			NaN		NaN		NaN			0		5		2007	WD			Normal			181500
2	3	60			RL			68.0		11250	Pave	NaN		IR1			Lvl			AllPub		...	0			NaN		NaN		NaN			0		9		2008	WD			Normal			223500
3	4	70			RL			60.0		9550	Pave	NaN		IR1			Lvl			AllPub		...	0			NaN		NaN		NaN			0		2		2006	WD			Abnorml			140000
4	5	60			RL			84.0		14260	Pave	NaN		IR1			Lvl			AllPub		...	0			NaN		NaN		NaN			0		12		2008	WD			Normal			250000
5 rows × 81 columns</pre></div>



<h3 class="wp-block-heading" id="h-step-2-explore-the-data">Step #2 Explore the Data</h3>



<p>Before jumping into preprocessing and model training, let&#8217;s quickly explore the data. A distribution plot helps us understand how the regression target, the sale price, is distributed across our dataset.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Plot the distribution of the sale price (the regression target)
ax = sns.displot(data=df[['SalePrice']].dropna(), height=6, aspect=2)
plt.title('Sale Price Distribution')</pre></div>



<figure class="wp-block-image size-full"><img decoding="async" width="944" height="440" data-attachment-id="8424" data-permalink="https://www.relataly.com/using-random-search-to-tune-the-hyperparameters-of-a-random-decision-forest-with-python/6875/histplot/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/05/histplot.png" data-orig-size="944,440" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="histplot" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/05/histplot.png" src="https://www.relataly.com/wp-content/uploads/2022/05/histplot.png" alt="random search hyperparameter tuning python. random forest regression,
sale price distribution" class="wp-image-8424" srcset="https://www.relataly.com/wp-content/uploads/2022/05/histplot.png 944w, https://www.relataly.com/wp-content/uploads/2022/05/histplot.png 300w, https://www.relataly.com/wp-content/uploads/2022/05/histplot.png 768w" sizes="(max-width: 944px) 100vw, 944px" /></figure>



<p>For feature selection, it is helpful to understand the predictive power of the different variables in a dataset. We can use scatterplots to estimate the predictive power of specific features. Running the code below will create a scatterplot that visualizes the relation between the sale price, lot area, and the house&#8217;s overall quality.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Create a scatterplot of sale price vs. lot area, colored by overall quality
plt.figure(figsize=(16,6))
df_features = df[['SalePrice', 'LotArea', 'OverallQual']]
sns.scatterplot(data=df_features, x='LotArea', y='SalePrice', hue='OverallQual')
plt.title('Sale Price vs. Lot Area by Overall Quality')</pre></div>



<figure class="wp-block-image size-full"><img decoding="async" width="966" height="387" data-attachment-id="8430" data-permalink="https://www.relataly.com/using-random-search-to-tune-the-hyperparameters-of-a-random-decision-forest-with-python/6875/overall-quality-vs-lotarea-depending-on-sale-price/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/05/Overall-Quality-vs-LotArea-depending-on-Sale-Price.png" data-orig-size="966,387" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="Overall-Quality-vs-LotArea-depending-on-Sale-Price" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/05/Overall-Quality-vs-LotArea-depending-on-Sale-Price.png" src="https://www.relataly.com/wp-content/uploads/2022/05/Overall-Quality-vs-LotArea-depending-on-Sale-Price.png" alt="random search hyperparameter tuning python. random forest regression,
scatter plot, feature selection" class="wp-image-8430" srcset="https://www.relataly.com/wp-content/uploads/2022/05/Overall-Quality-vs-LotArea-depending-on-Sale-Price.png 966w, https://www.relataly.com/wp-content/uploads/2022/05/Overall-Quality-vs-LotArea-depending-on-Sale-Price.png 300w, https://www.relataly.com/wp-content/uploads/2022/05/Overall-Quality-vs-LotArea-depending-on-Sale-Price.png 768w" sizes="(max-width: 966px) 100vw, 966px" /></figure>



<p>As expected, the scatterplot shows that the sale price increases with the overall quality. On the other hand, the LotArea has only a minor effect on the sale price. </p>
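<p>To back up this visual impression with a number, we can compute correlation coefficients. The sketch below is illustrative only: it uses a tiny made-up frame (values modeled on the first rows of the dataset) in place of the full dataframe, and ranks features by their Pearson correlation with the sale price.</p>

```python
import pandas as pd

# Tiny illustrative frame standing in for the house-price data
# (hypothetical values modeled on the first rows of the dataset)
df_small = pd.DataFrame({
    'SalePrice':   [208500, 181500, 223500, 140000, 250000, 143000],
    'OverallQual': [7, 6, 7, 7, 8, 5],
    'LotArea':     [8450, 9600, 11250, 9550, 14260, 14115],
})

# Pearson correlation of each feature with the target
corr = df_small.corr()['SalePrice'].drop('SalePrice')
print(corr.sort_values(ascending=False))
```

A feature with a correlation close to zero, like LotArea here, is unlikely to carry much predictive power on its own.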



<h3 class="wp-block-heading">Step #3 Data Preprocessing</h3>



<p>Next, we prepare the data for use as input to train a regression model. To keep things simple, we reduce the number of variables and use only a small set of features. In addition, we encode categorical variables as one-hot dummy variables using pandas&#8217; get_dummies.</p>



<p>To ensure that our regression model does not know the target variable, we separate house price (y) from features (x). Last, we split the data into separate datasets for training and testing. The result is four different data sets: x_train, y_train, x_test, and y_test.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}">def preprocessFeatures(df):   
    # Define a list of relevant features
    feature_list = ['SalePrice', 'OverallQual', 'Utilities', 'GarageArea', 'LotArea', 'OverallCond']
    df_dummy = pd.get_dummies(df[feature_list])
    # Optionally drop records with missing values:
    # df_dummy = df_dummy.dropna()
    return df_dummy

df_base = preprocessFeatures(df)

# Split the data into x_train and y_train data sets
x_train, x_test, y_train, y_test = train_test_split(df_base.drop(columns=['SalePrice']), df_base['SalePrice'].copy(), train_size=0.7, random_state=0)
x_train</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">		OverallQual	GarageArea	LotArea	OverallCond	Utilities_AllPub	Utilities_NoSeWa
682		6			431			2887	5			1					0
960		5			0			7207	7			1					0
1384	6			280			9060	5			1					0
1100	2			246			8400	5			1					0
416		6			440			7844	7			1					0</pre></div>



<h3 class="wp-block-heading" id="h-step-4-train-different-regression-models-using-random-search">Step #4 Train Different Regression Models using Random Search</h3>



<p>Now that the dataset is ready, we can train the random decision forest regressor. To do this, we first define a dictionary with different parameter ranges. In addition, we need to define the number of model variants (n) that the algorithm should try. The random search algorithm then selects n random permutations from the grid and uses them to train the model. </p>
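<p>To make the sampling step concrete, the sketch below uses scikit-learn&#8217;s ParameterSampler, the same utility RandomizedSearchCV uses internally, to draw five random configurations from a small grid. The grid values mirror a subset of the ranges used later in this tutorial.</p>

```python
from sklearn.model_selection import ParameterSampler

# The search space from which random search draws configurations
param_distributions = {
    'max_leaf_nodes': [2, 3, 4, 5, 6, 7],
    'min_samples_split': [5, 10, 20, 50],
    'max_depth': [5, 10, 15, 20],
}

# Draw n = 5 random permutations from the grid
sampler = ParameterSampler(param_distributions, n_iter=5, random_state=0)
for config in sampler:
    print(config)
```

Each printed dictionary is one candidate configuration that the search would train and evaluate.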



<p>We use the RandomizedSearchCV algorithm from the scikit-learn package. The &#8220;CV&#8221; in the name stands for cross-validation. Cross-validation splits the training data into subsets (folds) and rotates them between training and validation runs, so each candidate configuration is trained and evaluated multiple times on different data partitions. The search algorithm then summarizes these fold results into a mean test score for each configuration.</p>
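<p>The cross-validation mechanism can be illustrated in isolation with scikit-learn&#8217;s cross_val_score. The snippet below is a minimal sketch on synthetic data (not the house-price dataset): one model configuration is scored five times, once per fold, and the fold scores are averaged.</p>

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic regression data for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X @ np.array([3.0, -2.0, 1.0, 0.5]) + rng.normal(scale=0.1, size=200)

# 5-fold cross-validation: the model is trained and scored five times,
# each time holding out a different fold as validation data
scores = cross_val_score(RandomForestRegressor(n_estimators=20, random_state=0), X, y, cv=5)
print(scores)          # one R^2 score per fold
print(scores.mean())   # the summary score reported for this configuration
```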



<p>We use a Random Decision Forest &#8211; a robust machine learning algorithm that can handle both classification and regression tasks. As a so-called ensemble model, the Random Forest aggregates the predictions of a set of independent estimators (decision trees). The estimator is an important parameter to pass to the RandomizedSearchCV function. Random decision forests have several hyperparameters that we can use to influence their behavior. We define the following parameter ranges:</p>



<ul class="wp-block-list">
<li>max_leaf_nodes = [2, 3, 4, 5, 6, 7]</li>



<li>min_samples_split = [5, 10, 20, 50]</li>



<li>max_depth = [5,10,15,20]</li>



<li>max_features = [3,4,5]</li>



<li>n_estimators = [50, 100, 200]</li>
</ul>



<p>These parameter ranges define the search space from which the randomized search algorithm (RandomizedSearchCV) selects random configurations. All other parameters keep the default values defined by <a href="https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html?highlight=random%20forest#sklearn.ensemble.RandomForestClassifier" target="_blank" rel="noreferrer noopener">scikit-learn</a>.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Define the Estimator and the Parameter Ranges
dt = RandomForestRegressor()
number_of_iterations = 20
max_leaf_nodes = [2, 3, 4, 5, 6, 7]
min_samples_split = [5, 10, 20, 50]
max_depth = [5,10,15,20]
max_features = [3,4,5]
n_estimators = [50, 100, 200]

# Define the param distribution dictionary
param_distributions = dict(max_leaf_nodes=max_leaf_nodes, 
                           min_samples_split=min_samples_split, 
                           max_depth=max_depth,
                           max_features=max_features,
                           n_estimators=n_estimators)

# Build the randomized search
grid = RandomizedSearchCV(estimator=dt, 
                          param_distributions=param_distributions, 
                          n_iter=number_of_iterations, 
                          cv = 5)

grid_results = grid.fit(x_train, y_train)

# Summarize the results in a readable format
print(&quot;Best score: {0}, using {1}&quot;.format(np.round(grid_results.best_score_, 4), grid_results.best_params_))
results_df = pd.DataFrame(grid_results.cv_results_)</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">Best score: 0.7091, using {'n_estimators': 100, 'min_samples_split': 5, 'max_leaf_nodes': 7, 'max_features': 3, 'max_depth': 15}
	mean_fit_time	std_fit_time	mean_score_time	std_score_time	param_n_estimators	param_min_samples_split	param_max_leaf_nodes	param_max_features	param_max_depth	params	split0_test_score	split1_test_score	split2_test_score	split3_test_score	split4_test_score	mean_test_score	std_test_score	rank_test_score
0	0.049196		0.002071		0.004074		0.000820		50					20						5	4	15	{'n_estimators': 50, 'min_samples_split': 20, ...	0.662973	0.705533	0.669520	0.702608	0.696280	0.687383	0.017637	7
1	0.041115		0.000554		0.003046		0.000094		50					50						2	3	10	{'n_estimators': 50, 'min_samples_split': 50, ...	0.490984	0.527231	0.426270	0.523086	0.511513	0.495817	0.036978	20
2	0.043325		0.000779		0.003486		0.000447		50					50						2	5	20	{'n_estimators': 50, 'min_samples_split': 50, ...	0.484524	0.559358	0.485459	0.517253	0.560343	0.521388	0.033545	18
3	0.162083		0.005665		0.012420		0.004788		200					5						3	3	20	{'n_estimators': 200, 'min_samples_split': 5, ...	0.586586	0.638341	0.573437	0.626793	0.636608	0.612353	0.027021	14
4	0.166659		0.003026		0.010958		0.000084		200					10						4	3	15	{'n_estimators': 200, 'min_samples_split': 10,...	0.633305	0.679161	0.623236	0.661864	0.670481	0.653609	0.021636	13</pre></div>



<p>The table shows the first five tested configurations along with their cross-validation scores; the full results_df contains all 20 candidates.</p>



<h3 class="wp-block-heading" id="h-step-5-select-the-best-model-and-measure-performance"><strong>Step #5 Select the best Model and Measure Performance</strong></h3>



<p>Finally, we retrieve the best model from the search results via the &#8220;best_estimator_&#8221; attribute. We then calculate the MAE and the MAPE to understand how the model performs on the overall test dataset, and print a comparison between actual and predicted sale prices.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Select the best Model and Measure Performance
best_model = grid_results.best_estimator_
y_pred = best_model.predict(x_test)
y_df = pd.DataFrame(y_test)
y_df['PredictedPrice']=y_pred
y_df.head()</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">	SalePrice	PredictedPrice
529	200624		166037.831002
491	133000		135860.757958
459	110000		123030.336177
279	192000		206488.444327
655	88000		130453.604206</pre></div>



<p>Next, let&#8217;s take a look at the prediction errors.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Mean Absolute Error (MAE)
MAE = mean_absolute_error(y_test, y_pred)
print('Mean Absolute Error (MAE): ' + str(np.round(MAE, 2)))

# Mean Absolute Percentage Error (MAPE)
MAPE = mean_absolute_percentage_error(y_test, y_pred)
print('Mean Absolute Percentage Error (MAPE): ' + str(np.round(MAPE*100, 2)) + ' %')</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">Mean Absolute Error (MAE): 29591.56 
Mean Absolute Percentage Error (MAPE): 15.57 %</pre></div>



<p>On average, the model&#8217;s predictions deviate from the actual sale price by about 16%. Considering that we used only a fraction of the available features and defined a small search space, there is much room for improvement.</p>



<h2 class="wp-block-heading" id="h-summary">Summary</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>This article has shown how to use random search in Python to efficiently search for a good hyperparameter configuration of a machine learning model. In the conceptual part, you learned about hyperparameters and how random search samples configurations from a predefined parameter grid instead of exhaustively trying all permutations. The second part was a hands-on Python tutorial, in which you used random search to tune the hyperparameters of a regression model. We worked with a house price dataset and trained a random decision forest regressor that predicts the sale price of houses from several characteristics. We then defined parameter ranges and tested random permutations. In this way, we quickly identified a configuration that outperforms our initial baseline model.</p>



<p>Remember that random search efficiently identifies a well-performing model but does not necessarily return the best-performing one. Random search can be used to tune the hyperparameters of both regression and classification models.</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<p></p>



<h2 class="wp-block-heading">Sources and Further Reading</h2>



<p>I hope this article was helpful. If you have any questions or suggestions, please write them in the comments. </p>



<div style="display: inline-block;">
  <iframe sandbox="allow-popups allow-scripts allow-modals allow-forms allow-same-origin" style="width:120px;height:240px;" marginwidth="0" marginheight="0" scrolling="no" frameborder="0" src="//ws-eu.amazon-adsystem.com/widgets/q?ServiceVersion=20070822&amp;OneJS=1&amp;Operation=GetAdHtml&amp;MarketPlace=DE&amp;source=ss&amp;ref=as_ss_li_til&amp;ad_type=product_link&amp;tracking_id=flo7up-21&amp;language=de_DE&amp;marketplace=amazon&amp;region=DE&amp;placement=3030181162&amp;asins=3030181162&amp;linkId=669e46025028259138fbb5ccec12dfbe&amp;show_border=true&amp;link_opens_in_new_window=true"></iframe>
<iframe sandbox="allow-popups allow-scripts allow-modals allow-forms allow-same-origin" style="width:120px;height:240px;" marginwidth="0" marginheight="0" scrolling="no" frameborder="0" src="//ws-eu.amazon-adsystem.com/widgets/q?ServiceVersion=20070822&amp;OneJS=1&amp;Operation=GetAdHtml&amp;MarketPlace=DE&amp;source=ss&amp;ref=as_ss_li_til&amp;ad_type=product_link&amp;tracking_id=flo7up-21&amp;language=de_DE&amp;marketplace=amazon&amp;region=DE&amp;placement=1999579577&amp;asins=1999579577&amp;linkId=91d862698bf9010ff4c09539e4c49bf4&amp;show_border=true&amp;link_opens_in_new_window=true"></iframe>
<iframe sandbox="allow-popups allow-scripts allow-modals allow-forms allow-same-origin" style="width:120px;height:240px;" marginwidth="0" marginheight="0" scrolling="no" frameborder="0" src="//ws-eu.amazon-adsystem.com/widgets/q?ServiceVersion=20070822&amp;OneJS=1&amp;Operation=GetAdHtml&amp;MarketPlace=DE&amp;source=ss&amp;ref=as_ss_li_til&amp;ad_type=product_link&amp;tracking_id=flo7up-21&amp;language=de_DE&amp;marketplace=amazon&amp;region=DE&amp;placement=1839217715&amp;asins=1839217715&amp;linkId=356ba074068849ff54393f527190825d&amp;show_border=true&amp;link_opens_in_new_window=true"></iframe>
<iframe sandbox="allow-popups allow-scripts allow-modals allow-forms allow-same-origin" style="width:120px;height:240px;" marginwidth="0" marginheight="0" scrolling="no" frameborder="0" src="//ws-eu.amazon-adsystem.com/widgets/q?ServiceVersion=20070822&amp;OneJS=1&amp;Operation=GetAdHtml&amp;MarketPlace=DE&amp;source=ss&amp;ref=as_ss_li_til&amp;ad_type=product_link&amp;tracking_id=flo7up-21&amp;language=de_DE&amp;marketplace=amazon&amp;region=DE&amp;placement=1492032646&amp;asins=1492032646&amp;linkId=2214804dd039e7103577abd08722abac&amp;show_border=true&amp;link_opens_in_new_window=true"></iframe>
</div>



<p class="has-contrast-2-color has-base-3-background-color has-text-color has-background"><em>The links above to Amazon are affiliate links. By buying through these links, you support the Relataly.com blog and help to cover the hosting costs. Using the links does not affect the price.</em></p>
<p>The post <a href="https://www.relataly.com/using-random-search-to-tune-the-hyperparameters-of-a-random-decision-forest-with-python/6875/">Using Random Search to Tune the Hyperparameters of a Random Decision Forest with Python</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.relataly.com/using-random-search-to-tune-the-hyperparameters-of-a-random-decision-forest-with-python/6875/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">6875</post-id>	</item>
		<item>
		<title>Stock Market Forecasting Neural Networks for Multi-Output Regression in Python</title>
		<link>https://www.relataly.com/stock-price-prediction-multi-output-regression-using-neural-networks-in-python/5800/</link>
					<comments>https://www.relataly.com/stock-price-prediction-multi-output-regression-using-neural-networks-in-python/5800/#comments</comments>
		
		<dc:creator><![CDATA[Florian Follonier]]></dc:creator>
		<pubDate>Tue, 13 Jul 2021 21:10:23 +0000</pubDate>
				<category><![CDATA[Finance]]></category>
		<category><![CDATA[Keras]]></category>
		<category><![CDATA[Neural Networks]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Recurrent Neural Networks]]></category>
		<category><![CDATA[Scikit-Learn]]></category>
		<category><![CDATA[Seaborn]]></category>
		<category><![CDATA[Stock Market Forecasting]]></category>
		<category><![CDATA[Time Series Forecasting]]></category>
		<category><![CDATA[Yahoo Finance API]]></category>
		<category><![CDATA[AI in Finance]]></category>
		<category><![CDATA[Deep Learning]]></category>
		<category><![CDATA[Intermediate Tutorials]]></category>
		<category><![CDATA[Multi-output Neural Network]]></category>
		<category><![CDATA[Multi-Step Time Series Forecasting]]></category>
		<category><![CDATA[Multivariate Models]]></category>
		<category><![CDATA[Stock Market Prediction]]></category>
		<category><![CDATA[Supervised Learning]]></category>
		<guid isPermaLink="false">https://www.relataly.com/?p=5800</guid>

					<description><![CDATA[<p>Multi-output time series regression can forecast several steps of a time series at once. The number of neurons in the final output layer determines how many steps the model can predict. Models with one output return single-step forecasts. Models with various outputs can return entire series of time steps and thus deliver a more detailed ... <a title="Stock Market Forecasting Neural Networks for Multi-Output Regression in Python" class="read-more" href="https://www.relataly.com/stock-price-prediction-multi-output-regression-using-neural-networks-in-python/5800/" aria-label="Read more about Stock Market Forecasting Neural Networks for Multi-Output Regression in Python">Read more</a></p>
<p>The post <a href="https://www.relataly.com/stock-price-prediction-multi-output-regression-using-neural-networks-in-python/5800/">Stock Market Forecasting Neural Networks for Multi-Output Regression in Python</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>Multi-output time series regression can forecast several steps of a time series at once. The number of neurons in the final output layer determines how many steps the model can predict. Models with one output return single-step forecasts. Models with multiple outputs can return entire series of time steps and thus deliver a more detailed projection of how a time series could develop in the future. This article is a hands-on Python tutorial that shows how to design a neural network architecture with multiple outputs. The goal is to create a multi-output model for stock-price forecasting using Python and Keras. The same approach can be applied to other time series forecasting tasks, such as weather or sales forecasting.</p>



<p>This article proceeds as follows: We briefly discuss the architecture of a multi-output neural network. After familiarizing ourselves with the model architecture, we develop a Keras neural network for multi-output regression. For data preparation, we perform various steps, including cleaning, splitting, selecting, and scaling the data. Afterward, we define a model architecture with multiple LSTM layers and ten output neurons in the last layer. This architecture enables the model to generate projections for ten consecutive steps. After configuring the model architecture, we train the model with the historical daily prices of the Apple stock. Finally, we use this model to generate a ten-day forecast.</p>
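<p>The multi-step targets described above can be sketched as a simple sliding-window transformation. The snippet below is a minimal illustration with a synthetic series (not the Apple data): each training sample pairs a window of past values (the model input) with the ten subsequent values, which correspond to the ten output neurons&#8217; targets.</p>

```python
import numpy as np

def make_windows(series, n_input, n_output):
    """Split a 1-D series into (input window, multi-step target) pairs."""
    X, y = [], []
    for i in range(len(series) - n_input - n_output + 1):
        X.append(series[i : i + n_input])
        y.append(series[i + n_input : i + n_input + n_output])
    return np.array(X), np.array(y)

# Synthetic stand-in for a daily price series
prices = np.linspace(100, 199, 100)

X, y = make_windows(prices, n_input=50, n_output=10)
print(X.shape, y.shape)  # (41, 50) and (41, 10): ten target steps per sample
```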



<div class="wp-block-group"><div class="wp-block-group__inner-container is-layout-constrained wp-block-group-is-layout-constrained">
<div class="wp-block-kadence-infobox kt-info-box_317393-a1"><span class="kt-blocks-info-box-link-wrap info-box-link kt-blocks-info-box-media-align-top kt-info-halign-left"><div class="kt-infobox-textcontent"><h2 class="kt-blocks-info-box-title">Disclaimer</h2><p class="kt-blocks-info-box-text">This article does not constitute financial advice. Stock markets can be very volatile and are generally difficult to predict. Predictive models and other forms of analytics applied in this article only serve the purpose of illustrating machine learning use cases.</p></div></span></div>
</div></div>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<div style="height:5px" aria-hidden="true" class="wp-block-spacer"></div>



<h2 class="wp-block-heading" id="h-multi-output-regression-vs-single-output-regression">Multi-Output Regression vs. Single-Output Regression</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>In time series regression, we train a statistical model on the past values of a time series to make statements about how the time series develops further. During model training, we feed the model with so-called mini-batches and the corresponding target values. The model then creates forecasts for all input batches and compares these predictions to the actual target values to calculate the residuals (prediction errors). In this way, the model can adjust its parameters iteratively and learn to make better predictions.</p>
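<p>The loop of predicting, computing residuals, and adjusting parameters can be sketched in a few lines. In the illustration below, a linear model stands in for the neural network to keep the sketch short; the data and learning rate are arbitrary choices.</p>

```python
import numpy as np

# Minimal sketch of the training loop described above, with a linear model
# standing in for the neural network: predict on a mini-batch, compute the
# residuals (prediction errors), and adjust the parameters accordingly.
rng = np.random.default_rng(1)
X = rng.normal(size=(32, 3))             # one mini-batch of 32 samples
true_w = np.array([2.0, -1.0, 0.5])      # the relationship we want to learn
y = X @ true_w                           # target values

w = np.zeros(3)                          # model parameters, initially zero
for _ in range(300):
    residuals = X @ w - y                # prediction errors on the batch
    w -= 0.1 * X.T @ residuals / len(X)  # gradient step on the squared error

print(np.round(w, 2))                    # parameters approach true_w
```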



<p>Multivariate forecasting models take into account multiple input variables, such as historical time series data and additional features like moving averages or momentum indicators, to improve the accuracy of their predictions. The idea is that these various variables can help the model identify patterns in the data that suggest future price movements.</p>
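<p>Such additional features can be derived directly from the price series itself. The sketch below is a minimal illustration with a synthetic series (not real market data) showing how a moving average and a simple momentum indicator could be computed with pandas; the window lengths are arbitrary choices.</p>

```python
import numpy as np
import pandas as pd

# Synthetic daily closing prices as a stand-in for real market data
rng = np.random.default_rng(0)
close = pd.Series(100 + np.cumsum(rng.normal(size=60)), name='close')

# Derive additional input features from the raw series
features = pd.DataFrame({
    'close': close,
    'sma_10': close.rolling(window=10).mean(),  # 10-day simple moving average
    'momentum_5': close.diff(5),                # 5-day price change
})
print(features.dropna().head())
```

Rows without a full look-back window are dropped before such features are fed to a model.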
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"><div class="wp-block-image">
<figure class="alignright size-large is-resized"><img decoding="async" data-attachment-id="7569" data-permalink="https://www.relataly.com/stock-price-prediction-multi-output-regression-using-neural-networks-in-python/5800/multi-output-neural-networks-time-series-regression-architecture/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/04/multi-output-neural-networks-time-series-regression-architecture.png" data-orig-size="2017,1342" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="multi-output-neural-networks-time-series-regression-architecture" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/04/multi-output-neural-networks-time-series-regression-architecture.png" src="https://www.relataly.com/wp-content/uploads/2022/04/multi-output-neural-networks-time-series-regression-architecture-1024x681.png" alt="An exemplary architecture of a neural network with five input neurons (blue) and four output neurons (red), keras, python, tutorial, stock market prediction" class="wp-image-7569" width="371" height="247" srcset="https://www.relataly.com/wp-content/uploads/2022/04/multi-output-neural-networks-time-series-regression-architecture.png 1024w, https://www.relataly.com/wp-content/uploads/2022/04/multi-output-neural-networks-time-series-regression-architecture.png 300w, https://www.relataly.com/wp-content/uploads/2022/04/multi-output-neural-networks-time-series-regression-architecture.png 768w, 
https://www.relataly.com/wp-content/uploads/2022/04/multi-output-neural-networks-time-series-regression-architecture.png 1536w, https://www.relataly.com/wp-content/uploads/2022/04/multi-output-neural-networks-time-series-regression-architecture.png 2017w" sizes="(max-width: 371px) 100vw, 371px" /><figcaption class="wp-element-caption">An exemplary architecture of a neural network with five input neurons (blue) and four output neurons (red)</figcaption></figure>
</div></div>
</div>



<h2 class="wp-block-heading">The Architecture of a Neural Network with Multiple Outputs</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>Next, we will discuss the architecture of a neural network with multiple outputs. The architecture consists of several layers: an input layer, several hidden layers, and an output layer. The number of neurons in the input layer must match the shape of the input data, and the number of neurons in the output layer determines how many future time steps the model predicts. </p>



<p>Models with a single neuron in the output layer predict a single time step. It is still possible to predict multiple price steps with a single-output model, but this requires a <a href="https://www.relataly.com/multi-step-time-series-forecasting-a-step-by-step-guide/275/" target="_blank" rel="noreferrer noopener">rolling forecasting approach</a> in which the outputs are iteratively fed back into the input to make further-reaching predictions. However, this approach is cumbersome, and prediction errors can accumulate with each step. A more elegant way is to train a multi-output model right away.</p>
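<p>The rolling approach can be sketched as follows; <code>predict_one_step</code> is a hypothetical stand-in for a trained single-output model, not part of this tutorial's code:</p>

```python
import numpy as np

def predict_one_step(window):
    # Hypothetical stand-in for a trained single-output model;
    # here it simply returns the mean of the window (illustration only)
    return float(np.mean(window))

history = [1.0, 2.0, 3.0, 4.0, 5.0]
window_size = 3
horizon = 4  # how many future steps we want

window = list(history[-window_size:])
forecast = []
for _ in range(horizon):
    next_value = predict_one_step(window)
    forecast.append(next_value)
    # feed the prediction back in to reach one step further
    window = window[1:] + [next_value]

print(forecast)
```

Because each step consumes the previous prediction, any error made early on propagates into all later steps.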



<figure class="wp-block-image size-large is-resized is-style-default"><img decoding="async" data-attachment-id="7586" data-permalink="https://www.relataly.com/stock-price-prediction-multi-output-regression-using-neural-networks-in-python/5800/architecture-neural-network-multi-output-regression-model/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/04/architecture-neural-network-multi-output-regression-model.png" data-orig-size="3395,1503" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="architecture-neural-network-multi-output-regression-model" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/04/architecture-neural-network-multi-output-regression-model.png" src="https://www.relataly.com/wp-content/uploads/2022/04/architecture-neural-network-multi-output-regression-model-1024x453.png" alt="The inputs and outputs of a neural network for time series regression with five input neurons and four outputs. 
Stock market forecasting" class="wp-image-7586" width="755" height="334" srcset="https://www.relataly.com/wp-content/uploads/2022/04/architecture-neural-network-multi-output-regression-model.png 1024w, https://www.relataly.com/wp-content/uploads/2022/04/architecture-neural-network-multi-output-regression-model.png 300w, https://www.relataly.com/wp-content/uploads/2022/04/architecture-neural-network-multi-output-regression-model.png 768w, https://www.relataly.com/wp-content/uploads/2022/04/architecture-neural-network-multi-output-regression-model.png 1536w, https://www.relataly.com/wp-content/uploads/2022/04/architecture-neural-network-multi-output-regression-model.png 2048w, https://www.relataly.com/wp-content/uploads/2022/04/architecture-neural-network-multi-output-regression-model.png 2475w" sizes="(max-width: 755px) 100vw, 755px" /><figcaption class="wp-element-caption">The inputs and outputs of a neural network for time series regression with five input neurons and four outputs</figcaption></figure>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<p> </p>



<h2 class="wp-block-heading">Training Neural Networks with Multiple Outputs</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>A model with multiple neurons in the output layer predicts several future steps in a single pass. Multi-output regression models train on sequences of past values, each paired with the consecutive output sequence. The model architecture thus contains multiple neurons in the input layer and multiple neurons in the output layer (as illustrated). </p>



<p>In a multi-output regression model, each neuron in the output layer is responsible for predicting a different time step in the future. To train such a model, you need to provide a sequence of input data followed by the corresponding sequence of output data. For example, if you want to predict the stock price for the next ten days, you would provide a sequence of input data containing the historical stock prices for the past 50 days, followed by a sequence of output data containing the stock prices for the next 10 days.</p>
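<p>The pairing of 50-step input sequences with 10-step output sequences can be sketched on a toy univariate series (the hands-on section below builds the full multivariate version):</p>

```python
import numpy as np

series = np.arange(100, dtype=float)  # toy price series with 100 steps
input_len, output_len = 50, 10

X, y = [], []
for i in range(input_len, len(series) - output_len + 1):
    X.append(series[i - input_len:i])    # 50 past values as model input
    y.append(series[i:i + output_len])   # 10 future values as targets
X, y = np.array(X), np.array(y)

print(X.shape, y.shape)  # (41, 50) (41, 10)
```

Each row of X is a sliding window shifted by one step, and the matching row of y contains the ten values that immediately follow it.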



<p>The model will then learn to map the input sequence to the output sequence so that it can make predictions for multiple time steps in the future based on the input data. </p>



<p>In the next part of this tutorial, we will walk through the process of developing a multi-output regression model in more detail.</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<h2 class="wp-block-heading has-contrast-color has-text-color" id="h-implementing-a-neural-network-model-for-multi-output-multi-step-regression-in-python">Implementing a Neural Network Model for Multi-Output Multi-Step Regression in Python</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>Let&#8217;s get started with the hands-on Python part. In the following, we will develop a neural network with Keras and TensorFlow that forecasts the Apple stock price. Most of the effort goes into preparing the data and bringing it into the right shape for a neural network with multiple outputs. Broadly, this involves the following steps:</p>



<ol class="wp-block-list">
<li>Load the time series data that we want to use as input and output for our model. We use historical price data that is available via the Yahoo Finance API.</li>



<li>Then we split our data into training and testing sets. We will use the training set to fit the model and the testing set to evaluate the model&#8217;s performance.</li>



<li>Preprocess the data: This includes scaling the data and selecting relevant features.</li>



<li>Reshape the data into a format that can be fed into the neural network. For time series data, this involves converting the data into a 3D array.</li>



<li>Finally, we will train our model and generate the forecast.</li>
</ol>



<p>The code is available on the GitHub repository.</p>



<div class="wp-block-kadence-advancedbtn kb-buttons-wrap kb-btns_e5fd46-d1"><a class="kb-button kt-button button kb-btn_25b4a6-dd kt-btn-size-standard kt-btn-width-type-full kb-btn-global-inherit  kt-btn-has-text-true kt-btn-has-svg-true  wp-block-button__link wp-block-kadence-singlebtn" href="https://github.com/flo7up/relataly-public-python-tutorials/blob/master/01%20Time%20Series%20Forecasting%20%26%20Regression/006%20Multi-Output%20Regression.ipynb" target="_blank" rel="noreferrer noopener"><span class="kb-svg-icon-wrap kb-svg-icon-fe_eye kt-btn-icon-side-left"><svg viewBox="0 0 24 24"  fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"  aria-hidden="true"><path d="M1 12s4-8 11-8 11 8 11 8-4 8-11 8-11-8-11-8z"/><circle cx="12" cy="12" r="3"/></svg></span><span class="kt-btn-inner-text">View on GitHub </span></a>

<a class="kb-button kt-button button kb-btn_3ee95e-9c kt-btn-size-standard kt-btn-width-type-full kb-btn-global-inherit  kt-btn-has-text-true kt-btn-has-svg-true  wp-block-button__link wp-block-kadence-singlebtn" href="https://github.com/flo7up/relataly-public-python-API-tutorials" target="_blank" rel="noreferrer noopener"><span class="kb-svg-icon-wrap kb-svg-icon-fa_github kt-btn-icon-side-left"><svg viewBox="0 0 496 512"  fill="currentColor" xmlns="http://www.w3.org/2000/svg"  aria-hidden="true"><path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"/></svg></span><span class="kt-btn-inner-text">Relataly GitHub Repo </span></a></div>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-large"><img decoding="async" width="512" height="337" data-attachment-id="12797" data-permalink="https://www.relataly.com/stock-price-prediction-multi-output-regression-using-neural-networks-in-python/5800/multiple_waterfalls_coming_from_a_single_waterfall/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/03/multiple_waterfalls_coming_from_a_single_waterfall.png" data-orig-size="768,505" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="multiple_waterfalls_coming_from_a_single_waterfall" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/03/multiple_waterfalls_coming_from_a_single_waterfall.png" src="https://www.relataly.com/wp-content/uploads/2023/03/multiple_waterfalls_coming_from_a_single_waterfall-512x337.png" alt="Neural network architectures with multiple outputs allow for more potent solutions but are more complex to train. Image created with Midjourney. 
Stock market forecasting, multi-output multi-step  regression, python" class="wp-image-12797" srcset="https://www.relataly.com/wp-content/uploads/2023/03/multiple_waterfalls_coming_from_a_single_waterfall.png 512w, https://www.relataly.com/wp-content/uploads/2023/03/multiple_waterfalls_coming_from_a_single_waterfall.png 300w, https://www.relataly.com/wp-content/uploads/2023/03/multiple_waterfalls_coming_from_a_single_waterfall.png 768w" sizes="(max-width: 512px) 100vw, 512px" /><figcaption class="wp-element-caption">Neural network architectures with multiple outputs allow for more potent solutions but are more complex to train. Image created with <a href="http://www.midjourney.com" target="_blank" rel="noreferrer noopener">Midjourney</a>.</figcaption></figure>



<p></p>
</div>
</div>



<h3 class="wp-block-heading" id="h-prerequisites">Prerequisites</h3>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>Before beginning the coding part, ensure that you have set up your Python 3 environment and required packages. If you don&#8217;t have a Python environment, consider <a href="https://www.anaconda.com/products/individual" target="_blank" rel="noreferrer noopener">Anaconda</a>. To set it up, you can follow the steps in&nbsp;<a href="https://www.relataly.com/category/data-science/setup-anaconda-environment/" target="_blank" rel="noreferrer noopener">this tutorial</a>.</p>



<p>Also, make sure you install all required packages. In this tutorial, we will be working with the following standard packages:&nbsp;</p>



<ul class="wp-block-list">
<li><em><a href="https://pandas.pydata.org/" target="_blank" rel="noreferrer noopener">pandas</a></em></li>



<li><em><a href="https://numpy.org/" target="_blank" rel="noreferrer noopener">NumPy</a></em></li>



<li><em><a href="https://matplotlib.org/" target="_blank" rel="noreferrer noopener">matplotlib</a></em></li>
</ul>



<p>In addition, we will be using the machine learning libraries Keras, Scikit-learn, and TensorFlow. For visualization, we will be using the Seaborn package.</p>



<p>Please also have either the <a href="https://pandas-datareader.readthedocs.io/en/latest/" target="_blank" rel="noreferrer noopener">pandas_datareader</a> or the <a href="https://pypi.org/project/yfinance/" target="_blank" rel="noreferrer noopener">yfinance</a> package installed. You will use one of these packages to retrieve the historical stock quotes.</p>



<p>You can install these packages using console commands:</p>



<ul class="wp-block-list">
<li><em>pip install &lt;package name&gt;</em></li>



<li><em>conda install &lt;package name&gt;</em>&nbsp;(if you are using the anaconda packet manager)</li>
</ul>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<h3 class="wp-block-heading" id="h-step-1-load-the-data">Step #1: Load the Data</h3>



<p>The Pandas DataReader library is our first choice for interacting with the Yahoo Finance API. If the library causes problems (it sometimes does), you can also use the yfinance package, which returns the same data. We begin by loading historical price quotes of the Apple stock from the public Yahoo Finance API. Running the code below loads the data into a Pandas DataFrame.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># import pandas_datareader as webreader # Remote data access for pandas
import math # Mathematical functions 
import numpy as np # Fundamental package for scientific computing with Python
import pandas as pd # Additional functions for analysing and manipulating data
from datetime import date, timedelta, datetime # Date Functions
from pandas.plotting import register_matplotlib_converters # This function adds plotting functions for calendar dates
import matplotlib.pyplot as plt # Important package for visualization - we use this to plot the market data
import matplotlib.dates as mdates # Formatting dates
from sklearn.metrics import mean_absolute_error, mean_squared_error # Packages for measuring model performance / errors
from keras.models import Sequential # Deep learning library, used for neural networks
from keras.layers import LSTM, Dense, Dropout # Deep learning classes for recurrent and regular densely-connected layers
from keras.callbacks import EarlyStopping # EarlyStopping during model training
from sklearn.preprocessing import RobustScaler, MinMaxScaler # This Scaler removes the median and scales the data according to the quantile range to normalize the price data 
import seaborn as sns

# from pandas_datareader.nasdaq_trader import get_nasdaq_symbols
# symbols = get_nasdaq_symbols()

# Setting the timeframe for the data extraction
today = date.today()
date_today = today.strftime(&quot;%Y-%m-%d&quot;)
date_start = '2010-01-01'

# Getting NASDAQ quotes
stockname = 'Apple'
symbol = 'AAPL'
# df = webreader.DataReader(
#     symbol, start=date_start, end=date_today, data_source=&quot;yahoo&quot;
# )

import yfinance as yf #Alternative package if webreader does not work: pip install yfinance
df = yf.download(symbol, start=date_start, end=date_today)

# Create a quick overview of the dataset
df.head()</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">Tensorflow Version: 2.6.0
Num GPUs: 1
[*********************100%***********************]  1 of 1 completed
			Open		High		Low			Close		Adj Close	Volume
Date						
2010-01-04	7.622500	7.660714	7.585000	7.643214	6.515213	493729600
2010-01-05	7.664286	7.699643	7.616071	7.656429	6.526477	601904800
2010-01-06	7.656429	7.686786	7.526786	7.534643	6.422666	552160000
2010-01-07	7.562500	7.571429	7.466071	7.520714	6.410791	477131200
2010-01-08	7.510714	7.571429	7.466429	7.570714	6.453413	447610800</pre></div>



<p>The data should comprise the following columns:</p>



<ul class="wp-block-list">
<li>Close</li>



<li>Open</li>



<li>High</li>



<li>Low</li>



<li>Adj Close</li>



<li>Volume</li>
</ul>



<p>The target variable that we are trying to predict is the Closing price (Close).</p>



<h3 class="wp-block-heading">Step #2: Explore the Data</h3>



<p>Once we have loaded the data, we print a quick overview of the time series using line graphs. The following code plots a line chart for each column in <code>df_plot</code> using the <code>seaborn</code> library. The charts are arranged in a grid with <code>nrows</code> rows and <code>ncols</code> columns. Setting <code>sharex=True</code> makes the subplots share their x-axes, and the <code>figsize</code> parameter determines the size of the figure in inches.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Plot line charts
df_plot = df.copy()

ncols = 2
nrows = int(round(df_plot.shape[1] / ncols, 0))

fig, ax = plt.subplots(nrows=nrows, ncols=ncols, sharex=True, figsize=(14, 7))
for i, ax in enumerate(fig.axes):
        sns.lineplot(data = df_plot.iloc[:, i], ax=ax)
        ax.tick_params(axis=&quot;x&quot;, rotation=30, labelsize=10, length=0)
        ax.xaxis.set_major_locator(mdates.AutoDateLocator())
fig.tight_layout()
plt.show()</pre></div>



<figure class="wp-block-image size-large is-resized"><img decoding="async" data-attachment-id="5805" data-permalink="https://www.relataly.com/stock-price-prediction-multi-output-regression-using-neural-networks-in-python/5800/image-5-14/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2021/07/image-5.png" data-orig-size="999,496" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-5" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2021/07/image-5.png" src="https://www.relataly.com/wp-content/uploads/2021/07/image-5.png" alt="The Apple Stock's historical price data, including quotes, highs, lows, and volume" class="wp-image-5805" width="817" height="406" srcset="https://www.relataly.com/wp-content/uploads/2021/07/image-5.png 999w, https://www.relataly.com/wp-content/uploads/2021/07/image-5.png 300w, https://www.relataly.com/wp-content/uploads/2021/07/image-5.png 768w" sizes="(max-width: 817px) 100vw, 817px" /></figure>



<p>The line plots look as expected and reflect the Apple stock price history. Because we fetch daily data from an API, note that the line plots will look different depending on when you run the code. </p>



<h3 class="wp-block-heading" id="h-step-3-preprocess-the-data">Step #3: Preprocess the Data</h3>



<p>Next, we prepare the data for the training process of our multi-output forecasting model. Preparing the data for multivariate forecasting involves several steps: </p>



<ul class="wp-block-list">
<li>Selecting features for model training</li>



<li>Scaling and splitting the data into separate sets for training and testing</li>



<li>Slicing the time series into several shifted training batches</li>
</ul>



<p>Remember that these steps are specific to our data and use case. The preparation required for a neural network with multiple outputs in time series forecasting depends on the characteristics of your data and the requirements of your model. Consider these factors carefully and tailor your data preparation accordingly.</p>



<h4 class="wp-block-heading">3.1 Basic Preparations</h4>



<p>We begin by creating a copy of the initial data and resetting the index.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Indexing Batches
df_train = df.sort_values(by=['Date']).copy()

# We save a copy of the date index before we reset it to numbers
date_index = df_train.index

# We reset the index, so we can convert the date-index to a number-index
df_train = df_train.reset_index(drop=True).copy()
df_train.head(5)</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">			Open		High		Low			Close		Adj Close	Volume
Date						
2022-11-29	144.289993	144.809998	140.350006	141.169998	141.169998	83763800
2022-11-30	141.399994	148.720001	140.550003	148.029999	148.029999	111224400
2022-12-01	148.210007	149.130005	146.610001	148.309998	148.309998	71250400
2022-12-02	145.960007	148.000000	145.649994	147.809998	147.809998	65421400
2022-12-05	147.770004	150.919998	145.770004	146.630005	146.630005	68732400</pre></div>



<h4 class="wp-block-heading" id="h-3-2-feature-selection-and-scaling">3.2 Feature Selection and Scaling</h4>



<p>We proceed with feature selection. To keep things simple, we will use the features from the input data without any modifications. After selecting the features, we scale them to a range between 0 and 1. To ease unscaling the predictions after training, we create two different scalers: One for the training data, which takes five columns, and one for the output data that scales a single column (the Close Price). I have covered <a href="https://www.relataly.com/feature-engineering-for-multivariate-time-series-models-with-python/1813/" target="_blank" rel="noreferrer noopener">feature engineering in a separate article</a> if you want to learn more about this topic.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}">def prepare_data(df):

    # List of considered Features
    FEATURES = ['Open', 'High', 'Low', 'Close', 'Volume']

    print('FEATURE LIST')
    print([f for f in FEATURES])

    # Create the dataset with features and filter the data to the list of FEATURES
    df_filter = df[FEATURES]
    
    # Convert the data to numpy values
    np_filter_unscaled = np.array(df_filter)
    #np_filter_unscaled = np.reshape(np_unscaled, (df_filter.shape[0], -1))
    print(np_filter_unscaled.shape)

    np_c_unscaled = np.array(df['Close']).reshape(-1, 1)
    
    return np_filter_unscaled, np_c_unscaled, df_filter
    
np_filter_unscaled, np_c_unscaled, df_filter = prepare_data(df_train)
                                          
# Creating a separate scaler that works on a single column for scaling predictions
# Scale each feature to a range between 0 and 1
scaler_train = MinMaxScaler()
np_scaled = scaler_train.fit_transform(np_filter_unscaled)
    
# Create a separate scaler for a single column
scaler_pred = MinMaxScaler()
np_scaled_c = scaler_pred.fit_transform(np_c_unscaled)   </pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">FEATURE LIST
['Open', 'High', 'Low', 'Close', 'Volume']
(3254, 5)</pre></div>



<p>The final step of the data preparation is to create the structure for the input data. This structure needs to match the input layer of the model architecture.</p>
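<p>As a minimal illustration (the shape values here are arbitrary), a recurrent input layer in Keras expects batches shaped (samples, time steps, features), which we can mimic with NumPy:</p>

```python
import numpy as np

# Toy batch: 8 samples, each a window of 50 time steps with 5 features
samples, time_steps, features = 8, 50, 5
x_batch = np.zeros((samples, time_steps, features))

print(x_batch.shape)  # (8, 50, 5)
```

The slicing step in the next section produces exactly this kind of 3D array from the scaled price data.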



<h4 class="wp-block-heading">3.3 Slicing the Data for a Model with Multiple In- and Outputs</h4>



<p>The code below starts a sliding window process that cuts the initial time series data into multiple slices, i.e., mini-batches. Each batch is a smaller fraction of the initial time series shifted by a single step. Because we will feed our model with multivariate input data, the time series consists of five input columns/features. Each batch comprises a period of 50 steps from the time series and an output sequence of ten consecutive values. To validate that the batches have the right shape, we visualize mini-batches in a line graph with their consecutive target values. </p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Set the input_sequence_length length - this is the timeframe used to make a single prediction
input_sequence_length = 50
# The output sequence length is the number of steps that the neural network predicts
output_sequence_length = 10

# Prediction Index
index_Close = df_train.columns.get_loc(&quot;Close&quot;)

# Split the data into train and test data sets
# As a first step, we get the number of rows to train the model on 80% of the data 
train_data_length = math.ceil(np_scaled.shape[0] * 0.8)

# Create the training and test data
train_data = np_scaled[:train_data_length, :]
test_data = np_scaled[train_data_length - input_sequence_length:, :]

# The RNN needs data with the format of [samples, time steps, features]
# Here, we create N samples, input_sequence_length time steps per sample, and f features
def partition_dataset(input_sequence_length, output_sequence_length, data):
    x, y = [], []
    data_len = data.shape[0]
    for i in range(input_sequence_length, data_len - output_sequence_length):
        x.append(data[i-input_sequence_length:i,:]) # the input_sequence_length preceding rows across all columns
        y.append(data[i:i + output_sequence_length, index_Close]) # the next output_sequence_length values of the Close column as prediction targets
    
    # Convert the x and y to numpy arrays
    x = np.array(x)
    y = np.array(y)
    return x, y

# Generate training data and test data
x_train, y_train = partition_dataset(input_sequence_length, output_sequence_length, train_data)
x_test, y_test = partition_dataset(input_sequence_length, output_sequence_length, test_data)

# Print the shapes: the result is: (rows, training_sequence, features) (prediction value, )
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)

# Validate that the prediction value and the input match up
# The last close price of the second input sample should equal the first prediction value
nrows = 3 # number of shifted plots
fig, ax = plt.subplots(nrows=nrows, ncols=1, figsize=(16, 8))
for i, ax in enumerate(fig.axes):
    xtrain = pd.DataFrame(x_train[i][:,index_Close], columns={f'x_train_{i}'})
    ytrain = pd.DataFrame(y_train[i][:output_sequence_length-1], columns={f'y_train_{i}'})
    ytrain.index = np.arange(input_sequence_length, input_sequence_length + output_sequence_length-1)
    xtrain_ = pd.concat([xtrain, ytrain[:1].rename(columns={ytrain.columns[0]:xtrain.columns[0]})])
    df_merge = pd.concat([xtrain_, ytrain])
    sns.lineplot(data = df_merge, ax=ax)
plt.show()</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">(2544, 50, 5) (2544, 10)
(640, 50, 5) (640, 10)</pre></div>



<figure class="wp-block-image size-full is-resized"><img decoding="async" data-attachment-id="8670" data-permalink="https://www.relataly.com/stock-price-prediction-multi-output-regression-using-neural-networks-in-python/5800/batch-test-visualizations-deep-neural-networks-for-multi-output-stock-market-forecasting/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/05/batch-test-visualizations-deep-neural-networks-for-multi-output-stock-market-forecasting.png" data-orig-size="939,465" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="batch-test-visualizations-deep-neural-networks-for-multi-output-stock-market-forecasting" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/05/batch-test-visualizations-deep-neural-networks-for-multi-output-stock-market-forecasting.png" src="https://www.relataly.com/wp-content/uploads/2022/05/batch-test-visualizations-deep-neural-networks-for-multi-output-stock-market-forecasting.png" alt="batch test visualizations, deep neural networks for multi-output stock market forecasting" class="wp-image-8670" width="891" height="441" srcset="https://www.relataly.com/wp-content/uploads/2022/05/batch-test-visualizations-deep-neural-networks-for-multi-output-stock-market-forecasting.png 939w, https://www.relataly.com/wp-content/uploads/2022/05/batch-test-visualizations-deep-neural-networks-for-multi-output-stock-market-forecasting.png 300w, 
https://www.relataly.com/wp-content/uploads/2022/05/batch-test-visualizations-deep-neural-networks-for-multi-output-stock-market-forecasting.png 768w" sizes="(max-width: 891px) 100vw, 891px" /></figure>
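<p>The windowing logic can also be checked in isolation. The following standalone sketch applies the same partitioning to a small synthetic array (instead of the scaled stock data) to show how the number of samples and the x/y shapes follow from the sequence lengths; the array values and the target column index are illustrative:</p>

```python
import numpy as np

# Hypothetical stand-in for the scaled dataset: 100 rows, 5 features
data = np.arange(100 * 5, dtype=float).reshape(100, 5)

input_sequence_length = 50   # window fed into the model
output_sequence_length = 10  # steps the model predicts
index_close = 3              # column used as the prediction target (illustrative)

def partition_dataset(input_len, output_len, data, target_col):
    """Cut a 2-D time series into (samples, input_len, features) windows
    and matching (samples, output_len) target sequences."""
    x, y = [], []
    for i in range(input_len, data.shape[0] - output_len):
        x.append(data[i - input_len:i, :])
        y.append(data[i:i + output_len, target_col])
    return np.array(x), np.array(y)

x, y = partition_dataset(input_sequence_length, output_sequence_length, data, index_close)
print(x.shape)  # (40, 50, 5): 100 - 50 - 10 = 40 samples
print(y.shape)  # (40, 10)
```

Each additional row of the source series yields exactly one extra window, which is why the tutorial's 3254-row dataset produces 2544 training and 640 test samples.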



<h3 class="wp-block-heading" id="h-step-4-prepare-the-neural-network-architecture-and-train-the-multi-output-regression-model">Step #4: Prepare the Neural Network Architecture and Train the Multi-Output Regression Model</h3>



<p>Now that the training data is prepared, the next step is to configure the architecture of the multi-output neural network. Because we feed the model multiple input series, the architecture is multivariate and must match the shape of the input training batches. </p>



<h4 class="wp-block-heading" id="h-4-1-configuring-and-training-the-model">4.1 Configuring and Training the Model</h4>



<p>We choose a comparatively simple architecture with only two LSTM layers followed by two dense layers. The first dense layer has 20 neurons, and the second is the output layer with ten neurons. If you wonder how I arrived at the number of neurons in the third layer: I conducted several experiments and found that this number leads to solid results. </p>



<p>To ensure that the architecture matches the structure of our input data, we reuse the variables from the previous code section (n_input_neurons, n_output_neurons). The input sequence length is 50, and the output sequence length (the number of steps we want to predict) is ten.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Configure the neural network model
model = Sequential()
n_output_neurons = output_sequence_length

# Model with n_neurons = inputshape Timestamps, each with x_train.shape[2] variables
n_input_neurons = x_train.shape[1] * x_train.shape[2]
print(n_input_neurons, x_train.shape[1], x_train.shape[2])
model.add(LSTM(n_input_neurons, return_sequences=True, input_shape=(x_train.shape[1], x_train.shape[2]))) 
model.add(LSTM(n_input_neurons, return_sequences=False))
model.add(Dense(20))
model.add(Dense(n_output_neurons))

# Compile the model
model.compile(optimizer='adam', loss='mse')</pre></div>



<p>After configuring the model architecture, we can initiate the training process and illustrate how the loss develops over the training epochs. </p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Training the model
epochs = 5
batch_size = 16
early_stop = EarlyStopping(monitor='loss', patience=5, verbose=1)
history = model.fit(x_train, y_train, 
                    batch_size=batch_size, 
                    epochs=epochs,
                    validation_data=(x_test, y_test)
                   )
                    # To enable early stopping, pass callbacks=[early_stop] to model.fit</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">Epoch 1/5
159/159 [==============================] - 7s 14ms/step - loss: 0.0047 - val_loss: 0.0262
Epoch 2/5
159/159 [==============================] - 2s 11ms/step - loss: 3.6759e-04 - val_loss: 0.0097
Epoch 3/5
159/159 [==============================] - 2s 11ms/step - loss: 1.5222e-04 - val_loss: 0.0056
Epoch 4/5
159/159 [==============================] - 2s 11ms/step - loss: 1.0327e-04 - val_loss: 0.0031
Epoch 5/5
159/159 [==============================] - 2s 11ms/step - loss: 1.1690e-04 - val_loss: 0.0026</pre></div>



<h4 class="wp-block-heading" id="h-4-2-loss-curve">4.2 Loss Curve</h4>



<p>Next, we plot the loss curve, which represents the amount of error between the model&#8217;s predicted values and the actual values in the training data. A lower loss value indicates that the model makes more accurate predictions on the training data. </p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Plot training &amp; validation loss values
fig, ax = plt.subplots(figsize=(10, 5), sharex=True)
plt.plot(history.history[&quot;loss&quot;])
plt.plot(history.history[&quot;val_loss&quot;])
plt.title(&quot;Model loss&quot;)
plt.ylabel(&quot;Loss&quot;)
plt.xlabel(&quot;Epoch&quot;)
ax.xaxis.set_major_locator(plt.MaxNLocator(epochs))
plt.legend([&quot;Train&quot;, &quot;Test&quot;], loc=&quot;upper left&quot;)
plt.grid()
plt.show()</pre></div>



<figure class="wp-block-image size-full is-resized is-style-default"><img decoding="async" data-attachment-id="5856" data-permalink="https://www.relataly.com/stock-price-prediction-multi-output-regression-using-neural-networks-in-python/5800/image-4-16/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2021/08/image-4.png" data-orig-size="628,333" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-4" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2021/08/image-4.png" src="https://www.relataly.com/wp-content/uploads/2021/08/image-4.png" alt="loss curve after training the multi-output neural network, keras, multi-step time series regression" class="wp-image-5856" width="720" height="381" srcset="https://www.relataly.com/wp-content/uploads/2021/08/image-4.png 628w, https://www.relataly.com/wp-content/uploads/2021/08/image-4.png 300w" sizes="(max-width: 720px) 100vw, 720px" /></figure>



<p>As we can see, the loss curve drops quickly during training, which typically means that the model is quickly learning to make accurate predictions.</p>
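
<p>The training cell above defines an EarlyStopping callback but leaves it commented out. When enabled, it stops training once the monitored loss has not improved for a given number of epochs (the patience). The stopping rule itself is simple and can be sketched in plain Python; the function name and the loss values below are illustrative, not Keras API:</p>

```python
def early_stop_epoch(losses, patience=5, min_delta=0.0):
    """Return the 1-based epoch at which training would stop,
    or None if the patience budget is never exhausted."""
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best - min_delta:
            # Loss improved: remember it and reset the patience counter
            best = loss
            wait = 0
        else:
            # No improvement: consume one unit of patience
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Loss improves for three epochs, then plateaus: training stops
# five stagnant epochs later, at epoch 8
print(early_stop_epoch([0.5, 0.3, 0.2, 0.21, 0.22, 0.2, 0.25, 0.2, 0.3, 0.2]))  # → 8
```

In Keras, the same logic runs inside `model.fit` when you pass `callbacks=[early_stop]`, and `restore_best_weights=True` additionally rolls the model back to the epoch with the best monitored value.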



<h3 class="wp-block-heading" id="h-step-5-evaluate-model-performance">Step #5: Evaluate Model Performance</h3>



<p>Now that we have trained the model, we can make forecasts on the test data and use traditional regression metrics such as the MAE, MAPE, or MDAPE to measure the performance of our model. </p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Get the predicted values
y_pred_scaled = model.predict(x_test)

# Unscale the predicted values
y_pred = scaler_pred.inverse_transform(y_pred_scaled)
y_test_unscaled = scaler_pred.inverse_transform(y_test).reshape(-1, output_sequence_length)

# Mean Absolute Error (MAE)
MAE = mean_absolute_error(y_test_unscaled, y_pred)
print(f'Mean Absolute Error (MAE): {np.round(MAE, 2)}')

# Mean Absolute Percentage Error (MAPE)
MAPE = np.mean((np.abs(np.subtract(y_test_unscaled, y_pred)/ y_test_unscaled))) * 100
print(f'Mean Absolute Percentage Error (MAPE): {np.round(MAPE, 2)} %')

# Median Absolute Percentage Error (MDAPE)
MDAPE = np.median((np.abs(np.subtract(y_test_unscaled, y_pred)/ y_test_unscaled)) ) * 100
print(f'Median Absolute Percentage Error (MDAPE): {np.round(MDAPE, 2)} %')


def prepare_df(i, x, y, y_pred_unscaled):
    # Undo the scaling on x, reshape the testset into a one-dimensional array, so that it fits to the pred scaler
    x_test_unscaled_df = pd.DataFrame(scaler_pred.inverse_transform((x[i]))[:,index_Close]).rename(columns={0:'x_test'})
    
    y_test_unscaled_df = []
    # Undo the scaling on y
    if type(y) == np.ndarray:
        y_test_unscaled_df = pd.DataFrame(scaler_pred.inverse_transform(y)[i]).rename(columns={0:'y_test'})

    # Create a dataframe for the y_pred at position i, y_pred is already unscaled
    y_pred_df = pd.DataFrame(y_pred_unscaled[i]).rename(columns={0:'y_pred'})
    return x_test_unscaled_df, y_pred_df, y_test_unscaled_df


def plot_multi_test_forecast(x_test_unscaled_df, y_test_unscaled_df, y_pred_df, title): 
    # Package y_pred_unscaled and y_test_unscaled into a dataframe with columns pred and true   
    if type(y_test_unscaled_df) == pd.core.frame.DataFrame:
        df_merge = y_pred_df.join(y_test_unscaled_df, how='left')
    else:
        df_merge = y_pred_df.copy()
    
    # Merge the dataframes 
    df_merge_ = pd.concat([x_test_unscaled_df, df_merge]).reset_index(drop=True)
    
    # Plot the linecharts
    fig, ax = plt.subplots(figsize=(20, 8))
    plt.title(title, fontsize=12)
    ax.set(ylabel = stockname + &quot;_stock_price_quotes&quot;)
    sns.lineplot(data = df_merge_, linewidth=2.0, ax=ax)

# Creates a linechart for a specific test batch_number and corresponding test predictions
batch_number = 50
x_test_unscaled_df, y_pred_df, y_test_unscaled_df = prepare_df(batch_number, x_test, y_test, y_pred)
title = f&quot;Predictions vs y_test - test batch number {batch_number}&quot;
plot_multi_test_forecast(x_test_unscaled_df, y_test_unscaled_df, y_pred_df, title) </pre></div>



<figure class="wp-block-image size-large is-style-default"><img decoding="async" width="1024" height="419" data-attachment-id="8667" data-permalink="https://www.relataly.com/stock-price-prediction-multi-output-regression-using-neural-networks-in-python/5800/multioutput-regression-deep-neural-networks-test-predictions/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/05/multioutput-regression-deep-neural-networks-test-predictions.png" data-orig-size="1170,479" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="multioutput-regression-deep-neural-networks-test-predictions" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/05/multioutput-regression-deep-neural-networks-test-predictions.png" src="https://www.relataly.com/wp-content/uploads/2022/05/multioutput-regression-deep-neural-networks-test-predictions-1024x419.png" alt="A line chart showing predictions and test data, multioutput regression deep neural networks test predictions" class="wp-image-8667" srcset="https://www.relataly.com/wp-content/uploads/2022/05/multioutput-regression-deep-neural-networks-test-predictions.png 1024w, https://www.relataly.com/wp-content/uploads/2022/05/multioutput-regression-deep-neural-networks-test-predictions.png 300w, https://www.relataly.com/wp-content/uploads/2022/05/multioutput-regression-deep-neural-networks-test-predictions.png 768w, https://www.relataly.com/wp-content/uploads/2022/05/multioutput-regression-deep-neural-networks-test-predictions.png 1170w" sizes="(max-width: 1024px) 100vw, 1024px" 
/></figure>



<p>The quality of the predictions is acceptable, considering that this tutorial aimed not to achieve excellent predictions but to demonstrate the process and architecture of training a multi-output regression. So, there is certainly room for improvement. Feel free to experiment with different features or try other hyperparameters and neural network layers.</p>
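
<p>One simple way to put the error metrics into perspective is to compare the model against a naive persistence baseline that repeats each window's last known close price over the full ten-step horizon; a model that cannot beat this baseline adds little value. A minimal numpy sketch follows (the random arrays only mimic the shapes of the tutorial's test set and are not its actual data):</p>

```python
import numpy as np

def persistence_forecast(x, index_close, horizon):
    """Repeat each window's last close price across the forecast horizon."""
    last_close = x[:, -1, index_close]                      # shape: (samples,)
    return np.repeat(last_close[:, None], horizon, axis=1)  # shape: (samples, horizon)

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error in percent."""
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

# Illustrative shapes matching the tutorial: 640 test windows, 50 steps, 5 features
rng = np.random.default_rng(0)
x_test = rng.uniform(100, 200, size=(640, 50, 5))
y_test = rng.uniform(100, 200, size=(640, 10))

baseline = persistence_forecast(x_test, index_close=3, horizon=10)
print(baseline.shape)  # (640, 10)
print(round(mape(y_test, baseline), 2))
```

If the trained network's MAPE on the unscaled test data is not clearly below the baseline's, more feature engineering or hyperparameter tuning is warranted.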



<h3 class="wp-block-heading" id="h-step-6-create-a-new-forecast">Step #6: Create a New Forecast</h3>



<p>Finally, let&#8217;s create a forecast on new data. We take the scaled dataset from section 2 (np_scaled) and extract a series with the latest 50 values. The data is reshaped into a 3D array with shape (1, 50, 5) to match the expected input shape of the model. Calling the predict method on these values generates a forecast for the next ten days, which we store in the y_pred_scaled variable. Because the model operates on scaled data, we then transform the predictions back to the original scale using the inverse_transform method of the scaler_pred object, which was fit on the training data. Finally, we visualize the multi-step forecast in another line chart. </p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Get the latest input batch from the dataset; it contains the price values for the last 50 trading days
x_test_latest_batch = np_scaled[-50:,:].reshape(1,50,5)

# Predict on the batch
y_pred_scaled = model.predict(x_test_latest_batch)
y_pred_unscaled = scaler_pred.inverse_transform(y_pred_scaled)

# Prepare the data and plot the input data and the predictions
x_test_unscaled_df, y_pred_df, _ = prepare_df(0, x_test_latest_batch, '', y_pred_unscaled)
plot_multi_test_forecast(x_test_unscaled_df, '', y_pred_df, &quot;x_new Vs. y_new_pred&quot;)</pre></div>



<figure class="wp-block-image size-large is-style-default"><img decoding="async" width="1024" height="420" data-attachment-id="8666" data-permalink="https://www.relataly.com/stock-price-prediction-multi-output-regression-using-neural-networks-in-python/5800/multioutput-regression-deep-neural-networks/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/05/multioutput-regression-deep-neural-networks.png" data-orig-size="1167,479" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="multioutput-regression-deep-neural-networks" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/05/multioutput-regression-deep-neural-networks.png" src="https://www.relataly.com/wp-content/uploads/2022/05/multioutput-regression-deep-neural-networks-1024x420.png" alt="multioutput regression deep neural networks - x_text and y_pred" class="wp-image-8666" srcset="https://www.relataly.com/wp-content/uploads/2022/05/multioutput-regression-deep-neural-networks.png 1024w, https://www.relataly.com/wp-content/uploads/2022/05/multioutput-regression-deep-neural-networks.png 300w, https://www.relataly.com/wp-content/uploads/2022/05/multioutput-regression-deep-neural-networks.png 768w, https://www.relataly.com/wp-content/uploads/2022/05/multioutput-regression-deep-neural-networks.png 1167w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<p></p>



<h2 class="wp-block-heading" id="h-summary">Summary</h2>



<p>In this tutorial, we demonstrated how to use a multi-output neural network to predict multiple time steps at once. We first discussed the architecture of a recurrent neural network and how it processes sequential data. We then showed how to preprocess the data and split it into training and test sets for training a multi-output regression model.</p>



<p>Next, we trained a model to predict the stock price of Apple ten steps into the future using historical data. We also discussed how to use the trained model to make multi-step predictions on new data and how to visualize the results.</p>



<p>To further improve the performance of the model, you can experiment with different hyperparameters and adjust the model architecture. For example, adding more neurons to the output layer extends the prediction horizon, but remember that the prediction error also grows as the horizon lengthens. You can also try different activation functions or add more layers to the model to see how this affects performance.</p>



<p>I hope this article was helpful in understanding multi-output neural networks better. If you have any questions or comments, please let me know.</p>



<h2 class="wp-block-heading" id="h-sources-and-further-reading">Sources and Further Reading</h2>



<ol class="wp-block-list">
<li><a href="https://amzn.to/3MyU6Tj" target="_blank" rel="noreferrer noopener">Charu C. Aggarwal (2018) Neural Networks and Deep Learning</a></li>



<li><a href="https://amzn.to/3yIQdWi" target="_blank" rel="noreferrer noopener">Jansen (2020) Machine Learning for Algorithmic Trading: Predictive models to extract signals from market and alternative data for systematic trading strategies with Python</a></li>



<li><a href="https://amzn.to/3S9Nfkl" target="_blank" rel="noreferrer noopener">Aurélien Géron (2019) Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems </a></li>



<li><a href="https://amzn.to/3EKidwE" target="_blank" rel="noreferrer noopener">David Forsyth (2019) Applied Machine Learning Springer</a></li>



<li><a href="https://amzn.to/3MAy8j5" target="_blank" rel="noreferrer noopener">Andriy Burkov (2020) Machine Learning Engineering</a></li>
</ol>



<p class="has-contrast-2-color has-base-3-background-color has-text-color has-background"><em>The links above to Amazon are affiliate links. By buying through these links, you support the Relataly.com blog and help to cover the hosting costs. Using the links does not affect the price.</em></p>



<p>If you want to learn about an alternative approach to univariate stock market forecasting, consider taking a look <a href="https://www.relataly.com/time-series-forecasting-using-facebook-prophet-in-python/10351/" target="_blank" rel="noreferrer noopener">at Facebook Prophet</a> or <a href="https://www.relataly.com/forecasting-beer-sales-with-arima-in-python/2884/" target="_blank" rel="noreferrer noopener">ARIMA models</a>.</p>
<p>The post <a href="https://www.relataly.com/stock-price-prediction-multi-output-regression-using-neural-networks-in-python/5800/">Stock Market Forecasting Neural Networks for Multi-Output Regression in Python</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.relataly.com/stock-price-prediction-multi-output-regression-using-neural-networks-in-python/5800/feed/</wfw:commentRss>
			<slash:comments>31</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">5800</post-id>	</item>
		<item>
		<title>Predictive Policing: Preventing Crime in San Francisco using XGBoost and Python</title>
		<link>https://www.relataly.com/predicting-crimes-in-san-francisco-creatingsf-crime-map-using-xgboost/2960/</link>
					<comments>https://www.relataly.com/predicting-crimes-in-san-francisco-creatingsf-crime-map-using-xgboost/2960/#respond</comments>
		
		<dc:creator><![CDATA[Florian Follonier]]></dc:creator>
		<pubDate>Sun, 07 Mar 2021 16:16:19 +0000</pubDate>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Classification (multi-class)]]></category>
		<category><![CDATA[Decision Trees]]></category>
		<category><![CDATA[Fighting Crime]]></category>
		<category><![CDATA[Gradient Boosting]]></category>
		<category><![CDATA[Insurance]]></category>
		<category><![CDATA[Kaggle Competitions]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[mplleaflet]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Random Decision Forests]]></category>
		<category><![CDATA[Scikit-Learn]]></category>
		<category><![CDATA[Seaborn]]></category>
		<category><![CDATA[Classic Machine Learning]]></category>
		<category><![CDATA[Crime Data]]></category>
		<category><![CDATA[Geographic Maps]]></category>
		<category><![CDATA[Intermediate Tutorials]]></category>
		<category><![CDATA[Kaggle]]></category>
		<category><![CDATA[Multivariate Models]]></category>
		<category><![CDATA[Smart City]]></category>
		<category><![CDATA[Spatial Data]]></category>
		<category><![CDATA[Supervised Learning]]></category>
		<guid isPermaLink="false">https://www.relataly.com/?p=2960</guid>

					<description><![CDATA[<p>In this tutorial, we&#8217;ll be using machine learning to predict and map out crime in San Francisco. We&#8217;ll be working with a dataset from Kaggle that contains information on 39 different types of crimes, including everything from vehicle theft to drug offenses. Using Python and the powerful Scikit-Learn library, we&#8217;ll train a classification model using ... <a title="Predictive Policing: Preventing Crime in San Francisco using XGBoost and Python" class="read-more" href="https://www.relataly.com/predicting-crimes-in-san-francisco-creatingsf-crime-map-using-xgboost/2960/" aria-label="Read more about Predictive Policing: Preventing Crime in San Francisco using XGBoost and Python">Read more</a></p>
<p>The post <a href="https://www.relataly.com/predicting-crimes-in-san-francisco-creatingsf-crime-map-using-xgboost/2960/">Predictive Policing: Preventing Crime in San Francisco using XGBoost and Python</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>In this tutorial, we&#8217;ll be using machine learning to predict and map out crime in San Francisco. We&#8217;ll be working with a dataset from Kaggle that contains information on 39 different types of crimes, including everything from vehicle theft to drug offenses. Using Python and the powerful Scikit-Learn library, we&#8217;ll train a classification model using the <a href="https://www.relataly.com/category/machine-learning-algorithms/gradient-boosting/" target="_blank" rel="noreferrer noopener">XGBoost algorithm</a> to predict 39 types of crimes based on when and where they occurred. We&#8217;ll then use the Plotly library to visualize the results on a map of the city, highlighting areas with higher rates of certain crimes. This type of prediction and mapping is similar to what the San Francisco Police Department uses in their practice of predictive policing, where they allocate resources to at-risk areas in an effort to prevent crime.</p>



<p>As we embark on this thrilling journey, we&#8217;ll start by downloading and preprocessing the San Francisco crime data. Next, we&#8217;ll channel the data to train two distinct classification models. The first model will utilize a standard Random Forest Classifier, while the second will leverage the exceptional XGBoost package. We&#8217;ll experiment with various models that boast different hyperparameters. Ultimately, we&#8217;ll visualize our predictions on a striking SF crime map and assess the performance of our diverse models. So, buckle up and let&#8217;s dive into the exhilarating world of crime prediction and mapping!</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-full"><img decoding="async" width="508" height="513" data-attachment-id="12478" data-permalink="https://www.relataly.com/predicting-crimes-in-san-francisco-creatingsf-crime-map-using-xgboost/2960/crime-prediction-san-francisco-city-map-xgboost-python-tutorial-min/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/02/crime-prediction-san-francisco-city-map-xgboost-python-tutorial-min.png" data-orig-size="508,513" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="crime-prediction-san-francisco-city-map-xgboost-python-tutorial-min" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/02/crime-prediction-san-francisco-city-map-xgboost-python-tutorial-min.png" src="https://www.relataly.com/wp-content/uploads/2023/02/crime-prediction-san-francisco-city-map-xgboost-python-tutorial-min.png" alt="crime prediction san francisco city map xgboost python tutorial. Image generated using Midjourney. 
relataly.com" class="wp-image-12478" srcset="https://www.relataly.com/wp-content/uploads/2023/02/crime-prediction-san-francisco-city-map-xgboost-python-tutorial-min.png 508w, https://www.relataly.com/wp-content/uploads/2023/02/crime-prediction-san-francisco-city-map-xgboost-python-tutorial-min.png 297w, https://www.relataly.com/wp-content/uploads/2023/02/crime-prediction-san-francisco-city-map-xgboost-python-tutorial-min.png 140w" sizes="(max-width: 508px) 100vw, 508px" /><figcaption class="wp-element-caption">Predictive policing can make police work much more efficient and effective. Image generated using <a href="http://www.midjourney.com" target="_blank" rel="noreferrer noopener">Midjourney</a>.</figcaption></figure>
</div>
</div>



<h2 class="wp-block-heading">What is Predictive Policing?</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>The use case we are looking at in this article falls into predictive policing. Predictive policing uses data, algorithms, and other technological tools to predict where and when crimes are likely to occur. The goal of predictive policing is to help law enforcement agencies better allocate their resources and focus their efforts on areas where crime is likely to happen, with the ultimate goal of reducing crime and improving public safety. This approach to policing is based on the idea that by using data and other tools to identify patterns and trends, law enforcement agencies can better anticipate where crimes are likely to occur and take steps to prevent them from happening.</p>



<p>The benefits of predictive policing include the ability to allocate law enforcement resources better, the potential to reduce crime and improve public safety, and the ability to identify trends and patterns that may not be immediately obvious to law enforcement officers. Additionally, by using data and other tools to anticipate where crimes are likely to occur, law enforcement agencies can take proactive steps to prevent those crimes from happening, which can save time and money.</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<h2 class="wp-block-heading" id="h-creating-a-crime-map-for-predictive-policing-using-xgboost-in-python">Creating a Crime Map for Predictive Policing using XGBoost in Python</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>In this practical tutorial, we&#8217;ll construct an XGBoost multi-class classifier to predict crime types in San Francisco. Urban crime in a city like San Francisco is a dynamic and multifaceted issue that can vary dramatically by location, time, and other factors. Our aim is to develop a predictive algorithm capable of forecasting specific crime types from given location and time parameters. The end product is an interactive San Francisco crime map providing a snapshot of crime hotspots throughout the city.</p>



<p>Law enforcement agencies, like the San Francisco Police Department, use similar maps for strategic resource allocation to curb crime rates effectively. Additionally, this SF crime map will underscore crime clusters &#8211; areas notorious for particular types of crime incidents. By the end of this tutorial, you&#8217;ll have a deeper understanding of using machine learning in practical scenarios and aiding real-world decision-making.</p>



<p>The code is available in the relataly GitHub repository.</p>



<div class="wp-block-kadence-advancedbtn kb-buttons-wrap kb-btns_262c97-e5"><a class="kb-button kt-button button kb-btn_e6ce86-27 kt-btn-size-standard kt-btn-width-type-full kb-btn-global-inherit  kt-btn-has-text-true kt-btn-has-svg-true  wp-block-button__link wp-block-kadence-singlebtn" href="https://github.com/flo7up/relataly-public-python-tutorials/blob/master/02%20Classification/018%20Forecasting%20Criminal%20Activity%20in%20San%20Francisco.ipynb" target="_blank" rel="noreferrer noopener"><span class="kb-svg-icon-wrap kb-svg-icon-fe_eye kt-btn-icon-side-left"><svg viewBox="0 0 24 24"  fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"  aria-hidden="true"><path d="M1 12s4-8 11-8 11 8 11 8-4 8-11 8-11-8-11-8z"/><circle cx="12" cy="12" r="3"/></svg></span><span class="kt-btn-inner-text">View on GitHub </span></a>

<a class="kb-button kt-button button kb-btn_3b4a4f-fe kt-btn-size-standard kt-btn-width-type-full kb-btn-global-inherit  kt-btn-has-text-true kt-btn-has-svg-true  wp-block-button__link wp-block-kadence-singlebtn" href="https://github.com/flo7up/relataly-public-python-API-tutorials" target="_blank" rel="noreferrer noopener"><span class="kb-svg-icon-wrap kb-svg-icon-fa_github kt-btn-icon-side-left"><svg viewBox="0 0 496 512"  fill="currentColor" xmlns="http://www.w3.org/2000/svg"  aria-hidden="true"><path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"/></svg></span><span class="kt-btn-inner-text">Relataly Github Repo </span></a></div>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-full"><img decoding="async" width="500" height="487" data-attachment-id="12476" data-permalink="https://www.relataly.com/predicting-crimes-in-san-francisco-creatingsf-crime-map-using-xgboost/2960/san-francisco-crime-prediction-machine-learning-python-tutorial-crime-map/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/02/san-francisco-crime-prediction-machine-learning-python-tutorial-crime-map.png" data-orig-size="500,487" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="san-francisco-crime-prediction-machine-learning-python-tutorial-crime-map" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/02/san-francisco-crime-prediction-machine-learning-python-tutorial-crime-map.png" src="https://www.relataly.com/wp-content/uploads/2023/02/san-francisco-crime-prediction-machine-learning-python-tutorial-crime-map.png" alt="" class="wp-image-12476" srcset="https://www.relataly.com/wp-content/uploads/2023/02/san-francisco-crime-prediction-machine-learning-python-tutorial-crime-map.png 500w, https://www.relataly.com/wp-content/uploads/2023/02/san-francisco-crime-prediction-machine-learning-python-tutorial-crime-map.png 300w" sizes="(max-width: 500px) 100vw, 500px" /><figcaption class="wp-element-caption">Crime doesn&#8217;t sleep in San Francisco. That&#8217;s why predictive policing can make a real impact. Image generated with <a href="http://www.midjourney.com" target="_blank" rel="noreferrer noopener">Midjourney</a></figcaption></figure>
</div>
</div>



<h3 class="wp-block-heading" id="h-prerequisites">Prerequisites</h3>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>Before starting the Python coding part, ensure that you have set up your <a href="https://www.python.org/downloads/" target="_blank" rel="noreferrer noopener">Python 3</a> environment and required packages. If you don&#8217;t have an environment, follow&nbsp;<a href="https://www.relataly.com/anaconda-python-environment-machine-learning/1663/" target="_blank" rel="noreferrer noopener">this tutorial</a>&nbsp;to set up the&nbsp;<a href="https://www.anaconda.com/products/individual" target="_blank" rel="noreferrer noopener">Anaconda environment</a>.</p>



<p>Also, make sure you install all required packages. In this tutorial, we will be working with the following standard packages:&nbsp;</p>



<ul class="wp-block-list">
<li><em><a href="https://pandas.pydata.org/" target="_blank" rel="noreferrer noopener">pandas</a></em></li>



<li><em><a href="https://numpy.org/" target="_blank" rel="noreferrer noopener">NumPy</a></em></li>



<li><em><a href="https://matplotlib.org/" target="_blank" rel="noreferrer noopener">matplotlib</a></em></li>



<li><em><a href="https://seaborn.pydata.org/" target="_blank" rel="noreferrer noopener">Seaborn</a></em></li>
</ul>



<p>In addition, we will be using XGBoost (&#8216;xgboost&#8217;) and the machine learning library scikit-learn. </p>



<p>You can install packages using console commands:</p>



<ul class="wp-block-list">
<li><em>pip install &lt;package name&gt;</em></li>



<li><em>conda install &lt;package name&gt;</em>&nbsp;(if you are using the anaconda packet manager)</li>
</ul>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<h3 class="wp-block-heading" id="h-step-1-load-the-data">Step #1 Load the Data</h3>



<p>We begin by downloading the San Francisco crime challenge data from kaggle.com. Once you have downloaded the dataset, place the CSV file (train.csv) into your Python working folder.</p>



<p>The dataset was collected by the San Francisco Police Department between 2003 and 2015. According to the data description of the SF crime challenge, the dataset contains the following variables:</p>



<ul class="wp-block-list">
<li><strong>Dates</strong>: timestamp of the crime incident</li>



<li><strong>Category</strong>: Category of the crime incident (only in train.csv) that we will use as the target variable</li>



<li><strong>Descript</strong>: detailed description of the crime incident&nbsp;(only in train.csv)</li>



<li><strong>DayOfWeek:</strong> the day of the week</li>



<li><strong>PdDistrict</strong>: the name of the Police Department District</li>



<li><strong>Resolution</strong>: how the crime incident was resolved&nbsp;(only in train.csv)</li>



<li><strong>Address</strong>: the approximate street address of the crime incident&nbsp;</li>



<li><strong>X</strong>: Longitude</li>



<li><strong>Y</strong>: Latitude</li>
</ul>



<p>The next step is to load the data into a dataframe. Then we use the head() command to print the first five records and verify that the data has loaded correctly.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}">import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import matplotlib.pyplot as plt 
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from xgboost import XGBClassifier
import plotly.express as px

# The Data is part of the Kaggle Competition: https://www.kaggle.com/c/sf-crime/data
df_base = pd.read_csv(&quot;data/crime/sf-crime/train.csv&quot;)

print(df_base.describe())
df_base.head()</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">		X              Y
count  	878049.000000  878049.000000
mean     -122.422616      37.771020
std         0.030354       0.456893
min      -122.513642      37.707879
25%      -122.432952      37.752427
50%      -122.416420      37.775421
75%      -122.406959      37.784369
max      -120.500000      90.000000</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">	Dates				Category	Descript			DayOfWeek	PdDistrict	Resolution	Address				X			Y
0	2015-05-13 23:53:00	WARRANTS	WARRANT ARREST		Wednesday	NORTHERN	ARREST, 	OAK ST / ...		-122.425892	37.774599
1	2015-05-13 23:53:00	OTHER ...	TRAFFIC ...			Wednesday	NORTHERN	ARREST, 	OAK ST / ...		-122.425892	37.774599
2	2015-05-13 23:33:00	OTHER ...	TRAFFIC ...			Wednesday	NORTHERN	ARREST, 	VANNESS AV... ST	-122.424363	37.800414
3	2015-05-13 23:30:00	LARCENY/THEFT	GRAND THEFT...	Wednesday	NORTHERN	NONE		1500 Block... ST	-122.426995	37.800873
4	2015-05-13 23:30:00	LARCENY/THEFT	GRAND THEFT ...	Wednesday	PARK		NONE		100 Block... ST		-122.438738	37.771541</pre></div>



<p>If the data was loaded correctly, you should see the first five records of the dataframe, as shown above.</p>



<h3 class="wp-block-heading" id="h-step-2-explore-the-data">Step #2 Explore the Data</h3>



<p>At the beginning of a new project, we usually don&#8217;t understand the data well and need to acquire that understanding. Therefore, next, we will explore the data and familiarize ourselves with its characteristics. </p>



<p>The following examples highlight different characteristics of the data. For instance, box plots and a correlation matrix can reveal how variables relate to one another, such as weekdays and crime categories. Feel free to create more charts.</p>
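<p>To make this concrete, the sketch below cross-tabulates weekdays against crime categories. It uses a tiny hypothetical frame for illustration; in the tutorial, you would pass <code>df_base</code> instead.</p>

```python
import pandas as pd

# Hypothetical mini-frame standing in for df_base (the real data has ~878k rows)
df = pd.DataFrame({
    'DayOfWeek': ['Monday', 'Monday', 'Sunday', 'Friday'],
    'Category':  ['LARCENY/THEFT', 'ASSAULT', 'LARCENY/THEFT', 'LARCENY/THEFT'],
})

# Relative frequency of each category per weekday (each row sums to 1)
ct = pd.crosstab(df['DayOfWeek'], df['Category'], normalize='index')
print(ct)
```

<p>On the full dataset, passing the resulting table to <code>sns.heatmap(ct)</code> visualizes the same relationship at a glance.</p>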



<h4 class="wp-block-heading" id="h-2-1-prediction-labels">2.1 Prediction Labels</h4>



<p>Running the code below shows a bar plot of the prediction labels. The plot shows the frequency in which the class labels occur in the data.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># print the value counts of the categories
plt.figure(figsize=(15,5))
ax = sns.countplot(x = df_base['Category'], orient='v', order = df_base['Category'].value_counts().index)
ax.set_xticklabels(ax.get_xticklabels(),rotation = 90)</pre></div>



<figure class="wp-block-image size-large is-resized"><img decoding="async" data-attachment-id="3167" data-permalink="https://www.relataly.com/predicting-crimes-in-san-francisco-creatingsf-crime-map-using-xgboost/2960/image-13-8/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2021/04/image-13.png" data-orig-size="910,467" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-13" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2021/04/image-13.png" src="https://www.relataly.com/wp-content/uploads/2021/04/image-13.png" alt="crime types in San Francisco" class="wp-image-3167" width="914" height="470" srcset="https://www.relataly.com/wp-content/uploads/2021/04/image-13.png 910w, https://www.relataly.com/wp-content/uploads/2021/04/image-13.png 300w, https://www.relataly.com/wp-content/uploads/2021/04/image-13.png 768w" sizes="(max-width: 914px) 100vw, 914px" /></figure>



<p>As shown above, our class labels are highly imbalanced: a few categories such as LARCENY/THEFT dominate, while many others occur rarely. We need to keep this in mind when evaluating model performance, because plain accuracy favors the majority classes.</p>
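<p>One way to put a number on the imbalance is to compare the counts of the most and least frequent classes. The sketch below uses a small hypothetical label column; on the real data, you would pass <code>df_base['Category']</code>.</p>

```python
import pandas as pd

# Hypothetical label column standing in for df_base['Category']
labels = pd.Series(['LARCENY/THEFT'] * 6 + ['ASSAULT'] * 3 + ['ARSON'] * 1)

# value_counts sorts descending, so the first/last entries are the extremes
counts = labels.value_counts()
imbalance_ratio = counts.iloc[0] / counts.iloc[-1]
print(imbalance_ratio)  # 6.0: the top class occurs six times as often as the rarest
```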



<h4 class="wp-block-heading" id="h-2-2-when-a-crime-occured-considering-dates-and-time">2.2 When a Crime Occurred &#8211; Considering Dates and Time</h4>



<p>We assume that the time at which a crime occurs influences the type of crime. For this reason, we look at how crimes are distributed across different days of the week and times of day. First, we look at crime numbers per weekday.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Print Crime Counts per Weekday
plt.figure(figsize=(6,3))
# Horizontal bar plot: weekday labels sit on the y-axis, so no tick rotation is needed
ax = sns.countplot(y = df_base['DayOfWeek'], orient='h', order = df_base['DayOfWeek'].value_counts().index)</pre></div>



<figure class="wp-block-image size-large is-resized"><img decoding="async" data-attachment-id="3164" data-permalink="https://www.relataly.com/predicting-crimes-in-san-francisco-creatingsf-crime-map-using-xgboost/2960/image-12-8/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2021/04/image-12.png" data-orig-size="437,238" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-12" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2021/04/image-12.png" src="https://www.relataly.com/wp-content/uploads/2021/04/image-12.png" alt="" class="wp-image-3164" width="669" height="364" srcset="https://www.relataly.com/wp-content/uploads/2021/04/image-12.png 437w, https://www.relataly.com/wp-content/uploads/2021/04/image-12.png 300w" sizes="(max-width: 669px) 100vw, 669px" /></figure>



<p>Fewer crimes happen on Sundays, and most happen on Fridays. So it seems that even criminals like to have a weekend. Next, let&#8217;s look at the time of day at which certain crimes are reported. For the sake of clarity, we limit the plot to a subset of categories.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Convert the time to minutes
df_base['Hour_Min'] = pd.to_datetime(df_base['Dates']).dt.hour  + pd.to_datetime(df_base['Dates']).dt.minute / 60

# Print Crime Counts per Time and Category
df_base_filtered = df_base[df_base['Category'].isin([
    'PROSTITUTION', 
    'VEHICLE THEFT', 
    'DRUG/NARCOTIC', 
    'WARRANTS', 
    'BURGLARY', 
    'FRAUD', 
    'ASSAULT',
    'LARCENY/THEFT',
    'VANDALISM'])]

# displot is figure-level and creates its own figure; height and aspect control its size
ax = sns.displot(x = 'Hour_Min', hue=&quot;Category&quot;, data = df_base_filtered, kind=&quot;kde&quot;, height=8, aspect=1.5)</pre></div>



<figure class="wp-block-image size-large is-resized"><img decoding="async" data-attachment-id="3174" data-permalink="https://www.relataly.com/predicting-crimes-in-san-francisco-creatingsf-crime-map-using-xgboost/2960/image-17-7/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2021/04/image-17.png" data-orig-size="983,568" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-17" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2021/04/image-17.png" src="https://www.relataly.com/wp-content/uploads/2021/04/image-17.png" alt="different crime types in San Francisco and how often they occur during the day" class="wp-image-3174" width="906" height="522"/></figure>



<p>The time at which a crime happens also affects the likelihood of certain types. FRAUD, for example, rarely occurs at night and mostly during the day. Overall, criminal activity peaks in the afternoon and around midnight. Other crimes, such as VEHICLE THEFT, occur mainly at night and in the late afternoon, but less often in the morning.</p>



<p>If you want to gain an overview of additional features, you can use the pair plot function. Because our dataset is large, we reduce the computation time by plotting 1/100 of the data.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}">sns.pairplot(data = df_base_filtered[0::100], height=4, aspect=1.5, hue='Category')</pre></div>



<figure class="wp-block-image size-large is-resized"><img decoding="async" data-attachment-id="8386" data-permalink="https://www.relataly.com/predicting-crimes-in-san-francisco-creatingsf-crime-map-using-xgboost/2960/pairplot-by-category-san-francisco-crime-map/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/05/pairplot-by-category-san-francisco-crime-map.png" data-orig-size="1464,844" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="pairplot-by-category-san-francisco-crime-map" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/05/pairplot-by-category-san-francisco-crime-map.png" src="https://www.relataly.com/wp-content/uploads/2022/05/pairplot-by-category-san-francisco-crime-map-1024x590.png" alt="pairplot by category, san francisco crime map" class="wp-image-8386" width="1080" height="622" srcset="https://www.relataly.com/wp-content/uploads/2022/05/pairplot-by-category-san-francisco-crime-map.png 1024w, https://www.relataly.com/wp-content/uploads/2022/05/pairplot-by-category-san-francisco-crime-map.png 300w, https://www.relataly.com/wp-content/uploads/2022/05/pairplot-by-category-san-francisco-crime-map.png 768w, https://www.relataly.com/wp-content/uploads/2022/05/pairplot-by-category-san-francisco-crime-map.png 1464w" sizes="(max-width: 1080px) 100vw, 1080px" /></figure>



<h4 class="wp-block-heading" id="h-2-3-where-a-crime-occured-considering-address">2.3 Where a Crime Occurred &#8211; Considering Address</h4>



<p>Next, we look at the address information, from which we can often extract additional information. We do this by printing some sample address values.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Extracting information from the streetnames
for i in df_base['Address'][0:10]:
    print(i)</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">OAK ST / LAGUNA ST
OAK ST / LAGUNA ST
VANNESS AV / GREENWICH ST
1500 Block of LOMBARD ST
100 Block of BRODERICK ST
0 Block of TEDDY AV
AVALON AV / PERU AV
KIRKWOOD AV / DONAHUE ST
600 Block of 47TH AV
JEFFERSON ST / LEAVENWORTH ST</pre></div>



<p>The street names alone are not very helpful. However, the address data does provide additional information. For example, it tells us whether the location is a street intersection, and it contains the type of street. This information is valuable because we can extract parts of the text and use them as separate features.</p>
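<p>The extraction itself boils down to simple substring matching on the address column. The sketch below, with a few made-up sample addresses, shows the idea that Step #3 applies to the full dataset:</p>

```python
import pandas as pd

# Made-up sample addresses in the dataset's two formats
addr = pd.Series(['OAK ST / LAGUNA ST', '1500 Block of LOMBARD ST', 'AVALON AV / PERU AV'])

is_crossing = addr.str.contains(' / ')     # street intersection
is_block = addr.str.contains(' Block')     # block-level address
is_avenue = addr.str.contains(' AV')       # street-type hint

print(list(is_crossing))  # [True, False, True]
```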



<p>We could do a lot more, but we&#8217;ve got a good enough idea of the data.</p>



<h3 class="wp-block-heading" id="h-step-3-data-preprocessing">Step #3 Data Preprocessing</h3>



<p>Probably the most exciting and important aspect of model development is feature engineering. The right features often yield greater leaps in performance than tuning model parameters.</p>



<h4 class="wp-block-heading" id="h-3-1-remarks-on-data-preprocessing-for-xgboost">3.1 Remarks on Data Preprocessing for XGBoost </h4>



<p>When preprocessing the data, it is helpful to know which algorithms we will use, because some algorithms are picky about the shape of the data. We will prepare the data to train a gradient-boosting model (XGBoost). This algorithm uses an ensemble of decision trees that operates on numeric and Boolean inputs, but not on raw categorical data. Therefore, we need to encode our categorical features, and we also need to map the categorical labels to integer values.</p>
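<p>As a minimal sketch of the label encoding, <code>pd.factorize</code> maps each distinct label to an integer code (shown here on a hypothetical label series):</p>

```python
import pandas as pd

# factorize returns the integer codes plus an index of the unique labels,
# in order of first appearance
codes, uniques = pd.factorize(pd.Series(['THEFT', 'ASSAULT', 'THEFT', 'FRAUD']))

print(list(codes))    # [0, 1, 0, 2]
print(list(uniques))  # ['THEFT', 'ASSAULT', 'FRAUD']
```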



<p>We don&#8217;t need to scale the continuous feature variables because gradient boosting and decision trees, generally, are not sensitive to variables that have different scales.</p>
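<p>If you want to convince yourself of this, the small experiment below (synthetic data, with a plain scikit-learn decision tree standing in for the boosted trees) fits the same model on raw and rescaled features and compares the predictions:</p>

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic data: 200 samples, 3 features, a simple linear decision rule
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Fit one tree on the raw features and one on features scaled by 1000
tree_raw = DecisionTreeClassifier(random_state=0).fit(X, y)
tree_scaled = DecisionTreeClassifier(random_state=0).fit(X * 1000.0, y)

# A multiplicative rescaling only shifts the split thresholds,
# so both trees make identical predictions
print((tree_raw.predict(X) == tree_scaled.predict(X * 1000.0)).all())
```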



<h4 class="wp-block-heading" id="h-3-2-feature-engineering">3.2 Feature Engineering</h4>



<p>Based on the data exploration that we have done in the previous section, we create three feature types:</p>



<ul class="wp-block-list">
<li><strong>Date &amp; Time:</strong> When a crime happens is an essential signal. When there is a lot of traffic on the street, for example, traffic-related crimes become more likely. Similarly, on Saturdays, more people visit the nightlife districts, which attracts certain crimes, e.g., drug-related ones. Therefore, we will create separate features for the time, the day, the month, and the year.</li>



<li><strong>Address</strong>: As mentioned, we will extract additional features from the address column. First, we create separate features for the street type (for example, ST, AV, WY, TR, DR). We also check whether the address contains the word &#8220;Block&#8221; and whether the address is a street crossing.</li>



<li><strong>Latitude &amp; Longitude</strong>: We will transform the latitude and longitude values into polar coordinates. We will also remove some outliers from the dataset whose latitude is far off the grid. This will make it easier for our model to make sense of the location.</li>
</ul>



<p>Considering these features, the primary input to our crime-type prediction model is the information on when and where a crime occurs. </p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Processing Function for Features
def cart2polar(x, y):
    dist = np.sqrt(x**2 + y**2)
    phi = np.arctan2(y, x)
    return dist, phi

def preprocessFeatures(dfx):
    
    # Time Feature Engineering
    df = pd.get_dummies(dfx[['DayOfWeek' , 'PdDistrict']])
    df['Hour_Min'] = pd.to_datetime(dfx['Dates']).dt.hour + pd.to_datetime(dfx['Dates']).dt.minute / 60
    # We add a feature that contains the exponential of the time of day
    df['Hour_Min_Exp'] = np.exp(df['Hour_Min'])
    
    df['Day'] = pd.to_datetime(dfx['Dates']).dt.day
    df['Month'] = pd.to_datetime(dfx['Dates']).dt.month
    df['Year'] = pd.to_datetime(dfx['Dates']).dt.year

    month_one_hot_encoded = pd.get_dummies(pd.to_datetime(dfx['Dates']).dt.month, prefix='Month')
    df = pd.concat([df, month_one_hot_encoded], axis=1, join=&quot;inner&quot;)
    
    # Convert Cartesian Coordinates to Polar Coordinates
    df[['X', 'Y']] = dfx[['X', 'Y']] # we keep the original coordinates as additional features
    df['dist'], df['phi'] = cart2polar(dfx['X'], dfx['Y'])
  
    # Extracting Street Types
    df['Is_ST'] = dfx['Address'].str.contains(&quot; ST&quot;, case=True)
    df['Is_AV'] = dfx['Address'].str.contains(&quot; AV&quot;, case=True)
    df['Is_WY'] = dfx['Address'].str.contains(&quot; WY&quot;, case=True)
    df['Is_TR'] = dfx['Address'].str.contains(&quot; TR&quot;, case=True)
    df['Is_DR'] = dfx['Address'].str.contains(&quot; DR&quot;, case=True)
    df['Is_Block'] = dfx['Address'].str.contains(&quot; Block&quot;, case=True)
    df['Is_crossing'] = dfx['Address'].str.contains(&quot; / &quot;, case=True)
    
    return df

# Processing Function for Labels
def encodeLabels(dfx):
    # pd.factorize returns a tuple of (integer codes, unique category labels)
    factor = pd.factorize(dfx['Category'])
    return factor

# Remove Outliers by Latitude
df_cleaned = df_base[df_base['Y']&lt;70]

# Encode Labels as Integer
factor = encodeLabels(df_cleaned)
y_df = factor[0]
labels = list(factor[1])
# for val, i in enumerate(labels):
#     print(val, i)</pre></div>
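<p>As a quick sanity check of the cart2polar helper (illustrative only, using a made-up point rather than real coordinates), the classic 3-4-5 triangle should yield a radius of 5:</p>

```python
import numpy as np

def cart2polar(x, y):
    # Same helper as above: Cartesian (x, y) -> polar (radius, angle)
    dist = np.sqrt(x**2 + y**2)
    phi = np.arctan2(y, x)
    return dist, phi

dist, phi = cart2polar(3.0, 4.0)
print(dist)  # 5.0
print(phi)   # ~0.927 radians
```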



<p>We could also try to further improve our features by using additional data sources, such as weather data. However, there is no guarantee that this will improve the model results, and in the case of this crime dataset, it did not. Therefore, we have omitted this part.</p>



<h3 class="wp-block-heading" id="h-step-4-visualize-crime-types-on-a-map-of-san-francisco">Step #4 Visualize Crime Types on a Map of San Francisco</h3>



<p>Next, we create a San Francisco crime map using the cartesian coordinates that indicate where a crime has occurred. First, we plot only the data points without a geographical map. Later, we will use these spatial data to create a dot plot and overlay it with a map of San Francisco. Visualizing the crime types on a map helps us understand how they are distributed across the city. </p>



<h4 class="wp-block-heading" id="h-4-1-plot-crime-types-using-a-scatter-plot">4.1 Plot Crime Types using a Scatter Plot</h4>



<p>Next, we want to gain an overview of possible spatial patterns and hotspots. We expect certain crimes to cluster in particular streets and neighborhoods, while occurring only rarely in other parts of the city, such as the more expensive residential areas. To gain this overview of the crime distribution in San Francisco, we use a scatter plot to display the crime coordinates on a blank chart. </p>



<p>Running the code below creates the crime map of San Francisco with all crime types. Depending on the speed of your machine, the creation of the map may take several minutes.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Plot Criminal Activities by Lat and Long
df_filtered = df_cleaned.sample(frac=0.05)  
#df_filtered = df_cleaned[df_cleaned['Category'].isin(['PROSTITUTION', 'VEHICLE THEFT', 'FRAUD'])].sample(frac=0.05) # to filter 

groups = df_filtered.groupby('Category')

fig, ax = plt.subplots(sharex=False, figsize=(20, 12))
ax.margins(0.05) # Optional, just adds 5% padding to the autoscaling
for name, group in groups:
    ax.plot(group['X'], group['Y'], marker='.', linestyle='', label=name, alpha=0.9)
ax.legend()
plt.show()</pre></div>



<figure class="wp-block-image size-large is-resized"><img decoding="async" data-attachment-id="8389" data-permalink="https://www.relataly.com/predicting-crimes-in-san-francisco-creatingsf-crime-map-using-xgboost/2960/output-8/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/05/output.png" data-orig-size="1170,683" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="output" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/05/output.png" src="https://www.relataly.com/wp-content/uploads/2022/05/output-1024x598.png" alt="Crime Map of San Francisco (sf crime map) - Kaggle Crime Prediction Challenge. Classification with XGBoost" class="wp-image-8389" width="1111" height="649" srcset="https://www.relataly.com/wp-content/uploads/2022/05/output.png 1024w, https://www.relataly.com/wp-content/uploads/2022/05/output.png 300w, https://www.relataly.com/wp-content/uploads/2022/05/output.png 768w, https://www.relataly.com/wp-content/uploads/2022/05/output.png 1170w" sizes="(max-width: 1111px) 100vw, 1111px" /></figure>



<p>The plot shows that certain streets in San Francisco are more prone to specific crime types than others. It is also clear that there are certain crime hotspots in the city, especially in the center. We can also see that few crimes are reported in public park areas. </p>



<h4 class="wp-block-heading" id="h-4-2-create-a-crime-map-of-san-francisco-using-plotly">4.2 Create a Crime Map of San Francisco using Plotly</h4>



<p>Next, we will create a San Francisco crime map using the Plotly Python library. Because the library can only render a limited number of points at once, we will reduce our data to a 1% sample (optionally, you can also restrict it to a few selected crime types). </p>



<p>Running the code below renders the SF crime map in your browser or notebook. The result is a zoomable geographic map of San Francisco that shows how the selected crime types are distributed across the city.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># 4.2 Create a Crime Map of San Francisco using Plotly
# Limit the data to a fraction and selected categories
df_filtered = df_cleaned.sample(frac=0.01) 
fig = px.scatter_mapbox(df_filtered, lat=&quot;Y&quot;, lon=&quot;X&quot;, hover_name=&quot;Category&quot;, color='Category', hover_data=[&quot;Y&quot;, &quot;X&quot;], zoom=12, height=800)
fig.update_layout(mapbox_style=&quot;open-street-map&quot;)
fig.update_layout(margin={&quot;r&quot;:0,&quot;t&quot;:0,&quot;l&quot;:0,&quot;b&quot;:0})
fig.show()</pre></div>



<figure class="wp-block-image size-large is-resized"><img decoding="async" data-attachment-id="8401" data-permalink="https://www.relataly.com/predicting-crimes-in-san-francisco-creatingsf-crime-map-using-xgboost/2960/map-sf-crime/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/05/map-sf-crime.png" data-orig-size="1600,650" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="map-sf-crime" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/05/map-sf-crime.png" src="https://www.relataly.com/wp-content/uploads/2022/05/map-sf-crime-1024x416.png" alt="Crime Map of San Francisco (SF Crime Map) - Kaggle Crime Prediction Challenge. Crime Classification XGBoost" class="wp-image-8401" width="1097" height="445" srcset="https://www.relataly.com/wp-content/uploads/2022/05/map-sf-crime.png 1024w, https://www.relataly.com/wp-content/uploads/2022/05/map-sf-crime.png 300w, https://www.relataly.com/wp-content/uploads/2022/05/map-sf-crime.png 768w, https://www.relataly.com/wp-content/uploads/2022/05/map-sf-crime.png 1536w, https://www.relataly.com/wp-content/uploads/2022/05/map-sf-crime.png 1600w" sizes="(max-width: 1097px) 100vw, 1097px" /></figure>



<p>The SF crime map shows different types of crimes, including prostitution, vehicle theft, and fraud. The interactive map allows you to change zoom levels and filter the type of crime displayed on the map. For example, if you filter DRUG/NARCOTIC-related crimes, you can see that these crimes mainly occur in the city center near the financial district and the nightlife area.</p>
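<p>Restricting the map to a few crime types is a simple pandas filter. The snippet below sketches this on a made-up miniature of the crime dataframe (df_demo and its rows are hypothetical, standing in for df_cleaned):</p>

```python
import pandas as pd

# Hypothetical miniature of the crime dataframe
df_demo = pd.DataFrame({
    'Category': ['DRUG/NARCOTIC', 'FRAUD', 'VEHICLE THEFT', 'DRUG/NARCOTIC'],
    'X': [-122.41, -122.42, -122.40, -122.41],
    'Y': [37.77, 37.78, 37.76, 37.77],
})

# Keep only the crime types we want to display on the map
subset = df_demo[df_demo['Category'].isin(['DRUG/NARCOTIC', 'FRAUD'])]
```

The filtered frame could then be passed to px.scatter_mapbox in place of the full sample.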



<h3 class="wp-block-heading" id="h-step-5-split-the-data">Step #5 Split the Data</h3>



<p>Before training our predictive model, we will split our data into separate datasets for training and testing. For this purpose, we use the train_test_split function of scikit-learn and configure a split ratio of 70%. Then we output the data, which we employ in the next step to train and validate a model.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Create train_df &amp; test_df
x_df = preprocessFeatures(df_cleaned).copy()

# Split the data into x_train and y_train data sets
x_train, x_test, y_train, y_test = train_test_split(x_df, y_df, train_size=0.7, random_state=0)
x_train</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">		DayOfWeek_Friday	DayOfWeek_Monday	DayOfWeek_Saturday	DayOfWeek_Sunday	DayOfWeek_Thursday	DayOfWeek_Tuesday	DayOfWeek_Wednesday	PdDistrict_BAYVIEW	PdDistrict_CENTRAL	PdDistrict_INGLESIDE	...	Y			dist		phi			Is_ST	Is_AV	Is_WY	Is_TR	Is_DR	Is_Block	Is_crossing
276998	0					0					0					0					0					1					0					0					0					0						...	37.785023	128.110900	2.842200	True	False	False	False	False	True		False
81579	0					0					0					0					0					1					0					0					0					0						...	37.748470	128.185052	2.842677	False	True	False	False	False	True		False
206676	0					0					0					1					0					0					0					0					0					0						...	37.762744	128.113657	2.842389	True	False	False	False	False	True		False
732006	0					0					0					0					0					0					1					0					0					0						...	37.784140	128.109653	2.842204	True	False	False	False	False	False		True
796194	1					0					0					0					0					0					0					0					0					0						...	37.791333	128.125982	2.842185	True	False	False	False	False	True		False
5 rows × 45 columns</pre></div>
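<p>Because the crime categories are heavily imbalanced, it can also help to pass stratify=y_df to train_test_split so that both splits preserve the class proportions. This goes beyond the code above; a minimal sketch with synthetic, imbalanced labels:</p>

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced labels: 14 samples of class 0, 6 of class 1
X = np.arange(20).reshape(-1, 1)
y = np.array([0] * 14 + [1] * 6)

# stratify=y keeps roughly the original 70/30 class ratio in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, train_size=0.7, stratify=y, random_state=0)
```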



<h3 class="wp-block-heading" id="h-step-6-train-a-random-forest-classifier">Step #6 Train a Random Forest Classifier</h3>



<p>Now that we have prepared the data, we can train the predictive models. In the first step, we train a basic model based on the Random Forest algorithm. The Random Forest is a robust algorithm that can handle both regression and classification problems. One of our <a href="https://www.relataly.com/anyone-about-to-leave-predicting-the-customer-churn-of-a-telecommunications-provider/2378/" target="_blank" rel="noreferrer noopener">recent articles provides more information on Random Forests</a> and how you can find the optimal configuration of their hyperparameters. In this tutorial, we use the Random Forest to establish a baseline against which we can measure the performance of our XGBoost model. We therefore use the Random Forest with a simple parameter configuration without tuning the hyperparameters.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Train a single random forest classifier - parameters are a best guess
clf = RandomForestClassifier(max_depth=100, random_state=0, n_estimators = 200)
clf.fit(x_train, y_train.ravel())
y_pred = clf.predict(x_test)

results_log = classification_report(y_test, y_pred)
print(results_log)</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">Output exceeds the size limit. Open the full output data in a text editor
              precision    recall  f1-score   support

           0       0.15      0.10      0.12     12657
           1       0.29      0.35      0.32     37898
           2       0.38      0.63      0.47     52237
           3       0.46      0.40      0.43     16136
           4       0.16      0.08      0.10     13426
           5       0.25      0.21      0.23     27798
           6       0.10      0.04      0.06      6850
           7       0.23      0.22      0.23     23087
           8       0.19      0.12      0.15      2586
           9       0.20      0.13      0.15     10942
          10       0.08      0.03      0.05      9559
          11       0.00      0.00      0.00      1300
          12       0.20      0.10      0.14      3200
          13       0.37      0.43      0.40     16282
          14       0.02      0.02      0.02      1350
          15       0.01      0.00      0.00      2912
          16       0.05      0.03      0.04      2217
          17       0.61      0.52      0.56      7865
          18       0.11      0.06      0.08      4954
          19       0.04      0.03      0.03       723
          20       0.28      0.19      0.23       581
          21       0.05      0.02      0.03       708
          22       0.25      0.13      0.17      1333
...
    accuracy                           0.31    263395
   macro avg       0.15      0.12      0.13    263395
weighted avg       0.28      0.31      0.28    263395</pre></div>



<p>The baseline model is a random forest classifier that achieves 31% accuracy on the test dataset.</p>



<h3 class="wp-block-heading" id="h-step-7-train-an-xgboost-classifier">Step #7 Train an XGBoost Classifier</h3>



<p>Now that we have a baseline model, we can train our gradient boosting classifier using the XGBoost package. We expect this model to perform better than the baseline. </p>



<h4 class="wp-block-heading" id="h-7-1-about-gradient-boosting">7.1 About Gradient Boosting</h4>



<p>XGBoost is an implementation of a <a href="https://www.relataly.com/category/machine-learning-algorithms/gradient-boosting/" target="_blank" rel="noreferrer noopener">gradient-boosting algorithm</a> that builds a decision-tree-based ensemble. The algorithm searches for an optimal ensemble of trees: it iteratively adds trees to the model, each new tree fitted to reduce the prediction error of the ensemble built so far. The algorithm repeats these steps until it can make no further improvements. Thus, each new tree is not trained against the original targets but against the previous model&#8217;s residuals (prediction errors).</p>
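<p>To make the residual-fitting idea concrete, here is a tiny, hand-rolled boosting loop on synthetic regression data. This is an illustration of the principle only, not how XGBoost is implemented internally:</p>

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel()  # synthetic target

pred = np.zeros_like(y)  # start from a zero model
eta = 0.3                # learning rate, analogous to XGBoost's eta
for _ in range(30):
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, y - pred)          # fit the residuals of the current ensemble
    pred += eta * tree.predict(X)  # add the new tree with a learning rate

mse = np.mean((y - pred) ** 2)
```

After 30 rounds, the ensemble's training error is far below that of a constant-mean predictor, even though each individual tree is very weak.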



<p id="h-xgboost-an-implementation-of-a-gradient-boosting-algorithm-that-creates-a-random-forest-which-is-an-ensemble-of-decision-trees-the-basic-idea-of-a-decision-forest-is-to-have-an-ensemble-of-decision-trees-that-come-to-a-conclusion-on-the-prediction-label-by-casting-a-vote-gradient-boosting-is-used-to-find-an-optimal-ensemble-of-trees-and-parameters-boosting-is-the-process-of-iteratively-adding-trees-to-correct-the-prediction-error-of-a-previous-ensemble-of-trees-until-no-further-improvements-can-be-achieved-the-models-are-thereby-trained-to-predict-the-residuals-prediction-errors-of-the-previous-model">But XGBoost does more! It is an extreme version of gradient boosting that uses additional optimization techniques to achieve the best result with minimal effort. In contrast to the random decision forest, the XGBoost classification algorithm determines an optimal number of trees in the training process. We do not have to specify this number in advance.</p>



<p id="h-xgboost-an-implementation-of-a-gradient-boosting-algorithm-that-creates-a-random-forest-which-is-an-ensemble-of-decision-trees-the-basic-idea-of-a-decision-forest-is-to-have-an-ensemble-of-decision-trees-that-come-to-a-conclusion-on-the-prediction-label-by-casting-a-vote-gradient-boosting-is-used-to-find-an-optimal-ensemble-of-trees-and-parameters-boosting-is-the-process-of-iteratively-adding-trees-to-correct-the-prediction-error-of-a-previous-ensemble-of-trees-until-no-further-improvements-can-be-achieved-the-models-are-thereby-trained-to-predict-the-residuals-prediction-errors-of-the-previous-model">A disadvantage of XGBoost is that it tends to overfit the data. Therefore, testing against unseen data is essential. This tutorial will test only against a single test sample for simplicity, but using cross-validation would be a better choice.</p>



<h4 class="wp-block-heading" id="h-7-2-train-the-xgboost-classifier">7.2 Train the XGBoost Classifier</h4>



<p>Various gradient boosting implementations are available for Python, including one from scikit-learn. However, scikit-learn&#8217;s classic gradient boosting classifier does not support multi-threaded training, which makes the training process slower than necessary. For this reason, we will use the gradient boosting classifier from the XGBoost package.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Configure the XGBoost model
param = {'booster': 'gbtree', 
         'tree_method': 'gpu_hist',
         'predictor': 'gpu_predictor',
         'max_depth': 140, 
         'eta': 0.3, 
         'objective': 'multi:softmax', 
         'eval_metric': 'mlogloss', 
         'n_estimators': 30,
         'feature_selector': 'cyclic'
        }

# Unpack the parameter dict into keyword arguments
xgb_clf = XGBClassifier(**param)
xgb_clf.fit(x_train, y_train.ravel())
score = xgb_clf.score(x_test, y_test.ravel())
print(score)

# Create predictions on the test dataset
y_pred = xgb_clf.predict(x_test)

# Print a classification report
results_log = classification_report(y_test, y_pred)
print(results_log)</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">Output exceeds the size limit. Open the full output data in a text editor
0.30852142219859907
              precision    recall  f1-score   support

           0       0.17      0.01      0.02     12657
           1       0.30      0.42      0.35     37898
           2       0.33      0.72      0.46     52237
           3       0.31      0.27      0.29     16136
           4       0.21      0.03      0.05     13426
           5       0.24      0.18      0.21     27798
           6       0.17      0.01      0.01      6850
           7       0.21      0.19      0.20     23087
           8       0.26      0.01      0.02      2586
           9       0.22      0.08      0.12     10942
          10       0.13      0.00      0.00      9559
          11       0.07      0.00      0.01      1300
          12       0.20      0.08      0.11      3200
          13       0.34      0.43      0.38     16282
          14       0.00      0.00      0.00      1350
          15       0.12      0.00      0.01      2912
          16       0.15      0.02      0.03      2217
          17       0.57      0.34      0.43      7865
          18       0.19      0.03      0.05      4954
          19       0.00      0.00      0.00       723
          20       0.50      0.24      0.32       581
          21       0.10      0.01      0.01       708
...
    accuracy                           0.31    263395
   macro avg       0.18      0.11      0.11    263395
weighted avg       0.27      0.31      0.25    263395
</pre></div>



<p>Now that we have trained our classification model, let&#8217;s see how it performs. For this purpose, we generate predictions (y_pred) on the test dataset (x_test). Afterward, we use the predictions and the true values (y_test) to create a classification report.</p>



<p>Our model achieves an accuracy score of 31%. At first glance, this might not look impressive, but considering that we have 39 categories and only sparse information available, this performance is quite respectable. </p>
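<p>One way to put the 31% into perspective is to compare it with the naive baseline of always predicting the most frequent class, using the support counts from the classification report above:</p>

```python
# Support of the most frequent class (class 2) and the total test size,
# taken from the classification report above
majority_support = 52237
total = 263395

# Accuracy of always predicting the most frequent class
baseline_accuracy = majority_support / total
print(round(baseline_accuracy, 3))  # ~0.198, i.e. about 20%
```

So the model's 31% is clearly better than the roughly 20% a constant majority-class predictor would achieve.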



<h3 class="wp-block-heading" id="h-step-8-measure-model-performance">Step #8 Measure Model Performance</h3>



<p>So how well does our XGBoost model perform? To measure the performance of our model, we create a confusion matrix that visualizes the performance of the XGBoost classifier. If you want to learn more about measuring the performance of classification models, check out<a href="https://www.relataly.com/measuring-classification-performance-with-python-and-scikit-learn/846/" target="_blank" rel="noreferrer noopener"> this tutorial on measuring classification performance</a>.</p>



<p>Running the code below creates the confusion matrix that shows the number of correct and false predictions for each crime category.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Print a multi-Class Confusion Matrix
cnf_matrix = confusion_matrix(y_test.reshape(-1), y_pred)
df_cm = pd.DataFrame(cnf_matrix, columns=np.unique(y_test), index = np.unique(y_test))
df_cm.index.name = 'Actual'
df_cm.columns.name = 'Predicted'
plt.figure(figsize = (16,12))
plt.tight_layout()
sns.set(font_scale=1.4) #for label size
sns.heatmap(df_cm, cbar=True, cmap= &quot;inferno&quot;, annot=False, fmt='.0f' #, annot_kws={&quot;size&quot;: 13}
           )</pre></div>



<figure class="wp-block-image size-large is-resized"><img decoding="async" data-attachment-id="3213" data-permalink="https://www.relataly.com/predicting-crimes-in-san-francisco-creatingsf-crime-map-using-xgboost/2960/image-22-5/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2021/04/image-22.png" data-orig-size="903,709" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-22" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2021/04/image-22.png" src="https://www.relataly.com/wp-content/uploads/2021/04/image-22.png" alt="Evaluating the performance of our XGboost classifier; sfo crime map" class="wp-image-3213" width="669" height="525" srcset="https://www.relataly.com/wp-content/uploads/2021/04/image-22.png 903w, https://www.relataly.com/wp-content/uploads/2021/04/image-22.png 300w, https://www.relataly.com/wp-content/uploads/2021/04/image-22.png 768w" sizes="(max-width: 669px) 100vw, 669px" /></figure>



<p>The confusion matrix shows that our model frequently predicts crime category two and neglects the other crime types. The reason is the uneven distribution of crime types in the training data. As a result, when we evaluate the model, we need to pay attention to the importance of the different crime types. For example, we might train the model to predict certain crime types accurately, although this might come at a lower accuracy when predicting other crime types. However, such optimizations depend on the technical context and the goals one wants to achieve with the prediction model. </p>
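<p>One common countermeasure for such class imbalance is to weight training samples inversely to their class frequency. A sketch using scikit-learn's compute_sample_weight (the labels below are hypothetical; the resulting weights could be passed to a classifier's fit(..., sample_weight=...), though whether this helps depends on the evaluation goal):</p>

```python
import numpy as np
from sklearn.utils.class_weight import compute_sample_weight

# Hypothetical, heavily skewed labels: class 2 dominates
y_demo = np.array([2, 2, 2, 2, 1, 1, 0])

# 'balanced' assigns each sample n_samples / (n_classes * class_count)
weights = compute_sample_weight(class_weight='balanced', y=y_demo)
# Samples of rare classes receive larger weights than the majority class
```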



<h2 class="wp-block-heading" id="h-summary">Summary</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>This tutorial has presented the machine learning use case &#8220;Predictive Policing&#8221; and showed how to implement it in Python. We have trained an XGBoost model that predicts crime types in San Francisco based on the information on when and where specific crimes have occurred. We also illustrated our data on an interactive crime map of San Francisco with the Plotly Python library. The Crime Map is an intuitive way of visualizing crime in a city and highlighting particular hotspots. Finally, we have used the prediction model to make test predictions and evaluate the model performance against other algorithms, such as a classic Random Decision Forest. The XGBoost model achieves a prediction accuracy of about 31%—a respectable performance, considering that the prediction problem involves 39 crime classes.</p>



<p>We hope this tutorial was helpful. If you have any questions or suggestions on what we could improve, feel free to post them in the comments. We appreciate your feedback.</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-full is-resized"><img decoding="async" data-attachment-id="8410" data-permalink="https://www.relataly.com/predicting-crimes-in-san-francisco-creatingsf-crime-map-using-xgboost/2960/image-12-13/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/05/image-12.png" data-orig-size="658,592" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-12" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/05/image-12.png" src="https://www.relataly.com/wp-content/uploads/2022/05/image-12.png" alt="Crime Map of San Francisco - Kaggle Crime Prediction Challenge. Crim Classification XGBoost" class="wp-image-8410" width="346" height="311" srcset="https://www.relataly.com/wp-content/uploads/2022/05/image-12.png 658w, https://www.relataly.com/wp-content/uploads/2022/05/image-12.png 300w" sizes="(max-width: 346px) 100vw, 346px" /><figcaption class="wp-element-caption">Predictive policing with machine learning &#8211; Crime map of San Francisco, created with Python and Plotly</figcaption></figure>
</div>
</div>



<h2 class="wp-block-heading">Sources and Further Reading</h2>



<p>Looking for more exciting map visualizations? Consider the relataly tutorial on <a href="https://www.relataly.com/visualize-covid-19-data-on-a-geographic-heat-maps/291/" target="_blank" rel="noreferrer noopener">visualizing COVID-19 data on geographic heatmaps using GeoPandas</a>.</p>



<p>The post <a href="https://www.relataly.com/predicting-crimes-in-san-francisco-creatingsf-crime-map-using-xgboost/2960/">Predictive Policing: Preventing Crime in San Francisco using XGBoost and Python</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.relataly.com/predicting-crimes-in-san-francisco-creatingsf-crime-map-using-xgboost/2960/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">2960</post-id>	</item>
		<item>
		<title>Forecasting Beer Sales with ARIMA in Python</title>
		<link>https://www.relataly.com/forecasting-beer-sales-with-arima-in-python/2884/</link>
					<comments>https://www.relataly.com/forecasting-beer-sales-with-arima-in-python/2884/#comments</comments>
		
		<dc:creator><![CDATA[Florian Follonier]]></dc:creator>
		<pubDate>Wed, 03 Feb 2021 22:23:08 +0000</pubDate>
				<category><![CDATA[ARIMA]]></category>
		<category><![CDATA[Logistics]]></category>
		<category><![CDATA[Manufacturing]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Retail]]></category>
		<category><![CDATA[Sales Forecasting]]></category>
		<category><![CDATA[statsmodels]]></category>
		<category><![CDATA[Stock Market Forecasting]]></category>
		<category><![CDATA[Telecommunications]]></category>
		<category><![CDATA[Time Series Forecasting]]></category>
		<category><![CDATA[Use Cases]]></category>
		<category><![CDATA[AI in Business]]></category>
		<category><![CDATA[AI in E-Commerce]]></category>
		<category><![CDATA[Classic Machine Learning]]></category>
		<category><![CDATA[Intermediate Tutorials]]></category>
		<category><![CDATA[Supervised Learning]]></category>
		<category><![CDATA[Univariate Time Series Forecasting]]></category>
		<guid isPermaLink="false">https://www.relataly.com/?p=2884</guid>

					<description><![CDATA[<p>Time series analysis and forecasting is a tough nut to crack, but the ARIMA model has been cracking it for decades. ARIMA, short for &#8220;Auto-Regressive Integrated Moving Average,&#8221; is a powerful statistical modeling technique for time series analysis. It&#8217;s particularly effective when the time series you&#8217;re analyzing follows a clear pattern, like seasonal changes in ... <a title="Forecasting Beer Sales with ARIMA in Python" class="read-more" href="https://www.relataly.com/forecasting-beer-sales-with-arima-in-python/2884/" aria-label="Read more about Forecasting Beer Sales with ARIMA in Python">Read more</a></p>
<p>The post <a href="https://www.relataly.com/forecasting-beer-sales-with-arima-in-python/2884/">Forecasting Beer Sales with ARIMA in Python</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>Time series analysis and forecasting is a tough nut to crack, but the ARIMA model has been cracking it for decades. ARIMA, short for &#8220;Auto-Regressive Integrated Moving Average,&#8221; is a powerful statistical modeling technique for time series analysis. It&#8217;s particularly effective when the time series you&#8217;re analyzing follows a clear pattern, like seasonal changes in weather or sales. ARIMA has been used to forecast everything from beer sales to order quantities, and this tutorial will show you how to build your own ARIMA model in Python. You&#8217;ll be making predictions like a pro in no time!</p>



<p>This tutorial proceeds in two parts: The first part covers the concepts behind ARIMA. You will learn how ARIMA works, what stationarity means, and when it is appropriate to use ARIMA. The second part is a hands-on Python tutorial that applies auto-ARIMA to the sales forecasting domain. We&#8217;ll be working with a time series of beer sales, and our goal is to predict how the beer sales quantities will evolve in the coming years. First, we check whether the time series is stationary. Then we train an ARIMA forecasting model. Finally, we use the model to produce a sales forecast and measure the model&#8217;s performance.</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<div style="height:12px" aria-hidden="true" class="wp-block-spacer"></div>



<h2 class="wp-block-heading">About Sales Forecasting</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>Sales forecasting is a crucial business strategy that involves predicting future sales volumes for a product (for example, beer) or service. It leverages sophisticated statistical and analytical techniques, such as time series analysis or machine learning algorithms, to scrutinize historical sales data. By identifying trends and patterns within this data, businesses can make informed predictions about their future sales performance.</p>



<p>This strategic forecasting plays a pivotal role in business operations. It is instrumental in guiding key decisions surrounding production, inventory management, staffing, and various other operational elements. By honing in on accurate sales forecasting, businesses can strike the perfect balance &#8211; maintaining enough inventory to meet customer demand without overproducing or overstocking. This equilibrium ensures a smooth flow in the supply chain and avoids unnecessary costs tied to excess production or storage.</p>



<p>Furthermore, sales forecasting serves as a roadmap for business growth. It aids in identifying potential market opportunities and predicting future sales revenue. This valuable foresight enables businesses to strategically plan their expansion, ensuring resources are optimally utilized and future goals are met. With this in-depth understanding of sales forecasting, businesses can stay ahead of market trends, navigate through business challenges, and ultimately steer towards success.</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-full"><img decoding="async" width="506" height="508" data-attachment-id="12602" data-permalink="https://www.relataly.com/brewery-arima-beer-sales-forecasting-python-tutorial-machine-learning-relataly-min/" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/03/brewery-arima-beer-sales-forecasting-python-tutorial-machine-learning-relataly-min.png" data-orig-size="506,508" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="brewery arima beer sales forecasting python tutorial machine learning relataly-min" data-image-description="&lt;p&gt;Businesses rely on sales forecasting to make informed decisions about production, inventory management, staffing, and other key operational aspects.  Image created with Midjourney&lt;/p&gt;
" data-image-caption="&lt;p&gt;Businesses rely on sales forecasting to make informed decisions about production, inventory management, staffing, and other key operational aspects.  Image created with Midjourney&lt;/p&gt;
" data-large-file="https://www.relataly.com/wp-content/uploads/2023/03/brewery-arima-beer-sales-forecasting-python-tutorial-machine-learning-relataly-min.png" src="https://www.relataly.com/wp-content/uploads/2023/03/brewery-arima-beer-sales-forecasting-python-tutorial-machine-learning-relataly-min.png" alt="Businesses rely on sales forecasting to make informed decisions about production, inventory management, staffing, and other key operational aspects.  Image created with Midjourney" class="wp-image-12602" srcset="https://www.relataly.com/wp-content/uploads/2023/03/brewery-arima-beer-sales-forecasting-python-tutorial-machine-learning-relataly-min.png 506w, https://www.relataly.com/wp-content/uploads/2023/03/brewery-arima-beer-sales-forecasting-python-tutorial-machine-learning-relataly-min.png 300w, https://www.relataly.com/wp-content/uploads/2023/03/brewery-arima-beer-sales-forecasting-python-tutorial-machine-learning-relataly-min.png 140w" sizes="(max-width: 506px) 100vw, 506px" /><figcaption class="wp-element-caption">Businesses rely on sales forecasting to make informed decisions about production, inventory management, staffing, and other key operational aspects. Image created with <a href="http://www.midjourney.com" target="_blank" rel="noreferrer noopener">Midjourney</a></figcaption></figure>
</div>
</div>



<div style="height:26px" aria-hidden="true" class="wp-block-spacer"></div>



<h2 class="wp-block-heading" id="h-introduction-to-arima-time-series-modelling">Introduction to ARIMA Time Series Modelling</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>ARIMA models provide an alternative approach to time series forecasting that differs significantly from machine learning methods. Working with ARIMA requires a good understanding of stationarity and of the transformations used to make time-series data stationary. The concept of stationarity is, therefore, first on our schedule.</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<h3 class="wp-block-heading" id="h-the-concept-of-stationarity">The Concept of Stationarity</h3>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>Stationarity is an essential concept in stochastic processes that describes the nature of a time series. We consider a time series strictly stationary if its statistical properties do not change over time. In this case, summary statistics, such as the mean and variance, do not change over time. However, the time-series data we encounter in the real world often show a trend or significant irregular fluctuations, making them non-stationary or weakly stationary.</p>



<p>So why is stationarity such an essential concept for ARIMA? If a time series is stationary, we can assume that its past values are predictive of its future development. In other words, a stationary time series exhibits consistent behavior that makes it predictable. A non-stationary time series, on the other hand, is characterized by a kind of random behavior that is difficult to capture in a model. After all, if random movements characterized the past, the future will most likely be no different.</p>



<p>Fortunately, in many cases, it is possible to transform a time series that is non-stationary into a stationary form and, in this way, build better prediction models.</p>



<div style="height:56px" aria-hidden="true" class="wp-block-spacer"></div>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-full"><img decoding="async" width="802" height="751" data-attachment-id="8592" data-permalink="https://www.relataly.com/forecasting-beer-sales-with-arima-in-python/2884/image-25-8/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/05/image-25.png" data-orig-size="802,751" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-25" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/05/image-25.png" src="https://www.relataly.com/wp-content/uploads/2022/05/image-25.png" alt="arima time series forecasting, stationary vs non-stationary data, python tutorial " class="wp-image-8592" srcset="https://www.relataly.com/wp-content/uploads/2022/05/image-25.png 802w, https://www.relataly.com/wp-content/uploads/2022/05/image-25.png 300w, https://www.relataly.com/wp-content/uploads/2022/05/image-25.png 768w" sizes="(max-width: 802px) 100vw, 802px" /><figcaption class="wp-element-caption">A stationary Vs. a non-stationary time series</figcaption></figure>
</div>
</div>
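<p>To make this transformation concrete, here is a minimal sketch of first-order differencing on a synthetic trended series (illustrative only, not this tutorial&#8217;s beer dataset): subtracting each value from its successor removes a linear trend and leaves a series that fluctuates around a constant level.</p>

```python
# Sketch: removing a linear trend by first-order differencing (d = 1).
# The synthetic series is for illustration and is not the tutorial's dataset.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
trend = np.arange(120) * 2.0              # deterministic upward trend with slope 2
noise = rng.normal(0, 1, 120)
series = pd.Series(trend + noise)         # non-stationary: the mean grows over time

diffed = series.diff().dropna()           # first difference

# The original series has a drifting mean; the differenced series
# fluctuates around a constant level (roughly the trend slope of 2).
print(series.iloc[:60].mean(), series.iloc[60:].mean())
print(diffed.mean())
```

<p>The half-series means of the original series drift apart, while the differenced series stays near a constant level &#8211; exactly the behavior an ARIMA model with d = 1 exploits.</p>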



<h3 class="wp-block-heading">How to Test Whether a Time Series is Stationary</h3>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>The first step in the ARIMA modeling approach is determining whether a time series is stationary. There are different ways to determine whether a time series is stationary:</p>



<ul class="wp-block-list">
<li><strong>Plotting:</strong> We can plot the time series and visually check if it shows consistent behavior or changes over a more extended period.</li>



<li><strong>Summary statistics</strong>: We can split the time series into different periods and calculate summary statistics, such as the mean and variance. If these metrics change significantly between periods, the time series is non-stationary. However, the results depend on the chosen periods, which can lead to false conclusions.</li>



<li><strong>Statistical tests</strong>: There are various tests to determine the stationarity of a time series, such as Kwiatkowski–Phillips–Schmidt–Shin, Augmented Dickey-Fuller, or Phillips–Perron. These tests systematically check a time series against a null hypothesis and provide an indicator of how trustworthy the result is.</li>
</ul>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>
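<p>The &#8220;summary statistics&#8221; check described above can be sketched as a simple heuristic. The function and data below are illustrative and not part of this tutorial&#8217;s code: we split a series into two halves and compare mean and variance, where large relative shifts suggest non-stationarity.</p>

```python
# Sketch of the summary-statistics check: compare mean and variance
# of the two halves of a series. Synthetic data, for illustration only.
import numpy as np

def looks_stationary(series, rel_tol=0.5):
    """Crude heuristic: flag large relative shifts in mean or variance."""
    half = len(series) // 2
    first, second = series[:half], series[half:]
    mean_shift = abs(first.mean() - second.mean()) / (abs(first.mean()) + 1e-9)
    var_shift = abs(first.var() - second.var()) / (first.var() + 1e-9)
    return bool(mean_shift < rel_tol and var_shift < rel_tol)

rng = np.random.default_rng(0)
white_noise = rng.normal(10, 1, 200)                # fluctuates around a constant mean
trending = np.arange(200) + rng.normal(0, 1, 200)   # mean drifts upward over time

print(looks_stationary(white_noise))   # expected: True
print(looks_stationary(trending))      # expected: False
```

<p>Note that such a heuristic depends on where the series is split, which is why the statistical tests listed above are the more reliable choice in practice.</p>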



<div style="height:56px" aria-hidden="true" class="wp-block-spacer"></div>



<h3 class="wp-block-heading" id="h-what-is-an-s-arima-model">What is an (S)ARIMA Model?</h3>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>As the name implies, ARIMA uses autoregression (AR), integration (differencing), and moving averages (MA) to fit a linear regression model to a time series. </p>



<h4 class="wp-block-heading" id="h-arima-parameters">ARIMA Parameters</h4>



<p>The default notation for ARIMA is a model with parameters p, d, and q, whereby each parameter takes an integer value: </p>



<ul class="wp-block-list">
<li><strong>d (differencing): </strong>If a time series is non-stationary, we can often remove a trend from the data by differencing once or several times, thus bringing the data to a stationary state. The model parameter d determines the order of differencing. A value of d = 0 simplifies the ARIMA model to an ARMA model, which lacks the integration aspect. In that case, no differencing is needed because the time series is already stationary.</li>



<li><strong>p (order of the AR terms):</strong> The autoregressive process describes the dependent relationship between an observation and several lagged observations (lags). Predictions are then based on past data from the same time series using linear functions. p = 1 means the model uses values that lag by one period. </li>



<li><strong> q (order of the MA terms):</strong> The parameter q determines the number of lagged forecast errors in the prediction equation. In contrast to the AR process, the MA process assumes that future values depend on the errors made by predictions at current and past points in time. In other words, it is not the previous observations themselves but the previous prediction errors that are used to calculate the next time series value.</li>
</ul>
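<p>To make the autoregressive component tangible, the following sketch fits an AR(1) coefficient by ordinary least squares on a synthetic series. This is an illustration of the AR idea only, not part of this tutorial&#8217;s code, and real ARIMA libraries estimate the coefficients for us.</p>

```python
# Sketch: the AR part of ARIMA predicts the next value from lagged values.
# Here we recover the AR(1) coefficient of a synthetic series by regressing
# x[t] on x[t-1] (illustrative only).
import numpy as np

rng = np.random.default_rng(1)
phi_true = 0.7                       # true autoregressive coefficient
x = np.zeros(500)
for t in range(1, 500):
    x[t] = phi_true * x[t - 1] + rng.normal()

# Least-squares estimate: phi_hat = sum(x[t-1] * x[t]) / sum(x[t-1]^2)
phi_hat = (x[:-1] @ x[1:]) / (x[:-1] @ x[:-1])
print(round(phi_hat, 2))             # should land close to 0.7
```

<p>With p = 1, an ARIMA model uses exactly this kind of one-lag relationship; higher values of p add further lagged terms.</p>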



<h4 class="wp-block-heading" id="h-sarima">SARIMA</h4>



<p>In the real world, many time series have seasonal effects. Examples are monthly retail sales figures, temperature reports, weekly airline passenger data, etc. To account for this, we can specify a seasonal period (e.g., m=12 for monthly data) and additional seasonal AR or MA components that deal with seasonality. Such a model is called a SARIMA model, written as ARIMA(p, d, q)(P, D, Q)[m]. </p>



<h4 class="wp-block-heading" id="h-auto-s-arima">Auto-(S)ARIMA </h4>



<p>When working with ARIMA, we can set the model parameters manually or use auto-ARIMA and let the search find suitable parameters. Auto-ARIMA first conducts differencing tests to determine the order of differencing,&nbsp;<code>d</code>, and then fits candidate models with parameters in defined ranges, e.g., <code>start_p</code>,&nbsp;<code>max_p</code>&nbsp;as well as <code>start_q</code>,&nbsp;<code>max_q</code>, selecting among them with an information criterion such as AIC. If our model has a seasonal component, we can also define parameter ranges for the seasonal part of the model, and the search will try to identify the optimal seasonal hyperparameters as well. </p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<div style="height:56px" aria-hidden="true" class="wp-block-spacer"></div>



<h2 class="wp-block-heading" id="h-creating-a-sales-forecast-with-arima-in-python">Creating a Sales Forecast with ARIMA in Python</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>Having grasped the fundamental concepts behind ARIMA (AutoRegressive Integrated Moving Average), we&#8217;re now ready to dive into the practical part: crafting a sales forecasting model in Python. ARIMA is a popular choice for forecasting sales data because it can model seasonal changes combined with long-term trends &#8211; a pattern commonly exhibited by sales data.</p>



<p>In this tutorial, we&#8217;ll be employing a dataset representing the monthly beer sales across the United States from 1992 through 2018, recorded in millions of US dollars. Our objective is to construct a robust time series model using ARIMA to accurately predict future sales trends.</p>



<p>When it comes to the technological aspect, we&#8217;ll be using the Python-based &#8216;statsmodels&#8217; and &#8216;pmdarima&#8217; libraries to build our ARIMA sales forecasting model. So, if you&#8217;re ready to harness the power of Python and ARIMA for sales prediction, let&#8217;s get started!</p>



<p>The code is available on the GitHub repository.</p>



<div class="wp-block-kadence-advancedbtn kb-buttons-wrap kb-btns_87ac48-a9"><a class="kb-button kt-button button kb-btn_a01710-14 kt-btn-size-standard kt-btn-width-type-full kb-btn-global-inherit  kt-btn-has-text-true kt-btn-has-svg-true  wp-block-button__link wp-block-kadence-singlebtn" href="https://github.com/flo7up/relataly-public-python-tutorials/blob/master/01%20Time%20Series%20Forecasting%20%26%20Regression/001%20Forecasting%20US%20Beer%20Sales%20with%20(auto)%20ARIMA.ipynb" target="_blank" rel="noreferrer noopener"><span class="kb-svg-icon-wrap kb-svg-icon-fe_eye kt-btn-icon-side-left"><svg viewBox="0 0 24 24"  fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"  aria-hidden="true"><path d="M1 12s4-8 11-8 11 8 11 8-4 8-11 8-11-8-11-8z"/><circle cx="12" cy="12" r="3"/></svg></span><span class="kt-btn-inner-text">View on GitHub </span></a>

<a class="kb-button kt-button button kb-btn_ff77d5-62 kt-btn-size-standard kt-btn-width-type-full kb-btn-global-inherit  kt-btn-has-text-true kt-btn-has-svg-true  wp-block-button__link wp-block-kadence-singlebtn" href="https://github.com/flo7up/relataly-public-python-API-tutorials" target="_blank" rel="noreferrer noopener"><span class="kb-svg-icon-wrap kb-svg-icon-fa_github kt-btn-icon-side-left"><svg viewBox="0 0 496 512"  fill="currentColor" xmlns="http://www.w3.org/2000/svg"  aria-hidden="true"><path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"/></svg></span><span class="kt-btn-inner-text">Relataly GitHub Repo </span></a></div>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-large"><img decoding="async" width="512" height="512" data-attachment-id="12612" data-permalink="https://www.relataly.com/forecasting-beer-sales-with-arima-in-python/2884/a-car-drinking-beer-after-creating-an-arima-sales-forecast-min-1/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/03/A-car-drinking-beer-after-creating-an-ARIMA-sales-forecast-min-1.png" data-orig-size="1024,1024" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="A-car-drinking-beer-after-creating-an-ARIMA-sales-forecast-min-1" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/03/A-car-drinking-beer-after-creating-an-ARIMA-sales-forecast-min-1.png" src="https://www.relataly.com/wp-content/uploads/2023/03/A-car-drinking-beer-after-creating-an-ARIMA-sales-forecast-min-1-512x512.png" alt="A fluffy cat drinking beer after creating an ARIMA sales forecast. 
Image created with Midjourney" class="wp-image-12612" srcset="https://www.relataly.com/wp-content/uploads/2023/03/A-car-drinking-beer-after-creating-an-ARIMA-sales-forecast-min-1.png 512w, https://www.relataly.com/wp-content/uploads/2023/03/A-car-drinking-beer-after-creating-an-ARIMA-sales-forecast-min-1.png 300w, https://www.relataly.com/wp-content/uploads/2023/03/A-car-drinking-beer-after-creating-an-ARIMA-sales-forecast-min-1.png 140w, https://www.relataly.com/wp-content/uploads/2023/03/A-car-drinking-beer-after-creating-an-ARIMA-sales-forecast-min-1.png 768w, https://www.relataly.com/wp-content/uploads/2023/03/A-car-drinking-beer-after-creating-an-ARIMA-sales-forecast-min-1.png 1024w" sizes="(max-width: 512px) 100vw, 512px" /><figcaption class="wp-element-caption">A fluffy cat drinking beer after creating an ARIMA sales forecast. Image created with <a href="http://www.midjourney.com" target="_blank" rel="noreferrer noopener">Midjourney</a></figcaption></figure>
</div>
</div>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<h3 class="wp-block-heading" id="h-prerequisites">Prerequisites</h3>



<p>Before we start coding, ensure you have set up your Python 3 environment and required packages. If you don&#8217;t have an environment, you can follow&nbsp;this tutorial&nbsp;to set up the&nbsp;<a href="https://www.anaconda.com/products/individual" target="_blank" rel="noreferrer noopener">Anaconda environment</a>.</p>



<p>Also, make sure you install all required packages. In this tutorial, we will be working with the following standard packages:&nbsp;</p>



<ul class="wp-block-list">
<li><em><a href="https://pandas.pydata.org/" target="_blank" rel="noreferrer noopener">pandas</a></em></li>



<li><em><a href="https://numpy.org/" target="_blank" rel="noreferrer noopener">NumPy</a></em></li>



<li><em><a href="https://matplotlib.org/" target="_blank" rel="noreferrer noopener">matplotlib</a></em></li>
</ul>



<p>In addition, we will be using the <a href="https://www.statsmodels.org/stable/index.html" target="_blank" rel="noreferrer noopener">statsmodels </a>library and <a href="https://pypi.org/project/pmdarima/" target="_blank" rel="noreferrer noopener">pmdarima</a>. </p>



<p>You can install packages using console commands:</p>



<ul class="wp-block-list">
<li><em>pip install &lt;package name&gt;</em></li>



<li><em>conda install &lt;package name&gt;</em>&nbsp;(if you are using the anaconda packet manager)</li>
</ul>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<h3 class="wp-block-heading" id="h-step-1-load-the-sales-data-to-our-python-project">Step #1 Load the Sales Data to Our Python Project</h3>



<p>In the initial step of this tutorial, we commence by setting up the necessary Python environment. We import several packages that we&#8217;ll be using for data manipulation, visualization, and implementing machine learning models. We then fetch the dataset we&#8217;ll be working with &#8211; the monthly beer sales in the United States from 1992 through 2018. This data is sourced from a publicly accessible URL and loaded into a pandas DataFrame.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># A tutorial for this file is available at www.relataly.com
# Tested with Python 3.8.8

# Setting up packages for data manipulation and machine learning
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import pmdarima as pm
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.tsa.seasonal import seasonal_decompose
import seaborn as sns
sns.set_style('white', { 'axes.spines.right': False, 'axes.spines.top': False})

# Link to the dataset: 
# https://www.kaggle.com/bulentsiyah/for-simple-exercises-time-series-forecasting

path = &quot;https://raw.githubusercontent.com/flo7up/relataly_data/main/alcohol_sales/BeerWineLiquor.csv&quot;
df = pd.read_csv(path)
df.head()</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">		date	beer
0	1/1/1992	1509
1	2/1/1992	1541
2	3/1/1992	1597
3	4/1/1992	1675
4	5/1/1992	1822</pre></div>



<p>As shown above, the sales figures in this dataset stem from the first day of each month.</p>
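<p>Because the dates arrive as strings in month/day/year format, a typical next step is to parse them into a DatetimeIndex with an explicit monthly frequency. The following is a small self-contained sketch using an inline sample with the column names shown above, rather than the full dataset:</p>

```python
# Sketch: parse the 'date' column into a DatetimeIndex so that time-series
# tools can work with the monthly frequency. Inline sample data only.
import pandas as pd
from io import StringIO

csv = StringIO("date,beer\n1/1/1992,1509\n2/1/1992,1541\n3/1/1992,1597\n")
df = pd.read_csv(csv)
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y')
df = df.set_index('date').asfreq('MS')    # 'MS' = month-start frequency

print(df.index.freqstr)                   # MS
```

<p>With the frequency set, downstream functions such as seasonal decomposition can infer the monthly period from the index.</p>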



<h3 class="wp-block-heading" id="h-step-2-visualize-the-time-series-and-check-it-for-stationarity">Step #2 Visualize the Time Series and Check it for Stationarity</h3>



<p>Before modeling the sales data, we visualize the time series and test it for stationarity. Visualization helps us choose the parameters for our ARIMA model, which makes it an essential step.</p>



<p>First, we will look at the different components of the time series. We do this by using the seasonal_decompose function of the statsmodels library. </p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Decompose the time series
plt.rcParams[&quot;figure.figsize&quot;] = (10,6)
result = seasonal_decompose(df['beer'], model='multiplicative', period = 12)
result.plot()
plt.show()</pre></div>



<figure class="wp-block-image size-large is-resized"><img decoding="async" data-attachment-id="4890" data-permalink="https://www.relataly.com/forecasting-beer-sales-with-arima-in-python/2884/image-62-3/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2021/06/image-62.png" data-orig-size="712,568" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-62" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2021/06/image-62.png" src="https://www.relataly.com/wp-content/uploads/2021/06/image-62.png" alt="forecasting us beer sales - time series decomposition" class="wp-image-4890" width="754" height="602" srcset="https://www.relataly.com/wp-content/uploads/2021/06/image-62.png 712w, https://www.relataly.com/wp-content/uploads/2021/06/image-62.png 300w" sizes="(max-width: 754px) 100vw, 754px" /></figure>



<p>To test for stationarity, we use the Augmented Dickey-Fuller (ADF) test. It is common to run this test multiple times throughout a data science project. Therefore, we create a function that we can reuse later.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}">def check_stationarity(df_sales, title_string, labels):
    # Visualize the data
    fig, ax = plt.subplots(figsize=(16, 8))
    plt.title(title_string, fontsize=14)
    if df_sales.index.size &gt; 12:
        df_sales['ma_12_month'] = df_sales['beer'].rolling(window=12).mean()
        df_sales['ma_25_month'] = df_sales['beer'].rolling(window=25).mean()
        sns.lineplot(data=df_sales[['beer', 'ma_25_month', 'ma_12_month']], palette=sns.color_palette(&quot;mako_r&quot;, 3))
        plt.legend(loc='upper left', labels=labels)
    else:
        sns.lineplot(data=df_sales[['beer']])
    
    plt.show()
    
    sales = df_sales['beer'].dropna()
    # Perform an Augmented Dickey-Fuller test
    # alpha = 0.05 corresponds to a 5% significance level (i.e., 95% confidence)
    adf_test = pm.arima.ADFTest(alpha = 0.05) 
    print(adf_test.should_diff(sales))
    
df_sales = pd.DataFrame(df['beer'], columns=['beer'])
df_sales.index = pd.to_datetime(df['date']) 
title = &quot;Beer sales in the US between 1992 and 2018 in million US$/month&quot;
labels = ['beer', 'ma_25_month', 'ma_12_month'] # order matches the columns passed to lineplot
check_stationarity(df_sales, title, labels)</pre></div>



<figure class="wp-block-image size-full"><img decoding="async" width="940" height="495" data-attachment-id="8582" data-permalink="https://www.relataly.com/forecasting-beer-sales-with-arima-in-python/2884/arima-time-series/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/05/ARIMA-Time-series.png" data-orig-size="940,495" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="ARIMA-Time-series" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/05/ARIMA-Time-series.png" src="https://www.relataly.com/wp-content/uploads/2022/05/ARIMA-Time-series.png" alt="arima plot seaborn beer sales forecasting python" class="wp-image-8582" srcset="https://www.relataly.com/wp-content/uploads/2022/05/ARIMA-Time-series.png 940w, https://www.relataly.com/wp-content/uploads/2022/05/ARIMA-Time-series.png 300w, https://www.relataly.com/wp-content/uploads/2022/05/ARIMA-Time-series.png 768w" sizes="(max-width: 940px) 100vw, 940px" /></figure>



<p>The data does not appear to be stationary. We can see that our time series is steadily increasing and shows annual seasonality. The steady increase indicates a continuous growth in beer consumption over the last decades. The seasonality in the sales data likely results from people drinking more beer in summer than in other seasons.</p>
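<p>To build intuition for why differencing helps with such a trend, here is a minimal, self-contained sketch. It uses synthetic data (not the beer sales) and shows how first-order differencing turns a steadily growing series into one that fluctuates around a constant mean:</p>

```python
import numpy as np

# Hypothetical example: a linear trend plus noise is non-stationary,
# but its first difference fluctuates around a constant (the slope).
rng = np.random.default_rng(42)
t = np.arange(120)
y = 5.0 * t + rng.normal(0, 1, size=t.size)  # upward trend, loosely mimicking the sales series

diff = np.diff(y)  # first-order differencing: y[t] - y[t-1]

# The original series drifts upward without bound; the differenced
# series hovers around the slope (5.0) with small variance.
print(round(diff.mean(), 1))
```

<p>The same principle underlies the d and D parameters of ARIMA: differencing removes the trend so that the remaining structure can be modeled by the AR and MA terms.</p>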



<h3 class="wp-block-heading" id="h-step-3-exemplary-differencing-and-autocorrelation">Step #3 Exemplary Differencing and Autocorrelation</h3>



<p>The chart from the previous section shows that our time series is non-stationary. The reason is that it follows a clear upward trend. We also know that the time series has a seasonal component. Therefore, we need to define additional parameters and construct a SARIMA model.</p>



<p>Before we use autocorrelation to determine the optimal parameters, we will try manual differencing to make the time series stationary. There is no guarantee that differencing helps; it can sometimes even worsen prediction performance, so be careful not to overdifference! We could also trust the auto-ARIMA model to choose the best parameters for us. However, we should always validate the selected parameters.</p>



<p>The ideal differencing parameter is the smallest number of differencing steps that yields a stationary time series. We will monitor the results with autocorrelation plots to check whether differencing was successful.</p>
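<p>As a quick sanity check on the differencing order (this heuristic is not part of the tutorial's ARIMA workflow itself), a common rule of thumb is that the correctly differenced series has the lowest standard deviation: overdifferencing inflates the variance again. A small sketch with synthetic data:</p>

```python
import numpy as np

# Heuristic sketch: pick the differencing order d (0..max_d) that
# minimizes the standard deviation of the differenced series.
def pick_d_by_std(y, max_d=2):
    candidates = {}
    series = np.asarray(y, dtype=float)
    for d in range(max_d + 1):
        candidates[d] = series.std()  # std after d differencing steps
        series = np.diff(series)
    return min(candidates, key=candidates.get)

# A random walk with drift needs exactly one difference;
# differencing a second time only adds noise.
rng = np.random.default_rng(0)
trend = np.cumsum(rng.normal(1.0, 0.1, size=200))
print(pick_d_by_std(trend))  # → 1
```

<p>This mirrors the advice above: stop at the least number of differencing steps that achieves stationarity.</p>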



<p>We print the autocorrelation for the original time series and after the first and second-order differencing.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># 3.1 Non-seasonal part
def auto_correlation(df, prefix, lags):
    plt.rcParams.update({'figure.figsize':(7,7), 'figure.dpi':120})
    
    # Define the plot grid
    fig, axes = plt.subplots(3,2, sharex=False)

    # Original series
    axes[0, 0].plot(df)
    axes[0, 0].set_title('Original' + prefix)
    plot_acf(df, lags=lags, ax=axes[0, 1])

    # First Difference
    df_first_diff = df.diff().dropna()
    axes[1, 0].plot(df_first_diff)
    axes[1, 0].set_title('First Order Difference' + prefix)
    plot_acf(df_first_diff, lags=lags - 1, ax=axes[1, 1])

    # Second Difference
    df_second_diff = df.diff().diff().dropna()
    axes[2, 0].plot(df_second_diff)
    axes[2, 0].set_title('Second Order Difference' + prefix)
    plot_acf(df_second_diff, lags=lags - 2, ax=axes[2, 1])
    plt.tight_layout()
    plt.show()
    
auto_correlation(df_sales['beer'], '', 10)</pre></div>



<figure class="wp-block-image size-full is-resized"><img decoding="async" data-attachment-id="7076" data-permalink="https://www.relataly.com/forecasting-beer-sales-with-arima-in-python/2884/image-8-14/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/04/image-8.png" data-orig-size="826,824" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-8" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/04/image-8.png" src="https://www.relataly.com/wp-content/uploads/2022/04/image-8.png" alt="ARIMA Python Time Series Forecasting Sales Data, checking for stationarity" class="wp-image-7076" width="561" height="560" srcset="https://www.relataly.com/wp-content/uploads/2022/04/image-8.png 826w, https://www.relataly.com/wp-content/uploads/2022/04/image-8.png 300w, https://www.relataly.com/wp-content/uploads/2022/04/image-8.png 150w, https://www.relataly.com/wp-content/uploads/2022/04/image-8.png 768w" sizes="(max-width: 561px) 100vw, 561px" /></figure>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">(0.019143247561160443, False)</pre></div>



<p id="block-db542169-554a-4820-a6a7-a0df5b3b9b6e">The charts above show that the time series becomes stationary after one order differencing. However, we can see that the lag goes into the negative very quickly, which indicates overdifferencing.</p>



<p>Next, we perform the same procedure for the seasonal part of our time series.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># 3.2 Seasonal part

# Reduce the timeframe to a single seasonal period
df_sales_s = df_sales['beer'][0:12]

# Autocorrelation for the seasonal part
auto_correlation(df_sales_s, '', 10)</pre></div>



<figure class="wp-block-image size-large is-resized"><img decoding="async" data-attachment-id="3102" data-permalink="https://www.relataly.com/forecasting-beer-sales-with-arima-in-python/2884/image-20-5/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2021/03/image-20.png" data-orig-size="831,827" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-20" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2021/03/image-20.png" src="https://www.relataly.com/wp-content/uploads/2021/03/image-20.png" alt="arima time series components" class="wp-image-3102" width="571" height="568" srcset="https://www.relataly.com/wp-content/uploads/2021/03/image-20.png 831w, https://www.relataly.com/wp-content/uploads/2021/03/image-20.png 150w, https://www.relataly.com/wp-content/uploads/2021/03/image-20.png 768w, https://www.relataly.com/wp-content/uploads/2021/03/image-20.png 120w" sizes="(max-width: 571px) 100vw, 571px" /></figure>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Check if the first difference of the seasonal period is stationary
df_diff = pd.DataFrame(df_sales_s.diff())
df_diff.index = pd.date_range(df_sales_s.index[0], periods=12, freq='MS') 
check_stationarity(df_diff, &quot;First Difference (Seasonal)&quot;, ['difference'])</pre></div>



<figure class="wp-block-image size-large is-resized"><img decoding="async" data-attachment-id="3010" data-permalink="https://www.relataly.com/forecasting-beer-sales-with-arima-in-python/2884/image-8-9/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2021/03/image-8.png" data-orig-size="1569,441" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-8" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2021/03/image-8.png" src="https://www.relataly.com/wp-content/uploads/2021/03/image-8-1024x288.png" alt="seasonality after differencing, arima time series forecasting" class="wp-image-3010" width="687" height="193" srcset="https://www.relataly.com/wp-content/uploads/2021/03/image-8.png 1024w, https://www.relataly.com/wp-content/uploads/2021/03/image-8.png 300w, https://www.relataly.com/wp-content/uploads/2021/03/image-8.png 768w, https://www.relataly.com/wp-content/uploads/2021/03/image-8.png 1536w, https://www.relataly.com/wp-content/uploads/2021/03/image-8.png 1569w" sizes="(max-width: 687px) 100vw, 687px" /></figure>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">(0.99, True)</pre></div>



<p id="block-3731e774-0091-46b9-afa1-2d55e1512fdd">After first order differencing, the seasonal part of the time series is stationary. The autocorrelation plot shows that the values go into the negative but remain within acceptable boundaries. Second-order differencing does not seem to improve these values. Consequently, we conclude that first-order differencing is a good choice for the D parameter. </p>



<h3 class="wp-block-heading" id="h-step-4-finding-an-optimal-model-with-auto-arima">Step #4 Finding an Optimal Model with Auto-ARIMA</h3>



<p>Next, we auto-fit an ARIMA model to our time series. To ensure that we can later measure the performance of our model against a fresh set of data it has not yet seen, we first split our dataset into train and test sets. </p>



<p>Once we have created the train and test data sets, we can configure the parameters for the auto_arima stepwise optimization. By setting max_d=3, we allow the model to test up to third-order differencing. Likewise, we set max_p and max_q to 3. </p>



<p>To deal with the seasonality in our time series, we set the &#8220;seasonal&#8221; parameter to True and the &#8220;m&#8221; parameter to 12 data points (one seasonal cycle per year). This turns our model into a SARIMA model with additional seasonal parameters P, D, and Q. We cap P and Q at 3 as well. Since we have already seen that first-order seasonal differencing is sufficient, we set max_D to 2 so that auto-ARIMA can confirm this.</p>



<p>After configuring the parameters, we fit the model to the time series. Auto-ARIMA searches for the optimal parameters and selects the model with the lowest AIC.</p>
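<p>To make the selection criterion concrete: for least-squares fits with Gaussian errors, the AIC can be written (up to an additive constant) as n·ln(RSS/n) + 2k, where k is the number of estimated parameters. The 2k penalty means extra parameters must buy a real reduction in residual error. A hedged sketch, independent of pmdarima's internals:</p>

```python
import numpy as np

# Simplified AIC for a least-squares fit with Gaussian errors
# (up to an additive constant): n * ln(RSS / n) + 2k.
def aic(residuals, n_params):
    resid = np.asarray(residuals, dtype=float)
    n = resid.size
    rss = np.sum(resid ** 2)
    return n * np.log(rss / n) + 2 * n_params

# Hypothetical comparison: two candidate models with identical residuals --
# the one with fewer parameters wins, mirroring the stepwise search below.
resid = np.random.default_rng(7).normal(size=300)
print(aic(resid, n_params=3) < aic(resid, n_params=6))  # → True
```

<p>This is why the stepwise trace below keeps some small models despite slightly larger errors: the penalty term rewards parsimony.</p>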



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># split into train and test
pred_periods = 30
split_number = df_sales['beer'].count() - pred_periods # corresponds to a prediction horizon of 2.5 years
df_train = pd.DataFrame(df_sales['beer'][:split_number]).rename(columns={'beer':'y_train'})
df_test = pd.DataFrame(df_sales['beer'][split_number:]).rename(columns={'beer':'y_test'})

# auto_arima
model_fit = pm.auto_arima(df_train, test='adf', 
                         max_p=3, max_d=3, max_q=3, 
                         seasonal=True, m=12,
                         max_P=3, max_D=2, max_Q=3,
                         trace=True,
                         error_action='ignore',  
                         suppress_warnings=True, 
                         stepwise=True)

# summarize the model characteristics
print(model_fit.summary())</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">Performing stepwise search to minimize aic
 ARIMA(2,0,2)(1,1,1)[12] intercept   : AIC=inf, Time=3.89 sec
 ARIMA(0,0,0)(0,1,0)[12] intercept   : AIC=3383.210, Time=0.02 sec
 ARIMA(1,0,0)(1,1,0)[12] intercept   : AIC=3351.655, Time=0.38 sec
 ARIMA(0,0,1)(0,1,1)[12] intercept   : AIC=3364.350, Time=1.09 sec
 ARIMA(0,0,0)(0,1,0)[12]             : AIC=3604.145, Time=0.02 sec
 ARIMA(1,0,0)(0,1,0)[12] intercept   : AIC=3349.908, Time=0.11 sec
 ARIMA(1,0,0)(0,1,1)[12] intercept   : AIC=3351.532, Time=0.29 sec
 ARIMA(1,0,0)(1,1,1)[12] intercept   : AIC=3353.520, Time=1.24 sec
 ARIMA(2,0,0)(0,1,0)[12] intercept   : AIC=3312.656, Time=0.10 sec
 ARIMA(2,0,0)(1,1,0)[12] intercept   : AIC=3314.483, Time=0.57 sec
 ARIMA(2,0,0)(0,1,1)[12] intercept   : AIC=3314.378, Time=0.30 sec
 ARIMA(2,0,0)(1,1,1)[12] intercept   : AIC=3305.552, Time=3.02 sec
 ARIMA(2,0,0)(2,1,1)[12] intercept   : AIC=3291.425, Time=4.19 sec
 ARIMA(2,0,0)(2,1,0)[12] intercept   : AIC=3306.914, Time=3.06 sec
 ARIMA(2,0,0)(3,1,1)[12] intercept   : AIC=3276.501, Time=4.67 sec
 ARIMA(2,0,0)(3,1,0)[12] intercept   : AIC=3282.240, Time=5.24 sec
 ARIMA(2,0,0)(3,1,2)[12] intercept   : AIC=inf, Time=7.39 sec
 ARIMA(2,0,0)(2,1,2)[12] intercept   : AIC=inf, Time=4.74 sec
 ARIMA(1,0,0)(3,1,1)[12] intercept   : AIC=3313.877, Time=5.17 sec
 ARIMA(3,0,0)(3,1,1)[12] intercept   : AIC=3246.820, Time=5.72 sec
 ARIMA(3,0,0)(2,1,1)[12] intercept   : AIC=3255.313, Time=5.33 sec
 ARIMA(3,0,0)(3,1,0)[12] intercept   : AIC=3249.998, Time=6.77 sec
 ARIMA(3,0,0)(3,1,2)[12] intercept   : AIC=inf, Time=8.39 sec
 ARIMA(3,0,0)(2,1,0)[12] intercept   : AIC=3259.938, Time=3.55 sec
...
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).</pre></div>



<p>Auto-ARIMA has determined that the best model is ARIMA(3,0,0)(3,1,1)[12]. This matches our findings from Step #3, in which we performed the differencing manually. </p>



<h3 class="wp-block-heading" id="h-step-5-simulate-the-time-series-using-in-sample-forecasting">Step #5 Simulate the Time Series using in-sample Forecasting</h3>



<p>Now that we have trained our model, we want to use it to simulate the entire time series. We do this by calling the model&#8217;s predict_in_sample method. The prediction covers the same period as the original time series with which we trained the model. Because the model predicts only one step ahead at a time, using the actual lagged values, the prediction results will naturally be close to the actual values.</p>
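<p>To see why one-step-ahead predictions track the data so closely, consider a toy AR(1) model (a simplification, not the SARIMA model itself): each prediction uses the observed previous value, so the prediction error at each step is just the noise term.</p>

```python
import numpy as np

# Hypothetical AR(1) process: y_t = phi * y_{t-1} + noise.
rng = np.random.default_rng(3)
phi = 0.9
y = np.zeros(200)
for t in range(1, 200):
    y[t] = phi * y[t - 1] + rng.normal()

# One-step-ahead (non-dynamic) prediction: each forecast uses the
# *observed* previous value, analogous to dynamic=False.
one_step_pred = phi * y[:-1]
errors = y[1:] - one_step_pred  # equals the noise terms by construction

print(round(errors.std(), 1))  # close to the noise std of 1
```

<p>With dynamic predictions, errors would instead compound from step to step, which is what happens in the out-of-sample forecast later on.</p>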



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Generate in-sample Predictions
# With dynamic=False, each prediction is based on the actual lagged values,
# i.e., the model predicts one step ahead from the observed history.
pred = model_fit.predict_in_sample(dynamic=False) # works only with auto-arima
df_train['y_train_pred'] = pred

# Calculate the percentage difference
df_train['diff_percent'] = abs((df_train['y_train'] - pred) / df_train['y_train']) * 100

# Print the predicted time-series
fig, ax1 = plt.subplots(figsize=(16, 8))
plt.title(&quot;In Sample Sales Prediction&quot;, fontsize=14)
sns.lineplot(data=df_train[['y_train', 'y_train_pred']], linewidth=1.0)

# Print percentage prediction errors on a separate axis (ax2)
ax2 = ax1.twinx() 
ax2.set_ylabel('Prediction Errors in %', color='purple', fontsize=14)  
ax2.set_ylim([0, 50])
ax2.bar(height=df_train['diff_percent'][20:], x=df_train.index[20:], width=20, color='purple', label='absolute errors')
plt.legend()
plt.show()</pre></div>



<figure class="wp-block-image size-large"><img decoding="async" width="1024" height="521" data-attachment-id="11758" data-permalink="https://www.relataly.com/forecasting-beer-sales-with-arima-in-python/2884/image-6/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/12/image-6.png" data-orig-size="1620,825" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-6" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/12/image-6.png" src="https://www.relataly.com/wp-content/uploads/2022/12/image-6-1024x521.png" alt="arima forecast for us bear sales created in python" class="wp-image-11758" srcset="https://www.relataly.com/wp-content/uploads/2022/12/image-6.png 1024w, https://www.relataly.com/wp-content/uploads/2022/12/image-6.png 300w, https://www.relataly.com/wp-content/uploads/2022/12/image-6.png 768w, https://www.relataly.com/wp-content/uploads/2022/12/image-6.png 1536w, https://www.relataly.com/wp-content/uploads/2022/12/image-6.png 1620w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<p>The purple bars in the chart above show the percentage prediction errors, which we will analyze in more detail in Step #7.</p>



<h3 class="wp-block-heading" id="h-step-6-generate-and-visualize-a-sales-forecast">Step #6 Generate and Visualize a Sales Forecast</h3>



<p>Now that we have trained an optimal model, we are ready to generate a sales forecast. First, we specify the number of periods we want to predict. The predictions continue directly from the last date of the training data, so their index extends the index of the original time series.</p>
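<p>If you need to build such a continuation index yourself (for example, when the prediction output comes back without dates), a small pandas sketch does the job. The names here (train_index, prediction_index) are illustrative, not taken from the tutorial code:</p>

```python
import pandas as pd

# Continue a monthly ('MS' = month start) index beyond the last
# training date, so forecasts can be aligned next to the history.
train_index = pd.date_range('1992-01-01', periods=6, freq='MS')
pred_periods = 3

# Start one month after the last known date, keep the same frequency.
prediction_index = pd.date_range(
    train_index[-1] + pd.DateOffset(months=1),
    periods=pred_periods,
    freq='MS',
)
print(list(prediction_index.strftime('%Y-%m-%d')))
# → ['1992-07-01', '1992-08-01', '1992-09-01']
```

<p>Because our df_test already carries the correct dates from the original series, we can simply assign the forecast values to it below.</p>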



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Generate prediction for n periods, 
# Predictions start from the last date of the training data
test_pred = model_fit.predict(n_periods=pred_periods, dynamic=False)
df_test['y_test_pred'] = test_pred
df_union = pd.concat([df_train, df_test])

# Print the predicted time-series
fig, ax = plt.subplots(figsize=(16, 8))
plt.title(&quot;Test/Pred Comparison&quot;, fontsize=14)
sns.despine();
sns.lineplot(data=df_union[['y_train', 'y_train_pred', 'y_test', 'y_test_pred']], linewidth=1.0, dashes=False, palette='muted')
ax.set_xlim([df_union.index[150],df_union.index.max()])
plt.legend()
plt.show()</pre></div>



<figure class="wp-block-image size-large"><img decoding="async" width="1024" height="539" data-attachment-id="8529" data-permalink="https://www.relataly.com/forecasting-beer-sales-with-arima-in-python/2884/arima-plot-seaborn-beer-sales-forecasting-python-test-predictions-1/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/05/arima-plot-seaborn-beer-sales-forecasting-python-test-predictions-1.png" data-orig-size="1566,825" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="arima-plot-seaborn-beer-sales-forecasting-python-test-predictions-1" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/05/arima-plot-seaborn-beer-sales-forecasting-python-test-predictions-1.png" src="https://www.relataly.com/wp-content/uploads/2022/05/arima-plot-seaborn-beer-sales-forecasting-python-test-predictions-1-1024x539.png" alt="time series forecast on us beer sales with arima Test Pred Comparison, python tutorial" class="wp-image-8529" srcset="https://www.relataly.com/wp-content/uploads/2022/05/arima-plot-seaborn-beer-sales-forecasting-python-test-predictions-1.png 1024w, https://www.relataly.com/wp-content/uploads/2022/05/arima-plot-seaborn-beer-sales-forecasting-python-test-predictions-1.png 300w, https://www.relataly.com/wp-content/uploads/2022/05/arima-plot-seaborn-beer-sales-forecasting-python-test-predictions-1.png 768w, https://www.relataly.com/wp-content/uploads/2022/05/arima-plot-seaborn-beer-sales-forecasting-python-test-predictions-1.png 1536w, 
https://www.relataly.com/wp-content/uploads/2022/05/arima-plot-seaborn-beer-sales-forecasting-python-test-predictions-1.png 1566w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<p>As shown above, our model&#8217;s forecast continues the seasonal pattern of the beer sales time series. On the one hand, this indicates that US beer sales will continue to rise and, on the other hand, that our model works just fine 🙂</p>



<h3 class="wp-block-heading" id="h-step-7-measure-the-performance-of-the-sales-forecasting-model">Step #7 Measure the Performance of the Sales Forecasting Model</h3>



<p>In this section, we will measure the performance of our ARIMA model. To learn more about this topic, check out <a href="https://www.relataly.com/regression-error-metrics-python/923/" target="_blank" rel="noreferrer noopener">this relataly article measuring regression performance</a>.</p>



<p>The previous section&#8217;s simulation chart shows a few outliers among the prediction errors. Therefore, we focus our analysis on the percentage errors. Two helpful metrics are the mean absolute percentage error (MAPE) and the median absolute percentage error (MDAPE).</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Mean Absolute Percentage Error (MAPE)
MAPE = np.mean((np.abs(np.subtract(df_test['y_test'], df_test['y_test_pred'])/ df_test['y_test']))) * 100
print(f'Mean Absolute Percentage Error (MAPE): {np.round(MAPE, 2)} %')

# Median Absolute Percentage Error (MDAPE)
MDAPE = np.median((np.abs(np.subtract(df_test['y_test'], df_test['y_test_pred'])/ df_test['y_test'])) ) * 100
print(f'Median Absolute Percentage Error (MDAPE): {np.round(MDAPE, 2)} %')</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">Mean Absolute Percentage Error (MAPE): 3.94 %  Median Absolute Percentage Error (MDAPE): 3.49 %</pre></div>



<p>The percent errors show that our ARIMA model achieves a decent predictive performance.</p>



<h2 class="wp-block-heading" id="h-summary">Summary</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>This Python tutorial has shown how to use SARIMA (seasonal ARIMA) for sales forecasting. Sales forecasting is important for businesses because it helps them make informed decisions about production, inventory management, and staffing, among other things. By accurately forecasting sales, businesses can ensure that they have the right amount of product available to meet customer demand, avoid overproduction and excess inventory, and plan for future growth. The use case presented was forecasting beer sales, and we used ARIMA to analyze seasonal sales data. </p>



<p>In the first part, we learned how ARIMA works, what stationarity is, and how to check whether a time series is stationary. In the second part, we developed an ARIMA model in Python to create a forecast for US beer sales. For this purpose, we created an in-sample forecast and used Auto-ARIMA to find the optimal parameters for our sales forecasting model.</p>



<p>If you have any questions or suggestions, please let me know in the comments, and I will do my best to answer. </p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"><div class="wp-block-image">
<figure class="aligncenter size-full"><img decoding="async" width="506" height="496" data-attachment-id="12603" data-permalink="https://www.relataly.com/brewery-arima-beer-sales-forecasting-python-tutorial-machine-learning-relataly2-min/" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/03/brewery-arima-beer-sales-forecasting-python-tutorial-machine-learning-relataly2-min.png" data-orig-size="506,496" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="Now that we have learned to use ARIMA to forecast beer sales, you really deserved yourself a beer. Cheers! Image created with Midjourney" data-image-description="&lt;p&gt;Now that we have learned to use ARIMA to forecast beer sales, you really deserved yourself a beer. Cheers! Image created with Midjourney&lt;/p&gt;
" data-image-caption="&lt;p&gt;Now that we have learned to use ARIMA to forecast beer sales, you really deserved yourself a beer. Cheers! Image created with Midjourney&lt;/p&gt;
" data-large-file="https://www.relataly.com/wp-content/uploads/2023/03/brewery-arima-beer-sales-forecasting-python-tutorial-machine-learning-relataly2-min.png" src="https://www.relataly.com/wp-content/uploads/2023/03/brewery-arima-beer-sales-forecasting-python-tutorial-machine-learning-relataly2-min.png" alt="Now that we have learned to use ARIMA to forecast beer sales, you really deserved yourself a beer. Cheers! Image created with Midjourney" class="wp-image-12603" srcset="https://www.relataly.com/wp-content/uploads/2023/03/brewery-arima-beer-sales-forecasting-python-tutorial-machine-learning-relataly2-min.png 506w, https://www.relataly.com/wp-content/uploads/2023/03/brewery-arima-beer-sales-forecasting-python-tutorial-machine-learning-relataly2-min.png 300w" sizes="(max-width: 506px) 100vw, 506px" /><figcaption class="wp-element-caption">Now that you have learned to use ARIMA to forecast beer sales, you really earned yourself a beer. Cheers! Image created with <a href="http://www.midjourney.com" target="_blank" rel="noreferrer noopener">Midjourney</a></figcaption></figure>
</div></div>
</div>



<p></p>



<h2 class="wp-block-heading" id="h-sources-and-further-reading">Sources and Further Reading</h2>



<ol class="wp-block-list"><li><a href="https://amzn.to/3MyU6Tj" target="_blank" rel="noreferrer noopener">Charu C. Aggarwal (2018) Neural Networks and Deep Learning</a></li><li><a href="https://amzn.to/3yIQdWi" target="_blank" rel="noreferrer noopener">Jansen (2020) Machine Learning for Algorithmic Trading: Predictive models to extract signals from market and alternative data for systematic trading strategies with Python</a></li><li><a href="https://amzn.to/3S9Nfkl" target="_blank" rel="noreferrer noopener">Aurélien Géron (2019) Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems </a></li><li><a href="https://amzn.to/3EKidwE" target="_blank" rel="noreferrer noopener">David Forsyth (2019) Applied Machine Learning Springer</a></li><li><a href="https://amzn.to/3MAy8j5" target="_blank" rel="noreferrer noopener">Andriy Burkov (2020) Machine Learning Engineering</a></li></ol>



<p class="has-contrast-2-color has-base-3-background-color has-text-color has-background"><em>The links above to Amazon are affiliate links. By buying through these links, you support the Relataly.com blog and help to cover the hosting costs. Using the links does not affect the price.</em></p>



<p>Want to learn more about time series analysis and prediction?<br>Check out these recent relataly tutorials:</p>



<ul class="wp-block-list">
<li><a href="https://www.relataly.com/stock-market-prediction-using-a-recurrent-neural-network/122/" target="_blank" rel="noreferrer noopener">Stock Market Prediction &#8211; Building a&nbsp;Univariate Model using Keras Recurrent Neural Networks in Python</a>.</li>



<li><a href="https://www.relataly.com/stock-market-prediction-with-multivariate-time-series-in-python/1815/" target="_blank" rel="noreferrer noopener">Building Multivariate Time Series Models for Stock Market Prediction with Python</a></li>



<li><a href="https://www.relataly.com/multi-step-time-series-forecasting-a-step-by-step-guide/275/" target="_blank" rel="noreferrer noopener">Time Series Forecasting &#8211; Creating a Multi-Step Forecast in Python</a></li>



<li><a href="https://www.relataly.com/measuring-prediction-errors-in-time-series-forecasting/809/" target="_blank" rel="noreferrer noopener">Python Cheat Sheet: Measuring Prediction Errors in Time Series Forecasting</a></li>



<li><a href="https://www.relataly.com/evaluating-time-series-forecasting-models/923/" target="_blank" rel="noreferrer noopener">Evaluate Time Series Forecasting Models with Python</a></li>
</ul>
<p>The post <a href="https://www.relataly.com/forecasting-beer-sales-with-arima-in-python/2884/">Forecasting Beer Sales with ARIMA in Python</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.relataly.com/forecasting-beer-sales-with-arima-in-python/2884/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">2884</post-id>	</item>
		<item>
		<title>Image Classification with Convolutional Neural Networks &#8211; Classifying Cats and Dogs in Python</title>
		<link>https://www.relataly.com/image-classification-with-deep-learning/2485/</link>
					<comments>https://www.relataly.com/image-classification-with-deep-learning/2485/#respond</comments>
		
		<dc:creator><![CDATA[Florian Follonier]]></dc:creator>
		<pubDate>Sun, 13 Dec 2020 14:09:31 +0000</pubDate>
				<category><![CDATA[Classification (two-class)]]></category>
		<category><![CDATA[Convolutional Neural Network (CNN)]]></category>
		<category><![CDATA[Data Sources]]></category>
		<category><![CDATA[Image Recognition]]></category>
		<category><![CDATA[Keras]]></category>
		<category><![CDATA[Neural Networks]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Tensorflow]]></category>
		<category><![CDATA[Use Cases]]></category>
		<category><![CDATA[Beginner Tutorials]]></category>
		<category><![CDATA[Computer Vision]]></category>
		<category><![CDATA[Deep Learning]]></category>
		<category><![CDATA[Image Dataset]]></category>
		<category><![CDATA[Supervised Learning]]></category>
		<category><![CDATA[Two-Label Classification]]></category>
		<guid isPermaLink="false">https://www.relataly.com/?p=2485</guid>

					<description><![CDATA[<p>This tutorial shows how to use Convolutional Neural Networks (CNNs) with Python for image classification. CNNs belong to the field of deep learning, a subarea of machine learning, and have become a cornerstone to many exciting innovations. There are endless applications, from self-driving cars over biometric security to automated tagging in social media. And the ... <a title="Image Classification with Convolutional Neural Networks &#8211; Classifying Cats and Dogs in Python" class="read-more" href="https://www.relataly.com/image-classification-with-deep-learning/2485/" aria-label="Read more about Image Classification with Convolutional Neural Networks &#8211; Classifying Cats and Dogs in Python">Read more</a></p>
<p>The post <a href="https://www.relataly.com/image-classification-with-deep-learning/2485/">Image Classification with Convolutional Neural Networks &#8211; Classifying Cats and Dogs in Python</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>This tutorial shows how to use Convolutional Neural Networks (CNNs) with Python for image classification. CNNs belong to the field of deep learning, a subarea of machine learning, and have become a cornerstone to many exciting innovations. There are endless applications, from self-driving cars over biometric security to automated tagging in social media. And the importance of CNNs grows steadily! So there are plenty of reasons to understand how this technology works and how we can implement it. </p>



<p>This article proceeds as follows: The first part introduces the core concepts behind CNNs and explains their use in image classification. The second part is a hands-on tutorial in which you will build your own CNN to distinguish images of cats and dogs. This tutorial develops a model that achieves around 82% validation accuracy. We will work with TensorFlow and Python to integrate different layers, such as Convolution Layers, Dense layers, and MaxPooling. Furthermore, we will prevent the network from overfitting the training data by using Dropout between the layers. We will also load the model and make predictions on a fresh set of images. Finally, we analyze and illustrate the performance of our image classifier. </p>



<p>Also: <a href="https://www.relataly.com/automated-prompt-generation-for-dall-e-using-chatgpt-in-python-a-step-by-step-api-tutorial/12143/" target="_blank" rel="noreferrer noopener">Generating Detailed Images with OpenAI DALL-E and ChatGPT in Python: A Step-By-Step API Tutorial</a></p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<h2 class="wp-block-heading" id="h-image-classification-with-convolutional-neural-networks">Image Classification with Convolutional Neural Networks</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>The history of image recognition dates back to the mid-1960s when the first attempts were made to identify objects by coding their characteristic shapes and lines. However, this task turned out to be incredibly complex. Our human brain is trained so well to recognize things that one can easily forget how diverse the observation conditions can be. Here are some examples:</p>



<ul class="wp-block-list">
<li>Photos can be taken from various viewpoints</li>



<li>Living things can have multiple forms and poses</li>



<li>Objects come in different forms, colors, and sizes</li>



<li>Objects may be partially hidden or cut off in the picture</li>



<li>Lighting conditions vary from image to image</li>



<li>There may be one or multiple objects in the same image</li>
</ul>



<p>At the beginning of the 1990s, the focus of research shifted to statistical approaches and learning algorithms.</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-large"><img decoding="async" width="512" height="512" data-attachment-id="13345" data-permalink="https://www.relataly.com/image-classification-with-deep-learning/2485/machine_learning_computer_vision_dazzling_magic_neural_network-min/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/03/machine_learning_computer_vision_dazzling_magic_neural_network-min.png" data-orig-size="1024,1024" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="machine_learning_computer_vision_dazzling_magic_neural_network-min" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/03/machine_learning_computer_vision_dazzling_magic_neural_network-min.png" src="https://www.relataly.com/wp-content/uploads/2023/03/machine_learning_computer_vision_dazzling_magic_neural_network-min-512x512.png" alt="The idea of computer vision is inspired by the fact that the visual cortex has cells activated by specific shapes and their orientation in the visual field. 
" class="wp-image-13345" srcset="https://www.relataly.com/wp-content/uploads/2023/03/machine_learning_computer_vision_dazzling_magic_neural_network-min.png 512w, https://www.relataly.com/wp-content/uploads/2023/03/machine_learning_computer_vision_dazzling_magic_neural_network-min.png 300w, https://www.relataly.com/wp-content/uploads/2023/03/machine_learning_computer_vision_dazzling_magic_neural_network-min.png 140w, https://www.relataly.com/wp-content/uploads/2023/03/machine_learning_computer_vision_dazzling_magic_neural_network-min.png 768w, https://www.relataly.com/wp-content/uploads/2023/03/machine_learning_computer_vision_dazzling_magic_neural_network-min.png 1024w" sizes="(max-width: 512px) 100vw, 512px" /><figcaption class="wp-element-caption">The idea of computer vision is inspired by the fact that the visual cortex has cells activated by specific shapes and their orientation in the visual field. </figcaption></figure>
</div>
</div>



<p></p>



<h3 class="wp-block-heading" id="h-the-emergence-of-cnns">The Emergence of CNNs</h3>



<p>The basic concept of a neural network in computer vision has existed since the 1980s. It goes back to research from Hubel and Wiesel on the emergence of a cat&#8217;s visual system. They found that the visual cortex has cells activated by specific shapes and their orientation in the visual field. Some of their findings inspired the development of crucial computer vision techniques, such as hierarchical features with different levels of abstraction [1, 2]. However, it took another three decades of research and the availability of faster computers before the emergence of modern CNNs.</p>



<p>The year 2012 was a defining moment for the use of CNNs in image recognition. That year, a CNN won the <a href="http://www.image-net.org/challenges/LSVRC/" target="_blank" rel="noreferrer noopener">ILSVRC</a> competition for computer vision for the first time. The challenge was classifying more than a hundred thousand images into 1000 object categories. With an error rate of only 15.3%, the winning model was a CNN called &#8220;AlexNet.&#8221;</p>



<p>AlexNet was the first model to achieve more than 75% accuracy in this challenge. In the following years, CNNs succeeded in several other competitions. In 2015, for example, the CNN ResNet exceeded human performance in the ILSVRC competition. Only a decade earlier, this achievement was considered almost impossible. So how was this performance increase possible? To understand this surge in performance, let us first look at what a picture is.</p>



<figure class="wp-block-image size-large is-resized"><img decoding="async" data-attachment-id="2653" data-permalink="https://www.relataly.com/image-classification-with-deep-learning/2485/image-15-5/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2020/12/image-15.png" data-orig-size="1081,506" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-15" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2020/12/image-15.png" src="https://www.relataly.com/wp-content/uploads/2020/12/image-15-1024x479.png" alt="" class="wp-image-2653" width="848" height="395" srcset="https://www.relataly.com/wp-content/uploads/2020/12/image-15.png 300w, https://www.relataly.com/wp-content/uploads/2020/12/image-15.png 768w" sizes="(max-width: 848px) 100vw, 848px" /><figcaption class="wp-element-caption">Top-performing models in the ImageNet image classification challenge (Alyafeai &amp; Ghouti, 2019)</figcaption></figure>



<h3 class="wp-block-heading" id="h-what-is-an-image">What is an Image?</h3>



<p>A digital image is a three-dimensional array of integer values. One dimension of this array represents the pixel width, and one dimension represents the height of the picture. The third dimension contains the color depth, defined by the image format. As shown below, we can thus represent the format of a digital image as &#8220;width x height x depth.&#8221; Next, let&#8217;s have a quick look at different image formats.</p>
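<p>To make this concrete, here is a minimal NumPy sketch (the pixel dimensions are made up for illustration) that builds such a &#8220;width x height x depth&#8221; array:</p>

```python
import numpy as np

# A synthetic 8-bit image in the "width x height x depth" format
# described above: 640 x 480 pixels with 3 color channels
image = np.random.randint(0, 256, size=(640, 480, 3), dtype=np.uint8)

print(image.shape)  # (640, 480, 3)
print(image.dtype)  # uint8
```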



<figure class="wp-block-image size-large is-resized"><img decoding="async" data-attachment-id="2649" data-permalink="https://www.relataly.com/image-classification-with-deep-learning/2485/image-11-6/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2020/12/image-11.png" data-orig-size="1152,437" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-11" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2020/12/image-11.png" src="https://www.relataly.com/wp-content/uploads/2020/12/image-11-1024x388.png" alt="an image is a multidimensional integer array" class="wp-image-2649" width="861" height="326" srcset="https://www.relataly.com/wp-content/uploads/2020/12/image-11.png 1024w, https://www.relataly.com/wp-content/uploads/2020/12/image-11.png 300w, https://www.relataly.com/wp-content/uploads/2020/12/image-11.png 768w, https://www.relataly.com/wp-content/uploads/2020/12/image-11.png 1152w" sizes="(max-width: 861px) 100vw, 861px" /><figcaption class="wp-element-caption">A digital image is a multidimensional integer array.</figcaption></figure>



<h3 class="wp-block-heading" id="h-overview-of-different-image-formats">Overview of Different Image Formats</h3>



<p>We can train CNNs with different image formats, but the input data are always multidimensional arrays of integer values. One of the most commonly used color formats in deep learning is &#8220;RGB,&#8221; which stands for the three color channels: &#8220;Red,&#8221; &#8220;Green,&#8221; and &#8220;Blue.&#8221; RGB images are divided into three layers of integer values, one layer for each color channel. In a standard 8-bit-per-channel RGB image, the values in each layer range from 0 to 255, so the three channels together can reproduce about 16.7 million different colors. </p>



<p>In contrast to RGB images, grey-scale images have only a single color layer, which represents the brightness of each pixel. Consequently, the format of a grey-scale image is width x height x 1. Using grey-scale or black-and-white images instead of RGB images can speed up the training process because less data needs to be processed. However, image data with multiple color channels provide the model with more information, which can lead to better predictions. The RGB format is therefore often a good compromise between prediction quality and performance. Next, let&#8217;s look at how CNNs handle digital images in the learning process.</p>
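<p>As a small illustration of the difference between the two formats, the following sketch collapses the three RGB channels into a single brightness layer (the luminance weights are a common convention, assumed here for illustration):</p>

```python
import numpy as np

# Synthetic RGB image: width x height x 3
rgb = np.random.randint(0, 256, size=(100, 100, 3), dtype=np.uint8)

# A weighted sum of the three channels yields a single brightness layer
grey = (0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]).astype(np.uint8)
grey = grey[..., np.newaxis]  # keep the width x height x 1 format

print(rgb.shape)   # (100, 100, 3)
print(grey.shape)  # (100, 100, 1)
```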



<h3 class="wp-block-heading" id="h-convolutional-neural-networks">Convolutional Neural Networks</h3>



<p>As mentioned before, a CNN is a specific form of an artificial neural network. The main difference between the CNN and the standard multi-layer perceptron is their convolutional layers. CNNs can have other layers, but the convolutions make a CNN so good at detecting objects. They allow the network to identify patterns based on features that work regardless of where in the image they occur. Let&#8217;s see how this works in more detail.</p>



<h4 class="wp-block-heading" id="h-convolutional-layers">Convolutional Layers</h4>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<p>Convolutional layers use a rasterizing technique that breaks down an image into smaller groups of pixels called filters. Filters act as feature detectors from the original image. The primary purpose is to extract meaningful features from the input images.</p>



<p>During the training, the CNN slides the filter over image locations and calculates the dot product for each feature at a time. The results of these calculations are stored in a so-called feature map (sometimes called an activation map). A feature map represents where in the image a particular feature was identified. Subsequently, the values from the feature map are transformed with an activation function (usually ReLu), and the algorithm uses them as input to the next layer.</p>
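<p>The sliding dot product described above can be sketched in a few lines of NumPy (the toy image and filter values are assumptions for illustration, not taken from the tutorial model):</p>

```python
import numpy as np

# Toy 5x5 single-channel image (bright on the left, dark on the right)
# and a 3x3 vertical-edge filter
image = np.array([[1, 1, 0, 0, 0]] * 5, dtype=float)
kernel = np.array([
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
], dtype=float)

# Slide the filter over the image (stride 1, no padding) and store
# each dot product in the feature map
h, w = image.shape
fh, fw = kernel.shape
feature_map = np.zeros((h - fh + 1, w - fw + 1))
for i in range(feature_map.shape[0]):
    for j in range(feature_map.shape[1]):
        feature_map[i, j] = np.sum(image[i:i + fh, j:j + fw] * kernel)

# Apply the ReLU activation, as described above
feature_map = np.maximum(feature_map, 0)
print(feature_map.shape)  # (3, 3) -- the filter responds near the edge
```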


<div class="wp-block-image">
<figure class="alignleft size-large is-resized"><img decoding="async" data-attachment-id="6596" data-permalink="https://www.relataly.com/image-classification-with-deep-learning/2485/image-1/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/04/image-1.png" data-orig-size="1237,502" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-1" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/04/image-1.png" src="https://www.relataly.com/wp-content/uploads/2022/04/image-1-1024x416.png" alt="Illustration of operations in the convolutional layers" class="wp-image-6596" width="811" height="332"/><figcaption class="wp-element-caption">Illustration of operations in the convolutional layers</figcaption></figure>
</div></div>
</div>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<p>Features become more complex with the increasing depth of the network. In the first layer of the network, convolutions will detect generic geometric forms and low-level features based on edges, corners, squares, or circles. The subsequent layers of the network will look at more sophisticated shapes and may, for example, include features that resemble the form of an eye of a cat or the nose of a dog. In this way, convolutions provide the network with features at different levels of detail that enable powerful detection patterns.</p>


<div class="wp-block-image">
<figure class="alignleft size-large is-resized"><img decoding="async" data-attachment-id="2661" data-permalink="https://www.relataly.com/image-classification-with-deep-learning/2485/image-16-4/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2020/12/image-16.png" data-orig-size="1908,819" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-16" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2020/12/image-16.png" src="https://www.relataly.com/wp-content/uploads/2020/12/image-16-1024x440.png" alt="Convolutions at the example of an image that contains the number &quot;3&quot;" class="wp-image-2661" width="874" height="374" srcset="https://www.relataly.com/wp-content/uploads/2020/12/image-16.png 300w, https://www.relataly.com/wp-content/uploads/2020/12/image-16.png 768w, https://www.relataly.com/wp-content/uploads/2020/12/image-16.png 1536w, https://www.relataly.com/wp-content/uploads/2020/12/image-16.png 1908w" sizes="(max-width: 874px) 100vw, 874px" /><figcaption class="wp-element-caption">Exemplary convolutions of an image that contains the number &#8220;3.&#8221;</figcaption></figure>
</div></div>
</div>



<h4 class="wp-block-heading" id="h-pooling-downsampling">Pooling / Downsampling</h4>



<p>A convolutional layer is usually followed by a pooling operation, which reduces the amount of data by filtering out unnecessary information. This process is also called downsampling or subsampling. There are various forms of pooling. In the most common variant &#8211; max-pooling &#8211; only the highest value in a predefined grid (e.g., 2&#215;2) is processed, and the remaining values are discarded. For example, imagine a 2&#215;2 grid with values 0.1, 0.5, 0.4, and 0.8. The algorithm would only pass the 0.8 on for this grid and use it as part of the input to the next layer. The advantages of pooling are reduced data and faster training times. Because pooling minimizes the complexity of the network, it allows for the construction of deeper architectures with more layers. In addition, pooling offers a certain protection against overfitting during training.</p>
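<p>The 2&#215;2 max-pooling step from the example above can be sketched with NumPy (the 4&#215;4 feature map values are made up for illustration):</p>

```python
import numpy as np

# The 2x2 grid from the example above: max-pooling keeps only 0.8
grid = np.array([[0.1, 0.5],
                 [0.4, 0.8]])
print(grid.max())  # 0.8

# Max-pooling a 4x4 feature map with a 2x2 window and stride 2:
# reshape into 2x2 blocks and take the maximum of each block
feature_map = np.arange(16, dtype=float).reshape(4, 4)
pooled = feature_map.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[ 5.  7.]
#  [13. 15.]]
```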



<h4 class="wp-block-heading" id="h-dropout">Dropout</h4>



<p>Dropout is another technique that helps prevent the network from overfitting the training data. When we activate Dropout for a layer, the algorithm will remove a random number of neurons from the layer per training step. As a result, the network needs to learn patterns that give less weight to individual layers and thus generalize better. The dropout rate controls the percentage of switched-off neurons in each training iteration. We can configure Dropout for each layer separately. </p>



<p>CNNs with many layers and training epochs tend to overfit the training data. Here, Dropout is especially crucial to avoid overfitting and to achieve good prediction results on data that the network has not seen yet. A typical value for the dropout rate lies between 10% and 30%.</p>



<h4 class="wp-block-heading" id="h-multi-layer-perceptron-mlp">Multi-Layer Perceptron (MLP)</h4>



<p>The CNN architecture ends with multiple fully connected dense layers. These layers form a multi-layer perceptron (MLP), which has the task of condensing the results from the previous convolutions and outputting one of the target classes. Consequently, the number of neurons in the final dense layer usually corresponds to the number of classes to be predicted. For two-class prediction problems, it is also possible to use a single neuron in the final layer, which outputs a value that is interpreted as a binary label of 0 or 1.</p>
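<p>Putting these building blocks together, a minimal CNN for a two-class problem might look like the following Keras sketch. The layer sizes and input shape here are illustrative assumptions, not the exact architecture developed later in the tutorial:</p>

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(128, 128, 3)),  # assumed input size: 128x128 RGB images
    # Convolution + pooling blocks extract features at growing abstraction levels
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    # The fully connected head condenses the features into a class prediction
    layers.Flatten(),
    layers.Dropout(0.2),  # switch off 20% of neurons per training step
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # single neuron for the two-class case
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```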



<h2 class="wp-block-heading" id="h-building-a-cnn-with-tensorflow-that-classifies-cats-and-dogs">Building a CNN with Tensorflow that Classifies Cats and Dogs</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>Now that you are familiar with the basic concepts behind convolutional neural networks, we can commence with the practical part and build an image classifier. In the following, we will train a CNN to distinguish images of cats and dogs. We first define a CNN model and then feed it a few thousand photos from a public dataset with labeled images of cats and dogs.</p>



<p>Distinguishing cats and dogs may not sound difficult, but many challenges exist. Imagine the almost infinite circumstances in which animals can be photographed, not to mention the many forms a cat can take. These variations lead to the fact that even humans sometimes confuse a cat with a dog or vice versa. So don&#8217;t expect our model to be perfect right from the start. Our model will score around 82% accuracy on the validation dataset.</p>



<p>The code is available on the GitHub repository.</p>



<div class="wp-block-kadence-advancedbtn kb-buttons-wrap kb-btns_d70aa4-6e"><a class="kb-button kt-button button kb-btn_b926ba-d4 kt-btn-size-standard kt-btn-width-type-full kb-btn-global-inherit  kt-btn-has-text-true kt-btn-has-svg-true  wp-block-button__link wp-block-kadence-singlebtn" href="https://github.com/flo7up/relataly-public-python-tutorials/blob/master/06%20Computer%20Vision/200%20Classifying%20Cats%20%26%20Dogs%20Binary.ipynb" target="_blank" rel="noreferrer noopener"><span class="kb-svg-icon-wrap kb-svg-icon-fe_eye kt-btn-icon-side-left"><svg viewBox="0 0 24 24"  fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"  aria-hidden="true"><path d="M1 12s4-8 11-8 11 8 11 8-4 8-11 8-11-8-11-8z"/><circle cx="12" cy="12" r="3"/></svg></span><span class="kt-btn-inner-text">View on GitHub </span></a>

<a class="kb-button kt-button button kb-btn_80b142-4f kt-btn-size-standard kt-btn-width-type-full kb-btn-global-inherit  kt-btn-has-text-true kt-btn-has-svg-true  wp-block-button__link wp-block-kadence-singlebtn" href="https://github.com/flo7up/relataly-public-python-API-tutorials" target="_blank" rel="noreferrer noopener"><span class="kb-svg-icon-wrap kb-svg-icon-fa_github kt-btn-icon-side-left"><svg viewBox="0 0 496 512"  fill="currentColor" xmlns="http://www.w3.org/2000/svg"  aria-hidden="true"><path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"/></svg></span><span class="kt-btn-inner-text">Relataly GitHub Repo </span></a></div>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image is-resized"><img decoding="async" src="https://www.relataly.com/wp-content/uploads/2022/04/Image-Recognition-Convolutional-Neural-Networks.png" alt="Image Recognition Convolutional Neural Networks - classifying cats and dogs python " width="382" height="125"/><figcaption class="wp-element-caption">Cat or Dog? That&#8217;s what our CNN will predict.</figcaption></figure>
</div>
</div>



<div style="height:29px" aria-hidden="true" class="wp-block-spacer"></div>



<h3 class="wp-block-heading" id="h-prerequisites">Prerequisites</h3>



<p>Before starting the coding part, make sure that you have set up your <a href="https://www.python.org/downloads/" target="_blank" rel="noreferrer noopener">Python 3</a> environment and required packages. If you don&#8217;t have an environment, you can follow&nbsp;<a href="https://www.relataly.com/anaconda-python-environment-machine-learning/1663/" target="_blank" rel="noreferrer noopener">this tutorial</a>&nbsp;to set up the&nbsp;<a href="https://www.anaconda.com/products/individual" target="_blank" rel="noreferrer noopener">Anaconda environment</a>.</p>



<p>Also, make sure you install all required packages. In this tutorial, we will be working with the following standard packages:&nbsp;</p>



<ul class="wp-block-list">
<li><em><a href="https://pandas.pydata.org/" target="_blank" rel="noreferrer noopener">pandas</a></em></li>



<li><em><a href="https://numpy.org/" target="_blank" rel="noreferrer noopener">NumPy</a></em></li>



<li><a href="https://docs.python.org/3/library/math.html" target="_blank" rel="noreferrer noopener">math</a></li>



<li><em><a href="https://matplotlib.org/" target="_blank" rel="noreferrer noopener">matplotlib</a></em></li>
</ul>



<p>In addition, we will be using <em><a href="https://keras.io/" target="_blank" rel="noreferrer noopener">Keras&nbsp;</a></em>(2.0 or higher) with <a href="https://www.tensorflow.org/" target="_blank" rel="noreferrer noopener"><em>Tensorflow</em> </a>backend and the machine learning library <a href="https://scikit-learn.org/stable/" target="_blank" rel="noreferrer noopener">Scikit-learn</a>.</p>



<p>You can install packages using console commands:</p>



<ul class="wp-block-list">
<li><em>pip install &lt;package name&gt;</em></li>



<li><em>conda install &lt;package name&gt;</em>&nbsp;(if you are using the anaconda packet manager)</li>
</ul>



<h4 class="wp-block-heading" id="h-download-the-dataset">Download the Dataset</h4>



<p>We will train our image classification model with a public dataset from <a href="http://www.kaggle.com" target="_blank" rel="noreferrer noopener">Kaggle.com</a>. The dataset contains more than 25,000 JPG pictures of cats and dogs. The images are uniformly named and numbered, for example, dog.1.jpg, dog.2.jpg, dog.3.jpg, cat.1.jpg, cat.2.jpg, and so on. You can download the picture set directly from Kaggle: <a href="https://www.kaggle.com/c/dogs-vs-cats/overview" target="_blank" rel="noreferrer noopener">cats-vs-dogs</a>. </p>



<h4 class="wp-block-heading" id="h-setup-the-folder-structure">Set Up the Folder Structure</h4>



<p>There are different ways the data can be structured and loaded during model training. One approach (1) is to split the images by class and create a separate folder for each class (class_a, class_b, etc.). Another approach (2) is to put all images into a single folder and define a DataFrame that splits the data into train and test. Because the filenames in the cats-and-dogs dataset already contain the class, I decided to go for the second approach. </p>



<p>Before we begin with the coding part, we create a folder structure that looks as follows:</p>



<figure class="wp-block-image size-large is-resized"><img decoding="async" data-attachment-id="2676" data-permalink="https://www.relataly.com/image-classification-with-deep-learning/2485/image-17-4/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2020/12/image-17.png" data-orig-size="532,286" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-17" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2020/12/image-17.png" src="https://www.relataly.com/wp-content/uploads/2020/12/image-17.png" alt="structure of the data that we will use to train the convolutional neural network" class="wp-image-2676" width="409" height="220" srcset="https://www.relataly.com/wp-content/uploads/2020/12/image-17.png 532w, https://www.relataly.com/wp-content/uploads/2020/12/image-17.png 300w" sizes="(max-width: 409px) 100vw, 409px" /><figcaption class="wp-element-caption">The folder structure of our cats and dogs prediction project</figcaption></figure>



<p>If you want to use the standard paths given in this Python tutorial, make sure that your notebook resides in the parent folder of the &#8220;data&#8221; folder.</p>



<p>After you have created the folder structure, open the cats-vs-dogs zip file. The ZIP file contains the folders &#8220;train,&#8221; &#8220;test,&#8221; and &#8220;sample.&#8221; Unzip the JPG files from the &#8220;train&#8221; folder (20,000 images) and the &#8220;test&#8221; folder (5,000 images) to the &#8220;train&#8221; folder of your project. Afterward, the train folder should contain 25,000 images. The sample folder is intended to include your own sample images, for example, of your pet. We will later use the images from the sample folder to test the model on new real-world data. </p>
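<p>If you prefer to do the unzipping from Python rather than by hand, here is a sketch using the standard-library zipfile module (the paths in the usage example are assumptions matching the folder structure above):</p>

```python
import os
import zipfile

def extract_to_train(zip_path, train_dir):
    """Extract all JPG files from a ZIP archive flat into train_dir."""
    os.makedirs(train_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            if member.lower().endswith(".jpg"):
                # drop any internal folder structure, keep only the file name
                target = os.path.join(train_dir, os.path.basename(member))
                with archive.open(member) as src, open(target, "wb") as dst:
                    dst.write(src.read())
    return len(os.listdir(train_dir))

# usage (paths are assumptions):
# n = extract_to_train("dogs-vs-cats.zip", "data/images/cats-and-dogs/train/")
# print(n)
```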



<p>We have fulfilled all requirements and can start with the coding part.</p>



<h3 class="wp-block-heading" id="h-step-1-make-imports-and-check-training-device">Step #1 Make Imports and Check Training Device</h3>



<p>We begin by setting up the imports for this project. I have put the package imports at the beginning to give you a quick overview of the packages you need to install.</p>



<p>Using the GPU instead of the CPU allows for faster training times. However, setting up Tensorflow to work with GPUs can cause problems. Not everyone has a GPU; in this case, TensorFlow will usually run all code on the CPU automatically. However, should you for any reason prefer to force CPU training, uncomment the line that sets os.environ[&#8220;CUDA_VISIBLE_DEVICES&#8221;] to &#8220;-1&#8221;. As a result, Tensorflow will ignore all available GPUs and run all code on the CPU. </p>
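<p>A minimal sketch of forcing CPU-only execution. One subtlety worth highlighting: the environment variable must be set before TensorFlow is first imported in the process, because TensorFlow reads it at import time.</p>

```python
import os

# Force CPU-only execution: hide all GPUs from TensorFlow.
# This must run BEFORE the first `import tensorflow` in the process,
# because TensorFlow reads CUDA_VISIBLE_DEVICES at import time.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

# import tensorflow as tf   # imported afterwards, TensorFlow sees no GPUs
```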



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}">import os
#os.environ[&quot;CUDA_VISIBLE_DEVICES&quot;]=&quot;-1&quot; 

import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Convolution2D, MaxPooling2D, ZeroPadding2D
from tensorflow.keras.layers import Conv2D, Activation, Dropout, Flatten, Dense, BatchNormalization
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.metrics import Accuracy
from tensorflow.keras import regularizers
from tensorflow.keras.optimizers import SGD, Adam
from tensorflow.python.client import device_lib
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, accuracy_score

# optionally let TensorFlow allocate GPU memory on demand (TF2 API)
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)

from random import randint
import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
from PIL import Image
import random as rdn</pre></div>



<p>Running the command below checks the TensorFlow version and the number of available GPUs in our system. </p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># check the tensorflow version
print('Tensorflow Version: ' + tf.__version__)

# check the number of available GPUs
physical_devices = tf.config.list_physical_devices('GPU')
print(&quot;Num GPUs:&quot;, len(physical_devices))</pre></div>



<pre class="wp-block-preformatted">Tensorflow Version: 2.4.0-rc3
Num GPUs: 1</pre>



<p>My GPU is an RTX 3080. When I wrote this article, this GPU was not yet supported by the standard TensorFlow release, so I used the pre-release version of TensorFlow (2.4.0-rc3). I expect the upcoming standard release (2.4) to work fine as well. </p>



<p>In my case, the GPU check returns 1 because I have a single GPU in my computer. If TensorFlow doesn&#8217;t recognize any GPU, this command will return 0, and Tensorflow will run on the CPU instead.</p>



<h3 class="wp-block-heading" id="h-step-2-define-the-prediction-classes">Step #2 Define the Prediction Classes</h3>



<p>Next, we will define the path to the folder that contains our train and validation images. In addition, we will define a DataFrame &#8220;image_df&#8221; that lists all the pictures from the &#8220;train&#8221; folder. With the help of this DataFrame, we can later split the data simply by defining which images from the train folder belong to the training dataset and which belong to the test dataset. Important note: the DataFrame &#8220;image_df&#8221; only contains the names of the images and their classes, not the photos themselves.</p>



<p>It&#8217;s good practice to check the distribution of classes in the training dataset. For this purpose, we create a bar plot that illustrates the number of images in each class. And yes, I admit, I chose some custom colors to make it look fancy.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># set the directory for train and validation images
train_path = 'data/images/cats-and-dogs/train/'
#test_path = 'data/cats-and-dogs/test/'

# function to create a list of image labels 
def createImageDf(path):
    filenames = os.listdir(path)
    categories = []

    for fname in filenames:
        category = fname.split('.')[0]
        if category == 'dog':
            categories.append(1)
        else:
            categories.append(0)
    df = pd.DataFrame({
        'filename':filenames,
        'category':categories
    })
    return df

# display the header of the train_df dataset
image_df = createImageDf(train_path)
image_df.head(5)

sns.countplot(y='category', data=image_df, palette=['#2FE5C7',&quot;#2F8AE5&quot;], orient=&quot;h&quot;)</pre></div>



<p></p>



<figure class="wp-block-image size-full"><img decoding="async" width="376" height="262" data-attachment-id="11572" data-permalink="https://www.relataly.com/image-classification-with-deep-learning/2485/image-9-2/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/12/image-9.png" data-orig-size="376,262" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-9" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/12/image-9.png" src="https://www.relataly.com/wp-content/uploads/2022/12/image-9.png" alt="" class="wp-image-11572" srcset="https://www.relataly.com/wp-content/uploads/2022/12/image-9.png 376w, https://www.relataly.com/wp-content/uploads/2022/12/image-9.png 300w" sizes="(max-width: 376px) 100vw, 376px" /></figure>



<p>The number of images in the two classes is balanced, so we don&#8217;t need to rebalance the data. That&#8217;s nice!</p>
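<p>If you want the exact numbers behind the bar plot, a small sketch using pandas value_counts does the job (shown here with a tiny hand-made DataFrame standing in for the real image_df):</p>

```python
import pandas as pd

def class_balance(df, label_col="category"):
    """Return the relative frequency of each class as a Series."""
    return df[label_col].value_counts(normalize=True)

# demo with a small hand-made frame (the real image_df has 25,000 rows)
demo = pd.DataFrame({"category": [0, 1, 0, 1, 0, 1, 0, 1]})
print(class_balance(demo))  # 0.5 / 0.5 -> perfectly balanced
```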



<h3 class="wp-block-heading" id="h-step-3-plot-sample-images">Step #3 Plot Sample Images</h3>



<p>I prefer not to jump directly into preprocessing without first checking that the data has been loaded correctly. We do this by plotting some random images from the train folder. This step is not strictly necessary, but it&#8217;s a best practice.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}">n_pictures = 16 # number of pictures to be shown
columns = int(n_pictures / 2)
rows = 2
plt.figure(figsize=(40, 12))
for i in range(n_pictures):
    num = i + 1
    ax = plt.subplot(rows, columns, i + 1)
    if i &lt; columns:
        image_name = 'cat.' + str(rdn.randint(1, 1000)) + '.jpg'
    else: 
        image_name = 'dog.' + str(rdn.randint(1, 1000)) + '.jpg'
    plt.xlabel(image_name)    
    plt.imshow(load_img(train_path + image_name)) 

#if you get a deprecated warning, you can ignore it</pre></div>



<figure class="wp-block-image size-full"><img decoding="async" width="1024" height="315" data-attachment-id="7123" data-permalink="https://www.relataly.com/image-classification-with-deep-learning/2485/cats-and-dogs-neural-networks-classification/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/04/cats-and-dogs-neural-networks-classification.png" data-orig-size="1024,315" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="cats-and-dogs-neural-networks-classification" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/04/cats-and-dogs-neural-networks-classification.png" src="https://www.relataly.com/wp-content/uploads/2022/04/cats-and-dogs-neural-networks-classification.png" alt="classifying cats and dogs convolutional neural networks" class="wp-image-7123" srcset="https://www.relataly.com/wp-content/uploads/2022/04/cats-and-dogs-neural-networks-classification.png 1024w, https://www.relataly.com/wp-content/uploads/2022/04/cats-and-dogs-neural-networks-classification.png 300w, https://www.relataly.com/wp-content/uploads/2022/04/cats-and-dogs-neural-networks-classification.png 768w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<p>I never expected to have so many pictures of cats and dogs one day, but I guess neither did you 🙂 Neural networks require a fixed input shape in which each input neuron corresponds to a pixel value. </p>



<p>As we can see from the sample images, the images in our dataset have different sizes and aspect ratios. For the images to fit into the input shape of our neural network, we need to put the images into a standard format. But before that, we split the data into two datasets for train and test.</p>
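<p>To illustrate what this standard format means for a single image, here is a minimal sketch using Pillow, with a synthetic image standing in for a real photo. The data generator in Step #5 will do this resizing for us via its target_size argument; note that, like the generator, a plain resize does not preserve the aspect ratio.</p>

```python
from PIL import Image

def to_standard_format(img, size=(128, 128)):
    """Resize an image to a fixed size, ignoring the aspect ratio."""
    return img.resize(size)

# demo with a synthetic image instead of a real photo
original = Image.new("RGB", (500, 375), color=(120, 80, 40))
resized = to_standard_format(original)
print(resized.size)  # (128, 128)
```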



<h3 class="wp-block-heading" id="h-step-4-split-the-data">Step #4 Split the Data</h3>



<p>Image classification requires splitting the data into a train and a validation set. We define a split ratio of 1/5 so that 80% of the data goes into the training dataset and 20% goes into the validation dataset. We shuffle the data to create two DataFrames with a random mix of cat and dog pictures. In addition, we transform the classes of the images into categorical values: 0 -&gt; &#8220;cat&#8221; and 1 -&gt; &#8220;dog&#8221;. The result is two new DataFrames: train_df (20,000 images) and validate_df (5,000 images).</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}">image_df[&quot;category&quot;] = image_df[&quot;category&quot;].replace({0:'cat',1:'dog'})

train_df, validate_df = train_test_split(image_df, test_size=0.20, random_state=42)
train_df = train_df.reset_index(drop=True)
total_train = train_df.shape[0]

validate_df = validate_df.reset_index(drop=True)
total_validate = validate_df.shape[0]
train_df.head()

print(len(train_df), len(validate_df))</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">Output: 20000 5000</pre></div>



<h3 class="wp-block-heading" id="h-step-5-preprocess-the-images">Step #5 Preprocess the Images</h3>



<p>The next step is to define two data generators for these DataFrames, which use the names given in the train and validation DataFrames to feed the images from the &#8220;train&#8221; path into our neural network. The data generator has various configuration options. We will perform the following operations:</p>



<ul class="wp-block-list">
<li>Rescale the images by dividing their RGB color values (0-255) by 255</li>



<li>Shuffle the images (again)</li>



<li>Bring the images into a uniform shape of 128 x 128 pixels</li>



<li>We define a batch size of 32, so 32 images are processed at a time.</li>



<li>The class mode is &#8220;binary&#8221; so our two prediction labels are encoded as&nbsp;float32&nbsp;scalars with values 0 or 1. As a result, we will only have a single end neuron in our network.</li>



<li>We perform some data augmentation on the training data (incl. horizontal flip, shearing, and zoom). In this way, the model sees slightly different variants of the images in each epoch, which helps to prevent overfitting.</li>
</ul>



<figure class="wp-block-image size-large is-resized"><img decoding="async" data-attachment-id="2700" data-permalink="https://www.relataly.com/image-classification-with-deep-learning/2485/image-19-4/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2020/12/image-19.png" data-orig-size="833,262" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-19" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2020/12/image-19.png" src="https://www.relataly.com/wp-content/uploads/2020/12/image-19.png" alt="" class="wp-image-2700" width="753" height="236" srcset="https://www.relataly.com/wp-content/uploads/2020/12/image-19.png 833w, https://www.relataly.com/wp-content/uploads/2020/12/image-19.png 300w, https://www.relataly.com/wp-content/uploads/2020/12/image-19.png 768w" sizes="(max-width: 753px) 100vw, 753px" /><figcaption class="wp-element-caption">Some augmentation techniques</figcaption></figure>



<p>It is essential to mention that the input shape of the first layer of the neural network must correspond to the image shape of 128 x 128. The reason is that each pixel becomes an input to a neuron.</p>
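<p>To make the correspondence between pixels and input values concrete, here is a quick NumPy check: each of the 3 color channels of each of the 128 x 128 pixels is one input value.</p>

```python
import numpy as np

# one preprocessed image: 128 x 128 pixels, 3 color channels (RGB)
img = np.zeros((128, 128, 3), dtype=np.float32)

# total number of input values per image
print(img.size)  # 49152 = 128 * 128 * 3
```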



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># set the dimensions to which we will convert the images
img_width, img_height = 128, 128
target_size = (img_width, img_height)
batch_size = 32
rescale=1.0/255

# configure the train data generator
# note: the augmentation options (shear, zoom, flip) are arguments of
# ImageDataGenerator itself, not of flow_from_dataframe
print('Train data:')
train_datagen = ImageDataGenerator(
    rescale=rescale,
    shear_range=0.2, # random shearing
    zoom_range=0.2, # random zooming
    horizontal_flip=True) # random horizontal flips
train_generator = train_datagen.flow_from_dataframe(
    train_df, 
    train_path,
    shuffle=True, # shuffle the image data
    x_col='filename', y_col='category',
    classes=['dog', 'cat'],
    target_size=target_size,
    batch_size=batch_size,
    color_mode=&quot;rgb&quot;,
    class_mode='binary')

# configure test data generator
# only rescaling
print('Test data:')
validation_datagen = ImageDataGenerator(rescale=rescale)
validation_generator = validation_datagen.flow_from_dataframe(
    validate_df, 
    train_path,    
    shuffle=True,
    x_col='filename', y_col='category',
    classes=['dog', 'cat'],
    target_size=target_size,
    batch_size=batch_size,
    color_mode=&quot;rgb&quot;,
    class_mode='binary')</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">Train data:
Found 20000 validated image filenames belonging to 2 classes.
Test data:
Found 5000 validated image filenames belonging to 2 classes.</pre></div>



<p>At this point, we have already completed the data preprocessing part. The next step is to define and compile the convolutional neural network.</p>



<h3 class="wp-block-heading" id="h-step-6-define-and-compile-the-convolutional-neural-network">Step #6 Define and Compile the Convolutional Neural Network</h3>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>The architecture of our image classification CNN is inspired by the famous VGGNet. In this section, we will define and compile our CNN model. We do this by defining multiple layers and stacking them on top of each other. However, to lower the amount of time needed to train the network, I reduced the number of layers.</p>



<p>The first layer of our network is the input layer, which receives the preprocessed images. As already noted, the shape of the input layer must match the shape of our images. Given how we defined the image format in our data generators, the input shape is 128 x 128 x 3. </p>



<p>The subsequent layers are four convolutional layers, each followed by a pooling layer. In addition, we define a dropout rate of 20% after each convolutional block. </p>



<p>Finally, a fully connected layer with 128 neurons and a single-neuron sigmoid output layer complete the structure of the CNN.</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-large is-resized"><img decoding="async" data-attachment-id="2608" data-permalink="https://www.relataly.com/image-classification-with-deep-learning/2485/image-8-7/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2020/12/image-8.png" data-orig-size="539,493" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-8" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2020/12/image-8.png" src="https://www.relataly.com/wp-content/uploads/2020/12/image-8.png" alt="3-dimensional Input Shape of our Neural Network " class="wp-image-2608" width="423" height="386" srcset="https://www.relataly.com/wp-content/uploads/2020/12/image-8.png 539w, https://www.relataly.com/wp-content/uploads/2020/12/image-8.png 300w" sizes="(max-width: 423px) 100vw, 423px" /><figcaption class="wp-element-caption">3-dimensional Input Shape of our Neural Network </figcaption></figure>



<p></p>
</div>
</div>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<div class="wp-block-kadence-infobox kt-info-box_4a47ba-1f"><span class="kt-blocks-info-box-link-wrap info-box-link kt-blocks-info-box-media-align-top kt-info-halign-left"><div class="kt-blocks-info-box-media-container"><div class="kt-blocks-info-box-media kt-info-media-animate-drawborder"><div class="kadence-info-box-icon-container kt-info-icon-animate-drawborder"><div class="kadence-info-box-icon-inner-container"><span class="kb-svg-icon-wrap kb-svg-icon-fe_cpu kt-info-svg-icon"><svg viewBox="0 0 24 24"  fill="none" stroke="currentColor" stroke-width="1" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"  aria-hidden="true"><rect x="4" y="4" width="16" height="16" rx="2" ry="2"/><rect x="9" y="9" width="6" height="6"/><line x1="9" y1="1" x2="9" y2="4"/><line x1="15" y1="1" x2="15" y2="4"/><line x1="9" y1="20" x2="9" y2="23"/><line x1="15" y1="20" x2="15" y2="23"/><line x1="20" y1="9" x2="23" y2="9"/><line x1="20" y1="14" x2="23" y2="14"/><line x1="1" y1="9" x2="4" y2="9"/><line x1="1" y1="14" x2="4" y2="14"/></svg></span></div></div></div></div><div class="kt-infobox-textcontent"><h4 class="kt-blocks-info-box-title">Additional Info</h4><p class="kt-blocks-info-box-text"><strong><em>Loss function</em>:</strong> measures model accuracy during training. We try to minimize this function to &#8220;steer&#8221; the model in the right direction. We use binary_crossentropy.<br/><strong><em>Optimizer</em>:</strong> defines how the model weights are updated based on the data it sees and its loss function.<br/><strong><em>Metrics</em></strong> are<strong> </strong>used to monitor the steps during training and testing. The following example uses <em>accuracy</em>, which is the fraction of the correctly classified images.</p></div></span></div>
</div>
</div>
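<p>To make the binary cross-entropy loss from the info box concrete, here is a small NumPy sketch of how the mean loss is computed for a batch of four predictions (a simplified re-implementation for illustration, not the Keras internals):</p>

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over a batch of predictions."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-(y_true * np.log(y_pred)
                           + (1.0 - y_true) * np.log(1.0 - y_pred))))

y_true = np.array([1.0, 0.0, 1.0, 0.0])   # dog = 1, cat = 0
y_pred = np.array([0.9, 0.1, 0.8, 0.3])   # sigmoid outputs of the last neuron

print(round(binary_crossentropy(y_true, y_pred), 4))  # 0.1976
```

The better the predictions match the true labels, the closer this value gets to 0; minimizing it during training "steers" the model in the right direction.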



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># define the input format of the model
input_shape = (img_width, img_height, 3)
print("input_shape:", input_shape)

# define the model
model = Sequential()
model.add(Conv2D(32, (3, 3), strides=(1, 1), activation='relu', kernel_initializer='he_uniform', padding='same', input_shape=input_shape))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.20))
model.add(Conv2D(64, (3, 3), strides=(1, 1), activation='relu', kernel_initializer='he_uniform', padding='same'))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.20))
model.add(Conv2D(64, (3, 3), strides=(1, 1), activation='relu', kernel_initializer='he_uniform', padding='same'))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.20))
model.add(Conv2D(128, (3, 3),  strides=(1, 1),activation='relu', kernel_initializer='he_uniform', padding='same'))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.20))
model.add(Flatten())
model.add(Dense(128, activation='relu', kernel_initializer='he_uniform'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

# compile the model and print its architecture
opt = SGD(learning_rate=0.001, momentum=0.9)
model.compile(optimizer=opt, loss='binary_crossentropy', metrics=['accuracy'])
print(model.summary())</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">input_shape: (100, 100, 3)
Model: &quot;sequential&quot;
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 100, 100, 32)      896       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 50, 50, 32)        0         
_________________________________________________________________
dropout (Dropout)            (None, 50, 50, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 50, 50, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 25, 25, 64)        0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 25, 25, 64)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 25, 25, 64)        36928     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 12, 12, 64)        0         
_________________________________________________________________
dropout_2 (Dropout)          (None, 12, 12, 64)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 12, 12, 128)       73856     
_________________________________________________________________
...
Trainable params: 720,257
Non-trainable params: 0
_________________________________________________________________
None</pre></div>



<p>At this point, we have defined and compiled our convolutional neural network. Next, it is time to train the model.</p>
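<p>As a sanity check on the summary above, we can reproduce the parameter counts by hand: a Conv2D layer has (kernel height * kernel width * input channels + 1) * filters parameters (the +1 is the bias), each 2x2 max-pooling halves the spatial size (integer division), and a Dense layer has (inputs + 1) * units parameters. A short pure-Python sketch:</p>

```python
# reproduce the parameter counts from model.summary() by hand
def conv_params(k, in_ch, filters):
    # one filter has k*k weights per input channel, plus a bias
    return (k * k * in_ch + 1) * filters

def dense_params(inputs, units):
    return (inputs + 1) * units

size, channels, total = 100, 3, 0
for filters in (32, 64, 64, 128):
    total += conv_params(3, channels, filters)  # Conv2D, padding='same' keeps the size
    channels = filters
    size //= 2                                  # MaxPooling2D((2, 2)) halves it

flat = size * size * channels                   # Flatten: 6 * 6 * 128 = 4608
total += dense_params(flat, 128)                # Dense(128, ...)
total += dense_params(128, 1)                   # Dense(1, activation='sigmoid')
print(f"{total:,}")  # → 720,257
```

<p>The per-layer values (896, 18496, 36928, 73856) and the total of 720,257 trainable parameters match the Keras summary.</p>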



<h3 class="wp-block-heading" id="h-step-7-train-the-model">Step #7 Train the Model</h3>



<p>Before we train the image classifier, we still have to choose the number of epochs. More epochs can improve model performance, but they also lengthen training time and increase the risk of overfitting. Finding the optimal number of epochs is difficult and often requires a trial-and-error approach. I typically start with a small number, such as 5 epochs, and then increase it until further increases no longer lead to significant improvements.</p>
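<p>In addition, the training code below uses the Keras EarlyStopping callback, which stops training once the monitored loss has not improved for <em>patience</em> consecutive epochs. The following simplified pure-Python sketch (ignoring Keras options such as min_delta and restore_best_weights) illustrates that logic with a made-up loss curve:</p>

```python
# simplified sketch of EarlyStopping(monitor='loss', patience=6)
def stop_epoch(losses, patience):
    best, wait = float('inf'), 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:          # loss improved: remember it, reset the counter
            best, wait = loss, 0
        else:                    # no improvement this epoch
            wait += 1
            if wait >= patience:
                return epoch     # stop training here
    return len(losses)           # ran through all epochs

# hypothetical loss curve that flattens out after epoch 4
losses = [0.70, 0.65, 0.60, 0.58, 0.59, 0.60, 0.59,
          0.60, 0.61, 0.59, 0.60, 0.62]
print(stop_epoch(losses, patience=6))  # → 10
```

<p>With early stopping in place, setting the epoch count a bit too high is harmless: training simply ends once the loss stops improving.</p>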



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># train the model
epochs = 35
early_stop = EarlyStopping(monitor='loss', patience=6, verbose=1)

history = model.fit(
    train_generator,
    epochs=epochs,
    callbacks=[early_stop],
    steps_per_epoch=len(train_generator),
    verbose=1,
    validation_data=validation_generator,
    validation_steps=len(validation_generator))</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">Epoch 1/35
625/625 [==============================] - 121s 194ms/step - loss: 0.7050 - accuracy: 0.5282 - val_loss: 0.6902 - val_accuracy: 0.5824
Epoch 2/35
625/625 [==============================] - 115s 183ms/step - loss: 0.6853 - accuracy: 0.5469 - val_loss: 0.6856 - val_accuracy: 0.5806
Epoch 3/35
625/625 [==============================] - 115s 184ms/step - loss: 0.6744 - accuracy: 0.5752 - val_loss: 0.6746 - val_accuracy: 0.5806
Epoch 4/35
625/625 [==============================] - 112s 180ms/step - loss: 0.6569 - accuracy: 0.5987 - val_loss: 0.6593 - val_accuracy: 0.6110
Epoch 5/35
625/625 [==============================] - 115s 185ms/step - loss: 0.6423 - accuracy: 0.6194 - val_loss: 0.6474 - val_accuracy: 0.6134
Epoch 6/35
625/625 [==============================] - 116s 185ms/step - loss: 0.6309 - accuracy: 0.6370 - val_loss: 0.6386 - val_accuracy: 0.6260
Epoch 7/35
625/625 [==============================] - 115s 183ms/step - loss: 0.6139 - accuracy: 0.6539 - val_loss: 0.6082 - val_accuracy: 0.6682</pre></div>



<p>A quick comment on the time required to train the model. Although the model is not overly complex and the dataset is of moderate size, training can take some time. I made two training runs &#8211; the first on my GPU (NVIDIA GeForce RTX 3080) and the second on my CPU (AMD Ryzen 7 3700X). On the GPU, training took approximately 10 minutes. The CPU training was much slower and took about 30 minutes, three times as long as on the GPU.</p>



<p>After training, you may want to save the classification model and load it again later. You can do this with the code below. Note that because we save only the weights, the model architecture must be defined exactly as it was during training before the weights can be loaded.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># save the weights
model.save_weights('cats-and-dogs-weights-v1.h5')

# recreate the model architecture exactly as during training
# (model definition as shown above)

# load the weights into the recreated model
model.load_weights('cats-and-dogs-weights-v1.h5')</pre></div>



<h3 class="wp-block-heading" id="h-step-8-visualize-model-performance">Step #8 Visualize Model Performance</h3>



<p>After training the model, we want to check its performance. For this purpose, we can apply the same performance measures as in traditional classification projects. The code below plots the loss and accuracy of our image classifier on the training and validation datasets. </p>



<p>To learn more about measuring model performance, check out my <a href="https://www.relataly.com/measuring-classification-performance-with-python-and-scikit-learn/846/" target="_blank" rel="noreferrer noopener">previous post on Measuring Model Performance</a>. </p>
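<p>For a binary task such as cats vs. dogs, the most common measures can be derived directly from the counts of true and false positives and negatives. The following pure-Python sketch uses hypothetical labels and predictions to compute accuracy, precision, and recall by hand:</p>

```python
# hypothetical labels and predictions: 1 = dog, 0 = cat
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)   # of the predicted dogs, how many are dogs?
recall = tp / (tp + fn)      # of the actual dogs, how many were found?
print(accuracy, precision, recall)  # → 0.75 0.75 0.75
```

<p>scikit-learn's classification_report computes the same figures directly from the predicted and true labels.</p>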



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}">def plot_history(history, value1, value2, title, ylabel):
    fig, ax = plt.subplots(figsize=(15, 5), sharex=True)
    plt.plot(history.history[value1], 'b')
    plt.plot(history.history[value2], 'r')
    plt.title(title)
    plt.ylabel(ylabel)
    plt.xlabel(&quot;Epoch&quot;)
    ax.xaxis.set_major_locator(plt.MaxNLocator(epochs))
    plt.legend([&quot;Train&quot;, &quot;Validation&quot;], loc=&quot;upper left&quot;)
    plt.grid()
    plt.show()

# plot training &amp; validation loss values
plot_history(history, &quot;loss&quot;, &quot;val_loss&quot;, &quot;Model loss&quot;, &quot;Loss&quot;)
# plot training &amp; validation accuracy values
plot_history(history, &quot;accuracy&quot;, &quot;val_accuracy&quot;, &quot;Model accuracy&quot;, &quot;Accuracy&quot;)
</pre></div>



<figure class="wp-block-image size-large is-resized"><img decoding="async" data-attachment-id="2725" data-permalink="https://www.relataly.com/image-classification-with-deep-learning/2485/image-25-4/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2020/12/image-25.png" data-orig-size="894,333" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-25" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2020/12/image-25.png" src="https://www.relataly.com/wp-content/uploads/2020/12/image-25.png" alt="" class="wp-image-2725" width="865" height="322" srcset="https://www.relataly.com/wp-content/uploads/2020/12/image-25.png 894w, https://www.relataly.com/wp-content/uploads/2020/12/image-25.png 300w, https://www.relataly.com/wp-content/uploads/2020/12/image-25.png 768w" sizes="(max-width: 865px) 100vw, 865px" /><figcaption class="wp-element-caption"><img decoding="async" 
src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAA34AAAFNCAYAAABfWL0+AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAABxYUlEQVR4nO3deZyN5f/H8ddl7Ft2kbWItFhTiGiRSlQIbbRY2jctolX7vpNKRUmifFu0R+uvkKyhJNmyC4Nhluv3x+cMY8yMWc5yz8z7+Xicx1nu5bzPPefMnM9c131dznuPiIiIiIiIFFxFYh1AREREREREIkuFn4iIiIiISAGnwk9ERERERKSAU+EnIiIiIiJSwKnwExERERERKeBU+ImIiIiIiBRwKvxERKRQcM7Vc85551zRbKzb3zn3QzRyiYiIRIMKPxERCRzn3HLn3B7nXJV0j88JFW/1YhRNREQkX1LhJyIiQfU30Df1jnPuWKBU7OIEQ3ZaLEVERNJT4SciIkE1Drg0zf1+wNi0KzjnDnHOjXXObXDO/eOcG+6cKxJaFuece8I5t9E5tww4O4NtX3PO/eucW+2ce8A5F5edYM6595xza51zW51z3znnjk6zrJRz7slQnq3OuR+cc6VCy05yzv3knPvPObfSOdc/9Ph059yVafaxX1fTUCvnNc65P4E/Q489G9rHNufcr8659mnWj3PO3emc+8s5tz20vLZz7kXn3JPpXstHzrkbs/O6RUQk/1LhJyIiQfUzUN45d1SoIOsNvJVuneeBQ4DDgZOxQvGy0LIBQFegOdAK6Jlu2zeBJKBBaJ3OwJVkz6dAQ6AaMBt4O82yJ4CWQFugEnAbkOKcqxPa7nmgKtAMmJPN5wM4FzgBaBK6PzO0j0rAeOA951zJ0LKbsdbSs4DywOXATuw1901THFcBTgXeyUEOERHJh1T4iYhIkKW2+p0OLAZWpy5IUwwO9d5v994vB54ELgmtcgHwjPd+pfd+M/Bwmm2rA2cCN3rvd3jv1wNPA32yE8p7Pyb0nLuBe4GmoRbEIliRdYP3frX3Ptl7/1NovYuAr7z373jvE733m7z3c3JwLB723m/23u8KZXgrtI8k7/2TQAmgUWjdK4Hh3vsl3swNrTsD2IoVe4Re73Tv/boc5BARkXxI5wmIiEiQjQO+A+qTrpsnUAUoDvyT5rF/gMNCt2sCK9MtS1UXKAb865xLfaxIuvUzFCo4HwR6YS13KWnylABKAn9lsGntTB7Prv2yOeduwQq8moDHWvZSB8PJ6rneBC4GvgxdP5uHTCIikk+oxU9ERALLe/8PNsjLWcD76RZvBBKxIi5VHfa1Cv6LFUBpl6VaCewGqnjvK4Qu5b33R3NwFwLdgdOwbqb1Qo+7UKYE4IgMtluZyeMAO4DSae4fmsE6PvVG6Hy+27FWzYre+wpYS15qFZvVc70FdHfONQWOAqZksp6IiBQgKvxERCTorgBO8d7vSPug9z4ZmAg86Jwr55yri53blnoe4ETgeudcLedcReCONNv+C3wBPOmcK++cK+KcO8I5d3I28pTDisZNWLH2UJr9pgBjgKecczVDg6y0cc6VwM4DPM05d4FzrqhzrrJzrllo0znA+c650s65BqHXfLAMScAGoKhz7m6sxS/Vq8AI51xDZ45zzlUOZVyFnR84Dpic2nVUREQKNhV+IiISaN77v7z3szJZfB3WWrYM+AEb5GRMaNkrwOfAXGwAlvQthpdiXUV/B7YAk4Aa2Yg0Fus2ujq07c/plg8B5mPF1WbgUaCI934F1nJ5S+jxOUDT0DZPA3uAdVhXzLfJ2ufYQDF/hLIksH9X0KewwvcLYBvwGvtPhfEmcCxW/ImISCHgvPcHX0tEREQKDOdcB6xltF6olVJERAo4tfiJiIgUIs65YsANwKsq+kRECg8VfiIiIoWEc+4o4D+sS+szMQ0jIiJRpa6eIiIiIiIiB
Zxa/ERERERERAo4FX4iIiIiIiIFXNFYBwinKlWq+Hr16u29v2PHDsqUKRO7QAHKEYQMQckRhAxByRGEDMoRvAxByRGEDEHJEYQMyhG8DEHJEYQMQckRhAzKEbwM0c7x66+/bvTeVz1ggfe+wFxatmzp05o2bZoPgiDkCEIG74ORIwgZvA9GjiBk8F45gpbB+2DkCEIG74ORIwgZvFeOoGXwPhg5gpDB+2DkCEIG75UjaBm8j24OYJbPoFZSV08REREREZECToWfiIiIiIhIAafCT0REREREpIArUIO7ZCQxMZFVq1aRkJAQswyHHHIIixYtitnzhzNDyZIlqVWrFsWKFQtDKhERERERiYYCX/itWrWKcuXKUa9ePZxzMcmwfft2ypUrF5PnDmcG7z2bNm1i1apV1K9fP0zJREREREQk0gp8V8+EhAQqV64cs6KvIHHOUbly5Zi2noqIiIiISM4V+MIPUNEXRjqWIiIiIiL5T6Eo/GJl06ZNNGvWjHbt2nHooYdy2GGH0axZM5o1a8aePXuy3HbWrFlcf/31UUoqIiIiIiIFWYE/xy+WKleuzJw5c9i+fTtPPvkkZcuWZciQIXuXJyUlUbRoxj+CVq1a0apVq2hFFRERERGRAiyiLX7OuS7OuSXOuaXOuTsyWH6Ic+4j59xc59xC59xlaZYtd87Nd87Ncc7NimTOaOrfvz8333wznTp14vbbb2fGjBm0bduW5s2b07ZtW5YsWQLA9OnT6dq1KwD33nsvl19+OR07duTwww/nueeei+VLEBEREREpVBIS4J9/4Jdf4MMPYfRoWL061qlyJmItfs65OOBF4HRgFTDTOfeh9/73NKtdA/zuvT/HOVcVWOKce9t7n9oPspP3fmOkMsbKH3/8wVdffUVcXBzbtm3ju+++o2jRonz11VfceeedTJ48+YBtFi9ezLRp09i+fTuNGjXiqquu0pQKIiIiIiK5lJgIGzbA2rV2Wbdu3+3097duPXD7jz+Gww6Lfu7cimRXz9bAUu/9MgDn3ASgO5C28PNAOWcjhpQFNgNJkQp0440wZ05499msGTzzTM626dWrF3FxcQBs3bqVfv368eeff+KcIzExMcNtzj77bEqUKEGJEiWoVq0a69ato1atWnkLLyIiIiJSwCQlwapVsHw5fPFFdWbO3L+IS729MZPmpfLl4dBDoXp1OO446Nx53/1DD913u3r1qL6sPItk4XcYsDLN/VXACenWeQH4EFgDlAN6e+9TQss88IVzzgMve+9HRzBrVJUpU2bv7bvuuotOnTrxwQcfsHz5cjp27JjhNiVKlNh7Oy4ujqSkiNXHIiIiIiL78x5WroQ6dWKdhJQUK9z+/tsuy5fvf3vFCkhOTl37KABKldpXtDVoACeddGAxl3q/VKkYvbAIi2Thl9G4/z7d/TOAOcApwBHAl865773324B23vs1zrlqoccXe++/O+BJnBsIDASoXr0606dP37ssPj6eQw45hO3btwMwYkSeX1OGQrvPVHJyMrt376ZYsWIkJiaya9euvZk2bdpEpUqV2L59Oy+//DLee7Zv387OnTtJSkpi+/bte7dN3SYlJYX4+Pi997MjOTk5R+tnJSEhYb/jnBPx8fG53jZcgpAhKDmCkEE5gpchKDmCkCEoOYKQQTmClyEoOYKQISg5gpAhIjm858gnn6TmJ5/wz8UX8/dll0GRgw8Vktsc3sPWrcVYu7Yk//5bkrVrS4Zul9p7OzFx/+evVGk3NWokUL9+Am3bJlCjRgKHHppA2bKbqVWrKKVLJ5PVrGQJCVY0Ll+e47jZEoT3RiQLv1VA7TT3a2Ete2ldBjzivffAUufc30BjYIb3fg2A9369c+4DrOvoAYVfqCVwNECrVq182haz6dOnU7JkScqVKxe2F5Ub27dv39tNs1ixYpQqVWpvpjvvvJN+/foxcuRITjnlFJxzlCtXjtKlS1O0aFHKlSu3d9vUbYoUKULZsmVz9Lq2b
98etuNQsmRJmjdvnqttp0+fnmmrZrQEIUNQcgQhg3IEL0NQcgQhQ1ByBCGDcgQvQ1ByBCFDUHIEIUNEctx5J3zyCTRvTt233qJuQgK8+SaULp3rHNu372ulW7bswNa7HTv2X79yZahfH044wa7r14d69ey6bl0oVaoEUAI4JIMM7XP7ysMmCO+NSBZ+M4GGzrn6wGqgD3BhunVWAKcC3zvnqgONgGXOuTJAEe/99tDtzsD9Ecwacffee2+Gj7dp04Y//vhj7/0RoWbJjh077n1zpN92wYIFkYgoIiIiIrK/J5+Ehx+GQYNg5Eh4+mkYMsQqtP/9D2rWzHCzxERYs6YkX321f3GXep3+/Lpy5ayIO+IIOO20fUVdaoEX43acAiFihZ/3Psk5dy3wORAHjPHeL3TODQ4tHwWMAN5wzs3Huobe7r3f6Jw7HPjAxnyhKDDee/9ZpLKKiIiIiEg6b7xhRV6vXvDii+Ac3HwzNGyI79uXlFatWfjwRywo1vyAlrsVKyAl5cS9uypadF8x16OHXR9++L7irlIlsuyKKXkX0QncvfdTganpHhuV5vYarDUv/XbLgKaRzCYiIiIiIvskJMCWLfDff8CH/6PRnVey9pjTmdJmHJseimPDBmvoW7bsHMom/8jEf8/hiP4ncTdv8z/O5dBDrYhr1w4uvhj27FnMmWc25vDDbdqD0KD2EiMRLfxERERERCQ6vIf4eFi/vgTz5lkBt2XL/pf0j6W9n5Bg++nAt3xOb2bSklMXvM+Om210+XLlrNWuQQOof3pTvqwyg/PHdueDP88n8b6HKT78tv2a7aZPX0vHjo2jfRgkEyr8RERERERiLDERtm2zicK3bt3/dvr7md3evt2mOoA2GT6Hc3DIIVChAlSsaJejjtr/fsP43zj3mXPYXeVwir46ldn1ylKhgq1TvHj6PR4KN0+Hyy6j+N13wLLF8PLLGa0oAaDCT0REREQiY88emD8fZs2CmTNhzRq4/35o1SrWyaJm1y473y11xMrUyz//wObN+wq3XbsOvq/ixa1wS72UL2+DoaTeTn187dolnHhio73FXMWKVriVL3+Q7pZ//AEnnQFVK1Lsxy9oWavywUOVKgXvvAONG8N999mJfpMnQ5Uq2To+Ej0q/EREREQk75KS4PffrchLLfTmzbPiD2w8/iJF4OST4d13oWvX2OYNk9279xV2n39egy++2DclwfLlNtF4WsWK2fQDdevaXOjpi7a0t9PfL1Eie5mmT/+Xjh0b5eyFrF4NnUNDb3z5JdSqlf1tnYN774Ujj4TLL7c5Fz7+OGfPLxGnwi/COnbsyA033MB5552397FnnnmGP/74g5deeinD9Z944glatWrFWWedxfjx46lQocJ+69x7772ULVuWIUOGZPq8U6ZM4cgjj6RJkyYAPPDAA5x++umcdtpp4XlhIiIiUnilpFjr0MyZNJgyBYYNg99+29dsVb68terdeKNdH3+8VTrr1lnB1707vPACXHVVLF9FtuzZAytX7l/Mpb29Zr9ZqhtRtKgVdPXqwVln7RvJsl49u9SoEcBBTjZvtqJv82aYNs0KuNy48EJ7seeeC23aUHH4cIjl3HWLFnHI3LnQoUO2Jpwv6FT4RVjfvn2ZPHnyfoXfhAkTePzxxw+67dSpUw+6TmamTJlC165d9xZ+w4cPj/lE9iIiIpIPeW/d91Jb8WbNgtmz7YQyoEbJklbcDR5s161a2egfGX3RPvRQ+PZb6NMHrr7aKqeHH475l/KdO+Gvv+yydOn+l5UrU8+bM3FxULu2FXGdO+9f2P377//Ro0cbiuanb9jx8Vah/vUXfPYZtGyZt/21aQMzZkDXrhx3++1Qtqy9N6IlIQEmTbJzDX/4geZg/2S4+mq47DLr81pI5ae3Zb7Us2dPhg0bxu7duylRogTLly9nzZo1jB8/nptuuoldu3bRs2dP7rvvvgO2rVevHrNmzaJKlSo8+OCDjB07ltq1a1O1a
lVahj6Ur7zyCqNHj2bPnj00aNCAcePGMWfOHD788EO+/fZbHnjgASZPnszdd9/NeeedR8+ePfn6668ZMmQISUlJHH/88YwcOZISJUpQr149+vXrx0cffURiYiLvvfcejRtrJCYREZFCZdMmK85Su2zOmmVDPoL1NWzWDC69dG9L3vdr19Lx1FMz3d2//8JPP8GPP9rpfuXLl+HQmh/Qr9X1tH7sMVb8sILl975B1VolqF7dvpdHog7cti3jwm7p0vStdtYrtUEDOOmk/eeaq1fPpiUoVizj55g+fXf+Kvr27LFJ9WbOtPPywtU6V7cu/PQTmzt3pvJVV8GiRTYRfCQPzuLFMHo0vPmmtVw2bAiPP87vW7bQ5Ntvbf7B4cNtnolrroHjjotcloDKT2/NfKly5cq0bNmSzz77jO7duzNhwgR69+7N0KFDqVSpEsnJyZx66qnMmzeP4zJ5A/76669MmDCB3377jaSkJFq0aLG38Dv//PMZMGAAYK16r732Gtdddx3dunWja9eu9OzZc799JSQk0L9/f77++muOPPJILr30UkaOHMmNN94IQJUqVZg9ezYvvfQSTzzxBK+++mrkDo6IiIgEw5498Omn9qX5449tiMmiReHYY23y7tSWvGOOObDq2bBh783kZFiwwIq8n36yy99/27KSJW3zNWvgu/VFGbnxRW6hPo//dBvLO6+mHVPYQiWKFoWqVaFatexdSpfeF2Xz5gOLutRib/36/WMfeqgVd6efbteplyOOsMFQCrzkZCvgv/gCxoyx7pnhVK4c8x94gI6ffAJPPw1//gkTJlg34HDZvdsK1pdfhu++s/fmeefBoEHQqRM4x/rp02ny4IPWFfnFF2HsWCsQO3SwAvC88zKv5AuYwlX43XgjzJkT3n02awbPPJPlKj179mTChAl7C78xY8YwceJERo8eTVJSEv/++y+///57poXf999/z3nnnUfp0G+2bt267V22YMEChg8fzn///Ud8fDxnnHFGllmWLFlC/fr1OTLUd7tfv368+OKLewu/888/H4CWLVvy/vvvZ+MAiIiISL7kvXXZfPNNG5Vx40arpK691oq95s2tWsvCtm0wc2ZFpk+3Yu/nn63nIFhh1a4dXHcdtG1ru0s7yn9SkmPTpltZMbYOJw27lOVV2jL5yk9Zmlyf9evZe/nrL7tO3W96ZcpYobhpU7vU3qd71aplxVy3bvsXd4cfbnPSFVre28/53Xfh8cetC2QkxMXBU09Bo0ZWZLVtCx99ZM2nefHHH1a8vfGGtVAffjg88oi9jmrVMt6meXN49VV47DF4/XUrAnv3hpo1rVAcONDetAVY4Sr8YqRr164MGzaM2bNns2vXLipWrMgTTzzBzJkzqVixIv379ychdcbMTLg0k2Gm1b9/f6ZMmULTpk154403mD59epb78d5nubxEaLiouLg4kpKSslxXRERE8qE1a+Ctt6zlY+FCq8a6d4d+/eCMMzLtjue9td6ldtv86Sfruul9U4oUscbBSy+17/bt2llvv0y+vgD2NNWrA7f2hhNrUr57dy57+URrcTz++APW37nTGhfXrWO/wjD1sn37ejp0OGy/4q5UqTAds4Lmnntg1Ci4/XbIYrDAsBk0yJpSe/WyET+nTLE3Sk7s2QMffGCte9Om2Ruoe3fb96mnZr9/cKVKcMst1iD02Wd2/t8998ADD0DPnlYQt2mT9Zs3nypchd9BWuYipWzZsnTs2JHLL7+cvn37sm3bNsqUKcMhhxzCunXr+PTTT+mYRZ/qDh060L9/f+644w6SkpL46KOPGDRoEADbt2+nRo0aJCYm8vbbb3PYYYcBUK5cOban/7cX0LhxY5YvX87SpUv3nhN48sknR+R1i4iISEDs3An/+5+17n35pY1W0qaNffm/4IIM+zbu3m2949J220ydmqBcOdv8/POhbNm5DBjQNG89+Nq3tyc46yw7z+ydd6yZLo3SpfdNg5CR6dP/pGPHw
/IQopB49lkYMQKuuMIG1omW006zJuGuXa0b5pgxcNFFB99u6VJ45RVrpduwwU60fOgha93LSwtdXBycfbZd/vwTRo60TO+8Y62D11wDffvu35c4nytchV8M9e3bl/PPP58JEybQuHFjmjdvztFHH83hhx9Ou3btsty2RYsW9O7dm2bNmlG3bl3at2+/d9mIESM44YQTqFu3Lscee+zeYq9Pnz4MGDCA5557jkmTJu1dv2TJkrz++uv06tVr7+Aug6M50pKIiIhEh/fwww+kvP4mbtJ7uO3b2FOjDusvHco/HS5l3SFHsn07bH/bBuhMe1m61MZ02b3bdnX44fa9vV07a6g5+uh9UxJMn74lPKdtNW4M//d/cM45dt7Vc8/Zl28Jn7fespau88+3oj/arVqNGlnx16OHDbKyeLFN+p6+tW7PHvtHxejR8NVX9mbr1s1a904/Pfyj/zRsaF1SR4yAt9+2VsArr4Rbb7UC+aqr7EOQz6nwi5Lzzjtvv26Wb7zxRobrpe2quXz58r23hw0bxrBhww5Y/6qrruKqDObAadeuHb///vve+6NGjdo7ncOpp57Kb7/9dsA2aZ+vVatWB+02KiIiItG1e7c1TixebKc5zZlzBG+nK9zKb1xG57Vj6b51LHVT/mYnZZhET96kH9/+ezL+jSLwxoH7jouzlrzy5W3kymuvtSKvbdsonvpUvbp14+vb1wL884+du6U52PLuk0+gf3845RQrbmI1/GjlyjagzFVXWffKxYutJbp0aZs2JLV1b906mxBxxAibFL5mzchnK1PGzvUbMAC+/94KwKefthFJzzrL3pOdO+fb96MKPxEREZGA2bjRvg+nv/z99/5zypUsWZMKFaBG6a2cl/Qe3baOpenW70nBseSwU3jnuPtY3uJ8SlYuw0XlYHA5K+4yupQsGZDTmsqUsXO5brjBBh755x8rDA4y0Ixk4fvv7fy15s3t/LpYH8vixW2glaOOgttuszd2akFYpIh1Bx00yM45jcVs987ZqJ8dOsDq1dby+PLLcOaZdgLpNddYEZ3P5gRU4SciIiISA8nJ9n03owJv06Z965UoYT3kWra0U6IaN7bLkUck89eoJzluzhwrlBIS4Mgj4bYHKXLxxRxVpw5HxezV5VFcHDz/vI3+OGSIDUgzZYoVB5Izc+da99m6dWHq1OAMZ+qc/WyPPBIuvNDOM733XutaWatWrNPtc9hh1h112DCbOuKFF+Cmm+z+pElWDOYTKvxEREREIig+HpYsObC4++MPO5UpVbVqVtD16LGvuGvc2Hq7HdDosWwZtDyD45YutS/Ml11mo3K2bh2QZrswcM5GX6xTBy65xPqcfvppgTjXKmr++stazcqVs9a0qlVjnehA3bpZYV+mTGxa97KreHHrgty3r4169NJLNrdlPlIoCj/vfabTIUjOHGw6CBERkXwnKckGEkktoHJpzx4r5ubP3//yzz/71omLs1HtGze2U4ZSi7tGjWyU+WxZvtxGRYyPZ+Hdd3P0nXdas2BB1asX1KhhQ/efGJruoXXrWKcKvn//tYFQkpLsvMk6dWKdKHPhnNQ9Gpo3t3MR85kCX/iVLFmSTZs2UblyZRV/eeS9Z9OmTZSMdb9wERGRcFm2zFqTfvrJ7q9ZY124suA9rF1bgo8/3r/AW7IEEhNtnaJFrZhr08bGiTjqKLscccT+k5jn2D//WNG3fTt8/TUbtm4t2EVfqpNOsp/RmWfum+6he/dYpwqsotu3W0vf+vVW9B2Vbzv9ShgV+MKvVq1arFq1ig0bNsQsQ0JCQsyLpXBlKFmyJLWC1O9aREQkN7yHceNslD7n7PYXX8Dw4bBjBzz4IDjHli0HtuAtWADbtrXZu6s6dWzy8q5d7frYY63oy1OBl5GVK21Exi1b4OuvrdWhMI3AnToVQNrpHq69NtapgmfnTo4dOtSGf/3kEzj++FgnkoAo8IVfsWLFqF+/fkwzTJ8+nebNmxf6DCIiI
kHgN2/BDx5Mkfcmkty2PfEjx5FYsy4rj7qQ0qtK0+jhh5ny9g6uSXyGNf/u6y1UoYIVdRdfDCVK/EGPHkdyzDFwyCFRCL16tRV9GzfaBOwtW0bhSQOoWjVrwbrwQrjuOuv2+thj+XZ4/RzbuRPWrrVunJldr1hB+S1bYOJEm3xRJKTAF34iIiKSv3kPixbZuB7ffgurVjWlbFk7dSkx0a6ze2m3Zxqvp1zKoaxlOA/y6E+3k9I0dUCJIsBInilSmhtWPE2FBjuZ9cgojmkaxzHH2OB+qWeNTJ++hnbtjozOAfj3Xyv61q2zVsnCfn5b6dI2uuJNN9n8aitWwNixsU6VeykpNozrwQq6tWth27YDty9SxOY/rFHDLi1asKB+fY7t0SP6r0UCTYWfiIiIBE58vPVm/PRTu6xYYY83bgzFizuKFYNSpexcuswuxYrtu13C7eHM/7uLDr88zpZKDRh7wf9Rvm4rHk2zfrVqcOyxjoYNnoT7y9DxgQfoOG8n3PJm7Ca7XrvWir41a+Dzz21wE7FRcp59FurVs5E/V6+m2JAhsU6VPQkJMH48vPGGnWO6bp39VyK9smXh0EOtmGvaFLp02Xc/7XWVKgeMhrmpMHUBlmxT4SciIiIx5z38/vu+Qu/77601r2xZ6602bJh9761TB6ZPn0PHjh2zv/PFi20CvNmzYeBAKj/1FFeUKZPFBg5GjLDh5YcOhV27bDCRaA+isn49nHqqVb2ffWbTGcg+zsHNN9ub4uKLObFPHxtq/6qrgjmtxZo1NgXAyy9bl90mTWzUzfTFXOrtsmVjnVgKGBV+IiIiEhOZteodcwzceKMN4NiuXR4GSfEeRo2yFqHSpW0C8JyMBHnHHbbdDTfAuefC++9bM2M0bNhgRd/ff9uk2+3bR+d586OePeHoo1l7xx0cNnkyvPkmNGtmBeCFF8a+gJoxw1onJ06E5GQbnOaGG2x01qAVp1KgFZIzYUVERCTWvIeFC+GJJ6ymqVTJ6qm334YWLawhZMUKGznzscfse3Gui771621i6Kuvhg4dbKe5Gf7/+uvh1Vetm+VZZ9k0CpG2aZM1cy5danPW5aR1s7A66ij+vOkma1UbNcrebIMGQc2acM019vOPpsREmDDB5vM44QT46CPL8eef8L//WfddFX0SZWrxExERKYz++QdGjrTWkJ497eS5CAhNN8enn1pvxbC36mVk6lSbiH3rVmtpufbavI36eMUV1tJ36aXQubO9mEjZvNmKviVLrFg45ZTIPVdBVK6cFXwDB9rUD6NGwWuvWRfLdu1g8GB7v0domq1iW7fCQw/Z861eDQ0a2Huwf//8N0m5FDgq/ERERAqTpUvh4Yf3jYKYlAR33WXnG/XsCT162JwFuWyN2LMH5s610TczOldv+HA7V6927TC+plS7dsGtt8KLL9pr+Ooruw6HCy+04q93bzjlFIrdc0949pvWli1WWP7+u7UKnX56+J+jsHDOWtvatIGnnrLun6NGwSWX2H8cLr/cisMGDcLzfPPnw7PPcuK4cfYhOO00e76zzio8U01I4EX0neic6+KcW+KcW+qcuyOD5Yc45z5yzs11zi10zl2W3W1FREQkB37/3Saga9TIRhS86iobUXDVKnj+eRvS8oEHbPTAI4+089tmzbIuc5nw3nquvfWW9Yg88URrcGnd2uqv9evtO/Y331jvxQ8+gAEDIlT0zZkDrVpZ0XfTTXZeVbiKvlTnnQcffgiLFtHshhtsiP1w2boVzjgD5s2zA9WlS/j2XdhVrmyDwCxebP8M6NjRisGGDe2Yf/BBxqNqHkxy8r5um8cdB2+/zbrOnWHBAptrsWtXFX0SKBFr8XPOxQEvAqcDq4CZzrkPvfe/p1ntGuB37/05zrmqwBLn3NtAcja2FRERkYP57Td48EGb96xMGRvo5OabbdTAVNdea5f1620AlMmT7US8Rx+FunXh/POhZ0+2bIrj44+tpkq9bNliuyhd2uqu6
6+3U5pOPBFq1YrC60tJsS/xw4bZF/zPP7dWs0jp0gU+/ZSSZ55p5w5+/bWNKpkX27bZfufMsWN/1llhiSrpFCliJ5eeeqqdC/jaazB6tL2/a9a0/0oMGGATNmZl61Z4/XX7h8myZfZGf/hhGDCAP+bPp+bRR0fn9YjkUCT/DdEaWOq9X+a93wNMANKfVe2Bcs45B5QFNgNJ2dxWREREMvPzz9bi0KKFtXLcdZed1/fYY/sXfWlVq2bd3z7/nJ3L1/PH0NdZVvoYEp99Edq1o3XPPvx9znX88MB01v+bTM+e8Mor1ki1dat173z8cesxGpWib/VqK/JuvRXOPtuCRLLoS9WxI3OffNJG3mzf3rrP5tb27Xai46xZNurjOeeEL6dkrmZN+0z8/be12h13HNx/v/2j47zz4Isv7J8Kaf35p/1no1Yta1U+9FB4910r/u64w/7xIBJgkTzH7zBgZZr7q4AT0q3zAvAhsAYoB/T23qc457KzrYiIiKTlPXz3nc1B9/XX9kX0gQdsNMEKFTLdLDkZFi2CX37Z15I3f34lkpP7A/05pvZWrqz5CSevf4Or17zGdbtfgH+rgjsP6vSAxp2gaLFovUozaZIVqbt3W/V5xRVRHSVxW5MmMG2aFZodOlhx3aRJznYSH2+te7/8YgXEuedGJKtkoWhRG/21Wzcr4EaPhjFjrOX7iCNsoJgmTWwgpKlTbf3evW06hlatYp1eJEecz6Lvfp527Fwv4Azv/ZWh+5cArb3316VZpyfQDrgZOAL4EmgKnHGwbdPsYyAwEKB69eotJ0yYsHdZfHw8ZWM9d0tAcgQhQ1ByBCFDUHIEIYNyBC9DUHIEIUNQchw0g/dUnDmTum+9RYX589lTsSIr+vTh33POITk071xyMmzeXJwNG0qyfn2J0KUkf/1VhiVLyrFrl/0vuGzZRBo33s5RR22jcePtNG68jUqVEvfmKB8XR+UZM6j67bdU+vlniu7aRWK5cmxs146NHTqwuWVLfFiH6Nxf3M6d1H36aep89RXbGjdm0bBh7IpK8+L+Un8mpZcvp+ktt+CSk5n3+OPEN2yYre2L7NrFcUOHcsj8+fw+fDgbOnXKU45YCkKGcOZwe/ZQ9fvvqfnhh1SYNw+APRUrsuacc1jTrRt7smjZK2jHoiDkCEKGaOfo1KnTr977A/4zEcnCrw1wr/f+jND9oQDe+4fTrPMJ8Ij3/vvQ/W+AO4C4g22bkVatWvlZs2btvT99+nQ6BmDumyDkCEKGoOQIQoag5AhCBuUIXoag5AhChqDkyDRDSgp89BH+gQdws2axp3ptFnW7nR8bXc7ydaVYuZK9lzVrDhy/okwZOPpoG4zlhBPsukGDzMejOCBHQoJ1iZs0yQY92brVRnc55xw4+WRrgUxMtEtS0r7bWV0Ott6KFfj163FDh8I990CxKLc0ZnQsli6188a2bbM5K044SCelnTvtGE2fbiPj9O0bnhwxEoQMEcuxcKH9fM84I1tTQBToY5FPcwQhQ7RzOOcyLPwi2dVzJtDQOVcfWA30AS5Mt84K4FTge+dcdaARsAz4LxvbioiIFArx8XHMm7eviFv1TzKH/jiZM2c/yBE75vE3h/MQrzB23aUkvmKtbSVK2KlItWtbDVa79oGXChXy2DuyZMl93eT27LHhOydNsm5y48dnvl1cnBVsqZeiRfe/n9GlZEkrKmvVYs7JJ9P8+uvzEDzMGjSwLrannmrD+H/8sR30jOzaZRPJT5tmU2rkoeiTKDj6aLuIFAARK/y890nOuWuBz7EWvDHe+4XOucGh5aOAEcAbzrn5gANu995vBMho20hlFRERCZrERGtEGzUKvvqqPQBFSaQv73AnD9GYJSwv1ZgXTxzHynZ9aFq3KBPTFHVVq0b1lDebgb1LF7uMGmVNjBkVdEWL5nmI+63Tp4cnczjVrWuTFp52mh2DKVOslSithAQbOOTrr21UyIsvjklUESmcIjqBu/d+KjA13
WOj0txeA2Q4/FZG24qIiBR0//xjY5W89hqsXWtF3GUXLuHqst9yzMePUHLN3/imTWH4e9Q77zyuiYuLdeQDFS2a9ykO8qMaNaz7ZufO1gqadsCW3buhRw+bbuK116Bfv1gmFZFCKKKFn4iIiBxccrINGPjyy3YNNjvB4MHQpc7vJJ7akZIbNthJeKOexXXtGuXmPMm2qlWtG+eZZ9q8FuPGWcHXq9e+H/Lll8c6pYgUQir8REREYiR1DulXXrFz9w491OYhv/JK6zlISgp0GEjK7t02iMppp6ngyw8qVLCfV7ducNFFNsH8rFnw0ks2BYWISAyo8BMREYmilBSb8m3UKDuHLzkZTj8dnnnGBnrcb5DKsWPhxx/569ZbaXz66bGKLLlRrhx88om19n32GTz/PFx1VaxTiUghpsJPREQkCtavt/E8Ro+2eaKrVIFbboEBA2xQyANs3gy33gpt27K2SxcaRz2x5Fnp0lbdL1sGjRrFOo2IFHIq/ERERCLEexvlf9QomDzZRurs0AEeeADOP9+mXMjUnXfCli0wcqQVgZI/FSumok9EAkGFn4iISJht3my9NF9+GRYvtlO+rr7aTu9q0iQbO/jlF2savPFGOO44GylSREQkD1T4iYiIhIH38PPP1ro3caJN2Xbiida984ILrNdftiQn27lgNWrAvfdGMrKIiBQiKvxERETSSUy0VrvNm2H+/PJs22a3N23a93jq7bTXO3ZA2bLQvz8MGgTNmuXiyV96CX77zeaAK18+zK9MREQKKxV+IiJS4HlvhdnSpfDXXzbQSkbFW+rt7dvTbt1iv33FxUGlSnapXBlq1bLemJUrWzfO3r1tQMdc+fdfGD7chvns1Su3L1dEROQAKvxERKRA8B42bLDibulS+PPPfbeXLoX//tt//SJFoGJFK9gqVbKelUcfva+gS71euXIunTo13ftY+fIRnEpvyBDrI/rii5qvT0REwkqFn4iI5BveW2td2qIu7e1t2/atW6QI1KtnUyVceCE0bGi3jzjCirzy5W2dg5k+fQutWkXsJe3zzTcwfjzcfbeFFRERCSMVfiIiEjgJCTB//iEsW3ZggRcfv2+9uDgr7ho2hLZtrbBLLfDq1YPixWP1CnJozx4b9vPww+GOO2KdRkRECiAVfiIiEgjx8TB1Krz/PnzyCcTHNwegaFGoX9+KuQ4d7Dq1wKtb16ZJy/eefBKWLLEDUKpUrNOIiEgBpMJPRERiZssW+Ogjm9z8889h926oVs26ZtatO5/evY+lbl0r/gqs5cthxAib0f3MM2OdRkRECqiC/KdUREQCaN06mDLFWva++QaSkqB2bRg82Gqfdu2sC+f06Zs44ohYp42CG26wkw2feSbWSUREpABT4SciIhG3cqUVepMnww8/2CAtDRrALbdAjx7QqlW6QSw/+YTDx42DNm2gRImY5Y64Dz+0y2OPWfUrIiISISr8REQkIv780wq999+HmTPtsWOPhXvusZa9Y47JYMaCtWutBWziROqAzafw4otRTh4lO3fC9dfb5H833hjrNCIiUsCp8BMRkbDwHubP39eyt2CBPX788fDII1bsZTpLQUoKvPoq3HabDek5YgQr586l9ksvWd/PCy+M2uuImgcfhH/+gW+/LSAj1IiISJCp8BMRkVzz3lrzUlv2li61Vrz27eHZZ+Hcc6FOnYPsZNEiGDjQ+oB27AgvvwxHHsmyr76i9tq1MGAANGtmLWMFxeLF8PjjcOmlNlSpiIhIhKnwExGRbEtKgrlz4fvvrU774QcbrKVoUTjlFLj1VujeHapXz8bOEhLg4YftUq4cvP469Ou3t/+nL1oU3n0Xmje3EwFnzLD18jvvbc6+MmWs+BMREYkCFX4iIpKpHTvgl1/2FXn/93/7JlCvXx86d4bTToNzzoGKFXOw42+/hUGDbO66iy6Cp56yeRzSq1kTJkywJxkwAN55J4MTA/OZd96BadNg5MiMX7OIiEgEqPATE
ZG9Nm60Au+dd47g9tth9mxr5XMOjjvOGuTat7fT7mrVysUTbNli5/G9+qpVjp99BmeckfU2nTrBAw/AnXfCSSfBtdfm6rUFwtatcPPNduLjgAGxTiMiIoWICj8RkULKe/j7732ted9/b6eeARQrdhgnnmg12kkn2awKFSrk8cnefddG7Ny0yXZ8zz1QunT2tr/9dvjpJyuaWrWCE0/MQ5gYuusuWL8ePvnEJisUERGJEhV+IiJBkJgI27fj9uyJ2FMkJ9uom6lF3g8/wJo1tqxCBWvF69/fCr0dO76nc+eTw/PEy5fDVVdZ616rVvD55zZYS04UKQJjx0KLFnDBBdYUWaVKePJFy+zZNjXF1VdDy5axTiMiIoWMCj8Rkdzw3k6A274988u2bVkvT3tJSACgTYUKVhi1bh2WmLt3wwcfwLhxVuht22aP16oFJ59s3TZPOgmOPtpqq1TTp/u8P3lSkg3teffd1lf0mWesm2ZuW7oqVoRJk6BtWzsvcOrU/NNqlpJixW+VKtZtVUREJMpU+ImI5MSePTbQyA8/WPGXHWXL2miU5cvbdblyULfuvtuplzJlSH7iCRse84MP4PTTcx1z8WJ45RV4803rWVmvnk2Fd9JJVuwddIqFvPr1VzuH7bffoGtXa+kKx5O2bAnPP28DwzzwgHUXzQ9efdVGJR03Lo99ZkVERHInooWfc64L8CwQB7zqvX8k3fJbgYvSZDkKqOq93+ycWw5sB5KBJO99q0hmFRHJltdft36S11yTcfGWQTG3X1PaQfxWty5t778fzj7bujb26ZPtbXftsvn0Ro+2iEWL2jx6AwfCqafmKEbuxcdbC9+zz9qIle+9Z1MxhHMkzgEDrPC+7z471+9gg8PE2oYNcMcd1sR60UUHX19ERCQCIlb4OefigBeB04FVwEzn3Ife+99T1/HePw48Hlr/HOAm7/3mNLvp5L3fGKmMIiI5sns3PPigjXTy/PMRmVZgT+XKNtVBt27WRLdx40FHsVywwFr3xo2zQTMbNIBHH7UROLM1n164fPKJnb+2YgUMHmzz80Widcs5GDXKWhMvusiua9cO//OEy+23W3fel17K/1NRiIhIvhXJ//+2BpZ675d57/cAE4DuWazfF3gngnlERPLmtddg5Uq4//7IfoFPPc/vnHPguuusBS1dt9KdO+GNN+x0t2OPtTrojDPgm29sarzbboti0bd2LfTubV06y5Sx5saRIyPbpbF0aWve3LMHevWy6yD64QdrJb7lFmjSJNZpRESkEItk4XcYsDLN/VWhxw7gnCsNdAEmp3nYA1845351zg2MWEoRkexISICHHrKT5E49NfLPV6qUFTaXXw4jRtjAIMnJzJ1rDYA1a8Jll8HmzfDEE7Bqlc0L3qlTlLp0ghWjr74KRx0FU6ZYQfzbb3aMouHII2HMGJthfsiQ6DxnTiQm2s+tTh2bxkFERCSGnM/u4AQ53bFzvYAzvPdXhu5fArT23l+Xwbq9gYu99+ekeaym936Nc64a8CVwnff+uwy2HQgMBKhevXrLCRMm7F0WHx9P2bJlw/zKci4IOYKQISg5gpAhKDmCkCG/5Djs/fdp+PzzzHnySf5r0SJ6Gbyn9sjXOOK9t/m8XDe6b3+XlGLFOfnkDXTtuobjjtsakcbHg/1MSqxdS6MnnqDSr7/yX9OmLLn5ZnaFecSY7L4vjnjxRWpPmsTCu+5iwymnhDVDTnKkV2viRBqMHMn8ESPYlMdiOD98RgpbjiBkCEqOIGQISo4gZFCO4GWIdo5OnTr9muH4KN77iFyANsDnae4PBYZmsu4HwIVZ7OteYMjBnrNly5Y+rWnTpvkgCEKOIGTwPhg5gpDB+2DkCEIG7/NBjp07va9Rw/uTT/Y+JSVqGX791ftBg7wvW9b7G3nKe/ArGnbym/7eGtEM6XPsJyXF+1GjLFSZMt6/9JL3ycnRzZDenj3et21rmRYtil2OtFautDxdu4blPRP4z0iUB
SFHEDJ4H4wcQcjgfTByBCGD98oRtAzeRzcHMMtnUCtFskPQTKChc66+c6440Af4MP1KzrlDgJOB/6V5rIxzrlzqbaAzsCCCWUVEMjd6NPz7r40iGeHBOXbsiOPll23WgpYtbTqGHj2g14834ceOo/bf31Pp/I6wbl1Ec2To779tKovBg22ewQULrCtj1PqWZqJYMZg40brH9uhhI4vG2k032TyGzz2nAV1ERCQQIjaqp/c+yTl3LfA5Np3DGO/9Qufc4NDyUaFVzwO+8N7vSLN5deADZ38siwLjvfefRSqriEimdu600SlPOcWG4w+j3bth0SKYP3/fZfr0tiQk2IAtzz9vg1ZWrBjaoO3FUKWyFTft2sEXX8Dhh4c1U4ZSUmz0mNtusyLm5ZdtSoUgFTSHHQbjx0PnzlaYjhsXu3yffWYTzT/wANSvH5sMIiIi6UR0Hj/v/VRgarrHRqW7/wbwRrrHlgFNI5lNRCRbRo2y1rVJk3K9i5QUWL58/wJv/nz44w9ITrZ1ihe3MVJOO20dw4fXpHXrTOqWM8+Er7+2ef7atbMio2kEf10uWwZXXAHTp9uE8q+8YvMXBtFpp9kAM3fdZcfmqquinyEhwUbfOfLIYA44IyIihVZECz8RkXxtxw545BErKLI5OMeGDQcWeAsX2q5S1a9vLXrnn2/Xxx4LDRtaj8Xp0//ghBNqZv0kbdrYNAFnnAEdOsBHH9l1OKWkwAsv2Bx0cXFW8F1xRbBa+TJy553w009w443QqhUcf3x0n//RR+Gvv+DLL6FEieg+t4iISBZU+ImIZOall6ySu+++Axbt3Am//35gkZf21LsqVayou/zyfQXe0UdDuXJhyNakCfz4oxV/nTvDu+9C96ymSs2Bv/6i2U03wbx5tv/Ro21KgvygSBHr5tmiBfTsCbNnQ+XK0XnupUutW3CfPvbPAhERkQBR4ScikpH4eHjsMSt82rYFrLvm+PFWY82fv29O9ZIlraA788x9Bd6xx9oE6hFtIKtTxyZLP/tsaz585RWrMnMrtZVv6FDKOmcT1l92WfBb+dKrXNm65p50ElxyCXz8ceQGoElMhG++scFl3n/f+uw++WRknktERCQPVPiJiGTkhRdg40a23nwf40fC229bAxtYPXH33fsKvCOOsN6QMVGlip3z17OndcVcv966Z+a0WPvzT9v+++/hzDOZedlltOnVKzKZo+H44+GZZ+Dqq+Ghh2D48PDtOykJvv3W/gPw/vuwaZM14557rp3fV/MgXXVFRERiQIWfiEg6O9duI+7Bx5lX7Szann0CSUnWs/Khh6BvX6hXL9YJ0ylbFj78EPr3h6FDrfh74onstXIlJ9uUA8OGWWvV669Dv37s/vbbiMeOuMGD7VzIu++GE0/MW/fL5GQriidOhMmT7RiXKQPdukHv3tYyXLJk+LKLiIiEmQo/ERGsEWfGjIqMGQNHvPs89+zZzL2l7uXGG21KhaZNA97jsXhxeOstqFoVnn7azk0cM8ZGjMnMH39YV86ffrLuoi+/bNMiFBTO2fmJc+bAhRfCb7/l7PWlpMBPP9Hguees4l+71uYK7NrVir2zzrL7IiIi+YAKPxEptLyHmTOtG+e778K6dU2pXX4rI3mSjW3O4cPvj49dF87cKFLEujdWr24teJs2wXvvWctUWsnJtt7w4dZKNXYsXHxxwCvbXCpTxlroWrWCCy6waSmyKoa9h19+sTfEe+/B6tXUKF7cir0LLrDr9MdTREQkH1DhJyKFzp9/WrE3frzdTv1e37TpAoYmvU+xEVso88K9kJ+KvlTO2ZQGVataV8fTTrPBTVJHtly82Fr5fv4ZzjnHWvlq1Iht5khr3NgGqunTxyahf/rp/Zd7D7NmWTfOiRNhxQp7U3TpAo89xk8VKtD+rLNik11ERCRMVPiJSKGwbp014rz1lrXyOQcdO9o4KD16QIUK8MPHyyl28VM2SEeLFjFOnEcDBtjAL337Qvv2MHWqt
WDddReULm0H4sILC2YrX0Z697bReZ55xiZ379HDuoCmFnvLlkHRojY1xogRdu5ehQoAJE+fHsPgIiIi4aHCT0QKrO3bYcoUa9376ivr4di0qc3S0Lcv1Kq1//q13nsPtm6Fe++NRdzwO+88+PxzK2IaNrQTGc89F0aOhEMPjXW66HviCZgxw1o877zTmnvj4qxVdNgwOzaVKsU6pYiISESo8BORAsF7WLnSvtf/8otdz5gBCQlQt6718LvoIptvL0ObN1Nr8mRrCWraNKrZI+rkk23qgSFDbLqGPn0KTytfesWLW6vnqafaHIi33mrFcZUqsU4mIiIScSr8RCRf+u8/67KZWuD98ot15wT7ft+8OQwaZNPbtW2bjZkNnnqKojt2wD33RDp69DVrZk2eArVr22imIiIihYwKPxEJvN27Yd68/VvylizZt7xRI5tGrXVruzRtasVftm3aBM8+y/qTT6basceGPb+IiIhIrKnwE5GDS0mxwUGKF7cqq3bt7E0Ongve26lXaVvy5syBPXtsefXqcMIJcMkldt2q1d4xOHLvySdhxw6W9+tHtTzuSkRERCSIVPiJSNY2bIB+/eDTT/c9VqqUDRbSuLEVgmkv5crlaPfbtsH//V9lvv7aCr2ZM2HLFltWurQVdjfcsK81r3btMJ+itmEDPPcc9O7Nzvr1w7hjERERkeBQ4Scimfv+exv+cuNGeP55OO4462O5eLFdz54NkyZZi2CqGjUyLgjr1iXtbOjx8fDsszbQ4n//HUuRInDMMXZOXmqR16SJjbAfUU88Abt22bl9a9dG+MlEREREYkOFn4gcKCUFHnkE7r4bDj/cJvtu1syWdeiw/7q7d8Nff1khmPby7rv7mu4ASpSAhg1JbtCIWdsbMfaXRsyKb0SXLo1o3Xk5Awc2o0yZqL1Cs349vPCCFbeNG6vwExERkQJLhZ+I7G/9ejuB7osvrCB6+eWsu2+WKGFNc02a7P+499ZSGCoEkxcuZuVXS0j+eD4tk6ZwAsm23mfw3+pjKdPjYyhTJ3KvKyOPPWbzPdx9d3SfV0RERCTKVPiJyD7Tp8OFF1pL3ejRcOWVuT+hzjmoWpXkSlV566+TuG8K/P23Ta3w0H2JnFx7mXUZnT+fsg8/DC1awPjx0LlzOF9R5tauhZdegosvhiOPjM5zioiIiMRIZIblE5H8JTkZ7r/fJrYuX96G0hwwIE+jqKSk2FzZxxwD/ftDxYo2MOgPP8DJpxWz8/66d4fhw/l11Cg7N7BLF8uR9pzBSHn0URsq9K67Iv9cIiIiIjGmwk+ksFu71ibBu+cea+2bNcsGcckl7+Hjj6FlS7jgApv1YdIk2+2ZZ2ZcS+6qXdvOI7z4YsvRtavNrRcpa9bAqFFw6aXQoEHknkdEREQkIFT4iRRiFX791QZt+ekneO01GDsWypbN9f6+/tq6cp5zjk3TMG6cTbzeo0c2Gg/LlIE337SC7OuvrXKcNSvXWbL0yCOQlATDh0dm/yIiIiIBo8JPpDBKToZ77qHprbdCpUo2ed7ll+e6a+f//R+ccgqcdhqsWmXjwSxebA14aWZwODjnYNAg6w/qPbRrZzvzPle5MrRqlZ2/2K+fjVgqIiIiUgio8BMpbNassQrt/vtZe8YZVvQdfXSudvXbb9Yrs21bWLgQnnkG/vwTBg6EYsXykPH4422OwFNOgcGD7STBnTvzsMM0Hn7YCl+19omIiEghosJPpDD54gvr2jljBrzxBktuv53cTJ63aBH06mUDcf74Izz0kE3ld8MNULJkmLJWrgyffAL33Wd9Rk880arKvFixAl591Vo369ULS0wRERGR/ECFn0hhkJQEw4bZqJnVqlkrX79+Od7NsmW22THHwGef2YCYf/8NQ4fm6dTAzBUpYnPsffoprF4NrVrBBx/kfn8PPWTdRocNC19GERERkXxAhZ9IQbdqlXWZfOgha+maMePAydYz4b0Vdu+8Y5s2agQTJ8LNN1sRe
P/9UKFCZOMDNurob79B48Zw/vlw661WzObE8uUwZozNTVgnyhPFi4iIiMRYRCdwd851AZ4F4oBXvfePpFt+K3BRmixHAVW995sPtq2IZMOnn8Ill0BCArz1Flx0UZarb9tmjYG//GKzK/z8M2zYYMtKl7Zz94YNg5o1o5A9vTp14LvvrOp84gkrYCdMsPn/suPBB23wmDvvjGxOERERkQCKWOHnnIsDXgROB1YBM51zH3rvf09dx3v/OPB4aP1zgJtCRd9BtxWRLCQm2uAljz1mc/JNnGjNdWkkJ8OyZWX48899hd7vv+8bQLNxYzjrLDu17oQTrHtnngZsCYcSJeDFF200mYED7STDd9+FDh2y3u7vv+GNN2ygmFq1ohJVREREJEgi2eLXGljqvV8G4JybAHQHMive+gLv5HJbEUm1YgX07Wtz8w0aBE8/DaVKsXatFXipRd7MmRAffzxgMzqccIJNuH7iiTaoZsWKMX4dWbnoImja1CYIPOUUm5fvllsyn47igQdsXomhQ6ObU0RERCQgIln4HQasTHN/FXBCRis650oDXYBrc7qtiISkpFh3zptuwu/Zw5/3vcMn5frwy2VW6P3zj61WtKgN7NmvHxxyyCL69z+KBg1yPYVf7BxzzL75B2+91Qrd11+HQw7Zf72lS21i+GuvjVEfVREREZHYcz6cEyOn3bFzvYAzvPdXhu5fArT23l+Xwbq9gYu99+fkYtuBwECA6tWrt5wwYcLeZfHx8ZSNyFCDOROEHEHIEJQcQcgQ7hwVZ82izgujqfjPn8wt2ZI+iW+zONm6dlavnsBRR23jqKO20aTJNho2jKdEiZSwZ8iLPOXwnlqTJnHEqFHsqlGDhffdx44jjti7uPEjj1B12jR+GT+ePZUrRy5HmAQhQ1ByBCFDUHIEIYNyBC9DUHIEIUNQcgQhg3IEL0O0c3Tq1OlX732rAxZ47yNyAdoAn6e5PxQYmsm6HwAX5mbbtJeWLVv6tKZNm+aDIAg5gpDB+2DkCEIG78OTY/uPc/3KY87wHvwy6vnevONPapvsb7/d+w8+8H7NmshnCIew5PjuO+9r1PC+VCnvx461x5Ys8b5IEe9vvjl6OfIoCBm8D0aOIGTwPhg5gpDBe+UIWgbvg5EjCBm8D0aOIGTwXjmClsH76OYAZvkMaqVIdvWcCTR0ztUHVgN9gAvTr+ScOwQ4Gbg4p9uKFEbJyfD9O6tw99xF+2VvsocKPFzlSdy11/Bo/xLUrRvrhDHSvj3Mnm3nN156qc0s/99/NiDMbbfFOp2IiIhITEWs8PPeJznnrgU+x6ZkGOO9X+icGxxaPiq06nnAF977HQfbNlJZRfKDBQtg4itbqfLaowzY8TQOz9fNhlDh0aHccXrF/HeOXiQceih8+aWNaProo/bYkCFQvXpsc4mIiIjEWETn8fPeTwWmpntsVLr7bwBvZGdbkcJmwwYYPx7Gv7GH1nNe5m7upyobWdH+Iqq/+iCnH1lYm/eyULSojfLZpo1N2K7WPhEREZHsFX7OuTLALu99inPuSKAx8Kn3PjGi6UQKod274aOPYOxY+HSqp3vyZN4rMZQ6LGVP+1Pgmcep06JFrGMGX/fudhERERERimRzve+Aks65w4CvgcvIoJVORHLHe5ty4aqroEYN6NUL3E8/srR6WybRizoNS8LUqRT/9iubtFxEREREJAey29XTee93OueuAJ733j/mnPstksFECoN//oFx46x1788/oVQpuOa0JQzZPJTqP35g88699ppNuhcXF+u4IiIiIpJPZbvwc861AS4CrsjhtiKSRmIivPsuPPlkU+bMscc6doT7rl5HjwX3UfyN0VC6NDz4INx4o90WEREREcmD7BZvN2Jz6X0QGpnzcGBaxFKJFEC7dlnj3eOPw4oVUKtWCUaMgEt77KDOpKfgrscgIQEGD4a774Zq1WIdWUREREQKiGwVft77b4FvAZxzRYCN3vvrIxlMpKDYuhVee
gmeftpG6WzXzu6XLvF/dFr+N5x6N/z7L/ToAQ89BEceGevIIiIiIlLAZGtwF+fceOdc+dDonr8DS5xzt0Y2mkj+tm4dDB0KderAnXdCy5bw3Xfwww9wdqlvaD3gShgwAOrXt8nGJ01S0SciIiIiEZHdUT2beO+3Aedic+vVAS6JVCiR/Gz5crj2WqhXz+YQP+MMmD0bPv0U2rcHfvkFzjwTl5gIkydbJdi2bYxTi4iIiEhBlt3Cr5hzrhhW+P0vNH+fj1gqkXzo99/h0kuhQQMYPRouuggWL4aJE6F589BK69ZZl87DDmP2Sy/B+eeDczHNLSIiIiIFX3YHd3kZWA7MBb5zztUFtkUqlEh+MmMGPPwwTJliA3Bedx3ccgvUqpVuxcRE6N0bNm+Gn34i6b//YpBWRERERAqjbLX4ee+f894f5r0/y5t/gE4RziYSWN7D11/DaafBCSfA9Ok2EOc//9ggLgcUfQC33QbffguvvALNmkU5sYiIiIgUZtlq8XPOHQLcA3QIPfQtcD+wNUK5RAIpJQU+/NBa+GbMgBo1bHqGQYOgXLksNhw/Hp55Bm64wfqAioiIiIhEUXbP8RsDbAcuCF22Aa9HKpRI0CQmwtixcOyxcN55sHEjjBoFy5bBkCEHKfrmzoUrr4QOHaxKFBERERGJsuye43eE975Hmvv3OefmRCCPSKDs2gVjxli99s8/VviNHw+9ekHR7Hx6Nm+2SrFiRRvlpVixiGcWEREREUkvu4XfLufcSd77HwCcc+2AXZGLJRJ7U6fC5ZfbQJxt28ILL8DZZ+dgEM7kZOvWuWqVTeBXvXpE84qIiIiIZCa7hd9gYGzoXD+ALUC/yEQSia2UFLj/frscd5w11LVvn4tZF+65Bz77DF5+GU48MSJZRURERESyI1uFn/d+LtDUOVc+dH+bc+5GYF4Es4lE3ZYtcPHF1tp3ySV2Hl/p0rnY0ZQp8OCDcMUVMGBAuGOKiIiIiORIdgd3Aazg896nzt93cwTyiMTMnDnQqhV8+SW89BK8+WYui77Fi20m9+OPt/6hmqBdRERERGIsR4VfOvo2K8GQkmKjsOTBuHHQpg0kJNhUe1ddlct6bft2G8ylZEmYPNmuRURERERiLC+Fnw9bCpHcSkiAzp3hsMOs0MqhPXvgmmusge7EE2H2bCsAc8V76N8f/vzTTgysXTuXOxIRERERCa8sCz/n3Hbn3LYMLtuBmlHKKJKxxES44AL4+msbMbNnTzufbseObG2+ejWcfLJ167zlFuvimaeBNx95BN5/3+Z+6NgxDzsSEREREQmvLAs/73057335DC7lvPfZHRFUJPxSUqx17aOP4MUXYd48GDoUXnsNWrSAX3/NcvNvv7XV5s+3xrknnsjmvHyZ+eILGDYM+vSBG2/Mw45ERERERMIvL109RWLDe7j2WptJ/aGH4OqrbWL0hx6Cb76BnTutv+Zjj1mBmG7TiRNrceqpNqf6jBk2GXue/P23FXzHHguvvqrBXEREREQkcFT4Sf4zbBiMHAm33QZ33LH/so4dYe5c6N4dbr8dTjvNJlAH4uOtPhs5sgHdu1vR16RJHrPs3Annn28V5fvvQ5kyedyhiIiIiEj4qfCT/OXRR+Hhh2HQIDunLqPWtUqVrP/ma69ZdXfccax+/n1at4ZJk2DgwL+YNAnKl89jFu8tx9y51vp4xBF53KGIiIiISGSo8JP8Y9Qoa+Hr29fO68uqS6VzcPnl8NtvbKl0BIdd34Ohywbw1f920LfvyvD0xnz+eXjrLbj/fjjzzDDsUEREREQkMlT4Sf4wfrydy3f22TazelzcQTdJSoKhYxpS/a8feePQO7h4z2t0uqUFZZcsyXue776zoUC7dYM778z7/kREREREIiiihZ9zrotzbolzbqlz7o5M1unonJvjnFvonPs2zePLnXPzQ8tmRTKnBNxHH9lEex06wHvv2UAuB7FhA3TpYr1BLx9UnL7LH8Z9/TXs2EGLa
6/NcOCXbFu92kaEOfxwGDsWiuj/JyIiIiISbBH7xuqciwNeBM4EmgB9nXNN0q1TAXgJ6Oa9PxpIP75iJ+99M+99q0jllICbNs2KrBYt4MMPoVSpg24yYwa0bAk//ABjxlgP0RIlgE6dYN48NrVtawO/nH66FXE5sXu3zRe4cyd88AEcckjuXpeIiIiISBRFsqmiNbDUe7/Me78HmAB0T7fOhcD73vsVAN779RHMI/nNjBnWlfKII+DTTw86Gov3MHo0tG9vjXA//QSXXZZupUqVWHjvvTbtws8/w3HHWQGXXTfcYNu98UYYhgQVEREREYmOSBZ+hwEr09xfFXosrSOBis656c65X51zl6ZZ5oEvQo8PjGBOCaIFC2zAlKpV4csvoXLlLFdPSIArr7RBNjt1svnbW7TIZGXn4Ior4LffoH59m45h4EDYsSPrTK+9Bi+/bAPM9OiRu9clIiIiIhIDznsfmR071ws4w3t/Zej+JUBr7/11adZ5AWgFnAqUAv4PONt7/4dzrqb3fo1zrhrwJXCd9/67DJ5nIDAQoHr16i0nTJiwd1l8fDxly5aNyOvLiSDkCEKG7OYouXo1zW+4AZzjt2efJaFmzSzXX7u2BPfccwx//FGOSy5ZTr9+y7Mc+yVtBpeYSP3XX6f2hAnsqlWL34cNI75RowO2KbdoEc1vuIH/mjZl3iOPZGtwmYMJws8kCBmUI3gZgpIjCBmCkiMIGZQjeBmCkiMIGYKSIwgZlCN4GaKdo1OnTr9meKqc9z4iF6AN8Hma+0OBoenWuQO4N83914BeGezrXmDIwZ6zZcuWPq1p06b5IAhCjiBk8D4bOVat8r5ePe8rV/Z+4cKD7m/RIu8PO8z78uW9/9//8pDhm29sR8WKef/oo94nJ+9btm6d97VqWa6NG7P3JLnNEWVByOC9cgQtg/fByBGEDN4HI0cQMnivHEHL4H0wcgQhg/fByBGEDN4rR9AyeB/dHMAsn0GtFMmunjOBhs65+s654kAf4MN06/wPaO+cK+qcKw2cACxyzpVxzpUDcM6VAToDCyKYVYJg40YbcGXTJvjss4OeQzdvHpx8MiQm2kAu3brl4bk7dbKJ2M85xwZ+6dzZBn5JSoLevS3b++8ftMupiIiIiEgQFY3Ujr33Sc65a4HPgThgjPd+oXNucGj5KO/9IufcZ8A8IAV41Xu/wDl3OPCBs1m2iwLjvfefRSqrBMC2bTb/wt9/W9HXKuuBXGfOhDPOgNKl4euvIYPemTlXuTJMmmTn8t1wgw380r49TJ9u0zY0bx6GJxERERERib6IFX4A3vupwNR0j41Kd/9x4PF0jy0DmkYymwTIrl3W0jZ3LkyZYs14WfjhBzjrLKhSxYq++vXDmMU5GyWmfXu46CL43//guuvgkkvC+CQiIiIiItEV0cJP5KD27LF58b7/HsaPh7PPznL1r76yLp116tjtWrUilKtRI5sPYto0OOWUCD2JiIiIiEh0RPIcP5GsJSfDpZfC1Kk2y3qfPlmu/tFH0LUrNGgA334bwaIvVfHi1p+0WLEIP5GIiIiISGSp8JPY8B6uugrefRcee8zm0cvCxIk23d5xx9kpd9WrRyemiIiIiEhBoMJPos97uO02eOUVGDYMbr01y9XffBP69oUTT7TunZUqRSmniIiIiEgBocJPou/hh+GJJ+Caa2DEiCxXHTkS+ve30+w++wzKl49ORBERERGRgkSFn0TVYR98YK18F18Mzz1no2hm4skn4eqrbcDPjz6CMmWiGFREREREpABR4SfRM24cDZ97Drp3h9dfhyIZv/28h/vvhyFD4IILYPJkKFkyyllFRERERAoQFX4SHT/9BJdfzpbmzWHCBCia8Uwi3sMdd8A990C/fjbDgwbVFBERERHJGxV+EnkbN0Lv3lCnDgvvvz/T5ruUFJsr/bHHbMDPMWMgLi7KWUVERERECiBN4C6RlZICl1wC69fD//0fSdu2ZbhacjIMGGA9Q
IcMseIvi9P/REREREQkB9TiJ5H16KM2HOczz0CLFhmukpgIF11kRd8996joExEREREJN7X4SeR8+y0MHw59+sDgwRmukpBgvUA//NAKvoNM6SciIiIiIrmgwk8iY906K/gaNIDRozNswtu5E849F778El54wab1ExERERGR8FPhJ+GXnAwXXgj//QdffAHlyh2wyvbt0LUr/PCDDeJy2WXRjykiIiIiUlio8JPwu/9++OYbq+iOPfaAxVu2QJcuMHu2TdfQu3cMMoqIiIiIFCIq/CS8vvwSRoywSfgyaMbbsqUYnTrBokU2MXu3bjHIKCIiIiJSyKjwk/BZvdqG52zSBF58McPFN97YjA0b4KOPoHPnGGQUERERESmENJ2DhEdSkg3msnMnvPcelCmzd5H3Nmpn27awYUMJPvtMRZ+IiIiISDSp8JPwGD7cRmoZPRqOOmrvw4sXw5lnQvfuULYsPP30XDp0iGFOEREREZFCSIWf5N0nn9hE7YMG2WiewNatcMstNrbLzz/b/O1z5kCjRttjGlVEREREpDDSOX6SN//8A5dcAs2awTPPkJICb74Jd9wBGzbAlVfCgw9C1aqxDioiIiIiUnip8JPc27PH5mJISoL33uOXuSW5/nqYMQPatIGpU6Fly1iHFBERERERdfWU3Lv9dvjlF7Y8NYbLHmzAiSfCypUwbhz8+KOKPhERERGRoFCLn+TO++/DM88w+6Tr6XhzTxISrA4cNgzKlYt1OBERERERSUuFn+TcX3+ReOnl/F6yNSf+8Didz4ann4aGDWMdTEREREREMqLCT3Lkr4UJ+HYXUHmH48Z67/LBC8U5++xYpxIRERERkazoHD/Jlvh4uPNO+Oq4m2mwdTbT+r3J50vqqegTEREREckHIlr4Oee6OOeWOOeWOufuyGSdjs65Oc65hc65b3OyrUSe9zB+PDRqBH8//A6DUkYSf9WtnP9GN4oXj3U6ERERERHJjogVfs65OOBF4EygCdDXOdck3ToVgJeAbt77o4Fe2d1WIu+336B9e7joIjix4hLeKj0Q2rWj7LMPxjqaiIiIiIjkQCRb/FoDS733y7z3e4AJQPd061wIvO+9XwHgvV+fg20lQjZsgEGDbDqGP/6A11/cyaQivYgrXRImTIBixWIdUUREREREcsB57yOzY+d6Al2891eG7l8CnOC9vzbNOs8AxYCjgXLAs977sdnZNs0+BgIDAapXr95ywoQJe5fFx8dTtmzZiLy+nAhCjuxmmDPnEO666xh27izK+eevol+/f2j50kMc+tlnzHvkEba0bh2VHJEUhAxByRGEDMoRvAxByRGEDEHJEYQMyhG8DEHJEYQMQckRhAzKEbwM0c7RqVOnX733rQ5Y4L2PyAXrtvlqmvuXAM+nW+cF4GegDFAF+BM4MjvbZnRp2bKlT2vatGk+CIKQIzsZVqzwvkoV7xs18n7hwtCDr7/uPXg/fHjUckRaEDJ4H4wcQcjgvXIELYP3wcgRhAzeByNHEDJ4rxxBy+B9MHIEIYP3wcgRhAzeK0fQMngf3RzALJ9BrRTJ6RxWAbXT3K8FrMlgnY3e+x3ADufcd0DTbG4rYZSQAD16wO7dMGUKNG4MLFgAV18NnTrBvffGOKGIiIiIiORWJM/xmwk0dM7Vd84VB/oAH6Zb539Ae+dcUedcaeAEYFE2t5Uwuv56mDkT3nwzVPTFx0OvXlC+vA3rGRcX64giIiIiIpJLEWvx894nOeeuBT4H4oAx3vuFzrnBoeWjvPeLnHOfAfOAFKx75wKAjLaNVNbC7pVX7DJ0KJx3HjaHw6BBNrLLV1/BoYfGOqKIiIiIiORBJLt64r2fCkxN99iodPcfBx7PzrYSfjNmwLXXwumnw4gRoQdfecVa+UaMsG6eIiIiIiKSr0V0AncJtvXr7by+GjXgnXdCvTkXLLB+n2ecAXfeGeuIIiIiIiISBhFt8ZPgSkqC3r1h40b48UeoXDm04J57oGRJG
DcOiuj/AiIiIiIiBYG+2RdSd9wB06fDqFHQokXowd9/h/ffh+uug6pVYxlPRERERETCSIVfITRxIjz5JFxzDfTrl2bBww9D6dJwww0xyyYiIiIiIuGnwq+QWbAALr8c2raFp55Ks2DZMjvRb/BgqFIlZvlERERERCT8VPgVIv/9B+efD+XKwXvvQfHiaRY+9piN7nLLLbGKJyIiIiIiEaLBXQqJlBS49FL4+2/45huoWTPNwtWr4fXX4bLL0i0QEREREZGCQIVfIfHWW3X56CN47jlo3z7dwqeeguRkuP32mGQTEREREZHIUlfPQuDTT+GNN+px8cU2Wft+Nm60oT0vvBDq149JPhERERERiSwVfgXcX39ZTXf44Tt4+WVwLt0Kzz4LO3fa/A4iIiIiIlIgqfArwHbutMFcnIP7719A6dLpVti2DZ5/3lZq0iQmGUVEREREJPJ0jl8B5T0MGADz58PUqVCyZMKBK730EmzdCnfeGf2AIiIiIiISNWrxK6Cefx7Gj4f774cuXTJYYedOG9TljDOgZcuo5xMRERERkehR4VcAffedTcfXrVsWjXmvvQYbNsCwYVHNJiIiIiIi0afCr4BZvRouuMAG6Bw7Fopk9BPes8cmbG/fPoO5HUREREREpKDROX4FyJ490LMnxMfD11/DIYdksuK4cbBqFbzySlTziYiIiIhIbKjwK0Buugl+/hkmToSjj85kpeRkeOQRO6/vjDOimk9ERERERGJDhV8B8cYbNkjnrbdCr15ZrPjee7B0KUyenMGkfiIiIiIiUhDpHL8CYPZsGDwYTjkFHnooixVTUmyFo46Cc8+NVjwREREREYkxtfjlcxs32vzr1arBhAlQNKuf6Mcf28R+mY76IiIiIiIiBZEKv3wsORn69oV//4UffoCqVbNY2Xt48EEb7rNv36hlFBERERGR2FPhl48NHw5ffQWvvgrHH5/1uhVmz4YZM2DUqIM0C4qIiIiISEGj/n751Pvv2+CcAwfCFVccfP26b78NNWpAv36RDyciIiIiIoGipp98aOFCq99at4bnnsvGBv/3f1T87Td48kkoWTLi+UREREREJFjU4pfPbN4M3btD2bLW6leiRDY2eughEsuXt+ZBEREREREpdFT45SNJSTYuy4oVNg3fYYdlY6O5c+Hjj1nVo4dViyIiIiIiUuhEtPBzznVxzi1xzi11zt2RwfKOzrmtzrk5ocvdaZYtd87NDz0+K5I584uhQ+GLL2yi9rZts7nRww9DuXKsPu+8iGYTEREREZHgitg5fs65OOBF4HRgFTDTOfeh9/73dKt+773vmsluOnnvN0YqY37y9tvwxBNw9dVw5ZXZ3OiPP2DiRLj9dpLKlYtoPhERERERCa5Itvi1BpZ675d57/cAE4DuEXy+AuvXX63Y69ABnnkmBxs+8oidBHjjjRFKJiIiIiIi+UEkC7/DgJVp7q8KPZZeG+fcXOfcp865o9M87oEvnHO/OucK7agk69fDeedBtWrw3ntQrFg2N1yxAsaNgwEDoHr1iGYUEREREZFgc977yOzYuV7AGd77K0P3LwFae++vS7NOeSDFex/vnDsLeNZ73zC0rKb3fo1zrhrwJXCd9/67DJ5nIDAQoHr16i0nTJiwd1l8fDxlAzCgSW5zJCY6hgxpypIl5Xj++d9o2DA+29s2eO45an70Eb+8/Ta7q1XL98eioGUISo4gZFCO4GUISo4gZAhKjiBkUI7gZQhKjiBkCEqOIGRQjuBliHaOTp06/eq9b3XAAu99RC5AG+DzNPeHAkMPss1yoEoGj98LDDnYc7Zs2dKnNW3aNB8Euc1x1VXeg/fjx+dww7VrvS9Z0vsrrshzhnALQo4gZPA+GDmCkMF75QhaBu+DkSMIGbwPRo4gZPBeOYKWwftg5AhCBu+DkSMIGbxXjqBl8D66OYBZPoNaKZJdPWcCDZ1z9Z1zxYE+wIdpV3DOHeqcc6HbrbGup5ucc2Wcc+VCj
5cBOgMLIpg1cF55BUaOhNtusykccuTpp2HPHrj99ohkExERERGR/CVio3p675Occ9cCnwNxwBjv/ULn3ODQ8lFAT+Aq51wSsAvo4733zrnqwAehmrAoMN57/1mksgbNjz/CNddAly7w0EM53HjLFpvv4YILoGHDiOQTEREREZH8JWKFH4D3fiowNd1jo9LcfgF4IYPtlgFNI5ktqFatgh49oG5dGD8e4uJyuIPnn4ft223SPxERERERESJc+EnO7NplI3ju2AHffAMVK+ZwB/Hx8OyzcM45cNxxEckoIiIiIiL5jwq/gPAeBg2CWbNgyhRo0iQXO3n5Zdi8GYYNC3c8ERERERHJxyI5uIvkwDPP2LR7990H3XMzzX1CAjzxBJx6KpxwQrjjiYiIiIhIPqYWvwD46isYMsS6eQ4fnsudvP46rF0Lb78d1mwiIiIiIpL/qcUvxpYtg9694aij4M03oUhufiKJifDYY3DiidCpU9gzioiIiIhI/qYWvxiKj7dund7D//4H5crlckfvvAPLl9uInjYFhoiIiIiIyF4q/GLEe+jfH37/HT77DI44Ipc7SkmBhx+2UTzPPjucEUVEREREpIBQ4RcjDz4IkyfDk0/C6afnYUcffACLF8OECWrtExERERGRDOkcvxj46CO46y64+GK46aY87Mh7qyCPPBJ69gxbPhERERERKVjU4hdlixbBRRdBq1YwenQeG+k++wx++w3GjIG4uLBlFBERERGRgkUtflH03382mEupUvD++3adJw89BLVrWyUpIiIiIiKSCbX4RUlyMlx4oQ2++c03Vq/lyXffwQ8/2EiexYuHI6KIiIiIiBRQavGLkuHD4dNPrU476aQ87uzzz+GKK6BaNbsWERERERHJggq/KPjmm6o88ggMGmSXXFu0CM46C7p0sYFdJkwIQ39REREREREp6FT4RdicOfDYY4056SR47rlc7mTTJrjuOjj2WPjpJ3jiCVi4EDp1CmdUEREREREpoHSOXwRt2ADnngvlyycyaVJczk/F27MHXnwR7r8ftm2DwYPh3nuhatUIpBURERERkYJKhV8E7doFNWpA//4LqV69ZfY39B4+/hhuuQX+/BM6d4annoKjj45cWBERERERKbDU1TOC6tSxnpmNGm3P/kbz5sHpp0O3bjY33yef2Hx9KvpERERERCSXVPhFWLYnaF+3DgYOhObNbVL255+3IvCss/I4y7uIiIiIiBR26uoZawkJ8Oyz8OCD1jf0+uvhrrugUqVYJxMRERERkQJChV+seA+TJ8Ntt8Hff8M558Djj0OjRrFOJiIiIiIiBYy6esbCr7/CySdDr15Qpgx8+SV8+KGKPhERERERiQgVftG0Zg307w/HHw+LF8OoUXY+32mnxTqZiIiIiIgUYOrqGQVFEhJgxAh45BFISoJbb4U774RDDol1NBERERERKQRU+EVSSgq88w6tb7rJZnPv0QMeewwOPzzWyUREREREpBBR4RdJ8+bBxReT2LAhJSdNgg4dYp1IREREREQKIRV+kdSsGUybxq8pKXRU0SciIiIiIjES0cFdnHNdnHNLnHNLnXN3ZLC8o3Nuq3NuTuhyd3a3zTc6doQiGkNHRERERERiJ2Itfs65OOBF4HRgFTDTOfeh9/73dKt+773vmsttRURERERE5CAi2RTVGljqvV/mvd8DTAC6R2FbERERERERSSOShd9hwMo091eFHkuvjXNurnPuU+fc0TncVkRERERERA7Cee8js2PnegFneO+vDN2/BGjtvb8uzTrlgRTvfbxz7izgWe99w+xsm2YfA4GBANWrV285YcKEvcvi4+MpW7ZsRF5fTgQhRxAyBCVHEDIEJUcQMihH8DIEJUcQMgQlRxAyKEfwMgQlRxAyBCVHEDIoR/AyRDtHp06dfvXetzpggfc+IhegDfB5mvtDgaEH2WY5UCU323rvadmypU9r2rRpPgiCkCMIGbwPRo4gZPA+GDmCkMF75
QhaBu+DkSMIGbwPRo4gZPBeOYKWwftg5AhCBu+DkSMIGbxXjqBl8D66OYBZPoNaKZJdPWcCDZ1z9Z1zxYE+wIdpV3DOHeqcc6HbrbGup5uys62IiIiIiIhkT8RG9fTeJznnrgU+B+KAMd77hc65waHlo4CewFXOuSRgF9AnVKVmuG2ksoqIiIiIiBRkEZ3A3Xs/FZia7rFRaW6/ALyQ3W1FREREREQk5zSzuIiIiIiISAGnwk9ERERERKSAi9h0DrHgnNsA/JPmoSrAxhjFSSsIOYKQAYKRIwgZIBg5gpABlCNoGSAYOYKQAYKRIwgZQDmClgGCkSMIGSAYOYKQAZQjaBkgujnqeu+rpn+wQBV+6TnnZvmM5rAohDmCkCEoOYKQISg5gpBBOYKXISg5gpAhKDmCkEE5gpchKDmCkCEoOYKQQTmClyEoOdTVU0REREREpIBT4SciIiIiIlLAFfTCb3SsA4QEIUcQMkAwcgQhAwQjRxAygHKkFYQMEIwcQcgAwcgRhAygHGkFIQMEI0cQMkAwcgQhAyhHWkHIAAHIUaDP8RMREREREZGC3+InIiIiIiJS6BWows8518s5t9A5l+Kcy3TUHOfccufcfOfcHOfcrDA9dxfn3BLn3FLn3B0ZLHfOuedCy+c551qE43nTPccY59x659yCTJZ3dM5tDb3uOc65uyOQoaRzboZzbm7oZ3FfButE/Fikea4459xvzrmPM1gW8eMRep4KzrlJzrnFzrlFzrk26ZZH9Hg45xqleY1znHPbnHM3plsnWsfiBufcgtB748YMlkfkWGT02XDOVXLOfemc+zN0XTGTbcPy+yKTDI+H3hfznHMfOOcqZLJtlr9fwpBjRCjDHOfcF865mplsG8ljca9zbnWa9+BZmWwb0WMRevy60HMsdM49lsm2kTwWzZxzP6fu2znXOpNtI/2+aOqc+7/Q6/zIOVc+k23DdSxqO+emhX5PLnTO3RB6PLt/28NyPLLIkd3Pa56PR2YZ0iwf4pzzzrkqmWwf6WOR3c9rxI6Fc+7dNM+/3Dk3J5PtI30ssvt5DdfnJMPvWS77f9PyfDyyyBDV7+FZ5Mju37SIHYs0yw/2WQ17TZIl732BuQBHAY2A6UCrLNZbDlQJ4/PGAX8BhwPFgblAk3TrnAV8CjjgROCXCLz+DkALYEEmyzsCH0f4Z+CAsqHbxYBfgBOjfSzSPNfNwPiMXnc0jkfoed4ErgzdLg5UiOHxiAPWYvO7RPu9cQywACgNFAW+AhpG41hk9NkAHgPuCN2+A3g0k23D8vsikwydgaKh249mlCE7v1/CkKN8mtvXA6NicCzuBYZk4/0b6WPRKfTeLBG6Xy0Gx+IL4MzQ7bOA6TE6FjOBk0O3LwdGRPhY1ABahG6XA/4AmpCNv+3hPB5Z5Djo5zVcxyOzDKH7tYHPsXmLD3ieKB2Lg35eo3Es0qzzJHB3jI7FQT+vYf6cZPg9i2z8TQvX8cgiQ1S/h2eR46B/0yJ9LEL3s/yshvNYZPdSoFr8vPeLvPdLYvDUrYGl3vtl3vs9wASge7p1ugNjvfkZqOCcqxHOEN7774DN4dxnLjJ473186G6x0CX9iaQRPxYAzrlawNnAq+Hedw4ylMe+UL0G4L3f473/L91qUTkeIacCf3nv/4nQ/rNyFPCz936n9z4J+BY4L906ETkWmXw2umNFOaHrc/P6PDnN4L3/InQsAH4GamWwaXZ+v+Q1x7Y0d8tw4Gc2rPLwuyrixwK4CnjEe787tM763O4/Dxk8kNq6dgiwJoNNo3EsGgHfhW5/CfTI7f6zmeFf7/3s0O3twCLgsGz+bQ/b8cgiR3Y+r2GRWYbQ4qeB28j8cxrxY5GbfeXWwTI45xxwAfBOBptH41hk5/MaNll8z8rO37SwHI/MMkT7e3gWObLzNy2ixyJ0/2Cf1agrUIVfDnjgC+fcr865gWHY32HAyjT3V3HgL8bsrBMNbULN0Z86546OxBM46145B
1gPfOm9/yXdKtE6Fs9gH7iULNaJ9PE4HNgAvO6sy+mrzrky6daJ5nujDxn/cYTIH4sFQAfnXGXnXGnsP6O1060TzWNR3Xv/L9gfdKBaJuuF+/dFZi7HWjvTi8oxcc496JxbCVwEZNbVN9LH4tpQ95wxmXRTisaxOBJo75z7xTn3rXPu+EzWi+SxuBF4PPTzeAIYmsE60TgWC4Buodu9OPDzmirsx8I5Vw9ojv33PDsicjyyyJHZ5xXCfDzSZnDOdQNWe+/nZrFJtI7FwT6vEMFjkebh9sA67/2fGWwSjWNxIwf/vEIYj0Um37Oy8zctbMcjG9/1shLpY5Gdv2kRPRbZ/KxC9L5jAPmw8HPOfeXsHKH0l5xU6e289y2AM4FrnHMd8horg8fSV/fZWSfSZmNd/JoCzwNTIvEk3vtk730z7L+hrZ1zx6RbJeLHwjnXFVjvvf81i9WicTyKYt2nRnrvmwM7sC4Y+8XNYLuwvzecc8WxL3HvZbA44sfCe78I6x71JfAZ1q0iKd1qQficpBfu3xcHcM4Nw47F2xktzuCxsB8T7/0w733tUIZrM1ktksdiJHAE0Az4F+u6lV40jkVRoCLWXehWYGKoRSG9SB6Lq4CbQj+Pmwj1GEgnGsficuy1/Yp1bduTyXphPRbOubLAZODGdP+5z3KzDB7L0/HILMdBPq8QxuORNkPoOYeR+T9m9m6WwWPhPhbZ+bxChI5FuvdFXzL/h2Y0jkV2Pq8QxmORje9ZmcbPaHdRzgBROBbZ+JsWyWNxHNn7rEIUvmOkle8KP+/9ad77YzK4/C8H+1gTul4PfIA19+bFKvb/T2gtDmzqz846EeW935baHO29nwoUy+xk0zA9339YP+8u6RZF41i0A7o555ZjzfenOOfeSpcvGsdjFbAqzX/CJmGFYPp1ovHeOBOY7b1fl35BtN4b3vvXvPctvPcdsG5l6f9DG83PybrUbqSh6wy79EXg98V+nHP9gK7ARd77jP7oRPt3x3gy6dIXyWPhvV8X+uOZArySyb6jcSxWAe+Huu/MwHoMHPBZiPD7oh/wfuj2e5nsO+LHwnu/2Hvf2XvfEvti/Vcm64XtWDjnimFfqt/23r9/sPXTCOvxyCxHNj6vYTseGWQ4AqgPzA39basFzHbOHZpu04gfi2x+XiN5LFIfLwqcD7ybyabReF9k5/Makd8Z6b5nZedvWth/b2TxXS+rbSJ9LNLK7G9aJI9Fd7L3WY34d4z08l3hl1fOuTLOuXKpt7GTtTMcBTMHZgINnXP1Q60qfYAP063zIXCpMycCW1Ob5KPFOXdo6n+unY06VQTYFObnqOpCI50550oBpwGL060W8WPhvR/qva/lva+H/Ty+8d5fnC5rxI+H934tsNI51yj00KnA7+lWi9Z7I9P/ikbjWIT2XS10XQf7Y50+TzQ/Jx9if7AJXR/wz6MI/b5Iu/8uwO1AN+/9zkxWy87vl7zmaJjmbjcO/MxG41ikPZfzvEz2HfFjgbV2nxLKdCR20v/GdFkjeiywLx8nh26fwoH/IIHovC9SP69FgOHAqAzWCduxCP0Oeg1Y5L1/Koebh+14ZJYjO5/XcB2PjDJ47+d776t57+uF/ratwgYbWZtu82gci4N+XiN5LNI4DVjsvV+VyeYRPxZk4/Ma5s9JZt+zDvo3jTAdj2x+18ts24gfi+z8TSOyx+K37HxWo/C35EA+SqPIROOC/fJZBewG1gGfhx6vCUwN3T4c62I2F1gIDAvTc5+FjfD0V+o+gcHAYL9v1J8XQ8vnk8VoR3nI8A7W5SIxdByuSJfh2tBrnoudmN42AhmOA34D5mFv3rtjcSzSZepIaMTKaB+P0PM0A2aFjskUrBtZtN8bpbFC7pA0j8XiWHyPFb5zgVOj9d7I5LNRGfga+yP9NVAptG5Efl9kkmEpdo7BnNBlVPoMofsH/H4Jc47Joc/rPOAjbCCLaB+LcaGf+Tzsj2+NGB2L4sBboeMxGzglB
sfiJODX0P5/AVrG6FjcENr/H8AjgIvwsTgJ62o1L81n4iyy8bc9nMcjixwH/byG63hkliHdOssJjQYYg2Nx0M9rNI4F8Aahvx9p1o/2sTjo5zVcxyK0r8y+Zx30b1q4jkcWGaL6PTyLHAf9mxbpY5Gdz2o4j0V2L6m/xEVERERERKSAKnRdPUVERERERAobFX4iIiIiIiIFnAo/ERERERGRAk6Fn4iIiIiISAGnwk9ERERERKSAU+EnIiKSjnMu2Tk3J83ljjDuu55zLrJzNYmIiKRTNNYBREREAmiX975ZrEOIiIiEi1r8REREssk5t9w596hzbkbo0iD0eF3n3NfOuXmh6zqhx6s75z5wzs0NXdqGdhXnnHvFObfQOfeFc65UzF6UiIgUCir8REREDlQqXVfP3mmWbfPetwZeAJ4JPfYCMNZ7fxzwNvBc6PHngG+9902BFsDC0OMNgRe990cD/wE9IvpqRESk0HPe+1hnEBERCRTnXLz3vmwGjy8HTvHeL3POFQPWeu8rO+c2AjW894mhx//13ldxzm0Aannvd6fZRz3gS+99w9D924Fi3vsHovDSRESkkFKLn4iISM74TG5ntk5Gdqe5nYzOuRcRkQhT4SciIpIzvdNc/1/o9k9An9Dti4AfQre/Bq4CcM7FOefKRyukiIhIWvoPo4iIyIFKOefmpLn/mfc+dUqHEs65X7B/nvYNPXY9MMY5dyuwAbgs9PgNwGjn3BVYy95VwL+RDi8iIpKezvETERHJptA5fq289xtjnUVERCQn1NVTRERERESkgFOLn4iIiIiISAGnFj8REREREZECToWfiIiIiIhIAafCT0REREREpIBT4SciIiIiIlLAqfATEREREREp4FT4iYiIiIiIFHD/D9ULPX7n9fTTAAAAAElFTkSuQmCC"></figcaption></figure>



<p>Next, let&#8217;s print the accuracy and a confusion matrix for the model&#8217;s predictions on the validation dataset.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># function that returns the label for a given probability
def getLabel(prob):
    if prob &gt; 0.5:
        return 'dog'
    else:
        return 'cat'

# get the predictions for the validation data
val_df = validate_df.copy()
val_df['pred'] = &quot;&quot;
val_pred_prob = model.predict(validation_generator)

for i in range(val_pred_prob.shape[0]):
    # use .loc instead of chained indexing to avoid a SettingWithCopyWarning
    val_df.loc[i, 'pred'] = getLabel(val_pred_prob[i])
          
# create a confusion matrix
y_val = val_df['category']
y_pred = val_df['pred']

print('Accuracy: {:.2f}'.format(accuracy_score(y_val, y_pred)))
cnf_matrix = confusion_matrix(y_val, y_pred)

# plot the confusion matrix in form of a heatmap

%matplotlib inline
class_names = ['cat', 'dog']  # label order used by confusion_matrix (alphabetical)
fig, ax = plt.subplots(figsize=(8, 8))
sns.heatmap(pd.DataFrame(cnf_matrix, index=class_names, columns=class_names),
            annot=True, cmap=&quot;YlGnBu&quot;, fmt='g')
plt.title('Confusion matrix')
plt.ylabel('Actual label')
plt.xlabel('Predicted label')</pre></div>



<pre class="wp-block-preformatted">Accuracy: 0.82</pre>
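<p>As a side note, the per-image labeling loop above can be replaced by a single vectorized operation. A minimal sketch with NumPy, using made-up probabilities in place of the model output:</p>

```python
import numpy as np

# hypothetical predicted probabilities (stand-in for model.predict output)
val_pred_prob = np.array([0.9, 0.2, 0.7, 0.4, 0.51])

# probability > 0.5 -> 'dog', otherwise 'cat', applied to the whole array at once
labels = np.where(val_pred_prob > 0.5, 'dog', 'cat')
print(labels.tolist())  # ['dog', 'cat', 'dog', 'cat', 'dog']
```

<p>On large validation sets this is both faster and less error-prone than assigning row by row.</p>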



<figure class="wp-block-image size-large"><img decoding="async" width="479" height="496" data-attachment-id="2716" data-permalink="https://www.relataly.com/image-classification-with-deep-learning/2485/image-24-4/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2020/12/image-24.png" data-orig-size="479,496" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-24" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2020/12/image-24.png" src="https://www.relataly.com/wp-content/uploads/2020/12/image-24.png" alt="confusion matrix for an image classification model" class="wp-image-2716" srcset="https://www.relataly.com/wp-content/uploads/2020/12/image-24.png 479w, https://www.relataly.com/wp-content/uploads/2020/12/image-24.png 290w" sizes="(max-width: 479px) 100vw, 479px" /></figure>
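<p>The confusion matrix also lets you derive per-class metrics such as precision and recall, which accuracy alone hides. A small sketch with an assumed 2x2 matrix (the counts are illustrative, not the ones from this run):</p>

```python
import numpy as np

# assumed confusion matrix: rows = actual (cat, dog), columns = predicted (cat, dog)
cnf = np.array([[410,  90],
                [ 90, 410]])

tp_dog = cnf[1, 1]
precision_dog = tp_dog / cnf[:, 1].sum()  # of all 'dog' predictions, fraction correct
recall_dog = tp_dog / cnf[1, :].sum()     # of all actual dogs, fraction found
accuracy = np.trace(cnf) / cnf.sum()      # diagonal = correct predictions
print(precision_dog, recall_dog, accuracy)  # 0.82 0.82 0.82
```

<p>Comparing precision and recall per class tells you whether the model confuses cats for dogs more often than the reverse.</p>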



<h3 class="wp-block-heading" id="h-step-9-image-classification-on-sample-images">Step #9 Image Classification on Sample Images</h3>



<p>Now that we have trained the model, I bet you can&#8217;t wait to test the image classifier on some sample data. For this purpose, ensure that you have some sample images in the &#8220;sample&#8221; folder. The code below feeds these images to the classifier, predicts a label for each one, and then prints the images in a grid together with their predicted labels.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># set the path to the sample images
sample_path = &quot;data/images/cats-and-dogs/sample/&quot;
sample_df = createImageDf(sample_path)
sample_df['category'] = sample_df['category'].replace({0:'cat',1:'dog'})
sample_df['pred'] = &quot;&quot;

# create an image data generator for the sample images - we will only rescale the images
test_datagen = ImageDataGenerator(rescale=1./255)
test_generator = test_datagen.flow_from_dataframe(
    sample_df, 
    sample_path,    
    shuffle=False,
    x_col='filename', y_col='category',
    target_size=target_size)

# make the predictions 
pred_prob = model.predict(test_generator)
image_number = pred_prob.shape[0]

# assign the predicted label to each sample image
for i in range(pred_prob.shape[0]):
    sample_df.loc[i, 'pred'] = getLabel(pred_prob[i])
    
print('Accuracy: {:.2f}'.format(accuracy_score(sample_df['category'], sample_df['pred'])))

nrows = 6
ncols = int(round(image_number / nrows, 0))
fig, axs = plt.subplots(nrows, ncols, figsize=(15, 15))
for i, ax in enumerate(fig.axes):
    if i &lt; sample_df.shape[0]:
        filepath = sample_path + sample_df.at[i, 'filename']
        img = Image.open(filepath).resize(target_size)
        ax.imshow(img)
        ax.set_title(sample_df.at[i, 'filename'] + '\n' + ' predicted: ' + str(sample_df.at[i, 'pred']))
        correct = sample_df.at[i, 'pred'] == sample_df.at[i, 'category']
        ax.set_xlabel(str(correct))
        ax.set_xticks([]); ax.set_yticks([])</pre></div>



<figure class="wp-block-image size-full"><img decoding="async" width="862" height="864" data-attachment-id="6516" data-permalink="https://www.relataly.com/image-classification-with-deep-learning/2485/output-5/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/04/output.png" data-orig-size="862,864" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="output" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/04/output.png" src="https://www.relataly.com/wp-content/uploads/2022/04/output.png" alt="image classification - the image shows several dogs and cats" class="wp-image-6516" srcset="https://www.relataly.com/wp-content/uploads/2022/04/output.png 862w, https://www.relataly.com/wp-content/uploads/2022/04/output.png 300w, https://www.relataly.com/wp-content/uploads/2022/04/output.png 150w, https://www.relataly.com/wp-content/uploads/2022/04/output.png 768w" sizes="(max-width: 862px) 100vw, 862px" /></figure>



<p>Our image classifier achieves an accuracy of around 83% on the validation set. The model is not perfect, but it labels most images correctly. With deeper architectures, more data, and longer training, you can build classification models that reach accuracies above 95%.</p>
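<p>One common lever for better accuracy is data augmentation, which enlarges the training set with transformed copies of the images. At the array level, a horizontal flip, one of the standard augmentations, is simply a reversed column axis. A minimal NumPy sketch of the idea (not the Keras API):</p>

```python
import numpy as np

# a tiny 2x3 single-channel "image"
img = np.array([[1, 2, 3],
                [4, 5, 6]])

# horizontal flip: reverse the column axis
flipped = img[:, ::-1]
print(flipped.tolist())  # [[3, 2, 1], [6, 5, 4]]
```

<p>In practice you would let the training data generator apply such transformations randomly on the fly, so the network rarely sees the exact same image twice.</p>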



<h2 class="wp-block-heading" id="h-summary">Summary</h2>



<p>In this tutorial, you learned how to train an image classification model. We prepared a dataset and performed several transformations to bring the data into shape for training. Finally, we trained a convolutional neural network to distinguish between dogs and cats. You can now use this knowledge to train image classification models that recognize other objects.</p>



<p>There are many other cool things you can do with CNNs, for example, object localization in images and videos, or even stock market prediction. But these are topics for future articles.</p>



<p>I am always happy to receive feedback. I hope you enjoyed the article and would be happy if you left a comment. Cheers</p>



<h2 class="wp-block-heading" id="h-sources-and-further-reading">Sources and Further Reading</h2>



<ol class="wp-block-list"><li><a href="https://amzn.to/3MAy8j5" target="_blank" rel="noreferrer noopener">Andriy Burkov Machine Learning Engineering</a></li><li><a href="https://amzn.to/3D0gB0e" target="_blank" rel="noreferrer noopener">Oliver Theobald (2020) Machine Learning For Absolute Beginners: A Plain English Introduction</a></li><li><a href="https://amzn.to/3MyU6Tj" target="_blank" rel="noreferrer noopener">Charu C. Aggarwal (2018) Neural Networks and Deep Learning</a></li><li><a href="https://amzn.to/3S9Nfkl" target="_blank" rel="noreferrer noopener">Aurélien Géron (2019) Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems </a></li><li><a href="https://amzn.to/3EKidwE" target="_blank" rel="noreferrer noopener">David Forsyth (2019) Applied Machine Learning Springer</a></li><li>[1] D. H. Hubel and T. N. Wiesel &#8211; Receptive Fields of Neurons in the Cat&#8217;s Striate Cortex, The Journal of physiology (1959)</li></ol>



<p class="has-contrast-2-color has-base-3-background-color has-text-color has-background"><em>The links above to Amazon are affiliate links. By buying through these links, you support the Relataly.com blog and help to cover the hosting costs. Using the links does not affect the price.</em></p>
<p>The post <a href="https://www.relataly.com/image-classification-with-deep-learning/2485/">Image Classification with Convolutional Neural Networks &#8211; Classifying Cats and Dogs in Python</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.relataly.com/image-classification-with-deep-learning/2485/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">2485</post-id>	</item>
	</channel>
</rss>
