<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Customer Segmentation Archives</title>
	<atom:link href="https://www.relataly.com/category/use-case/customer-segmentation/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.relataly.com/category/use-case/customer-segmentation/</link>
	<description>The Business AI Blog</description>
	<lastBuildDate>Sat, 27 May 2023 09:56:55 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://www.relataly.com/wp-content/uploads/2023/04/cropped-AI-cat-Icon-White.png</url>
	<title>Customer Segmentation Archives</title>
	<link>https://www.relataly.com/category/use-case/customer-segmentation/</link>
	<width>32</width>
	<height>32</height>
</image> 
<site xmlns="com-wordpress:feed-additions:1">175977316</site>	<item>
		<title>How to Use Hierarchical Clustering For Customer Segmentation in Python</title>
		<link>https://www.relataly.com/customer-segmentation-using-hierarchical-clustering-in-python/11335/</link>
					<comments>https://www.relataly.com/customer-segmentation-using-hierarchical-clustering-in-python/11335/#respond</comments>
		
		<dc:creator><![CDATA[Florian Follonier]]></dc:creator>
		<pubDate>Thu, 22 Dec 2022 18:50:14 +0000</pubDate>
				<category><![CDATA[Agglomerative Clustering]]></category>
		<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Customer Segmentation]]></category>
		<category><![CDATA[Data Science]]></category>
		<category><![CDATA[Data Visualization]]></category>
		<category><![CDATA[Exploratory Data Analysis (EDA)]]></category>
		<category><![CDATA[Finance]]></category>
		<category><![CDATA[Insurance]]></category>
		<category><![CDATA[Kaggle Competitions]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Marketing Automation]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Scikit-Learn]]></category>
		<category><![CDATA[Seaborn]]></category>
		<category><![CDATA[Telecommunications]]></category>
		<category><![CDATA[Use Cases]]></category>
		<category><![CDATA[AI in Finance]]></category>
		<category><![CDATA[AI in Insurance]]></category>
		<category><![CDATA[Beginner Tutorials]]></category>
		<category><![CDATA[Classic Machine Learning]]></category>
		<category><![CDATA[Digital Transformation]]></category>
		<guid isPermaLink="false">https://www.relataly.com/?p=11335</guid>

					<description><![CDATA[<p>Have you ever found yourself wondering how you can better understand your customer base and target your marketing efforts more effectively? One solution is to use hierarchical clustering, a method of grouping customers into clusters based on their characteristics and behaviors. By dividing your customers into distinct groups, you can tailor your marketing campaigns and ... <a title="How to Use Hierarchical Clustering For Customer Segmentation in Python" class="read-more" href="https://www.relataly.com/customer-segmentation-using-hierarchical-clustering-in-python/11335/" aria-label="Read more about How to Use Hierarchical Clustering For Customer Segmentation in Python">Read more</a></p>
<p>The post <a href="https://www.relataly.com/customer-segmentation-using-hierarchical-clustering-in-python/11335/">How to Use Hierarchical Clustering For Customer Segmentation in Python</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>Have you ever found yourself wondering how you can better understand your customer base and target your marketing efforts more effectively? One solution is to use hierarchical clustering, a method of grouping customers into clusters based on their characteristics and behaviors. By dividing your customers into distinct groups, you can tailor your marketing campaigns and personalize your marketing efforts to meet the specific needs of each group. This can be especially useful for businesses with large customer bases, as it allows them to target their marketing efforts to specific segments rather than trying to appeal to everyone at once. Additionally, hierarchical clustering can help businesses identify common patterns and trends among their customers, which can be useful for targeting future marketing efforts and improving the overall customer experience. In this tutorial, we will use Python and the scikit-learn library to apply hierarchical (agglomerative) clustering to a dataset of customer data. </p>



<p>The rest of this tutorial proceeds in two parts. The first part will discuss hierarchical clustering and how we can use it to identify clusters in a set of customer data. The second part is a hands-on Python tutorial. We will explore customer health insurance data and apply an agglomerative clustering approach to group the customers into meaningful segments. Finally, we will use a tree-like diagram called a dendrogram, which is helpful for visualizing the structure of the data. The resulting segments could inform our marketing strategies and help us better understand our customers. So let&#8217;s get started!</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-full"><img fetchpriority="high" decoding="async" width="896" height="510" data-attachment-id="12402" data-permalink="https://www.relataly.com/customer-segmentation-using-hierarchical-clustering-in-python/11335/isometric-view-of-people-customer-segmentation-using-machine-learning-python-tutorial-min/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/02/isometric-view-of-people-customer-segmentation-using-machine-learning-python-tutorial-min.png" data-orig-size="896,510" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="isometric-view-of-people-customer-segmentation-using-machine-learning-python-tutorial-min" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/02/isometric-view-of-people-customer-segmentation-using-machine-learning-python-tutorial-min.png" src="https://www.relataly.com/wp-content/uploads/2023/02/isometric-view-of-people-customer-segmentation-using-machine-learning-python-tutorial-min.png" alt="isometric view of people customer segmentation using machine learning python tutorial" class="wp-image-12402" srcset="https://www.relataly.com/wp-content/uploads/2023/02/isometric-view-of-people-customer-segmentation-using-machine-learning-python-tutorial-min.png 896w, https://www.relataly.com/wp-content/uploads/2023/02/isometric-view-of-people-customer-segmentation-using-machine-learning-python-tutorial-min.png 300w, https://www.relataly.com/wp-content/uploads/2023/02/isometric-view-of-people-customer-segmentation-using-machine-learning-python-tutorial-min.png 768w" sizes="(max-width: 896px) 100vw, 896px" /><figcaption class="wp-element-caption">Customer segmentation is a typical use case for clustering. Image generated with <a href="http://www.midjourney.com" target="_blank" rel="noreferrer noopener">Midjourney</a>. </figcaption></figure>
</div>
</div>



<h2 class="wp-block-heading">What is Hierarchical Clustering?</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>So what is hierarchical clustering? Hierarchical clustering is a method of cluster analysis that aims to build a hierarchy of clusters. It creates a tree-like diagram called a dendrogram, which shows the relationships between clusters. There are two main types of hierarchical clustering: agglomerative and divisive. </p>



<ol class="wp-block-list">
<li>Agglomerative hierarchical clustering: This is a bottom-up approach in which each data point is treated as a single cluster at the outset. The algorithm iteratively merges the most similar pairs of clusters until all data points are in a single cluster.</li>



<li>Divisive hierarchical clustering: This is a top-down approach in which all data points are treated as a single cluster at the outset. The algorithm iteratively splits the cluster into smaller and smaller subclusters until each data point is in its own cluster.</li>
</ol>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<h3 class="wp-block-heading">Agglomerative Clustering</h3>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>In this article, we will apply the agglomerative clustering approach, which is a bottom-up approach to clustering. The idea is to initially treat each data point in a dataset as its own cluster and then combine the points with other clusters as the algorithm progresses. The process of agglomerative clustering can be broken down into the following steps:</p>



<ol class="wp-block-list">
<li>Start with each data point in its own cluster.</li>



<li>Calculate the similarity between all pairs of clusters.</li>



<li>Merge the two most similar clusters.</li>



<li>Repeat steps 2 and 3 until all the data points are in a single cluster or until a predetermined number of clusters is reached.</li>
</ol>



<p>There are several ways to calculate the similarity between clusters, including using measures such as the Euclidean distance, cosine similarity, or the Jaccard index. The specific measure used can impact the results of the clustering algorithm.</p>



<p>For details on how the clustering approach works, see the&nbsp;<a href="https://en.wikipedia.org/wiki/Hierarchical_clustering">Wikipedia page</a>.</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-large"><img decoding="async" width="430" height="512" data-attachment-id="13027" data-permalink="https://www.relataly.com/mushrooms_and_fruits_pattern-min-2/" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/03/mushrooms_and_fruits_pattern-min.png" data-orig-size="506,602" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="mushrooms_and_fruits_pattern-min" data-image-description="&lt;p&gt;Hierarchical clustering is an unsupversied way to classify things. &lt;/p&gt;
" data-image-caption="&lt;p&gt;Hierarchical clustering is an unsupversied way to classify things. &lt;/p&gt;
" data-large-file="https://www.relataly.com/wp-content/uploads/2023/03/mushrooms_and_fruits_pattern-min.png" src="https://www.relataly.com/wp-content/uploads/2023/03/mushrooms_and_fruits_pattern-min-430x512.png" alt="Hierarchical clustering is an unsupversied way to classify things. " class="wp-image-13027" srcset="https://www.relataly.com/wp-content/uploads/2023/03/mushrooms_and_fruits_pattern-min.png 430w, https://www.relataly.com/wp-content/uploads/2023/03/mushrooms_and_fruits_pattern-min.png 252w, https://www.relataly.com/wp-content/uploads/2023/03/mushrooms_and_fruits_pattern-min.png 506w" sizes="(max-width: 430px) 100vw, 430px" /><figcaption class="wp-element-caption">Hierarchical clustering is an unsupervised technique to classify things based on patterns in their data. Image created with <a href="http://www.midjourney.com" target="_blank" rel="noreferrer noopener">Midjourney</a>.</figcaption></figure>
</div>
</div>



<h3 class="wp-block-heading">Hierarchical Clustering vs. K-means</h3>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>In a previous article, we have already discussed the popular <a href="https://www.relataly.com/simple-cluster-analysis-with-k-means-with-python/5070/" target="_blank" rel="noreferrer noopener">clustering approach k-means</a>. So how are k-means and hierarchical clustering different? Hierarchical clustering and k-means are both clustering algorithms that can be used to group similar data points together. However, there are several key differences between these two approaches:</p>



<ol class="wp-block-list">
<li><strong>The number of clusters:</strong> In k-means, the number of clusters must be specified in advance, whereas in hierarchical clustering, the number of clusters is not specified. Instead, hierarchical clustering creates a hierarchy of clusters, starting with each data point as its own cluster and then merging the most similar clusters until all data points are in a single cluster.</li>



<li><strong>Cluster shape:</strong> K-means produces clusters that are spherical, while hierarchical clustering produces clusters that can have any shape. This means that k-means is better suited for data that is well-separated into distinct, spherical clusters, while hierarchical clustering is more flexible and can handle more complex cluster shapes.</li>



<li><strong>Distance measure:</strong> K-means uses a distance measure, such as the Euclidean distance, to calculate the similarity between data points, while hierarchical clustering can use a variety of distance measures. This means that k-means is more sensitive to the scale of the features, while hierarchical clustering is less sensitive to the feature scale.</li>



<li><strong>Computational complexity:</strong> K-means is generally faster than hierarchical clustering, especially for large datasets. This is because k-means only requires a single pass through the data to assign data points to clusters, while hierarchical clustering requires multiple passes to merge clusters.</li>



<li><strong>Visualization: </strong>Hierarchical clustering produces a tree-like diagram called a &#8220;dendrogram.&#8221; The dendrogram shows the relationships between clusters. This can be useful for visualizing the structure of the data and understanding how clusters are related.</li>
</ol>



<p>Next, let&#8217;s look at how we can implement a hierarchical clustering model in Python. </p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<p></p>
</div>
</div>



<h2 class="wp-block-heading">Customer Segmentation using Hierarchical Clustering in Python</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>In this comprehensive guide, we explore the application of hierarchical clustering for effective customer segmentation using a customer dataset. This data-driven segmentation method enables businesses to identify distinct customer clusters based on various factors, including demographics, behaviors, and preferences.</p>



<p>Customer segmentation is a strategic approach that splits a customer base into smaller, more manageable groups with similar characteristics. It aims to better understand the diverse needs and wants of different customer segments to enhance marketing strategies and product development.</p>



<p>Applying customer segmentation through hierarchical clustering allows businesses to personalize their marketing messages, design targeted campaigns, and tailor products to meet the unique needs of each segment. This proactive approach can stimulate increased customer loyalty and sales.</p>



<p>We begin by loading the customer data and selecting the relevant features we want to use for clustering. We then standardize the data using the StandardScaler from scikit-learn. Next, we apply hierarchical clustering using the AgglomerativeClustering method, specifying the number of clusters we want to create. Finally, we add the predictions to the original data as a new column and view the resulting segments by calculating the mean of each feature for each segment.</p>



<p>The code is available on the GitHub repository.</p>



<div class="wp-block-kadence-advancedbtn kb-buttons-wrap kb-btns_bada6f-73"><a class="kb-button kt-button button kb-btn_43f94b-af kt-btn-size-standard kt-btn-width-type-full kb-btn-global-inherit  kt-btn-has-text-true kt-btn-has-svg-true  wp-block-button__link wp-block-kadence-singlebtn" href="https://github.com/flo7up/relataly-public-python-tutorials/blob/master/03%20Clustering/043%20Customer%20Segmentation%20using%20Hierarchical%20Clustering%20with%20Python.ipynb" target="_blank" rel="noreferrer noopener"><span class="kb-svg-icon-wrap kb-svg-icon-fe_eye kt-btn-icon-side-left"><svg viewBox="0 0 24 24"  fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"  aria-hidden="true"><path d="M1 12s4-8 11-8 11 8 11 8-4 8-11 8-11-8-11-8z"/><circle cx="12" cy="12" r="3"/></svg></span><span class="kt-btn-inner-text">View on GitHub </span></a>

<a class="kb-button kt-button button kb-btn_17702b-41 kt-btn-size-standard kt-btn-width-type-full kb-btn-global-inherit  kt-btn-has-text-true kt-btn-has-svg-true  wp-block-button__link wp-block-kadence-singlebtn" href="https://github.com/flo7up/relataly-public-python-API-tutorials" target="_blank" rel="noreferrer noopener"><span class="kb-svg-icon-wrap kb-svg-icon-fa_github kt-btn-icon-side-left"><svg viewBox="0 0 496 512"  fill="currentColor" xmlns="http://www.w3.org/2000/svg"  aria-hidden="true"><path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"/></svg></span><span class="kt-btn-inner-text">Relataly GitHub Repo </span></a></div>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-full"><img decoding="async" width="512" height="513" data-attachment-id="12366" data-permalink="https://www.relataly.com/customer-segmentation-using-hierarchical-clustering-in-python/11335/the_future_of_the_healthcare_using_blockchain-min/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/12/the_future_of_the_healthcare_using_blockchain-min.png" data-orig-size="512,513" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="the_future_of_the_healthcare_using_blockchain-min" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/12/the_future_of_the_healthcare_using_blockchain-min.png" src="https://www.relataly.com/wp-content/uploads/2022/12/the_future_of_the_healthcare_using_blockchain-min.png" alt="In this machine learning tutorial, we will run a hierarchical clustering algorithm on health data." class="wp-image-12366" srcset="https://www.relataly.com/wp-content/uploads/2022/12/the_future_of_the_healthcare_using_blockchain-min.png 512w, https://www.relataly.com/wp-content/uploads/2022/12/the_future_of_the_healthcare_using_blockchain-min.png 300w, https://www.relataly.com/wp-content/uploads/2022/12/the_future_of_the_healthcare_using_blockchain-min.png 140w" sizes="(max-width: 512px) 100vw, 512px" /><figcaption class="wp-element-caption">The future of healthcare will see a tight collaboration between humans and AI. Image generated using&nbsp;Midjourney</figcaption></figure>
</div>
</div>



<h3 class="wp-block-heading">About the Customer Health Insurance Dataset</h3>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>In this tutorial, we will work with a public dataset on health_insurance_customer_data from kaggle.com. Download the <a href="https://www.kaggle.com/datasets/teertha/ushealthinsurancedataset" target="_blank" rel="noreferrer noopener">CSV file from Kaggle</a> and copy it into the following path, starting from the folder with your python notebook: data/customer/</p>



<p>The dataset is relatively simple and contains 1338 rows of insured customers. It includes the insurance charges, as well as demographic and personal information such as Age, Sex, BMI, Number of Children, Smoker, and Region. The dataset does not have any undefined or missing values.</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<h3 class="wp-block-heading" id="h-prerequisites">Prerequisites</h3>



<p>Before we start the coding part, ensure that you have set up your Python 3 environment and the required packages. If you don’t have an environment, follow&nbsp;this tutorial&nbsp;to set up the&nbsp;<a href="https://www.anaconda.com/products/individual" target="_blank" rel="noreferrer noopener">Anaconda environment</a>. Also, make sure you install all required packages. In this tutorial, we will be working with the following standard packages:&nbsp;</p>



<ul class="wp-block-list">
<li>pandas</li>



<li>NumPy</li>



<li>matplotlib</li>



<li>scikit-learn</li>
</ul>



<p>You can install packages using console commands:</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">pip install &lt;package name&gt; 
conda install &lt;package name&gt; (if you are using the anaconda packet manager)</pre></div>



<h3 class="wp-block-heading">Step #1 Load the Data</h3>



<p>To begin, we need to load the required packages and the data we want to cluster. We will load the data by reading the CSV file via the pandas library. </p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># import necessary libraries
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import AgglomerativeClustering
from sklearn.preprocessing import LabelEncoder
from pandas.api.types import is_string_dtype
import pandas as pd
import math
import seaborn as sns

# load customer data
customer_df = pd.read_csv(&quot;data/customer/customer_health_insurance.csv&quot;)
customer_df.head(3)</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">	age	sex		bmi		children	smoker	region		charges
0	19	female	27.90	0			yes		southwest	16884.9240
1	18	male	33.77	1			no		southeast	1725.5523
2	28	male	33.00	3			no		southeast	4449.4620</pre></div>



<h3 class="wp-block-heading">Step #2 Explore the Data</h3>



<p>Next, it is a good idea to explore the data and get a sense of its structure and content. This can be done using a variety of methods, such as examining the shape of the dataframe, checking for missing values, and plotting some basic statistics. For example, the following plots will explore the relationships between some of the variables. We won&#8217;t go into too much detail here.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}">def make_kdeplot(df, column_name, target_name):
    fig, ax = plt.subplots(figsize=(10, 6))
    sns.kdeplot(data=df, hue=column_name, x=target_name, ax = ax, linewidth=2,)
    ax.tick_params(axis=&quot;x&quot;, rotation=90, labelsize=10, length=0)
    ax.set_title(column_name)
    ax.set_xlim(0, df[target_name].quantile(0.99))
    plt.show()

# make kde plot for ext_color 
make_kdeplot(customer_df, 'smoker', 'charges')</pre></div>



<figure class="wp-block-image size-full is-resized"><img decoding="async" data-attachment-id="11363" data-permalink="https://www.relataly.com/customer-segmentation-using-hierarchical-clustering-in-python/11335/image-17-3/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/12/image-17.png" data-orig-size="833,571" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-17" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/12/image-17.png" src="https://www.relataly.com/wp-content/uploads/2022/12/image-17.png" alt="" class="wp-image-11363" width="567" height="389" srcset="https://www.relataly.com/wp-content/uploads/2022/12/image-17.png 833w, https://www.relataly.com/wp-content/uploads/2022/12/image-17.png 300w, https://www.relataly.com/wp-content/uploads/2022/12/image-17.png 768w" sizes="(max-width: 567px) 100vw, 567px" /></figure>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># make kde plot for ext_color 
make_kdeplot(customer_df, 'sex', 'charges')</pre></div>



<figure class="wp-block-image size-full is-resized"><img decoding="async" data-attachment-id="11364" data-permalink="https://www.relataly.com/customer-segmentation-using-hierarchical-clustering-in-python/11335/image-44-2/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/12/image-44.png" data-orig-size="846,571" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-44" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/12/image-44.png" src="https://www.relataly.com/wp-content/uploads/2022/12/image-44.png" alt="" class="wp-image-11364" width="572" height="386" srcset="https://www.relataly.com/wp-content/uploads/2022/12/image-44.png 846w, https://www.relataly.com/wp-content/uploads/2022/12/image-44.png 300w, https://www.relataly.com/wp-content/uploads/2022/12/image-44.png 768w" sizes="(max-width: 572px) 100vw, 572px" /></figure>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}">sns.lmplot(x=&quot;charges&quot;, y=&quot;age&quot;, hue=&quot;smoker&quot;, data=customer_df, aspect=2)
plt.show()</pre></div>



<figure class="wp-block-image size-large is-resized"><img decoding="async" data-attachment-id="11365" data-permalink="https://www.relataly.com/customer-segmentation-using-hierarchical-clustering-in-python/11335/image-45/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/12/image-45.png" data-orig-size="1067,489" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-45" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/12/image-45.png" src="https://www.relataly.com/wp-content/uploads/2022/12/image-45-1024x469.png" alt="" class="wp-image-11365" width="700" height="321" srcset="https://www.relataly.com/wp-content/uploads/2022/12/image-45.png 1024w, https://www.relataly.com/wp-content/uploads/2022/12/image-45.png 300w, https://www.relataly.com/wp-content/uploads/2022/12/image-45.png 768w, https://www.relataly.com/wp-content/uploads/2022/12/image-45.png 1067w" sizes="(max-width: 700px) 100vw, 700px" /></figure>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}">def make_boxplot(customer_df, x,y,h):
    fig, ax = plt.subplots(figsize=(10,4))
    box = sns.boxplot(x=x, y=y, hue=h, data=customer_df)
    box.set_xticklabels(box.get_xticklabels())
    fig.subplots_adjust(bottom=0.2)
    plt.tight_layout()

make_boxplot(customer_df, &quot;smoker&quot;, &quot;charges&quot;, &quot;sex&quot;)</pre></div>



<figure class="wp-block-image size-full is-resized"><img decoding="async" data-attachment-id="11366" data-permalink="https://www.relataly.com/customer-segmentation-using-hierarchical-clustering-in-python/11335/image-46/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/12/image-46.png" data-orig-size="989,390" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-46" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/12/image-46.png" src="https://www.relataly.com/wp-content/uploads/2022/12/image-46.png" alt="" class="wp-image-11366" width="675" height="266" srcset="https://www.relataly.com/wp-content/uploads/2022/12/image-46.png 989w, https://www.relataly.com/wp-content/uploads/2022/12/image-46.png 300w, https://www.relataly.com/wp-content/uploads/2022/12/image-46.png 768w" sizes="(max-width: 675px) 100vw, 675px" /></figure>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}">make_boxplot(customer_df, &quot;region&quot;, &quot;charges&quot;, &quot;sex&quot;)</pre></div>



<figure class="wp-block-image size-full is-resized"><img decoding="async" data-attachment-id="11367" data-permalink="https://www.relataly.com/customer-segmentation-using-hierarchical-clustering-in-python/11335/image-47-4/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/12/image-47.png" data-orig-size="989,390" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-47" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/12/image-47.png" src="https://www.relataly.com/wp-content/uploads/2022/12/image-47.png" alt="" class="wp-image-11367" width="693" height="273" srcset="https://www.relataly.com/wp-content/uploads/2022/12/image-47.png 989w, https://www.relataly.com/wp-content/uploads/2022/12/image-47.png 300w, https://www.relataly.com/wp-content/uploads/2022/12/image-47.png 768w" sizes="(max-width: 693px) 100vw, 693px" /></figure>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}">make_boxplot(customer_df, &quot;children&quot;, &quot;bmi&quot;, &quot;sex&quot;)</pre></div>



<figure class="wp-block-image size-full is-resized"><img decoding="async" data-attachment-id="11368" data-permalink="https://www.relataly.com/customer-segmentation-using-hierarchical-clustering-in-python/11335/image-48-4/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/12/image-48.png" data-orig-size="989,390" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-48" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/12/image-48.png" src="https://www.relataly.com/wp-content/uploads/2022/12/image-48.png" alt="" class="wp-image-11368" width="705" height="278" srcset="https://www.relataly.com/wp-content/uploads/2022/12/image-48.png 989w, https://www.relataly.com/wp-content/uploads/2022/12/image-48.png 300w, https://www.relataly.com/wp-content/uploads/2022/12/image-48.png 768w" sizes="(max-width: 705px) 100vw, 705px" /></figure>



<p>Next, let&#8217;s prepare the data for model training. </p>



<h3 class="wp-block-heading" id="h-step-3-prepare-the-data">Step #3 Prepare the Data</h3>



<p>Before we can train a model on the data, we must prepare it for modeling. This typically involves selecting the relevant features, handling missing values, and scaling the data. However, we are using a very simple dataset that already has good data quality. Therefore we can limit our data preparation activities to encoding the labels and scaling the data. </p>



<p>To encode the categorical values, we will use label encoder from the scikit-learn library.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># encode categorical features
label_encoder = LabelEncoder()

for col_name in customer_df.columns:
    if (is_string_dtype(customer_df[col_name])):
        customer_df[col_name] = label_encoder.fit_transform(customer_df[col_name])
customer_df.head(3)</pre></div>



<p>Next, we will scale the numeric variables. While scaling the data is an essential preprocessing step for many machine learning algorithms to work effectively, it is generally not necessary for hierarchical clustering. This is because hierarchical clustering is not sensitive to the scale of the features. However, when you use certain distance measures, such as Euclidean distance, scaling the data might still be useful when performing hierarchical clustering. Scaling the data can help to ensure that all of the features are given equal weight. This can be useful if you want to avoid giving more weight to features with larger scales.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># select features
X = customer_df # we will select all features

# standardize the data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
X_scaled.head(3)</pre></div>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">array([[-1.43876426, -1.0105187 , -0.45332   , ...,  1.34390459,
         0.2985838 ,  1.97058663],
       [-1.50996545,  0.98959079,  0.5096211 , ...,  0.43849455,
        -0.95368917, -0.5074631 ],
       [-0.79795355,  0.98959079,  0.38330685, ...,  0.43849455,
        -0.72867467, -0.5074631 ],
       ...,
       [-1.50996545, -1.0105187 ,  1.0148781 , ...,  0.43849455,
        -0.96159623, -0.5074631 ],
       [-1.29636188, -1.0105187 , -0.79781341, ...,  1.34390459,
        -0.93036151, -0.5074631 ],
       [ 1.55168573, -1.0105187 , -0.26138796, ..., -0.46691549,
         1.31105347,  1.97058663]])</pre></div>



<h3 class="wp-block-heading">Step #4 Train the Hierarchical Clustering Algorithm</h3>



<p>To train a hierarchical clustering model using scikit-learn, we can use the AgglomerativeClustering or Ward class. The main parameters for these classes are:</p>



<ul class="wp-block-list">
<li><strong>n_clusters: </strong>The number of clusters to form. This parameter is required for AgglomerativeClustering but is not used for <code>Ward</code>.</li>



<li><strong>affinity: </strong>The distance measure used to calculate the similarity between pairs of samples. This can be any of the distance measures implemented in scikit-learn, such as the Euclidean distance or the cosine similarity.</li>



<li>l<strong>inkage: </strong>The method used to calculate the distance between clusters. This can be one of &#8220;ward,&#8221; &#8220;complete,&#8221; &#8220;average,&#8221; or &#8220;single.&#8221;</li>



<li><strong>distance_threshold:</strong> The maximum distance between two clusters that allows them to be merged. This parameter is only used in the AgglomerativeClustering class.</li>
</ul>



<p>To train the model, we specify the desired parameters and fit the model to the data using the fit_predict method. This method will fit the model to the data and generate predictions in one step.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># apply hierarchical clustering 
model = AgglomerativeClustering(affinity='euclidean')
predicted_segments = model.fit_predict(X_scaled)</pre></div>



<p>Now we have a trained clustering model also predicted the segments for our data.</p>



<h3 class="wp-block-heading">Step #5 Visualize the Results</h3>



<p>After the model is trained, we can visualize the results to get a better understanding of the clusters that were formed. There is a wide range of plots and tools to visualize clusters. In this tutorial, we will use a scatterplot and a dendrogram. </p>



<h4 class="wp-block-heading">5.1 Scatterplot</h4>



<p>For this, we can use the lmplot function in Seaborn. The lmplot creates a 2D scatterplot with an optional overlay of a linear regression model. The plot visualizes the relationship between two variables and fits a linear regression model to the data that can highlight differences. In the following, we use this linear regression model to highlight the differences between our two cluster segments and the age of the customers. </p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># add predictions to data as a new column
customer_df['segment'] = predicted_segments

# create a scatter plot of the first two features, colored by segment
sns.lmplot(x=&quot;charges&quot;, y=&quot;age&quot;, hue=&quot;segment&quot;, data=customer_df, aspect=2)
plt.show()</pre></div>



<figure class="wp-block-image size-large"><img decoding="async" width="1024" height="470" data-attachment-id="11370" data-permalink="https://www.relataly.com/customer-segmentation-using-hierarchical-clustering-in-python/11335/image-49-2/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/12/image-49.png" data-orig-size="1065,489" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-49" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/12/image-49.png" src="https://www.relataly.com/wp-content/uploads/2022/12/image-49-1024x470.png" alt="" class="wp-image-11370" srcset="https://www.relataly.com/wp-content/uploads/2022/12/image-49.png 1024w, https://www.relataly.com/wp-content/uploads/2022/12/image-49.png 300w, https://www.relataly.com/wp-content/uploads/2022/12/image-49.png 768w, https://www.relataly.com/wp-content/uploads/2022/12/image-49.png 1065w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<p>We can see that our model has determined two clusters in our data. The clusters seem to correspond well with the smoker category, which indicates that this attribute is decisive in forming relevant groups.</p>



<h4 class="wp-block-heading" id="h-5-2-dendrogram">5.2 Dendrogram</h4>



<p>The hierarchical clustering approach lets us visualize relationships between different groups in our dataset in a dendrogram. A dendrogram is a graphical representation of a hierarchical structure, such as the relationships between different groups of objects or organisms. It is typically used in biology to show the relationships between different species or taxonomic groups, but it can also be used in other fields to represent the hierarchical structure of any set of data. In a dendrogram, the objects or groups being studied are represented as branches on a tree-like diagram. The branches are usually labeled with the names of the objects or groups, and the lengths of the branches represent the distances or dissimilarities between the objects or groups. The branches are also arranged in a hierarchical manner, with the most closely related objects or groups being placed closer together and the more distantly related ones being placed farther apart.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Visualize data similarity in a dendogram
def plot_dendrogram(model, **kwargs):
    # create the counts of samples under each node
    counts = np.zeros(model.children_.shape[0])
    n_samples = len(model.labels_)
    for i, merge in enumerate(model.children_):
        current_count = 0
        for child_idx in merge:
            if child_idx &lt; n_samples:
                current_count += 1  # leaf node
            else:
                current_count += counts[child_idx - n_samples]
        counts[i] = current_count

    linkage_matrix = np.column_stack(
        [model.children_, model.distances_, counts]
    ).astype(float)

    # Plot the corresponding dendrogram
    dendrogram(linkage_matrix, orientation='right',**kwargs)


plt.title(&quot;Hierarchical Clustering Dendrogram&quot;)
# plot the top three levels of the dendrogram
plot_dendrogram(cluster_model, truncate_mode=&quot;level&quot;, p=4)
plt.xlabel(&quot;Euclidean Distance&quot;)
plt.ylabel(&quot;Number of points in node (or index of point if no parenthesis).&quot;)
plt.show()</pre></div>



<p>Source: This code block is based on code <a href="https://scikit-learn.org/stable/auto_examples/cluster/plot_agglomerative_dendrogram.html" target="_blank" rel="noreferrer noopener">from the scikit-learn page</a></p>



<figure class="wp-block-image size-full"><img decoding="async" width="575" height="453" data-attachment-id="11396" data-permalink="https://www.relataly.com/customer-segmentation-using-hierarchical-clustering-in-python/11335/image-53-4/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2022/12/image-53.png" data-orig-size="575,453" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-53" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2022/12/image-53.png" src="https://www.relataly.com/wp-content/uploads/2022/12/image-53.png" alt="" class="wp-image-11396" srcset="https://www.relataly.com/wp-content/uploads/2022/12/image-53.png 575w, https://www.relataly.com/wp-content/uploads/2022/12/image-53.png 300w" sizes="(max-width: 575px) 100vw, 575px" /></figure>



<h2 class="wp-block-heading">Summary</h2>



<p>In conclusion, hierarchical clustering is a powerful tool for customer segmentation that can help businesses better understand their customer base and target their marketing efforts more effectively. By grouping customers into clusters based on their characteristics and behaviors, companies can create targeted campaigns and personalize their marketing efforts to better meet the needs of each group. Using Python and the scikit-learn library, we were able to apply an agglomerative clustering approach to a dataset of customer data and identify two distinct segments. We can then use these segments to inform our marketing strategies and get a better understanding of our customers.</p>



<p>By the way, customer segmentation is an area where real-world data can be prone to bias and unfairness. If you&#8217;re concerned about this, check out our latest article on <a href="https://www.relataly.com/building-fair-machine-machine-learning-models-with-fairlearn/12804/" target="_blank" rel="noreferrer noopener">addressing fairness in machine learning with fairlearn</a>.</p>



<p>I hope this article was useful. If you have any feedback, please write your thoughts in the comments. </p>



<h2 class="wp-block-heading">Sources and Further Reading</h2>



<p>Articles</p>



<ul class="wp-block-list">
<li><a href="https://scikit-learn.org/stable/auto_examples/cluster/plot_agglomerative_dendrogram.html" target="_blank" rel="noreferrer noopener">https://scikit-learn.org/stable/auto_examples/cluster/plot_agglomerative_dendrogram.html</a></li>



<li>Images generated with OpenAI Dall-E and Midjourney.</li>
</ul>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<div class="wp-block-group"><div class="wp-block-group__inner-container is-layout-constrained wp-block-group-is-layout-constrained">
<h4 class="wp-block-heading"><strong>Books on Clustering</strong></h4>



<ul class="wp-block-list">
<li><a href="https://amzn.to/3Gb5kfj" target="_blank" rel="noreferrer noopener">&#8220;Data Clustering: Algorithms and Applications&#8221; by Charu C. Aggarwal</a>: This book covers a wide range of clustering algorithms, including hierarchical clustering, and discusses their applications in various fields.</li>



<li><a href="https://amzn.to/3WmhGXB" target="_blank" rel="noreferrer noopener">&#8220;Data Mining: Practical Machine Learning Tools and Techniques&#8221; by Ian H. Witten and Eibe Frank</a>: This book is a comprehensive introduction to data mining and machine learning, including a chapter on hierarchical clustering.</li>
</ul>



<div style="display: inline-block;">
<iframe sandbox="allow-popups allow-scripts allow-modals allow-forms allow-same-origin" style="width:120px;height:240px;" marginwidth="0" marginheight="0" scrolling="no" frameborder="0" src="//ws-eu.amazon-adsystem.com/widgets/q?ServiceVersion=20070822&amp;OneJS=1&amp;Operation=GetAdHtml&amp;MarketPlace=DE&amp;source=ss&amp;ref=as_ss_li_til&amp;ad_type=product_link&amp;tracking_id=flo7up-21&amp;language=de_DE&amp;marketplace=amazon&amp;region=DE&amp;placement=0128042915&amp;asins=0128042915&amp;linkId=1e9fe160a76f7255e3eea8e0119ca74f&amp;show_border=true&amp;link_opens_in_new_window=true"></iframe>

<iframe sandbox="allow-popups allow-scripts allow-modals allow-forms allow-same-origin" style="width:120px;height:240px;" marginwidth="0" marginheight="0" scrolling="no" frameborder="0" src="//ws-eu.amazon-adsystem.com/widgets/q?ServiceVersion=20070822&amp;OneJS=1&amp;Operation=GetAdHtml&amp;MarketPlace=DE&amp;source=ss&amp;ref=as_ss_li_til&amp;ad_type=product_link&amp;tracking_id=flo7up-21&amp;language=de_DE&amp;marketplace=amazon&amp;region=DE&amp;placement=B00EYROAQU&amp;asins=B00EYROAQU&amp;linkId=ba1fcb8a59417e729afe33f6eceb2a9f&amp;show_border=true&amp;link_opens_in_new_window=true"></iframe>
</div>
</div></div>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<div class="wp-block-group"><div class="wp-block-group__inner-container is-layout-constrained wp-block-group-is-layout-constrained">
<h4 class="wp-block-heading"><strong>Books on Machine Learning</strong></h4>



<ul class="wp-block-list">
<li><a href="https://amzn.to/3S9Nfkl" target="_blank" rel="noreferrer noopener">Aurélien Géron (2019) Hands-On Machine Learning</a></li>



<li><a href="https://amzn.to/3EKidwE" target="_blank" rel="noreferrer noopener">David Forsyth (2019) Applied Machine Learning Springer</a></li>
</ul>



<div style="display: inline-block;">

  <iframe sandbox="allow-popups allow-scripts allow-modals allow-forms allow-same-origin" style="width:120px;height:240px;" marginwidth="0" marginheight="0" scrolling="no" frameborder="0" src="//ws-eu.amazon-adsystem.com/widgets/q?ServiceVersion=20070822&amp;OneJS=1&amp;Operation=GetAdHtml&amp;MarketPlace=DE&amp;source=ss&amp;ref=as_ss_li_til&amp;ad_type=product_link&amp;tracking_id=flo7up-21&amp;language=de_DE&amp;marketplace=amazon&amp;region=DE&amp;placement=3030181162&amp;asins=3030181162&amp;linkId=669e46025028259138fbb5ccec12dfbe&amp;show_border=true&amp;link_opens_in_new_window=true"></iframe>

<iframe sandbox="allow-popups allow-scripts allow-modals allow-forms allow-same-origin" style="width:120px;height:240px;" marginwidth="0" marginheight="0" scrolling="no" frameborder="0" src="//ws-eu.amazon-adsystem.com/widgets/q?ServiceVersion=20070822&amp;OneJS=1&amp;Operation=GetAdHtml&amp;MarketPlace=DE&amp;source=ss&amp;ref=as_ss_li_til&amp;ad_type=product_link&amp;tracking_id=flo7up-21&amp;language=de_DE&amp;marketplace=amazon&amp;region=DE&amp;placement=1492032646&amp;asins=1492032646&amp;linkId=2214804dd039e7103577abd08722abac&amp;show_border=true&amp;link_opens_in_new_window=true"></iframe>
</div>
</div></div>
</div>
</div>



<p class="has-contrast-2-color has-base-3-background-color has-text-color has-background"><em>The links above to Amazon are affiliate links. By buying through these links, you support the Relataly.com blog and help to cover the hosting costs. Using the links does not affect the price.</em></p>



<p><strong>Relataly articles on clustering and machine learning</strong></p>



<ul class="wp-block-list">
<li><a href="https://www.relataly.com/simple-cluster-analysis-with-k-means-with-python/5070/" target="_blank" rel="noreferrer noopener">Simple Clustering using K-means in Python</a>: This article gives an overview of cluster analysis with k-means.</li>



<li><a href="https://www.relataly.com/crypto-market-cluster-analysis-using-affinity-propagation-python/8114/" target="_blank" rel="noreferrer noopener">Clustering crypto markets using affinity propagation in Python</a>: This article applies cluster analysis to crypto markets and creates a market map for various cryptocurrencies.</li>



<li><a href="https://www.relataly.com/building-fair-machine-machine-learning-models-with-fairlearn/12804/" target="_blank" rel="noreferrer noopener">Addressing fairness in machine learning with the fairlearn library</a></li>
</ul>



<p></p>
<p>The post <a href="https://www.relataly.com/customer-segmentation-using-hierarchical-clustering-in-python/11335/">How to Use Hierarchical Clustering For Customer Segmentation in Python</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.relataly.com/customer-segmentation-using-hierarchical-clustering-in-python/11335/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">11335</post-id>	</item>
	</channel>
</rss>
