Building a Content-based Movie Recommender in Python

Content-based recommender systems are a popular type of machine learning algorithm that recommends relevant items based on what a user has previously consumed or liked. This approach aims to identify items with certain keywords, understand what the customer likes, and then identify other items that are similar to items the …
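
To make the idea concrete, here is a minimal sketch. It assumes a hypothetical catalogue in which each movie is described by a handful of keywords; the titles and keywords are made up. Each description is turned into a bag-of-words vector, and the movies whose vectors are most similar (by cosine similarity) to a movie the user liked are recommended. A real system would typically weight terms, for example with TF-IDF, and draw on richer metadata.

```python
import math
from collections import Counter

# Hypothetical toy catalogue: each movie is described by a few content keywords.
MOVIES = {
    "Movie A": "space adventure alien sci-fi",
    "Movie B": "space sci-fi robot future",
    "Movie C": "romance comedy wedding",
    "Movie D": "comedy romance paris",
}


def vectorize(text):
    """Bag-of-words term counts for one item description."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(liked_title, catalogue, k=2):
    """Return the k items whose content is most similar to an item the user liked."""
    liked = vectorize(catalogue[liked_title])
    scores = [
        (title, cosine_similarity(liked, vectorize(description)))
        for title, description in catalogue.items()
        if title != liked_title  # never recommend the item the user already liked
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


print(recommend("Movie A", MOVIES))
```

Here "Movie B" ranks first for a user who liked "Movie A" because the two share the keywords "space" and "sci-fi". No information about other users is needed, which is what distinguishes a content-based recommender from a collaborative one.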

Clustering Financial Market Structures using Affinity Propagation in Python

Affinity propagation is an unsupervised clustering technique that stands out from other clustering approaches because it determines the number of clusters in a dataset by itself. This tutorial demonstrates this ability by applying the technique to analyze the structure of the crypto market. We perform a cluster analysis of historical prices of …
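
Affinity propagation repeatedly exchanges two kinds of messages between data points: responsibilities and availabilities. It continues until a set of "exemplars" emerges, and each exemplar defines a cluster, so the number of clusters is an outcome rather than an input. The following is a minimal from-scratch sketch of those message updates. It runs on hypothetical toy 2-D points, not the tutorial's crypto price data or code. In practice, scikit-learn provides `sklearn.cluster.AffinityPropagation`.

```python
import statistics


def affinity_propagation(points, damping=0.5, iterations=200, preference=None):
    """Cluster points by exchanging responsibility and availability messages.

    Returns (exemplar_indices, labels). The number of clusters is not an input.
    """
    n = len(points)
    # Similarity = negative squared Euclidean distance.
    S = [[-sum((a - b) ** 2 for a, b in zip(points[i], points[k])) for k in range(n)]
         for i in range(n)]
    # The self-similarity ("preference") influences how many exemplars emerge;
    # the median of the off-diagonal similarities is a common default.
    if preference is None:
        preference = statistics.median(
            S[i][k] for i in range(n) for k in range(n) if i != k)
    for k in range(n):
        S[k][k] = preference

    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')], damped.
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(scores[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k))), damped.
        for k in range(n):
            for i in range(n):
                support = sum(max(0.0, R[j][k]) for j in range(n) if j not in (i, k))
                new = support if i == k else min(0.0, R[k][k] + support)
                A[i][k] = damping * A[i][k] + (1 - damping) * new

    # Points whose self-responsibility plus self-availability is positive are exemplars.
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        raise RuntimeError("no exemplars found; try more iterations or a higher preference")
    # Every non-exemplar joins its most similar exemplar.
    labels = [exemplars.index(i) if i in exemplars
              else max(range(len(exemplars)), key=lambda c: S[i][exemplars[c]])
              for i in range(n)]
    return exemplars, labels


# Hypothetical toy data: two well-separated groups of three points each.
POINTS = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
exemplars, labels = affinity_propagation(POINTS)
print(exemplars, labels)
```

The cluster count is never passed in. Raising `preference` toward zero tends to produce more exemplars, while lowering it tends to produce fewer.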

Using Random Search to Tune the Hyperparameters of a Random Decision Forest with Python

Random search is an efficient method for automatically tuning the hyperparameters of machine learning models. Hyperparameters are model properties (e.g., the number of estimators for an ensemble model). Unlike model parameters, hyperparameters are not learned by the algorithm during training. Instead, we need to specify them in advance. Finding …
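
Random search draws each hyperparameter from a distribution for a fixed number of trials and keeps the best-scoring configuration. The sketch below assumes a hypothetical `toy_validation_score` function in place of actually training a random forest and measuring its validation performance, so it runs instantly with only the standard library. The search-space names mirror scikit-learn's `RandomForestRegressor` parameters, but the ranges and the scoring function are illustrative assumptions.

```python
import random

# Illustrative search space: one sampler per hyperparameter (ranges are assumptions).
SEARCH_SPACE = {
    "n_estimators": lambda rng: rng.randint(10, 500),
    "max_depth": lambda rng: rng.choice([None, 3, 5, 10, 20]),
    "min_samples_split": lambda rng: rng.randint(2, 10),
}


def toy_validation_score(params):
    """Hypothetical stand-in for 'train the model and return a validation score'.

    It simply rewards more trees and a depth near 10; None is treated as depth 30.
    """
    depth = params["max_depth"] or 30
    return (params["n_estimators"] / 500
            - abs(depth - 10) / 30
            - params["min_samples_split"] / 100)


def random_search(space, score_fn, n_iter=50, seed=0):
    """Evaluate n_iter randomly sampled configurations and keep the best one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        # Sample every hyperparameter independently from its distribution.
        params = {name: sample(rng) for name, sample in space.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


best, score = random_search(SEARCH_SPACE, toy_validation_score)
print(best, round(score, 3))
```

Unlike grid search, the cost is fixed by `n_iter` rather than by the size of the grid. In scikit-learn, `RandomizedSearchCV` wraps the same loop and adds cross-validation.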

PySpark Weather Analytics

Apache Spark is a popular distributed computing framework for Big Data processing and analytics. In this tutorial, we will work hands-on with PySpark, Spark’s Python interface. We build on the conceptual knowledge gained in a previous tutorial: Introduction to BigData Analytics with Apache Spark, in which we learned about the …

Getting Started with Big Data Analytics – Apache Spark Concepts and Architecture

Apache Spark is one of the most popular open-source engines for Big Data processing and analytics. Its distributed architecture processes workloads in a highly parallelized manner, which lets Spark achieve high computational efficiency, especially on large data sets. This article aims to help you familiarize yourself …
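
The parallelism described above rests on a simple pattern: split the data into partitions, process each partition independently, then merge the partial results with an associative operation. The sketch below is only a single-machine analogy of that pattern using Python's `concurrent.futures`; it is not Spark or PySpark code. The function names are hypothetical, and threads stand in for Spark executors. Because of CPython's global interpreter lock, these threads will not actually speed up CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def partition(data, num_partitions):
    """Split a list into roughly equal contiguous chunks (loosely like data partitions)."""
    if not data:
        raise ValueError("data must be non-empty")
    size = -(-len(data) // num_partitions)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def partial_stats(chunk):
    """Work done independently per partition: a local count and sum."""
    return len(chunk), sum(chunk)


def combine(a, b):
    """Merge two partial results; associative, so the merge order does not matter."""
    return a[0] + b[0], a[1] + b[1]


def parallel_mean(data, num_partitions=4):
    """Compute a mean by processing partitions concurrently and merging partial results."""
    chunks = partition(data, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(partial_stats, chunks))
    count, total = reduce(combine, partials)
    return total / count


print(parallel_mean(list(range(1, 101))))
```

The key design choice is that each partition returns a small partial result, here a (count, sum) pair, rather than raw data. This keeps the final merge step cheap no matter how large the partitions are.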