Forecasting Criminal Activity in San Francisco using XGBoost and Python

I recently came across an interesting Kaggle contest that involves predicting different types of criminal activity in San Francisco. Not surprisingly, in a huge city like San Francisco, numerous crimes occur daily. Among the most commonly reported are vehicle theft, … Continued

Forecasting Beer Sales with ARIMA in Python

ARIMA (AutoRegressive Integrated Moving Average) is a statistical modeling technique for time series forecasting. Compared to machine learning, ARIMA is a classical modeling technique that is particularly strong when the time series to be analyzed follows a clear … Continued

Getting started with Image Recognition: Classifying Cats and Dogs

This article kicks off a new blog series on image recognition and classification with Convolutional Neural Networks (CNNs). CNNs belong to the field of deep learning, a subarea of machine learning, and have become a cornerstone of many exciting innovations … Continued

Anyone About to Leave? Predicting Customer Churn of a Telecommunications Provider

Telecommunications service providers face considerable pressure to expand and retain their subscriber base. One of the biggest cost factors is customers cancelling their contracts. Innovative service providers have therefore learned to use machine learning to predict which of their customers … Continued

Hyperparameter Tuning of a Random Forest Classifier using Grid Search in Python

The behavior of machine learning models is controlled by their hyperparameters. The choice of these parameters often has a significant impact on model performance and, in practice, can make the difference between adequate and outstanding performance. Data scientists therefore … Continued

Feature Engineering for Multivariate Time Series Prediction with Python

Multivariate time series predictions, and especially stock market forecasts, pose challenging machine learning problems. Unlike univariate forecasting models, multivariate models do not rely exclusively on historical time series data, but use additional features that are often developed from the time … Continued