Predicting Flight Delays using Azure Machine Learning

If you travel a lot, you have probably experienced this: you rush to the airport to catch a flight, only to find out on arrival that the flight is delayed anyway. Wouldn’t it be great to know in advance when a flight is going to be delayed? This would make travelling more relaxed. Well, there is a solution, and it is based on machine learning. We can use past data on delayed flights to develop a classification model that predicts flight delays. More specifically, this is a statistical model that estimates the probability of a given flight being delayed or on time. In this blog post, I show how to develop a prediction model in the Azure ML Studio (classic) workbench that predicts whether a flight will be more or less than 15 minutes late. All you need is a Microsoft Live account and about 30 to 40 minutes of your time.

The predictive model will use a boosted decision tree algorithm to search for patterns in the data of past flight connections and apply these patterns to classify flights into two classes: delayed and not delayed. Models of this kind have practical relevance: travel service providers use them to warn customers when a flight is likely to be delayed.

1 Access Microsoft Azure ML Studio

This guide will use Microsoft’s data science workbench Azure ML Studio (classic). The workbench provides comprehensive functionality such as creating data pipelines, training and testing machine learning models, and publishing trained models as a web service via an API.

The studio offers free trial access. You can create a free test account (valid for 8 hours) via “Sign up here” on Azure ML Studio or log in with an existing Microsoft Live account. After a successful login, you should see the experiments section:

Welcome Screen in Azure ML Studio Classic
Experiments section of the Azure ML Studio

2 Importing Training Data into Azure ML

In this tutorial, we will work with the CSV dataset “FlightDelayData”. You can download it from the link below:

After you have downloaded the dataset, you can import it into Azure ML Studio. To do this, navigate to “Experiments” and click on “+ New” at the bottom left. On the following page, select “Dataset” on the left and then “Upload Local File”. Select the file FlightDelayData and confirm the upload.

Uploading a new dataset

Confirm the dialog to access the experiment workspace. Here, you will find a list of different modules (highlighted in light blue) to the left of the workspace. The modules provide all central functions in Azure ML Studio, such as transforming and exploring the data as well as using them in machine learning.

The module tab of Azure ML

3 Exploring the Data

Now that the data set is available in Azure ML, we will prepare it for its use in the training of our flight delay prediction model. First, we will drag and drop the FlightDelayData dataset from “Saved Datasets” into the grey workspace of the experiment. Next, we will visualize the data by right-clicking on FlightDelayData –> “dataset” –> “Visualize” in the grey work area.

Clicking on the individual columns will give you an overview of the characteristics and the distribution of the data. In the upper left corner, you can see that the dataset contains 135970 entries for flight connections. Each entry, or row, represents one flight. All flights took place in 2013. The data also includes the departure and arrival locations of the flights, the time and day of departure and arrival, the airline, and the deviation from the planned take-off and landing times.
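If you would rather get the same kind of overview in code, the sketch below counts rows and label values with Python's standard library. It is only an illustration: the four rows are made up, the column names "Carrier" and "DepDelay" are hypothetical, and I assume that "ArrDel15" holds 0 or 1. For the real file, you would open the downloaded CSV instead of the inline sample.

```python
import csv
import io
from collections import Counter

# Made-up sample rows; "Carrier" and "DepDelay" are hypothetical column names.
# "ArrDel15" is assumed to be 1 for flights that arrived 15+ minutes late.
sample = io.StringIO(
    "Carrier,DepDelay,ArrDel15\n"
    "AA,-3,0\n"
    "DL,42,1\n"
    "UA,5,0\n"
    "AA,18,1\n"
)

rows = list(csv.DictReader(sample))

# Count how often each label value occurs (values are read as strings).
label_counts = Counter(row["ArrDel15"] for row in rows)
delayed_share = label_counts["1"] / len(rows)

print(f"rows: {len(rows)}, delayed share: {delayed_share:.2f}")
```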

4 Creating a Data Pipeline

Before we can train the model, we need to split the data into two parts: train and test. We will use the first part of the data to train the ML model and the second part to evaluate its predictions on flights it has not seen during training. Holding back part of the labelled data for testing is a common practice in supervised learning. To split the data, search for the “Split Data” module in the search list on the left and drag and drop it into the grey workspace. After this, you can connect the two modules by clicking on the output of the dataset (FlightDelayData) and dragging it to the input of the “Split Data” module (see screenshot).

Next we configure the Split Data module. Click on the module and make the following settings on the right side under “Properties”: Fraction of rows in the first output dataset: 0.7 and Random seed: 123.

In this way, we divide the data randomly in a 70/30 ratio. You can leave the other settings as they are.
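To make the effect of these two settings more concrete, here is a minimal Python sketch of a seeded random 70/30 split. It is not how the Split Data module is implemented, and the same seed will not select the same rows as Azure ML; the function name split_rows and the stand-in row ids are assumptions for illustration.

```python
import random

def split_rows(rows, fraction=0.7, seed=123):
    """Shuffle a copy of the rows and cut it into (first, second) parts."""
    rng = random.Random(seed)      # fixed seed makes the split repeatable
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

# Stand-in row ids, one per flight in the dataset (135970 entries).
rows = list(range(135970))
train, test = split_rows(rows)

print(len(train), len(test))  # 95179 40791
```

With 135970 rows, the second part contains 40791 rows, which is the size of the test set used later in the evaluation.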

Splitting the data into train and test

(In practice, the compilation and preparation of the data is of course much more complex. To simplify this example, I have already carried out some steps in advance.)

5 Creating a Classification Model

Now we will create a classification model. To do this, we will pull further modules into the grey area of the workbench. Our model will use a boosted decision tree classifier. We can use this algorithm by dragging the module “Two-Class Boosted Decision Tree” into the grey workspace below the other modules. You can leave the settings of the module unchanged.

Next, we select the module “Train Model” and drag it into the grey workspace under the other modules. In the workspace, connect the output of the “Two-Class Boosted Decision Tree” module to the left input of the “Train Model” module.

Remember, we want to predict whether flights will be more or less than 15 minutes late. To do this, select “Train Model” in the grey workspace and click on “Launch Column Selector” under Properties on the right. In the Column Selector, enter “ArrDel15” under “Column Name”. This column contains the information whether flights were more or less than 15 minutes late. This is the so-called “prediction label”. Don’t forget to connect the left output of the Split Data module to the right input of Train Model.

To be able to evaluate the predictions of the model later, we will add a “Score Model” module. We do this by selecting the module “Score Model” and dragging it into the workspace below the other modules. Finally, we need to create two connections. First, connect the left input of Score Model with the output of Train Model. Second, connect the right input of Score Model to the right output of Split Data, which is the 30% of the original dataset we use to test the model.
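With this many connections, it is easy to miss one. One way to double-check the wiring described so far is to write it down as a small dependency graph and ask for a valid run order. The sketch below uses Python's graphlib for this; it only mirrors the connections from the text and says nothing about how Azure ML schedules modules internally.

```python
from graphlib import TopologicalSorter

# Each module maps to the modules whose outputs feed its inputs,
# following the connections described in the text.
experiment = {
    "Split Data": {"FlightDelayData"},
    "Train Model": {"Two-Class Boosted Decision Tree", "Split Data"},
    "Score Model": {"Train Model", "Split Data"},
}

def run_order(graph):
    """Return one order in which every module comes after its inputs."""
    return list(TopologicalSorter(graph).static_order())

order = run_order(experiment)
print(order)
```

Several orders can be valid, so the exact list may vary. What matters is that Split Data comes before Train Model and Train Model comes before Score Model.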

Selecting the prediction label

6 Training the Model

Before we can train the model, we add the module “Evaluate Model” by searching for it in the module tab and dragging it into the workspace. Then we connect the (left) input of Evaluate Model with the output of Score Model.

Creating a machine learning model

Congratulations! You are ready to train the model. Start the training process by clicking “Run” in the dark bar at the bottom. It may take a few minutes until the process has finished. Meanwhile, you can monitor the progress by watching the green checkmarks that appear on the modules.

Model after successful training

7 Evaluating Model Performance

So far, we have built a statistical model for flight delay prediction. Of course, we want to know how often the model’s predictions are right or wrong. Evaluating the performance of prediction models is therefore an important step in their development. To evaluate the model performance, right-click on “Evaluate Model” -> “Evaluation results” -> “Visualize”. Below you can see the receiver operating characteristic (ROC) of the trained model:

Metrics used to evaluate the performance of a classification model

Let’s look at the different metrics at the bottom.

  • The test data set contains a total of 40791 flights, which is 30% of the original data.
  • For 2098 of these flights, the model correctly predicted a delay of more than 15 minutes (true positives).
  • In 1310 cases, the model predicted a delay of more than 15 minutes, but the flight was not that late (false positives).
  • 6825 flights were more than 15 minutes late although the model predicted otherwise (false negatives).
  • In 30558 cases, the model correctly predicted that the flight would be less than 15 minutes late (true negatives).
  • Overall, the model is correct in about 80% of the cases (Accuracy = 0.801).
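The accuracy can be recomputed from the four counts in the list above. The short sketch below does this and also derives precision, recall and F1 score from the same counts. The function name classification_metrics is my own; the formulas are the standard definitions.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard metrics from the four confusion matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)   # share of predicted delays that were real
    recall = tp / (tp + fn)      # share of real delays that were predicted
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

# Counts taken from the evaluation results listed above.
metrics = classification_metrics(tp=2098, fp=1310, fn=6825, tn=30558)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
# accuracy: 0.801
# precision: 0.616
# recall: 0.235
# f1: 0.340
```

The recall of about 0.235 shows that, with these counts, the model finds roughly one in four of the flights that were actually delayed, which the accuracy alone does not reveal.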

Finally, it’s a good idea to take a look at the ROC curve. The curve shows how the true positive rate and the false positive rate of the model change with the prediction threshold. The larger the area under the curve, the better the prediction model. The grey diagonal line corresponds to random guessing, which gives an area of 0.5. A perfect model that is correct for every flight would have an area of 1.0. Our curve slopes upwards and lies above the grey line. This shows that the model works better than random guessing.
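To see where the curve and its area come from, the following sketch builds ROC points by lowering the threshold one scored flight at a time and then measures the area with the trapezoidal rule. The six labels and scores are invented toy values, not results from the model above, and the function names are my own.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by lowering the threshold step by step.

    Assumes distinct scores and at least one example of each class.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), reverse=True)  # highest score first
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _score, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points

def area_under_curve(points):
    """Trapezoidal rule over consecutive (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Toy data: 1 = delayed, 0 = not delayed; scores are predicted probabilities.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

auc = area_under_curve(roc_points(labels, scores))
diagonal = area_under_curve([(0.0, 0.0), (1.0, 1.0)])
print(f"toy AUC: {auc:.3f}, diagonal: {diagonal:.1f}")
```

In the toy data, one non-delayed flight is scored higher than a delayed one, so the area ends up below 1.0 but well above the 0.5 of the diagonal.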


In this tutorial, you have learned to create a flight delay prediction model that predicts with about 80% accuracy whether flights on certain routes will be more or less than 15 minutes late.

Of course, the prediction model is only a first version and still offers a lot of potential for optimization. One option to further improve the model would be to add further information like the weather, the aircraft type, etc. Another option would be to test different algorithms and settings.

I hope you found this blog post useful. Please leave a comment if you have remarks or questions.


  • Hi, my name is Florian! I am a Zurich-based Data Scientist with a passion for Artificial Intelligence and Machine Learning. After completing my PhD in Business Informatics at the University of Bremen, I started working as a Machine Learning Consultant for the Swiss consulting firm ipt. When I'm not working on use cases for our clients, I work on my own analytics projects and report on them in this blog.

Follow Florian Müller:

Data Scientist & Machine Learning Consultant

