Predicting Flight Delays using Azure Machine Learning

If you travel a lot, you’ve probably experienced this: you’re rushing to the airport to catch a flight, only to find out on arrival that your flight is delayed. Wouldn’t it be great to know in advance when a flight is going to be delayed? Well, there is a solution, and it uses machine learning. We can use past data on delayed flights to develop a classification model that predicts flight delays. This article shows how this works by creating a flight-delay prediction model in the Azure ML Studio (classic) workbench. The model uses a boosted decision tree algorithm to predict whether a flight will arrive more or less than 15 minutes late. In this approach, the algorithm searches for patterns in the data of past flights and applies these patterns to classify flights into two classes: delayed and not delayed. Travel service providers use similar models to warn customers when a flight is likely to be delayed.

Step #1 Access Microsoft Azure ML Studio

This article uses Microsoft’s data science workbench Azure ML Studio (classic). The workbench provides comprehensive functions such as creating data pipelines, training and testing machine learning models, and publishing trained models as a web service via an API.

The studio is available via free trial access. You can create a free test account (valid for 8 hours) via “Sign up here” on Azure ML Studio or log in with an existing Microsoft Live account. After a successful login, you will see the experiments section:

Welcome Screen in Azure ML Studio Classic
Experiments section of the Azure ML Studio

Step #2 Importing Training Data into Azure ML

In this tutorial, we will work with the CSV dataset “FlightDelayData.” You can download it from the link below:

After you have downloaded the dataset, you can import it into Azure ML Studio. To do this, navigate to “Experiments” and click on “+ New” at the bottom left. On the following page, select “Dataset” on the left and then “Upload Local File.” Select the file FlightDelayData and confirm the upload.

Uploading a new dataset

Confirm the dialog to access the experiment workspace. Here, you will find a list of different modules (highlighted in light blue) to the left of the workspace. The modules provide all central functions in Azure ML Studio, such as transforming and exploring the data and using them in machine learning.

The module tab of Azure ML

Step #3 Exploring the Data

Now that the data set is available in Azure ML, we will prepare it for its use in the training of our flight delay prediction model. First, we will drag and drop the FlightDelayData dataset from “Saved Datasets” into the grey workspace of the experiment. Next, we will visualize the data by right-clicking on FlightDelayData –> “dataset” –> “Visualize” in the grey work area.

Clicking on the individual columns will give you an overview of the characteristics and the distribution of the values in each column. In the upper left corner, you can see that the dataset contains 135970 entries. Each entry (row) represents one flight. All flights took place in 2013. Furthermore, the data includes the departure and arrival locations of the flights, the time and day of departure and arrival, the airline, and the deviation from the planned take-off and landing time.

Step #4 Creating a Data Pipeline

Before we can train the model, we need to split the data into two parts: train and test. We will use the first part of the data to train the ML model and the second part to evaluate its predictions. Holding out part of the labeled data for testing is standard practice in supervised learning, because it shows how well the model performs on flights it has not seen during training. To split the data, search for the “Split Data” module in the search list on the left and drag and drop it into the grey workspace. After this, you can connect the two modules by clicking on the output of the data set (FlightDelayData) and dragging it to the input of the “Split Data” module (see screenshot).

Next, we configure the Split Data module. Click on the module and make the following settings on the right side under “Properties”: Fraction of rows in the first output dataset: 0.7 and Random seed: 123.

In this way, we divide the data randomly in a 70/30 ratio. You can leave the other settings as they are.

Splitting the data into train and test
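For readers who think in code, the idea behind a seeded 70/30 split can be sketched in a few lines of Python. This is only an illustration of the principle, not what the Split Data module does internally; the function name `split_rows` is made up for this example, and the row indices stand in for the actual flight records. The same seed in Python will not reproduce the same row assignment as in Azure ML Studio.

```python
import random


def split_rows(rows, fraction=0.7, seed=123):
    """Shuffle the rows with a fixed seed and cut them into two parts."""
    rng = random.Random(seed)      # fixed seed makes the split repeatable
    shuffled = list(rows)          # copy, so the input stays untouched
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * fraction))
    return shuffled[:cut], shuffled[cut:]


# Row indices stand in for the 135970 flights of the dataset.
rows = list(range(135970))
train, test = split_rows(rows)
print(len(train), len(test))
```

With 135970 rows and a fraction of 0.7, the second part contains 40791 rows, which matches the size of the test set reported later in the evaluation.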

(In practice, the compilation and preparation of the data are, of course, much more complex. To simplify this example, I have already carried out some steps in advance.)

Step #5 Creating a Classification Model

Now we will create a classification model. To do so, we will drag further modules into the grey area of the workbench. Our model will use a boosted decision tree classifier. We can use this algorithm by dragging the module “Two-Class Boosted Decision Tree” into the grey workspace below the other modules. You can leave the settings of the module unchanged.
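To give an intuition for what "boosted" means, here is a heavily simplified Python sketch. It is not the implementation behind the Azure module: it uses one-split trees (stumps) on a single feature, and the feature (a delay in minutes) and the toy data are invented for illustration. The general idea it shows is that each new tree is fitted to the errors that the ensemble built so far still makes.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, residuals):
    """Find the threshold whose two-sided means best fit the residuals."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue  # a split must leave rows on both sides
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, t, left_mean, right_mean)
    return best[1], best[2], best[3]


def fit_boosted_stumps(xs, ys, n_rounds=20, learning_rate=0.5):
    """Add stumps one by one, each fitted to the current errors."""
    scores = [0.0] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # Error of the current ensemble: label minus predicted probability.
        residuals = [y - sigmoid(s) for y, s in zip(ys, scores)]
        t, left_value, right_value = fit_stump(xs, residuals)
        stump = (t, learning_rate * left_value, learning_rate * right_value)
        stumps.append(stump)
        scores = [s + (stump[1] if x <= t else stump[2])
                  for x, s in zip(xs, scores)]
    return stumps


def predict_proba(stumps, x):
    """Sum the contributions of all stumps and map them to a probability."""
    score = sum(left if x <= t else right for t, left, right in stumps)
    return sigmoid(score)


# Invented toy data: a delay in minutes and a 0/1 "delayed" label.
xs = [0, 2, 5, 8, 12, 20, 25, 30, 45, 60]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
stumps = fit_boosted_stumps(xs, ys)
print(predict_proba(stumps, 3), predict_proba(stumps, 40))
```

On this toy data, the predicted probability for a small delay ends up below 0.5 and the one for a large delay above 0.5. A real boosted tree model works with many features and deeper trees, but follows the same additive principle.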

Next, we select the module “Train Model” and drag it into the grey workspace under the other modules. In the workspace, connect the output of the “Two-Class Boosted Decision Tree” module to the left input of the “Train Model” module.

Remember, we want to predict whether flights will be more or less than 15 minutes late. To do this, select “Train Model” in the grey workspace and click on “Launch Column Selector” under Properties on the right. In the Column Selector, enter “ArrDel15” under “Column Name”. This column contains the so-called “prediction label,” which is the information on whether flights were more or less than 15 minutes late. Don’t forget to connect the left output of the Split Data module to the right input of the Train Model.

To later evaluate the predictions of the model, we will add a “Score Model” module. We do this by selecting the module “Score Model” and dragging it into the workspace below the other modules. Finally, we need to create two connections. First, connect the left input of “Score Model” with the output of “Train Model.” Second, connect the right input of “Score Model” to the right output of “Split Data,” which provides the 30% of the original data set that we use to test the model.

Selecting the prediction label
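A binary label such as ArrDel15 can be thought of as a simple rule applied to the arrival delay in minutes. The following Python sketch illustrates this; the function name and the sample delays are made up, and whether a delay of exactly 15 minutes counts as delayed in this dataset is an assumption of the example.

```python
def arr_del15(arrival_delay_minutes, threshold=15):
    """Return 1 if the flight counts as delayed, otherwise 0.

    Assumption: a delay equal to the threshold already counts as delayed.
    """
    return 1 if arrival_delay_minutes >= threshold else 0


# Made-up arrival delays in minutes (negative = arrived early).
delays = [-5, 0, 14, 15, 42]
labels = [arr_del15(d) for d in delays]
print(labels)  # [0, 0, 0, 1, 1]
```

The model never sees this rule. It only sees the resulting 0/1 column and learns to predict it from the other columns.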

Step #6 Training the Model

Before we can train the model, we add the module “Evaluate Model” by searching for it in the module tab and dragging it into the workspace. Finally, we connect the (left) input of “Evaluate Model” with the output of “Score Model.”

Creating a machine learning model
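Conceptually, the evaluation compares the model's scores for the test flights with the labels that are actually in the data. A minimal Python sketch of that comparison could look as follows. The function name, the sample values, and the threshold of 0.5 are assumptions for illustration and are not taken from the experiment.

```python
def confusion_counts(actual, probabilities, threshold=0.5):
    """Count true/false positives and negatives for a given threshold."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for label, probability in zip(actual, probabilities):
        predicted = 1 if probability >= threshold else 0
        if predicted == 1 and label == 1:
            counts["tp"] += 1      # predicted delayed, was delayed
        elif predicted == 1 and label == 0:
            counts["fp"] += 1      # predicted delayed, was on time
        elif predicted == 0 and label == 1:
            counts["fn"] += 1      # predicted on time, was delayed
        else:
            counts["tn"] += 1      # predicted on time, was on time
    return counts


# Made-up labels and scored probabilities for six flights.
actual = [1, 0, 1, 0, 0, 1]
probabilities = [0.9, 0.6, 0.3, 0.2, 0.1, 0.7]
counts = confusion_counts(actual, probabilities)
print(counts)  # {'tp': 2, 'fp': 1, 'fn': 1, 'tn': 2}
```

Changing the threshold shifts flights between the four cells, which is why the metrics of a classifier always depend on the chosen threshold.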

Congratulations! You are ready to train the model. Start the training process by clicking “Run” in the dark bar at the bottom. It may take a few minutes until the process has finished. Meanwhile, you can monitor the progress of the processing by the green checkmarks shown on the modules.

Model after successful training

Step #7 Evaluating Model Performance

So far, we have built a statistical model for flight delay prediction. Of course, we want to know how often our model is right or wrong with its predictions. Evaluating the performance of prediction models is thus an important step in their development. To evaluate the model performance, right-click on “Evaluate Model” -> “Evaluation results” -> “Visualize”. Below, you will find the receiver operating characteristic (ROC) of the trained model:

ROC Curve and Results of the Flight Prediction Classifier
Metrics used to evaluate the performance of a classification model

Let’s look at the different metrics at the bottom.

  • The test data set contains 40791 flights, which is 30% of the original data.
  • For 2098 flights, the model correctly predicted a delay of more than 15 minutes (true positives).
  • For 1310 flights, the model predicted a delay, but the flights were less than 15 minutes late (false positives).
  • 6825 flights were more than 15 minutes late, contrary to the model’s prediction (false negatives).
  • For 30558 flights, the model correctly predicted a delay of less than 15 minutes (true negatives).
  • Overall, the model is correct in about 80% of the cases (Accuracy = 0.801).
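The four counts above are enough to recompute the accuracy and a few related metrics by hand. The following Python sketch does this with the numbers from the list. The majority-class baseline at the end is an addition of this example: it shows how often a model would be right if it always predicted "not delayed".

```python
# Counts taken from the evaluation results listed above.
tp, fp, fn, tn = 2098, 1310, 6825, 30558

total = tp + fp + fn + tn
accuracy = (tp + tn) / total          # share of correct predictions
precision = tp / (tp + fp)            # predicted delays that were real
recall = tp / (tp + fn)               # real delays that were predicted
f1 = 2 * tp / (2 * tp + fp + fn)      # harmonic mean of precision and recall

# Baseline: always predict "not delayed" (the majority class).
baseline_accuracy = (fp + tn) / total

print(f"flights in test set: {total}")
print(f"accuracy:  {accuracy:.3f}")
print(f"precision: {precision:.3f}")
print(f"recall:    {recall:.3f}")
print(f"f1 score:  {f1:.3f}")
print(f"baseline:  {baseline_accuracy:.3f}")
```

The counts add up to the 40791 test flights, and the accuracy comes out at roughly 0.80. Precision is about 0.62 and recall about 0.24, so the model finds only around a quarter of the delayed flights. The baseline reaches roughly 0.78, which puts the 80% accuracy into perspective.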

Finally, it’s a good idea to take a look at the ROC curve. The curve shows the trade-off between the true positive rate and the false positive rate as the prediction threshold varies. The larger the area under the curve, the better the prediction model. The gray diagonal line corresponds to random guessing, i.e., a 50% chance of being correct, and an area of 0.5. With a perfect model that is correct for every flight, the area would be 1.0. The curve of our model slopes upwards and lies above the grey line. This shows that the model works better than random guessing.
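How such a curve and the area under it come about can be sketched in plain Python. The functions and the sample scores below are made up for illustration, and Azure ML Studio may compute the curve differently in detail. The sketch lowers the threshold step by step, records the false positive rate and true positive rate at each step, and sums up the area with the trapezoid rule.

```python
def roc_points(actual, scores):
    """Return (false positive rate, true positive rate) points."""
    positives = sum(actual)
    negatives = len(actual) - positives
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        # Flights with the same score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Sum the area under the points with the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up labels and scores for eight flights.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]
auc = area_under_curve(roc_points(actual, scores))
print(f"AUC: {auc:.4f}")
```

A model that ranks every delayed flight above every punctual flight yields an area of 1.0, while identical scores for all flights yield 0.5, the value of the diagonal line.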


In this article, we have created a flight delay prediction model in Azure Machine Learning Studio. The model predicts with an accuracy of about 80% whether flights will be more or less than 15 minutes late.

The prediction model is only a first version and still offers a lot of room for optimization. One option to further improve the model would be to add features such as the weather, the aircraft type, etc. Another option would be to test different algorithms and hyperparameters.

I hope this article was helpful. If you have remarks or questions, please write them in the comments.


  • Hi, I am Florian, a Zurich-based consultant for AI and Data. Since the completion of my Ph.D. in 2017, I have been working on the design and implementation of ML use cases in the Swiss financial sector. I started this blog in 2020 with the goal in mind to share my experiences and create a place where you can find key concepts of machine learning and materials that will allow you to kick-start your own Python projects.
