A delayed flight, with its last-minute schedule changes and unneeded hours in an airport, is every flier's nightmare. To spare fliers this inconvenience, our team developed a model that predicts whether a flight will be delayed. Both airports and fliers can use this model to adjust their schedules accordingly.
During our first week of development, our team learned more about the problem we decided to face through exploratory data analysis (EDA).
During our second week of development, we tackled the major problem at hand: predicting whether there would actually be a delay.
During our last week of development, our team created the very website you're browsing through!
Before predicting any delay behaviors, we needed to establish an understanding of the data we were working with: what correlations could we find between the airline flown, or the length of the flight, and the flight being delayed? What do these correlations (or lack thereof) mean?
We established these connections through EDA. As the name suggests, we explored our data through plots and visualizations in order to gain a deeper understanding of the meaning behind the numbers on a spreadsheet. We used the Airline Delay Prediction Dataset on Kaggle.
The plots we created compared our independent variables (flight length, departure time, airline, etc.) to our target variable, whether the flight was delayed, in order to find the most relevant trends. The specifics for each plot are listed below, but the general trends are as follows:
Most features have little to no correlation to the occurrence of a delay
The average arrival time for on-time flights is about 100 minutes earlier than that of delayed flights.
Wednesday is the busiest day of the week, followed closely by Tuesday and Thursday
The majority of airports (both origin and destination) appear in very few rows, meaning we would benefit from grouping those airports into an “Other” category
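As a sketch of the kind of aggregation behind these plots: the column names below follow the Kaggle dataset, but the handful of rows are made up purely for illustration.

```python
import pandas as pd

# Toy rows standing in for the Kaggle Airline Delay Prediction data;
# the column names match the dataset, the values are illustrative only.
df = pd.DataFrame({
    "Airline":   ["DL", "DL", "AA", "AA", "UA", "UA"],
    "DayOfWeek": [3, 3, 2, 4, 3, 1],  # 1 = Monday
    "Length":    [120, 300, 90, 250, 60, 180],
    "Delay":     [0, 1, 0, 1, 0, 1],
})

# Delay rate per airline -- the aggregate behind a bar chart of delays
delay_by_airline = df.groupby("Airline")["Delay"].mean()

# Flight counts per day of week -- the aggregate behind the busiest-day plot
flights_by_day = df["DayOfWeek"].value_counts().sort_index()

print(delay_by_airline)
print(flights_by_day)
```

Each of our plots was built on a table like one of these, then rendered as a bar chart or histogram.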
After finding trends in our data through EDA, we knew how we needed to clean our data. A few examples of how we transformed our data included:
Dropping the rows where the flight duration was zero (was not recorded)
Dropping unnecessary columns, in this case the id column
Encoding categorical variables (such as the different airport names) as numbers
Cutting our large dataset of ~500,000 rows down to a random sample of 5,000 for our data visualization, and 20,000 for training and testing the model
Cutting down the number of categories in AirportTo and AirportFrom by grouping any airport with three or fewer samples into an “Other” category
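The cleaning steps above can be sketched in pandas on a few made-up rows. (In the toy data below the “Other” threshold is one sample rather than three, so the grouping is visible; the real dataset used a threshold of three.)

```python
import pandas as pd

# A few made-up rows with the same columns as the Kaggle dataset
df = pd.DataFrame({
    "id":          [1, 2, 3, 4, 5, 6],
    "Airline":     ["DL", "AA", "DL", "UA", "AA", "DL"],
    "AirportFrom": ["ATL", "JFK", "ATL", "ORD", "XNA", "ATL"],
    "AirportTo":   ["JFK", "ATL", "ORD", "ATL", "ATL", "JFK"],
    "Length":      [120, 0, 95, 110, 60, 130],
    "Delay":       [0, 1, 0, 1, 1, 0],
})

# 1. Drop rows where the flight duration was never recorded (Length == 0)
df = df[df["Length"] > 0]

# 2. Drop the unnecessary id column
df = df.drop(columns=["id"])

# 3. Group rare airports into "Other" (threshold of 1 for this toy data)
for col in ["AirportFrom", "AirportTo"]:
    counts = df[col].value_counts()
    rare = counts[counts <= 1].index
    df.loc[df[col].isin(rare), col] = "Other"

# 4. Encode categorical variables as integer codes
for col in ["Airline", "AirportFrom", "AirportTo"]:
    df[col] = df[col].astype("category").cat.codes

# 5. Down-sample (trivial here; in reality ~500,000 rows -> 20,000)
df = df.sample(n=4, random_state=0)
```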
These changes were made for the benefit of the model: models can’t work with raw text categories, but can work with numbers just fine. Similarly, our computers couldn’t process such a large amount of data quickly, so cutting down the sample size immensely helped our processing speed without sacrificing too much information.
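To show where the cleaned, numeric data goes next, here is a sketch of the train/test workflow on synthetic stand-in data. The LogisticRegression classifier here is a placeholder for illustration only, not necessarily the model we ultimately chose.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic numeric features standing in for the cleaned 20,000-row sample:
# one column loosely predictive of delay, one pure noise.
rng = np.random.default_rng(0)
n = 1000
signal = rng.normal(size=n)
noise = rng.normal(size=n)
X = np.column_stack([signal, noise])
y = (signal + 0.5 * rng.normal(size=n) > 0).astype(int)  # 1 = delayed

# Hold out 20% of the rows for testing, as in a standard workflow
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LogisticRegression()
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```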