Genesis Generators
Flight Delay Prediction

About Our Project

A delayed flight, with its last-minute schedule changes and unneeded time in an airport, is every flier’s worst nightmare. To spare fliers this inconvenience, our team developed a model that predicts whether a flight will be delayed. Both airport companies and fliers can use this model to adjust their schedules accordingly.

The Process

Exploratory Data Analysis

Before predicting any delays, we needed to understand the data we were working with. What correlations could we find between a flight being delayed and the airline flown or the length of the flight? What do these correlations (or lack thereof) mean?

We established these connections through EDA, or Exploratory Data Analysis. As the name suggests, we explored our data through plots and visualizations in order to gain a deeper understanding of the meaning behind the numbers on a spreadsheet. We used the Airline Delay Prediction Dataset on Kaggle.

General Observations

The plots we created compared our independent variables (flight length, departure time, airline, etc.) to our target, whether the flight was delayed, so that we could find the most relevant trends. The specifics for each plot are listed below, but the general trends are as follows:

  • Most features have little to no correlation with the occurrence of a delay

  • The arrival time for on-time flights is 100 minutes behind the arrival time for delayed flights

  • Wednesday is the busiest day of the week, followed closely by Tuesday and Thursday

  • The majority of airports (both origin and destination) have very few records, meaning we would benefit from grouping those airports into an “Other” category
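
One way to put a number on the first observation is the Pearson correlation between a numeric feature and the 0/1 delay flag. The sketch below is only an illustration: the column names Length and Delay and the sample rows are assumptions made for the example, not values taken from our dataset or our actual analysis code.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)

# Hypothetical rows: flight length in minutes and a 0/1 delay flag.
flights = [
    {"Length": 60, "Delay": 0},
    {"Length": 95, "Delay": 1},
    {"Length": 120, "Delay": 0},
    {"Length": 180, "Delay": 1},
    {"Length": 240, "Delay": 0},
]
lengths = [f["Length"] for f in flights]
delays = [f["Delay"] for f in flights]
r = pearson(lengths, delays)
print(f"correlation(Length, Delay) = {r:.3f}")
```

A value near 0 means the feature has almost no linear relationship with delays, while values near 1 or -1 indicate a strong one.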

Data Preparation

After finding trends in our data through EDA, we knew how we needed to clean our data. A few examples of how we transformed our data included:

  • Dropping the rows where the flight duration was zero (meaning it was not recorded)

  • Dropping unnecessary columns, in this case the id column

  • Encoding categorical variables (such as the different airport names) as numbers

  • Cutting down our large sample size of ~500,000 to a random sample of 5,000 for our data visualization, and 20,000 for training and testing the model

  • Cutting down the number of categories in AirportTo and AirportFrom by grouping any airport with three or fewer samples into an “Other” category
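
The cleaning steps above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not our actual code: the function name prepare, the Length column name, and the order of the steps (grouping rare airports before sampling) are all assumptions.

```python
import random
from collections import Counter

def prepare(rows, min_count=4, sample_size=20000, seed=0):
    """Clean raw flight rows (a list of dicts) and return a new list."""
    # Drop rows whose duration is zero, which we treat as "not recorded".
    rows = [r for r in rows if r["Length"] != 0]
    # Drop the id column; building new dicts leaves the input rows untouched.
    rows = [{k: v for k, v in r.items() if k != "id"} for r in rows]
    # Group airports seen fewer than min_count times (three or fewer) into "Other".
    for col in ("AirportFrom", "AirportTo"):
        counts = Counter(r[col] for r in rows)
        for r in rows:
            if counts[r[col]] < min_count:
                r[col] = "Other"
    # Take a reproducible random sample if the data is larger than sample_size.
    if len(rows) > sample_size:
        rows = random.Random(seed).sample(rows, sample_size)
    return rows

# Hypothetical rows for illustration only.
raw = [{"id": i, "AirportFrom": "ATL", "AirportTo": "LAX", "Length": 120} for i in range(4)]
raw.append({"id": 4, "AirportFrom": "XYZ", "AirportTo": "LAX", "Length": 90})
raw.append({"id": 5, "AirportFrom": "ATL", "AirportTo": "LAX", "Length": 0})

cleaned = prepare(raw, sample_size=10)
print(len(cleaned), sorted(r["AirportFrom"] for r in cleaned))
```

Passing a fixed seed keeps the sample reproducible, so the same rows feed the model each time the pipeline is rerun.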

These changes were made for the benefit of the model, which can’t work with text categories directly but handles numbers just fine. Similarly, our computers weren’t able to process the full dataset, so cutting down the sample size greatly improved our processing speed without sacrificing too much information.
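
To make the encoding step concrete, here is a small sketch of turning category names into integer codes. The helper names fit_label_encoder and encode are hypothetical, and the airport values are made up for the example; this shows one simple encoding scheme, which may differ from the one our pipeline used.

```python
def fit_label_encoder(values):
    """Map each distinct category to an integer code, assigned in sorted order."""
    return {category: code for code, category in enumerate(sorted(set(values)))}

def encode(values, mapping):
    """Replace each category with its integer code."""
    return [mapping[v] for v in values]

# Hypothetical airport column after rare airports were grouped into "Other".
airports = ["ATL", "Other", "LAX", "ATL"]
mapping = fit_label_encoder(airports)
codes = encode(airports, mapping)
print(mapping)  # {'ATL': 0, 'LAX': 1, 'Other': 2}
print(codes)    # [0, 2, 1, 0]
```

Integer codes imply an ordering between categories that does not really exist, so this scheme tends to suit tree-based models better than models that treat the codes as magnitudes.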

The Data

Understanding the dataset through visualizations

  • Heatmap: Relationships between features

  • Airlines: Distribution of airlines in the dataset

  • Day of Flight: Distribution of flights based on day of the week

  • Source Airports: Distribution of on-time and delayed flights

  • Destination Airports: Distribution of on-time and delayed flights

  • Arrival Times - Delay: Distribution of times when flight was delayed

  • Arrival Times - On Time: Distribution of times when flight was on-time

  • Flight Duration - Delay: Distribution of durations when flight was delayed

  • Flight Duration - On Time: Distribution of durations when flight was on-time

Our Amazing Team

Parveen Anand
Lead Designer

Diana Petersen
Lead Marketer

Larry Parker
Lead Developer
