Netflix Movie Analysis

In this project we use a Datacamp dataset of Netflix titles, covering both movies and TV shows. Each record includes details such as genre, type, and release year. We answer two questions:

What was the most frequent movie duration in the 1990s?

How many action films in the 90s were under 90 min?

In [ ]:
import pandas as pd
import matplotlib.pyplot as plt
In [ ]:
# Path assumes Google Drive is mounted in Colab; point it at your own copy of netflix_data.csv
netflix_df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/Datacamp/Netflix Movies/netflix_data.csv')
In [ ]:
netflix_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4812 entries, 0 to 4811
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   show_id       4812 non-null   object
 1   type          4812 non-null   object
 2   title         4812 non-null   object
 3   director      4812 non-null   object
 4   cast          4812 non-null   object
 5   country       4812 non-null   object
 6   date_added    4812 non-null   object
 7   release_year  4812 non-null   int64 
 8   duration      4812 non-null   int64 
 9   description   4812 non-null   object
 10  genre         4812 non-null   object
dtypes: int64(2), object(9)
memory usage: 413.7+ KB


What was the most frequent movie duration in the 1990s?

In [ ]:
# Keep only movies (drop TV shows)
movies_only = netflix_df[netflix_df['type'] == 'Movie']
In [ ]:
# Keep movies released from 1990 through 1999
_90s_only = movies_only[(movies_only['release_year'] >= 1990) & (movies_only['release_year'] < 2000)]
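As an aside, pandas' `Series.between` can express the same decade filter more compactly, since it includes both endpoints by default. The sketch below checks this on a small invented DataFrame; `toy_df` and its rows are hypothetical and are not taken from the Netflix data.

```python
import pandas as pd

# Hypothetical toy rows standing in for netflix_df; values are invented
toy_df = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "Movie", "Movie"],
    "release_year": [1989, 1990, 1995, 1999, 2000],
    "duration": [100, 94, 1, 85, 120],
})

movies = toy_df[toy_df["type"] == "Movie"]

# between() includes both endpoints by default, so 1990..1999 covers the decade
movies_90s = movies[movies["release_year"].between(1990, 1999)]

# The explicit comparison used in the notebook, for comparison
explicit = movies[(movies["release_year"] >= 1990) & (movies["release_year"] < 2000)]

print(movies_90s["release_year"].tolist())  # [1990, 1999]
print(movies_90s.equals(explicit))  # True
```

Because `release_year` is an integer column, `>= 1990 & < 2000` and `between(1990, 1999)` select exactly the same rows; the choice is mostly one of readability.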
In [ ]:
# Look at distribution
plt.hist(_90s_only['duration'], bins=10)
plt.title('Distribution of Movie Durations in the 1990s')
plt.xlabel('Duration (minutes)')
plt.ylabel('Number of Movies')
plt.show()
[Figure: histogram "Distribution of Movie Durations in the 1990s"; x-axis Duration (minutes), y-axis Number of Movies]


The histogram peaks at around 100 minutes, but a 10-bin histogram groups several durations into each bar, so it cannot identify a single value. The mode gives the exact most frequent duration:

In [ ]:
# mode() returns all tied values in ascending order; take the first
_90s_only['duration'].mode().iloc[0]
Out[ ]:
94
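One caveat: `mode()` returns every value tied for the highest count, sorted ascending, so taking only the first element would silently report the smallest value if there were a tie. A quick way to check is sketched here on invented durations rather than the real data.

```python
import pandas as pd

# Invented durations in minutes, chosen to produce a tie; not the Netflix data
durations = pd.Series([94, 94, 100, 100, 85, 120])

# mode() returns every value tied for the highest count, in ascending order
modes = durations.mode()
print(modes.tolist())  # [94, 100]

# value_counts() shows how many titles share each duration
counts = durations.value_counts()
print(counts[94], counts[100])  # 2 2

# Taking modes.iloc[0] alone would hide the tie, so report it explicitly
if len(modes) > 1:
    print(f"Tie between {modes.tolist()}")
```

On the notebook's data, the same check would be `_90s_only['duration'].mode()` or `_90s_only['duration'].value_counts().head()`.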

How many action films in the 90s were under 90 min?

In [ ]:
# Keep 1990s movies whose genre is exactly 'Action'
action_movies = _90s_only[_90s_only['genre'] == 'Action']
In [ ]:
# Count action movies shorter than 90 minutes (each True counts as 1)
(action_movies['duration'] < 90).sum()
Out[ ]:
7

Seven action movies released in the 1990s run for less than 90 minutes.
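For reuse, both answers can be wrapped in a single function. This is a minimal sketch, assuming the input DataFrame has the same `type`, `release_year`, `duration` and `genre` columns as `netflix_df`; the function name `answer_questions` and the toy rows are hypothetical.

```python
from typing import Tuple

import pandas as pd


def answer_questions(df: pd.DataFrame) -> Tuple[int, int]:
    """Hypothetical helper: return (most frequent 1990s movie duration,
    number of 1990s action movies shorter than 90 minutes)."""
    # Movies released 1990 through 1999 (between() is inclusive on both ends)
    movies_90s = df[(df["type"] == "Movie") & df["release_year"].between(1990, 1999)]
    # Smallest of any tied modes, matching mode()[0] in the notebook
    top_duration = int(movies_90s["duration"].mode().iloc[0])
    # Boolean mask summed: each True counts as 1
    is_short_action = (movies_90s["genre"] == "Action") & (movies_90s["duration"] < 90)
    return top_duration, int(is_short_action.sum())


# Invented rows for illustration only
toy = pd.DataFrame({
    "type": ["Movie", "Movie", "Movie", "Movie", "TV Show", "Movie"],
    "release_year": [1991, 1995, 1998, 2001, 1994, 1993],
    "duration": [94, 85, 94, 80, 2, 120],
    "genre": ["Action", "Action", "Dramas", "Action", "Action", "Comedies"],
})

print(answer_questions(toy))  # (94, 1)
```

Since the function applies the same filters as the step-by-step cells, `answer_questions(netflix_df)` should return `(94, 7)`, matching the results computed in this notebook.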