Netflix Movie Analysis
In this project we use a dataset from DataCamp containing information on Netflix movie and TV show titles. The information includes details such as genre, type, release year, and duration. We answer two questions:
What was the most frequent movie duration in the 1990s?
How many action films in the 90s were under 90 min?
In [ ]:
import pandas as pd
import matplotlib.pyplot as plt
In [ ]:
netflix_df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/Datacamp/Netflix Movies/netflix_data.csv')
In [ ]:
netflix_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4812 entries, 0 to 4811
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   show_id       4812 non-null   object
 1   type          4812 non-null   object
 2   title         4812 non-null   object
 3   director      4812 non-null   object
 4   cast          4812 non-null   object
 5   country       4812 non-null   object
 6   date_added    4812 non-null   object
 7   release_year  4812 non-null   int64
 8   duration      4812 non-null   int64
 9   description   4812 non-null   object
 10  genre         4812 non-null   object
dtypes: int64(2), object(9)
memory usage: 413.7+ KB
What was the most frequent movie duration in the 1990s?
In [ ]:
movies_only = netflix_df[netflix_df['type'] == 'Movie']
In [ ]:
_90s_only = movies_only[(movies_only['release_year'] >= 1990) & (movies_only['release_year'] < 2000)]
In [ ]:
# Look at distribution
plt.hist(_90s_only['duration'], bins=10)
plt.title('Distribution of Movie Durations in the 1990s')
plt.xlabel('Duration (minutes)')
plt.ylabel('Number of Movies')
plt.show()
The histogram suggests the most frequent duration is around 100 minutes. Because each bar groups a range of durations, the plot only gives a rough estimate, so we compute the exact value with mode().
In [ ]:
_90s_only['duration'].mode()[0]
Out[ ]:
94
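The gap between the histogram estimate and the mode comes from binning. Here is a minimal sketch of that effect, using hypothetical toy durations rather than the Netflix data; the bin edges and variable names are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy durations in minutes, not taken from netflix_data.csv
toy_durations = pd.Series([88, 92, 94, 94, 94, 101, 103, 105, 108, 109, 125])

# Count values per 10-minute bin, the way a histogram groups them
bin_edges = [80, 90, 100, 110, 120, 130]
bin_counts, _ = np.histogram(toy_durations, bins=bin_edges)

# Left edge of the bin holding the most values
tallest_bin_start = bin_edges[int(np.argmax(bin_counts))]

# Most frequent single value in the series
exact_mode = int(toy_durations.mode()[0])

print(bin_counts.tolist())
print(tallest_bin_start)
print(exact_mode)
```

In this toy series the tallest bin starts at 100 while the most frequent single value is 94, so a histogram peak and the mode need not agree. Note also that mode() returns every value tied for the highest count in ascending order, so mode()[0] picks the smallest of any tied values.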
How many action films in the 90s were under 90 min?
In [ ]:
action_movies = _90s_only[_90s_only['genre'] == 'Action']
In [ ]:
(action_movies['duration'] < 90).sum()
Out[ ]:
7
There are 7 action movies from the 1990s with a duration under 90 minutes.
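The filtering steps in this notebook could also be combined into a single boolean mask. The sketch below assumes the same column names as netflix_df (type, release_year, genre, duration); the helper count_short_action_90s and the toy rows, including the 'Dramas' genre label, are hypothetical and only illustrate the idea.

```python
import pandas as pd

def count_short_action_90s(df, max_minutes=90):
    """Count 1990s action movies shorter than max_minutes (hypothetical helper)."""
    mask = (
        (df["type"] == "Movie")
        & df["release_year"].between(1990, 1999)  # inclusive on both ends
        & (df["genre"] == "Action")
        & (df["duration"] < max_minutes)
    )
    # Summing a boolean mask counts the True entries
    return int(mask.sum())

# Hypothetical toy rows, not the Netflix dataset
toy_df = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "Movie", "Movie"],
    "release_year": [1995, 1999, 1994, 2001, 1990],
    "genre": ["Action", "Action", "Action", "Action", "Dramas"],
    "duration": [85, 120, 60, 80, 70],
})

short_action_count = count_short_action_90s(toy_df)
print(short_action_count)
```

Only the first toy row passes all four conditions. Keeping the intermediate DataFrames, as the notebook does, is equally valid and makes it easier to reuse them for other questions.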