This project presents an exploratory data analysis (EDA) of a Netflix dataset using Python. It covers data cleaning, transformation, visualization, and insight extraction to uncover meaningful patterns in Netflix movie/TV data. The goal is to understand genre trends, popularity distribution, voting patterns, and release year insights.
- Data Loading: Load and inspect structured CSV datasets.
- Data Cleaning:
  - Handle missing values and duplicates.
  - Convert date columns to datetime format and extract the year.
  - Drop irrelevant columns.
- Exploratory Data Analysis:
  - Descriptive statistics.
  - Vote categorization into `popular`, `average`, `below_avg`, `not_popular`.
  - Genre splitting and normalization.
- Visualizations:
  - Genre frequency distribution.
  - Vote category distribution.
  - Popularity extremes (most/least popular movies).
  - Release year trends.
- Insights Extraction: Identify top genres, most popular titles, and yearly content trends.
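
The cleaning and categorization steps listed above could look roughly like the following sketch. It is not taken from the notebook: the column names (`Title`, `Release_Date`, `Vote_Average`, `Genre`, `Overview`), the sample rows, and the helper functions are all hypothetical, and splitting votes into quartiles with `pd.qcut` is an assumption about how the four categories might be derived.

```python
import pandas as pd

# Hypothetical sample rows. Column names are assumptions for illustration;
# adjust them to match the columns in netflix_dataset.csv.
raw = pd.DataFrame(
    {
        "Title": ["A", "B", "C", "D", "D", "E"],
        "Release_Date": ["2021-12-15", "2022-03-01", "2020-07-10",
                         "2019-01-05", "2019-01-05", None],
        "Vote_Average": [8.3, 7.7, 6.3, 5.1, 5.1, 7.0],
        "Genre": ["Action, Adventure", "Crime, Mystery", "Thriller",
                  "Comedy, Drama", "Comedy, Drama", "Drama"],
        "Overview": ["..."] * 6,
    }
)


def clean(df):
    """Drop duplicates and rows without a date, extract the year, drop unused columns."""
    out = df.drop_duplicates().dropna(subset=["Release_Date"]).copy()
    out["Release_Date"] = pd.to_datetime(out["Release_Date"])
    out["Release_Year"] = out["Release_Date"].dt.year
    # "Overview" stands in for whichever columns are irrelevant to the analysis.
    return out.drop(columns=["Overview"])


def categorize_votes(df, column="Vote_Average"):
    """Label each row by vote quartile (assumed scheme), lowest to highest."""
    labels = ["not_popular", "below_avg", "average", "popular"]
    out = df.copy()
    out["Vote_Category"] = pd.qcut(out[column], q=4, labels=labels)
    return out


def split_genres(df, column="Genre"):
    """Turn a comma-separated genre string into one row per genre."""
    out = df.copy()
    out[column] = out[column].str.split(",")
    out = out.explode(column)
    out[column] = out[column].str.strip()  # normalize stray whitespace
    return out.reset_index(drop=True)


cleaned = categorize_votes(clean(raw))
exploded = split_genres(cleaned)
print(cleaned.describe())
print(exploded["Genre"].value_counts())
```

With the real data, `raw` would come from `pd.read_csv("netflix_dataset.csv")` instead of the inline frame. Note that `pd.qcut` raises an error when quartile edges coincide, so heavily tied vote values may need `duplicates="drop"` or fixed bin edges via `pd.cut`.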

netflix-data-analysis/
│
├── Netflix_Data_Analysis.ipynb   # Main Jupyter notebook
├── netflix_dataset.csv           # Dataset used
├── README.md                     # Project description
└── requirements.txt              # Python dependencies

git clone https://github.com/vinitjain2005/Netflix-Data-Analysis.git
cd Netflix-Data-Analysis
pip install -r requirements.txt

⚠️ Make sure you have Jupyter installed:

pip install notebook
jupyter notebook

Open Netflix_Data_Analysis.ipynb and run the cells to reproduce the analysis.

Requirements and libraries:

- Python 3.x
- Jupyter Notebook
- Pandas: data manipulation and cleaning
- Matplotlib / Seaborn: data visualization

Visualizations in the notebook:

- Genre distribution bar charts.
- Vote category counts.
- Most popular vs least popular movies.
- Release year histogram.
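
As a rough illustration of these charts, the sketch below draws a genre bar chart, a vote category bar chart, and a release year histogram, and looks up the most and least popular titles. It uses plain Matplotlib through pandas rather than Seaborn, and the sample rows and column names (`Title`, `Genre`, `Release_Year`, `Popularity`, `Vote_Category`) are hypothetical stand-ins for the cleaned dataset, not the notebook's actual code.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical, already-cleaned rows with one genre per title.
df = pd.DataFrame(
    {
        "Title": ["T1", "T2", "T3", "T4", "T5", "T6"],
        "Genre": ["Drama", "Action", "Drama", "Comedy", "Action", "Drama"],
        "Release_Year": [2019, 2020, 2020, 2021, 2021, 2021],
        "Popularity": [120.5, 980.0, 45.2, 300.1, 15.7, 610.3],
        "Vote_Category": ["average", "popular", "below_avg",
                          "average", "not_popular", "popular"],
    }
)

# Frequency tables that back the two bar charts.
genre_counts = df["Genre"].value_counts()
vote_counts = df["Vote_Category"].value_counts()

# Popularity extremes: titles with the highest and lowest popularity score.
most_popular = df.loc[df["Popularity"].idxmax(), "Title"]
least_popular = df.loc[df["Popularity"].idxmin(), "Title"]

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
genre_counts.plot(kind="barh", ax=axes[0], title="Genre frequency")
vote_counts.plot(kind="bar", ax=axes[1], title="Vote categories")

# One bin per release year; the final edge closes the last bin.
years = df["Release_Year"]
bin_edges = list(range(years.min(), years.max() + 2))
axes[2].hist(years, bins=bin_edges)
axes[2].set_title("Release years")

fig.tight_layout()
print("Most popular:", most_popular, "| Least popular:", least_popular)
```

If genres have been split into one row per genre, de-duplicate by title before counting release years or popularity so that multi-genre titles are not counted more than once.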
This project is open-source under the MIT License.
Contributions are welcome! Fork the repository, enhance the notebook, or suggest new visualizations via pull requests.