This project involves exploratory data analysis (EDA) and linear regression modeling on a medical insurance dataset to uncover key insights and predict insurance charges based on individual characteristics.
- Source: `insurance.csv`
- Features:
  - `age`: Age of the policyholder
  - `sex`: Gender of the policyholder
  - `bmi`: Body Mass Index
  - `children`: Number of dependents
  - `smoker`: Smoker or non-smoker
  - `region`: Residential region
  - `charges`: Insurance cost
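
As a minimal sketch of how the dataset might be loaded, the snippet below reads the CSV with Pandas and checks that the seven columns listed above are present. The helper name `load_insurance` and the `EXPECTED_COLUMNS` constant are hypothetical and not part of the original notebook.

```python
import pandas as pd

# Column names taken from the feature list above.
EXPECTED_COLUMNS = ["age", "sex", "bmi", "children", "smoker", "region", "charges"]


def load_insurance(source):
    """Read the CSV (path or file-like object) and fail early on a missing column.

    Hypothetical helper for illustration only.
    """
    df = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Return the columns in the documented order.
    return df[EXPECTED_COLUMNS]
```

With the file in the working directory, `df = load_insurance("insurance.csv")` followed by `df.info()` and `df.describe()` gives a quick first look at types, missing values, and value ranges.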
ABC Insurance aims to understand which factors influence medical insurance premiums. The goal is to analyze the dataset and build a model to predict charges using various customer attributes.
- Perform EDA to uncover patterns and relationships
- Visualize insights using Python libraries (Matplotlib, Seaborn)
- Build and evaluate a simple linear regression model
- Provide business-level observations
- Python
- Jupyter Notebook
- Pandas, NumPy
- Matplotlib, Seaborn
- Smokers pay significantly higher premiums than non-smokers.
- Charges are positively correlated with both age and BMI.
- The southeast region showed relatively higher charges than the other regions.
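
The observations above can be checked with a few Pandas aggregations. The following is a sketch, assuming a DataFrame `df` with the columns described earlier. The function name `summarize_charges` is hypothetical, and the snippet does not reproduce any figures from the original analysis.

```python
import pandas as pd


def summarize_charges(df):
    """Return the group means and correlations behind the observations above.

    Hypothetical helper for illustration only.
    """
    return {
        # Average charges for each smoker category.
        "mean_by_smoker": df.groupby("smoker")["charges"].mean(),
        # Average charges per region, highest first.
        "mean_by_region": (
            df.groupby("region")["charges"].mean().sort_values(ascending=False)
        ),
        # Pearson correlation of age and BMI with charges.
        "corr_with_charges": (
            df[["age", "bmi", "charges"]].corr()["charges"].drop("charges")
        ),
    }
```

Each entry in the returned dictionary is a Pandas Series, so it can be printed directly or passed to a Seaborn bar plot.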
A linear regression model was used to predict charges. The model was evaluated using Root Mean Square Error (RMSE), and its fit was visualized with predicted vs. actual plots.
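
The fit-and-evaluate step could look roughly like the sketch below. It uses ordinary least squares via `np.linalg.lstsq`, which stays within the libraries listed above; the original notebook may have used a different library or a different set of predictors. The choice of predictors (`age`, `bmi`, `children`, `smoker`), the assumption that `smoker` holds the strings `"yes"` and `"no"`, and all function names are illustrative assumptions.

```python
import numpy as np
import pandas as pd


def design_matrix(df):
    """Build an intercept column plus numeric predictors."""
    # Assumption: the smoker column holds the strings "yes" and "no".
    smoker = (df["smoker"] == "yes").astype(float)
    return np.column_stack([
        np.ones(len(df)),                      # intercept
        df["age"].to_numpy(dtype=float),
        df["bmi"].to_numpy(dtype=float),
        df["children"].to_numpy(dtype=float),
        smoker.to_numpy(),
    ])


def fit_linear(df):
    """Ordinary least squares coefficients for predicting charges."""
    X = design_matrix(df)
    y = df["charges"].to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def predict(df, coef):
    """Predicted charges for each row of df."""
    return design_matrix(df) @ coef


def rmse(actual, predicted):
    """Root Mean Square Error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))
```

A typical use would be `coef = fit_linear(train)` followed by `rmse(test["charges"], predict(test, coef))`, where `train` and `test` are disjoint subsets of the data. Plotting `test["charges"]` against the predictions with `plt.scatter` gives the predicted vs. actual view; points close to the diagonal indicate a good fit.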