Identifying Customer Segments of a Mail-Order Sales Company

First we perform data cleaning on the general population dataset.
Next, we wrap all the data cleaning steps in a single function so that any dataset can be cleaned the same way in one call. Missing values that remain after cleaning are then imputed wherever necessary.
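
As a rough illustration of this step, the sketch below wraps two cleaning operations in one function and then imputes the remaining gaps. The function name `clean_data`, the column names, the missing-value codes, and the 50% row threshold are all hypothetical and are not taken from the notebook. The most-frequent imputation strategy is likewise an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer


def clean_data(df, missing_codes, max_row_missing=0.5):
    """Hypothetical cleaning helper; steps and thresholds are illustrative only."""
    df = df.copy()
    # Convert each column's "unknown" codes to NaN.
    for col, codes in missing_codes.items():
        if col in df.columns:
            df[col] = df[col].replace(codes, np.nan)
    # Drop rows whose share of missing values exceeds the threshold.
    keep = df.isna().mean(axis=1) <= max_row_missing
    return df.loc[keep].reset_index(drop=True)


# Tiny made-up dataset standing in for the real demographics data.
raw = pd.DataFrame({
    "age_group": [1, 2, -1, 3],
    "income_level": [0, 4, 0, 2],
    "household_size": [2, 9, 9, 1],
})
missing_codes = {"age_group": [-1], "income_level": [0], "household_size": [9]}

cleaned = clean_data(raw, missing_codes)

# Impute the remaining gaps (assumed strategy: most frequent value per column).
imputer = SimpleImputer(strategy="most_frequent")
imputed = pd.DataFrame(imputer.fit_transform(cleaned), columns=cleaned.columns)
print(imputed)
```

Keeping the cleaning logic in one function means the same call can later be made on the customer dataset, so both datasets end up with the same columns and encodings.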
Using PCA, we reduce dimensionality, retaining enough principal components to explain about 85% of the variance in the cleaned and imputed general population dataset.
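
One way to retain roughly 85% of the variance with scikit-learn is to pass a fraction as `n_components`, as in the minimal sketch below. The random data is a synthetic stand-in, and standardizing the features before PCA is an assumption about the workflow rather than something stated here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned and imputed population data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))

# Assumed preprocessing: put every feature on the same scale before PCA.
X_scaled = StandardScaler().fit_transform(X)

# A float in (0, 1) asks PCA to keep just enough components to exceed
# that fraction of explained variance (requires svd_solver="full").
pca = PCA(n_components=0.85, svd_solver="full")
X_pca = pca.fit_transform(X_scaled)

explained = pca.explained_variance_ratio_.sum()
print(pca.n_components_, round(float(explained), 3))
```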
We cluster the cleaned and imputed general population dataset using the KMeans algorithm, selecting the number of clusters by finding the elbow of the inertia vs. number of clusters plot.
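
The elbow search can be sketched as a loop that fits KMeans for a range of cluster counts and records the inertia of each fit. The data here comes from `make_blobs` purely for illustration, and the range of 1 to 8 clusters is an arbitrary choice, not the range used in the notebook.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the population data.
X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

ks = list(range(1, 9))
inertias = []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia_ is the within-cluster sum of squared distances to the centroids.
    inertias.append(model.inertia_)

for k, inertia in zip(ks, inertias):
    print(k, round(inertia, 1))
```

Plotting `inertias` against `ks` gives the curve described above. The elbow is the point after which adding clusters reduces inertia only slightly.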
We then apply the previous steps to the cleaned and imputed customer dataset to identify which of the previously identified clusters are overrepresented or underrepresented among the customers.
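
The comparison can be sketched by assigning customers to the clusters fitted on the population and comparing the share of each cluster in the two groups. All data below is synthetic, and the helper `cluster_shares` is a hypothetical name. The sketch also assumes the model fitted on the population is reused on the customers rather than refit.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])

# Synthetic population: evenly spread over three groups.
population = np.vstack(
    [c + rng.normal(scale=0.5, size=(100, 2)) for c in centers]
)
# Synthetic customers: concentrated in the first group, absent from the third.
customers = np.vstack([
    centers[0] + rng.normal(scale=0.5, size=(80, 2)),
    centers[1] + rng.normal(scale=0.5, size=(20, 2)),
])

# Fit on the population only, then assign customers to those same clusters.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(population)
pop_labels = kmeans.labels_
cust_labels = kmeans.predict(customers)


def cluster_shares(labels, k):
    """Fraction of rows in each of the k clusters (0.0 for empty clusters)."""
    counts = pd.Series(labels).value_counts(normalize=True)
    return counts.reindex(range(k), fill_value=0.0).sort_index()


comparison = pd.DataFrame({
    "population": cluster_shares(pop_labels, 3),
    "customers": cluster_shares(cust_labels, 3),
})
# Positive values mark overrepresented clusters, negative underrepresented.
comparison["difference"] = comparison["customers"] - comparison["population"]
print(comparison)
```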

