- First, we clean the general population dataset.
- Next, we wrap all the data cleaning steps in a function so that any dataset can be cleaned in one call. Missing values are imputed afterwards wherever necessary.
- Using PCA, we reduce dimensionality and retain enough principal components to explain about 85% of the variance in the cleaned and imputed general population dataset.
- We cluster the cleaned and imputed general population dataset with the KMeans algorithm, selecting the number of clusters from the elbow of the inertia vs. number of clusters plot.
- We then apply the same steps to the cleaned and imputed customer dataset to identify which of those clusters are overrepresented or underrepresented among customers.
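
The steps above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's actual code: the `make_demo` and `clean_data` helpers, the feature names, the row-level missingness threshold, the median imputation, and the scaling step are all assumptions made for the example. Only the use of PCA at about 85% explained variance, KMeans, and the inertia-based elbow comes from the description above.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
COLUMNS = [f"f{i}" for i in range(6)]  # hypothetical feature names
CENTERS = np.array([[4, 4, 0, 0, 0, 0],
                    [0, 0, 4, 4, 0, 0],
                    [0, 0, 0, 0, 4, 4]], dtype=float)


def make_demo(weights, n):
    """Synthetic stand-in for a demographics table, with about 5% missing cells."""
    groups = rng.choice(len(CENTERS), size=n, p=weights)
    values = CENTERS[groups] + rng.normal(size=(n, CENTERS.shape[1]))
    values[rng.random(values.shape) < 0.05] = np.nan
    return pd.DataFrame(values, columns=COLUMNS)


def clean_data(df, max_row_missing=0.3):
    """Hypothetical cleaning step: drop rows with too many missing values."""
    keep = df.isna().mean(axis=1) <= max_row_missing
    return df.loc[keep].reset_index(drop=True)


general = clean_data(make_demo([0.5, 0.3, 0.2], n=2000))
customers = clean_data(make_demo([0.1, 0.2, 0.7], n=1000))

# Impute, scale (an assumption), then keep enough principal components
# to explain at least 85% of the variance.
pipeline = make_pipeline(
    SimpleImputer(strategy="median"),
    StandardScaler(),
    PCA(n_components=0.85, svd_solver="full"),
)
general_pcs = pipeline.fit_transform(general)

# Inertia for a range of k; plot these values and look for the elbow.
inertias = [
    KMeans(n_clusters=k, n_init=10, random_state=0).fit(general_pcs).inertia_
    for k in range(1, 8)
]

# k = 3 is hard-coded only because the synthetic data was built from three groups.
k = 3
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(general_pcs)

# Reuse the fitted pipeline and clusters on the customers, without refitting.
customer_labels = kmeans.predict(pipeline.transform(customers))

general_share = np.bincount(kmeans.labels_, minlength=k) / len(general_pcs)
customer_share = np.bincount(customer_labels, minlength=k) / len(customer_labels)
ratio = customer_share / general_share  # > 1: overrepresented among customers

print(pd.DataFrame({"general": general_share,
                    "customers": customer_share,
                    "ratio": ratio}))
```

The point of fitting the imputer, scaler, PCA, and KMeans on the general population only, and then calling `transform` and `predict` on the customers, is that both datasets end up in the same cluster space, so the per-cluster shares are directly comparable.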
This project was submitted as part of the Bosch AI Talent Accelerator Data Scientist program offered by Bosch & Udacity. It identifies customer segments by comparing the general population and customer demographics datasets from AZ Direct GmbH and Arvato Financial Solutions using clustering techniques.
vkilohani/Identifying-Customer-Segments-from-AZ-Direct-Gmbh-and-Arvato-Financial-Solutions-dataset