Big-Data-k-means-clustering

In this problem set we solve an unsupervised learning problem using the k-means clustering algorithm.

Task 1

Read the wine data from the link provided below. Split the data into X and y: X should hold the features of each wine sample, and y should indicate the type (class) of wine.
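Since the link is not included here, the sketch below assumes scikit-learn's bundled copy of the UCI wine dataset (`load_wine`) as a stand-in for the original source. It shows one way to split the data into a feature matrix `X` and a class vector `y`.

```python
from sklearn.datasets import load_wine

# Assumption: scikit-learn's bundled UCI wine data stands in for the
# (unlinked) source; it has 178 samples, 13 features and 3 wine classes.
wine = load_wine(as_frame=True)

# X: the 13 chemical measurements of each wine sample (a DataFrame)
X = wine.data
# y: the wine class of each sample, encoded as 0, 1 or 2 (a Series)
y = wine.target

print(X.shape, y.shape)                # (178, 13) (178,)
print(y.value_counts().sort_index())   # number of samples per class
```

If the linked file is a raw CSV instead, `pandas.read_csv` followed by selecting the class column as `y` and the remaining columns as `X` gives the same split.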

Perform PCA and extract the top two principal components.
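A minimal PCA sketch with scikit-learn follows. Standardizing the features first is an assumption on our part (the wine features are on very different scales); remove the `StandardScaler` step to run PCA on the raw values.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Assumption: standardize each feature to zero mean and unit variance so
# that large-valued features do not dominate the principal components.
X_scaled = StandardScaler().fit_transform(X)

# Keep only the top two principal components
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

print(X_pca.shape)                     # (178, 2)
print(pca.explained_variance_ratio_)   # share of variance kept by each component
```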

Generate a scatter plot of the two components produced by PCA. Do the points appear to form three clusters?

The reference plot is given below.
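One way to draw this plot with matplotlib is sketched below, colouring each point by its true wine class so that any three-cluster structure is easy to see. The standardization step, the `Agg` backend and the output filename `pca_scatter.png` are assumptions for a script run; in a notebook, drop the backend line and call `plt.show()` instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
# Assumption: standardize before PCA, then keep the top two components
X_pca = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))

fig, ax = plt.subplots(figsize=(6, 5))
# Colour each point by its true wine class
scatter = ax.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap="viridis", s=20)
ax.set_xlabel("PC 1")
ax.set_ylabel("PC 2")
ax.set_title("Wine data on the top two principal components")
ax.legend(*scatter.legend_elements(), title="Class")
fig.savefig("pca_scatter.png", dpi=100)  # hypothetical output filename
```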

Task 2

Run a k-means clustering model on the input data to generate the cluster centroids. Use k = 3 and plot the cluster centroids together with the data points in each cluster as a scatter plot.
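The sketch below fits scikit-learn's `KMeans` with k = 3 and overlays the centroids on the clustered points. Clustering in the 2-D PCA space (rather than on all 13 raw features) is an assumption made so the centroids can be plotted directly; `random_state=42`, `n_init=10` and the filename `kmeans_centroids.png` are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
# Assumption: cluster in the standardized 2-component PCA space
X_pca = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_pca)    # cluster id (0, 1 or 2) for each sample
centroids = kmeans.cluster_centers_   # one (PC 1, PC 2) centroid per cluster

fig, ax = plt.subplots(figsize=(6, 5))
ax.scatter(X_pca[:, 0], X_pca[:, 1], c=labels, cmap="viridis", s=20,
           label="data points")
ax.scatter(centroids[:, 0], centroids[:, 1], c="red", marker="X", s=200,
           label="centroids")
ax.set_xlabel("PC 1")
ax.set_ylabel("PC 2")
ax.legend()
fig.savefig("kmeans_centroids.png", dpi=100)  # hypothetical output filename
```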

To check how well k-means performed, print the prediction accuracy and plot the confusion matrix. Computing the accuracy is not straightforward, because k-means assigns arbitrary cluster labels that need not match the class labels. Make sure to match each predicted cluster to the original wine class before printing the accuracy.
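One way to do the matching is to treat it as an assignment problem: build the class-by-cluster count matrix and pick the one-to-one mapping with the largest total overlap using SciPy's `linear_sum_assignment` (the Hungarian algorithm). This is a sketch of that approach, not necessarily the method used in the notebook; the PCA space, `random_state=42` and the filename `confusion_matrix.png` are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.metrics import ConfusionMatrixDisplay, accuracy_score, confusion_matrix
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_pca = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))
clusters = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X_pca)

# Rows: true class, columns: cluster id (ids are arbitrary at this point)
counts = confusion_matrix(y, clusters)

# Maximise matched counts by minimising their negation; row_ind[i] is a
# class and col_ind[i] is the cluster assigned to it.
row_ind, col_ind = linear_sum_assignment(-counts)
mapping = {cluster: cls for cls, cluster in zip(row_ind, col_ind)}
y_pred = np.array([mapping[c] for c in clusters])

accuracy = accuracy_score(y, y_pred)
cm = confusion_matrix(y, y_pred)
print("accuracy:", accuracy)
print(cm)

disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=[0, 1, 2])
disp.plot(cmap="Blues")
disp.figure_.savefig("confusion_matrix.png", dpi=100)  # hypothetical filename
```

A simpler alternative is to map each cluster to the majority class among its members, but that can map two clusters to the same class; the assignment approach guarantees a one-to-one mapping.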

Task 3

Run the KMeans model for different values of $k$ on the PCA-transformed dataset with n = 2 components, plot the distortion for each value of $k$, and identify the elbow of the curve.
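A sketch of the elbow method follows. It assumes "distortion" means scikit-learn's `KMeans.inertia_` (the sum of squared distances from each point to its nearest centroid); some texts instead use the mean distance to the nearest centroid, which gives a curve of similar shape. The range k = 1 to 10, the standardization step and the filename `elbow_curve.png` are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_wine(return_X_y=True)
# Assumption: standardize, then keep the top two principal components
X_pca = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))

ks = range(1, 11)
distortions = []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_pca)
    # inertia_: sum of squared distances of samples to their closest centroid
    distortions.append(model.inertia_)

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(list(ks), distortions, marker="o")
ax.set_xlabel("number of clusters k")
ax.set_ylabel("distortion (inertia)")
ax.set_title("Elbow method on the 2-component PCA data")
fig.savefig("elbow_curve.png", dpi=100)  # hypothetical output filename
```

The elbow is the value of k after which adding more clusters stops reducing the distortion sharply; reading it off the plot is a judgment call rather than an exact rule.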

Deeksha-Pandit/Big-Data-k-means-clustering

Folders and files

Latest commit

History

Repository files navigation

Big-Data-k-means-clustering

Task 1

Task 2

Task 3

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages