K-Means Clustering

Customer segmentation using K-Means clustering on mall customer data.

Overview

This project applies K-Means clustering with k-means++ initialization to segment mall customers by annual income and spending score. The elbow method is used to choose the number of clusters (k = 5), and the resulting clusters are visualized as a 2D scatter plot.

Implementations are provided in both Python and R.
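For readers unfamiliar with k-means++, the sketch below shows the seeding idea in plain Python: the first centroid is picked uniformly at random, and each later one is sampled with probability proportional to its squared distance from the nearest centroid already chosen. This is an illustration only, not the code in kmeans.py (which relies on scikit-learn). The function names and the sample points are hypothetical.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_pp_init(points, k, seed=0):
    """Pick k initial centroids using k-means++ seeding.

    Assumes the points are distinct and k <= len(points).
    """
    rng = random.Random(seed)
    # First centroid: chosen uniformly at random.
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Squared distance from each point to its nearest chosen centroid.
        d2 = [min(squared_distance(p, c) for c in centroids) for p in points]
        # Next centroid: sampled with probability proportional to d2, so
        # far-away points are favoured and already-chosen points (d2 == 0)
        # cannot be picked again.
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids

# Made-up (annual income, spending score) pairs, for illustration only.
points = [(15, 39), (16, 81), (54, 47), (60, 52), (87, 13), (88, 90)]
seeds = kmeans_pp_init(points, k=3, seed=1)
```

Spreading the initial centroids out this way tends to reduce the chance of a poor starting configuration compared with purely random initialization.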

Dataset

Mall_Customers.csv — 200 records with the following columns:

Column	Description
CustomerID	Unique identifier
Genre	Gender (Male / Female)
Age	Customer age
Annual Income (k$)	Yearly income in thousands of dollars
Spending Score (1-100)	Score assigned by the mall
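A minimal sketch of loading data with this column layout and selecting the two clustering features with pandas. It assumes the file is comma-delimited with the header names listed above. The three rows are placeholders that only mimic the layout and are not taken from Mall_Customers.csv.

```python
import io

import pandas as pd

# Placeholder rows that only mimic the column layout described above;
# they are NOT real records from Mall_Customers.csv.
sample_csv = io.StringIO(
    "CustomerID,Genre,Age,Annual Income (k$),Spending Score (1-100)\n"
    "1,Male,30,20,40\n"
    "2,Female,25,60,75\n"
    "3,Female,45,90,15\n"
)

# With the real file this would be: pd.read_csv("Mall_Customers.csv")
df = pd.read_csv(sample_csv)

# Keep only the two features used for clustering.
feature_columns = ["Annual Income (k$)", "Spending Score (1-100)"]
X = df[feature_columns].to_numpy()
```

Here X is a NumPy array with one row per customer and two columns (income, spending score), which is the shape scikit-learn estimators expect.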

Methodology

1. Load the dataset and select features (Annual Income, Spending Score)
2. Run K-Means for k = 1..10 and record WCSS (Within-Cluster Sum of Squares)
3. Plot the elbow curve to identify the optimal k
4. Fit K-Means (k-means++ initialization) with k = 5 and visualize the resulting clusters
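The steps above can be sketched with scikit-learn roughly as follows. This is not a copy of kmeans.py: the synthetic blobs stand in for the real data so the snippet runs on its own, and the helper name elbow_wcss, the seed values, and n_init=10 are assumptions. The only library features relied on are KMeans, its inertia_ attribute (which is the WCSS of the fitted model), and fit_predict.

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in data: five synthetic blobs in (income, spending score) space.
# With the real file, X would instead come from something like
#   pd.read_csv("Mall_Customers.csv")[
#       ["Annual Income (k$)", "Spending Score (1-100)"]].to_numpy()
rng = np.random.default_rng(0)
centers = np.array([[25, 20], [25, 80], [55, 50], [88, 17], [86, 82]], dtype=float)
X = np.vstack([c + rng.normal(scale=5.0, size=(40, 2)) for c in centers])

def elbow_wcss(X, k_values, seed=42):
    """Fit K-Means for each k and return the list of WCSS values."""
    wcss = []
    for k in k_values:
        model = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=seed)
        model.fit(X)
        wcss.append(model.inertia_)  # inertia_ is the within-cluster sum of squares
    return wcss

# Step 2: WCSS for k = 1..10 (plot k_values against wcss for the elbow curve).
k_values = list(range(1, 11))
wcss = elbow_wcss(X, k_values)

# Step 4: final model with k = 5 and one cluster label per row of X.
final_model = KMeans(n_clusters=5, init="k-means++", n_init=10, random_state=42)
labels = final_model.fit_predict(X)
```

Plotting k_values against wcss with Matplotlib gives the elbow curve, and a scatter plot of X colored by labels, with final_model.cluster_centers_ overlaid, gives the cluster visualization.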

🛠 Tech Stack

Tool	Purpose
🐍 Python 3	Primary implementation
📊 scikit-learn	KMeans clustering
🔢 NumPy	Numerical operations
📈 Matplotlib	Visualization
🐼 pandas	Data loading
📉 R	Alternative implementation

Getting Started

# Install dependencies
pip install numpy pandas matplotlib scikit-learn

# Run the clustering
python kmeans.py

For the R version:

# Requires: cluster package
Rscript kmeans.R

Project Structure

├── kmeans.py                      # Python K-Means implementation
├── kmeans.R                       # R K-Means implementation
├── data_preprocessing_template.py # Generic preprocessing template (Python)
├── data_preprocessing_template.R  # Generic preprocessing template (R)
├── Mall_Customers.csv             # Dataset
└── README.md

⚠️ Known Issues

data_preprocessing_template.py references a generic Data.csv that is not included — it is a reusable template, not specific to this project.
The R script shadows the built-in kmeans function by assigning the result to a variable named kmeans.
