---
title: UnsupervisedCustumerPrediction
emoji: 🧩
colorFrom: indigo
colorTo: blue
sdk: docker
app_port: 8501
tags:
pinned: false
short_description: Streamlit app that predicts cluster labels from uploaded CSV
license: mit
---

🔗 Live Demo: https://huggingface.co/spaces/EnYa32/UnsupervisedCustumerPrediction

📓 Kaggle Notebook: https://www.kaggle.com/code/enesyama/unsupervisedcustumer-clustering

- Task: Unsupervised clustering
- Models: KMeans + Gaussian Mixture Model (GMM)
- Preprocessing: StandardScaler + PCA (keeping 95% of the variance)
- Input: Tabular CSV
- Output: Cluster labels
- Deployment: Streamlit app
- Pipeline: Fitted scikit-learn objects saved as `.pkl` files

Pipeline: Raw features → StandardScaler → PCA (95% variance) → Clustering model (KMeans / GMM) → Cluster label prediction
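
A minimal sketch of how this pipeline could be fit with scikit-learn. It assumes synthetic random data in place of the competition features and k = 9 (inferred from the `*_k9.pkl` file names). Settings such as `n_init` and `random_state` are illustrative choices, not the author's confirmed hyperparameters:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real features (assumption: 1,000 rows, 20 columns).
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(1000, 20))

# 1) Standardize each feature to zero mean and unit variance.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_raw)

# 2) A float n_components keeps the fewest components whose
#    cumulative explained variance exceeds 95%.
pca = PCA(n_components=0.95)
X_pca = pca.fit_transform(X_scaled)

# 3) Fit both clustering models with k = 9 in the reduced space.
kmeans = KMeans(n_clusters=9, n_init=10, random_state=0).fit(X_pca)
gmm = GaussianMixture(n_components=9, random_state=0).fit(X_pca)

# 4) Predict one cluster label per row with each model.
kmeans_labels = kmeans.predict(X_pca)
gmm_labels = gmm.predict(X_pca)
print(f"PCA kept {pca.n_components_} of {X_raw.shape[1]} dimensions")
```

Fitting both models on the same PCA output makes their labelings directly comparable; the GMM additionally provides soft assignments through `predict_proba`.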

How the app works:
- Upload a CSV file.
- The app checks that all required feature columns are present.
- It applies the saved scaler and PCA transforms.
- It predicts a cluster label for each row.
- You can download the predictions as a CSV file.
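
The steps above can be sketched as a single function. This is a hypothetical helper, not the app's actual code: `predict_clusters`, the `Predicted` output column and the synthetic `f0`..`f4` columns are all assumptions made for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def predict_clusters(df, feature_names, scaler, pca, model):
    """Hypothetical helper: validate columns, apply scaler + PCA, predict labels."""
    # Every required feature column must be present in the upload.
    missing = [c for c in feature_names if c not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    # Reorder columns to the training order, then scale and project.
    X = df[list(feature_names)]
    X_pca = pca.transform(scaler.transform(X))
    # One predicted cluster label per input row.
    return pd.DataFrame({"Predicted": model.predict(X_pca)})


# Demo with synthetic data and hypothetical column names f0..f4.
rng = np.random.default_rng(1)
cols = [f"f{i}" for i in range(5)]
train = pd.DataFrame(rng.normal(size=(200, 5)), columns=cols)
scaler = StandardScaler().fit(train)
pca = PCA(n_components=0.95).fit(scaler.transform(train))
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(
    pca.transform(scaler.transform(train))
)

upload = train.sample(10, random_state=0)[cols[::-1]]  # deliberately reordered columns
preds = predict_clusters(upload, cols, scaler, pca, model)
# CSV text that a download button could serve.
csv_text = preds.to_csv(index=False)
```

In a Streamlit app, `csv_text` could then be passed as the `data` argument of `st.download_button`. Selecting columns by name also means the uploaded CSV may list the features in any order.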

Because this is an unsupervised learning project, evaluation is not based on accuracy against known labels.
Model quality was evaluated using:
- Kaggle leaderboard score
- Cluster stability
- PCA visualization
- Distribution consistency across clusters
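
One common way to quantify cluster stability is to refit the model with different random seeds and compare the resulting partitions using the adjusted Rand index (ARI). The sketch below does this on synthetic blobs and also reports the share of rows in each cluster. It illustrates the idea only and is not necessarily how stability or distribution consistency was measured for this project:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Synthetic data with 9 groups (stand-in for the PCA-transformed features).
X, _ = make_blobs(n_samples=900, centers=9, n_features=5, random_state=0)

# Refit KMeans with different seeds and compare each labeling with the first.
# ARI ignores label permutations, so 1.0 means identical partitions.
runs = [KMeans(n_clusters=9, n_init=10, random_state=s).fit_predict(X) for s in range(5)]
stability = [adjusted_rand_score(runs[0], r) for r in runs[1:]]

# Cluster size distribution: fraction of rows assigned to each cluster.
sizes = np.bincount(runs[0], minlength=9) / len(X)
print("ARI vs. first run:", np.round(stability, 3))
print("Cluster shares:", np.round(sizes, 3))
```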

Tech stack:
- Python
- scikit-learn
- PCA
- KMeans
- Gaussian Mixture Models
- Streamlit
- pickle (model persistence)

The following trained pipeline artifacts must exist in the repo root:
- `feature_names.pkl`
- `scaler.pkl`
- `pca.pkl`
- `kmeans_model_k9.pkl`
- `gmm_model_k9.pkl`
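
A minimal sketch of how these artifacts could be written and read back with Python's `pickle` module. The `save_artifacts` and `load_artifacts` helpers and the dictionary keys are hypothetical; only the file names come from the list above:

```python
import pickle
import tempfile
from pathlib import Path

# File names from the list above; the keys are hypothetical labels.
ARTIFACT_FILES = {
    "feature_names": "feature_names.pkl",
    "scaler": "scaler.pkl",
    "pca": "pca.pkl",
    "kmeans": "kmeans_model_k9.pkl",
    "gmm": "gmm_model_k9.pkl",
}


def save_artifacts(root, objects):
    """Pickle each object in `objects` (keyed like ARTIFACT_FILES) into `root`."""
    for key, filename in ARTIFACT_FILES.items():
        with open(Path(root) / filename, "wb") as fh:
            pickle.dump(objects[key], fh)


def load_artifacts(root="."):
    """Load all artifacts, failing early with a clear message if any is missing."""
    missing = [f for f in ARTIFACT_FILES.values() if not (Path(root) / f).exists()]
    if missing:
        raise FileNotFoundError(f"Missing artifacts in {root}: {missing}")
    loaded = {}
    for key, filename in ARTIFACT_FILES.items():
        # Only unpickle files you trust: pickle can execute arbitrary code.
        with open(Path(root) / filename, "rb") as fh:
            loaded[key] = pickle.load(fh)
    return loaded


# Round trip with placeholder objects standing in for the fitted models.
with tempfile.TemporaryDirectory() as tmp:
    placeholders = {key: f"<{key}>" for key in ARTIFACT_FILES}
    placeholders["feature_names"] = ["f0", "f1"]
    save_artifacts(tmp, placeholders)
    restored = load_artifacts(tmp)
```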

Your CSV must include all feature columns stored in `feature_names.pkl`.

Optional:
- You may include an `id` or `Id` column. If present, it is included in the output as `Id`.
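
A minimal sketch of the optional `Id` handling, assuming a hypothetical `attach_id` helper and a `Predicted` output column. When both `id` and `Id` exist, this version prefers `id`:

```python
import pandas as pd


def attach_id(upload, predictions):
    """Hypothetical helper: copy an `id` or `Id` column into the output as `Id`."""
    for col in ("id", "Id"):
        if col in upload.columns:
            # Insert Id as the first column, aligned row by row with predictions.
            out = predictions.reset_index(drop=True).copy()
            out.insert(0, "Id", upload[col].to_numpy())
            return out
    # No id column: return the predictions unchanged.
    return predictions


upload = pd.DataFrame({"id": [7, 8, 9], "f0": [0.1, 0.2, 0.3]})
preds = pd.DataFrame({"Predicted": [2, 0, 2]})
result = attach_id(upload, preds)
print(result)
```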

Run locally:
- Install the dependencies: `pip install -r requirements.txt`
- Start the app: `streamlit run app.py`

Note: visual separation of clusters in a 2D PCA plot does not always reflect the Kaggle leaderboard score.
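
One reason a 2D plot can mislead is that the first two principal components may carry only a small part of the total variance. A minimal sketch that measures this on synthetic data (the real features will give different numbers):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: 500 rows, 20 features).
rng = np.random.default_rng(2)
X = StandardScaler().fit_transform(rng.normal(size=(500, 20)))

ratios = PCA().fit(X).explained_variance_ratio_
# Share of total variance visible in a scatter of the first two components.
visible_2d = ratios[:2].sum()
# Fewest components whose cumulative ratio exceeds 0.95 (the pipeline's rule).
n_95 = int(np.searchsorted(np.cumsum(ratios), 0.95, side="right") + 1)
print(f"2D plot shows {visible_2d:.1%} of the variance; 95% needs {n_95} components")
```

Clusters that overlap in the 2D view can still be well separated along the remaining components that the clustering models actually use.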

The competition dataset (Tabular Playground Series, July 2022) can be downloaded here:
https://www.kaggle.com/competitions/tabular-playground-series-jul-2022/data

