This project aims to analyze and cluster cryptocurrencies using Python and unsupervised machine learning to predict if cryptocurrencies are affected by 24-hour or 7-day price changes. As well as utilizing K-means clustering, employing data normalization and dimensionality reduction techniques such as Principal Component Analysis (PCA). The Cryto_Clustering notebook file with interactive visuals can be easily viewed by clicking on this link: [https://nbviewer.org/github.com/thaychansy/CryptoClustering/blob/main/Crypto_Clustering.ipynb]
- Cryptocurrency Clustering Analysis
- Table of Contents
- Prepare the Data
- Find the Best Value for k Using the Scaled DataFrame
- Cluster Cryptocurrencies with K-means Using the Scaled DataFrame
- Optimize Clusters with Principal Component Analysis
- Find the Best Value for k Using the PCA DataFrame
- Cluster Cryptocurrencies with K-means Using the PCA DataFrame
- Conclusion
- Contact
To normalize the data from the CSV file, we use the StandardScaler()
module from scikit-learn
. We create a DataFrame with the scaled data and set the "coin_id" from the original DataFrame as the index for the new DataFrame.
from sklearn.preprocessing import StandardScaler
import pandas as pd
# Load data
df_market_data = pd.read_csv(
"Resources/crypto_market_data.csv",
index_col="coin_id")
# Normalize data
df_market_data_scaled = StandardScaler().fit_transform(df_market_data[['price_change_percentage_24h', 'price_change_percentage_7d',
'price_change_percentage_14d', 'price_change_percentage_30d',
'price_change_percentage_60d', 'price_change_percentage_200d',
'price_change_percentage_1y']])
# Create scaled DataFrame
df_market_data_scaled = pd.DataFrame(price_change_scaled, columns=['price_change_percentage_24h', 'price_change_percentage_7d',
'price_change_percentage_14d', 'price_change_percentage_30d',
'price_change_percentage_60d', 'price_change_percentage_200d',
'price_change_percentage_1y'])
Using the elbow method, we determine the best value for k through the following steps:
- Create a list of k values from 1 to 11.
- Compute the inertia for each value of k.
- Plot the elbow curve.
# Initialize lists
k_values = list(range(1, 12))
inertia = []
# Compute inertia for each k
for i in k:
k_model = KMeans(n_clusters=i, random_state=0)
k_model.fit(df_market_data_scaled)
inertia.append(k_model.inertia_)
# Plot elbow curve
df_elbow.hvplot.line(
x="k",
y="inertia",
title="Elbow Curve",
xticks=k
)

Using the best k value, we cluster the cryptocurrencies as follows:
- Initialize and fit the K-means model.
- Predict clusters and add them to the DataFrame.
- Create a scatter plot.
# Initialize the K-Means model using the best value for k
model = KMeans(n_clusters=4, random_state=1)
# Fit the K-Means model using the scaled data
model.fit(df_market_data_scaled)
# Predict the clusters to group the cryptocurrencies using the scaled data
k_4 = model.predict(df_market_data_scaled)
# Plot
price_change_scaled_predictions_df.hvplot.scatter(
x="price_change_percentage_24h",
y="price_change_percentage_7d",
by="crypto_segment", # Color points by the cluster labels from K-Means
hover_cols=["coinid"], # Add the cryptocurrency name to hover information
title="Crypto Segmentation based on K-Means Clustering (k=4)"
)

We perform PCA to reduce features to three principal components.
- Fit PCA and retrieve explained variance.
- Create a new DataFrame with PCA data.
# Create a PCA model instance and set `n_components=3`.
pca = PCA(n_components=3)
# Use the PCA model with `fit_transform` to reduce to
# three principal components.
crypto_pca = pca.fit_transform(df_market_data_scaled)
# Retrieve the explained variance to determine how much information
# can be attributed to each principal component.
pca.explained_variance_ratio_
# Create a new DataFrame with the PCA data.
crypto_pca_df = pd.DataFrame(
crypto_pca,
columns=["PCA1", "PCA2", "PCA3"]
# Set the coinid column as index
crypto_pca_df.set_index('coinid', inplace=True)
)
Use the elbow method on the PCA DataFrame as previously described to determine the best k value.
Best Value for k with PCA is k = 4.
# Create a list with the number of k-values from 1 to 11
k = list(range(1, 12))
# Create a list with the number of k-values from 1 to 11
k = list(range(1, 12))
# Create an empty list to store the inertia values
inertia = []
# Create a for loop to compute the inertia with each possible value of k
for i in k:
k_model = KMeans(n_clusters=i, random_state=0)
k_model.fit(crypto_pca_df)
inertia.append(k_model.inertia_)
# Plot
df_elbow_pca.hvplot.line(
x="k",
y="inertia",
title="Elbow Curve PCA",
xticks=k
)

Repeat the clustering process using the PCA DataFrame.
Initialize K-means with the best k value from PCA.
- Fit and predict clusters.
- Create a scatter plot.
# Initialize K-means with best k from PCA
model = KMeans(n_clusters=4, random_state=0)
model.fit(crypto_pca_df)
# Predict the clusters to group the cryptocurrencies using the PCA data
k_4 = model.predict(crypto_pca_df)
# Create a copy of the DataFrame with the PCA data
crypto_pca_predictions_df = crypto_pca_df.copy()
# Add a new column to the DataFrame with the predicted clusters
crypto_pca_predictions_df["cryto_segments"] = k_4
# Plot
crypto_pca_predictions_df.hvplot.scatter(
x="PCA1",
y="PCA2",
by="cryto_segments",
hover_cols=["coinid"], # Add the cryptocurrency name to hover information
title="PCA Crypto Segmentation based on K-Means Clustering (k=4)"
)

This analysis provides insights into the clustering of cryptocurrencies using both the scaled data and the PCA-transformed data. The results can help in identifying trends and making informed investment decisions. The cryptocurrency clustering project successfully applied unsupervised learning techniques, specifically K-means clustering, to group cryptocurrencies based on their price change patterns over various time periods.


Thay Chansy - @thaychansy - or [email protected]
Please visit my Portfolio Page: thaychansy.github.io (https://thaychansy.github.io/)