Cryptocurrency Clustering Analysis

This project uses Python and unsupervised machine learning to cluster cryptocurrencies and examine whether they are affected by 24-hour or 7-day price changes. It applies K-means clustering together with data normalization and dimensionality reduction through Principal Component Analysis (PCA). The Crypto_Clustering notebook, with interactive visuals, can be viewed at this link: [https://nbviewer.org/github.com/thaychansy/CryptoClustering/blob/main/Crypto_Clustering.ipynb]

Prepare the Data

To normalize the data from the CSV file, we use the StandardScaler class from scikit-learn. We then create a DataFrame with the scaled data and reuse the "coin_id" index from the original DataFrame as the index of the new DataFrame.

from sklearn.preprocessing import StandardScaler
import pandas as pd

# Load data, using coin_id as the index
df_market_data = pd.read_csv(
    "Resources/crypto_market_data.csv",
    index_col="coin_id")

# Columns to normalize
price_change_columns = ['price_change_percentage_24h', 'price_change_percentage_7d',
                        'price_change_percentage_14d', 'price_change_percentage_30d',
                        'price_change_percentage_60d', 'price_change_percentage_200d',
                        'price_change_percentage_1y']

# Normalize data (returns a NumPy array)
price_change_scaled = StandardScaler().fit_transform(df_market_data[price_change_columns])

# Create scaled DataFrame, keeping coin_id from the original DataFrame as the index
df_market_data_scaled = pd.DataFrame(price_change_scaled,
                                     columns=price_change_columns,
                                     index=df_market_data.index)
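
As a rough illustration of what standard scaling does, the sketch below standardizes each column by hand with NumPy: subtract the column mean, then divide by the population standard deviation, which is the calculation StandardScaler performs by default. The sample values and the helper name `standardize` are made up for illustration and are not taken from the project data.

```python
import numpy as np

def standardize(values):
    """Hypothetical helper: z-score each column (mean 0, population std 1)."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean(axis=0)) / values.std(axis=0)

# Illustrative values only: rows are coins, columns are two price-change features.
sample = np.array([
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 60.0],
])

scaled = standardize(sample)
print(scaled.mean(axis=0))  # each column mean is approximately 0
print(scaled.std(axis=0))   # each column standard deviation is approximately 1
```

After scaling, features measured over different time windows contribute on a comparable scale to the distances that K-means relies on.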

Find the Best Value for k Using the Scaled DataFrame

Using the elbow method, we determine the best value for k through the following steps:

Create a list of k values from 1 to 11.
Compute the inertia for each value of k.
Plot the elbow curve.

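from sklearn.cluster import KMeans
import hvplot.pandas  # registers the .hvplot accessor on DataFrames

# Initialize lists
k = list(range(1, 12))
inertia = []

# Compute inertia for each k
for i in k:
    k_model = KMeans(n_clusters=i, random_state=0)
    k_model.fit(df_market_data_scaled)
    inertia.append(k_model.inertia_)

# Collect the results in a DataFrame for plotting
df_elbow = pd.DataFrame({"k": k, "inertia": inertia})

# Plot elbow curve
df_elbow.hvplot.line(
    x="k", 
    y="inertia", 
    title="Elbow Curve", 
    xticks=k
)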
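
One way to make the elbow easier to read is to look at how much the inertia drops with each additional cluster. The sketch below assumes synthetic data from `make_blobs` with four well-separated groups rather than the project's CSV, and the variable names are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the scaled market data: 4 well-separated groups.
X, _ = make_blobs(
    n_samples=200,
    centers=[[-10, -10], [-10, 10], [10, -10], [10, 10]],
    cluster_std=1.0,
    random_state=0,
)

k_values = list(range(1, 12))
inertia = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    model.fit(X)
    inertia.append(model.inertia_)

# Percentage drop in inertia when moving from k - 1 clusters to k clusters.
# The elbow is roughly where these drops become small.
drops = -np.diff(inertia) / np.array(inertia[:-1]) * 100
for k, drop in zip(k_values[1:], drops):
    print(f"k={k}: inertia fell by {drop:.1f}%")
```

On data like this the drops are large up to the true number of groups and small afterwards, which is the same pattern the elbow plot shows visually.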

Cluster Cryptocurrencies with K-means Using the Scaled DataFrame

Using the best k value, we cluster the cryptocurrencies as follows:

Initialize and fit the K-means model.
Predict clusters and add them to the DataFrame.
Create a scatter plot.

# Initialize the K-Means model using the best value for k
model = KMeans(n_clusters=4, random_state=1)

# Fit the K-Means model using the scaled data
model.fit(df_market_data_scaled)

# Predict the clusters to group the cryptocurrencies using the scaled data
k_4 = model.predict(df_market_data_scaled)

# Copy the scaled DataFrame and add the predicted clusters as a new column
price_change_scaled_predictions_df = df_market_data_scaled.copy()
price_change_scaled_predictions_df["crypto_segment"] = k_4

# Plot
price_change_scaled_predictions_df.hvplot.scatter(
    x="price_change_percentage_24h",
    y="price_change_percentage_7d",
    by="crypto_segment",  # Color points by the cluster labels from K-Means
    hover_cols=["coin_id"],  # Add the cryptocurrency name to hover information
    title="Crypto Segmentation based on K-Means Clustering (k=4)"
)
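
To show what `predict` does for a fitted K-means model, the sketch below reproduces it by assigning each row to its nearest cluster center. It runs on small synthetic data (an assumption, not the project's data), and `nearest` is an illustrative name.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in data: two loose groups in two dimensions.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=-5.0, scale=1.0, size=(20, 2)),
    rng.normal(loc=5.0, scale=1.0, size=(20, 2)),
])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Squared Euclidean distance from every row to every cluster center.
distances = ((X[:, None, :] - model.cluster_centers_[None, :, :]) ** 2).sum(axis=2)

# Each row is assigned to the center it is closest to.
nearest = distances.argmin(axis=1)

# Compare the manual assignment with the model's own predictions.
print(np.array_equal(nearest, model.predict(X)))
```

The cluster numbers themselves carry no meaning; only the grouping of rows matters when reading the scatter plot.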

Optimize Clusters with Principal Component Analysis

We perform PCA to reduce features to three principal components.

Fit PCA and retrieve explained variance.
Create a new DataFrame with PCA data.

from sklearn.decomposition import PCA

# Create a PCA model instance and set `n_components=3`.
pca = PCA(n_components=3)

# Use the PCA model with `fit_transform` to reduce to 
# three principal components.
crypto_pca = pca.fit_transform(df_market_data_scaled)

# Retrieve the explained variance to determine how much information 
# can be attributed to each principal component.
pca.explained_variance_ratio_

# Create a new DataFrame with the PCA data, reusing the coin_id index
# of the scaled DataFrame.
crypto_pca_df = pd.DataFrame(
    crypto_pca,
    columns=["PCA1", "PCA2", "PCA3"],
    index=df_market_data_scaled.index
)
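
For intuition about `explained_variance_ratio_`, the following sketch computes the same kind of ratio with NumPy alone: center the data, take the singular values, and divide each squared singular value by their total. The data is random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data: 50 rows, 7 features, with two features made strongly correlated.
X = rng.normal(size=(50, 7))
X[:, 1] = X[:, 0] * 2 + rng.normal(scale=0.1, size=50)

# PCA works on centered data.
X_centered = X - X.mean(axis=0)

# Singular values come back sorted from largest to smallest.
singular_values = np.linalg.svd(X_centered, compute_uv=False)

# Share of the total variance captured by each principal component.
ratios = singular_values**2 / np.sum(singular_values**2)

print(ratios[:3])        # share of variance for the first three components
print(ratios[:3].sum())  # total share retained by keeping three components
```

Summing the first three ratios gives the fraction of the original variance that survives the reduction to three components.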

Find the Best Value for k Using the PCA DataFrame

Use the elbow method on the PCA DataFrame as previously described to determine the best k value.

The best value for k with the PCA data is k = 4.

# Create a list with the number of k-values from 1 to 11
k = list(range(1, 12))

# Create an empty list to store the inertia values
inertia = []

# Create a for loop to compute the inertia with each possible value of k
for i in k:
    k_model = KMeans(n_clusters=i, random_state=0)
    k_model.fit(crypto_pca_df)
    inertia.append(k_model.inertia_)

# Collect the results in a DataFrame for plotting
df_elbow_pca = pd.DataFrame({"k": k, "inertia": inertia})

# Plot
df_elbow_pca.hvplot.line(
    x="k",
    y="inertia",
    title="Elbow Curve PCA",
    xticks=k
)

Cluster Cryptocurrencies with K-means Using the PCA DataFrame

Repeat the clustering process using the PCA DataFrame.

Initialize K-means with the best k value from PCA.
Fit and predict clusters.
Create a scatter plot.

# Initialize K-means with best k from PCA
model = KMeans(n_clusters=4, random_state=0)
model.fit(crypto_pca_df)

# Predict the clusters to group the cryptocurrencies using the PCA data
k_4 = model.predict(crypto_pca_df)

# Create a copy of the DataFrame with the PCA data
crypto_pca_predictions_df = crypto_pca_df.copy()

# Add a new column to the DataFrame with the predicted clusters
crypto_pca_predictions_df["crypto_segment"] = k_4

# Plot
crypto_pca_predictions_df.hvplot.scatter(
    x="PCA1",
    y="PCA2",
    by="crypto_segment",  # Color points by the cluster labels from K-Means
    hover_cols=["coin_id"],  # Add the cryptocurrency name to hover information
    title="PCA Crypto Segmentation based on K-Means Clustering (k=4)"
)
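
A possible way to check whether the PCA-based clusters agree with the clusters from the full scaled data is the adjusted Rand index, which ignores how the cluster numbers happen to be assigned. The label lists below are invented for illustration. In the notebook they would be the two prediction arrays, and since both are stored as `k_4` in the snippets here, one of them would need to be kept under a different name.

```python
from sklearn.metrics import adjusted_rand_score

# Hypothetical cluster labels for six coins from two separate runs.
labels_scaled = [0, 0, 1, 1, 2, 2]
labels_pca = [1, 1, 0, 0, 2, 2]  # same grouping, different label numbers

# A score of 1.0 means the two groupings are identical;
# values near 0 mean agreement is no better than chance.
score = adjusted_rand_score(labels_scaled, labels_pca)
print(score)
```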

Conclusion

This project applied unsupervised learning, specifically K-means clustering, to group cryptocurrencies by their price change patterns over several time periods, using both the scaled data and the PCA-transformed data. The resulting clusters can help identify trends and may inform investment decisions.

Contact

Thay Chansy - @thaychansy - or [email protected]

Please visit my Portfolio Page: thaychansy.github.io (https://thaychansy.github.io/)
