The project involves implementing clustering algorithms to analyze synthetic heart disease datasets using K-Means, Hierarchical, and Mean Shift clustering techniques. The performance of these algorithms is evaluated through metrics such as Silhouette Score, Calinski-Harabasz Index, and Davies-Bouldin Index.

Overview

This project analyzes various clustering techniques (K-Means, Hierarchical Clustering, and Mean Shift) on an Iris-like dataset, focusing on the impact of preprocessing and PCA on clustering performance.

Dataset

The dataset contains the following features:

sepal_length
sepal_width
petal_length
petal_width
species (for validation)

Clustering Techniques

K-Means Clustering: Assesses cluster counts (c = 3, 4, 5) and evaluates performance using Silhouette, Calinski-Harabasz, and Davies-Bouldin scores.
Hierarchical Clustering: Analyzes the effects of normalization and PCA on clustering performance.
Mean Shift Clustering: Examines the algorithm's effectiveness under various preprocessing conditions.
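
The K-Means evaluation described above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: it assumes scikit-learn is available, uses scikit-learn's bundled Iris data as a stand-in for the project's Iris-like dataset, and the helper name `score_kmeans` and the fixed seed are hypothetical choices made for the example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

# Assumption: the bundled Iris data stands in for the project's dataset.
# Only the four measurement columns are clustered; species is left out.
X = load_iris().data


def score_kmeans(X, cluster_counts=(3, 4, 5), seed=0):
    """Fit K-Means once per cluster count and collect the three metrics.

    Hypothetical helper, not taken from the project.
    """
    results = {}
    for c in cluster_counts:
        labels = KMeans(n_clusters=c, n_init=10, random_state=seed).fit_predict(X)
        results[c] = {
            "silhouette": silhouette_score(X, labels),  # higher is better
            "calinski_harabasz": calinski_harabasz_score(X, labels),  # higher is better
            "davies_bouldin": davies_bouldin_score(X, labels),  # lower is better
        }
    return results


kmeans_scores = score_kmeans(X)
for c, metrics in kmeans_scores.items():
    print(c, {name: round(value, 3) for name, value in metrics.items()})
```

Silhouette and Calinski-Harabasz reward higher values, while Davies-Bouldin rewards lower ones, so the three columns should be read in opposite directions when choosing a cluster count.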

Results

Results are summarized in comparison tables, illustrating the performance of each algorithm across different configurations.
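
One way such a comparison table could be assembled is sketched below. It is an illustration under stated assumptions, not the project's implementation: scikit-learn's bundled Iris data stands in for the dataset, `StandardScaler` is assumed as the normalization step, PCA is assumed to keep two components, and the cluster count of 3 for K-Means and Hierarchical clustering is picked only for the example. Mean Shift chooses its own number of clusters, so the sketch records how many it found and skips the metrics if it returns fewer than two.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans, MeanShift
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: the bundled Iris data stands in for the project's dataset.
X_raw = load_iris().data

# Assumed preprocessing configurations: raw, normalized, normalized + PCA.
configs = {
    "raw": X_raw,
    "normalized": StandardScaler().fit_transform(X_raw),
    "normalized + PCA": make_pipeline(
        StandardScaler(), PCA(n_components=2)
    ).fit_transform(X_raw),
}


def make_models():
    """Return fresh, unfitted estimators for each algorithm (hypothetical helper)."""
    return {
        "K-Means": KMeans(n_clusters=3, n_init=10, random_state=0),
        "Hierarchical": AgglomerativeClustering(n_clusters=3),
        "Mean Shift": MeanShift(),  # bandwidth is estimated from the data
    }


rows = []
for config_name, X in configs.items():
    for algo_name, model in make_models().items():
        labels = model.fit_predict(X)
        n_found = len(set(labels))
        if n_found < 2:
            # The three metrics are undefined for a single cluster.
            rows.append((algo_name, config_name, n_found, None, None, None))
            continue
        rows.append((
            algo_name,
            config_name,
            n_found,
            silhouette_score(X, labels),
            calinski_harabasz_score(X, labels),
            davies_bouldin_score(X, labels),
        ))


def fmt(value):
    """Format a metric for display, showing 'n/a' when it was not computed."""
    return "n/a" if value is None else f"{value:.3f}"


print(f"{'Algorithm':<14}{'Config':<18}{'k':>3}{'Silh.':>9}{'CH':>10}{'DB':>8}")
for algo, config, k, sil, ch, db in rows:
    print(f"{algo:<14}{config:<18}{k:>3}{fmt(sil):>9}{fmt(ch):>10}{fmt(db):>8}")
```

Each score here is computed in the feature space of its own configuration, so differences between rows reflect both the clustering and the change of scale or dimensionality introduced by preprocessing.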