This repository contains an RMarkdown file showcasing a detailed statistical analysis of Italian regions. The project explores multivariate statistical techniques applied to a dataset created specifically for this analysis.
The analysis addresses various research questions and includes the following steps:
- Research Questions: Defining the scope and objectives of the study.
- Dataset Creation and Variables Explanation: Merging and preprocessing the dataset to suit the analysis needs.
- Exploratory Analysis: Conducting initial exploration and validation of the data.
- Correlation Plot: Visualizing relationships between variables.
- Principal Component Analysis (PCA): Reducing dimensionality, supported by Bartlett's test.
- Multidimensional Scaling: Reducing dimensions while preserving distances.
- Clustering Methods:
  - Hierarchical Clustering
  - K-Means Clustering
  - PAM Clustering
  - Model-Based Clustering
- MANOVA: Performing a multivariate analysis of variance using the clustering results.
- Conclusions and Further Improvements: Summarizing findings and discussing potential enhancements.
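
To give a feel for the K-Means step listed above, here is a minimal sketch of the algorithm (Lloyd's iterations) in plain Python. It is not the code from the RMarkdown file, which is written in R; the function name `kmeans`, the starting centroids, and the toy values are all hypothetical and chosen only for illustration.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: assign points to the nearest centroid, then update centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[k])  # empty cluster: keep old centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Made-up, already standardized values for six hypothetical regions
toy = [(-1.2, -0.9), (-1.0, -1.1), (-0.8, -1.0), (0.9, 1.1), (1.0, 0.8), (1.1, 1.1)]
labels, centroids = kmeans(toy, centroids=[toy[0], toy[3]])
```

In the sketch, the cluster labels play the same role as the grouping factor that the MANOVA step then tests against the original variables.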
The repository also includes a presentation featuring:
- Key plots and visuals.
- Main conclusions drawn from the analysis.
- Suggestions for further improvements.
The data were sourced from DatiOpen. The original variables were not on the same scale, which added a challenge during preprocessing. The data were merged and adjusted to meet the needs of the analysis.
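
One common way to put variables on the same scale is z-score standardization. The sketch below assumes that approach; this README does not state which rescaling the analysis actually uses, and the function name `standardize` and the sample values are hypothetical.

```python
from statistics import mean, stdev


def standardize(column):
    """Return z-scores: subtract the mean, divide by the sample standard deviation."""
    m = mean(column)
    s = stdev(column)  # sample standard deviation (n - 1 denominator), as R's scale() uses
    if s == 0:
        raise ValueError("cannot standardize a constant column")
    return [(x - m) / s for x in column]


# Made-up values on very different scales (assumption, not from the dataset)
population_thousands = [100.0, 200.0, 300.0]
unemployment_rate = [4.0, 6.0, 8.0]

z_population = standardize(population_thousands)
z_unemployment = standardize(unemployment_rate)
```

After standardization both columns have mean 0 and unit variance, so neither dominates distance-based methods such as PCA, multidimensional scaling, or clustering.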
Working with data from my own country has been an incredibly rewarding experience. Exploring these techniques while delving into regional data provided valuable insights and a deeper connection to the project.
Feel free to reach out with suggestions or questions at alessia.leofolliero@gmail.com. Your feedback is welcome!