The code in this repository implements the methodology developed by Samadi et al. (2017), which combines Principal Component Analysis (PCA) and Canonical Correlation Analysis (CCA) to incorporate the time dependencies of time-series data. This implementation was developed as a project for the curricular unit "Estatística Multivariada", taught by Adelaide Freitas at the University of Aveiro.
The implementation uses the NCAR Research Data Archive
dataset (ds578.1), which contains
monthly mean surface temperature (degrees C) and monthly accumulated
precipitation (millimetres) from 160 land stations in China from 1951 to 2000.
Only the temperature data was used for this project and it's located in the
files ch160sta.txt and ch160temp.txt.
The code was featured on a poster via a QR code for the One Day Meeting CIDMA conference.
- Download the repository: You can download this repository to your local machine either by cloning it or by downloading it as a zip file.
  - To clone the repository, use the following command in your terminal:
    git clone https://github.com/Adrilihan/PCA-CCA.git
  - To download the repository as a zip file, click the Code button on the repository page and then click Download ZIP. Extract the zip file to your desired location.
- Open the R project: Navigate to the directory where you downloaded the repository and open the PCA-CCA.Rproj file. This will start the R environment with the correct working directory.
- Install the required packages: This project uses the renv package for dependency management. If you don't have renv installed, you can install it using the following command in the R console:
    install.packages("renv")
  Then, to install the project dependencies, use the following command:
    renv::restore()
- Run the R Markdown file: Finally, you can run the PCA.Rmd file to execute the code and generate the report.
Please note that this project was developed in R, so you need to have R installed on your machine. If you don't have R installed, you can download it from The Comprehensive R Archive Network (CRAN).
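The setup steps above can also be run from the R console in one go. The following is a minimal sketch, not part of the repository: the helper name `run_analysis` is hypothetical, and it assumes that the working directory is the project root and that the rmarkdown package is among the dependencies restored by renv.

```r
# Hypothetical helper (not part of the repository): restores the project
# dependencies with renv and then renders the R Markdown report.
run_analysis <- function(rmd_file = "PCA.Rmd") {
  # Fail early if the report source is not where we expect it
  if (!file.exists(rmd_file)) {
    stop("Cannot find '", rmd_file, "'. Is the working directory the project root?")
  }

  # Install renv only if it is missing
  if (!requireNamespace("renv", quietly = TRUE)) {
    install.packages("renv")
  }

  # Install the package versions recorded in the project's lockfile
  renv::restore(prompt = FALSE)

  # Assumption: rmarkdown is one of the restored dependencies
  rmarkdown::render(rmd_file)
}
```

Calling `run_analysis()` with no arguments from the project root would then render PCA.Rmd. Opening the file in RStudio and knitting it is an equivalent alternative.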
Currently, this repository contains an R Markdown file, PCA.Rmd, which
provides a detailed walkthrough of the steps taken to format the data and apply
the PCA-CCA methodology. Users can refer to this file to understand the process
and use the code provided.
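For readers unfamiliar with the two building blocks, the sketch below shows PCA and CCA in base R on synthetic data. It is only an illustration and is not the procedure of Samadi et al. (2017) or the code in PCA.Rmd. The simulated AR(1) series, the choice of two components, and the lag-1 pairing are all assumptions made for the example.

```r
set.seed(1)

# Synthetic multivariate time series: n time points, p variables,
# each variable following an AR(1) process (illustrative data only)
n <- 200
p <- 4
eps <- matrix(rnorm(n * p), nrow = n, ncol = p)
x <- matrix(0, nrow = n, ncol = p)
for (t in 2:n) {
  x[t, ] <- 0.7 * x[t - 1, ] + eps[t, ]
}

# PCA on the centred and scaled variables
pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Keep the first k principal component scores (k = 2 is an arbitrary choice)
k <- 2
scores <- pca$x[, 1:k, drop = FALSE]

# CCA between the scores at time t - 1 and at time t,
# one simple way to let time dependence enter the analysis
lagged  <- scores[-n, , drop = FALSE]
current <- scores[-1, , drop = FALSE]
cca <- cancor(lagged, current)

# Canonical correlations between the lagged and current scores
print(cca$cor)
```

Here `prcomp` and `cancor` come from the stats package that ships with R, so no additional packages are needed to run the sketch.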
In the future, we aim to develop a function that can perform the analysis given formatted data. Please stay tuned for updates.
This project is licensed under the MIT License. See the LICENSE file for details.
Data: https://rda.ucar.edu/datasets/ds578.1/
Paper: https://link.springer.com/article/10.1007/s00180-016-0667-1
PCA wiki: https://en.wikipedia.org/wiki/Principal_component_analysis
CCA wiki: https://en.wikipedia.org/wiki/Canonical_correlation
R plotly package documentation: https://plotly.com/r/
CCA R implementation: https://cmdlinetips.com/2020/12/canonical-correlation-analysis-in-r/