We welcome contributions from the community. To start, you will need the following software:
- Python 3.11.
- Git.
- Make (or a Windows equivalent).
- Google Cloud SDK.

Once these are installed, set up the project as follows:

- First, clone the repository:

      git clone https://<TBA>/paqarin.git

- From the root directory of the project, create a virtual environment named `.venv` using:

      python -m venv .venv

  Make sure you're using the correct Python version!

- Activate the `.venv` virtual environment. On Windows, run:

      .venv\Scripts\activate

- Once the `.venv` environment is active, run the following commands to install all the project dependencies:

      make install
      make install-optional

  To access `make` from a Windows machine, you need to install `UnxUtils` first. Due to conflicting dependencies, it is highly likely that this process fails. If this happens, run `python -m pip install -r env_state.txt` to get an exact copy of a working Python environment.

- To make sure everything is working, run the tests with:

      make test
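
The reminder about using the correct Python version can be checked programmatically before creating the virtual environment. The snippet below is a minimal sketch, assuming the required version is exactly 3.11 as listed in the prerequisites. The helper name `is_supported_python` is hypothetical and is not part of the project.

```python
import sys

# Assumption: the project targets Python 3.11, as listed in the prerequisites.
REQUIRED_VERSION = (3, 11)


def is_supported_python(version_info=None):
    """Return True when the major.minor version matches REQUIRED_VERSION."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == REQUIRED_VERSION


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if is_supported_python():
        print(f"Python {current} matches the required version.")
    else:
        print(f"Python {current} found, but 3.11 is required.")
```

Running this with the interpreter you plan to use for `python -m venv .venv` tells you whether the environment will be built on the expected version.
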

- We use black to format our code. To apply it to your changes, just run `make format` from the project directory.
- Our style guide is determined by flake8. To check if your code complies with our standards, run `make lint` from the project directory.
- We expect new features to include unit tests. To run pytest and produce a coverage report, run `make test` from the project directory.
- Before submitting changes, please run `make checklist` and verify your code does not produce errors. This routine executes flake8, the mypy type checker, and all the unit tests.
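
As an illustration of the kind of unit test we expect alongside a new feature, here is a minimal sketch written in the plain-assert style that pytest discovers automatically. The `normalise` function is a hypothetical feature invented for this example; it does not exist in the project.

```python
def normalise(values):
    """Hypothetical feature: scale a list of numbers to the [0, 1] range."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # Constant series: map every value to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(value - lowest) / (highest - lowest) for value in values]


def test_normalise_scales_to_unit_range():
    assert normalise([2, 4, 6]) == [0.0, 0.5, 1.0]


def test_normalise_handles_constant_series():
    assert normalise([3, 3, 3]) == [0.0, 0.0, 0.0]
```

Each test covers one behaviour and has a descriptive name, so a failure in the coverage report points directly at what broke.
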
These are the main modules of the package. We can organise them into the following groups:
- `generator.py`: This module contains abstractions for synthetic time series generation algorithms. If you're adding support for a new algorithm, you will need to extend the types defined in this module.
- `timegan.py`: This module contains components for generating time series using the TimeGAN algorithm.
- `doppleganger.py`: This module contains components for generating time series using the DoppleGANger algorithm.
- `par.py`: This module contains components for generating time series using the CPAR algorithm.
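
To give a feel for what extending the generator abstractions might involve, here is a minimal sketch. Every name in it (`SketchGenerator`, `ResamplingGenerator`, `fit`, `generate`) is an assumption made for illustration; check `generator.py` for the actual types and method names before writing a real implementation.

```python
import random
from abc import ABC, abstractmethod


class SketchGenerator(ABC):
    """Hypothetical stand-in for the abstractions defined in generator.py."""

    @abstractmethod
    def fit(self, sequences):
        """Learn from a list of training sequences."""

    @abstractmethod
    def generate(self, number_of_sequences):
        """Return a list of synthetic sequences."""


class ResamplingGenerator(SketchGenerator):
    """Toy algorithm that 'generates' data by resampling training sequences."""

    def __init__(self, seed=0):
        self._random = random.Random(seed)
        self._sequences = []

    def fit(self, sequences):
        # Copy the input so later changes by the caller do not affect us.
        self._sequences = [list(sequence) for sequence in sequences]

    def generate(self, number_of_sequences):
        if not self._sequences:
            raise RuntimeError("Call fit() before generate().")
        return [
            list(self._random.choice(self._sequences))
            for _ in range(number_of_sequences)
        ]


generator = ResamplingGenerator(seed=1)
generator.fit([[1, 2, 3], [4, 5, 6]])
samples = generator.generate(5)
```

The abstract base class forces every new algorithm to provide the same two operations, which is what lets the rest of the package treat algorithms interchangeably.
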

- `adapter.py`: This module contains the abstractions for handling multiple libraries for synthetic time series generation. If you're adding support for a new library, make sure you register it in `get_generator_adapter`.
- `sdv_adapter.py`: This module contains the adapter code for running synthetic time series generation algorithms from the SDV library.
- `synthcity_adapter.py`: This module contains the adapter code for running synthetic time series generation algorithms from the Synthcity library.
- `ydata_adapter.py`: This module contains the adapter code for running synthetic time series generation algorithms from the ydata-synthetic library.
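
Registering a new library usually comes down to adding one entry to a lookup that maps a library name to its adapter. The sketch below shows that pattern under stated assumptions: the real `get_generator_adapter` may have a different signature and internals, so the function here is deliberately named `get_generator_adapter_sketch`, and all classes are hypothetical.

```python
class AdapterSketch:
    """Hypothetical base class standing in for the abstractions in adapter.py."""

    library_name = "base"


class SdvAdapterSketch(AdapterSketch):
    """Hypothetical adapter for an already supported library."""

    library_name = "sdv"


class NewLibraryAdapterSketch(AdapterSketch):
    """Hypothetical adapter for the library being added."""

    library_name = "new_library"


# Hypothetical registry: maps a library name to its adapter class.
_ADAPTERS = {
    SdvAdapterSketch.library_name: SdvAdapterSketch,
    # Registering the new library is a one-line change here.
    NewLibraryAdapterSketch.library_name: NewLibraryAdapterSketch,
}


def get_generator_adapter_sketch(library_name):
    """Return an adapter instance for library_name, or raise ValueError."""
    try:
        adapter_class = _ADAPTERS[library_name]
    except KeyError:
        supported = ", ".join(sorted(_ADAPTERS))
        raise ValueError(
            f"Unknown library {library_name!r}. Supported: {supported}"
        ) from None
    return adapter_class()
```

Failing loudly on an unknown name, with the list of supported libraries in the message, makes a forgotten registration easy to diagnose.
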

- `evaluation.py`: This module contains components for gathering evaluation metrics of synthetic time series.
- `multivariate_metrics.py`: This module contains logic for evaluating synthetic time series on multivariate time series forecasting tasks, as proposed by Yoon et al.
- `univariate_metrics.py`: This module contains logic for evaluating synthetic time series on univariate time series forecasting tasks, using the AutoGluon library.
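
Forecasting-based evaluation can be pictured as "train on synthetic, test on real": fit a forecaster on the synthetic series, then measure its error on real series. The sketch below assumes that scheme and is a heavy simplification; it uses a univariate, one-coefficient forecaster in place of the models in the package, and the function names are hypothetical.

```python
def fit_ar1(sequences):
    """Least-squares slope a for x[t+1] ~= a * x[t], fitted on all sequences."""
    numerator = 0.0
    denominator = 0.0
    for sequence in sequences:
        for current, following in zip(sequence, sequence[1:]):
            numerator += current * following
            denominator += current * current
    return numerator / denominator if denominator else 0.0


def mean_absolute_error(coefficient, sequences):
    """Average one-step-ahead absolute error of the forecaster on sequences."""
    errors = [
        abs(following - coefficient * current)
        for sequence in sequences
        for current, following in zip(sequence, sequence[1:])
    ]
    return sum(errors) / len(errors)


# Toy data: the real series doubles at every step.
real = [[1.0, 2.0, 4.0, 8.0]]
good_synthetic = [[1.0, 2.0, 4.0], [2.0, 4.0, 8.0]]  # same dynamics as real
poor_synthetic = [[1.0, 1.0, 1.0]]                   # constant, unlike real

good_score = mean_absolute_error(fit_ar1(good_synthetic), real)
poor_score = mean_absolute_error(fit_ar1(poor_synthetic), real)
```

A lower error means a forecaster trained only on synthetic data still predicts real data well, which suggests the synthetic series capture the real dynamics.
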

- `cloud_trainer.py`: This module contains the logic for training synthetic time series generation models in the Google Cloud Platform. This is still a work in progress, so use it with care.
- `data_plots.py`: This module contains plotting functionality for multivariate time series.
- `data_utils.py`: This module contains data processing functionality that's common to many synthetic data generation algorithms.