Employing a Generative Adversarial Network (GAN) improves the accuracy and efficiency of solar energy system design, offering a valuable opportunity to enhance traditional data collection and simulation methods. The synthetic data output captures essential statistical properties of the real data, supporting its reliability and applicability.

fizzi01/Synthetic-Time-Series-Data

Addressing data scarcity in local photovoltaic datasets: a GAN-based workflow

This repository contains the code to execute the workflow proposed in the paper. The workflow generates synthetic photovoltaic (PV) energy output and validates the generated data.

Table of Contents

  1. Prerequisites
  2. Repository Structure
  3. Data Directory
  4. Running the Workflow
  5. Detailed Description
  6. Contact

Prerequisites

Before running the code, ensure you have the necessary dependencies installed. You can install them using the provided requirements.txt file.

pip install -r requirements.txt

Repository Structure

  • main.py: The main script to execute the entire workflow.
  • PvModel.py: Contains the PV system simulation model using the pvlib library. This module allows you to define the parameters of the specific PV system to be simulated.
  • DataProcessor.py: Handles the preprocessing and postprocessing of the data.
  • DataLoad.py: Loads the dataset from a CSV file and preprocesses it for training the model.
  • DataEvaluation.py: Evaluates the synthetic data generated by the model.
  • model/: Contains the trained DoppelGANger model and the postprocessing metadata.
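
The preprocessing and postprocessing handled by DataProcessor.py is not shown in this README. As a rough, dependency-free illustration of what a normalize and reverse-transform round trip can look like, the sketch below assumes min-max scaling of a single numeric column. The class name MinMaxColumn and the sample values are hypothetical, and the repository's actual scaling method may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MinMaxColumn:
    """Hypothetical helper: scales one numeric column to [0, 1] and back."""
    low: float
    high: float

    @classmethod
    def fit(cls, values: List[float]) -> "MinMaxColumn":
        # Remember the observed range so the scaling can be undone later.
        return cls(low=min(values), high=max(values))

    def transform(self, values: List[float]) -> List[float]:
        span = self.high - self.low
        # A constant column has zero span; map it to 0.0 to avoid dividing by zero.
        if span == 0:
            return [0.0 for _ in values]
        return [(v - self.low) / span for v in values]

    def reverse_transform(self, scaled: List[float]) -> List[float]:
        # Inverse of transform: stretch [0, 1] back to the original range.
        span = self.high - self.low
        return [s * span + self.low for s in scaled]


temperature = [12.0, 18.5, 25.0, 31.0]  # made-up values, for illustration only
scaler = MinMaxColumn.fit(temperature)
scaled = scaler.transform(temperature)
restored = scaler.reverse_transform(scaled)
```

Keeping the fitted range alongside the trained model is what makes it possible to map generated samples back to physical units later, which is presumably why the metadata is stored in model/.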

Data Directory

The data/ directory contains the datasets and data files used by the workflow, including the training data that main.py loads from data/training/open-meteo-Lecce.csv.

Running the Workflow

To run the entire workflow, execute the main.py script:

python main.py

This script will guide you through the steps of loading data, training the model, generating synthetic data, simulating the PV system, and validating the results.
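
Because main.py is interactive and asks whether to retrain, it can help to see how such a prompt might be interpreted. The sketch below is an assumption-laden illustration, not the script's actual logic: should_retrain is a hypothetical helper, and the fallback to retraining when no saved model exists is an addition not confirmed by the repository.

```python
import os


def should_retrain(answer: str, model_path: str) -> bool:
    """Hypothetical helper: decide whether to train a new model.

    Returns False only when the user answers 'n' AND a saved model
    exists at model_path. Any other answer, or a missing model,
    results in retraining (the missing-model check is an assumption).
    """
    # Same normalization as the prompt in main.py: lowercase, spaces removed.
    normalized = answer.lower().replace(" ", "")
    if normalized == "n" and os.path.exists(model_path):
        return False
    return True


if __name__ == "__main__":
    # "model/modelv4" is the model path used in main.py.
    print(should_retrain("n", "model/modelv4"))
```

Run from the repository root, this prints False only if model/modelv4 is present.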

Detailed Description

The main.py script orchestrates the entire workflow for generating synthetic photovoltaic (PV) energy data. Here is a detailed description of the script:

  1. Defining Constants: Constants such as location coordinates (Lecce, Italy), batch size, learning rate, latent dimension, epochs, sequence length, and sample length are defined.

    LOCATION = Location(40.3548, 18, 0, 0, 'Lecce')
    
    BATCH = 128
    LR = 0.001
    LATENT_DIM = 24
    EPOCHS = 850
    SEQUENCE_LENGTH = 24
    SAMPLE_LENGHT = 8
  2. Loading and Preprocessing Data: The DataLoad module is used to load the input meteorological data from a CSV file. The categorical and numerical columns are defined, and the data is prepared for training.

    loader = dl.Data(data_path="data/training/open-meteo-Lecce.csv")
  3. Setting Model and Training Parameters: The model parameters and training parameters for DoppelGANger are defined.

    model_args = ModelParameters(batch_size=BATCH, lr=LR, betas=(0.2, 0.9), latent_dim=LATENT_DIM, gp_lambda=2, pac=1)
    train_args = TrainParameters(epochs=EPOCHS, sequence_length=SEQUENCE_LENGTH, sample_length=SAMPLE_LENGHT, rounds=1, measurement_cols=numerical_cols)
  4. Training or Loading the Model: The script checks if the model needs to be retrained or if an existing trained model and its metadata can be loaded. If retraining is chosen, the data is prepared, and the DoppelGANger model is trained. The model and data processor are saved for future use.

    retrain = input('Retrain? (y/n) ').lower().replace(" ", "")
    model_path = "model/modelv4"
    processor_path = "model/metav4"
    
    if retrain == 'n':
        ...
    else:
        ...
  5. Generating Synthetic Data: Synthetic data is generated using the trained DoppelGANger model. The synthetic data is saved as a CSV file after reversing the normalization.

    synth_data = []
    
    ...
    synth_data = model_dop_gan.sample(n_samples=int(samples_n))
    ...
    
    synth_df = pd.concat(synth_data, axis=0)
    synth_df = processor.reverse_transform(synth_df)
  6. Evaluating Synthetic Data: The generated synthetic data is evaluated against the real data using metrics like MSE and RMSE. The evaluation results are saved as a CSV file.

    metrics_val = get_metrics(real_df=loader.data, synth_df=synth_df, params=processor.numerical_col)
  7. Generating PV Data: The synthetic data is used to simulate the PV system's performance using the PvModel module. The results are saved as a CSV file.

    module = pvm.PVModel(location=LOCATION)
    
    energy_sample = module.run_model(data, synth_df)
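
The MSE and RMSE metrics mentioned in step 6 can be sketched without any dependencies. This README does not describe how get_metrics aligns real and synthetic rows, so the example assumes two equally long columns compared element by element. The function names and the sample values are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def mse(real: Sequence[float], synth: Sequence[float]) -> float:
    """Mean squared error between two equally long numeric sequences."""
    if len(real) != len(synth) or len(real) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((r - s) ** 2 for r, s in zip(real, synth)) / len(real)


def column_metrics(
    real_cols: Dict[str, List[float]],
    synth_cols: Dict[str, List[float]],
    params: List[str],
) -> Dict[str, Dict[str, float]]:
    """Compute MSE and RMSE for each column named in params."""
    results = {}
    for name in params:
        error = mse(real_cols[name], synth_cols[name])
        results[name] = {"mse": error, "rmse": math.sqrt(error)}
    return results


# Made-up values, for illustration only.
real = {"temperature": [20.0, 22.0, 24.0]}
synth = {"temperature": [21.0, 22.0, 26.0]}
metrics = column_metrics(real, synth, params=["temperature"])
```

RMSE is reported in the same units as the underlying variable, which makes it easier to interpret than MSE when comparing meteorological columns with different scales.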

Contact

For any questions or issues, please contact the authors or open an issue in this repository.
