This repository contains the code to execute the workflow proposed in the paper. The workflow generates synthetic photovoltaic (PV) energy output data and validates the generated data.
Before running the code, ensure you have the necessary dependencies installed. You can install them using the provided requirements.txt file.
```
pip install -r requirements.txt
```

- `main.py`: The main script to execute the entire workflow.
- `PvModel.py`: Contains the PV system simulation model built on the `pvlib` library. This module allows you to define the parameters of the specific PV system to be simulated.
- `DataProcessor.py`: Handles the preprocessing and postprocessing of the data.
- `DataLoad.py`: Loads the dataset from a CSV file and preprocesses it for training the model.
- `DataEvaluation.py`: Evaluates the synthetic data generated by the model.
- `model/`: Contains the trained DoppelGANger model and postprocessing metadata.
The `data/` directory contains all the datasets and data files used by the workflow. This includes:
- Input datasets (`open-meteo-Lecce.csv`)
- Synthetic datasets (`ModelOutput.csv`, `ModelOutput_denormalized.csv`, `pv_output_data.csv`)
- Validation metrics (`metrics.csv`)
To run the entire workflow, execute the `main.py` script:

```
python main.py
```

This script will guide you through the steps of loading data, training the model, generating synthetic data, simulating the PV system, and validating the results.
The main.py script orchestrates the entire workflow for generating synthetic photovoltaic (PV) energy data. Here is a detailed description of the script:
- **Defining Constants**: Constants such as the location coordinates (Lecce, Italy), batch size, learning rate, latent dimension, number of epochs, sequence length, and sample length are defined.

  ```python
  LOCATION = Location(40.3548, 18, 0, 0, 'Lecce')
  BATCH = 128
  LR = 0.001
  LATENT_DIM = 24
  EPOCHS = 850
  SEQUENCE_LENGTH = 24
  SAMPLE_LENGHT = 8
  ```
- **Loading and Preprocessing Data**: The `DataLoad` module loads the input meteorological data from a CSV file. The categorical and numerical columns are defined, and the data is prepared for training.

  ```python
  loader = dl.Data(data_path="data/training/open-meteo-Lecce.csv")
  ```
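  The preprocessing performed by the data processor is not spelled out above. As an illustration only (not the repository's `DataProcessor` implementation, and with hypothetical column names), numerical measurement columns are typically min-max scaled to [0, 1] before GAN training:

  ```python
  import pandas as pd

  # Hypothetical numerical columns; the real names come from the CSV header.
  numerical_cols = ["temperature_2m", "shortwave_radiation"]

  df = pd.DataFrame({
      "temperature_2m": [10.0, 15.0, 20.0],
      "shortwave_radiation": [0.0, 250.0, 500.0],
  })

  # Min-max normalization to [0, 1]; the min/max values must be kept so the
  # transformation can be reversed on the generated data.
  mins = df[numerical_cols].min()
  maxs = df[numerical_cols].max()
  normalized = (df[numerical_cols] - mins) / (maxs - mins)
  ```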
- **Setting Model and Training Parameters**: The model and training parameters for DoppelGANger are defined.

  ```python
  model_args = ModelParameters(batch_size=BATCH, lr=LR, betas=(0.2, 0.9),
                               latent_dim=LATENT_DIM, gp_lambda=2, pac=1)
  train_args = TrainParameters(epochs=EPOCHS, sequence_length=SEQUENCE_LENGTH,
                               sample_length=SAMPLE_LENGHT, rounds=1,
                               measurement_cols=numerical_cols)
  ```
- **Training or Loading the Model**: The script asks whether the model should be retrained or whether an existing trained model and its metadata can be loaded. If retraining is chosen, the data is prepared and the DoppelGANger model is trained; the model and data processor are then saved for future use.

  ```python
  retrain = input('Retrain? (y/n) ').lower().replace(" ", "")
  model_path = "model/modelv4"
  processor_path = "model/metav4"
  if retrain == 'n':
      ...
  else:
      ...
  ```
- **Generating Synthetic Data**: Synthetic data is generated with the trained DoppelGANger model, the normalization is reversed, and the result is saved as a CSV file.

  ```python
  synth_data = []
  ...
  synth_data = model_dop_gan.sample(n_samples=int(samples_n))
  ...
  synth_df = pd.concat(synth_data, axis=0)
  synth_df = processor.reverse_transform(synth_df)
  ```
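  Conceptually, `reverse_transform` undoes the min-max scaling applied during preprocessing so that the synthetic values are back in physical units. A minimal sketch, assuming hypothetical per-column minima and maxima stored by the data processor during training:

  ```python
  import pandas as pd

  # Hypothetical per-column statistics saved by the data processor at training time.
  col_min = {"shortwave_radiation": 0.0}
  col_max = {"shortwave_radiation": 500.0}

  synth_df = pd.DataFrame({"shortwave_radiation": [0.0, 0.5, 1.0]})

  # Undo min-max scaling: x = x_norm * (max - min) + min
  for col in col_min:
      synth_df[col] = synth_df[col] * (col_max[col] - col_min[col]) + col_min[col]
  ```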
- **Evaluating Synthetic Data**: The generated synthetic data is evaluated against the real data using metrics such as MSE and RMSE. The evaluation results are saved as a CSV file.

  ```python
  metrics_val = get_metrics(real_df=loader.data, synth_df=synth_df,
                            params=processor.numerical_col)
  ```
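  For reference, MSE and RMSE between a real and a synthetic series can be computed as follows (illustrative values; the repository's `get_metrics` helper may compute additional statistics):

  ```python
  import numpy as np
  import pandas as pd

  real = pd.Series([100.0, 200.0, 300.0])
  synth = pd.Series([110.0, 190.0, 310.0])

  # Mean squared error and its square root.
  mse = float(np.mean((real - synth) ** 2))
  rmse = float(np.sqrt(mse))
  # mse = 100.0, rmse = 10.0
  ```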
- **Generating PV Data**: The synthetic meteorological data is fed to the `PvModel` module to simulate the PV system's performance. The results are saved as a CSV file.

  ```python
  module = pvm.PVModel(location=LOCATION)
  energy_sample = module.run_model(data, synth_df)
  ```
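  To give an intuition for this step: a PV model maps irradiance (plus other weather variables) to electrical output. The following is a deliberately simplified back-of-the-envelope estimate, not the `pvlib`-based `PvModel` used in the repository; the panel area and efficiency values are illustrative assumptions:

  ```python
  def pv_power_w(ghi_w_m2: float, area_m2: float = 10.0, efficiency: float = 0.2) -> float:
      """Rough instantaneous PV output in watts: irradiance * panel area * efficiency."""
      return ghi_w_m2 * area_m2 * efficiency

  # e.g. 500 W/m^2 on a 10 m^2 array at 20% efficiency -> 1000 W
  ```

  The actual `PvModel` accounts for solar position, module orientation, temperature effects, and inverter behavior via `pvlib`, which is why it needs the location object defined earlier.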
For any questions or issues, please contact the authors or open an issue in this repository.