This repository contains the code to execute the workflow proposed in the paper. The workflow generates synthetic photovoltaic (PV) energy output data and validates the generated data.
Before running the code, ensure you have the necessary dependencies installed. You can install them using the provided requirements.txt file.
```
pip install -r requirements.txt
```

- `main.py`: The main script to execute the entire workflow.
- `PvModel.py`: Contains the PV system simulation model built on the `pvlib` library. This module allows you to define the parameters of the specific PV system to be simulated.
- `DataProcessor.py`: Handles the preprocessing and postprocessing of the data.
- `DataLoad.py`: Loads the dataset from a CSV file and preprocesses it for training the model.
- `DataEvaluation.py`: Evaluates the synthetic data generated by the model.
- `model/`: Contains the trained DoppelGANger model and postprocessing metadata.
The `data/` directory contains all the datasets and data files used by the workflow. This includes:
- Input datasets (`open-meteo-Lecce.csv`)
- Synthetic datasets (`ModelOutput.csv`, `ModelOutput_denormalized.csv`, `pv_output_data.csv`)
- Validation metrics (`metrics.csv`)
To run the entire workflow, execute the `main.py` script:

```
python main.py
```

This script will guide you through the steps of loading data, training the model, generating synthetic data, simulating the PV system, and validating the results.
The main.py script orchestrates the entire workflow for generating synthetic photovoltaic (PV) energy data. Here is a detailed description of the script:
- **Defining Constants**: Constants such as the location coordinates (Lecce, Italy), batch size, learning rate, latent dimension, number of epochs, sequence length, and sample length are defined.

  ```python
  LOCATION = Location(40.3548, 18, 0, 0, 'Lecce')
  BATCH = 128
  LR = 0.001
  LATENT_DIM = 24
  EPOCHS = 850
  SEQUENCE_LENGTH = 24
  SAMPLE_LENGHT = 8
  ```
- **Loading and Preprocessing Data**: The `DataLoad` module loads the input meteorological data from a CSV file. The categorical and numerical columns are defined, and the data is prepared for training.

  ```python
  loader = dl.Data(data_path="data/training/open-meteo-Lecce.csv")
  ```
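  The preprocessing performed by the data processor is not spelled out above. As an illustration only (not the repository's `DataProcessor` implementation, and with hypothetical column names), numerical measurement columns are typically min-max scaled to [0, 1] before GAN training:

  ```python
  import pandas as pd

  # Hypothetical numerical columns; the real names come from the CSV header.
  numerical_cols = ["temperature_2m", "shortwave_radiation"]

  df = pd.DataFrame({
      "temperature_2m": [10.0, 15.0, 20.0],
      "shortwave_radiation": [0.0, 250.0, 500.0],
  })

  # Min-max normalization to [0, 1]; the min/max values must be kept so the
  # transformation can be reversed on the generated data.
  mins = df[numerical_cols].min()
  maxs = df[numerical_cols].max()
  normalized = (df[numerical_cols] - mins) / (maxs - mins)
  ```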
- **Setting Model and Training Parameters**: The model and training parameters for DoppelGANger are defined.

  ```python
  model_args = ModelParameters(batch_size=BATCH, lr=LR, betas=(0.2, 0.9),
                               latent_dim=LATENT_DIM, gp_lambda=2, pac=1)
  train_args = TrainParameters(epochs=EPOCHS, sequence_length=SEQUENCE_LENGTH,
                               sample_length=SAMPLE_LENGHT, rounds=1,
                               measurement_cols=numerical_cols)
  ```
- **Training or Loading the Model**: The script asks whether the model should be retrained or whether an existing trained model and its metadata can be loaded. If retraining is chosen, the data is prepared and the DoppelGANger model is trained; the model and data processor are then saved for future use.

  ```python
  retrain = input('Retrain? (y/n) ').lower().replace(" ", "")
  model_path = "model/modelv4"
  processor_path = "model/metav4"
  if retrain == 'n':
      ...
  else:
      ...
  ```
- **Generating Synthetic Data**: Synthetic data is generated with the trained DoppelGANger model, the normalization is reversed, and the result is saved as a CSV file.

  ```python
  synth_data = []
  ...
  synth_data = model_dop_gan.sample(n_samples=int(samples_n))
  ...
  synth_df = pd.concat(synth_data, axis=0)
  synth_df = processor.reverse_transform(synth_df)
  ```
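  Conceptually, `reverse_transform` undoes the min-max scaling applied during preprocessing so that the synthetic values are back in physical units. A minimal sketch, assuming hypothetical per-column minima and maxima stored by the data processor during training:

  ```python
  import pandas as pd

  # Hypothetical per-column statistics saved by the data processor at training time.
  col_min = {"shortwave_radiation": 0.0}
  col_max = {"shortwave_radiation": 500.0}

  synth_df = pd.DataFrame({"shortwave_radiation": [0.0, 0.5, 1.0]})

  # Undo min-max scaling: x = x_norm * (max - min) + min
  for col in col_min:
      synth_df[col] = synth_df[col] * (col_max[col] - col_min[col]) + col_min[col]
  ```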
- **Evaluating Synthetic Data**: The generated synthetic data is evaluated against the real data using metrics such as MSE and RMSE. The evaluation results are saved as a CSV file.

  ```python
  metrics_val = get_metrics(real_df=loader.data, synth_df=synth_df,
                            params=processor.numerical_col)
  ```
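  For reference, MSE and RMSE between a real and a synthetic series can be computed as follows (illustrative values; the repository's `get_metrics` helper may compute additional statistics):

  ```python
  import numpy as np
  import pandas as pd

  real = pd.Series([100.0, 200.0, 300.0])
  synth = pd.Series([110.0, 190.0, 310.0])

  # Mean squared error and its square root.
  mse = float(np.mean((real - synth) ** 2))
  rmse = float(np.sqrt(mse))
  # mse = 100.0, rmse = 10.0
  ```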
- **Generating PV Data**: The synthetic meteorological data is fed to the `PvModel` module to simulate the PV system's performance. The results are saved as a CSV file.

  ```python
  module = pvm.PVModel(location=LOCATION)
  energy_sample = module.run_model(data, synth_df)
  ```
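  To give an intuition for this step: a PV model maps irradiance (plus other weather variables) to electrical output. The following is a deliberately simplified back-of-the-envelope estimate, not the `pvlib`-based `PvModel` used in the repository; the panel area and efficiency values are illustrative assumptions:

  ```python
  def pv_power_w(ghi_w_m2: float, area_m2: float = 10.0, efficiency: float = 0.2) -> float:
      """Rough instantaneous PV output in watts: irradiance * panel area * efficiency."""
      return ghi_w_m2 * area_m2 * efficiency

  # e.g. 500 W/m^2 on a 10 m^2 array at 20% efficiency -> 1000 W
  ```

  The actual `PvModel` accounts for solar position, module orientation, temperature effects, and inverter behavior via `pvlib`, which is why it needs the location object defined earlier.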
For any questions or issues, please contact the authors or open an issue in this repository.