PRAIM

This is the official repository for our paper "A Unified Variational Imputation Framework for Electric Vehicle Charging Data Using Retrieval-Augmented Language Model", published in IEEE Transactions on Smart Grid.

Methodology Description

Dataset Preparation

Our study uses four datasets from Boulder (USA), Palo Alto (USA), Dundee (UK), and Perth (UK), whose EV charging data are publicly available at the corresponding links. The raw data take the form of discrete charging sessions. We have aggregated the session-level data to daily resolution and stored the results in the folders data_Boulder, data_PaloAlto, data_Dundee, and data_Perth, respectively.
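The session-to-daily aggregation described above can be sketched as follows. This is a minimal illustration only: the tuple layout (station_id, start time, energy) and the function name aggregate_sessions_daily are assumptions made for this example, not the schema of the raw datasets or the preprocessing code used in this repository.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_sessions_daily(sessions):
    """Sum session energy per (station_id, calendar day).

    `sessions` is an iterable of (station_id, start_time_iso, energy)
    tuples. This layout is hypothetical, not the raw datasets' schema.
    """
    daily = defaultdict(float)
    for station_id, start_time_iso, energy in sessions:
        # Assign each session to the calendar day on which it starts.
        day = datetime.fromisoformat(start_time_iso).date()
        daily[(station_id, day)] += energy
    return dict(daily)


# Toy sessions, invented for illustration.
sessions = [
    ("S1", "2021-03-01T08:15:00", 12.5),
    ("S1", "2021-03-01T18:40:00", 7.5),
    ("S1", "2021-03-02T09:00:00", 10.0),
    ("S2", "2021-03-01T12:30:00", 20.0),
]
daily_demand = aggregate_sessions_daily(sessions)
```

In this sketch, a day without any session simply has no key in the result. Whether such a day counts as missing or as zero demand is a preprocessing decision that the sketch does not make.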

In addition to the daily EV charging demand data, each of these folders contains two files:

distance_matrix.npy: The geographical distance between each pair of charging stations, used to construct the distance-based hypergraph.
station_lon_lat.pkl: Each station's longitude and latitude, used to retrieve geospatial point-of-interest (PoI) information.
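To illustrate how a pairwise distance matrix can yield a hypergraph, the sketch below builds one hyperedge per station from its k nearest neighbours and derives the incidence matrix. The k-nearest-neighbour rule, the function names, and the toy matrix are assumptions made for this example. The construction actually used by PRAIM is defined in the repository code and the paper and may differ.

```python
def knn_hyperedges(distance_matrix, k):
    """One hyperedge per station: the station plus its k nearest neighbours."""
    n = len(distance_matrix)
    hyperedges = []
    for i in range(n):
        # Rank every other station by its distance to station i.
        neighbours = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: distance_matrix[i][j],
        )
        hyperedges.append(sorted([i] + neighbours[:k]))
    return hyperedges


def incidence_matrix(hyperedges, n):
    """H[v][e] = 1 if station v belongs to hyperedge e, else 0."""
    return [[1 if v in edge else 0 for edge in hyperedges] for v in range(n)]


# Toy symmetric distance matrix (arbitrary units) for four stations.
# With the real data, the matrix would come from distance_matrix.npy,
# for example via numpy.load.
D = [
    [0.0, 1.0, 4.0, 9.0],
    [1.0, 0.0, 2.0, 8.0],
    [4.0, 2.0, 0.0, 3.0],
    [9.0, 8.0, 3.0, 0.0],
]
edges = knn_hyperedges(D, k=2)
H = incidence_matrix(edges, n=len(D))
```

Each column of H is one hyperedge, so a station can belong to several hyperedges at once, which is what distinguishes a hypergraph from an ordinary pairwise graph.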

DataPoint Generation

For each time step, we create an instance of the class EVChargingDataPoint (defined in utils.py), which holds station_id, history (the historical EV charging demand sequence), missing_mask (a Boolean sequence marking the days that have no charging demand record), calendar_info, station_info (which contains PoI information), and embedding.
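The fields listed above map naturally onto a small container class. The sketch below is an illustrative stand-in, not the class from utils.py: the field names come from this README, while the types, the defaults, and the length check are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class EVChargingDataPointSketch:
    """Illustrative stand-in for EVChargingDataPoint; types are assumed."""

    station_id: str
    history: List[float]           # daily charging demand sequence
    missing_mask: List[bool]       # True where a day has no demand record
    calendar_info: Dict[str, object] = field(default_factory=dict)
    station_info: Dict[str, object] = field(default_factory=dict)  # PoI info
    embedding: Optional[List[float]] = None  # filled in by a later step

    def __post_init__(self):
        # Assumed invariant: one mask entry per day of history.
        if len(self.history) != len(self.missing_mask):
            raise ValueError("history and missing_mask must have equal length")


point = EVChargingDataPointSketch(
    station_id="S1",
    history=[20.0, 0.0, 10.0],
    missing_mask=[False, True, False],
    calendar_info={"weekday": "Monday"},
    station_info={"restaurants": 3},
)
```

Leaving embedding empty at construction mirrors the two-stage generation described in this section, where the embedding is produced separately from the other fields.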

The code for generating each DataPoint (except its embedding) can be found in generate_datapoint_except_embedding.py.

To generate an embedding that encodes all relevant information for each DataPoint, generate_datapoint_embedding.py uses LLMEmbedder (defined in llm_embedder.py) to produce the embedding with a pretrained LLM.
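Before a language model can embed a DataPoint, its fields have to be turned into text. The sketch below shows one way this serialization might look. The function datapoint_to_text and the prompt wording are hypothetical, and the actual input format and interface of LLMEmbedder are defined in llm_embedder.py and are not reproduced here.

```python
def datapoint_to_text(station_id, history, missing_mask, calendar_info, station_info):
    """Serialize DataPoint fields into one prompt string (hypothetical format)."""
    # Replace values on missing days with the word "missing".
    values = ", ".join(
        "missing" if is_missing else f"{value:.1f}"
        for value, is_missing in zip(history, missing_mask)
    )
    calendar = "; ".join(f"{key}: {val}" for key, val in calendar_info.items())
    pois = "; ".join(f"{key}: {val}" for key, val in station_info.items())
    return (
        f"Station {station_id}. "
        f"Daily charging demand: {values}. "
        f"Calendar: {calendar}. "
        f"Nearby points of interest: {pois}."
    )


text = datapoint_to_text(
    station_id="S1",
    history=[20.0, 0.0, 10.0],
    missing_mask=[False, True, False],
    calendar_info={"weekday": "Monday", "holiday": False},
    station_info={"restaurants": 3, "parking": 1},
)
```

The resulting string would then be passed to a pretrained language model, and the returned vector would be stored in the DataPoint's embedding field.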

Neural Architecture

The neural architecture can be found in neural_net.py.

Training and Evaluation

The training and evaluation code can be found in train_decoder.py.

Baselines

The baselines can be found in run_baseline.py.

Model Checkpoints

PRAIM's model parameters trained under a mask ratio of 0.2 are released in the folders outputs/Boulder, outputs/PaloAlto, outputs/Dundee, and outputs/Perth. Each folder includes several checkpoints and the best model parameters.
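A mask ratio of 0.2 can be read as hiding 20% of the observed values and scoring the imputation on exactly those hidden positions. The sketch below illustrates that reading under stated assumptions: the functions mask_observed and masked_mae are hypothetical, the masking is uniformly random over observed days, and the mean fill is only a placeholder imputer, not PRAIM. The actual masking and evaluation logic lives in the repository's training code.

```python
import random


def mask_observed(history, missing_mask, mask_ratio, seed=0):
    """Hide a fraction of the observed days; return (masked series, hidden indices)."""
    rng = random.Random(seed)
    observed = [i for i, is_missing in enumerate(missing_mask) if not is_missing]
    n_hide = round(mask_ratio * len(observed))
    hidden = set(rng.sample(observed, n_hide))
    # None marks both originally missing days and artificially hidden days.
    masked = [
        None if (missing_mask[i] or i in hidden) else value
        for i, value in enumerate(history)
    ]
    return masked, sorted(hidden)


def masked_mae(truth, imputed, hidden):
    """Mean absolute error over the artificially hidden positions only."""
    if not hidden:
        raise ValueError("no hidden positions to evaluate")
    return sum(abs(truth[i] - imputed[i]) for i in hidden) / len(hidden)


# Toy series: 12 days, 2 of them originally missing (values there are unused).
history = [5.0, 7.0, 0.0, 6.0, 8.0, 9.0, 0.0, 4.0, 5.0, 6.0, 7.0, 8.0]
missing_mask = [False, False, True, False, False, False,
                True, False, False, False, False, False]

masked, hidden = mask_observed(history, missing_mask, mask_ratio=0.2)

# Placeholder imputer: fill every gap with the mean of the visible values.
visible = [v for v in masked if v is not None]
fill = sum(visible) / len(visible)
imputed = [fill if v is None else v for v in masked]

mae = masked_mae(history, imputed, hidden)
```

Scoring only the hidden positions matters because the originally missing days have no ground truth to compare against.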

Result Visualization

The training and evaluation outputs of all baselines, as well as those of PRAIM, are saved in the outputs folder. The corresponding MAE for imputation is shown below.

Downstream Forecasting Task

The code that uses the imputed data for the downstream forecasting task is in impute_for_downstream_forecast_res_gen.py, and the corresponding plotting code is in impute_for_downstream_forecast_plot.py. The results are saved in the res_downstream_forecast folder. The relative performance improvement with imputed data is shown below.

Acknowledgements

This work was supported in part by the Australian Research Council (ARC) Discovery Early Career Researcher Award (DECRA) under Grant DE230100046.

Citation

@ARTICLE{li2026praim,
  author={Li, Jinhao and Wang, Hao},
  journal={IEEE Transactions on Smart Grid}, 
  title={A Unified Variational Imputation Framework for Electric Vehicle Charging Data Using Retrieval-Augmented Language Model}, 
  year={2026},
  volume={TBD},
  number={TBD},
  pages={TBD-TBD},
  doi={TBD}
}

License

The released dataset is made available under the Open Database License. Any rights in individual contents of the database are licensed under the Database Contents License.

Contact

Feel free to contact:

Jinhao Li

Monash University, Faculty of IT, Department of Data Science and AI.

Email: jinhao.li@monash.edu or steplee175@gmail.com

