Toward Graph Data Collaboration in a Data-Sharing-Free Manner: A Novel Privacy-Preserving Graph Pre-training Model
This archive is distributed in association with the INFORMS Journal on Computing under the MIT License.
The software and data in this repository are a snapshot of the software and data that were used in the research reported on in the paper Toward Graph Data Collaboration in a Data-Sharing-Free Manner: A Novel Privacy-Preserving Graph Pre-training Model by Jiarong Xu, Jiaan Wang, Zenan Zhou and Tian Lu.
To cite the contents of this repository, please cite both the paper and this repo, using their respective DOIs.
https://doi.org/10.1287/ijoc.2023.0115
https://doi.org/10.1287/ijoc.2023.0115.cd
Below is the BibTex for citing this snapshot of the repository.
@misc{Jiarong2025Toward,
author = {Xu, Jiarong and Wang, Jiaan and Zhou, Zenan and Lu, Tian},
publisher = {INFORMS Journal on Computing},
title = {{Toward Graph Data Collaboration in a Data-Sharing-Free Manner: A Novel Privacy-Preserving Graph Pre-training Model}},
year = {2025},
doi = {10.1287/ijoc.2023.0115.cd},
url = {https://github.com/INFORMSJoC/2023.0115},
note = {Available for download at https://github.com/INFORMSJoC/2023.0115},
}
We use eight datasets in the experiments: Deezer, Facebook, LastFM, DBLP, Amazon, Twitter, Twitter-Foursquare and Phone-Email.
Among them, we provide the data files of Deezer, Facebook, LastFM, Twitter-Foursquare and Phone-Email in the data/
folder. There datasets are all open-source datasets, we also list their original sources:
- Deezer: https://snap.stanford.edu/data/feather-deezer-social.html
- Facebook: https://snap.stanford.edu/data/facebook-large-page-page-network.html
- LastFM: https://snap.stanford.edu/data/feather-lastfm-social.html
- Twitter-Foursquare: https://github.com/zhichenz98/PARROT-WWW23/tree/master/datasets
- Phone-Email:https://github.com/zhichenz98/PARROT-WWW23/tree/master/datasets
For DBLP, Amazon and Twitter, due to the large datasize, we only list their original sources:
- DBLP: https://snap.stanford.edu/data/com-DBLP.html
- Amazon: https://snap.stanford.edu/data/amazon-meta.html
- Twitter: https://snap.stanford.edu/data/ego-Twitter.html
You should install the following packages in the environment:
- python >= 3.8
- torch >= 1.11.0
- dgl==0.4.3
- torch_geometric==2.0.4 scikit-learn==0.20.3 scipy==1.4.1 coverage==4.5.4 coveralls==1.9.2 black==19.3b0 pytest==5.3.2 networkx==2.3 numpy==1.18.2 tensorboard_logger==0.1.0
The data processing script is provided in scripts/generate_data.py
. For each dataset, we use this script to generate the pretraining and downstream graphs. The run command is shown as follows:
python -u scripts/generate_data.py --dataset [dateset_name]
The structure of GIN network and graph encoder are provided in scripts/graph_encoder_edge_weighted.py
(class GraphEncoder_Edge_Weighted).
To run link prediction task:
bash scripts/link_prediction <gpu> <load_path> <hidden_size> <dowstream_dataset> <pretraining_dataset>
Tu run node classification task:
bash scripts/node_classification.sh <gpu> <load_path> <hidden_size> <downstream_dataset>
- For the implementation of GAL-W and GAL-TV, please refer to GAL.
- For the implementation of EdgeRand and LapGraph, please refer to LinkTeller.
- For the implementation of GCC, please refer to GCC
The implementation of GIN architecture (scripts/gin_edge_weighted.py
) borrows from GCC, and we comply with the corresponding MIT LICENSE in our repository.