
Toward Graph Data Collaboration in a Data-Sharing-Free Manner: A Novel Privacy-Preserving Graph Pre-training Model

This archive is distributed in association with the INFORMS Journal on Computing under the MIT License.

The software and data in this repository are a snapshot of the software and data that were used in the research reported on in the paper Toward Graph Data Collaboration in a Data-Sharing-Free Manner: A Novel Privacy-Preserving Graph Pre-training Model by Jiarong Xu, Jiaan Wang, Zenan Zhou and Tian Lu.

Cite

To cite the contents of this repository, please cite both the paper and this repo, using their respective DOIs.

https://doi.org/10.1287/ijoc.2023.0115

https://doi.org/10.1287/ijoc.2023.0115.cd

Below is the BibTeX for citing this snapshot of the repository.

@misc{Jiarong2025Toward,
  author =        {Xu, Jiarong and Wang, Jiaan and Zhou, Zenan and Lu, Tian},
  publisher =     {INFORMS Journal on Computing},
  title =         {{Toward Graph Data Collaboration in a Data-Sharing-Free Manner: A Novel Privacy-Preserving Graph Pre-training Model}},
  year =          {2025},
  doi =           {10.1287/ijoc.2023.0115.cd},
  url =           {https://github.com/INFORMSJoC/2023.0115},
  note =          {Available for download at https://github.com/INFORMSJoC/2023.0115},
}  

Data

We use eight datasets in the experiments: Deezer, Facebook, LastFM, DBLP, Amazon, Twitter, Twitter-Foursquare and Phone-Email.

Among them, we provide the data files of Deezer, Facebook, LastFM, Twitter-Foursquare, and Phone-Email in the data/ folder. These are all open-source datasets; we also list their original sources:

For DBLP, Amazon, and Twitter, due to their large size, we only list their original sources:

Replication

Environment

Install the following packages in your environment (a sample installation sketch follows the list):

  • python >= 3.8
  • torch >= 1.11.0
  • dgl==0.4.3
  • torch_geometric==2.0.4
  • scikit-learn==0.20.3
  • scipy==1.4.1
  • coverage==4.5.4
  • coveralls==1.9.2
  • black==19.3b0
  • pytest==5.3.2
  • networkx==2.3
  • numpy==1.18.2
  • tensorboard_logger==0.1.0
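
As a minimal installation sketch (assuming pip; PyTorch is installed first so that dgl and torch_geometric can build against it, and note that torch_geometric 2.0.x additionally needs the torch-scatter/torch-sparse wheels matching your torch and CUDA versions):

pip install "torch>=1.11.0"
pip install dgl==0.4.3 torch_geometric==2.0.4
pip install scikit-learn==0.20.3 scipy==1.4.1 networkx==2.3 numpy==1.18.2
pip install coverage==4.5.4 coveralls==1.9.2 black==19.3b0 pytest==5.3.2 tensorboard_logger==0.1.0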

Scripts

Data Processing

The data processing script is provided in scripts/generate_data.py. For each dataset, we use this script to generate the pretraining and downstream graphs. Run it as follows:

python -u scripts/generate_data.py --dataset [dataset_name]
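
For example, to generate the graphs for Deezer (assuming the script accepts lowercase dataset names; the accepted names are defined in the script itself):

python -u scripts/generate_data.py --dataset deezer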

Model Structure

The structures of the GIN network and the graph encoder are provided in scripts/graph_encoder_edge_weighted.py (class GraphEncoder_Edge_Weighted).
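
For illustration, below is a minimal, self-contained sketch of edge-weighted GIN message passing in pure PyTorch. This is not the repository's GraphEncoder_Edge_Weighted implementation (which builds on DGL and the GCC codebase); the class name EdgeWeightedGINLayer and all shapes here are illustrative assumptions.

# A minimal sketch of edge-weighted GIN message passing in pure PyTorch.
# NOT the repository's implementation; names and shapes are hypothetical.
import torch
import torch.nn as nn


class EdgeWeightedGINLayer(nn.Module):
    """One GIN layer whose neighbor aggregation is scaled by edge weights:
    h_v' = MLP((1 + eps) * h_v + sum_{u in N(v)} w_uv * h_u)."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.eps = nn.Parameter(torch.zeros(1))  # learnable epsilon, as in GIN
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, out_dim), nn.ReLU(), nn.Linear(out_dim, out_dim)
        )

    def forward(self, h, edge_index, edge_weight):
        # h: (N, in_dim) node features
        # edge_index: (2, E) source/target node ids
        # edge_weight: (E,) nonnegative weights, one per edge
        src, dst = edge_index
        msg = h[src] * edge_weight.unsqueeze(-1)  # weight each message
        agg = torch.zeros_like(h)
        agg.index_add_(0, dst, msg)               # sum messages per target node
        return self.mlp((1 + self.eps) * h + agg)


# Toy usage: 3 nodes, 2 weighted edges (0 -> 1 and 2 -> 1).
h = torch.randn(3, 8)
edge_index = torch.tensor([[0, 2], [1, 1]])
edge_weight = torch.tensor([0.5, 1.0])
out = EdgeWeightedGINLayer(8, 16)(h, edge_index, edge_weight)
print(out.shape)  # torch.Size([3, 16])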

Downstream Tasks

To run the link prediction task:

bash scripts/link_prediction.sh <gpu> <load_path> <hidden_size> <downstream_dataset> <pretraining_dataset>
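
For example, with hypothetical argument values (GPU 0, an assumed checkpoint path saved/pretrain.pth, hidden size 64, LastFM as the downstream dataset, and Deezer as the pretraining dataset):

bash scripts/link_prediction.sh 0 saved/pretrain.pth 64 lastfm deezer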

To run the node classification task:

bash scripts/node_classification.sh <gpu> <load_path> <hidden_size> <downstream_dataset>
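
For example, again with hypothetical argument values:

bash scripts/node_classification.sh 0 saved/pretrain.pth 64 facebook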

Baselines

  • For the implementation of GAL-W and GAL-TV, please refer to GAL.
  • For the implementation of EdgeRand and LapGraph, please refer to LinkTeller.
  • For the implementation of GCC, please refer to GCC.

Acknowledgements

The implementation of the GIN architecture (scripts/gin_edge_weighted.py) borrows from GCC, and we comply with the corresponding MIT License in our repository.