Graph Inference Data Valuation Framework (SVGL)

Official implementation of "Shapley-Guided Utility Learning for Effective Graph Inference Data Valuation".

Overview

SVGL is a novel framework for quantifying the importance of test-time neighbors in Graph Neural Networks (GNNs). It addresses the challenge of evaluating data importance without test labels by:

Transferable Feature Extraction: Combining data-specific and model-specific features to approximate test accuracy
Shapley-guided Optimization: Directly optimizing Shapley value prediction through feature Shapley decomposition
Structure-Aware Valuation: Respecting graph connectivity constraints in value estimation
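The Shapley-based idea can be illustrated with a small, self-contained Monte Carlo sketch. It is not the SVGL implementation, and every name in it is hypothetical. The utility is a toy weighted sum of two made-up coalition features. `connected_permutation` is one simple way to sample orderings that respect connectivity: each newly added node must touch a root (for example, a test node) or a node already added. It does not sample uniformly from all such orderings. The last lines illustrate the linearity of Shapley values. Because of this property, a utility written as a weighted sum of features can be valued feature by feature, which is the basis of a feature-level Shapley decomposition.

```python
import random
from typing import Callable, Dict, FrozenSet, Hashable, List, Set

Node = Hashable
Graph = Dict[Node, Set[Node]]
Utility = Callable[[FrozenSet[Node]], float]


def connected_permutation(graph: Graph, roots: Set[Node], rng: random.Random) -> List[Node]:
    """Order the non-root nodes reachable from `roots` so that every node is
    adjacent to a root or to a node placed earlier in the order."""
    visited = set(roots)
    frontier = {v for r in roots for v in graph[r]} - visited
    order: List[Node] = []
    while frontier:
        # Sort by repr so the choice depends only on the seed, not on set order.
        v = rng.choice(sorted(frontier, key=repr))
        order.append(v)
        visited.add(v)
        frontier.discard(v)
        frontier |= graph[v] - visited
    return order


def mc_shapley(graph: Graph, roots: Set[Node], utility: Utility,
               num_perms: int = 200, seed: int = 0) -> Dict[Node, float]:
    """Average each node's marginal utility gain over sampled permutations."""
    rng = random.Random(seed)
    totals: Dict[Node, float] = {}
    for _ in range(num_perms):
        coalition: Set[Node] = set()
        prev = utility(frozenset(coalition))
        for v in connected_permutation(graph, roots, rng):
            coalition.add(v)
            cur = utility(frozenset(coalition))
            totals[v] = totals.get(v, 0.0) + (cur - prev)
            prev = cur
    return {v: t / num_perms for v, t in totals.items()}


# Toy graph: "t" is the root (test node); "c" is reachable only through "a".
graph: Graph = {"t": {"a", "b"}, "a": {"t", "c"}, "b": {"t"}, "c": {"a"}}


def f1(s: FrozenSet[Node]) -> float:  # hypothetical feature: coalition size
    return float(len(s))


def f2(s: FrozenSet[Node]) -> float:  # hypothetical feature: contains node "c"
    return 1.0 if "c" in s else 0.0


w1, w2 = 0.3, 0.7


def utility(s: FrozenSet[Node]) -> float:  # weighted sum of the two features
    return w1 * f1(s) + w2 * f2(s)


# The same seed gives the same permutations, so linearity holds sample by sample.
phi_u = mc_shapley(graph, {"t"}, utility, num_perms=100)
phi_1 = mc_shapley(graph, {"t"}, f1, num_perms=100)
phi_2 = mc_shapley(graph, {"t"}, f2, num_perms=100)
for node in sorted(phi_u):
    print(node, round(phi_u[node], 6), round(w1 * phi_1[node] + w2 * phi_2[node], 6))
```

Reusing the seed keeps the comparison exact up to floating-point error. With independent samples, the identity would hold only in expectation.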

Installation

# Clone the repository
git clone https://github.com/frankhlchi/Inference-Graph-Data-Valuation.git
cd Inference-Graph-Data-Valuation

# Create and activate a conda environment (recommended)
conda create -n svgl python=3.9
conda activate svgl

# Install dependencies
pip install -r requirements.txt

# Install SVGL in development mode
pip install -e .
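A quick way to confirm the installation is to check that the key packages can be found. This is a hedged sketch, not part of the repository. The package names `torch` and `torch_geometric` are assumptions about what requirements.txt installs, and the Python 3.9 minimum matches the conda command above.

```python
import importlib.util
import sys
from typing import Dict, Iterable, Tuple


def check_environment(packages: Iterable[str],
                      min_python: Tuple[int, int] = (3, 9)) -> Dict[str, bool]:
    """Report whether the interpreter is new enough and each top-level package is importable."""
    report = {f"python>={min_python[0]}.{min_python[1]}": sys.version_info[:2] >= min_python}
    for name in packages:
        # find_spec locates a top-level module without importing it.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    # Assumed package list; adjust to match requirements.txt.
    for name, ok in check_environment(["torch", "torch_geometric", "svgl"]).items():
        print(f"{name:18s} {'ok' if ok else 'MISSING'}")
```

Run it after `pip install -e .`. Any entry reported as MISSING points to a dependency that did not install.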

Quick Start

Run Demo (Recommended)

# Quick demo on Cora dataset (takes ~2 minutes)
python scripts/run_demo.py --dataset Cora --num_samples 5

# Full experiment with more samples
python scripts/run_demo.py --dataset Cora --num_samples 30

Large-Scale Reproduction

# Run multiple datasets / seeds (writes to outputs/reproduction/<timestamp>/)
python scripts/run_parallel_experiments.py \
  --datasets Cora Citeseer Pubmed CS Physics Computers Photo WikiCS \
  --seeds 0 1 2 --num_samples 30 --device cuda:0 --jobs 1
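Because each reproduction run writes to its own outputs/reproduction/<timestamp>/ directory, the evaluation step below needs that path. The following hedged helper finds the most recent run. It is not part of the repository. It uses the directory's modification time instead of parsing the <timestamp> name, because the exact timestamp format is not documented here. Note that a directory's modification time also changes when new entries (such as node_dropping/) are added directly inside it.

```python
from pathlib import Path
from typing import Optional, Union


def latest_run_dir(base: Union[str, Path] = "outputs/reproduction") -> Optional[Path]:
    """Return the most recently modified run directory under `base`, or None."""
    base_path = Path(base)
    if not base_path.is_dir():
        return None
    runs = [p for p in base_path.iterdir() if p.is_dir()]
    if not runs:
        return None
    return max(runs, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    run = latest_run_dir()
    print(run if run is not None else "no runs found")
```

The printed path can be passed as --run_dir to scripts/eval_node_dropping.py.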

Node-Dropping Evaluation (AUC, Paper Metric)

After run_parallel_experiments.py finishes (or for any completed subset of runs), compute the node-dropping AUC from the saved test_samples/ directories and the per-run result.json files:

python scripts/eval_node_dropping.py \
  --run_dir outputs/reproduction/<timestamp> \
  --datasets Cora Citeseer Pubmed \
  --seeds 0 1 2 --device cuda:0 --stride 1

This writes per-seed curves and an aggregated summary to: outputs/reproduction/<timestamp>/node_dropping/SUMMARY.md.
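The idea behind a node-dropping curve can be shown with a small sketch. It is not the code in svgl/evaluation. In this sketch, nodes are dropped in decreasing order of estimated value, and a metric (such as accuracy) is recorded after each step. A good valuation removes the most useful nodes first, so the metric falls quickly and the area under the curve is smaller. The `evaluate` callback is hypothetical. Reading `stride` as "nodes dropped per step" is an assumption based on the --stride flag name. The repository defines its own normalization and which nodes may be dropped.

```python
from typing import Callable, Dict, FrozenSet, Hashable, List

Node = Hashable


def node_dropping_curve(values: Dict[Node, float],
                        evaluate: Callable[[FrozenSet[Node]], float],
                        stride: int = 1) -> List[float]:
    """Metric after dropping nodes in decreasing value order; first point drops nothing."""
    order = sorted(values, key=values.get, reverse=True)
    dropped: set = set()
    curve = [evaluate(frozenset(dropped))]
    for i in range(0, len(order), stride):
        dropped.update(order[i:i + stride])
        curve.append(evaluate(frozenset(dropped)))
    return curve


def curve_auc(curve: List[float]) -> float:
    """Trapezoidal area under the curve, with the x-axis normalized to [0, 1]."""
    if len(curve) < 2:
        return float(curve[0]) if curve else 0.0
    step = 1.0 / (len(curve) - 1)
    return sum((a + b) * step / 2 for a, b in zip(curve, curve[1:]))


# Toy check: "accuracy" is 1 minus the true importance of the dropped nodes.
importance = {"a": 0.5, "b": 0.3, "c": 0.2}


def accuracy(dropped: FrozenSet[Node]) -> float:
    return 1.0 - sum(importance[n] for n in dropped)


good = curve_auc(node_dropping_curve(importance, accuracy))
reversed_values = {n: -v for n, v in importance.items()}
bad = curve_auc(node_dropping_curve(reversed_values, accuracy))
print(round(good, 6), round(bad, 6))  # the correct ordering yields the smaller area
```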

Optional: Base GNN Hyperparameter Tuning

The original codebase includes a grid-search step for the base GNN. This repo supports the same workflow via scripts/tune_gnn_hparams.py, and scripts/run_parallel_experiments.py will automatically apply matching overrides from configs/best_hparams.yaml (empty by default).

# Grid-search base GNN hyperparameters and write to configs/best_hparams.yaml (time-consuming)
python scripts/tune_gnn_hparams.py --datasets Cora Citeseer Pubmed --seed 0 --device cuda:0

To disable overrides and force the CLI flags/defaults, pass --no_best_hparams.
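The tuning-and-override workflow can be pictured with the following hedged sketch. It does not describe what scripts/tune_gnn_hparams.py does internally. The grid keys, the toy scoring function, and the assumed layout of best_hparams.yaml (a mapping from dataset name to parameter overrides) are all hypothetical. In practice that mapping could be read with PyYAML's `yaml.safe_load`. The `use_best=False` branch mirrors the effect of --no_best_hparams.

```python
import itertools
from typing import Any, Callable, Dict, List, Mapping

Config = Dict[str, Any]


def grid_search(grid: Mapping[str, List[Any]], score: Callable[[Config], float]) -> Config:
    """Return the configuration with the highest score (e.g. validation accuracy)."""
    keys = list(grid)
    best_cfg: Config = {}
    best_score = float("-inf")
    for combo in itertools.product(*(grid[k] for k in keys)):
        cfg = dict(zip(keys, combo))
        s = score(cfg)
        if s > best_score:  # strict '>' keeps the first config on ties
            best_cfg, best_score = cfg, s
    return best_cfg


def resolve_hparams(defaults: Config, best_by_dataset: Mapping[str, Config],
                    dataset: str, use_best: bool = True) -> Config:
    """Start from CLI/default values and apply per-dataset overrides if enabled."""
    cfg = dict(defaults)  # copy so the defaults are never mutated
    if use_best:
        cfg.update(best_by_dataset.get(dataset, {}))
    return cfg


# Hypothetical usage with a toy score in place of training a GNN.
best = grid_search({"lr": [0.1, 0.01], "weight_decay": [0.0, 5e-4]},
                   lambda c: -abs(c["lr"] - 0.01) - c["weight_decay"])
overrides = {"Cora": best}
print(resolve_hparams({"lr": 0.2, "hidden": 64}, overrides, "Cora"))
```

Datasets without an entry in the overrides simply keep their default values. This is consistent with the empty default best_hparams.yaml.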

Python API

from svgl import load_dataset, preprocess_data, create_model
from svgl.valuation import sample_permutations, shapley_regression
from svgl.utils import fix_seed, get_device

# Setup
fix_seed(42)
device = get_device('auto')

# Load and preprocess data
dataset = load_dataset('Cora', root='./data/')
data = dataset[0].to(device)
split_data = preprocess_data('Cora', setting='inductive', use_pmlp=True)

# Train GNN model
model = create_model('sgc', data.num_features, dataset.num_classes)
# ... training and valuation

Project Structure

Inference-Graph-Data-Valuation/
├── svgl/                       # Main package
│   ├── data/                   # Data loading and preprocessing
│   ├── evaluation/             # Node-dropping evaluation (AUC)
│   ├── models/                 # SGC, GCN, LASSO + tuning
│   ├── valuation/              # Sampling, features, Shapley + SVGL regression
│   └── utils/                  # Helper functions
├── scripts/                    # Runnable entrypoints
│   ├── run_demo.py
│   ├── run_parallel_experiments.py
│   ├── eval_node_dropping.py
│   ├── tune_gnn_hparams.py
│   └── check_progress.sh
├── configs/                    # Config files
│   ├── default.yaml
│   ├── hparam_search.yaml
│   └── best_hparams.yaml
└── outputs/                    # Results directory

Supported Datasets

Category	Datasets
Planetoid	Cora, Citeseer, Pubmed
Coauthor	CS, Physics
Heterophilous	Roman-empire, Amazon-ratings
OGB (appendix)	ogbn-arxiv
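The repository's own loader is `svgl.load_dataset`, as used in the Python API example. Independently of it, the datasets in the table are also available through PyTorch Geometric and OGB. The dispatch below is a hedged sketch, not the repository's code. It assumes a recent PyG release (which provides `HeterophilousGraphDataset`) and the `ogb` package for ogbn-arxiv. Imports are deferred, so only the library for the requested dataset needs to be installed.

```python
from typing import Any

# Dataset names grouped by category, following the table above.
PLANETOID = {"Cora", "Citeseer", "Pubmed"}
COAUTHOR = {"CS", "Physics"}
HETEROPHILOUS = {"Roman-empire", "Amazon-ratings"}
OGB = {"ogbn-arxiv"}


def load_supported_dataset(name: str, root: str = "./data/") -> Any:
    """Load a listed dataset with PyTorch Geometric or OGB (lazy imports)."""
    if name in PLANETOID:
        from torch_geometric.datasets import Planetoid
        return Planetoid(root, name)
    if name in COAUTHOR:
        from torch_geometric.datasets import Coauthor
        return Coauthor(root, name)
    if name in HETEROPHILOUS:
        from torch_geometric.datasets import HeterophilousGraphDataset
        return HeterophilousGraphDataset(root, name)
    if name in OGB:
        from ogb.nodeproppred import PygNodePropPredDataset
        return PygNodePropPredDataset(name=name, root=root)
    raise ValueError(f"unsupported dataset: {name!r}")
```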

Citation

@article{chi2025shapley,
  title={Shapley-Guided Utility Learning for Effective Graph Inference Data Valuation},
  author={Chi, Hongliang and Wu, Qiong and Zhou, Zhengyi and Ma, Yao},
  journal={arXiv preprint arXiv:2503.18195},
  year={2025}
}
