MIDS-GNN

Requirements

Python packages:

  • codetiming
  • matplotlib
  • networkx
  • pandas
  • plotly
  • torch
  • torch_geometric
  • tqdm
  • wandb

Graphs dataset repository: https://github.com/mkrizmancic/my_graphs_dataset
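
The Python packages can typically be installed with pip (for example, `pip install codetiming matplotlib networkx pandas plotly torch torch_geometric tqdm wandb`); note that torch and torch_geometric may need a platform-specific installation depending on your CUDA setup.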

Repository organization

Root files:
  • analyze_accuracies.ipynb - Downloads experiment results from Weights & Biases, processes sweep data, creates result tables, and downloads trained models
  • analyze_runtimes.ipynb - Runtime analysis comparing GNN models with baseline methods (Gurobi, Bron-Kerbosch, ILPS), generates execution time tables and plots
  • convert_model.py - Utility to convert a saved model state dict to a full model
  • detailed_analysis.ipynb - Detailed analysis of model results, generates visualizations and detailed performance metrics (not included in the paper)
  • MIDS_dataset.py - Core dataset implementation with classes for MIDS (Minimum Independent Dominating Set) datasets: MIDSDataset, MIDSProbabilitiesDataset, MIDSLabelsDataset, and feature filtering transforms
  • MIDS_script.py - Main training script for GNN models with hyperparameter optimization, model training/evaluation, loss functions, and dataset loading functionality
  • README.md - Repository documentation explaining organization, how to replicate paper results, and detailed analysis instructions
  • run_sweep.sh - Bash script to create and run Weights & Biases hyperparameter sweeps for model optimization

Config/ - Configuration files for hyperparameter sweeps

Dataset/ - Contains processed datasets and metadata

HPC/ - High Performance Computing cluster files

Models/ - Trained model files
  • *.pth files - Various trained GNN models with different configurations and hyperparameters
  • best_model.pth - Best performing model
  • prob_model_best.pth - Best probability prediction model

Results/ - Experiment results and analysis outputs

Tests/ - Test files
  • test_custom_loss.py - Unit tests for custom loss function implementations

Utilities/ - Utility modules and helper functions
  • baselines.py - Baseline algorithm implementations (Gurobi optimization, ILPS algorithm)
  • evaluate_model.py - Model evaluation script for testing trained models on datasets
  • gnn_models.py - GNN model definitions including custom architectures (GATLinNet, CombinedGNN) and wrappers
  • graph_utils.py - Graph processing utilities, visualization functions, and graph data manipulation
  • mids_utils.py - MIDS-specific utilities including MIDS finding algorithms and validation functions
  • script_utils.py - General script utilities for dataset splitting and common operations
  • setup_environment.sh - Environment setup script for dependencies and configuration

Replicating the paper results

Table I

We ran the hyperparameter optimization on Weights & Biases (wandb).

There were initially 2 sweeps:

  1. using the single label approach (wandb sweep HPC_labels_single)
  2. using the multi-label approach (wandb sweep HPC_labels_all)

These sweeps used only the decision GNN (5 layers) when no probability feature was used. For a fair comparison, we ran 2 additional sweeps testing the end-to-end model (10 layers) with a bottleneck in the middle to mimic the behavior of the initial model, which consists of a probability GNN followed by a decision GNN:

  1. using the single label approach (wandb sweep HPC_end_to_end_single_squeezed)
  2. using the multi-label approach (wandb sweep HPC_end_to_end_squeezed)

Using the analyze_accuracies.ipynb notebook, download each run and store the results in a csv file (a minimal sketch of the download step is shown after the list below). The notebook will download runs from all mentioned sweeps.

  • labels_all_best.csv - multi-label, best runs for each architecture based on validation loss
  • labels_all_full.csv - multi-label, all runs for each architecture
  • labels_single_best.csv - single label, best runs for each architecture based on validation loss
  • labels_single_full.csv - single label, all runs for each architecture
  • combined_best.csv - combined labels_all_best.csv and labels_single_best.csv for comparison

These csv files are available in the Results/ directory, so you can skip the download step.
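
If you do run the download step, the sketch below shows how runs of a W&B sweep can be pulled into a csv file with the wandb API. The sweep path and the recorded keys are placeholders; analyze_accuracies.ipynb implements its own, more complete version of this step.

```python
# Minimal sketch: download the runs of a W&B sweep into a csv file.
# "ENTITY/PROJECT/SWEEP_ID" and the recorded columns are placeholders,
# not the exact sweep IDs or metrics used in analyze_accuracies.ipynb.
import pandas as pd
import wandb

api = wandb.Api()
sweep = api.sweep("ENTITY/PROJECT/SWEEP_ID")  # e.g. the HPC_labels_all sweep

rows = []
for run in sweep.runs:
    rows.append({
        "run_id": run.id,
        "name": run.name,
        **run.config,              # hyperparameters (architecture, layers, ...)
        **run.summary._json_dict,  # final metrics (e.g. validation loss)
    })

pd.DataFrame(rows).to_csv("sweep_runs.csv", index=False)
```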

The notebook will output a table for None (uniform), None (mixed), Noisy, Predicted, and True features.

Finally, the notebook downloads all the best models.
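
A minimal sketch of downloading a saved model file from a W&B run is shown below; the run path and file name are placeholders, and the notebook resolves the best runs automatically from the sweep results.

```python
# Minimal sketch: fetch a trained model file stored with a W&B run.
# "ENTITY/PROJECT/RUN_ID" and the file name are placeholders.
import wandb

api = wandb.Api()
run = api.run("ENTITY/PROJECT/RUN_ID")
run.file("best_model.pth").download(root="Models", replace=True)
```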

Fig. 3

The results for Fig. 3 are generated by the analyze_runtimes.ipynb notebook.

We use only the multi-label, predicted probability approach.

Running the analyze_accuracies.ipynb notebook will download the best probability model and best decision models for each GNN architecture.

To avoid repeatedly generating data on the fly, we set up the following pipeline (a minimal sketch follows the list):

  1. pass every example from probability dataset through the probability model
  2. record probability model execution time, but ignore the output
  3. pass the same example (same graph) from the decision dataset through the decision model (input features are the same in both datasets, but the decision dataset additionally has the probability feature which is the same as the output from the probability model)
  4. record decision model execution time and store the prediction
  5. the total execution time is the sum of the probability model execution time and the decision model execution time

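The sketch below mirrors these five steps. The dataset and model objects are passed in by the caller, and the forward signature (x, edge_index) is an assumption, not the exact interface used in analyze_runtimes.ipynb.

```python
# Minimal sketch of the runtime measurement pipeline described above.
import time

import torch


def time_pipeline(prob_model, dec_model, prob_dataset, dec_dataset):
    """Return per-graph probability/decision/total times and decision predictions."""
    prob_times, dec_times, predictions = [], [], []

    prob_model.eval()
    dec_model.eval()
    with torch.no_grad():
        # Both datasets contain the same graphs; the decision dataset has one
        # extra node feature (the probability) matching the probability output.
        for prob_data, dec_data in zip(prob_dataset, dec_dataset):
            # Steps 1-2: run the probability model, record its time, ignore the output.
            start = time.perf_counter()
            _ = prob_model(prob_data.x, prob_data.edge_index)
            prob_times.append(time.perf_counter() - start)

            # Steps 3-4: run the decision model on the same graph, store the prediction.
            start = time.perf_counter()
            pred = dec_model(dec_data.x, dec_data.edge_index)
            dec_times.append(time.perf_counter() - start)
            predictions.append(pred)

    # Step 5: total execution time = probability model time + decision model time.
    total_times = [p + d for p, d in zip(prob_times, dec_times)]
    return prob_times, dec_times, total_times, predictions
```
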
Table II

The analyze_runtimes.ipynb notebook generates the results for Table II using the same approach as described above.

Detailed results (not included in the paper)

Input the best model W&B run ID and run detailed_analysis.ipynb.
