Python packages:
- codetiming
- matplotlib
- networkx
- pandas
- plotly
- torch
- torch_geometric
- tqdm
- wandb
Graphs dataset repository: https://github.com/mkrizmancic/my_graphs_dataset
| File/Directory | Description |
|---|---|
| Root Files | |
| analyze_accuracies.ipynb | Downloads experiment results from Weights & Biases, processes sweep data, creates result tables, and downloads trained models |
| analyze_runtimes.ipynb | Runtime analysis comparing GNN models with baseline methods (Gurobi, Bron-Kerbosch, ILPS), generates execution time tables and plots |
| convert_model.py | Utility to convert a saved model state dict to a full model |
| detailed_analysis.ipynb | Detailed analysis of model results, generates visualizations and detailed performance metrics (not included in the paper) |
| MIDS_dataset.py | Core dataset implementation with classes for MIDS (Minimum Independent Dominating Set) datasets: MIDSDataset, MIDSProbabilitiesDataset, MIDSLabelsDataset, and feature filtering transforms |
| MIDS_script.py | Main training script for GNN models with hyperparameter optimization, model training/evaluation, loss functions, and dataset loading functionality |
| README.md | Repository documentation explaining organization, how to replicate paper results, and detailed analysis instructions |
| run_sweep.sh | Bash script to create and run Weights & Biases hyperparameter sweeps for model optimization |
| Config/ | Configuration files for hyperparameter sweeps |
| Dataset/ | Contains processed datasets and metadata |
| HPC/ | High Performance Computing cluster files |
| Models/ | Trained model files |
| *.pth files | Various trained GNN models with different configurations and hyperparameters |
| best_model.pth | Best performing model |
| prob_model_best.pth | Best probability prediction model |
| Results/ | Experiment results and analysis outputs |
| Tests/ | Test files |
| test_custom_loss.py | Unit tests for custom loss function implementations |
| Utilities/ | Utility modules and helper functions |
| baselines.py | Baseline algorithm implementations (Gurobi optimization, ILPS algorithm) |
| evaluate_model.py | Model evaluation script for testing trained models on datasets |
| gnn_models.py | GNN model definitions including custom architectures (GATLinNet, CombinedGNN) and wrappers |
| graph_utils.py | Graph processing utilities, visualization functions, and graph data manipulation |
| mids_utils.py | MIDS-specific utilities including MIDS finding algorithms and validation functions |
| script_utils.py | General script utilities for dataset splitting and common operations |
| setup_environment.sh | Environment setup script for dependencies and configuration |
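As a rough illustration of what convert_model.py does, the snippet below loads a saved state dict from Models/ into a model instance. The import path and the GATLinNet constructor arguments are assumptions for illustration only; the actual architecture definitions and their signatures live in Utilities/gnn_models.py.

```python
import torch

from Utilities.gnn_models import GATLinNet  # architecture class defined in this repository (import path assumed)

# Instantiate the architecture first, then load the learned weights into it.
model = GATLinNet(in_channels=8, hidden_channels=64, out_channels=1)  # constructor arguments are assumed
state_dict = torch.load("Models/best_model.pth", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()  # switch to inference mode for evaluation and timing
```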
We ran the hyperparameter optimization on Weights & Biases (wandb).
There were initially 2 sweeps:
- using the single label approach (wandb sweep HPC_labels_single)
- using the multi-label approach (wandb sweep HPC_labels_all)
These sweeps used only the decision GNN (5 layers) when no probability feature was used. For fair comparison, we ran 2 additional sweeps testing the end-to-end model (10 layers) with a bottleneck in the middle to mimic the behavior of the initial model consisting of a probability GNN and a decision GNN:
- using the single label approach (wandb sweep HPC_end_to_end_single_squeezed)
- using the multi-label approach (wandb sweep HPC_end_to_end_squeezed)
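For reference, a sweep like the ones above can also be created and run with the wandb Python API. run_sweep.sh and the files in Config/ define the actual sweeps, so the search method, parameter names, ranges, and project name below are placeholders rather than the configuration used in the paper.

```python
import wandb

sweep_config = {
    "name": "HPC_labels_single",          # one of the sweep names listed above
    "method": "bayes",                     # search strategy is assumed
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "hidden_channels": {"values": [32, 64, 128]},   # placeholder hyperparameters and ranges
        "learning_rate": {"min": 1e-4, "max": 1e-2},
    },
}

sweep_id = wandb.sweep(sweep_config, project="MIDS")  # project name is assumed


def train():
    """One sweep trial: wandb.init() pulls the sampled hyperparameters."""
    with wandb.init() as run:
        config = run.config
        # ... build the dataset and model from config and train (see MIDS_script.py) ...
        run.log({"val_loss": 0.0})  # placeholder metric


wandb.agent(sweep_id, function=train, count=10)
```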
Using the analyze_accuracies.ipynb notebook, download each run and store the results in CSV files. The notebook downloads runs from all of the sweeps mentioned above.
It produces the following files:
- `labels_all_best.csv` - multi-label, best runs for each architecture based on validation loss
- `labels_all_full.csv` - multi-label, all runs for each architecture
- `labels_single_best.csv` - single label, best runs for each architecture based on validation loss
- `labels_single_full.csv` - single label, all runs for each architecture
- `combined_best.csv` - combination of `labels_all_best.csv` and `labels_single_best.csv` for comparison

These CSV files are available in the `Results/` directory, so you can skip the download step.
The notebook will output a table for None (uniform), None (mixed), Noisy, Predicted, and True features.
Finally, the notebook downloads all the best models.
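A minimal sketch of the kind of download the notebook performs with the W&B public API is shown below. The entity/project path, the metric key, and the architecture column name are assumptions; the notebook itself defines the real ones.

```python
import pandas as pd
import wandb

api = wandb.Api()
sweep = api.sweep("entity/project/sweep_id")  # path to one of the sweeps above (placeholder)

rows = []
for run in sweep.runs:
    rows.append({
        "name": run.name,
        **run.config,                              # hyperparameters of the run
        "val_loss": run.summary.get("val_loss"),   # metric key is assumed
    })

df = pd.DataFrame(rows)
df.to_csv("Results/labels_all_full.csv", index=False)

# Best run per architecture by validation loss ("architecture" column name is assumed).
best = df.loc[df.groupby("architecture")["val_loss"].idxmin()]
best.to_csv("Results/labels_all_best.csv", index=False)
```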
The results for Fig. 3 are generated by the analyze_runtimes.ipynb notebook.
We use only the multi-label, predicted probability approach.
Running the analyze_accuracies.ipynb notebook will download the best probability model and best decision models for each GNN architecture.
To avoid repeatedly generating data on the fly, we set up the following pipeline (sketched in code after the list):
- pass every example from probability dataset through the probability model
- record probability model execution time, but ignore the output
- pass the same example (the same graph) from the decision dataset through the decision model; the input features are the same in both datasets, but the decision dataset additionally contains the probability feature, which is identical to the output of the probability model
- record decision model execution time and store the prediction
- the total execution time is the sum of the probability model execution time and the decision model execution time
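The following sketch shows one way this pipeline can be implemented, assuming the probability and decision models and datasets have already been loaded. The object names and the model call signature are placeholders; the actual implementation is in analyze_runtimes.ipynb.

```python
import time

import torch

results = []
with torch.no_grad():
    # prob_dataset and decision_dataset are assumed to contain the same graphs in the same order.
    for prob_data, dec_data in zip(prob_dataset, decision_dataset):
        # Stage 1: probability GNN; time it but discard the output, since the
        # decision dataset already stores the corresponding probability feature.
        t0 = time.perf_counter()
        prob_model(prob_data.x, prob_data.edge_index)
        prob_time = time.perf_counter() - t0

        # Stage 2: decision GNN on the same graph, now with the probability feature.
        t0 = time.perf_counter()
        pred = decision_model(dec_data.x, dec_data.edge_index)
        dec_time = time.perf_counter() - t0

        # Total execution time is the sum of the two stages.
        results.append({"prediction": pred, "total_time": prob_time + dec_time})
```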
The analyze_runtimes.ipynb notebook generates the results for Table II using the same approach as described above.
Input the W&B run ID of the best model and run detailed_analysis.ipynb.