Cryptographic provenance verification and binary inspection for ML model artifacts in CI/CD pipelines.
Maintained by Sunil Gentyala | github.com/sunilgentyala
GGUF, Safetensors, and PyTorch checkpoint files downloaded from public model hubs represent an active and largely unmonitored supply chain risk in enterprise MLOps. Traditional SAST, DAST, and antivirus tooling cannot inspect binary weight formats. This repository provides a production-ready GitHub Actions workflow and a Python inspection script that enforce cryptographic hash verification, Sigstore/Cosign signature checking, pickle opcode scanning, and Safetensors header anomaly detection before any model artifact is admitted into a downstream pipeline.
This toolkit is the operational companion to the Help Net Security column "Weaponized Weights: The Impending Supply Chain Crisis in GGUF and Safetensors" by Sunil Gentyala.
```
model-provenance-guard/
├── .github/
│   └── workflows/
│       └── model_scan.yml        # Full CI/CD provenance gate workflow
├── scripts/
│   └── verify_weights.py         # Safetensors header inspector and anomaly detector
├── registry/
│   └── trusted_models.json       # Internal trusted model hash registry (template)
├── docs/
│   └── threat-model.md           # Format-level threat model for GGUF, ST, PT
├── requirements.txt              # Python dependencies
├── .gitignore
└── README.md
```
| Control | Formats | Tool |
|---|---|---|
| SHA-256 hash verification | All | Python hashlib + trusted_models.json |
| Cosign signature verification | All | sigstore/cosign-installer |
| Pickle opcode scanning | .pt / .pth | picklescan |
| Safetensors header inspection | .safetensors | verify_weights.py |
| GGUF magic byte validation | .gguf | verify_weights.py |
| Artifact quarantine on failure | All | Workflow failure + annotated summary |
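As one example of the controls in the table above, the GGUF magic-byte check can be sketched in a few lines. This is an illustrative sketch only, not the exact logic shipped in `verify_weights.py`; the `check_gguf_magic` helper name and the version cap are assumptions for the example. Every valid GGUF file begins with the four ASCII bytes `GGUF` followed by a little-endian 32-bit version number.

```python
import struct

GGUF_MAGIC = b"GGUF"  # first four bytes of every valid GGUF file

def check_gguf_magic(path: str) -> bool:
    """Reject files whose leading bytes don't match the GGUF magic.

    A mismatch usually means a renamed or tampered artifact, e.g. a
    pickle payload masquerading as a .gguf file.
    """
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        return False
    version = struct.unpack("<I", header[4:8])[0]
    # Known GGUF versions at time of writing; adjust as the format evolves.
    return 1 <= version <= 3
```

A check this cheap runs in constant time regardless of model size, which is why it sits first in the gate: it filters obvious renames before any hashing work begins.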
- GitHub Actions runner with Python 3.9 or later
- `cosign` available in your runner environment (installed by the workflow via `sigstore/cosign-installer`)
- A `trusted_models.json` registry populated with approved model hashes (see template in `registry/`)
Edit `registry/trusted_models.json` to include the SHA-256 hash of every model artifact your organization has reviewed and approved:
```json
{
  "models": [
    {
      "name": "mistral-7b-instruct-v0.3",
      "filename": "model.safetensors",
      "sha256": "a1b2c3d4e5f6...",
      "source_url": "https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3",
      "reviewed_by": "engineer@yourorg.com",
      "reviewed_at": "2025-04-15T10:00:00Z",
      "cosign_bundle": "model.safetensors.bundle"
    }
  ]
}
```

The hash value must be computed by a human reviewer from the artifact at the time of initial review, not copied from the model card on Hugging Face. The registry itself should be protected by branch protection rules requiring at least one reviewer approval on any pull request that modifies it.
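The registry lookup itself reduces to a streamed SHA-256 digest compared against the approved set. The sketch below shows the idea under stated assumptions: the `sha256_of` and `is_trusted` helper names are hypothetical, and the real `verify_weights.py` may structure this differently.

```python
import hashlib
import json
from pathlib import Path

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in 1 MiB chunks so multi-gigabyte weights never load into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_trusted(artifact_path: str, registry_path: str) -> bool:
    """Return True only if the artifact's digest appears in trusted_models.json."""
    registry = json.loads(Path(registry_path).read_text())
    approved = {entry["sha256"] for entry in registry["models"]}
    return sha256_of(artifact_path) in approved
```

Because the comparison is against the reviewer-computed hash rather than a hub-published one, a compromised model card cannot vouch for a swapped artifact.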
Copy .github/workflows/model_scan.yml into your repository. The workflow triggers on pull requests that modify any path containing model artifacts and also runs on a nightly schedule against your staging model cache.
Set the following repository secrets:
| Secret | Purpose |
|---|---|
| `COSIGN_PUBLIC_KEY` | PEM-encoded public key for verifying model signatures |
| `MODEL_REGISTRY_PATH` | Path to your `trusted_models.json` in the runner environment |
```bash
pip install -r requirements.txt
```
```bash
python scripts/verify_weights.py --file /path/to/model.safetensors --registry registry/trusted_models.json
```

For GGUF files:
```bash
python scripts/verify_weights.py --file /path/to/model.gguf --registry registry/trusted_models.json
```

For PyTorch checkpoints, the workflow automatically runs picklescan. You can also invoke it directly:
```bash
pip install picklescan
picklescan -p /path/to/model.pt
```

See `docs/threat-model.md` for the full format-level attack surface analysis covering pickle deserialization in `.pt` files, GGUF metadata injection, Safetensors header manipulation, and neural backdoor scenarios.
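The Safetensors header manipulation scenario in the threat model is worth a concrete illustration. Per the safetensors format, a file opens with an 8-byte little-endian header length followed by that many bytes of JSON describing each tensor. The sketch below shows the kind of anomaly checks involved; the `inspect_safetensors_header` name and the header size cap are assumptions for this example, and the checks in `verify_weights.py` may be more extensive.

```python
import json
import os
import struct

MAX_HEADER_BYTES = 100 * 1024 * 1024  # assumed sanity cap on header size

def inspect_safetensors_header(path: str) -> dict:
    """Parse a safetensors header and flag basic structural anomalies.

    Layout per the safetensors spec: 8-byte little-endian header length,
    then that many bytes of JSON mapping tensor names to dtype, shape,
    and data_offsets within the trailing data region.
    """
    file_size = os.path.getsize(path)
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        if header_len > MAX_HEADER_BYTES or 8 + header_len > file_size:
            raise ValueError("header length is implausible for this file")
        header = json.loads(f.read(header_len))
    data_region = file_size - 8 - header_len
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form metadata block
            continue
        start, end = info["data_offsets"]
        if not (0 <= start <= end <= data_region):
            raise ValueError(f"tensor {name!r} points outside the data region")
    return header
```

An attacker-declared header length or out-of-range tensor offset is exactly the kind of anomaly that survives a casual visual inspection but fails this structural check.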
The workflow is designed to sit at the model intake boundary: between the artifact download step and the step that registers the model in your internal model registry (MLflow, Weights and Biases, SageMaker Model Registry, or equivalent). Any failure in the provenance gate should block promotion. Configure your orchestration layer (Kubeflow Pipelines, Metaflow, Airflow) to treat a non-zero exit from this workflow as a hard stop, not a warning.
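In orchestration code, "hard stop, not a warning" translates to refusing to continue on any non-zero exit from the gate. A minimal, generic sketch (the `run_gate` helper is hypothetical; your orchestrator's native task-failure mechanism is usually preferable):

```python
import subprocess
import sys

def run_gate(cmd: list[str]) -> None:
    """Run a verification command; abort the pipeline on non-zero exit.

    Raising (rather than logging) ensures the orchestrator marks the
    task failed and downstream registration steps never run.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise SystemExit(
            f"provenance gate failed ({result.returncode}):\n{result.stderr}"
        )
```

For example, `run_gate([sys.executable, "scripts/verify_weights.py", "--file", path, "--registry", registry])` would be called before any model-registry registration step.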
Pull requests are welcome for additional format inspectors (ONNX, TensorFlow SavedModel) and for integrations with specific model hub SDKs. Please open an issue first to discuss scope.
MIT License. See LICENSE.
Sunil Gentyala
Lead Cybersecurity and AI Security Consultant, HCLTech
IEEE Senior Member | Cloud Security Alliance Representative | ISACA Professional Member
Creator: ContextGuard (zero-trust MCP middleware) | GSH Framework (agentic AI threat hunting)
linkedin.com/in/sunil-gentyala | github.com/sunilgentyala email: sunil.gentyala@ieee.org