A Kubernetes Operator that implements data-locality-aware workload scheduling for distributed physics data lakes modelled on the WLCG/Rucio storage topology.
When a physicist submits a PhysicsJob referencing a Rucio dataset
(scope:name format), the operator:
- Resolves which RSE (Rucio Storage Element) holds the primary replica
- Maps that RSE to a Kubernetes node via the topology.cern.io/site label
- Creates an owned batch/v1.Job with NodeAffinity constraints injected, so compute runs co-located with the data, avoiding WAN transfers entirely
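
The three steps above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the operator's actual code: the helper names `siteFromRSE` and `requiredSiteAffinity` are hypothetical, and the rule "site = lower-cased RSE prefix before the first underscore" is an assumption inferred from the `CERN-PROD_DATADISK` / `cern-prod` pairing shown elsewhere in this README. The emitted JSON follows the standard Kubernetes `affinity.nodeAffinity` field layout.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// siteLabelKey is the node label the README says is used for site matching.
const siteLabelKey = "topology.cern.io/site"

// siteFromRSE derives a site label value from an RSE name.
// ASSUMPTION: the site is the lower-cased part before the first underscore,
// e.g. "CERN-PROD_DATADISK" -> "cern-prod". The real operator may use a
// lookup table or RSE attributes instead.
func siteFromRSE(rse string) string {
	site, _, _ := strings.Cut(rse, "_")
	return strings.ToLower(site)
}

// requiredSiteAffinity builds the Kubernetes "affinity" stanza as plain maps.
// It hard-requires nodes whose site label equals the given site.
func requiredSiteAffinity(site string) map[string]any {
	return map[string]any{
		"nodeAffinity": map[string]any{
			"requiredDuringSchedulingIgnoredDuringExecution": map[string]any{
				"nodeSelectorTerms": []any{
					map[string]any{
						"matchExpressions": []any{
							map[string]any{
								"key":      siteLabelKey,
								"operator": "In",
								"values":   []string{site},
							},
						},
					},
				},
			},
		},
	}
}

func main() {
	affinity := requiredSiteAffinity(siteFromRSE("CERN-PROD_DATADISK"))
	out, err := json.MarshalIndent(affinity, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

A real controller would more likely populate the typed `corev1.Affinity` struct on the Job's pod template; plain maps are used here only to keep the sketch dependency-free.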
The standard kube-scheduler has no storage-topology awareness. This operator
injects the required node affinity automatically, closing the gap between data
placement (Rucio) and compute placement (Kubernetes).
Three install paths, in order of simplicity. All require an existing
Kubernetes (or OpenShift) cluster ≥ v1.28. The container image is
multi-arch (linux/amd64, linux/arm64) and hosted at
ghcr.io/karansinghdev/data-gravity-operator.
```bash
kubectl apply -f \
  https://github.com/KaranSinghDev/data-gravity-operator/releases/latest/download/install.yaml
```

This installs the namespace, CRD, RBAC, and the operator Deployment in
one shot. Edit the deployment afterwards to set --rucio-url for your
environment, or override via kubectl set env.
```bash
helm repo add data-gravity https://karansinghdev.github.io/data-gravity-operator
helm repo update
helm install data-gravity data-gravity/data-gravity-operator \
  --namespace data-gravity-system --create-namespace \
  --set rucioURL=https://rucio.cern.ch
```

The Helm path gives you values.yaml for image overrides, replica
count, leader election, metrics TLS, and an in-chart mock-rucio toggle
(--set mockRucio.enabled=true) for evaluation environments.
```bash
# 1. Install the local toolchain (Go 1.24, kubebuilder, kind, kubectl, helm)
bash scripts/setup-env.sh
source scripts/env.sh
# 2. Spin up a 4-node kind cluster + build/load operator image
bash scripts/setup-kind.sh
# 3. End-to-end demo: helm install + submit PhysicsJob + verify NodeAffinity routing
bash scripts/demo.sh
```

For development against a running kind cluster (skip the docker rebuild
loop), use bash scripts/dev-run.sh to run the manager binary on your
host with a port-forwarded mock-rucio.
The kind cluster has four worker nodes labelled with realistic WLCG sites:
| Node | Label |
|---|---|
| worker-0 | topology.cern.io/site=cern-prod |
| worker-1 | topology.cern.io/site=bnl-osg2 |
| worker-2 | topology.cern.io/site=in2p3-cc |
| worker-3 | topology.cern.io/site=triumf-lcg2 |
```yaml
apiVersion: hep.cern.local/v1alpha1
kind: PhysicsJob
metadata:
  name: atlas-daod-sample
spec:
  # Rucio DID in scope:name format
  dataset: "data23_13p6TeV:DAOD_PHYS.123456"
  # Container image for the compute workload
  image: "gitlab-registry.cern.ch/atlas/athena:24.0.12"
  command: ["Reco_tf.py", "--inputAODFile", "/data/input.AOD.pool.root"]
  # DataLocal | ClosestSite | AnyAvailable
  schedulingPolicy: DataLocal
  resources:
    requests:
      cpu: "2"
      memory: "4Gi"
```

Inspect:

```bash
kubectl get pj   # shortName for physicsjob
```

```
NAME                DATASET                           PHASE       RSE                  NODE
atlas-daod-sample   data23_13p6TeV:DAOD_PHYS.123456   Scheduled   CERN-PROD_DATADISK   worker-0
```
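
The `dataset` field is a Rucio DID in `scope:name` format, so it has to be split before any replica lookup. A minimal sketch of that validation step is shown below; the `DID` type and `parseDID` helper are hypothetical names for illustration and are not taken from the operator's source.

```go
package main

import (
	"fmt"
	"strings"
)

// DID is a Rucio data identifier split into its two parts.
// Hypothetical type, for illustration only.
type DID struct {
	Scope string
	Name  string
}

// parseDID splits "scope:name" at the first colon and rejects inputs
// where either part is missing.
func parseDID(s string) (DID, error) {
	scope, name, found := strings.Cut(s, ":")
	if !found || scope == "" || name == "" {
		return DID{}, fmt.Errorf("dataset %q is not in scope:name format", s)
	}
	return DID{Scope: scope, Name: name}, nil
}

func main() {
	did, err := parseDID("data23_13p6TeV:DAOD_PHYS.123456")
	if err != nil {
		panic(err)
	}
	// Prints: scope=data23_13p6TeV name=DAOD_PHYS.123456
	fmt.Printf("scope=%s name=%s\n", did.Scope, did.Name)
}
```

Rejecting malformed DIDs early lets a reconciler surface a clear failure reason instead of issuing a doomed request to Rucio.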
| Policy | Behaviour |
|---|---|
| DataLocal (default) | Hard-pins compute to the node whose topology.cern.io/site matches the primary RSE |
| ClosestSite | Same as DataLocal; extension point for a geo-distance ranking across replicas |
| AnyAvailable | No affinity injected; scheduler places freely; RSE still recorded for observability |
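
The policy table can be read as a small decision function. The sketch below is one way to express it, assuming an empty policy falls back to the DataLocal default; the `SchedulingPolicy` type and `siteToPin` function are hypothetical names, not the operator's actual API.

```go
package main

import "fmt"

// SchedulingPolicy mirrors the three values accepted by spec.schedulingPolicy.
type SchedulingPolicy string

const (
	DataLocal    SchedulingPolicy = "DataLocal"
	ClosestSite  SchedulingPolicy = "ClosestSite"
	AnyAvailable SchedulingPolicy = "AnyAvailable"
)

// siteToPin reports which site to pin to and whether node affinity should be
// injected at all. ASSUMPTION: an empty policy is treated as DataLocal.
func siteToPin(policy SchedulingPolicy, primarySite string) (site string, inject bool, err error) {
	switch policy {
	case "", DataLocal, ClosestSite:
		// ClosestSite behaves like DataLocal per the table above.
		return primarySite, true, nil
	case AnyAvailable:
		// No affinity: the scheduler places the pod freely.
		return "", false, nil
	default:
		return "", false, fmt.Errorf("unknown scheduling policy %q", policy)
	}
}

func main() {
	for _, p := range []SchedulingPolicy{DataLocal, ClosestSite, AnyAvailable} {
		site, inject, err := siteToPin(p, "cern-prod")
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-12s inject=%-5t site=%q\n", p, inject, site)
	}
}
```

Keeping ClosestSite as its own case label makes it easy to swap in a distance-based ranking later without touching the other branches.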
| Metric | Type | Labels |
|---|---|---|
| physjob_reconcile_total | Counter | result |
| physjob_reconcile_duration_seconds | Histogram | none |
| physjob_resolved_total | Counter | rse, policy |
| physjob_resolution_failures_total | Counter | reason |
| physjob_data_transfer_avoided_bytes | Counter | rse |
The physjob_data_transfer_avoided_bytes counter accumulates estimated bytes
of WAN transfer eliminated by data-local scheduling. For a typical ATLAS DAOD
dataset (~2.5 TB), a single data-local job avoids 2.5 TB of inter-site traffic.
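
To make the arithmetic concrete, the sketch below mimics how a per-RSE counter of avoided bytes might accumulate. It uses a plain map rather than the Prometheus client, and it assumes the dataset size is added only when a job actually ran data-locally and that 2.5 TB means 2.5 × 10^12 bytes; the `avoidedBytes` type is a hypothetical stand-in for illustration.

```go
package main

import "fmt"

// avoidedBytes tracks estimated WAN bytes avoided, keyed by RSE name.
// Hypothetical stand-in for a counter with an "rse" label.
type avoidedBytes map[string]uint64

// record adds the dataset size only when the job ran co-located with the data.
func (a avoidedBytes) record(rse string, datasetBytes uint64, dataLocal bool) {
	if !dataLocal {
		return
	}
	a[rse] += datasetBytes
}

func main() {
	const daodBytes = 2_500_000_000_000 // ~2.5 TB, decimal units (assumption)

	c := avoidedBytes{}
	c.record("CERN-PROD_DATADISK", daodBytes, true)  // data-local job
	c.record("CERN-PROD_DATADISK", daodBytes, true)  // second data-local job
	c.record("CERN-PROD_DATADISK", daodBytes, false) // not data-local: ignored

	// Prints: 5.0 TB avoided
	fmt.Printf("%.1f TB avoided\n", float64(c["CERN-PROD_DATADISK"])/1e12)
}
```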
See docs/architecture.md for the full component
diagram, reconcile-loop pseudocode, and data-flow explanation.
```
api/v1alpha1/                 CRD types (PhysicsJobSpec, PhysicsJobStatus, Phase enum)
internal/controller/          Reconciler + Ginkgo/envtest suite
internal/storage/             StorageTopologyClient interface + Rucio HTTP client
internal/scheduling/          NodeAffinity builder
internal/metrics/             Prometheus registrations
internal/mockrucio/           Mock Rucio API with 9 ATLAS/CMS/LHCb datasets
cmd/main.go                   Manager entrypoint (--rucio-url flag)
cmd/mock-rucio/main.go        Standalone mock-rucio server
config/                       Generated CRD + RBAC + sample CR
deploy/                       kind cluster config + mock-rucio Kubernetes manifest
helm/data-gravity-operator/   Helm chart (CRD in crds/, RBAC, Deployment, optional mock)
scripts/                      setup-env.sh  setup-kind.sh  demo.sh  dev-run.sh
docs/                         Architecture doc + Mermaid diagram
```
| Component | Version |
|---|---|
| Go | 1.24 |
| controller-runtime | v0.21 |
| Kubernetes API | v0.33 (1.33) |
| Ginkgo / Gomega | v2 / v1 |
| Prometheus client | v1.22 |
| kubebuilder scaffold | v4.6 |
| kind | v0.26 |
| Helm | 3.17 |
If you use data-gravity-operator in academic work, please cite it via the
metadata in CITATION.cff. Each tagged GitHub release is
archived on Zenodo with a DOI; replace the DOI below with the version-specific
one for the release you used.
```bibtex
@software{singh_data_gravity_operator_2026,
  author    = {Singh, Karan},
  title     = {data-gravity-operator: Data-Locality-Aware Workload
               Scheduling for Kubernetes on WLCG Data Lakes},
  year      = 2026,
  publisher = {Zenodo},
  version   = {0.1.0},
  url       = {https://github.com/KaranSinghDev/data-gravity-operator}
}
```

See CONTRIBUTING.md for development setup and
contribution guidelines. Maintainers cutting a release should follow
RELEASING.md.
Licensed under the Apache License, Version 2.0. See LICENSE for
the full text. All third-party dependencies are also under Apache 2.0 or
compatible permissive licenses.