KaranSinghDev/Graviton-Data-Local-Scheduler

data-gravity-operator

A Kubernetes Operator that implements data-locality-aware workload scheduling for distributed physics data lakes modelled on the WLCG/Rucio storage topology.

When a physicist submits a PhysicsJob referencing a Rucio dataset (scope:name format), the operator:

  1. Resolves which RSE (Rucio Storage Element) holds the primary replica
  2. Maps that RSE to a Kubernetes node via the topology.cern.io/site label
  3. Creates an owned batch/v1.Job with NodeAffinity constraints injected, so compute runs co-located with the data and the input never crosses the WAN

The standard kube-scheduler has no storage-topology awareness. This operator injects the required node affinity automatically, closing the gap between data placement (Rucio) and compute placement (Kubernetes).


Installation

Three install paths, in order of simplicity. All require an existing Kubernetes (or OpenShift) cluster ≥ v1.28. The container image is multi-arch (linux/amd64, linux/arm64) and hosted at ghcr.io/karansinghdev/data-gravity-operator.

Quick install — one-liner (kubectl)

kubectl apply -f \
  https://github.com/KaranSinghDev/data-gravity-operator/releases/latest/download/install.yaml

This installs the namespace, CRD, RBAC, and the operator Deployment in one shot. Edit the deployment afterwards to set --rucio-url for your environment, or override via kubectl set env.
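
How the manager reads this setting is not shown in this README. As a hedged sketch, a Go entrypoint could accept --rucio-url and fall back to an environment variable when the flag is absent. The RUCIO_URL variable name and the parseRucioURL helper are hypothetical and chosen only for illustration.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseRucioURL prefers the --rucio-url flag and falls back to the
// (hypothetical) RUCIO_URL environment variable, read through getenv.
func parseRucioURL(args []string, getenv func(string) string) (string, error) {
	fs := flag.NewFlagSet("manager", flag.ContinueOnError)
	// The environment value becomes the flag default, so an explicit flag wins.
	url := fs.String("rucio-url", getenv("RUCIO_URL"), "base URL of the Rucio server")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	if *url == "" {
		return "", fmt.Errorf("rucio URL not set: pass --rucio-url or set RUCIO_URL")
	}
	return *url, nil
}

func main() {
	// Fixed demo arguments; a real entrypoint would pass os.Args[1:].
	url, err := parseRucioURL([]string{"--rucio-url=https://rucio.cern.ch"}, os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println("using Rucio at", url)
}
```

Injecting getenv as a parameter makes the precedence rule easy to check in a unit test.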

Production install — Helm chart

helm repo add data-gravity https://karansinghdev.github.io/data-gravity-operator
helm repo update

helm install data-gravity data-gravity/data-gravity-operator \
  --namespace data-gravity-system --create-namespace \
  --set rucioURL=https://rucio.cern.ch

The Helm path gives you values.yaml for image overrides, replica count, leader election, metrics TLS, and an in-chart mock-rucio toggle (--set mockRucio.enabled=true) for evaluation environments.

Local evaluation — kind cluster

# 1. Install the local toolchain (Go 1.24, kubebuilder, kind, kubectl, helm)
bash scripts/setup-env.sh
source scripts/env.sh

# 2. Spin up a 4-node kind cluster + build/load operator image
bash scripts/setup-kind.sh

# 3. End-to-end demo: helm install + submit PhysicsJob + verify NodeAffinity routing
bash scripts/demo.sh

For development against a running kind cluster (skip the docker rebuild loop), use bash scripts/dev-run.sh to run the manager binary on your host with a port-forwarded mock-rucio.

The kind cluster has four worker nodes labelled with realistic WLCG sites:

Node      Label
worker-0  topology.cern.io/site=cern-prod
worker-1  topology.cern.io/site=bnl-osg2
worker-2  topology.cern.io/site=in2p3-cc
worker-3  topology.cern.io/site=triumf-lcg2
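
To make the effect of these labels concrete, the sketch below filters a node list by site label, which is the selection a required "In" node-affinity term asks the scheduler to perform. The node struct and nodesForSite helper are simplified, hypothetical stand-ins for Kubernetes Node objects and scheduler logic.

```go
package main

import (
	"fmt"
	"sort"
)

const siteLabelKey = "topology.cern.io/site"

// node is a simplified stand-in for a Kubernetes Node: a name plus labels.
type node struct {
	Name   string
	Labels map[string]string
}

// kindNodes mirrors the four labelled workers of the kind cluster.
var kindNodes = []node{
	{Name: "worker-0", Labels: map[string]string{siteLabelKey: "cern-prod"}},
	{Name: "worker-1", Labels: map[string]string{siteLabelKey: "bnl-osg2"}},
	{Name: "worker-2", Labels: map[string]string{siteLabelKey: "in2p3-cc"}},
	{Name: "worker-3", Labels: map[string]string{siteLabelKey: "triumf-lcg2"}},
}

// nodesForSite returns the sorted names of nodes whose site label equals site.
func nodesForSite(nodes []node, site string) []string {
	var names []string
	for _, n := range nodes {
		if n.Labels[siteLabelKey] == site {
			names = append(names, n.Name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	fmt.Println(nodesForSite(kindNodes, "in2p3-cc")) // [worker-2]
	fmt.Println(nodesForSite(kindNodes, "unknown"))  // []
}
```

On a live cluster the same check is kubectl get nodes -l topology.cern.io/site=in2p3-cc.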

Custom Resource: PhysicsJob

apiVersion: hep.cern.local/v1alpha1
kind: PhysicsJob
metadata:
  name: atlas-daod-sample
spec:
  # Rucio DID — scope:name format
  dataset: "data23_13p6TeV:DAOD_PHYS.123456"
  # Container image for the compute workload
  image: "gitlab-registry.cern.ch/atlas/athena:24.0.12"
  command: ["Reco_tf.py", "--inputAODFile", "/data/input.AOD.pool.root"]
  # DataLocal | ClosestSite | AnyAvailable
  schedulingPolicy: DataLocal
  resources:
    requests:
      cpu: "2"
      memory: "4Gi"

Inspect:

kubectl get pj   # shortName for physicsjob
NAME               DATASET                                  PHASE       RSE                   NODE
atlas-daod-sample  data23_13p6TeV:DAOD_PHYS.123456         Scheduled   CERN-PROD_DATADISK    worker-0

Scheduling policies

Policy               Behaviour
DataLocal (default)  Hard-pins compute to the node whose topology.cern.io/site matches the primary RSE
ClosestSite          Same as DataLocal; extension point for a geo-distance ranking across replicas
AnyAvailable         No affinity injected; scheduler places freely; RSE still recorded for observability
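
The policy table reduces to a small decision: does the policy pin the Job to a site or not? The sketch below encodes that decision, including the DataLocal default. The policy type and pinsToSite helper are hypothetical names, and treating an unknown value as an error is an assumption of this example.

```go
package main

import "fmt"

// policy mirrors the schedulingPolicy values listed in the table.
type policy string

const (
	DataLocal    policy = "DataLocal"
	ClosestSite  policy = "ClosestSite"
	AnyAvailable policy = "AnyAvailable"
)

// effective applies the documented default: an empty policy means DataLocal.
func effective(p policy) policy {
	if p == "" {
		return DataLocal
	}
	return p
}

// pinsToSite reports whether a policy results in a node-affinity pin.
// ClosestSite currently behaves like DataLocal, per the table.
func pinsToSite(p policy) (bool, error) {
	switch effective(p) {
	case DataLocal, ClosestSite:
		return true, nil
	case AnyAvailable:
		return false, nil
	default:
		return false, fmt.Errorf("unknown scheduling policy %q", p)
	}
}

func main() {
	for _, p := range []policy{"", DataLocal, ClosestSite, AnyAvailable} {
		pin, _ := pinsToSite(p)
		fmt.Printf("%-12q pins=%v\n", p, pin)
	}
}
```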

Prometheus metrics

Metric                               Type       Labels
physjob_reconcile_total              Counter    result
physjob_reconcile_duration_seconds   Histogram  (none)
physjob_resolved_total               Counter    rse, policy
physjob_resolution_failures_total    Counter    reason
physjob_data_transfer_avoided_bytes  Counter    rse

The physjob_data_transfer_avoided_bytes counter accumulates the estimated bytes of WAN transfer eliminated by data-local scheduling. For a typical ATLAS DAOD dataset (~2.5 TB), a single data-local job that would otherwise have read the whole dataset from a remote site avoids roughly 2.5 TB of inter-site traffic.
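
The arithmetic behind this counter is a running sum per RSE. The sketch below uses an in-memory map as a stand-in for the Prometheus counter; the avoidedBytes type is hypothetical, and 2.5 TB is taken as 2.5 × 10^12 bytes (decimal terabytes), which is an assumption about the unit.

```go
package main

import "fmt"

// avoidedBytes is an in-memory stand-in for a counter labelled by RSE.
// Hypothetical type; the real metric is registered with the Prometheus client.
type avoidedBytes map[string]uint64

// add records that one data-local job on rse avoided moving size bytes.
func (a avoidedBytes) add(rse string, size uint64) {
	a[rse] += size
}

func main() {
	const daodBytes uint64 = 2_500_000_000_000 // ~2.5 TB in decimal terabytes

	c := avoidedBytes{}
	for i := 0; i < 3; i++ {
		c.add("CERN-PROD_DATADISK", daodBytes)
	}
	// Three data-local jobs on the same dataset: 3 x 2.5 TB.
	fmt.Printf("%.1f TB\n", float64(c["CERN-PROD_DATADISK"])/1e12) // 7.5 TB
}
```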


Architecture

See docs/architecture.md for the full component diagram, reconcile-loop pseudocode, and data-flow explanation.


Repository layout

api/v1alpha1/               CRD types (PhysicsJobSpec, PhysicsJobStatus, Phase enum)
internal/controller/        Reconciler + Ginkgo/envtest suite
internal/storage/           StorageTopologyClient interface + Rucio HTTP client
internal/scheduling/        NodeAffinity builder
internal/metrics/           Prometheus registrations
internal/mockrucio/         Mock Rucio API — 9 ATLAS/CMS/LHCb datasets
cmd/main.go                 Manager entrypoint (--rucio-url flag)
cmd/mock-rucio/main.go      Standalone mock-rucio server
config/                     Generated CRD + RBAC + sample CR
deploy/                     kind cluster config + mock-rucio Kubernetes manifest
helm/data-gravity-operator/ Helm chart (CRD in crds/, RBAC, Deployment, optional mock)
scripts/                    setup-env.sh  setup-kind.sh  demo.sh  dev-run.sh
docs/                       Architecture doc + Mermaid diagram

Tech stack

Component            Version
Go                   1.24
controller-runtime   v0.21
Kubernetes API       v0.33 (1.33)
Ginkgo / Gomega      v2 / v1
Prometheus client    v1.22
kubebuilder scaffold v4.6
kind                 v0.26
Helm                 3.17

Citing this work

If you use data-gravity-operator in academic work, please cite it via the metadata in CITATION.cff. Each tagged GitHub release is archived on Zenodo with a DOI; when citing, add the version-specific DOI for the release you used to the entry below.

@software{singh_data_gravity_operator_2026,
  author       = {Singh, Karan},
  title        = {data-gravity-operator: Data-Locality-Aware Workload
                  Scheduling for Kubernetes on WLCG Data Lakes},
  year         = 2026,
  publisher    = {Zenodo},
  version      = {0.1.0},
  url          = {https://github.com/KaranSinghDev/data-gravity-operator}
}

Contributing

See CONTRIBUTING.md for development setup and contribution guidelines. Maintainers cutting a release should follow RELEASING.md.


License

Licensed under the Apache License, Version 2.0. See LICENSE for the full text. All third-party dependencies are also licensed under Apache 2.0 or a compatible permissive license.
