Problem
When Iris creates GCE instances (controller VM) and TPU VMs via gcloud compute instances create / gcloud compute tpus tpu-vm create, no --service-account flag is passed. This means all resources run under the Compute Engine default service account (748532799086-compute@developer.gserviceaccount.com), which typically has roles/editor — far more permissions than needed.
Proposed Change
Pass an explicit --service-account=<sa> flag when creating controller VMs and TPU worker slices in lib/iris/src/iris/cluster/platform/gcp.py. The SA should be configurable in the cluster config YAML (e.g., platform.gcp.service_account), falling back to the default compute SA if unset for backward compatibility.
This enables:
- Least privilege: controller and workers only get the permissions they actually need (e.g., pull container images, write logs)
- Audit clarity: resource actions in Cloud Audit Logs are attributed to a purpose-specific SA
- CI isolation: the CI smoke test SA only needs serviceAccountUser on a narrow-scoped runtime SA, not the powerful default compute SA
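A minimal sketch of how the flag could be threaded into the gcloud invocations. The helper name `with_service_account`, the instance names, the zone, and the example SA email are all hypothetical; the actual command-building code in gcp.py may be structured differently. When no SA is configured the flag is omitted, so gcloud keeps using the default compute SA.

```python
from typing import Optional


def with_service_account(cmd: list[str], service_account: Optional[str]) -> list[str]:
    """Return a copy of cmd, adding --service-account only when one is configured."""
    if not service_account:
        # Unset: leave the command untouched so the default compute SA applies.
        return list(cmd)
    return [*cmd, f"--service-account={service_account}"]


# Hypothetical controller VM creation with an explicit runtime SA.
controller_cmd = with_service_account(
    ["gcloud", "compute", "instances", "create", "iris-controller", "--zone=us-central1-a"],
    "iris-runtime@my-project.iam.gserviceaccount.com",
)

# Hypothetical TPU slice creation with no SA configured (backward-compatible path).
tpu_cmd = with_service_account(
    ["gcloud", "compute", "tpus", "tpu-vm", "create", "iris-worker-0", "--zone=us-central1-a"],
    None,
)
```

Returning a new list rather than mutating the input keeps the helper safe to call from both the controller and slice creation paths.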
Files to Change
lib/iris/src/iris/cluster/platform/gcp.py — add --service-account to create_slice, create_vm_slice, and controller VM creation
lib/iris/protos/config.proto (or equivalent) — add service_account field to GcpPlatformConfig
lib/iris/examples/smoke.yaml, coreweave.yaml, etc. — document the new field
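The fallback behavior could look roughly like this, assuming the cluster config has already been parsed into nested dicts and the field lives at platform.gcp.service_account as suggested above. The function name `resolve_service_account` is hypothetical, and the real implementation would more likely read the field from the GcpPlatformConfig proto.

```python
from typing import Any, Optional


def resolve_service_account(config: dict[str, Any]) -> Optional[str]:
    """Return the configured SA email, or None to fall back to the default compute SA."""
    gcp = (config.get("platform") or {}).get("gcp") or {}
    value = gcp.get("service_account")
    if value is None:
        return None
    # Treat empty or whitespace-only values the same as an unset field.
    value = str(value).strip()
    return value or None


# Hypothetical parsed configs: one with the new field set, one without it.
explicit = resolve_service_account(
    {"platform": {"gcp": {"service_account": "iris-runtime@my-project.iam.gserviceaccount.com"}}}
)
unset = resolve_service_account({"platform": {"gcp": {}}})
```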
🤖 Generated with Claude Code