Merged
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -12,6 +12,10 @@ to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

- [2026.05.08] - New portable GitLab CI template `templates/functions/gke-kubeconfig.yml` (`.gke-kubeconfig`) that generates a namespace-scoped GKE kubeconfig using WIF-authenticated gcloud credentials. Activated when `ENABLE_GCP_WIF=1` and `K8S_CLUSTER_NAME` is set. Supports `K8S_USE_DNS_ENDPOINT=1` for private clusters. Runs after `setup-gitlab-agent` so the gcloud context always takes precedence. Remotely includable, no Docker image dependency.

### Fixed

- [2026.06.18] - `.gke-kubeconfig` now skips gracefully (without failing the job) when `gcloud` is not available in the job image or is not authenticated, instead of exiting non-zero. This prevents build and test jobs that inherit the global `before_script` but do not need cluster access from failing when `K8S_CLUSTER_NAME` is set as a global CI/CD variable. Generation is gated on `K8S_CLUSTER_NAME` alone and is no longer coupled to `ENABLE_GCP_WIF`, so any gcloud authentication method (WIF, runner service account, service account key) is supported. It still fails fast when `gcloud` is authenticated but a required variable is missing or `get-credentials` fails.

### Changed

- [2025.10.23] - add support for additional docker registry via `ADDITIONAL_DOCKER_REGISTRY` variable.
8 changes: 5 additions & 3 deletions openspec/changes/wif-gke-kubeconfig/design.md
@@ -64,11 +64,13 @@ Running `.gke-kubeconfig` last — after `setup-gitlab-agent` — ensures the gc

**Rationale:** The image scripts are called by `scripts/kubectl` and `scripts/destroy` for token-based pipelines only. Adding WIF awareness there breaks the portability pattern — the logic should live in a template, not baked into the image. The template approach covers all use cases without image changes.

### 4. Branch condition: `ENABLE_GCP_WIF=1 && K8S_CLUSTER_NAME` is set
### 4. Branch condition: `K8S_CLUSTER_NAME` is set, gated by gcloud capability

**Decision:** Gate `generate_gke_kubeconfig` on both `ENABLE_GCP_WIF=1` (explicit opt-in) and `K8S_CLUSTER_NAME` being non-empty.
**Decision:** Gate `generate_gke_kubeconfig` on `K8S_CLUSTER_NAME` being non-empty alone. Do NOT couple it to `ENABLE_GCP_WIF`. When `K8S_CLUSTER_NAME` is set but `gcloud` is unavailable or unauthenticated, skip without failing the job; fail fast only when gcloud is authenticated but a required variable is missing or credential fetching fails.

**Rationale:** Same reasoning as before — `ENABLE_GCP_WIF=1` alone is insufficient since a project might use WIF for GCR/Artifact Registry without needing GKE access. `K8S_CLUSTER_NAME` unambiguously signals cluster intent.
**Rationale:** `K8S_CLUSTER_NAME` unambiguously signals cluster intent. Tying generation to `ENABLE_GCP_WIF=1` would wrongly exclude principals that are already authenticated to gcloud by other means (the runner's own service account, a service account key, etc.) and only need a kubeconfig, not federation. This is consistent with Decision 1 (WIF authentication and cluster access are separate concerns) and was raised in review of the resilience fix.

**Alternative considered:** Also requiring `ENABLE_GCP_WIF=1`. Rejected because it couples cluster access to a specific authentication method and breaks non-WIF gcloud auth. Resilience for non-deploy jobs is instead provided by skipping gracefully when gcloud is absent or unauthenticated.

### 5. `K8S_USE_DNS_ENDPOINT` as a boolean flag

@@ -7,21 +7,29 @@ The `templates/functions/gke-kubeconfig.yml` template SHALL define all its logic
- **WHEN** a project includes `gke-kubeconfig.yml` via `include: remote:` without using the spark-k8s-deployer image
- **THEN** all functions (`check_gke_env`, `generate_gke_kubeconfig`) SHALL be available in `before_script` without errors

### Requirement: GKE kubeconfig generation gated on ENABLE_GCP_WIF and K8S_CLUSTER_NAME
The `.gke-kubeconfig` `before_script` SHALL generate a GKE kubeconfig only when `ENABLE_GCP_WIF=1` AND `K8S_CLUSTER_NAME` is non-empty. In all other cases it SHALL skip silently with an informational message.
### Requirement: GKE kubeconfig generation gated on K8S_CLUSTER_NAME and gcloud capability
The `.gke-kubeconfig` `before_script` SHALL attempt to generate a GKE kubeconfig whenever `K8S_CLUSTER_NAME` is non-empty. Generation SHALL NOT be tied to `ENABLE_GCP_WIF`: the principal running the job may already hold the permissions to fetch a kubeconfig without Workload Identity Federation, so any gcloud authentication method is supported. When `K8S_CLUSTER_NAME` is set but `gcloud` is unavailable or unauthenticated, the template SHALL skip without failing the job.

#### Scenario: Kubeconfig generated when both conditions are met
- **WHEN** `ENABLE_GCP_WIF=1` and `K8S_CLUSTER_NAME` is set
#### Scenario: Kubeconfig generated when cluster intent and gcloud capability are present
- **WHEN** `K8S_CLUSTER_NAME` is set and `gcloud` is available and authenticated (by any method)
- **THEN** `generate_gke_kubeconfig` SHALL be called and a valid kubeconfig SHALL be produced

#### Scenario: Skipped when ENABLE_GCP_WIF is not 1
- **WHEN** `ENABLE_GCP_WIF` is unset, empty, or `"0"`
#### Scenario: Skipped when K8S_CLUSTER_NAME is absent
- **WHEN** `K8S_CLUSTER_NAME` is unset or empty
- **THEN** the template SHALL print a skip message and exit without error

#### Scenario: Skipped when K8S_CLUSTER_NAME is absent
- **WHEN** `ENABLE_GCP_WIF=1` but `K8S_CLUSTER_NAME` is unset or empty
#### Scenario: Skipped when gcloud is not available in the job image
- **WHEN** `K8S_CLUSTER_NAME` is set but the `gcloud` command is not on `PATH` (e.g. a build or test job using an image without the Cloud SDK)
- **THEN** the template SHALL print a skip message and exit without error, so non-deploy jobs that inherit the global `before_script` are not failed

#### Scenario: Skipped when gcloud is present but not authenticated
- **WHEN** `K8S_CLUSTER_NAME` is set and `gcloud` is available but no account is active (e.g. no authentication step ran or it did not succeed)
- **THEN** the template SHALL print a skip message and exit without error

#### Scenario: Fails fast on a real generation error
- **WHEN** `K8S_CLUSTER_NAME` is set, `gcloud` is available and authenticated, but a required variable is missing or `gcloud container clusters get-credentials` fails
- **THEN** the template SHALL print a descriptive error and exit non-zero
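
The scenarios above can be exercised without a real `gcloud` by stubbing the helpers. The following is a minimal sketch, not the template's code: the stub bodies and the `GCLOUD_PRESENT`, `GCLOUD_AUTHED`, `ENV_OK`, and `GEN_OK` knobs are hypothetical and exist only for illustration, and `gke_kubeconfig_gate` is an assumed wrapper name. Only the order of the checks follows the requirement.

```shell
# Hypothetical stand-ins for the template's helpers; the real ones call gcloud.
# GCLOUD_PRESENT, GCLOUD_AUTHED, ENV_OK and GEN_OK are illustration-only knobs.
check_gcloud()            { [ "${GCLOUD_PRESENT:-0}" = "1" ]; }
check_gcloud_auth()       { [ "${GCLOUD_AUTHED:-0}" = "1" ]; }
check_gke_env()           { [ "${ENV_OK:-0}" = "1" ]; }
generate_gke_kubeconfig() { [ "${GEN_OK:-0}" = "1" ]; }

# Applies the checks in the order the scenarios describe. Prints the outcome
# and returns 0 for a skip or a successful generation, 1 for a real failure.
gke_kubeconfig_gate() {
  if [ -z "${K8S_CLUSTER_NAME:-}" ]; then
    echo "skip: K8S_CLUSTER_NAME not set"
  elif ! check_gcloud; then
    echo "skip: gcloud not available"
  elif ! check_gcloud_auth; then
    echo "skip: gcloud not authenticated"
  elif ! check_gke_env; then
    echo "fail: required variables missing"
    return 1
  elif ! generate_gke_kubeconfig; then
    echo "fail: generation failed"
    return 1
  else
    echo "generated"
  fi
}
```

Running each case in a subshell, for example `( K8S_CLUSTER_NAME=demo; GCLOUD_PRESENT=0; gke_kubeconfig_gate )`, keeps the knobs from leaking between cases. The two skip branches return zero on purpose, which is what lets build and test jobs pass when they inherit the global `before_script`.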

### Requirement: GKE variable validation
`check_gke_env()` SHALL validate that `K8S_CLUSTER_NAME`, `K8S_LOCATION`, `GCP_PROJECT_ID`, and `KUBE_NAMESPACE` are all non-empty. On any missing variable it SHALL print a descriptive error and return non-zero.
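
One way this validation could look is sketched below. The body of `check_gke_env` is not shown in this diff, so the loop, the error wording, and the use of stderr are assumptions. Only the four variable names and the non-zero return come from the requirement.

```shell
# A sketch only: not the template's actual check_gke_env implementation.
check_gke_env() {
  local missing=0 name value
  for name in K8S_CLUSTER_NAME K8S_LOCATION GCP_PROJECT_ID KUBE_NAMESPACE; do
    # Indirect lookup of the variable whose name is held in $name.
    eval "value=\${${name}:-}"
    if [ -z "${value}" ]; then
      # Assumed message format; report every missing variable, not just the first.
      echo "Missing required variable: ${name}" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

Reporting all missing variables in one pass, rather than stopping at the first, saves a pipeline round trip when several are unset.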

1 change: 1 addition & 0 deletions openspec/changes/wif-gke-kubeconfig/tasks.md
@@ -22,6 +22,7 @@
- [ ] 4.3 Confirm a pipeline with `ENABLE_GCP_WIF=1` but without `K8S_CLUSTER_NAME` skips silently without errors
- [ ] 4.4 Confirm that when both GitLab Agent and WIF+GKE are configured, the gcloud context is active after `before_script` completes
- [ ] 4.5 Test `K8S_USE_DNS_ENDPOINT=1` on a private cluster to confirm `--dns-endpoint` is passed and connectivity succeeds
- [x] 4.6 Confirm a job with `K8S_CLUSTER_NAME` set but no `gcloud` in the image skips without failing (simulated locally); fail-fast preserved when gcloud is authenticated but vars missing or generation fails. Generation is decoupled from `ENABLE_GCP_WIF` so any gcloud auth method works.
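
For task 4.5, a dry-run helper can show whether `--dns-endpoint` would be passed before touching a real cluster. This is a sketch under assumptions: `build_get_credentials_cmd` is a hypothetical name, and the use of `--location` and `--project` is a guess at how the template calls `gcloud container clusters get-credentials`, since that part of the file is not shown here. The function only prints the command and never runs `gcloud`.

```shell
# Hypothetical dry-run helper: prints the get-credentials command line that
# would be used, without executing gcloud.
build_get_credentials_cmd() {
  # Assumed base invocation; the template's real flags may differ.
  set -- gcloud container clusters get-credentials "${K8S_CLUSTER_NAME}" \
    --location "${K8S_LOCATION}" --project "${GCP_PROJECT_ID}"
  # Append the DNS endpoint flag only when the boolean variable is exactly "1".
  if [ "${K8S_USE_DNS_ENDPOINT:-0}" = "1" ]; then
    set -- "$@" --dns-endpoint
  fi
  echo "$*"
}
```

Comparing the printed command with `K8S_USE_DNS_ENDPOINT` unset and set to `1` checks the flag handling on its own, separately from the connectivity test on the private cluster.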

## 5. Documentation

56 changes: 32 additions & 24 deletions templates/functions/gke-kubeconfig.yml
@@ -1,7 +1,21 @@
# This template generates a namespace-scoped kubeconfig for GKE clusters.
# It is an alternative to the GitLab Agent approach and can work with any
# gcloud authentication method (WIF, service account key, etc.).
# It requires gcloud to be already authenticated before this template runs.
# It is an alternative to the GitLab Agent approach and works with any gcloud
# authentication method (WIF, runner service account, service account key, etc.).
# It is deliberately NOT tied to WIF: the principal running the job may already
# hold the permissions to fetch a kubeconfig without federation.
#
# Generation is attempted whenever K8S_CLUSTER_NAME is non-empty (the signal of
# cluster intent) and expects gcloud to be already authenticated by some earlier
# step (e.g. gcp-wif.yml in the before_script chain, or the runner's own
# credentials).
#
# This template is included in the global before_script and therefore runs in
# every job. It is resilient by design: when gcloud is not present in the job
# image, or is present but not authenticated, it skips without failing the job,
# so build/test jobs that inherit K8S_CLUSTER_NAME as a global CI variable but do
# not need cluster access are not broken. It fails the job (exit 1) only when
# gcloud is authenticated and a kubeconfig was clearly intended but a required
# variable is missing or credential fetching fails.
#
# Example:
# include:
@@ -19,17 +33,14 @@
before_script:
# Functions
- |
check_gcloud_auth() {
if ! command -v gcloud &> /dev/null; then
echo "The gcloud command is not available. Cannot generate GKE kubeconfig."
return 1
fi
check_gcloud() {
# POSIX redirection: `&>` is a bashism that sh/dash parse as "run in background".
command -v gcloud > /dev/null 2>&1
}

check_gcloud_auth() {
local active_account
active_account=$(gcloud auth list --filter="status=ACTIVE" --format="value(account)" 2>/dev/null)
if [ -z "${active_account}" ]; then
echo "No active gcloud authenticated account found. Cannot generate GKE kubeconfig."
echo "Authenticate gcloud before using this template (e.g. via gcp-wif.yml)."
return 1
fi

@@ -89,21 +100,18 @@
print-banner "GKE KUBECONFIG"
fi
if [ -n "${K8S_CLUSTER_NAME:-}" ]; then
if check_gcloud_auth; then
if check_gke_env; then
if generate_gke_kubeconfig; then
echo "GKE kubeconfig generated and scoped to namespace ${KUBE_NAMESPACE}."
else
echo "GKE kubeconfig generation failed."
exit 1
fi
else
echo "GKE kubeconfig generation skipped due to missing variables."
exit 1
fi
else
echo "GKE kubeconfig generation skipped due to missing gcloud authentication."
if ! check_gcloud; then
echo "GKE kubeconfig generation skipped: gcloud is not available in this job image."
elif ! check_gcloud_auth; then
echo "GKE kubeconfig generation skipped: gcloud is not authenticated."
elif ! check_gke_env; then
echo "GKE kubeconfig generation failed: required variables missing."
exit 1
elif ! generate_gke_kubeconfig; then
echo "GKE kubeconfig generation failed."
exit 1
else
echo "GKE kubeconfig generated and scoped to namespace ${KUBE_NAMESPACE}."
fi
else
echo "GKE kubeconfig generation skipped (K8S_CLUSTER_NAME not set)."