Merged
Changes from 3 commits
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -12,6 +12,10 @@ to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

 - [2026.05.08] - New portable GitLab CI template `templates/functions/gke-kubeconfig.yml` (`.gke-kubeconfig`) that generates a namespace-scoped GKE kubeconfig using WIF-authenticated gcloud credentials. Activated when `ENABLE_GCP_WIF=1` and `K8S_CLUSTER_NAME` is set. Supports `K8S_USE_DNS_ENDPOINT=1` for private clusters. Runs after `setup-gitlab-agent` so the gcloud context always takes precedence. Remotely includable, no Docker image dependency.

+### Fixed
+
+- [2026.06.18] - `.gke-kubeconfig` now skips gracefully (without failing the job) when `gcloud` is not available in the job image or is not authenticated, instead of exiting non-zero. This prevents build and test jobs that inherit the global `before_script` but do not need cluster access from failing when `ENABLE_GCP_WIF=1` and `K8S_CLUSTER_NAME` are set as global CI/CD variables. Generation is gated on `ENABLE_GCP_WIF=1` as documented, and still fails fast when `gcloud` is authenticated but a required variable is missing or `get-credentials` fails.
+
 ### Changed

 - [2025.10.23] - add support for additional docker registry via `ADDITIONAL_DOCKER_REGISTRY` variable.
@@ -22,6 +22,18 @@ The `.gke-kubeconfig` `before_script` SHALL generate a GKE kubeconfig only when
 - **WHEN** `ENABLE_GCP_WIF=1` but `K8S_CLUSTER_NAME` is unset or empty
 - **THEN** the template SHALL print a skip message and exit without error

+#### Scenario: Skipped when gcloud is not available in the job image
+- **WHEN** `ENABLE_GCP_WIF=1` and `K8S_CLUSTER_NAME` is set but the `gcloud` command is not on `PATH` (e.g. a build or test job using an image without the Cloud SDK)
+- **THEN** the template SHALL print a skip message and exit without error, so non-deploy jobs that inherit the global `before_script` are not failed
+
+#### Scenario: Skipped when gcloud is present but not authenticated
+- **WHEN** `ENABLE_GCP_WIF=1` and `K8S_CLUSTER_NAME` is set and `gcloud` is available but no account is active (e.g. WIF authentication did not run or did not succeed)
+- **THEN** the template SHALL print a skip message and exit without error
+
+#### Scenario: Fails fast on a real generation error
+- **WHEN** `ENABLE_GCP_WIF=1`, `K8S_CLUSTER_NAME` is set, `gcloud` is available and authenticated, but a required variable is missing or `gcloud container clusters get-credentials` fails
+- **THEN** the template SHALL print a descriptive error and exit non-zero
+
 ### Requirement: GKE variable validation
 `check_gke_env()` SHALL validate that `K8S_CLUSTER_NAME`, `K8S_LOCATION`, `GCP_PROJECT_ID`, and `KUBE_NAMESPACE` are all non-empty. On any missing variable it SHALL print a descriptive error and return non-zero.
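
The body of `check_gke_env()` is not shown in the hunks on this page. As a rough illustration of the requirement above, a minimal POSIX shell sketch could look like the following. The error wording, the `eval`-based indirect lookup, and the helper variables `name`, `value`, and `missing` are assumptions made for this example, not the template's actual code.

```shell
# Hypothetical sketch of the validation described in the requirement.
# The real check_gke_env in the template may be written differently.
check_gke_env() {
  missing=0
  for name in K8S_CLUSTER_NAME K8S_LOCATION GCP_PROJECT_ID KUBE_NAMESPACE; do
    # Indirect lookup: expands to e.g. value=${K8S_LOCATION:-}
    eval "value=\${${name}:-}"
    if [ -z "${value}" ]; then
      # Report every missing variable, not just the first one found.
      echo "Missing required variable: ${name}" >&2
      missing=1
    fi
  done
  # Non-zero when at least one variable was unset or empty.
  return "${missing}"
}
```

Called as `check_gke_env || exit 1`, this version lists all missing variables in one run, so a misconfigured deploy job can be fixed in a single pass.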

1 change: 1 addition & 0 deletions openspec/changes/wif-gke-kubeconfig/tasks.md
@@ -22,6 +22,7 @@
 - [ ] 4.3 Confirm a pipeline with `ENABLE_GCP_WIF=1` but without `K8S_CLUSTER_NAME` skips silently without errors
 - [ ] 4.4 Confirm that when both GitLab Agent and WIF+GKE are configured, the gcloud context is active after `before_script` completes
 - [ ] 4.5 Test `K8S_USE_DNS_ENDPOINT=1` on a private cluster to confirm `--dns-endpoint` is passed and connectivity succeeds
+- [x] 4.6 Confirm a job with `ENABLE_GCP_WIF=1` and `K8S_CLUSTER_NAME` set but no `gcloud` in the image skips without failing (simulated locally); fail-fast preserved when gcloud is authenticated but vars missing or generation fails

 ## 5. Documentation

57 changes: 31 additions & 26 deletions templates/functions/gke-kubeconfig.yml
@@ -1,7 +1,17 @@
 # This template generates a namespace-scoped kubeconfig for GKE clusters.
-# It is an alternative to the GitLab Agent approach and can work with any
-# gcloud authentication method (WIF, service account key, etc.).
-# It requires gcloud to be already authenticated before this template runs.
+# It is an alternative to the GitLab Agent approach.
+#
+# Generation is gated on ENABLE_GCP_WIF=1 and a non-empty K8S_CLUSTER_NAME, and
+# expects gcloud to be already authenticated (e.g. via gcp-wif.yml running earlier
+# in the before_script chain).
+#
+# This template is included in the global before_script and therefore runs in
+# every job. It is resilient by design: when gcloud is not present in the job
+# image, or is present but not authenticated, it skips without failing the job,
+# so build/test jobs that inherit ENABLE_GCP_WIF/K8S_CLUSTER_NAME as global CI
+# variables but do not need cluster access are not broken. It fails the job
+# (exit 1) only when gcloud is authenticated and a kubeconfig was clearly
+# intended but a required variable is missing or credential fetching fails.
 #
 # Example:
 # include:
@@ -19,17 +29,14 @@
 before_script:
   # Functions
   - |
-    check_gcloud_auth() {
-      if ! command -v gcloud &> /dev/null; then
-        echo "The gcloud command is not available. Cannot generate GKE kubeconfig."
-        return 1
-      fi
+    check_gcloud() {
+      command -v gcloud &> /dev/null
+    }

+    check_gcloud_auth() {
       local active_account
       active_account=$(gcloud auth list --filter="status=ACTIVE" --format="value(account)" 2>/dev/null)
       if [ -z "${active_account}" ]; then
-        echo "No active gcloud authenticated account found. Cannot generate GKE kubeconfig."
-        echo "Authenticate gcloud before using this template (e.g. via gcp-wif.yml)."
         return 1
       fi

@@ -88,25 +95,23 @@
     if command -v print-banner &> /dev/null; then
       print-banner "GKE KUBECONFIG"
     fi
-    if [ -n "${K8S_CLUSTER_NAME:-}" ]; then
-      if check_gcloud_auth; then
-        if check_gke_env; then
-          if generate_gke_kubeconfig; then
-            echo "GKE kubeconfig generated and scoped to namespace ${KUBE_NAMESPACE}."
-          else
-            echo "GKE kubeconfig generation failed."
-            exit 1
-          fi
-        else
-          echo "GKE kubeconfig generation skipped due to missing variables."
-          exit 1
-        fi
-      else
-        echo "GKE kubeconfig generation skipped due to missing gcloud authentication."
+    ENABLE_GCP_WIF="${ENABLE_GCP_WIF:-0}"
+    if [ "${ENABLE_GCP_WIF}" = "1" ] && [ -n "${K8S_CLUSTER_NAME:-}" ]; then

Contributor

I wouldn't link the generation of the kubeconfig to the federation. The principal running the runner might not need the federation but might already have the permissions to obtain a kubeconfig.

Member Author

🤖 This was written by an AI agent on behalf of @paolomainardi.

Good point, agreed. Generation is now decoupled from ENABLE_GCP_WIF in commit a6cb2ff. The gate is K8S_CLUSTER_NAME alone, which signals cluster intent, and gcloud can be authenticated by any method (the runner's own service account, a service account key, or WIF). When gcloud is unavailable or unauthenticated the template skips without failing the job, and it only fails fast when gcloud is authenticated but a required variable is missing or get-credentials fails. The file header, the OpenSpec spec, and design Decision 4 were updated to reflect that cluster access and federation are separate concerns.
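
The hunk shown on this page still gates on `ENABLE_GCP_WIF`, so the decoupled flow described in the reply can only be sketched here. In the example below the function name `decide_gke_kubeconfig`, the message texts, and the `STUB_*` variables are hypothetical. The four probe functions are replaced by stubs so the decision order can be exercised without `gcloud` or a cluster.

```shell
# Stubs standing in for the template's real functions (assumption: each
# returns zero on success). Toggle them with STUB_*=1 to simulate outcomes.
check_gcloud()            { [ "${STUB_GCLOUD:-0}" = "1" ]; }
check_gcloud_auth()       { [ "${STUB_AUTH:-0}" = "1" ]; }
check_gke_env()           { [ "${STUB_ENV:-0}" = "1" ]; }
generate_gke_kubeconfig() { [ "${STUB_GENERATE:-0}" = "1" ]; }

# Hypothetical decision flow gated on K8S_CLUSTER_NAME alone.
# Skips return 0 so the job continues; real failures return 1.
decide_gke_kubeconfig() {
  if [ -z "${K8S_CLUSTER_NAME:-}" ]; then
    echo "skip: K8S_CLUSTER_NAME not set"
  elif ! check_gcloud; then
    echo "skip: gcloud not available"
  elif ! check_gcloud_auth; then
    echo "skip: gcloud not authenticated"
  elif ! check_gke_env; then
    echo "fail: required variables missing"
    return 1
  elif ! generate_gke_kubeconfig; then
    echo "fail: generation failed"
    return 1
  else
    echo "ok: kubeconfig generated"
  fi
}
```

With this ordering, how `gcloud` came to be authenticated (runner service account, key file, or WIF) never enters the decision. Only the presence of an active account does.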

+      if ! check_gcloud; then
+        echo "GKE kubeconfig generation skipped: gcloud is not available in this job image."
+      elif ! check_gcloud_auth; then
+        echo "GKE kubeconfig generation skipped: gcloud is not authenticated."
+      elif ! check_gke_env; then
+        echo "GKE kubeconfig generation failed: required variables missing."
         exit 1
+      elif ! generate_gke_kubeconfig; then
+        echo "GKE kubeconfig generation failed."
+        exit 1
+      else
+        echo "GKE kubeconfig generated and scoped to namespace ${KUBE_NAMESPACE}."
       fi
     else
-      echo "GKE kubeconfig generation skipped (K8S_CLUSTER_NAME not set)."
+      echo "GKE kubeconfig generation skipped (ENABLE_GCP_WIF is not 1 or K8S_CLUSTER_NAME not set)."
     fi
     if command -v print-banner &> /dev/null; then
       print-banner "END GKE KUBECONFIG"
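
Task 4.6 mentions simulating the missing-`gcloud` case locally. One possible way to do that, sketched below, is to run the two probes in a subshell with a `PATH` that contains no `gcloud`, and to fake an installed `gcloud` with a shell function. The helpers `simulate_no_gcloud` and `simulate_fake_gcloud`, the `FAKE_ACCOUNT` variable, and the `/nonexistent` path are assumptions made for this example. The probes are copies of the template's functions, rewritten with POSIX redirections.

```shell
# Copies of the template's probes, using POSIX-style redirection.
check_gcloud() {
  command -v gcloud > /dev/null 2>&1
}

check_gcloud_auth() {
  active_account=$(gcloud auth list --filter="status=ACTIVE" --format="value(account)" 2>/dev/null)
  [ -n "${active_account}" ]
}

# Hypothetical helper: succeeds only if gcloud is found, which it should
# not be when PATH points at a directory that does not exist.
simulate_no_gcloud() {
  (
    unset -f gcloud 2>/dev/null
    PATH="/nonexistent"
    check_gcloud
  )
}

# Hypothetical helper: $1 is the account the fake gcloud reports as active.
# An empty string simulates an installed but unauthenticated gcloud.
simulate_fake_gcloud() {
  (
    FAKE_ACCOUNT="$1"
    gcloud() { printf '%s\n' "${FAKE_ACCOUNT}"; }
    check_gcloud && check_gcloud_auth
  )
}
```

`simulate_no_gcloud` and `simulate_fake_gcloud ""` should both return non-zero, matching the two skip scenarios, while `simulate_fake_gcloud "ci@example.com"` should return zero. The rewrite to `> /dev/null 2>&1` matters because `&>` is a bash extension: under a strictly POSIX `sh` such as dash, `command -v gcloud &> /dev/null` is parsed as a background job followed by a separate redirection, so the probe would report success even when `gcloud` is missing.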