Common first-day issues for generated repositories.
- Ensure required variables are exported or present in `blueprint/repo.init.env`:
  - `BLUEPRINT_REPO_NAME`
  - `BLUEPRINT_GITHUB_ORG`
  - `BLUEPRINT_GITHUB_REPO`
  - `BLUEPRINT_DEFAULT_BRANCH`
  - `BLUEPRINT_STACKIT_REGION`
  - `BLUEPRINT_STACKIT_TENANT_SLUG`
  - `BLUEPRINT_STACKIT_PLATFORM_SLUG`
  - `BLUEPRINT_STACKIT_PROJECT_ID`
  - `BLUEPRINT_STACKIT_TFSTATE_BUCKET`
  - `BLUEPRINT_STACKIT_TFSTATE_KEY_PREFIX`
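  A minimal sketch of `blueprint/repo.init.env` with illustrative placeholder values (not defaults):

  ```sh
  # blueprint/repo.init.env: illustrative placeholder values only
  BLUEPRINT_REPO_NAME=my-platform
  BLUEPRINT_GITHUB_ORG=my-org
  BLUEPRINT_GITHUB_REPO=my-platform
  BLUEPRINT_DEFAULT_BRANCH=main
  BLUEPRINT_STACKIT_REGION=eu01
  BLUEPRINT_STACKIT_TENANT_SLUG=my-tenant
  BLUEPRINT_STACKIT_PLATFORM_SLUG=platform
  BLUEPRINT_STACKIT_PROJECT_ID=00000000-0000-0000-0000-000000000000
  BLUEPRINT_STACKIT_TFSTATE_BUCKET=my-platform-tfstate
  BLUEPRINT_STACKIT_TFSTATE_KEY_PREFIX=my-platform
  ```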
- Check that `docs/docusaurus.config.js` and `blueprint/contract.yaml` are writable.
- If the repo is already initialized, rerun only when you intentionally want to re-apply init-owned files:

  ```sh
  BLUEPRINT_INIT_FORCE=true make blueprint-init-repo
  ```
- The interactive target requires a TTY.
- In CI/non-interactive shells, use env-file mode with `make blueprint-init-repo`.
- Generated repos do not recreate consumer-owned root files during bootstrap.
- Restore them intentionally, then rerun bootstrap:

  ```sh
  make blueprint-resync-consumer-seeds
  BLUEPRINT_RESYNC_APPLY_SAFE=true make blueprint-resync-consumer-seeds
  make blueprint-bootstrap
  ```

- Missing files are classified as `auto-refresh` (`action=create`) and are recreated by safe apply: `BLUEPRINT_RESYNC_APPLY_SAFE=true make blueprint-resync-consumer-seeds`.
- Use `BLUEPRINT_RESYNC_APPLY_ALL=true make blueprint-resync-consumer-seeds` only when full overwrite is intentional for files classified as `manual-merge`.
- Generated repos do not recreate init-managed identity files from ambient env during infra bootstrap.
- Restore them intentionally, then rerun infra bootstrap:

  ```sh
  BLUEPRINT_INIT_FORCE=true make blueprint-init-repo
  make infra-bootstrap
  ```
- Branch names must match contract prefixes (for example `feature/...`, `fix/...`, `chore/...`, `docs/...`).
- Compatibility prefixes `codex/...` and `copilot/...` are accepted even when older consumer contracts do not yet list them.
- If running in CI, ensure `GITHUB_HEAD_REF`/`GITHUB_REF_NAME` is available or set `BLUEPRINT_BRANCH_NAME`.
- Re-run `make blueprint-init-repo` with correct values.
- Confirm `blueprint/repo.init.env` does not contain stale identity overrides.
- Confirm `blueprint/repo.init.secrets.env` exists (copy from `blueprint/repo.init.secrets.example.env` when missing).
- For enabled optional modules, confirm required non-sensitive inputs in `blueprint/repo.init.env` and required sensitive inputs in `blueprint/repo.init.secrets.env` are non-empty.
- Confirm contract and docs identity values match your repository owner/name.
- Confirm `blueprint/contract.yaml` sets `repo_mode: generated-consumer`.
`manual-merge` means the current file diverged from the latest seed and appears customized.

- Keep dry-run as the default review step, then decide per file:
  - merge manually if you need to preserve local customizations
  - use safe apply for untouched/missing files: `BLUEPRINT_RESYNC_APPLY_SAFE=true make blueprint-resync-consumer-seeds`
  - use full overwrite only when intentional: `BLUEPRINT_RESYNC_APPLY_ALL=true make blueprint-resync-consumer-seeds`
- Resync now fails fast when a seeded template still contains unresolved blueprint tokens after rendering.
- Typical cause: a consumer-seeded template introduced a token that is not part of the supported replacement set.
- Keep dry-run first, then inspect the template path reported in the error and replace unsupported tokens with concrete values or supported placeholders.
- The installer first checks the repo-local skill source under `.agents/skills/<skill-name>`.
- If repo-local skill files are missing, it falls back to consumer template assets under `scripts/templates/consumer/init/.agents/skills/<skill-name>`.
- If both paths are missing, sync blueprint-managed assets first and rerun:

  ```sh
  make blueprint-upgrade-consumer
  BLUEPRINT_RESYNC_APPLY_SAFE=true make blueprint-resync-consumer-seeds
  make blueprint-install-codex-skill
  ```

- The same remediation applies to:
  - `make blueprint-install-codex-skill-consumer-ops`
  - `make blueprint-install-codex-skill-sdd-step01-intake`
  - `make blueprint-install-codex-skill-sdd-step02-resolve-questions`
  - `make blueprint-install-codex-skill-sdd-step03-spec-complete`
  - `make blueprint-install-codex-skill-sdd-step04-plan-slicer`
  - `make blueprint-install-codex-skill-sdd-step05-implement`
  - `make blueprint-install-codex-skill-sdd-step06-document-sync`
  - `make blueprint-install-codex-skill-sdd-step07-pr-packager`
  - `make blueprint-install-codex-skill-sdd-traceability-keeper`
  - `make blueprint-install-codex-skills`
- This usually means the repository is still executing a stale local upgrade engine from an older consumer baseline.
- Current blueprint upgrade wrappers default to `BLUEPRINT_UPGRADE_ENGINE_MODE=source-ref`, which runs the engine script resolved from `BLUEPRINT_UPGRADE_SOURCE@BLUEPRINT_UPGRADE_REF`.
- Current local engine behavior also treats any positive `git merge-file` return code as conflict-present and emits normal conflict artifacts/report output instead of an internal abort.
- If your repository still has the legacy wrapper behavior, run a one-time source-driven upgrade engine call, then rerun validation:

  ```sh
  TMP_DIR="$(mktemp -d)"
  git clone --quiet --no-checkout "${BLUEPRINT_UPGRADE_SOURCE}" "${TMP_DIR}/source"
  git -C "${TMP_DIR}/source" checkout --quiet "${BLUEPRINT_UPGRADE_REF}"
  python3 "${TMP_DIR}/source/scripts/lib/blueprint/upgrade_consumer.py" \
    --repo-root "$PWD" \
    --source "${BLUEPRINT_UPGRADE_SOURCE}" \
    --ref "${BLUEPRINT_UPGRADE_REF}" \
    --apply \
    --plan-path artifacts/blueprint/upgrade_plan.json \
    --apply-path artifacts/blueprint/upgrade_apply.json \
    --summary-path artifacts/blueprint/upgrade_summary.md
  rm -rf "${TMP_DIR}"
  make blueprint-upgrade-consumer-validate
  make blueprint-upgrade-consumer-postcheck
  ```

- After the upgrade lands, keep using `make blueprint-upgrade-consumer` with the default engine mode (`source-ref`) for deterministic future runs.
Every merge-required entry in `upgrade_plan.json` and `upgrade_summary.md` carries a semantic annotation with three fields:

- `kind` — category of change (see table below)
- `description` — human-readable summary naming the changed symbol and new value
- `verification_hints` — concrete actions to verify after applying the merge
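A quick way to list the annotations from the plan (a sketch; the query walks the whole document with `..` because the exact JSON layout of `upgrade_plan.json` is not fixed here):

```sh
# Print "kind: description" for every object carrying an annotation field.
jq -r '.. | objects | select(has("annotation")) | .annotation
       | "\(.kind): \(.description)"' artifacts/blueprint/upgrade_plan.json
```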
| `kind` | What changed |
|---|---|
| `function-added` | A shell function was added to the source; check that its call sites are correct after merge |
| `function-removed` | A shell function was removed from the source; check that no call sites still reference it |
| `variable-changed` | A variable assignment changed value; verify the new value is correct in your merged file |
| `source-directive-added` | A `source` or `.` directive was added; confirm the sourced file exists at the expected path |
| `structural-change` | Complex diff or new file — manually review the full diff before resolving the merge |
`structural-change` is the fallback for diffs that do not match a specific pattern; it is always actionable via manual review and does not indicate an error in the annotator.

The Merge-Required Annotations section in `artifacts/blueprint/upgrade_summary.md` lists every annotated entry with its `kind`, description, and bullet hints — read it before starting manual merges.
Consumer-renamed manifest deleted after `make blueprint-upgrade-consumer BLUEPRINT_UPGRADE_ALLOW_DELETE=true`
If a file in your consumer repository has been renamed but the original blueprint path still exists in the upgrade payload, the upgrade apply step may delete the original path rather than recognising that it is referenced by a `kustomization.yaml` in the same directory.
Root cause: The upgrade apply stage checks three layers before deleting any entry:

- Contract ownership — `consumer_seeded`, `source_only`, and `init_managed` entries are never deleted.
- Consumer-owned workload fast path — any file under `base/apps/` whose YAML `metadata.name` or `metadata.labels.app` matches an existing `kustomization.yaml` resource is preserved.
- Kustomization-ref guard — any file whose basename appears in the `resources:` or `patches:` list of a `kustomization.yaml` in the same directory is preserved, regardless of its path.
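A quick check that the guard's precondition holds for a renamed manifest (the file path below is hypothetical; the guard compares basenames case-sensitively):

```sh
# Does the sibling kustomization.yaml reference the renamed file's basename?
MANIFEST=infra/gitops/platform/base/apps/renamed-api-deployment.yaml  # hypothetical path
grep -n "$(basename "$MANIFEST")" "$(dirname "$MANIFEST")/kustomization.yaml"
# A hit under resources: or patches: means the kustomization-ref guard preserves the file.
```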
If your renamed manifest was still deleted, check:

- The file's basename exactly matches a `resources:` or `patches:` entry in the sibling `kustomization.yaml`. Path values are compared case-sensitively.
- The sibling `kustomization.yaml` is valid YAML. If it contains syntax errors the guard falls back to `False` and logs a warning to stderr: `warning: _is_kustomization_referenced: failed to parse <path>: <error>`. Fix the YAML syntax and rerun the upgrade.
- Check `artifacts/blueprint/upgrade_summary.md` — the `consumer_kustomization_ref_count` field shows how many entries were preserved by the kustomization-ref guard in the last run.
Recovery: If the file was already deleted, restore it from git history:

```sh
git checkout HEAD~1 -- <path/to/deleted-file.yaml>
git add <path/to/deleted-file.yaml>
git commit -m "fix: restore <filename> deleted incorrectly during blueprint upgrade"
```

When the upgrade apply stage merges a `.tf` file, it scans for top-level Terraform block declarations that appear more than once (same block type, name, and label).
- Byte-identical duplicates are automatically removed. The result is recorded as `merged-deduped` in the apply summary and the removed block is listed in the `deduplication_log` section of `artifacts/blueprint/upgrade_summary.md`. Check the `tf_dedup_count` summary field to see how many were removed.
- Non-identical duplicates (blocks with the same header but different bodies) cannot be resolved automatically. The upgrade produces a `conflict` artifact at the same path and leaves both block variants in the file separated by conflict markers. Resolve manually:
  - Open the file flagged as `conflict` in `artifacts/blueprint/upgrade_summary.md`.
  - Identify the two block variants between the conflict markers.
  - Decide which variant (or which merged result) is correct for your repository.
  - Remove the conflict markers and the rejected variant.
  - Run `terraform validate` to confirm the file is syntactically valid.
  - Commit the resolution before rerunning the bootstrap or plan targets.
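A minimal way to locate the markers before editing (the marker strings are the standard seven-character `git merge-file` ones; the `.tf` path is hypothetical):

```sh
# List conflict-marker lines in the flagged file.
grep -nE '^(<{7}|={7}|>{7})' infra/cloud/stackit/terraform/foundation/main.tf
# After removing the markers and the rejected variant:
terraform -chdir=infra/cloud/stackit/terraform/foundation validate
```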
`apps/descriptor.yaml` is the consumer-owned app metadata source (see App Onboarding Contract — App Descriptor). `infra-validate` parses it and reports any schema, path-safety, or kustomization-membership failure with deterministic messages naming the descriptor app, component, and offending value.

Common error patterns and fixes:
- `apps/descriptor.yaml: apps[N].id must be a DNS-style label ...`
  - The app or component `id` contains forbidden characters (`/`, `..`, uppercase, shell metacharacters). Rename to lowercase alphanumerics + hyphens (e.g. `marketplace-api`).
- `apps/descriptor.yaml: app[<id>].component[<id>].manifests.<kind> must live under infra/gitops/platform/base/apps/`
  - The manifest path escapes the apps base directory or uses an absolute path. Use a relative path under `infra/gitops/platform/base/apps/`, or omit the explicit ref to use the convention default (`{component-id}-{deployment,service}.yaml`).
- `apps/descriptor.yaml: app[<id>].component[<id>]: <kind> manifest missing: <path>`
  - The resolved manifest file does not exist on disk. Create the missing manifest under `infra/gitops/platform/base/apps/` or correct the explicit ref.
- `apps/descriptor.yaml: app[<id>].component[<id>]: <kind> manifest filename not listed in infra/gitops/platform/base/apps/kustomization.yaml`
  - The manifest exists but isn't listed in the apps `kustomization.yaml`. Add the basename to the `resources:` list and rerun `make infra-validate`.
  - v1.8.1-only regression (issue #230, fixed in the next blueprint patch). Consumers that upgraded from v1.8.0 to v1.8.1 hit this with 4 errors at once (`backend-api-deployment.yaml`, `backend-api-service.yaml`, `touchpoints-web-deployment.yaml`, `touchpoints-web-service.yaml`) because `blueprint-init-repo` force-reseeded `apps/descriptor.yaml` from the demo-app template while leaving the consumer's existing `kustomization.yaml` untouched. Recovery: upgrade to the next blueprint patch (descriptor↔kustomization paired-reseed via the `consumer_seeded` contract) and re-run `make blueprint-upgrade-consumer && make blueprint-upgrade-consumer-postcheck`. No consumer-side workaround is required after upgrading.
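A rough cross-check for the "filename not listed" case, assuming one manifest per YAML file in the apps directory (paths follow the error messages above):

```sh
APPS_DIR=infra/gitops/platform/base/apps
for f in "$APPS_DIR"/*.yaml; do
  base="$(basename "$f")"
  [ "$base" = "kustomization.yaml" ] && continue
  # Flag manifests whose basename is absent from the resources: list.
  grep -q "^[[:space:]]*-[[:space:]]*${base}[[:space:]]*$" "$APPS_DIR/kustomization.yaml" \
    || echo "not listed in kustomization.yaml: $base"
done
```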
When an existing consumer lacks `apps/descriptor.yaml`, the upgrade flow emits a starting-point descriptor at `artifacts/blueprint/app_descriptor.suggested.yaml` derived from the current `infra/gitops/platform/base/apps/kustomization.yaml`. The upgrade does not write `apps/descriptor.yaml` automatically — adoption is explicit.
To adopt:

- Open `artifacts/blueprint/app_descriptor.suggested.yaml`.
- Set `owner.team: TODO` to your real team handle for each app.
- Adjust app/component `id` values if the suggested IDs (derived from manifest filenames) don't match your intended naming.
- Add optional `service.port` and `health.*` fields per component if you want the catalog manifest renderer to surface them.
- Move the file to `apps/descriptor.yaml` at your repo root:

  ```sh
  mv artifacts/blueprint/app_descriptor.suggested.yaml apps/descriptor.yaml
  ```

- Run `make infra-validate` to confirm the descriptor is valid and all manifests resolve.
After adoption, `make blueprint-upgrade-consumer` stops emitting the suggested artifact on subsequent runs. Apps declared in `apps/descriptor.yaml` are protected from prune as `consumer-app-descriptor` (see `summary.consumer_app_descriptor_count` in `artifacts/blueprint/upgrade_apply.json`).
- Generated repositories seed `.github/CODEOWNERS` as a starter file with commented examples only.
- Replace the example owners with your real team handles before relying on GitHub review assignment.
- Keep `.github/pull_request_template.md` and `.github/ISSUE_TEMPLATE/**` aligned with your team workflow once you adopt them.
- In the blueprint source repository, `dags/` is tracked intentionally as template authoring scaffolding.
- In a generated consumer repository, `make blueprint-init-repo` prunes `dags/` when `WORKFLOWS_ENABLED=false`.
- If `WORKFLOWS_ENABLED=false` and `dags/` is still present in a fresh consumer repo, rerun first init before your first commit, then re-validate:

  ```sh
  WORKFLOWS_ENABLED=false BLUEPRINT_INIT_FORCE=true make blueprint-init-repo
  WORKFLOWS_ENABLED=false make blueprint-bootstrap
  WORKFLOWS_ENABLED=false make infra-bootstrap
  WORKFLOWS_ENABLED=false make infra-validate
  ```
- Disabling an optional module removes its generated Make targets, but already materialized scaffold files are intentionally preserved.
- Already provisioned resources are not destroyed automatically.
- Run disabled-module teardown first, then refresh the repo state for the new flag set:

  ```sh
  make infra-destroy-disabled-modules
  WORKFLOWS_ENABLED=false make blueprint-render-makefile
  WORKFLOWS_ENABLED=false make infra-bootstrap
  ```
- If you prefer explicit module-level teardown, run the module destroy target directly while the module flag is still enabled.
- Runtime chains (`infra-provision`, `infra-deploy`, `infra-smoke`, and `infra-provision-deploy`) must be side-effect free for blueprint-managed tracked files.
- `infra-validate` now renders `make/blueprint.generated.mk` from contract defaults and ignores transient module toggle overrides during runtime flows.
- If you intentionally want to materialize optional module targets from a new module flag set, use:

  ```sh
  make blueprint-render-makefile
  ```

- Confirm clean state after runtime commands:

  ```sh
  git status --short make/blueprint.generated.mk
  ```
- Local profiles prefer the `docker-desktop` context when it is present.
- CI prefers `kind-*` contexts.
- Run `make infra-context` to see the resolved cluster and selection source.
- If you want to force a different local cluster, set `LOCAL_KUBE_CONTEXT` explicitly before provisioning:

  ```sh
  export LOCAL_KUBE_CONTEXT=kind-blueprint-e2e
  make infra-context
  ```
- Live smoke fails when blueprint-managed workloads are not healthy.
- Inspect:
  - `artifacts/infra/workload_health.json`
  - `artifacts/infra/workload_pods.json`
  - `artifacts/infra/smoke_diagnostics.json`
- Typical causes:
  - invalid module credentials or secrets (for example `IAP_COOKIE_SECRET` not being 16, 24, or 32 bytes)
  - stale local image tags if chart/image pins were edited away from the canonical versions source (`scripts/lib/infra/versions.sh`)
- Re-run the affected module plan/apply target after correcting the contract input.
- `make infra-local-destroy-all` intentionally removes blueprint-managed resources only.
- It preserves the selected local cluster itself (`docker-desktop`, `kind-*`, or the explicit `LOCAL_KUBE_CONTEXT` override).
- Use that target before switching local clusters or before a fresh live rerun:

  ```sh
  make infra-local-destroy-all
  ```
- Ensure required local tools are available (`bash`, `git`, `make`, `python3`, `tar`).
- Confirm the CI job exports init variables, `BLUEPRINT_PROFILE`, and any intended optional-module flags before `make blueprint-template-smoke`.
- Upgrade your generated repository to the latest blueprint ref so CI picks up Node-24-ready action majors:
  - `.github/actions/prepare-blueprint-ci/action.yml` (`actions/setup-python@v6`, `actions/setup-node@v6`)
  - `.github/workflows/ci.yml` (`actions/checkout@v6`)
- Use the upgrade flow from the repository root:

  ```sh
  make blueprint-resync-consumer-seeds
  BLUEPRINT_RESYNC_APPLY_SAFE=true make blueprint-resync-consumer-seeds
  make blueprint-upgrade-consumer
  BLUEPRINT_UPGRADE_APPLY=true make blueprint-upgrade-consumer
  make blueprint-upgrade-consumer-validate
  make blueprint-upgrade-consumer-postcheck
  ```

- Temporary fallback only if you cannot upgrade immediately:
  - set `FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true` in workflow/job env.
- The app catalog scaffold is opt-in and controlled by `APP_CATALOG_SCAFFOLD_ENABLED`.
- Enable and materialize the scaffold before running smoke:

  ```sh
  APP_CATALOG_SCAFFOLD_ENABLED=true make apps-bootstrap
  APP_CATALOG_SCAFFOLD_ENABLED=true make apps-smoke
  ```

- If you intentionally run a minimal repo without the app catalog scaffold, keep `APP_CATALOG_SCAFFOLD_ENABLED=false`; `apps-smoke` records a skipped catalog check and still succeeds.
- The baseline app runtime GitOps scaffold is controlled by `APP_RUNTIME_GITOPS_ENABLED` (default `true`).
- Reconcile and validate the scaffold contract explicitly:

  ```sh
  APP_RUNTIME_GITOPS_ENABLED=true make infra-bootstrap
  APP_RUNTIME_GITOPS_ENABLED=true make infra-validate
  ```

- Confirm the runtime path includes workload manifests:
  - `infra/gitops/platform/base/kustomization.yaml` has `- apps`
  - `infra/gitops/platform/base/apps/*` contains `Deployment` and `Service` manifests
- If `APP_CATALOG_SCAFFOLD_ENABLED=true`, keep `apps/catalog/manifest.yaml` synchronized with runtime paths:
  - `deliveryTopology`
  - `runtimeDeliveryContract.gitopsWorkloads`
  - `runtimeDeliveryContract.manifestsRoot`
- If you replaced scaffold images, ensure the same refs are updated in:
  - `apps/catalog/manifest.yaml`
  - `infra/gitops/platform/base/apps/*deployment.yaml`
- Execute-mode smoke (`DRY_RUN=false`) now fails deterministically when app runtime is declared enabled but expected runtime workloads are absent.
- Guardrail contract defaults:
  - `APP_RUNTIME_GITOPS_ENABLED=true`
  - `APP_RUNTIME_MIN_WORKLOADS=1` (minimum `Deployment`/`StatefulSet` objects in namespace `apps`)
- Inspect:
  - `artifacts/apps/apps_smoke.env` (`runtime_workload_check_*` markers)
  - `artifacts/infra/workload_health.json` (`statusReason`, `requiredNamespaceMinimumPods`, `emptyRuntimeNamespaces`)
  - `artifacts/infra/smoke_diagnostics.json` (`workloadHealth.emptyRuntimeNamespaceCount`, `appRuntime.minimumExpectedWorkloads`)
- If runtime should be intentionally empty during a transition window, set an explicit override for that run:

  ```sh
  APP_RUNTIME_MIN_WORKLOADS=0 DRY_RUN=false make infra-smoke
  ```

  Then restore `APP_RUNTIME_MIN_WORKLOADS=1` once workload deployment is expected again.
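A quick way to pull the guardrail fields out of the diagnostics artifacts (field names from the list above; the jq paths assume the fields sit at the positions shown there):

```sh
jq '{statusReason, requiredNamespaceMinimumPods, emptyRuntimeNamespaces}' \
  artifacts/infra/workload_health.json
jq '{emptyCount: .workloadHealth.emptyRuntimeNamespaceCount,
     minExpected: .appRuntime.minimumExpectedWorkloads}' \
  artifacts/infra/smoke_diagnostics.json
```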
- The hook contract runs only for local profiles (`local-full`, `local-lite`) and only after `infra-provision`, `infra-deploy`, and `infra-smoke` succeed.
- Contract toggles:
  - `LOCAL_POST_DEPLOY_HOOK_ENABLED=false` by default (skip with `reason=disabled`).
  - `LOCAL_POST_DEPLOY_HOOK_CMD='make -C "$ROOT_DIR" infra-post-deploy-consumer'` by default.
  - `LOCAL_POST_DEPLOY_HOOK_REQUIRED=false` by default (best-effort warn-and-continue).
- Inspect the state artifacts:
  - `artifacts/infra/local_post_deploy_hook.env` (`status`, `reason`, `mode`, `enabled`, `command_configured`)
  - `artifacts/infra/local_post_deploy_hook.json` (schema-validated canonical state payload)
- Common outcomes:
  - `status=skipped reason=non_local_profile`: expected for `stackit-*` profiles.
  - `status=skipped reason=disabled`: set `LOCAL_POST_DEPLOY_HOOK_ENABLED=true` to execute the hook.
  - `status=failure reason=command_failed`: the hook command failed; with `LOCAL_POST_DEPLOY_HOOK_REQUIRED=false` the chain continues, with `true` it fails fast.
- In generated-consumer repositories, implement deterministic commands in the `make/platform.mk` target `infra-post-deploy-consumer` (the seeded target is an intentional fail-fast placeholder until you replace it); see the sketch after this list.
- Upgrade preflight guardrail: when `LOCAL_POST_DEPLOY_HOOK_ENABLED=true`, `make blueprint-upgrade-consumer-preflight` reports a required manual action if `infra-post-deploy-consumer` is still a placeholder.
- Upgrade preflight required-target checklist: when a contract-required consumer-owned Make target is missing, preflight reports a required manual action with the exact target name; implement it in `make/platform.mk` or `make/platform/*.mk`, then rerun `make blueprint-upgrade-consumer-validate` and `make blueprint-upgrade-consumer-postcheck`.
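A sketch of what a deterministic `infra-post-deploy-consumer` recipe might run, assuming a workload named `backend-api` in the `apps` namespace (both names are placeholders; keep the commands fixed and discovery-free):

```sh
# Illustrative recipe body for infra-post-deploy-consumer in make/platform.mk.
kubectl -n apps rollout status deployment/backend-api --timeout=120s  # hypothetical workload name
kubectl -n apps get pods --no-headers
```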
- Ensure your workflow uses `.github/actions/prepare-blueprint-ci/action.yml` before test lanes.
- The current action bootstrap contract delegates dependency installation to `BLUEPRINT_PROFILE=local-lite OBSERVABILITY_ENABLED=false make apps-ci-bootstrap`.
- CI toolchain/OS dependencies are handled by `make infra-prereqs` in the shared action.
- App/runtime dependencies are handled by `make apps-ci-bootstrap`, which composes:
  - `make apps-bootstrap` (baseline app scaffolding/state)
  - `make apps-ci-bootstrap-consumer` (consumer-owned dependency install contract)
- In generated-consumer mode, the seeded `apps-ci-bootstrap-consumer` is an intentional fail-fast placeholder. Replace it with deterministic commands for your repository layout (no directory scanning/discovery), for example (see the sketch after this list):
  - backend Python dependency install from your fixed backend path(s)
  - touchpoints/package-manager dependency install from your fixed frontend/workspace path(s)
  - optional browser/runtime bootstrap only when your package metadata declares that dependency
- Keep all consumer-specific CI bootstrap commands in `apps-ci-bootstrap-consumer` in `make/platform.mk` (or `make/platform/*.mk`) as the single consumer-owned hook.
- Confirm path ownership before patching CI failures:

  ```sh
  make blueprint-ownership-check OWNERSHIP_PATHS="scripts/bin/platform/touchpoints/test_e2e.sh make/platform.mk"
  ```

  - `scripts/bin/platform/**` and `make/platform*` ownership should resolve to `platform-owned`.
- If your repository still fails with errors such as `ModuleNotFoundError: fastapi`, `vitest: command not found`, or `Executable doesn't exist ... chrome-headless-shell`, resync and upgrade from the repository root:

  ```sh
  make blueprint-resync-consumer-seeds
  BLUEPRINT_RESYNC_APPLY_SAFE=true make blueprint-resync-consumer-seeds
  make blueprint-upgrade-consumer
  BLUEPRINT_UPGRADE_APPLY=true make blueprint-upgrade-consumer
  make blueprint-upgrade-consumer-validate
  make apps-ci-bootstrap
  ```
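A sketch of such deterministic consumer install commands (all paths are placeholders for your fixed layout; adjust them, but avoid globbing or discovery):

```sh
# Illustrative recipe body for apps-ci-bootstrap-consumer.
python3 -m pip install -r apps/backend/requirements.txt   # hypothetical fixed backend path
npm --prefix apps/touchpoints ci                          # hypothetical fixed frontend workspace
(cd apps/touchpoints && npx playwright install chromium)  # only if your package metadata declares it
```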
- Ensure backend files exist under:
  - `infra/cloud/stackit/terraform/bootstrap/state-backend/<env>.hcl`
  - `infra/cloud/stackit/terraform/foundation/state-backend/<env>.hcl`
- Ensure each backend file contains:
  - `skip_requesting_account_id`
  - `use_path_style`
  - a STACKIT object storage endpoint (`object.storage...`)
- Ensure repository identity values are coherent:
  - `BLUEPRINT_STACKIT_REGION`
  - `BLUEPRINT_STACKIT_TFSTATE_BUCKET`
  - `BLUEPRINT_STACKIT_TFSTATE_KEY_PREFIX`
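A quick verification sweep over both backend directories (a minimal sketch; it only greps for the settings listed above):

```sh
for f in infra/cloud/stackit/terraform/{bootstrap,foundation}/state-backend/*.hcl; do
  echo "== $f"
  grep -E 'skip_requesting_account_id|use_path_style|object\.storage' "$f" \
    || echo "missing required backend settings in $f"
done
```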
- In execution mode (`DRY_RUN=false`), foundation preflight probes SKE API access before Terraform apply.
- If you see `service account lacks SKE permissions`, ensure the identity behind `STACKIT_SERVICE_ACCOUNT_KEY` can:
  - enable/read the SKE service in the project and region
  - list/read SKE clusters in the project
- Re-run preflight after updating IAM:

  ```sh
  make infra-stackit-foundation-preflight
  ```

- Check the state artifact for the probe outcome:
  - `artifacts/infra/stackit_foundation_preflight.env` (`ske_access_probe=passed` expected in execute mode)
- In execution mode (`DRY_RUN=false`), always export:
  - `STACKIT_PROJECT_ID`
  - `STACKIT_REGION`
  - `STACKIT_SERVICE_ACCOUNT_KEY`
  - `STACKIT_TFSTATE_ACCESS_KEY_ID`
  - `STACKIT_TFSTATE_SECRET_ACCESS_KEY`
- Ensure the backend bucket exists and matches the repository identity:
  - `BLUEPRINT_STACKIT_TFSTATE_BUCKET`
  - `BLUEPRINT_STACKIT_TFSTATE_KEY_PREFIX`
- Ensure `STACKIT_TFSTATE_ACCESS_KEY_ID`/`STACKIT_TFSTATE_SECRET_ACCESS_KEY` can read/write that bucket.
- Re-run in order:

  ```sh
  make infra-stackit-bootstrap-preflight
  make infra-stackit-bootstrap-apply
  make infra-stackit-foundation-preflight
  make infra-stackit-foundation-apply
  make infra-stackit-foundation-fetch-kubeconfig
  ```
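For a one-off operator shell, the exports can look like this (all values are placeholders; source secrets from your secret store, never from committed files):

```sh
export STACKIT_PROJECT_ID="00000000-0000-0000-0000-000000000000"          # placeholder
export STACKIT_REGION="eu01"                                              # placeholder
export STACKIT_SERVICE_ACCOUNT_KEY="$(cat "$HOME/.stackit/sa-key.json")"  # hypothetical key path
export STACKIT_TFSTATE_ACCESS_KEY_ID="..."
export STACKIT_TFSTATE_SECRET_ACCESS_KEY="..."
```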
- `infra-stackit-foundation-apply` retries a bounded number of times when STACKIT returns the known transient PostgreSQL Flex race: `Requested instance with ID: ... cannot be found`.
- Before retrying, the wrapper clears the transient Terraform taint on `stackit_postgresflex_instance.foundation[0]` so the next apply can reconcile the existing managed instance instead of destroying and recreating it.
- The retry budget is controlled by:
  - `STACKIT_FOUNDATION_APPLY_MAX_ATTEMPTS` (default `3`)
  - `STACKIT_FOUNDATION_APPLY_RETRY_DELAY_SECONDS` (default `30`)
- If the final retry still fails:
  - check `artifacts/infra/stackit_foundation_apply.env`
  - re-run `make infra-stackit-foundation-apply`
  - if the PostgreSQL instance is visible in STACKIT but Terraform still cannot reconcile it, stop and inspect provider/service health before running destroy
- Helm repository updates are retried with bounded backoff in shared tooling.
- Tune the retry budget when running on constrained CI runners or unstable networks:
  - `HELM_REPO_UPDATE_RETRY_MAX_ATTEMPTS` (default `3`)
  - `HELM_REPO_UPDATE_RETRY_BASE_DELAY_SECONDS` (default `2`)
  - `HELM_REPO_UPDATE_RETRY_MAX_DELAY_SECONDS` (default `20`)
  - `HELM_REPO_UPDATE_RETRY_BACKOFF_MULTIPLIER` (default `2`)
- Example:

  ```sh
  HELM_REPO_UPDATE_RETRY_MAX_ATTEMPTS=5 HELM_REPO_UPDATE_RETRY_BASE_DELAY_SECONDS=3 make infra-deploy
  ```
- Regenerate the runtime secret contract from foundation outputs:

  ```sh
  make infra-stackit-foundation-seed-runtime-secret
  ```

- Verify the state artifact: `artifacts/infra/stackit_foundation_runtime_secret.env`
- Re-run runtime deploy:

  ```sh
  make infra-stackit-runtime-deploy
  ```
- Run the canonical reconciliation command directly:

  ```sh
  make auth-reconcile-runtime-identity
  ```

- Inspect reconciliation state:
  - `artifacts/infra/runtime_credentials_eso_reconcile.env`
  - `artifacts/infra/argocd_repo_credentials_reconcile.env`
  - `artifacts/infra/runtime_identity_reconcile.env`
- Common failure modes:
  - source secret missing:
    - seed with `RUNTIME_CREDENTIALS_SOURCE_SECRET_LITERALS='username=...,password=...'` before rerunning
    - or create/manage the source secret with your provider-backed store path
  - `ExternalSecret` not `Ready=True`:
    - confirm ESO CRDs are established (`clustersecretstores.external-secrets.io`, `externalsecrets.external-secrets.io`)
    - confirm the referenced store (`runtime-credentials-source-store`) exists and authenticates correctly
  - target secret missing keys:
    - verify `RUNTIME_CREDENTIALS_TARGET_SECRET_KEYS` matches the key contract expected by workloads
    - verify the source secret contains those keys
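A few standard checks for the ESO failure modes above (the CRD and store names are taken from the list; `-A` scans all namespaces):

```sh
# Are the ESO CRDs established?
kubectl get crd clustersecretstores.external-secrets.io externalsecrets.external-secrets.io
# Does the referenced store exist?
kubectl get clustersecretstore runtime-credentials-source-store
# Which ExternalSecrets are not Ready=True?
kubectl get externalsecret -A
```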
For operator workflow details, see Runtime Credentials (ESO).
- `infra-stackit-runtime-prerequisites` waits for the SKE API hostname to resolve and for `/readyz` to answer before the first `kubectl apply`.
- If it times out on hostname resolution, verify the operator machine can resolve the SKE endpoint handed out in the kubeconfig:

  ```sh
  python3 - <<'PY'
  import socket
  socket.getaddrinfo("api.<cluster>.<suffix>.ske.<region>.onstackit.cloud", None)
  PY
  ```

  or `dig +short <host>`
- If resolution fails from your workstation:
  - confirm you ran `make blueprint-init-repo` before the first STACKIT bootstrap so backend and tfvars placeholders are initialized
  - wait a few minutes and re-run `make infra-stackit-foundation-fetch-kubeconfig`
  - check whether corporate DNS, VPN, or local resolver policy is blocking `*.ske.<region>.onstackit.cloud`
- Inspect `artifacts/infra/stackit_runtime_prerequisites.env` for the recorded `kube_api_server` and readiness status before retrying deploy.
- Run the canonical destroy chain:

  ```sh
  make infra-stackit-destroy-all
  ```

- The destroy flow performs a best-effort delete of blueprint-managed namespaces before Terraform destroys the SKE cluster:
  - `apps`, `data`, `messaging`, `network`, `security`, `observability`
  - controller namespaces such as `argocd`, `external-secrets`, and `envoy-gateway-system`
- If a cluster still reports `STATE_DELETING` after that:
  - inspect whether Kubernetes access is still available with `kubectl get ns`
  - if access is still available, look for namespaces stuck in `Terminating` and remaining `LoadBalancer` services or Gateway resources
  - then retry `make infra-stackit-destroy-all` after those in-cluster resources are gone
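A quick sweep for the in-cluster blockers named above (standard kubectl queries; the Gateway check assumes the Gateway API CRDs are still installed):

```sh
kubectl get ns | grep Terminating || true
kubectl get svc -A | grep LoadBalancer || true
kubectl get gateways.gateway.networking.k8s.io -A 2>/dev/null || true
```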