ci: harden remaining staging cleanup + rollback hardening (silent AWS error swallowing, verify-image var-interpolation)

## Context

Follow-up to #1235 (INF-05 fix). That PR correctly tightened the ECR delete loop scope and removed the silent `|| 'cudly'` fallback from `rollback.yml`'s `validate` job. Two adjacent hardening opportunities in the same two files weren't part of INF-05's surgical scope but are the same shape of risk.

## 1. `cleanup-staging.yml`: AWS error is silently swallowed at the ECR cleanup loop's edge

Both copies of the loop (lines 78-95 and 148-165) start with:

```bash
for REPO in $(aws ecr describe-repositories \
  --query "repositories[?starts_with(repositoryName,'cudly-staging')].repositoryName" \
  --output text 2>/dev/null); do
```

`2>/dev/null` swallows any AWS error (auth failure, network, throttle, region typo). On error, the command substitution evaluates to empty, the `for` loop body never executes, the post-PR defense-in-depth `case` guard never fires, and the workflow proceeds to the `terraform destroy` step. The error surfaces later (during `destroy`) but at the wrong layer — it looks like a destroy problem, not a credential/perm problem.

Cleaner shape: capture the listing into a temp var with `set -o pipefail` semantics, and `exit 1` on listing failure. The cleanup is already gated on a typed "destroy" confirmation, so failing loud here is the right move (memory `feedback_no_silent_fallbacks.md`).

## 2. `rollback.yml` `verify-image` step still uses inline `${{ vars.ECR_REPOSITORY }}` instead of the env-var pattern

In #1235, the `validate` job was hardened from `${{ vars.ECR_REPOSITORY }}` inline-interpolation to a proper env-var indirection:

```yaml
env:
  ECR_REPOSITORY: ${{ vars.ECR_REPOSITORY }}
run: |
  ...
  IMAGE_URI="...${ECR_REPOSITORY}:..."
```

The parallel `verify-image > Verify AWS image` step (`.github/workflows/rollback.yml:133-154`) was given the same presence check but kept the inline pattern:

```yaml
- name: Verify AWS image
  if: startsWith(inputs.cloud, 'aws')
  run: |
    IMAGE_TAG="${{ inputs.image_tag }}"
    REPO="${{ vars.ECR_REPOSITORY }}"
    ...
    aws ecr describe-images \
        --repository-name $REPO \
        --image-ids imageTag=$IMAGE_TAG \
```

Repo variables come from a closed admin allowlist, so the realistic exploit surface is small, but the inconsistency within the same file is a footgun for future copy-paste. Two micro-changes for consistency with #1235:

- Move `vars.ECR_REPOSITORY` (and arguably `vars.AWS_REGION`) into a step-level `env:` block; reference as `${ECR_REPOSITORY}` / `${AWS_REGION}` in the script body.
- Quote `"$REPO"` and `"$IMAGE_TAG"` in the `aws ecr describe-images` invocation (both are shell-safe today thanks to the `^[a-zA-Z0-9._-]+$` regex on `inputs.image_tag`, but quoting is free defense-in-depth).

## Why now

Both are low-severity hygiene items in the same two files PR #1235 already opened. Bundling them with the same blast-radius rationale (cleanup of force-delete loops + the rollback path) keeps the audit trail tight.

## Out of scope

- GCP `--filter="name:cudly-staging"` in `destroy-gcp` uses GCP filter-substring semantics; a tighter regex/exact filter would parallel the same hardening shape but is owned by a separate cleanup path. Not bundling here.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

ci: harden remaining staging cleanup + rollback hardening (silent AWS error swallowing, verify-image var-interpolation) #1330

Context

1. `cleanup-staging.yml`: AWS error is silently swallowed at the ECR cleanup loop's edge

2. `rollback.yml` `verify-image` step still uses inline `${{ vars.ECR_REPOSITORY }}` instead of the env-var pattern

Why now

Out of scope

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

ci: harden remaining staging cleanup + rollback hardening (silent AWS error swallowing, verify-image var-interpolation) #1330

Description

Context

1. cleanup-staging.yml: AWS error is silently swallowed at the ECR cleanup loop's edge

2. rollback.yml verify-image step still uses inline ${{ vars.ECR_REPOSITORY }} instead of the env-var pattern

Why now

Out of scope

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

1. `cleanup-staging.yml`: AWS error is silently swallowed at the ECR cleanup loop's edge

2. `rollback.yml` `verify-image` step still uses inline `${{ vars.ECR_REPOSITORY }}` instead of the env-var pattern