Skip to content

fix(ci): restrict staging ECR cleanup to cudly-staging repos#1235

Open
cristim wants to merge 1 commit into
mainfrom
ci/inf-05-fix
Open

fix(ci): restrict staging ECR cleanup to cudly-staging repos#1235
cristim wants to merge 1 commit into
mainfrom
ci/inf-05-fix

Conversation

@cristim

@cristim cristim commented Jun 11, 2026

Copy link
Copy Markdown
Member

Problem

Code review finding INF-05 (P3): the staging cleanup workflow's ECR delete loop matched starts_with(repositoryName,'cudly-staging') || repositoryName=='cudly' and then ran aws ecr delete-repository --force. The bare cudly name is exactly the default production repository used by rollback.yml (${{ vars.ECR_REPOSITORY || 'cudly' }}), leaving a hardcoded prod-adjacent name inside a force-delete loop. The == 'cudly' clause was a one-off for the legacy pre-suffix repo and has served its purpose; the rollback default is stale relative to the suffixed cudly-prod-<hex> Terraform naming scheme.

Fix

  • .github/workflows/cleanup-staging.yml (both AWS jobs): drop the repositoryName=='cudly' clause so only cudly-staging* repos are queried, and add an in-loop case guard that refuses (exit 1) any repository name not matching the staging pattern, so a future filter change cannot silently widen the blast radius.
  • .github/workflows/rollback.yml: remove the silent || 'cudly' fallback for vars.ECR_REPOSITORY. The validate job now fails the rollback loudly when the variable is unset instead of guessing a repo name, and the verify-image step double-checks it.

Test evidence

  • actionlint passes on both edited workflows.
  • Shell simulation of the guard: cudly and cudly-prod-ab12 are REFUSED, cudly-staging and cudly-staging-ab12 are deleted.
  • Pre-fix, the JMESPath filter explicitly matched the bare cudly repo; post-fix it is excluded by the query and rejected by the guard.

Closes #1185

@cristim cristim added triaged Item has been triaged priority/p3 Polish / idea / may never ship severity/low Minor harm urgency/eventually No deadline impact/internal Team-internal only effort/xs Trivial / one-liner type/chore Maintenance / non-user-visible labels Jun 11, 2026
@cristim

cristim commented Jun 11, 2026

Copy link
Copy Markdown
Member Author

@coderabbitai review

@cristim cristim added triaged Item has been triaged priority/p3 Polish / idea / may never ship severity/low Minor harm urgency/eventually No deadline impact/internal Team-internal only effort/xs Trivial / one-liner type/chore Maintenance / non-user-visible labels Jun 11, 2026
@coderabbitai

coderabbitai Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai

coderabbitai Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Warning

Review limit reached

@cristim, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 1 minute and 18 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8d23b26b-861a-4f9d-9613-8e381dcf6b15

📥 Commits

Reviewing files that changed from the base of the PR and between 451a70f and 8c61c05.

📒 Files selected for processing (2)
  • .github/workflows/cleanup-staging.yml
  • .github/workflows/rollback.yml
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch ci/inf-05-fix

Comment @coderabbitai help to get the list of available commands.

The staging cleanup workflow force-deleted any ECR repository named
exactly 'cudly', which is also the default production repository name
in rollback.yml. Drop the legacy '== cudly' clause from the JMESPath
filter (the one-off pre-suffix teardown it served is done) and add an
in-loop guard that refuses to delete any repository not matching the
cudly-staging* naming pattern, so a future filter change cannot widen
the blast radius silently.

Also remove the stale 'cudly' fallback for vars.ECR_REPOSITORY in
rollback.yml: the validate job now fails loudly when the variable is
unset instead of guessing a repository name that diverges from the
suffixed cudly-prod-<hex> Terraform naming scheme.

Validated with actionlint and a shell simulation of the guard
(bare 'cudly' and cudly-prod-* refused, cudly-staging* deleted).

Closes #1185
@cristim

cristim commented Jun 19, 2026

Copy link
Copy Markdown
Member Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@cristim

cristim commented Jun 26, 2026

Copy link
Copy Markdown
Member Author

Adversarial review (PR #1235)

Reviewed against the brief's risk surfaces (filter correctness, sibling-cleanup parity, no-staging-match behavior, dry-run, immediate-vs-deferred delete, base-branch + migration hygiene). The two-file change is sound, INF-05's blast-radius bug is closed, and the defense-in-depth guard is correct. One in-scope-adjacent hardening gap on the rollback side filed as a follow-up.

Verified

  • Filter scoping vs the live TF naming: locals.stack_name = "${var.project_name}-${var.environment}-${random_id.suffix.hex}" (terraform/environments/aws/main.tf:55) with project_name = "cudly" and environment ∈ {dev,staging,prod} (github-*.tfvars) produces cudly-staging-<8hex>, cudly-prod-<8hex>, cudly-dev-<8hex>. The new starts_with(repositoryName,'cudly-staging') JMESPath:
    • MATCHES: cudly-staging (legacy bare), cudly-staging-<hex> (TF-generated). ✓ intended
    • REFUSES: cudly-prod-<hex>, cudly-dev-<hex>, the bare cudly (the INF-05 trigger). ✓
  • Defense-in-depth case "$REPO" guard: traced every alternation:
    cudly                → REFUSE (exit 1)
    cudly-staging        → DELETE
    cudly-staging-ab12   → DELETE
    cudly-prod-ab12      → REFUSE
    cudly-prod           → REFUSE
    cudly-dev-x          → REFUSE
    
    Two layers (JMESPath query + bash case) match the same prefix; defense-in-depth catches a future filter loosening, not a current bug. ✓
  • Both AWS jobs patched identically: destroy-aws-lambda (line 77-95) and destroy-aws-fargate (line 147-165) carry the same query + guard. No copy-paste drift. ✓
  • Sibling cleanups:
  • No-match behaviour: the for loop iterates zero times when nothing matches — correct no-op (nothing to force-delete; terraform destroy handles the destroy proper). No silent "default to everything" path. ✓
  • Delete shape: still aws ecr delete-repository --force (immediate, zero recovery window). This is intentional for a one-off teardown gated on a typed destroy confirmation; deferred lifecycle policies would not actually destroy the repo. Same shape as before, just correctly scoped now. ✓
  • rollback.yml validate hardening: vars.ECR_REPOSITORY is now plumbed through env: and consumed as ${ECR_REPOSITORY} in the shell, eliminating both the silent || 'cudly' fallback (feedback_no_silent_fallbacks.md) and the inline-interpolation script-injection footgun. Empty-var path sets IS_VALID=falseverify-image skips (guarded by if: needs.validate.outputs.is_valid == 'true') → AWS rollback jobs skip via needs-propagation. ✓
  • rollback.yml verify-image hardening: silent || 'cudly' removed and replaced with explicit "❌ ECR_REPOSITORY repository variable is not set" + exit 1. ✓ Fail-loud at the rollback path's edge instead of guessing.
  • Confirmation gate intact: guard job still requires the literal "destroy" string (cleanup-staging.yml:41-44); all destroy jobs needs: guard. ✓
  • actionlint: clean on both edited workflows (exit 0).
  • Branch hygiene: rebased onto origin/main (451a70f73), 0 behind / 1 ahead. PR touches no DB migrations, so the 074↔077 collision shape from feedback_project_migration_number_collisions.md is N/A here. ✓
  • No new magic strings or hardcoded values introduced that aren't already enforced by TF naming (feedback_no_hardcoded_magic_values.md). The literal cudly-staging lives in three places (JMESPath query, case guard, RDS describe-instances query) all reflecting the same TF env value; that consistency is the contract.

Follow-up filed (out of scope here)

UNSTABLE state

The three failing checks on this PR are all pre-existing on main itself (last 5 CI - Build & Test runs on main are all failure):

None reachable from a workflow-only diff. Not blocking.

Verdict

LGTM. CR re-ping posted (@coderabbitai review).

@cristim

cristim commented Jun 26, 2026

Copy link
Copy Markdown
Member Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai

coderabbitai Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Tip

For best results, initiate chat on the files or code changes.

@cristim Thanks for the thorough adversarial pass — agreed, this closes the INF-05 blast-radius issue cleanly.

Your verification around the Terraform-driven naming contract, the duplicated AWS-side guardrails, and the fail-loud rollback behavior adds strong confidence that the scoped cleanup is now doing the right thing. Also noted on #1330 as the right place for the remaining low-severity hardening items outside this PR’s scope.

(ᵔᴥᵔ)ノ🐇


✏️ Learnings added
Learnt from: cristim
URL: https://github.com/LeanerCloud/CUDly/pull/1235

Timestamp: 2026-06-26T17:38:02.822Z
Learning: In `terraform/environments/aws/main.tf`, the AWS stack naming contract is `${var.project_name}-${var.environment}-${random_id.suffix.hex}` with `project_name = "cudly"` and environments `dev`, `staging`, and `prod`, producing names like `cudly-staging-<8hex>`, `cudly-prod-<8hex>`, and `cudly-dev-<8hex>`.

Learnt from: cristim
URL: https://github.com/LeanerCloud/CUDly/pull/1235

Timestamp: 2026-06-26T17:38:02.822Z
Learning: For the staging cleanup and related AWS resources in this repository, the literal `cudly-staging` is an intentional contract reflected consistently across the ECR JMESPath query, the bash `case "$REPO"` guard in `.github/workflows/cleanup-staging.yml`, and the RDS describe-instances query; this is aligned with the Terraform environment naming rather than being an arbitrary hardcoded magic string.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

effort/xs Trivial / one-liner impact/internal Team-internal only priority/p3 Polish / idea / may never ship severity/low Minor harm triaged Item has been triaged type/chore Maintenance / non-user-visible urgency/eventually No deadline

Projects

None yet

Development

Successfully merging this pull request may close these issues.

INF-05: Staging cleanup force-deletes the ECR repo named exactly 'cudly', the default prod repo in rollback.yml

1 participant