chore(infra/website): tf-apply pipeline + restore Route53-record management #1580
Conversation
…tover

The Phase-4 cutover gate (`manage_apex_records`, `manage_www_records = false`) was for the migration to S3+CloudFront in #1470. The records were imported into state during that cutover, but the variable defaults stayed `false` — so `terraform plan` has been showing 4 production DNS records (iii.dev A/AAAA + www.iii.dev A/AAAA) as "will be destroyed" because `count` is now 0 while the resources still exist in state.

Flip the defaults to `true`. State and config now agree; plan no longer proposes destroying the apex/www records. Flag retained as an escape hatch for emergency rollback.

Refs: #1470 (Phase-4 S3+CloudFront cutover)
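The count-gating mismatch described above can be sketched as follows. This is an illustration, not the repo's actual `variables.tf`; `var.zone_id` and `var.cloudfront_domain` are assumed names, while `manage_apex_records` and the iii.dev record come from the PR:

```hcl
# Sketch of the gate pattern. When the default was false, count evaluated
# to 0 while the imported record remained in state, so plan proposed
# destroying it.
variable "manage_apex_records" {
  type    = bool
  default = true # was false after the Phase-4 cutover import
}

resource "aws_route53_record" "apex_a" {
  count   = var.manage_apex_records ? 1 : 0
  zone_id = var.zone_id # assumed variable, for illustration
  name    = "iii.dev"
  type    = "A"

  alias {
    name                   = var.cloudfront_domain # assumed
    zone_id                = "Z2FDTNDATAQYW2"      # CloudFront's fixed hosted zone ID
    evaluate_target_health = false
  }
}
```

Importing a resource into state without also raising the gate leaves exactly this trap: state says 1 instance, config says 0, and the next apply would delete live DNS.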
📝 Walkthrough

Adds a GitHub Actions “Terraform Apply” workflow and a dedicated OIDC IAM role for tf-apply; exposes the role ARN as a Terraform output.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Dev as Developer (push / dispatch)
    participant GH as GitHub Actions
    participant OIDC as GitHub OIDC
    participant AWS as AWS STS/IAM
    participant TF as Terraform (in repo)
    participant AWSResources as AWS (Route53, etc.)
    Dev->>GH: push to main or manual dispatch (optional ref)
    GH->>OIDC: request OIDC token (sub includes github_tf_apply_environment)
    OIDC->>AWS: present token to assume role
    AWS->>GH: return temporary credentials (assume role github_tf_apply)
    GH->>TF: run terraform init && terraform apply (with creds)
    TF->>AWSResources: modify infra (Route53, IAM, etc.)
    TF-->>GH: stream apply output
    GH->>GH: capture and truncate apply.txt, post summary in workflow
```
Terraform plan —
🧹 Nitpick comments (1)
infra/terraform/website/variables.tf (1)
68-71: ⚡ Quick win — Clarify rollback order to make the “no-destroy” path unambiguous.

The current wording can be read as “flip then later `state rm`.” Please state explicitly that state removal must happen before any apply after flipping to `false`; otherwise records can still be planned for deletion.

Proposed wording update:

```diff
-  # rollback — set to false to release ownership without destroying the records
-  # (use `terraform state rm` after flipping).
+  # rollback: set to false, then remove these records from Terraform state
+  # before any `terraform apply` (`terraform state rm ...`) to release
+  # ownership without planning deletion.
-  # Phase 4 cutover complete; same situation as manage_apex_records. Decoupled
-  # from apex so the two can be released independently if ever needed.
+  # Phase 4 cutover complete; same as manage_apex_records. Decoupled from apex
+  # so each can be released independently (set false + `terraform state rm`
+  # before apply).
```

Also applies to: 78-79
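Under the reviewer's suggested ordering, the variable block might end up reading like this. A sketch only; the actual description text and defaults in `variables.tf` may differ:

```hcl
variable "manage_www_records" {
  # rollback: set to false, then run `terraform state rm` on these records
  # BEFORE any `terraform apply`, so ownership is released without the plan
  # proposing deletion.
  description = "Whether Terraform manages the www.iii.dev A/AAAA records."
  type        = bool
  default     = true
}
```

Putting the ordering constraint in the comment matters because nothing in Terraform itself enforces it: flipping the flag and applying before the `state rm` would plan the very destroys this PR just eliminated.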
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@infra/terraform/website/variables.tf`:
- Around line 68-71: Update the comment that documents the escape-hatch flag in
variables.tf (the "Phase 4 cutover" flag) to explicitly require running
`terraform state rm` before performing any `terraform apply` after flipping the
flag to false; reword the line that currently says "(use `terraform state rm`
after flipping)" to clearly state "first run `terraform state rm` to remove the
records from state, then you may run any `terraform apply`; do not run
`terraform apply` before removing the records or they may be planned for
deletion." Apply the same clarification to the duplicate comment instance (the
occurrence at the second block currently around lines 78-79).
Two changes:

1. Strip stale `# Leave false until Phase 4 cutover…` comments from `manage_apex_records` / `manage_www_records` — the variable `description` field already says what they do, and the cutover narrative they preserved is no longer load-bearing.
2. Add a tf-apply pipeline (`.github/workflows/tf-apply.yml`) so changes under `infra/terraform/website/` actually deploy on merge to main. Previously only `tf-plan.yml` ran on PRs and applies were manual, which is how the cleanUrls fix sat unapplied for hours after #1576 merged.
   - New IAM role `iii-website-prod-github-tf-apply` (AdministratorAccess, trust narrowly scoped to a new `iii-website-prod-tf-apply` env so repo settings can require reviewers without gating routine S3 deploys).
   - Workflow runs on push to main + workflow_dispatch, uses concurrency `tf-apply-website` to serialize applies, captures output to the job summary.

Bootstrap (one-time, manual): `AWS_PROFILE=motia-prod terraform apply` → grab `github_tf_apply_role_arn` from outputs → set repo secret `AWS_TF_APPLY_ROLE_ARN` → create `iii-website-prod-tf-apply` GitHub environment with required reviewers.
Terraform plan —
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/tf-apply.yml:
- Around line 69-87: The job currently writes the full terraform apply output
(the APPLY env variable) into $GITHUB_STEP_SUMMARY; change this to avoid
exposing raw logs by replacing the line that echoes "${APPLY:-(no apply output
captured)}" with a short status message (e.g., "Apply completed; see the job log
for full output.") or a redacted/trimmed version, and ensure the modified output
still appends to $GITHUB_STEP_SUMMARY rather than the raw APPLY content; update
the same block that builds the Job summary (the echo lines around APPLY and the
surrounding details tags) so only the safe status text or link is emitted.
In `@infra/terraform/website/iam_github_oidc.tf`:
- Around line 123-126: The role aws_iam_role.github_tf_apply is currently
attached to the overly-broad AWS managed AdministratorAccess via
aws_iam_role_policy_attachment.github_tf_apply_admin; replace this with a scoped
policy by creating a least-privilege aws_iam_policy (e.g.,
aws_iam_policy.github_tf_apply_policy) that only grants the specific
actions/resources the website Terraform needs (state S3 bucket, DynamoDB lock
table, CloudFront/S3 deploy, Route53, etc.), then update
aws_iam_role_policy_attachment.github_tf_apply_admin to use policy_arn =
aws_iam_policy.github_tf_apply_policy. Alternatively, split into separate
narrower roles if apply and other workflows need different scopes and reference
those role names in place of aws_iam_role.github_tf_apply.
````yaml
- name: Job summary
  if: always()
  env:
    APPLY: ${{ steps.apply.outputs.apply }}
  run: |
    {
      echo "## terraform apply — \`infra/terraform/website\`"
      echo
      echo "- Commit: \`${{ github.sha }}\`"
      echo "- Ref: \`${{ inputs.ref || github.ref }}\`"
      echo
      echo '<details><summary>Apply output</summary>'
      echo
      echo '```'
      echo "${APPLY:-(no apply output captured)}"
      echo '```'
      echo
      echo '</details>'
    } >> "$GITHUB_STEP_SUMMARY"
````
Trim the raw apply log from the job summary.
Writing the full terraform apply output into $GITHUB_STEP_SUMMARY makes it more durable and easier to skim than the job log, which increases the chance that sensitive diffs or provider error details get surfaced unnecessarily. Consider replacing it with a short status message and a link back to the run logs, or redacting the captured output first.
Suggested change:

````diff
-      echo '```'
-      echo "${APPLY:-(no apply output captured)}"
-      echo '```'
+      echo "Apply completed; see the job log for full output."
````

🤖 Prompt for AI Agents
```hcl
resource "aws_iam_role_policy_attachment" "github_tf_apply_admin" {
  role       = aws_iam_role.github_tf_apply.name
  policy_arn = "arn:aws:iam::aws:policy/AdministratorAccess"
}
```
Replace AdministratorAccess with a scoped policy.
Attaching the AWS managed admin policy gives this GitHub OIDC role full account-wide power, which is far broader than the website module needs and makes a workflow compromise much more damaging. Please scope this to the Terraform resources the module actually manages, or split out a narrower apply role.
🧰 Tools
🪛 Checkov (3.2.525)
[high] 123-126: Disallow IAM roles, users, and groups from using the AWS AdministratorAccess policy
(CKV_AWS_274)
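One shape the scoped replacement could take, per the comment above. The action lists are illustrative assumptions and would need auditing against what the website module actually touches; the resource names mirror the ones in the review:

```hcl
data "aws_iam_policy_document" "github_tf_apply" {
  statement {
    sid       = "Route53Records"
    actions   = ["route53:ChangeResourceRecordSets", "route53:Get*", "route53:List*"]
    resources = ["*"] # ideally scoped to the hosted zone ARN
  }

  statement {
    sid = "StateBackend"
    actions = [
      "s3:GetObject", "s3:PutObject", "s3:ListBucket",
      "dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:DeleteItem",
    ]
    resources = ["*"] # scope to the state bucket and lock table
  }
}

resource "aws_iam_policy" "github_tf_apply" {
  name   = "iii-website-prod-github-tf-apply"
  policy = data.aws_iam_policy_document.github_tf_apply.json
}

resource "aws_iam_role_policy_attachment" "github_tf_apply_admin" {
  role       = aws_iam_role.github_tf_apply.name
  policy_arn = aws_iam_policy.github_tf_apply.arn
}
```

The trade-off, which the PR description argues the other way, is maintenance cost: a least-privilege policy must grow with every new resource type the module manages, whereas the PR leans on the narrow OIDC trust scope as the safety boundary instead.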
Summary

Two coupled changes that together make `infra/terraform/website/` actually applyable from CI without nuking prod DNS.

1. Plan: 4 to destroy → 0 to destroy

`terraform plan` against prod was showing 4 production DNS records as "will be destroyed": `manage_apex_records` and `manage_www_records` defaulted to `false` — gates from #1470's Phase-4 cutover. The records were imported into state during cutover (Terraform now owns them), but the defaults stayed `false` → `count = 0` while state has 1 → plan wants to destroy. Cutover has been live since 2026-04. Flip both defaults to `true`. Stripped the now-stale comment blocks above each variable (the `description` field already says what the flags do).

2. tf-apply pipeline

There was no apply pipeline. `tf-plan.yml` runs on PRs but applies were fully manual — which is why the cleanUrls fix from #1576 sat unapplied for hours after merge.

New `.github/workflows/tf-apply.yml`:
- Runs on push to `main` touching `infra/terraform/website/**` or the workflow itself, plus `workflow_dispatch` for ad-hoc applies on a chosen ref.
- Concurrency group `tf-apply-website` with `cancel-in-progress: false` so we never interrupt an in-flight apply.
- Assumes the `iii-website-prod-github-tf-apply` IAM role via OIDC. The role carries `AdministratorAccess` (the trust scope is the safety boundary). Trust is narrowly conditioned on `repo:iii-hq/iii:environment:iii-website-prod-tf-apply` — a new GitHub environment, separate from the existing `iii-website-prod` env, so repo settings can gate applies behind required reviewers without slowing routine S3 syncs.

Plan against prod

`AWS_PROFILE=motia-prod terraform plan` — three iterations:
1. `1 add, 1 change, 4 destroy` 🚨
2. `1 add, 1 change, 0 destroy`
3. `2 add, 0 change, 0 destroy`

The earlier `1 add` (recreating `aws_sns_topic_subscription.email`) and `1 change` (cosmetic CF Function reformat from a local-formatter-touched targeted apply) settled out by the third plan. Everything left is the two new IAM resources for the tf-apply role.

Test plan
- `terraform validate`, `terraform fmt -check -recursive` clean.
- `terraform plan` against prod → `2 to add, 0 to change, 0 to destroy`.
- `AWS_PROFILE=motia-prod terraform apply` from a laptop — creates the new role.
- Grab `github_tf_apply_role_arn` from outputs; set as repo secret `AWS_TF_APPLY_ROLE_ARN`.
- Create the `iii-website-prod-tf-apply` GitHub environment in repo settings; add required reviewers if you want manual approval gating.
- From then on, merges to main touching `infra/terraform/website/**` apply automatically.
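The environment-scoped trust described in the summary could be expressed roughly as follows. A sketch only: the `aws_iam_openid_connect_provider.github` reference is an assumed existing resource, while the `sub` condition value comes straight from the PR:

```hcl
data "aws_iam_policy_document" "github_tf_apply_trust" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.github.arn] # assumed existing provider
    }

    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }

    # Only workflow runs bound to the iii-website-prod-tf-apply environment
    # may assume the role, so required reviewers on that environment gate
    # every apply.
    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:iii-hq/iii:environment:iii-website-prod-tf-apply"]
    }
  }
}
```

Keying the trust to an environment-qualified `sub` claim, rather than just the repo or branch, is what lets the admin-powered role coexist with ungated routine deploys: the reviewer requirement lives in GitHub environment settings, not in IAM.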