feat(infra): migrate iii.dev website to AWS S3 + CloudFront (Phase 2+3) #1470
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the repository settings, or use CodeRabbit's review commands to manage reviews.
📝 Walkthrough

Adds GitHub Actions workflows and a new Terraform website module provisioning S3, CloudFront (with a viewer-request CloudFront Function and unit tests), ACM DNS validation, Route53 records, IAM OIDC roles for GitHub, observability (SNS/CloudWatch alarms), bootstrap resources, docs, lockfiles, and related CI steps for build, deploy, smoke tests, and tf plan.
Sequence Diagram(s)

```mermaid
sequenceDiagram
  participant Dev as Developer
  participant GH as GitHub Actions
  participant TF as Terraform
  participant AWS as AWS (S3/CloudFront/ACM/Route53/IAM/SNS)
  participant Client as Browser

  Dev->>GH: Push PR / merge / workflow_dispatch
  GH->>TF: Run tf-plan or deploy jobs (assume role via OIDC)
  TF->>AWS: Create/modify infra (S3, CloudFront, ACM, Route53, IAM, SNS)
  GH->>AWS: Build & sync website assets to S3, create CloudFront invalidation
  Client->>AWS: Request iii.dev
  AWS->>AWS: CloudFront receives request
  AWS->>AWS: CloudFront Function (viewer-request) executes redirects / SPA rewrite
  alt /docs* match
    AWS->>Client: 301 redirect to docs.iii.dev...
  else host is www
    AWS->>Client: 301 redirect to https://iii.dev/...
  else extensionless route
    AWS->>AWS: Rewrite to /index.html and serve
  end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 6
🧹 Nitpick comments (6)
infra/terraform/_bootstrap/main.tf (1)
70-79: Consider enabling Point-in-Time Recovery for the lock table.

While the lock table only stores transient lock metadata, enabling PITR adds resilience at minimal cost for a PAY_PER_REQUEST table. This is optional but aligns with AWS security best practices.

♻️ Optional: Enable PITR

```diff
 resource "aws_dynamodb_table" "terraform_locks" {
   name         = local.lock_table_name
   billing_mode = "PAY_PER_REQUEST"
   hash_key     = "LockID"

   attribute {
     name = "LockID"
     type = "S"
   }
+
+  point_in_time_recovery {
+    enabled = true
+  }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@infra/terraform/_bootstrap/main.tf` around lines 70 - 79: The DynamoDB resource aws_dynamodb_table.terraform_locks should enable Point-in-Time Recovery; update the resource block for terraform_locks to add a point_in_time_recovery configuration with enabled = true so PITR is turned on for the lock table (keep billing_mode = "PAY_PER_REQUEST" and existing attributes intact).

infra/terraform/website/README.md (1)
6-8: Reference to local workspace path may confuse readers.

The reference to `~/.claude/plans/lazy-nibbling-valiant.md` points to a local file in the reviewer's workspace that won't exist for other team members. Consider either removing this reference or adding the relevant context directly to this README.

📝 Proposed fix

```diff
-Context and rationale: see the full plan at
-`~/.claude/plans/lazy-nibbling-valiant.md` in the reviewer's workspace, or read
-`website/README.md` after the migration lands.
+Context and rationale: this migration moves iii.dev from the k8s edge-proxy → Vercel
+path to AWS S3 + CloudFront for improved performance, cost control, and operational
+simplicity.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@infra/terraform/website/README.md` around lines 6 - 8: Remove the reference to the reviewer's local file path (~/.claude/plans/lazy-nibbling-valiant.md) from the README and replace it with either a brief summary of the migration plan or a link to a repository-available document (e.g., website/README.md or a docs/ or CONFLUENCE URL) so other team members can access the context; update the sentence that currently reads "Context and rationale: see the full plan at `~/.claude/...`" to point to the new location or include a concise summary of the key rationale directly in this README.

.github/workflows/smoke.yml (1)
45-54: `assert_redirect_to` doesn't verify the HTTP status code.

The function checks that the `Location` header matches the expected value but doesn't confirm that the response is actually a 3xx redirect. A misconfigured server could return a 200 with a `Location` header.

♻️ Proposed fix to verify redirect status

```diff
 assert_redirect_to() {
   local path="$1" expected="$2"
-  local loc
-  loc=$(curl -sSI "$BASE_URL$path" | awk 'BEGIN{IGNORECASE=1} /^location:/ {print $2}' | tr -d '\r\n')
-  if [ "$loc" = "$expected" ]; then
+  local headers status loc
+  headers=$(curl -sSI "$BASE_URL$path")
+  status=$(echo "$headers" | awk '/^HTTP\// {print $2}' | head -1)
+  loc=$(echo "$headers" | awk 'BEGIN{IGNORECASE=1} /^location:/ {print $2}' | tr -d '\r\n')
+  if [ "$status" = "301" ] && [ "$loc" = "$expected" ]; then
     pass "$path → 301 $loc"
+  elif [ "$status" != "301" ]; then
+    fail "$path → expected 301, got $status"
   else
     fail "$path → expected redirect to $expected, got '$loc'"
   fi
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/smoke.yml around lines 45 - 54: The assert_redirect_to function currently only checks the Location header; update it to also retrieve and validate the HTTP status code returned by the request (e.g., using curl to capture the status line or http code) and ensure it is a 3xx redirect (status begins with "3") before asserting the Location value; if the status is not 3xx, call fail with a message including the unexpected status and any Location seen, otherwise proceed to compare Location and pass on success.

infra/terraform/website/cloudfront.tf (1)
151-165: Tighten `/api/search*` to the verbs the client actually uses.

Allowing `PUT`, `POST`, `PATCH`, and `DELETE` on a public marketing-site route expands the surface area for this temporary origin. If the frontend only reads search results, keep this behavior to `GET`, `HEAD`, and `OPTIONS`.

If this endpoint is read-only, the CloudFront behavior can be reduced to:

```diff
-  allowed_methods = ["GET", "HEAD", "OPTIONS", "PUT", "POST", "PATCH", "DELETE"]
+  allowed_methods = ["GET", "HEAD", "OPTIONS"]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@infra/terraform/website/cloudfront.tf` around lines 151 - 165: The ordered_cache_behavior for path_pattern "/api/search*" currently allows dangerous write methods; update the allowed_methods in that ordered_cache_behavior (the block named ordered_cache_behavior where path_pattern = "/api/search*") to only include the read methods ["GET", "HEAD", "OPTIONS"] (leave cached_methods as ["GET","HEAD"] and keep the existing cache_policy_id, origin_request_policy_id, response_headers_policy_id, and compress settings unchanged).

infra/terraform/website/s3.tf (1)
5-11: Add a lifecycle policy for noncurrent deploy artifacts.

Versioning here is good, but the deploy workflow syncs `website/dist/` with `--delete`, so every replaced or removed asset becomes a noncurrent version that is retained forever. Add a lifecycle rule to expire noncurrent versions and stale delete markers before this bucket grows without bound.

Suggested Terraform change

```diff
+resource "aws_s3_bucket_lifecycle_configuration" "site" {
+  bucket = aws_s3_bucket.site.id
+
+  rule {
+    id     = "expire-noncurrent-versions"
+    status = "Enabled"
+
+    noncurrent_version_expiration {
+      noncurrent_days = 30
+    }
+
+    expiration {
+      expired_object_delete_marker = true
+    }
+  }
+}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@infra/terraform/website/s3.tf` around lines 5 - 11: The bucket currently enables versioning via aws_s3_bucket_versioning.site but lacks a lifecycle policy to expire noncurrent versions and stale delete markers; add an aws_s3_bucket_lifecycle_configuration resource tied to aws_s3_bucket.site that defines a rule to expire noncurrent versions (noncurrent_version_expiration) after a reasonable number of days (e.g., 30/90) and to expire expired_delete_markers = true so removed/overwritten objects and delete markers are cleaned up; ensure the lifecycle rule is enabled and scoped to the same bucket so the noncurrent versions from the deploy sync won't be retained forever.

.github/workflows/deploy-website.yml (1)
48-50: Use `npm ci` for the production build step.

With a committed lockfile already in play, `npm ci` makes deploys deterministic and fails fast on lock drift. `npm install` is looser than necessary for a production publish job. Please get workflow-owner approval before changing this workflow.

As per coding guidelines `.github/**/*.{yml,yaml}`: Request approval before making changes to CI/CD workflows (`.github/`).

Suggested workflow change

```diff
-      - name: Install dependencies
-        working-directory: website
-        run: npm install --include=dev
+      - name: Install dependencies
+        working-directory: website
+        run: npm ci
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/deploy-website.yml around lines 48 - 50, Replace the "Install dependencies" step's run command that currently uses "npm install --include=dev" with "npm ci" (or "npm ci --include=dev" if dev deps are required for the build) to ensure deterministic installs based on the lockfile; update the step that contains the run command string and obtain workflow-owner approval before committing this change to the .github workflow as per CI/CD policy.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/cloudfront-functions-test.yml:
- Around line 1-27: The new GitHub Actions workflow "CloudFront Functions Tests"
(job "test", step "Run CloudFront Function unit tests") modifies CI/CD
configuration under .github and therefore requires an explicit approval before
merge; obtain an explicit approval from the appropriate reviewer/owner and
record it in the PR by either adding an official GitHub Review approval from
that approver or inserting a clear approval note in the PR description
(including approver handle and timestamp) so the required signal for this
workflow change is visible prior to merging.
In @.github/workflows/tf-plan.yml:
- Around line 68-89: The injected Terraform plan can contain backticks or `${`
sequences that break the template literal; replace the current line that builds
plan using a template literal (const plan = `${{ steps.plan.outputs.plan }}`;)
with a JSON-encoded injection so JS receives a safe string: use GitHub Actions'
toJson/JSON encoding (e.g. const plan = ${{ toJson(steps.plan.outputs.plan) }};
or const plan = JSON.parse('${{ toJson(steps.plan.outputs.plan) }}'); ), leaving
the rest of the logic (truncated, body, github.rest.issues.createComment)
unchanged so the plan is safely escaped and cannot break interpolation.
In `@infra/terraform/website/cloudfront_functions/redirects.js`:
- Around line 47-58: The redirect handler currently hardcodes "docs.iii.dev" in
the redirect(...) calls (see checks on uri and uri.indexOf('/docs/') and the
redirect function usages); change the function rendering so the host is injected
from Terraform instead: render redirects.js from a template with templatefile
and pass var.docs_domain (e.g., create cloudfront_functions/redirects.js.tftpl
that exposes a DOCS_DOMAIN placeholder), and update the redirect(...) calls in
redirects.js to use that injected DOCS_DOMAIN (or docs_domain) instead of the
hardcoded host so overrides to var.docs_domain take effect.
- Around line 28-37: The redirect helper currently drops query strings because
CloudFront Functions don't preserve them automatically; update the redirect
function signature (redirect) to accept the incoming request or its querystring
and, when constructing the Location header, append '?' +
event.request.querystring if that string is non-empty. Then update all callers
that perform redirects (the www→apex redirect and the /docs redirect branches
that call redirect) to pass the event.request.querystring (or the whole request)
so tracking/auth params are preserved in the Location header.
In `@infra/terraform/website/iam_github_oidc.tf`:
- Around line 1-20: Add an optional input variable (e.g.,
var.github_oidc_provider_arn) and stop always creating
aws_iam_openid_connect_provider.github; set resource
"aws_iam_openid_connect_provider.github" to create only when the input is empty
(use count or for_each). Create a local (e.g., local.github_oidc_provider_arn)
that selects the supplied var.github_oidc_provider_arn when present else
aws_iam_openid_connect_provider.github[0].arn, and update any trust policies or
principals that reference the provider to use local.github_oidc_provider_arn.
Ensure thumbprint_list and client_id_list remain on the created resource and
document the new variable name that replaces TF_VAR_github_oidc_provider_arn.
- Around line 38-53: The role description claims "main branch + production env"
but the trust policy uses two OR'ed sub values, so either update the
aws_iam_role.github_deploy_website description to state "any ref in the
iii-website-prod environment" or tighten the trust policy in
data.aws_iam_policy_document.github_trust by replacing the combined sub check
with explicit separate conditions: require
token.actions.githubusercontent.com:ref == "refs/heads/main" AND
token.actions.githubusercontent.com:environment == var.github_environment (i.e.,
add two condition entries/tests rather than the current combined sub values) so
the role can only be assumed from the main branch in the production environment.
---
Nitpick comments:
In @.github/workflows/deploy-website.yml:
- Around line 48-50: Replace the "Install dependencies" step's run command that
currently uses "npm install --include=dev" with "npm ci" (or "npm ci
--include=dev" if dev deps are required for the build) to ensure deterministic
installs based on the lockfile; update the step that contains the run command
string and obtain workflow-owner approval before committing this change to the
.github workflow as per CI/CD policy.
In @.github/workflows/smoke.yml:
- Around line 45-54: The assert_redirect_to function currently only checks the
Location header; update it to also retrieve and validate the HTTP status code
returned by the request (e.g., using curl to capture the status line or http
code) and ensure it is a 3xx redirect (status begins with "3") before asserting
the Location value; if the status is not 3xx, call fail with a message including
the unexpected status and any Location seen, otherwise proceed to compare
Location and pass on success.
In `@infra/terraform/_bootstrap/main.tf`:
- Around line 70-79: The DynamoDB resource aws_dynamodb_table.terraform_locks
should enable Point-in-Time Recovery; update the resource block for
terraform_locks to add a point_in_time_recovery configuration with enabled =
true so PITR is turned on for the lock table (keep billing_mode =
"PAY_PER_REQUEST" and existing attributes intact).
In `@infra/terraform/website/cloudfront.tf`:
- Around line 151-165: The ordered_cache_behavior for path_pattern
"/api/search*" currently allows dangerous write methods; update the
allowed_methods in that ordered_cache_behavior (the block named
ordered_cache_behavior where path_pattern = "/api/search*") to only include the
read methods ["GET", "HEAD", "OPTIONS"] (leave cached_methods as ["GET","HEAD"]
and keep the existing cache_policy_id, origin_request_policy_id,
response_headers_policy_id, and compress settings unchanged).
In `@infra/terraform/website/README.md`:
- Around line 6-8: Remove the reference to the reviewer’s local file path
(~/.claude/plans/lazy-nibbling-valiant.md) from the README and replace it with
either a brief summary of the migration plan or a link to a repository-available
document (e.g., website/README.md or a docs/ or CONFLUENCE URL) so other team
members can access the context; update the sentence that currently reads
"Context and rationale: see the full plan at `~/.claude/...`" to point to the
new location or include a concise summary of the key rationale directly in this
README.
In `@infra/terraform/website/s3.tf`:
- Around line 5-11: The bucket currently enables versioning via
aws_s3_bucket_versioning.site but lacks a lifecycle policy to expire noncurrent
versions and stale delete markers; add an aws_s3_bucket_lifecycle_configuration
resource tied to aws_s3_bucket.site that defines a rule to expire noncurrent
versions (noncurrent_version_expiration) after a reasonable number of days
(e.g., 30/90) and to expire expired_delete_markers = true so removed/overwritten
objects and delete markers are cleaned up; ensure the lifecycle rule is enabled
and scoped to the same bucket so the noncurrent versions from the deploy sync
won’t be retained forever.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 351fc787-85d0-47d5-9373-badb29790b3a
📒 Files selected for processing (23)
- .github/workflows/cloudfront-functions-test.yml
- .github/workflows/deploy-website.yml
- .github/workflows/smoke.yml
- .github/workflows/tf-plan.yml
- infra/terraform/.gitignore
- infra/terraform/_bootstrap/.terraform.lock.hcl
- infra/terraform/_bootstrap/README.md
- infra/terraform/_bootstrap/main.tf
- infra/terraform/_bootstrap/versions.tf
- infra/terraform/website/.terraform.lock.hcl
- infra/terraform/website/README.md
- infra/terraform/website/acm.tf
- infra/terraform/website/cloudfront.tf
- infra/terraform/website/cloudfront_functions/redirects.js
- infra/terraform/website/cloudfront_functions/redirects.test.js
- infra/terraform/website/iam_github_oidc.tf
- infra/terraform/website/main.tf
- infra/terraform/website/observability.tf
- infra/terraform/website/outputs.tf
- infra/terraform/website/route53.tf
- infra/terraform/website/s3.tf
- infra/terraform/website/variables.tf
- infra/terraform/website/versions.tf
```yaml
name: CloudFront Functions Tests

on:
  pull_request:
    paths:
      - 'infra/terraform/website/cloudfront_functions/**'
      - '.github/workflows/cloudfront-functions-test.yml'
  push:
    branches: [main]
    paths:
      - 'infra/terraform/website/cloudfront_functions/**'

jobs:
  test:
    name: node --test
    runs-on: ubuntu-latest
    timeout-minutes: 3

    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: '22'

      - name: Run CloudFront Function unit tests
        run: node --test infra/terraform/website/cloudfront_functions/redirects.test.js
```
Please record explicit approval for this .github/ workflow change before merge.
This PR adds a new CI workflow but the required approval signal for CI/CD workflow modifications is not visible in the provided context.
As per coding guidelines, ".github/**/*.{yml,yaml}: Request approval before making changes to CI/CD workflows (.github/)."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/cloudfront-functions-test.yml around lines 1 - 27, The new
GitHub Actions workflow "CloudFront Functions Tests" (job "test", step "Run
CloudFront Function unit tests") modifies CI/CD configuration under .github and
therefore requires an explicit approval before merge; obtain an explicit
approval from the appropriate reviewer/owner and record it in the PR by either
adding an official GitHub Review approval from that approver or inserting a
clear approval note in the PR description (including approver handle and
timestamp) so the required signal for this workflow change is visible prior to
merging.
```js
function redirect(location) {
  return {
    statusCode: 301,
    statusDescription: 'Moved Permanently',
    headers: {
      location: { value: location },
      'cache-control': { value: 'public, max-age=3600' },
    },
  }
}
```
🧩 Analysis chain
🌐 Web query:
AWS CloudFront Functions viewer-request redirect responses: does CloudFront automatically append the incoming request query string to the Location header, or must the function include it explicitly?
💡 Result:
CloudFront does not automatically append the incoming request query string to the Location header in redirect responses generated by CloudFront Functions in viewer-request events. The function must explicitly construct the Location header value, including the query string if desired, by accessing event.request.querystring and building the string manually.
Citations:
- 1: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/example_cloudfront_functions_redirect_based_on_country_section.html
- 2: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/example-function-redirect-url.html
- 3: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/functions-event-structure.html
- 4: Redirect preserve query string aws-samples/amazon-cloudfront-functions#11
- 5: https://gist.github.com/tforster/09989b4a836815239b31241dfec5192b
- 6: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/function-code-choose-purpose.html
- 7: New Example: Preserve Query Strings on redirect aws-samples/amazon-cloudfront-functions#48
- 8: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/functions-tutorial.html
🏁 Script executed:

```bash
#!/bin/bash
# First, find and read the redirects.js file
find . -name "redirects.js" -type f | head -5
```

Repository: iii-hq/iii
Length of output: 113

🏁 Script executed:

```bash
# Read the file with line numbers to inspect the code around lines 28-37 and 61-64
wc -l infra/terraform/website/cloudfront_functions/redirects.js
```

Repository: iii-hq/iii
Length of output: 114

🏁 Script executed:

```bash
# Get the full content of the file to understand the redirect function and its usage
cat -n infra/terraform/website/cloudfront_functions/redirects.js
```

Repository: iii-hq/iii
Length of output: 4159

🏁 Script executed:

```bash
# Look for the test file mentioned in line 26
find . -name "redirects.test.js" -type f
```

Repository: iii-hq/iii
Length of output: 118

🏁 Script executed:

```bash
# Check all redirect() call sites to confirm querystring is not being passed
rg "redirect\(" infra/terraform/website/cloudfront_functions/redirects.js -A 1
```

Repository: iii-hq/iii
Length of output: 316

🏁 Script executed:

```bash
# Check if there are any other CloudFront function files that might show the pattern
fd "cloudfront.*\.js" infra/terraform/website/
```

Repository: iii-hq/iii
Length of output: 36

🏁 Script executed:

```bash
# Check the test file to see if it covers query string scenarios
cat -n infra/terraform/website/cloudfront_functions/redirects.test.js | head -100
```

Repository: iii-hq/iii
Length of output: 4304

🏁 Script executed:

```bash
# Continue reading the test file to check for query string test cases
cat -n infra/terraform/website/cloudfront_functions/redirects.test.js | tail -100
```

Repository: iii-hq/iii
Length of output: 4371
All redirect paths drop query parameters; CloudFront does not automatically preserve them.
Line 61's comment assumes CloudFront reattaches query strings, but AWS CloudFront Functions documentation explicitly states that query strings must be constructed explicitly in the Location header. This affects not only the www→apex redirect (lines 61–64) but also the /docs redirects (lines 48, 53, 58). Tracking and authentication parameters will be lost on all redirects without modification.
Proposed fix

```diff
+function querySuffix(querystring) {
+  if (!querystring) return ''
+  if (typeof querystring === 'string') return querystring ? `?${querystring}` : ''
+  var pairs = []
+  for (var k in querystring) {
+    if (!Object.prototype.hasOwnProperty.call(querystring, k)) continue
+    var item = querystring[k]
+    if (item && item.multiValue && item.multiValue.length) {
+      for (var i = 0; i < item.multiValue.length; i++) {
+        pairs.push(`${encodeURIComponent(k)}=${encodeURIComponent(item.multiValue[i].value || '')}`)
+      }
+    } else if (item && typeof item.value === 'string') {
+      pairs.push(`${encodeURIComponent(k)}=${encodeURIComponent(item.value)}`)
+    }
+  }
+  return pairs.length ? `?${pairs.join('&')}` : ''
+}
+
-function redirect(location) {
+function redirect(location, request) {
+  var target = `${location}${querySuffix(request && request.querystring)}`
   return {
     statusCode: 301,
     statusDescription: 'Moved Permanently',
     headers: {
-      location: { value: location },
+      location: { value: target },
       'cache-control': { value: 'public, max-age=3600' },
     },
   }
 }
@@
-    return redirect('https://iii.dev/docs')
+    return redirect('https://iii.dev/docs', request)
   // /docs/ exact → 301 docs.iii.dev/docs/
   if (uri === '/docs/') {
-    return redirect('https://docs.iii.dev/docs/')
+    return redirect('https://docs.iii.dev/docs/', request)
   }
   // /docs/<anything> → 301 docs.iii.dev/docs/<anything> (preserves the prefix)
   if (uri.indexOf('/docs/') === 0) {
-    return redirect(`https://docs.iii.dev${uri}`)
+    return redirect(`https://docs.iii.dev${uri}`, request)
   }
   // 2. www.iii.dev → 301 apex (preserves path; querystring reattached by CF)
   if (host === 'www.iii.dev') {
-    return redirect(`https://iii.dev${uri}`)
+    return redirect(`https://iii.dev${uri}`, request)
   }
```

🤖 Prompt for AI Agents
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@infra/terraform/website/cloudfront_functions/redirects.js` around lines 28 -
37, The redirect helper currently drops query strings because CloudFront
Functions don't preserve them automatically; update the redirect function
signature (redirect) to accept the incoming request or its querystring and, when
constructing the Location header, append '?' + event.request.querystring if that
string is non-empty. Then update all callers that perform redirects (the
www→apex redirect and the /docs redirect branches that call redirect) to pass
the event.request.querystring (or the whole request) so tracking/auth params are
preserved in the Location header.
```js
if (uri === '/docs') {
  return redirect('https://docs.iii.dev/docs')
}

// /docs/ exact → 301 docs.iii.dev/docs/
if (uri === '/docs/') {
  return redirect('https://docs.iii.dev/docs/')
}

// /docs/<anything> → 301 docs.iii.dev/docs/<anything> (preserves the prefix)
if (uri.indexOf('/docs/') === 0) {
  return redirect(`https://docs.iii.dev${uri}`)
```
Use the Terraform docs_domain input instead of hardcoding docs.iii.dev.
infra/terraform/website/variables.tf defines docs_domain, but Line 48, Line 53, and Line 58 hardcode the host. Any override won’t affect redirect behavior.
Proposed fix

```diff
-    return redirect('https://docs.iii.dev/docs')
+    return redirect(`https://${DOCS_DOMAIN}/docs`)
-    return redirect('https://docs.iii.dev/docs/')
+    return redirect(`https://${DOCS_DOMAIN}/docs/`)
-    return redirect(`https://docs.iii.dev${uri}`)
+    return redirect(`https://${DOCS_DOMAIN}${uri}`)
```

```hcl
# Render the function from template so DOCS_DOMAIN is injected from var.docs_domain
code = templatefile("${path.module}/cloudfront_functions/redirects.js.tftpl", {
  docs_domain = var.docs_domain
})
```

🤖 Prompt for AI Agents
})🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@infra/terraform/website/cloudfront_functions/redirects.js` around lines 47 -
58, The redirect handler currently hardcodes "docs.iii.dev" in the redirect(...)
calls (see checks on uri and uri.indexOf('/docs/') and the redirect function
usages); change the function rendering so the host is injected from Terraform
instead: render redirects.js from a template with templatefile and pass
var.docs_domain (e.g., create cloudfront_functions/redirects.js.tftpl that
exposes a DOCS_DOMAIN placeholder), and update the redirect(...) calls in
redirects.js to use that injected DOCS_DOMAIN (or docs_domain) instead of the
hardcoded host so overrides to var.docs_domain take effect.
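If the templating route is taken, one wrinkle worth noting: Terraform's `templatefile` interpolates `${...}` itself, so any JavaScript template literals in the rendered function must be escaped as `$${...}`. A minimal sketch of a hypothetical `redirects.js.tftpl` fragment, following the names the prompt suggests:

```js
// Hypothetical redirects.js.tftpl fragment, rendered by Terraform's templatefile().
// ${docs_domain} is substituted by Terraform at render time; $${uri} escapes to the
// literal JavaScript interpolation `${uri}` in the rendered output.
if (uri.indexOf('/docs/') === 0) {
  return redirect(`https://${docs_domain}$${uri}`)
}
```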
```hcl
# Reuse the existing GitHub OIDC provider if present in the account, else create one.
# Phase 0 of the runbook asks you to verify existence via
#   aws iam list-open-id-connect-providers
# If a provider for token.actions.githubusercontent.com already exists, set
# TF_VAR_github_oidc_provider_arn (or edit the data source to use it) rather than
# creating a second one.

resource "aws_iam_openid_connect_provider" "github" {
  url = "https://token.actions.githubusercontent.com"

  client_id_list = ["sts.amazonaws.com"]

  # GitHub's root CA thumbprints. AWS no longer validates these since 2023 (OIDC
  # signature verification replaced certificate validation), but the field is still
  # required by the resource schema.
  thumbprint_list = [
    "6938fd4d98bab03faadb97b34396831e3780aea1",
    "1c58a3a8518e8759bf075b76b750d4f2df264fcd",
  ]
}
```
The existing-provider path is documented but not implemented.
The header comment says operators can reuse an existing GitHub OIDC provider via TF_VAR_github_oidc_provider_arn, but this module always creates aws_iam_openid_connect_provider.github and there is no matching variable or data source. In an account that already has the GitHub provider, first apply still needs a manual code edit or will fail. Make the provider ARN an input and switch the trust policy principal between the supplied ARN and the created resource.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@infra/terraform/website/iam_github_oidc.tf` around lines 1 - 20, Add an
optional input variable (e.g., var.github_oidc_provider_arn) and stop always
creating aws_iam_openid_connect_provider.github; set resource
"aws_iam_openid_connect_provider.github" to create only when the input is empty
(use count or for_each). Create a local (e.g., local.github_oidc_provider_arn)
that selects the supplied var.github_oidc_provider_arn when present else
aws_iam_openid_connect_provider.github[0].arn, and update any trust policies or
principals that reference the provider to use local.github_oidc_provider_arn.
Ensure thumbprint_list and client_id_list remain on the created resource and
document the new variable name that replaces TF_VAR_github_oidc_provider_arn.
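A minimal sketch of the pattern this prompt describes, using the variable and local names it suggests (none of this exists in the PR as written):

```hcl
# Optional reuse of an existing GitHub OIDC provider (sketch only).
variable "github_oidc_provider_arn" {
  description = "ARN of an existing GitHub OIDC provider; leave empty to create one."
  type        = string
  default     = ""
}

resource "aws_iam_openid_connect_provider" "github" {
  count = var.github_oidc_provider_arn == "" ? 1 : 0

  url            = "https://token.actions.githubusercontent.com"
  client_id_list = ["sts.amazonaws.com"]

  thumbprint_list = [
    "6938fd4d98bab03faadb97b34396831e3780aea1",
    "1c58a3a8518e8759bf075b76b750d4f2df264fcd",
  ]
}

locals {
  # Trust policies should reference this local instead of the resource directly.
  github_oidc_provider_arn = (
    var.github_oidc_provider_arn != ""
    ? var.github_oidc_provider_arn
    : aws_iam_openid_connect_provider.github[0].arn
  )
}
```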
Actionable comments posted: 1
♻️ Duplicate comments (1)
.github/workflows/tf-plan.yml (1)
73-73: ⚠️ Potential issue | 🟡 Minor: Avoid direct template-literal injection for plan output.

At Line 73, injecting `steps.plan.outputs.plan` directly into a JS template literal can break parsing when plan text contains backticks or `${...}` sequences.

Proposed fix

```diff
       - name: Comment plan on PR
         uses: actions/github-script@v7
         if: github.event_name == 'pull_request'
+        env:
+          PLAN_OUTPUT: ${{ steps.plan.outputs.plan }}
         with:
           script: |
-            const plan = `${{ steps.plan.outputs.plan }}`;
+            const plan = process.env.PLAN_OUTPUT || '';
             const truncated = plan.length > 60000
               ? plan.slice(0, 60000) + '\n... [truncated, see workflow logs for full plan]'
               : plan;
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.github/workflows/tf-plan.yml at line 73: The current line uses a JS template literal with the GitHub expression steps.plan.outputs.plan, which can break when the plan contains backticks or ${...}; replace this by consuming a JSON-encoded output and parsing it instead of interpolating into a template literal. Produce/consume a safe output (e.g., set a JSON-escaped output like plan_json from the producing step) and change the assignment from const plan = `${{ steps.plan.outputs.plan }}` to something that parses the escaped output (e.g., const plan = JSON.parse('${{ steps.plan.outputs.plan_json }}')) so special characters are preserved; reference the existing const plan assignment and steps.plan.outputs.plan when making this change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/tf-plan.yml:
- Around line 1-3: Add a requirement that changes to the "Terraform Plan"
workflow (tf-plan.yml) must receive maintainer approval before merge by updating
repository controls: add or update the CODEOWNERS file to include the workflow
pattern (e.g., target .github/workflows/*.yml) and assign the appropriate team
or user(s) as owners, and ensure branch protection rules enforce "Require review
from Code Owners" for the target branch so any PR that modifies the Terraform
Plan workflow cannot be merged without a code-owner/maintainer approval.
---
Duplicate comments:
In @.github/workflows/tf-plan.yml:
- Line 73: The current line uses a JS template literal with the GitHub
expression steps.plan.outputs.plan which can break when the plan contains
backticks or ${...}; replace this by consuming a JSON-encoded output and parsing
it instead of interpolating into a template literal. Produce/consume a safe
output (e.g., set a JSON-escaped output like plan_json from the producing step)
and change the assignment from const plan = `${{ steps.plan.outputs.plan }}` to
something that parses the escaped output (e.g., const plan = JSON.parse('${{
steps.plan.outputs.plan_json }}')) so special characters are preserved;
reference the existing const plan assignment and steps.plan.outputs.plan when
making this change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: cd29779e-6b63-4364-a250-7851651f472b
📒 Files selected for processing (1)
.github/workflows/tf-plan.yml
```yaml
name: Terraform Plan

on:
```
❓ Verification inconclusive
🏁 Script executed:

```bash
# Check if the file exists and its git status
git ls-files .github/workflows/tf-plan.yml
git status .github/workflows/tf-plan.yml 2>/dev/null || echo "Status check complete"

# Check git log for this file to see if it's new or modified
git log --oneline -n 10 -- .github/workflows/tf-plan.yml 2>/dev/null | head -20 || echo "No git history available"

# View the actual file to confirm content at specified lines
echo "=== File content (lines 1-10) ==="
head -10 .github/workflows/tf-plan.yml 2>/dev/null || echo "File not found or error reading"
echo "=== File content (lines 70-80) ==="
sed -n '70,80p' .github/workflows/tf-plan.yml 2>/dev/null || echo "Cannot read those lines"

# Check file size
wc -l .github/workflows/tf-plan.yml 2>/dev/null || echo "Cannot get line count"
```
Repository: iii-hq/iii
Exit code: 0

stdout:

```text
.github/workflows/tf-plan.yml
HEAD detached at FETCH_HEAD
nothing to commit, working tree clean
42937055 feat(ci): pass TF_VAR_alarm_email to tf-plan via ALARM_EMAIL repo secret
=== File content (lines 1-10) ===
name: Terraform Plan

on:
  pull_request:
    paths:
      - 'infra/terraform/**'
      - '.github/workflows/tf-plan.yml'

permissions:
  contents: read
=== File content (lines 70-80) ===
        if: github.event_name == 'pull_request'
        with:
          script: |
            const plan = `${{ steps.plan.outputs.plan }}`;
            const truncated = plan.length > 60000
              ? plan.slice(0, 60000) + '\n... [truncated, see workflow logs for full plan]'
              : plan;
            const body = [
              '## Terraform plan — `infra/terraform/website`',
              '',
              '<details><summary>Click to expand</summary>',
93 .github/workflows/tf-plan.yml
```
Request approval before merging changes to CI/CD workflow.
This change modifies .github/workflows/tf-plan.yml, which requires explicit maintainer approval before merge per coding guidelines.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/tf-plan.yml around lines 1 - 3, Add a requirement that
changes to the "Terraform Plan" workflow (tf-plan.yml) must receive maintainer
approval before merge by updating repository controls: add or update the
CODEOWNERS file to include the workflow pattern (e.g., target
.github/workflows/*.yml) and assign the appropriate team or user(s) as owners,
and ensure branch protection rules enforce "Require review from Code Owners" for
the target branch so any PR that modifies the Terraform Plan workflow cannot be
merged without a code-owner/maintainer approval.
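A hedged sketch of the CODEOWNERS entry the prompt suggests; the owning team handle is a placeholder, not an existing team:

```text
# .github/CODEOWNERS (hypothetical entry)
/.github/workflows/ @iii-hq/infra-maintainers
```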
🧹 Nitpick comments (1)
infra/terraform/_bootstrap/main.tf (1)
11-17: Prefer S3 lockfiles over a new DynamoDB lock table.

HashiCorp's current S3 backend docs mark `dynamodb_table` as deprecated and document `use_lockfile = true` as the replacement. Since this repo requires Terraform >= 1.9 (which supports lockfiles), migrate both backends to `use_lockfile` and remove `aws_dynamodb_table.terraform_locks`.

♻️ Minimal backend change

```diff
 terraform {
   backend "s3" {
     bucket = "iii-terraform-state-prod-us-east-1"
     key    = "_bootstrap/terraform.tfstate"
     region = "us-east-1"
-    dynamodb_table = "iii-terraform-locks-prod"
     encrypt = true
+    use_lockfile = true
   }
 }
```

Apply the same change to `infra/terraform/website/main.tf`, then remove `aws_dynamodb_table.terraform_locks` and the `lock_table` output.

Also applies to: 75-84
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@infra/terraform/_bootstrap/main.tf` around lines 11 - 17, The Terraform S3 backend is still using the deprecated dynamodb_table setting; update the backend "s3" blocks (e.g., the one with bucket "iii-terraform-state-prod-us-east-1" and key "_bootstrap/terraform.tfstate") to remove dynamodb_table and instead set use_lockfile = true, apply the same change in infra/terraform/website/main.tf, and then delete the aws_dynamodb_table.terraform_locks resource and the lock_table output so the config relies on S3 lockfiles supported by Terraform >= 1.9.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@infra/terraform/_bootstrap/main.tf`:
- Around line 11-17: The Terraform S3 backend is still using the deprecated
dynamodb_table setting; update the backend "s3" blocks (e.g., the one with
bucket "iii-terraform-state-prod-us-east-1" and key
"_bootstrap/terraform.tfstate") to remove dynamodb_table and instead set
use_lockfile = true, apply the same change in infra/terraform/website/main.tf,
and then delete the aws_dynamodb_table.terraform_locks resource and the
lock_table output so the config relies on S3 lockfiles supported by Terraform >=
1.9.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 5d3203ea-5e02-455d-869e-8efd453d9998
📒 Files selected for processing (2)
- infra/terraform/_bootstrap/main.tf
- infra/terraform/website/main.tf
🚧 Files skipped from review as they are similar to previous changes (1)
- infra/terraform/website/main.tf
Actionable comments posted: 1
🧹 Nitpick comments (1)
infra/terraform/_bootstrap/main.tf (1)
29-39: Add lifecycle rules for state-object version sprawl.

With bucket versioning enabled (line 37), the missing lifecycle policy can cause unbounded noncurrent versions and long-term cost creep.

🧹 Suggested lifecycle config

```diff
+resource "aws_s3_bucket_lifecycle_configuration" "terraform_state" {
+  bucket = aws_s3_bucket.terraform_state.id
+
+  rule {
+    id     = "expire-noncurrent-state-versions"
+    status = "Enabled"
+
+    noncurrent_version_expiration {
+      noncurrent_days = 365
+    }
+  }
+}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@infra/terraform/_bootstrap/main.tf` around lines 29 - 39, The S3 state bucket (aws_s3_bucket.terraform_state) has versioning enabled (aws_s3_bucket_versioning.terraform_state) but lacks lifecycle rules to prevent noncurrent version sprawl; add lifecycle_rule blocks on aws_s3_bucket.terraform_state to: expire noncurrent versions after a sensible number of days (e.g., 30–90), delete expired object delete markers or incomplete multipart uploads, and optionally transition older noncurrent versions to cheaper storage—ensure the lifecycle_rule(s) include noncurrent_version_expiration and abort_incomplete_multipart_upload to control cost and retention.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@infra/terraform/_bootstrap/main.tf`:
- Around line 41-49: The Terraform S3 SSE resource
aws_s3_bucket_server_side_encryption_configuration.terraform_state currently
uses SSE-S3 (AES256); replace it with KMS-backed encryption by creating or
referencing an AWS KMS key (e.g., aws_kms_key.terraform_state) and updating the
rule to use sse_algorithm = "aws:kms" and set kms_master_key_id to that key's
id; also ensure the KMS key policy and any needed IAM roles or aliases permit
Terraform/GW access. Locate
aws_s3_bucket_server_side_encryption_configuration.terraform_state and modify it
to reference the KMS key, and add a corresponding aws_kms_key resource (and
alias/policy) if one does not already exist.
---
Nitpick comments:
In `@infra/terraform/_bootstrap/main.tf`:
- Around line 29-39: The S3 state bucket (aws_s3_bucket.terraform_state) has
versioning enabled (aws_s3_bucket_versioning.terraform_state) but lacks
lifecycle rules to prevent noncurrent version sprawl; add lifecycle_rule blocks
on aws_s3_bucket.terraform_state to: expire noncurrent versions after a sensible
number of days (e.g., 30–90), delete expired object delete markers or incomplete
multipart uploads, and optionally transition older noncurrent versions to
cheaper storage—ensure the lifecycle_rule(s) include
noncurrent_version_expiration and abort_incomplete_multipart_upload to control
cost and retention.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: b33b0c7a-0e95-4f38-bf79-6c7edca0d4a2
📒 Files selected for processing (3)
- infra/terraform/_bootstrap/main.tf
- infra/terraform/website/main.tf
- infra/terraform/website/variables.tf
✅ Files skipped from review due to trivial changes (2)
- infra/terraform/website/main.tf
- infra/terraform/website/variables.tf
| resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" { | ||
| bucket = aws_s3_bucket.terraform_state.id | ||
|
|
||
| rule { | ||
| apply_server_side_encryption_by_default { | ||
| sse_algorithm = "AES256" | ||
| } | ||
| } | ||
| } |
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail
rg -n -C3 'aws_s3_bucket_server_side_encryption_configuration|sse_algorithm|kms_master_key_id' infra/terraform/_bootstrap/main.tf
```

Repository: iii-hq/iii
Length of output: 321

🏁 Script executed:

```bash
cat -n infra/terraform/_bootstrap/main.tf
```

Repository: iii-hq/iii
Length of output: 2416

🏁 Script executed:

```bash
# Search for KMS usage patterns in the repository
rg -i "kms_master_key_id|aws:kms|kms_key" infra/terraform --max-count=20
# Also check if there are other S3 bucket encryption configs
rg -n "aws_s3_bucket_server_side_encryption" infra/terraform
```

Repository: iii-hq/iii
Length of output: 277

🏁 Script executed:

```bash
# Check the website S3 bucket encryption configuration
sed -n '1,30p' infra/terraform/website/s3.tf
```

Repository: iii-hq/iii
Length of output: 681
Use KMS-backed encryption for the production Terraform state bucket.
The S3 bucket storing production Terraform state is currently encrypted with AES256 (SSE-S3). Since Terraform state files contain sensitive data (provider credentials, database passwords, and other secrets), use AWS KMS for stronger encryption with better key control and auditability.
Proposed hardening diff

```diff
 resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
   bucket = aws_s3_bucket.terraform_state.id

   rule {
     apply_server_side_encryption_by_default {
-      sse_algorithm = "AES256"
+      sse_algorithm     = "aws:kms"
+      kms_master_key_id = "alias/aws/s3"
     }
   }
 }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" { | |
| bucket = aws_s3_bucket.terraform_state.id | |
| rule { | |
| apply_server_side_encryption_by_default { | |
| sse_algorithm = "AES256" | |
| } | |
| } | |
| } | |
| resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" { | |
| bucket = aws_s3_bucket.terraform_state.id | |
| rule { | |
| apply_server_side_encryption_by_default { | |
| sse_algorithm = "aws:kms" | |
| kms_master_key_id = "alias/aws/s3" | |
| } | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@infra/terraform/_bootstrap/main.tf` around lines 41 - 49, The Terraform S3
SSE resource aws_s3_bucket_server_side_encryption_configuration.terraform_state
currently uses SSE-S3 (AES256); replace it with KMS-backed encryption by
creating or referencing an AWS KMS key (e.g., aws_kms_key.terraform_state) and
updating the rule to use sse_algorithm = "aws:kms" and set kms_master_key_id to
that key's id; also ensure the KMS key policy and any needed IAM roles or
aliases permit Terraform/GW access. Locate
aws_s3_bucket_server_side_encryption_configuration.terraform_state and modify it
to reference the KMS key, and add a corresponding aws_kms_key resource (and
alias/policy) if one does not already exist.
Sets up the AWS infrastructure-as-code and CI/CD pipeline to migrate the iii.dev marketing site off Vercel onto S3 + CloudFront in the motia-prod account (600627348446, us-east-1). Code-only change. No terraform apply has been run — AWS side effects require a follow-up plan+apply against the motia-prod account. Full plan: ~/.claude/plans/lazy-nibbling-valiant.md

New Terraform modules:

- infra/terraform/_bootstrap: shared S3 state bucket + DynamoDB lock table
- infra/terraform/website (9-file split):
  - S3 bucket with OAC-scoped bucket policy (private, OAC-only reads)
  - CloudFront distribution: 3 SANs (iii.dev, www.iii.dev, iii-preview.iii.dev), HTTP/3, compress, TLSv1.2_2021, PriceClass_All
  - CloudFront Function (viewer-request): www→apex 301, /docs→docs.iii.dev 301, /llms.txt redirect, SPA fallback. Attached only to the default S3 behavior so /api/search* JSON 404s pass through unchanged.
  - CloudFront Response Headers Policy: HSTS + CSP (report-only during staging) + X-Frame-Options + Referrer-Policy + nosniff. CSP allowlist derived from website/index.html, useCookieConsent.ts, and EmailSignupForm.tsx (GTM, Tailwind CDN, esm.sh, CommonRoom, Mailmodo).
  - /api/search* ordered behavior proxying to iii-docs.vercel.app (temporary, until docs migration), CachingDisabled + AllViewer managed policies
  - ACM cert in us-east-1 with DNS validation
  - Route53 preview records (created immediately) + apex/www records (defined but imported in Phase 4 during cutover)
  - GitHub OIDC provider + deploy role scoped to iii-hq/iii main + production env
  - CloudWatch alarms: CF 5xxErrorRate >1% and ACM DaysToExpiry <30, via SNS email

CloudFront Function:

- cloudfront_functions/redirects.js (pure JS, cloudfront-js-2.0 runtime, <10KB)
- cloudfront_functions/redirects.test.js (21 node --test cases, all passing) covers: /docs exact+trailing slash, /docs/path, /docs/deep/nested, /docsfoo (not redirected), /llms.txt, www→apex with and without /docs, SPA fallback for extensionless paths, pass-through for extensioned files, /.well-known exemption, missing host header defense.

GitHub Actions:

- .github/workflows/deploy-website.yml: npm install + build + OIDC + s3 sync (split cache-control for hashed assets vs HTML; sketch below) + CF invalidation. Website is a standalone npm project — commented out of pnpm-workspace.yaml:10 — so this workflow uses npm, not pnpm. Node 22 matches _npm.yml.
- .github/workflows/smoke.yml: post-deploy curl assertions against 8 endpoints (apex 200, HSTS+nosniff headers, /docs 301, /llms.txt 301, /api/search JSON content-type, 404 for missing files, SPA fallback for client routes). Triggered by workflow_run after deploy-website succeeds, or manual dispatch with a BASE_URL input (usable against iii-preview.iii.dev during staging).
- .github/workflows/tf-plan.yml: PR-triggered terraform fmt/validate/plan with read-only OIDC role; posts plan as PR comment.
- .github/workflows/cloudfront-functions-test.yml: runs node --test on CF function changes in PRs and on main.

Verified locally:

- node --test redirects.test.js → 21/21 passing
- terraform fmt -check -recursive → clean
- terraform validate (both modules) → Success

Next steps (not in this commit):

- Set up GitHub environment `production` with variables S3_BUCKET, CF_DISTRIBUTION_ID and secrets AWS_DEPLOY_ROLE_ARN, VITE_MAILMODO_FORM_URL, AWS_READONLY_ROLE_ARN.
- cd infra/terraform/_bootstrap && terraform init && apply (one-time)
- cd infra/terraform/website && terraform init && plan && apply (creates preview infra; Phase 4 cutover happens later via terraform import of the apex records — see infra/terraform/website/README.md).
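A hedged sketch of the split cache-control sync and invalidation the deploy-website.yml bullet above describes; the env var names match the GitHub environment variables listed here, but the exact include/exclude patterns in the real workflow may differ:

```bash
# Hashed assets: long-lived, immutable cache.
aws s3 sync website/dist "s3://$S3_BUCKET" --delete \
  --exclude '*.html' \
  --cache-control 'public, max-age=31536000, immutable'

# HTML: always revalidate so new deploys take effect immediately.
aws s3 sync website/dist "s3://$S3_BUCKET" \
  --exclude '*' --include '*.html' \
  --cache-control 'public, max-age=0, must-revalidate'

# Invalidate CloudFront so cached HTML is refreshed.
aws cloudfront create-invalidation \
  --distribution-id "$CF_DISTRIBUTION_ID" --paths '/*'
```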
…rnize CF function JS
Two unrelated fixes bundled because they both landed in the same review pass:
1. Rename AWS resource identifiers from `motia-*` to product-scoped names.
The account is named `motia-prod` (that's the AWS profile name and stays as-is
in `provider "aws" { profile = ... }`), but resource names inside the account
should identify the *product* they belong to, not the org. `motia-*` was a
half-informed guess on my part — the only real existing pattern in
`motia-argocd-values` is `motia-cloud-production-cluster-*`, which is
cluster-scoped and doesn't generalize.
- Bootstrap state backend:
motia-prod-terraform-state-us-east-1 → iii-terraform-state-prod-us-east-1
motia-prod-terraform-locks → iii-terraform-locks-prod
- Website module:
motia-prod-iii-website-us-east-1 → iii-website-prod-us-east-1 (S3)
motia-prod-iii-website-oac → iii-website-prod-oac
motia-prod-iii-website-redirects → iii-website-prod-redirects (CF function)
motia-prod-iii-website-security-headers → iii-website-prod-security-headers (RHP)
motia-prod-github-deploy-iii-website → iii-website-prod-github-deploy (IAM role + policy)
motia-prod-iii-website-alarms → iii-website-prod-alarms (SNS topic)
motia-prod-iii-website-cf-5xx-rate → iii-website-prod-cf-5xx-rate (alarm)
motia-prod-iii-website-acm-days-to-expiry → iii-website-prod-acm-days-to-expiry
- Added `Service = "website"` default tag to the provider.
2. Modernize cloudfront_functions/redirects.js for CloudFront Functions 2.0 runtime.
The 2.0 runtime supports `let`/`const`, template literals, and optional chaining —
I was overly conservative using `var` + string concatenation (habits from the
older cloudfront-js-1.0 runtime). Biome flagged all of them.
- var → const (uri, host, request, lastSlash, lastSegment)
- String concat → template literals (`https://docs.iii.dev${tail}`, `https://iii.dev${uri}`)
- `request.headers && request.headers.host && request.headers.host.value` →
`request.headers?.host?.value`
- Added biome-ignore comment on `function handler` — it's the CloudFront Function
entry point, called by the runtime not imported, so biome's "unused" warning
is a false positive.
Verified:
- node --test redirects.test.js → 21/21 passing
- terraform fmt -check -recursive → clean
- terraform validate (both modules) → Success
Next steps unchanged — see previous commit's next steps.
…er live-state reality check
Live verification of the production `iii.dev` state (via dig + Route53 API +
curl against the real domain) uncovered three issues in the previous design
that would have caused user-visible breakage on cutover. All three are fixed
in code here. The plan file (~/.claude/plans/lazy-nibbling-valiant.md) has a
full writeup of the reality check.
1. /docs redirect target was wrong. The plan assumed iii.dev/docs/quickstart
→ docs.iii.dev/quickstart. In reality, docs.iii.dev/quickstart returns 404
— the Mintlify project serves content only under its own /docs/ prefix,
and the k8s edge-proxy nginx has a matching /docs/docs/ deduplication
rewrite to make it work. The CF Function must preserve the /docs prefix:
iii.dev/docs → 301 https://docs.iii.dev/docs
iii.dev/docs/ → 301 https://docs.iii.dev/docs/
iii.dev/docs/quickstart → 301 https://docs.iii.dev/docs/quickstart
Updated redirects.js, redirects.test.js (20 tests still passing), and the
smoke.yml assertions.
2. /llms.txt: dropped the special-case redirect. The current live site returns
404 for iii.dev/llms.txt (Mintlify upstream doesn't have the content).
Passing it through to S3 (which also 404s) preserves today's behavior. No
point redirecting to a URL that also 404s. Removed from both the function
and the smoke workflow.
3. HSTS: reverted to match what the live site currently sends.
Before (plan): max-age=63072000; includeSubDomains; preload (2 years, preload)
After: max-age=31536000; includeSubDomains (1 year, no preload)
`preload` is a one-way door. With `includeSubDomains`, it would apply
forever to `cloud.iii.dev` (a separate Vercel custom domain project) and
the `*.iii.dev` wildcard CNAME (another Vercel catch-all). Submitting to
hstspreload.org takes months to reverse. Preload can be layered on in a
follow-up PR after auditing every subdomain.
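For reference, a sketch of what the reverted HSTS setting looks like inside an `aws_cloudfront_response_headers_policy` `security_headers_config` block (attribute names per the AWS provider docs; the surrounding resource is elided):

```hcl
strict_transport_security {
  access_control_max_age_sec = 31536000 # 1 year
  include_subdomains         = true
  preload                    = false # one-way door; revisit after the subdomain audit
  override                   = true
}
```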
Live state findings that don't require code changes in this commit but are
captured in the plan file for Phase 0 runbook:
- ⚠ CAA record at `iii.dev` pins cert issuance to letsencrypt.org only.
`terraform apply` on `aws_acm_certificate.site` will FAIL without an
`amazontrust.com` peer. Evidence: there's already a FAILED ACM cert for
iii.dev in the account (e6b3a168-c173-4e63-b473-6bf749a397a3) from a prior
attempt. Phase 0 runbook now has the exact aws cli command to UPSERT the
CAA record with both CAs allowed (a sketch of such a command follows this list).
- Apex iii.dev is manually-owned in Route53, NOT managed by External-DNS
(no cname-iii.dev TXT ownership record). Only www.iii.dev, docs.iii.dev,
install.iii.dev are External-DNS-managed. The Phase 4 argocd-values PR now
only removes the iii-dev-www Ingress (not iii-dev). Apex rollback is a
direct aws route53 change-resource-record-sets UPSERT; no argocd coordination.
- docs.iii.dev and install.iii.dev STAY on the k8s edge proxy in this migration.
The plan is now explicit that we're NOT retiring the k8s NLB — only taking
the iii.dev apex + www out of its critical path. docs and install can be
migrated in a follow-up.
- The k8s edge proxy (iii-edge-proxy Pod in the `site` namespace) IS still
in the live iii.dev traffic path today. The Vercel headers visible in
curl responses come from the nginx UPSTREAM (proxy_pass to iii-web.vercel.app
and motiadev.mintlify.dev), not from Vercel being the edge. Earlier review
briefly got confused about this.
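A hedged sketch of the kind of CAA UPSERT the Phase 0 runbook contains; the zone ID is a placeholder, and the record values should be verified against the live zone before running:

```bash
aws route53 change-resource-record-sets \
  --hosted-zone-id "$ZONE_ID" \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "iii.dev.",
        "Type": "CAA",
        "TTL": 300,
        "ResourceRecords": [
          {"Value": "0 issue \"letsencrypt.org\""},
          {"Value": "0 issue \"amazontrust.com\""}
        ]
      }
    }]
  }'
```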
Verified:
- node --test redirects.test.js → 20/20 passing
- terraform fmt -check -recursive → clean
Phase 2 apply was failing with InvalidChangeBatch: "record already exists" on
aws_route53_record.{apex_a,apex_aaaa,www_a,www_aaaa}. The plan always said
these records should be imported in Phase 4 (they already exist in Route53 —
apex manually, www via External-DNS from ingress/site/iii-dev-www), but the
module had them as unconditional resources — just with a comment saying "don't
apply yet." Terraform doesn't read comments. It tried to CREATE them eagerly.
Route53's refusal to CREATE over existing records was the safety net. Nothing
was destroyed, production iii.dev still serves from the k8s edge proxy. The
preview side of Phase 2 (iii-preview.iii.dev on CloudFront, ACM cert, IAM,
SNS, alarms) completed successfully before the apex/www creates failed.
Fix: gate the four apex/www resources behind `var.manage_apex_records` with
count = var.manage_apex_records ? 1 : 0. Default is false, so Phase 2 apply
is a no-op for those records. Phase 4 imports them with the [0] index and
applies with -var='manage_apex_records=true'.
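A minimal sketch of the gating pattern for one of the four records; attribute values are illustrative, not copied from the module:

```hcl
variable "manage_apex_records" {
  type    = bool
  default = false # Phase 2 apply is a no-op for apex/www
}

resource "aws_route53_record" "apex_a" {
  count   = var.manage_apex_records ? 1 : 0
  zone_id = var.zone_id
  name    = "iii.dev"
  type    = "A"

  alias {
    name                   = aws_cloudfront_distribution.site.domain_name
    zone_id                = aws_cloudfront_distribution.site.hosted_zone_id
    evaluate_target_health = false
  }
}
```

Phase 4 then imports the counted instance, e.g. `terraform import 'aws_route53_record.apex_a[0]' <zone-id>_iii.dev_A`, before applying with the flag set.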
Also updated README Phase 4 runbook:
- Use [0] index on terraform import commands (matches counted resources)
- Apply -var=manage_apex_records=true explicitly
- Only remove iii-dev-www Ingress from argocd-values (not the apex iii-dev,
which was never External-DNS-owned)
- Rollback split between apex (manual UPSERT + state rm) and www (argocd
revert + state rm)
- Add Phase 5 note to clean up the iii-preview CAA workaround record
…cket for proper 404s
Two runtime bugs caught by Phase 2 smoke testing after applying the CloudFront
distribution to iii-preview.iii.dev:
1. CloudFront Function crashed at runtime with
FunctionExecutionError from cloudfront
SyntaxError: Unexpected token ";" in 43
on every request, causing HTTP 503 on all /, /docs, /llms.txt, etc. The
offending line was:
const host = request.headers?.host?.value
The biome cleanup pass in a previous commit modernized the code to use
optional chaining, assuming the cloudfront-js-2.0 runtime would accept
ES2020+ features. Despite AWS's docs claiming ES 2022 support, the actual
runtime (QuickJS-based) rejects optional chaining. Verified via
`aws cloudfront test-function` which returned the exact parse error.
Node's test runner happily accepts optional chaining and the 20 unit
tests pass locally, so the local test suite can't catch this class of bug.
Fix: replace `request.headers?.host?.value` with the traditional
request.headers && request.headers.host ? request.headers.host.value : undefined
guard. Template literals (also ES2020) DO work in the runtime, verified by
the /docs/quickstart 301 succeeding with the templated Location header.
Added a biome-ignore comment explaining the compat constraint for future
contributors (the guard is sketched after this list). 20/20 unit tests
still pass locally. Live test-function invocations confirm the function
now runs cleanly on /, /docs/quickstart, and /some/client/route.
2. S3 returned 403 (not 404) for missing objects like /missing.jpg, /llms.txt,
and /.well-known/vercel/foo. Root cause: the bucket policy granted only
`s3:GetObject` on `bucket/*` to the CloudFront OAC principal, not
`s3:ListBucket` on the bucket itself. Without ListBucket, S3 can't
distinguish "object doesn't exist" from "access denied" and defaults to
403. This is a well-known S3+CloudFront quirk.
Fix: added a second statement to the bucket policy granting
`s3:ListBucket` on the bucket ARN (not `/*`) to the same CloudFront OAC
principal under the same SourceArn condition. This does NOT expose a
public LIST endpoint — CloudFront only issues GetObject calls for
specific keys at the viewer's request. The ListBucket permission just
changes the internal S3 error code from 403 to 404, which is what end
users correctly see for missing paths (the policy shape is sketched after
this list).
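For the runtime-compat fix in item 1, a sketch of the before/after guard; the Biome suppression comment is an assumption about how it could be spelled:

```js
// Parses fine in Node but crashes the cloudfront-js-2.0 (QuickJS) runtime:
//   const host = request.headers?.host?.value;

// biome-ignore lint/complexity/useOptionalChain: cloudfront-js-2.0 rejects ?.
var host = request.headers && request.headers.host
  ? request.headers.host.value
  : undefined;
```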
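And for item 2, a sketch of what the two-statement bucket policy could look like in Terraform; resource names are illustrative:

```hcl
data "aws_iam_policy_document" "site" {
  # GetObject on the objects, scoped to this distribution via SourceArn.
  statement {
    actions   = ["s3:GetObject"]
    resources = ["${aws_s3_bucket.site.arn}/*"]
    principals {
      type        = "Service"
      identifiers = ["cloudfront.amazonaws.com"]
    }
    condition {
      test     = "StringEquals"
      variable = "AWS:SourceArn"
      values   = [aws_cloudfront_distribution.site.arn]
    }
  }

  # ListBucket on the bucket ARN itself (not /*): lets S3 return 404
  # instead of 403 for missing keys. No public LIST endpoint is exposed.
  statement {
    actions   = ["s3:ListBucket"]
    resources = [aws_s3_bucket.site.arn]
    principals {
      type        = "Service"
      identifiers = ["cloudfront.amazonaws.com"]
    }
    condition {
      test     = "StringEquals"
      variable = "AWS:SourceArn"
      values   = [aws_cloudfront_distribution.site.arn]
    }
  }
}
```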
Phase 2 smoke now 8/8 green on iii-preview.iii.dev:
/ → 200
/docs → 301 docs.iii.dev/docs
/docs/quickstart → 301 docs.iii.dev/docs/quickstart
/docs/guide/deep/nested → 301 (prefix preserved)
/missing.jpg → 404
/some/client/route → 200 (SPA fallback)
/llms.txt → 404
/.well-known/vercel/foo → 404
HSTS + CSP headers present
S3 direct URL → 403 (bucket private)
…uction)
Phase 3 wiring discovered that `Production` (stored with that capitalization)
is already in use by Vercel's GitHub integration for multiple other projects.
Reusing it would expose the AWS deploy secrets (AWS_DEPLOY_ROLE_ARN,
VITE_MAILMODO_FORM_URL) to every existing workflow that runs in that env.
Also: GitHub's OIDC sub claim uses the stored environment name case, and the
IAM trust policy's StringEquals is case-sensitive. My trust policy said
`environment:production` (lowercase) but the stored env name was "Production"
(capitalized), which would have silently failed the assume-role call.
Fix: rename the environment for this deploy to `iii-website-prod`, matching
the `iii-website-prod-*` naming convention used for every other AWS resource
in this module. Single-service isolation for the secrets.
Changes:
- .github/workflows/deploy-website.yml: `environment: production` →
  `environment: iii-website-prod`
- infra/terraform/website/variables.tf: `github_environment` default →
  `"iii-website-prod"` (also documents the case-sensitivity constraint for
  future contributors)
Already applied: `terraform apply` updated the IAM role's trust policy sub
condition from `repo:iii-hq/iii:environment:production` to
`repo:iii-hq/iii:environment:iii-website-prod`. GitHub env `iii-website-prod`
created via `gh api PUT`, with env variables S3_BUCKET + CF_DISTRIBUTION_ID
and secrets AWS_DEPLOY_ROLE_ARN + VITE_MAILMODO_FORM_URL (empty string for
validation; real value to be set before merge).
Context: the _bootstrap tfstate was previously committed to the repo (commit
37288df in the branch history, which this rewrite removes) following the
"chicken-and-egg bootstrap" convention. Since iii-hq/iii is a PUBLIC repo,
that choice leaked account IDs, resource ARNs, and internal Terraform state
metadata to the open internet. Not catastrophic — no secrets were in there —
but better to not leak at all.
Audit of what was in the committed file:
- AWS account ID 600627348446 (embedded in DynamoDB ARN)
- State bucket + lock table names
- Tags and internal Terraform provider private state (timeouts, schema version)
- No credentials, no secrets, no PEM material, no application data
Fix:
- Uploaded the existing tfstate to S3 at key _bootstrap/terraform.tfstate in
  the iii-terraform-state-prod-us-east-1 bucket (SSE-S3, versioned, private).
- Added a `backend "s3"` block to infra/terraform/_bootstrap/main.tf that
  points at that key. Future applies read/write state via the bucket the
  module itself provisioned (the usual chicken-and-egg problem is solved
  because the bucket already exists after the first apply).
- Updated infra/terraform/.gitignore to ignore all *.tfstate files globally
  (removed the `!_bootstrap/terraform.tfstate` whitelist exception).
- Rewrote the _bootstrap README with the new state-storage story and a
  "Re-bootstrap from scratch" runbook for the rare case of someone needing to
  recreate the bucket from zero (temporarily comment out the backend block,
  apply locally, uncomment, terraform init -migrate-state).
- Deleted the local tfstate file from the working tree.
Verified:
- aws s3 ls s3://iii-terraform-state-prod-us-east-1/_bootstrap/terraform.tfstate
  → 7811 bytes, AES256 encrypted
- terraform init -reconfigure → "Successfully configured the backend s3"
- terraform plan → "No changes. Your infrastructure matches the configuration."
- terraform fmt -check -recursive → clean
This commit alone does NOT remove the tfstate from git history — the file is
still reachable via the prior commit 37288df. The next step (a git rebase that
drops 37288df entirely, followed by a force-push to overwrite the branch on
origin) is what actually purges the file from the repo timeline.
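The backend block itself is small; a sketch with the bucket and key named in the commit (region assumed from the bucket's naming):

```hcl
terraform {
  backend "s3" {
    bucket = "iii-terraform-state-prod-us-east-1"
    key    = "_bootstrap/terraform.tfstate"
    region = "us-east-1"
  }
}
```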
Phase 3 wiring decision: created a dedicated read-only GitHub OIDC role so
`.github/workflows/tf-plan.yml` can run `terraform plan` against any
`infra/terraform/*` module on pull requests and post the output as a PR
comment.
Design:
- Separate role from `iii-website-prod-github-deploy` (which has write
  actions like s3:PutObject and CreateInvalidation). Using the deploy role
  for plan would be over-permissive and dangerous — a malicious PR modifying
  tf-plan.yml to run `apply` could actually destroy things. The plan role is
  read-only, so even if someone swaps `plan` for `apply`, the mutating calls
  fail at IAM.
- Trust policy pins the OIDC sub claim to `repo:iii-hq/iii:pull_request`.
  Only pull_request-triggered jobs from this exact repo can assume it. Forks
  don't get OIDC tokens for PR workflows (GitHub security default), so this
  is effectively "PRs from the iii-hq/iii team only."
- Permissions: AWS managed `arn:aws:iam::aws:policy/ReadOnlyAccess`. Broad
  but correct for Terraform plan, which reads across many services.
  Enumerating every GetX/DescribeX/ListX action would be ~80 statements and
  break every time a new resource type is added to any module. AWS maintains
  ReadOnlyAccess automatically. The blast radius of a malicious read is
  bounded by the code review process for PRs touching .github/workflows/.
Resources added:
- data.aws_iam_policy_document.github_tf_plan_trust
- aws_iam_role.github_tf_plan → `iii-infra-github-tf-plan`
- aws_iam_role_policy_attachment.github_tf_plan_readonly → managed ReadOnlyAccess
Output added:
- github_tf_plan_role_arn
Side effects (outside this commit):
- `terraform apply` executed; the role exists in the motia-prod account.
- Repo-level secret `AWS_READONLY_ROLE_ARN` set via
  `gh secret set AWS_READONLY_ROLE_ARN --repo iii-hq/iii --app actions`.
Effect on tf-plan.yml:
- Previously failed on every PR because AWS_READONLY_ROLE_ARN was missing.
- With this merge + the repo secret set, tf-plan.yml will:
  1. Trigger on PRs touching infra/terraform/**
  2. Assume the read-only role via OIDC
  3. Run fmt-check + validate + plan
  4. Post the plan as a PR comment
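A sketch of the trust wiring; the OIDC provider reference is an assumption about how the module names its existing GitHub provider resource:

```hcl
data "aws_iam_policy_document" "github_tf_plan_trust" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.github.arn] # assumed name
    }

    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }

    # Only pull_request-triggered jobs from iii-hq/iii can assume the role.
    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:iii-hq/iii:pull_request"]
    }
  }
}

resource "aws_iam_role" "github_tf_plan" {
  name               = "iii-infra-github-tf-plan"
  assume_role_policy = data.aws_iam_policy_document.github_tf_plan_trust.json
}

resource "aws_iam_role_policy_attachment" "github_tf_plan_readonly" {
  role       = aws_iam_role.github_tf_plan.name
  policy_arn = "arn:aws:iam::aws:policy/ReadOnlyAccess"
}
```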
The previous commit e8153b5a accidentally deleted the tf-plan.yml workflow
file due to a stale `git rm` from a cancelled command that had pre-staged the
deletion in the index. The role was created correctly; the workflow file
deletion was unintentional.
Restored from the pre-delete tree at commit 8773da2. Same content as before —
no changes to the workflow itself.
Now the full wiring is in place:
- iii-infra-github-tf-plan role (created in e8153b5)
- AWS_READONLY_ROLE_ARN repo secret (set via `gh secret set` during e8153b5)
- .github/workflows/tf-plan.yml (restored in this commit)
`var.alarm_email` in infra/terraform/website has no default (required). The
value (`devops@motia.dev`) is real PII and shouldn't be committed to the
public repo as a default in variables.tf.
Fix:
- Repo-level GitHub secret `ALARM_EMAIL` set to the current production value
  (via `gh secret set ALARM_EMAIL --repo iii-hq/iii --app actions`).
- tf-plan.yml injects it as env var `TF_VAR_alarm_email` at the Terraform
  plan step (sketched below). Terraform picks it up automatically.
Why a REPO secret, not env-scoped:
- tf-plan.yml triggers on `pull_request` and does NOT use an `environment:`
  block, so env-scoped secrets aren't accessible to it. Must be repo-level.
- The same pattern is already used for `AWS_READONLY_ROLE_ARN`, which
  tf-plan also needs and which isn't an env-scoped secret either.
Effect:
- `terraform plan` on PRs touching infra/terraform/** now succeeds
  (previously it would have errored with "No value for required variable"
  had the read-only role trust path worked earlier — we never actually ran
  the workflow with that role, so this is being fixed alongside the role
  creation).
- The email value appears in TF plan output during the workflow run. GitHub
  Actions masks secrets in logs so it should not be visible, but treat the
  output as if it contained the email and don't share workflow logs publicly.
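A sketch of the injection at the plan step; the step layout is illustrative:

```yaml
- name: Terraform plan
  id: plan
  working-directory: infra/terraform/website
  env:
    # Terraform reads TF_VAR_* automatically; the address never lands
    # in a committed default.
    TF_VAR_alarm_email: ${{ secrets.ALARM_EMAIL }}
  run: terraform plan -no-color -input=false
```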
tf-plan.yml on GitHub Actions was failing with:
Error: failed to get shared config profile, motia-prod
Root cause: both `main.tf` provider blocks had `profile = "motia-prod"`
hardcoded. That forces Terraform to read `~/.aws/config` looking for that
profile, regardless of what other credentials are in the environment. CI
runners don't have a ~/.aws/config with that profile — they receive
credentials as env vars from aws-actions/configure-aws-credentials (OIDC
web identity).
Fix: remove `profile = "motia-prod"`. The AWS SDK credential chain then
resolves credentials from whichever source is present:
- GitHub Actions: OIDC env vars (AWS_WEB_IDENTITY_TOKEN_FILE,
AWS_ROLE_ARN, AWS_REGION) injected by the configure-aws-credentials step
- Local dev: AWS_PROFILE=motia-prod env var set by developer before
running terraform
Local workflow now:
export AWS_PROFILE=motia-prod
terraform apply
(Previously, the hardcoded profile let local devs skip the export. Small
regression in DX but unblocks CI, which is the actual consumer of this
workflow going forward.)
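A sketch of the provider block after the fix:

```hcl
provider "aws" {
  region = "us-east-1"
  # No `profile` argument: CI gets OIDC web-identity env vars from
  # configure-aws-credentials; local devs export AWS_PROFILE=motia-prod.
}
```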
…workflows
Keep WHY-comments (CSP unsafe-*, HSTS preload one-way door, S3 ListBucket
403→404, CF Functions 2.0 optional-chaining quirk, tf-plan read-only role
rationale). Drop narration that restates the code.
Inlining ${{ steps.plan.outputs.plan }} into a JS template literal breaks
when the plan contains characters that terminate the literal (backticks or
${...}), producing "SyntaxError: Unexpected identifier". Reading from
process.env.PLAN keeps the untrusted content out of the script source.
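A sketch of the pattern, assuming actions/github-script@v7 and a `plan` step output:

```yaml
- name: Comment plan on PR
  uses: actions/github-script@v7
  env:
    PLAN: ${{ steps.plan.outputs.plan }} # passed as data, not script source
  with:
    script: |
      // Backticks or ${...} inside the plan can no longer terminate the
      // literal, because the plan enters via process.env, not the source.
      const body = `\`\`\`\n${process.env.PLAN}\n\`\`\``;
      await github.rest.issues.createComment({
        owner: context.repo.owner,
        repo: context.repo.repo,
        issue_number: context.issue.number,
        body,
      });
```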
Decouples the apex iii.dev and www.iii.dev cutovers so they can be flipped independently. Limits blast radius during Phase 4 — flip www first, observe for 15-30 min, then flip apex, or vice versa. Both default to false. Current state: nothing imported, nothing changes.
Summary
Migrates the iii.dev marketing site from the current k8s edge-proxy → Vercel upstream path to AWS S3 + CloudFront in the motia-prod account. Phases 1 and 2 are already applied to AWS (preview infra is live). Phase 3 (CI wiring) completes when this merges.
Production iii.dev is NOT affected by this merge. The apex/www Route53
records are gated behind `var.manage_apex_records` (default false), so the
existing k8s → Vercel path continues to serve production. Only the preview
subdomain `iii-preview.iii.dev` is live on CloudFront today.
What changed since #1469
- `iii-hq/iii` is a public repo, so the state file leaked the AWS account
  ID, bucket/table names, and internal TF metadata. Audit confirmed no
  secrets, credentials, PEM keys, or PII were present — only public
  identifiers.
- Uploaded the tfstate to S3 at `_bootstrap/terraform.tfstate` in the
  `iii-terraform-state-prod-us-east-1` bucket (SSE-S3, versioned, public
  access blocked).
- Added a `backend "s3"` block to `_bootstrap/main.tf` so future applies
  read/write state via the bucket the module itself provisioned
  (chicken-and-egg solved because the bucket already exists after the first
  apply).
- Updated `infra/terraform/.gitignore` to globally ignore all `*.tfstate`
  files (removed the `!_bootstrap/terraform.tfstate` whitelist).
- Rewrote `_bootstrap/README.md` with the new state-storage story and a
  "Re-bootstrap from scratch" runbook for the rare case of someone
  recreating the bucket from zero.
Verified after the rewrite:
- `aws s3 ls s3://iii-terraform-state-prod-us-east-1/_bootstrap/terraform.tfstate` → 7811 bytes, AES256
- `terraform init -reconfigure` → "Successfully configured the backend s3"
- `terraform plan` in `_bootstrap` → "No changes. Your infrastructure matches the configuration."
- `git ls-tree -r origin/infra/deploy-website | grep terraform.tfstate` → 0 matches ✓
- `terraform fmt -check -recursive` → clean
Phase 2 smoke test results (live on iii-preview.iii.dev, 8/8 ✓)
Runtime landmines caught during Phase 2 validation
What DOES NOT happen on merge
What DOES happen on merge (Phase 3)
Test plan
Pre-merge (already done)
Post-merge (manual)
Phase 4 (later, deliberate low-traffic window — NOT part of this merge)
See `infra/terraform/website/README.md` "Phase 4 — apex cutover" for the full runbook.
Phase 5 (48–72h after Phase 4)
See `infra/terraform/website/README.md` "Phase 5 — remove preview".