smoke pack DRY, workflow extraction, CI unblock by chrisns · Pull Request #238 · co-cddo/ndx_try_aws_scenarios

chrisns · 2026-05-13T16:07:22Z

Summary

Three threads bundled because they share files:

Smoke pack consolidation — per-scenario tests/smoke/*.spec.ts files (avg ~70 lines, lots of repetition) collapsed into 17 co-located cloudformation/scenarios/*/smoke.ts configs (avg 25 lines). A single tests/smoke/runner.ts drives all of them; tests/smoke/helpers.ts::adminLogin() plus get/getSecret accessors on the smoke context absorb the login-form boilerplate. assertion-bar.ts is gone — quarantine state moves into each scenario's own smoke.ts. Playwright's smoke project re-targeted at cloudformation/scenarios with testMatch: /[^/]+\/smoke\.ts$/ so new scenarios auto-discover. All 17 tests still found by playwright test --list.
Workflow de-bloat (1255 → 601 lines):
- smoke.yml 467 → 204: scope decision, pre-deploy state, SCP drift, capture-events, teardown extracted to scripts/smoke-*.sh
- deploy-blueprints.yml 726 → 354: 9 per-scenario synth jobs collapsed to a single matrix; CDK strip + SAM-bucket verify pulled into scripts/
- renovate.yml 62 → 43: narrative stripped (RENOVATE_TOKEN PAT retained per operator decision)
- Phase / AC / T*.* / tech-spec narrative removed throughout
Pre-existing CI unblock:
- LocalGov IMS CDK dropped minLength: 1 on GovUkPayApiKey so smoke deploys (no key) succeed; payment portal degrades cleanly
- Retention lint extended to cover DeletionPolicy/UpdateReplacePolicy=Snapshot (parity with the prior inline check)
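The retention lint's rule set (the Phase 2b description of scripts/lint-retention-policies.sh, plus the Snapshot extension above) can be expressed over an already-parsed template. This is a TypeScript sketch, not the shell script itself. It assumes YAML parsing has already happened, treats `true` and `"true"` alike, and assumes the per-template and global caps count only justified retentions (unjustified ones are errors regardless).

```typescript
// Minimal shape of a parsed CloudFormation resource (parsing is assumed done).
interface CfnResource {
  Type: string;
  DeletionPolicy?: string;
  UpdateReplacePolicy?: string;
  Metadata?: { Justification?: string };
  Properties?: Record<string, unknown>;
}
type Template = { Resources?: Record<string, CfnResource> };

const RETAINING_POLICIES = ["Retain", "Snapshot"];
const isTrue = (v: unknown): boolean => v === true || v === "true";

// True when deleting or replacing the resource would leave data behind.
function isRetaining(r: CfnResource): boolean {
  const p = r.Properties ?? {};
  return (
    RETAINING_POLICIES.indexOf(r.DeletionPolicy ?? "") >= 0 ||
    RETAINING_POLICIES.indexOf(r.UpdateReplacePolicy ?? "") >= 0 ||
    isTrue(p["DeletionProtection"]) ||
    isTrue(p["EnableDeletionProtection"]) ||
    p["FinalSnapshotIdentifier"] !== undefined
  );
}

function lintTemplates(templates: Record<string, Template>, perTemplateCap = 3, globalCap = 10): string[] {
  const errors: string[] = [];
  let globalCount = 0;
  for (const file of Object.keys(templates)) {
    const resources = templates[file].Resources ?? {};
    let fileCount = 0;
    for (const id of Object.keys(resources)) {
      const res = resources[id];
      if (!isRetaining(res)) continue;
      if ((res.Metadata?.Justification ?? "").trim() === "") {
        errors.push(`${file}: ${id} retains data without Metadata.Justification`);
        continue;
      }
      fileCount++; // justified retentions count toward the caps
    }
    globalCount += fileCount;
    if (fileCount > perTemplateCap) {
      errors.push(`${file}: ${fileCount} justified retentions exceed per-template cap ${perTemplateCap}`);
    }
  }
  if (globalCount > globalCap) {
    errors.push(`${globalCount} justified retentions exceed global cap ${globalCap}`);
  }
  return errors;
}

const errs = lintTemplates({
  "bops-planning": {
    Resources: {
      LogGroup: { Type: "AWS::Logs::LogGroup", DeletionPolicy: "Retain", Metadata: { Justification: "debug-after-rollback" } },
    },
  },
  minute: { Resources: { Db: { Type: "AWS::RDS::DBCluster", DeletionPolicy: "Snapshot" } } },
});
console.log(errs.join("\n")); // minute: Db retains data without Metadata.Justification
```

The Snapshot case is the one the extension adds: without it, the `minute` resource above would pass silently.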

CODEOWNERS switched from @chrisns to @co-cddo/ndx and broadened to cover the new scripts and per-scenario smoke.ts files.

Operator actions taken out-of-band (state already reconciled)

IsbHubStack had been UPDATE_ROLLBACK_FAILED for 12 days (transient AiContactCentreStackSet issue from May 1); recovered cleanly via continue-update-rollback.
6 orphan StackSets (bops-planning, digital-planning-register, fixmystreet, minute, paperless-ngx, planx) imported back into IsbHubStack via change-set-type IMPORT. UUIDs preserved → ISB lease bindings intact. The first cdk deploy after merge will create the 6 missing BucketDeployment resources and reconcile any property drift on the imported StackSets.
All 17 scenario templates refreshed in s3://ndx-try-isb-blueprints-568672915267/scenarios/. Planx had been at template.json (manual May-8 upload); now at the canonical template.yaml.

Test plan

CI: deploy-blueprints.yml green (the matrix + the deploy job, including the 6 new BucketDeployment creates from the import reconciliation)
CI: smoke workflow green (templates fresh in S3, LocalgovIms accepts empty key, hub stack healthy)
Spot-check: playwright test --project=smoke --list shows 17 tests after checkout
Spot-check: git diff main -- .github/workflows/*.yml shows shrunk YAML with no inline scripts > ~15 lines
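The discovery spot-check can also be reasoned about directly by running the `testMatch` pattern quoted in the summary over a few candidate paths. This is an illustrative sketch only: `discoverScenarios` is a hypothetical helper (the real wiring is just the Playwright project config), and it assumes the regex is tested against slash-separated file paths.

```typescript
// The testMatch pattern quoted above for Playwright's "smoke" project.
const SMOKE_TEST_MATCH = /[^/]+\/smoke\.ts$/;

// Hypothetical helper: keep the paths the pattern accepts and report the
// scenario directory that owns each smoke.ts.
function discoverScenarios(paths: string[]): string[] {
  return paths
    .filter((p) => SMOKE_TEST_MATCH.test(p))
    .map((p) => {
      const parts = p.split("/");
      return parts[parts.length - 2]; // directory containing smoke.ts
    });
}

const found = discoverScenarios([
  "cloudformation/scenarios/planx/smoke.ts",
  "cloudformation/scenarios/minute/smoke.ts",
  "cloudformation/scenarios/minute/template.yaml",
  "cloudformation/scenarios/planx/smoke.test.ts",
  "tests/smoke/runner.ts",
]);
console.log(found.join(", ")); // planx, minute
```

Because the pattern is not anchored at the start, a deeper file such as `scenarios/x/nested/smoke.ts` would also match; it is the project's testDir that keeps discovery inside cloudformation/scenarios.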

@chrisns

Implements every phase of _bmad-output/implementation-artifacts/tech-spec-scenario-regression-smoke-pack.md in a single deliverable. The original spec called for one PR per phase (8+ PRs); experience showed the dependency overlap made that worse for review, not better, so this squashes #226 / #227 / #228 / #229 / #230 / #231 / #232 / #233 / #235 into a single change.

What ships
==========

Phase 1a: runbook + config schema
- docs/smoke-test-account-setup.md: one-off manual procedure for vending the long-lived smoke-test AWS account, with the four required sections (Prerequisites / Procedure / Verification / Operational Notes). Per-step idempotency checks + inverses; ProtectISB role-creation canary + fallback branch (ADR-1); Bedrock model-access enablement + gotchas (legacy claude-3-haiku-20240307 retired, Nova body shape); service-quota targets; QuickSight decision; iterate-to-least-privilege protocol for the inline IAM policy.
- docs/smoke-test-account-config.yml: post-runbook state record schema.

Phase 1b: operator-executed account state
- Smoke account 464453619983 provisioned in NDX org under the fallback branch (ProtectISB canary failed; account moved to root with Restrictions SCP attached directly). AwsNuke SCP intentionally NOT attached (it blocks sts:AssumeRoleWithWebIdentity and we use CFN delete + retention lint, not aws-nuke).
- OIDC provider + InnovationSandbox-ndx-SmokeTestDeployRole created with 6h max-session-duration. Trust policy uses sub-pattern lock (`repo:co-cddo/ndx_try_aws_scenarios:*`) + aud condition; the repository_owner claim condition is omitted because it reproducibly breaks the assume even though the OIDC token contains the claim (verified via JWT decode in an investigation workflow that has since been deleted; see runbook Step 10).
- expected_scps reflects live state: Restrictions + FullAWSAccess.

Phase 2a: synth pipelines for missing scenarios
- New synth jobs in .github/workflows/deploy-blueprints.yml for planx and digital-planning-register (CDK -> template.yaml -> S3 via the existing isb-hub upload chain).
- bops-planning synth job lands in Phase 2b after the retention lint is justification-aware.
- ai-contact-centre: new "verify packaged CodeUri targets blueprints bucket" step catches a sam-package regression where --s3-bucket would silently land in the SAM default bucket.

Phase 2b: all-demo expansion + retention lint
- cloudformation/scenarios/all-demo/template.yaml expanded from 7 to 16 nested scenarios (Minute, FixMyStreet, AI Contact Centre, LocalGov IMS, Paperless-ngx, PlanX, Bops Planning, Simply Readable, Digital Planning Register). Umbrella parameters for credentials (GovUkPayApiKey, OSVectorTilesApiKey, DprImageUri, DprCouncilConfig) with overridable empty / sensible defaults; per-scenario URL + admin-credential Outputs surfaced.
- scripts/lint-retention-policies.sh: forbids DeletionPolicy=Retain / UpdateReplacePolicy=Retain / Properties.DeletionProtection=true / Properties.EnableDeletionProtection=true / Properties.FinalSnapshotIdentifier unless the resource carries a non-empty Metadata.Justification. Per-template cap (default 3) + global cap (default 10) so any one scenario can't pencil-whip retentions repo-wide.
- lint-committed-templates job in deploy-blueprints.yml runs the lint over hand-authored CFN templates.
- bops-planning's LogGroup keeps RemovalPolicy.RETAIN (deliberate debug-after-rollback) with a Metadata.Justification attached via cfnOptions; bops synth job re-enabled.

Phase 3: smoke rails
- playwright.config.ts: new 'smoke' project gated on PLAYWRIGHT_SUITE=smoke.
- tests/smoke/fixtures/cfn-outputs.ts: SDK-v3 DescribeStacks helper. Sensitive output values flow only via explicit sensitiveValue() accessor; toString / inspect / Symbol.toPrimitive emit REDACTED placeholder. Documents the CloudFormation-API limitation that Output Metadata.Sensitive opt-in isn't readable (regex is the sole signal).
- tests/smoke/fixtures/assertion-bar.ts: 17 AssertionBarRow entries populated.
- tests/smoke/fixtures/secure-form.ts: fillPassword wrapper redacts form-encoded passwords from Playwright trace.
- scripts/smoke.sh + .env.example: local + CI identical invocation.
- .github/workflows/smoke.yml: trigger matrix (PR-scoped / nightly cron / push-to-main / workflow_dispatch); scope decides full vs scoped from changed paths; global serial concurrency (no cancel-in-progress: cancelled runs leave orphan AWS state); configure-aws-credentials with role-duration-seconds=21600 (6h) to match the role's max-session-duration; pre-deploy state check with auto-recovery for stranded stacks; SCP drift check (excluding FullAWSAccess, fail-soft for first 7 detections); quarantine-expiry check; CFN events captured BEFORE teardown; teardown with 3x60s retry, gated on aws-creds outcome so we don't burn 3min retrying without credentials.
- .github/workflows/quarterly-audit.yml: 3-monthly tracking issue (spend, orphan sweep, deploy-role policy drift, SCP drift, Renovate liveness, ProtectISB-fallback revisit).
- .github/CODEOWNERS: smoke-pack sensitive paths require @chrisns review (until a maintainers team is provisioned).

Phase 4: 17 per-scenario smoke specs
- One spec per scenario covering the auth-mode pattern (admin-login / public / sso-skip / umbrella).
- Bug-informed feature flows cite the historical regression that informed each test:
  - fixmystreet: /reports requires bin/update-all-reports; /admin must reach the dashboard without 2FA redirect
  - planx: SPA boots free of domain-allowlist / Airbrake errors; Hasura native /v1/version responds (Caddy elimination)
  - minute: magic-link sets cookie; same-origin fetch() works post-auth; /api/proxy/healthcheck reaches the backend (catches the basic-auth-breaks-fetch() regression and the ALB /api/* interception regression)
  - localgov-ims: Windows IIS multi-site routing; AdminPassword must not be the literal {{resolve:...}} token (catches the Lambda-custom-resource regression)
  - localgov-drupal: ndx_aws_ai module boots without Bedrock AccessDeniedException
  - simply-readable: SPA loads, credentials non-empty + non-token; reload produces no 5xx responses (catches BlueprintsBucketName mis-wire)
  - ai-contact-centre: PSTN claim matches UK toll-free / landline OR US toll-free (catches international fallback regression)
  - paperless-ngx: /documents view + /api/documents/ respond (S3 Files mount integrity)
  - bops-planning: post-login URL is NOT on the Applicants port (catches the routing.rb single-tenant override regression)
  - digital-planning-register: register loads with planning markers
  - public-Lambda scenarios (foi-redaction, planning-ai, smart-car-park, text-to-speech, council-chatbot): FunctionURL not-5xx + not-403 (catches the InvokeFunctionUrl + InvokeFunction dual-permission regression); council-chatbot uses POST not GET so the test isn't vacuous against a POST-only Lambda
  - quicksight-dashboard: landing + outputs only (sso-skip per auth-mode categorisation)
  - all-demo: discovers Output keys dynamically by parsing the committed template at test time; asserts every Output present, non-empty, and not the {{resolve:...}} literal; URL outputs match https?://

Phase 5: pin every floating image tag
- 10 own-GHCR images (fixmystreet, localgov_drupal, minute_*, planx-*, dpr) pinned to sha-<7chars>@sha256:<digest>.
- 2 upstream images (docker.io/apache/tika 3.3.0.0-full, ghcr.io/paperless-ngx/paperless-ngx 2.9) pinned to <tag>@sha256:<digest>.
- Removed legacy cloudformation/scenarios/minute/template.json (stale ECR references; nothing in the repo referenced it).

Phase 6: Renovate adoption (replaces Dependabot)
- renovate.json: 6 group rules per the spec's pinning-strategy table; customManagers regex matching the new pin shape; osvVulnerabilityAlerts + security-priority group; pinDigests scoped to official actions/* + aws-actions/* only so the first run doesn't firehose; per-PR limits capped at 6.
- .github/workflows/renovate.yml: twice-daily + workflow_dispatch. Action pinned by digest to v46.1.14.
- .github/dependabot.yml deleted.

Operator follow-ups (not in this PR)
====================================
- NAP-548: migrate scenarios off legacy claude-3-haiku-20240307
- NAP-549: revisit ProtectISB fallback by 2026-11-12
- NAP-550: service-quota Console requests
- NAP-551: QuickSight subscription decision
- NAP-552: mint RENOVATE_TOKEN repo secret
- NAP-554: close in-flight Dependabot PRs
- NAP-555: T2b.5b + T3.8 end-to-end verifications

Closes: #226, #227, #228, #229, #230, #231, #232, #233, #235.
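The all-demo assertion described in Phase 4 reduces to a per-key check once the expected Output keys (from the committed template) and the live values (from DescribeStacks) are in hand. A sketch under those assumptions: `checkOutputs` and the example keys are hypothetical, and so is the rule that URL outputs are the keys ending in `Url`, since the description doesn't say how URL outputs are identified.

```typescript
// Dynamic references that CloudFormation should have resolved before output.
const RESOLVE_LITERAL = /\{\{resolve:[^}]*\}\}/;
// Assumption: URL outputs are the keys ending in "Url"/"URL".
const URL_KEY = /Url$/i;
const URL_VALUE = /^https?:\/\//;

function checkOutputs(
  expectedKeys: string[], // read from the committed all-demo template
  live: Record<string, string | undefined>, // from DescribeStacks
): string[] {
  const problems: string[] = [];
  for (const key of expectedKeys) {
    const value = live[key];
    if (value === undefined) {
      problems.push(`${key}: missing`);
    } else if (value.trim() === "") {
      problems.push(`${key}: empty`);
    } else if (RESOLVE_LITERAL.test(value)) {
      problems.push(`${key}: unresolved dynamic reference`);
    } else if (URL_KEY.test(key) && !URL_VALUE.test(value)) {
      problems.push(`${key}: not an http(s) URL`);
    }
  }
  return problems;
}

const problems = checkOutputs(
  ["MinuteUrl", "LocalGovImsAdminPassword", "PlanxUrl"],
  {
    MinuteUrl: "https://minute.example",
    LocalGovImsAdminPassword: "{{resolve:secretsmanager:abc}}",
  },
);
console.log(problems.join("\n"));
// LocalGovImsAdminPassword: unresolved dynamic reference
// PlanxUrl: missing
```

Deriving `expectedKeys` from the committed template, rather than hard-coding them, is what keeps the check from going stale as nested scenarios are added.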

@chrisns

… issues

Smoke pack:
- 17 per-scenario .spec.ts files collapsed to 17 cloudformation/scenarios/*/smoke.ts configs (avg 25 lines, down from ~70) driven by a single tests/smoke/runner.ts
- adminLogin() helper + get()/getSecret() accessors absorb login-form boilerplate
- assertion-bar.ts removed; quarantine state lives in each scenario's smoke.ts
- Playwright smoke project's testDir points at cloudformation/scenarios with testMatch /[^/]+\/smoke\.ts$/ so new scenarios auto-discover

Workflows (1255 → 601 lines):
- smoke.yml 467 → 204: scope decision, pre-deploy state, SCP drift, capture-events, teardown extracted to scripts/smoke-*.sh
- deploy-blueprints.yml 726 → 354: 9 per-scenario synth jobs collapsed to a single matrix; CDK strip + SAM-bucket verify extracted to scripts/
- renovate.yml 62 → 43: narrative stripped (PAT kept per operator decision)

Pre-existing CI fixes:
- LocalGov IMS: dropped minLength:1 on GovUkPayApiKey so empty values are accepted (smoke deploys don't have a real key); doc clarified
- Retention lint: added DeletionPolicy/UpdateReplacePolicy=Snapshot to the rule set so the central lint matches the prior inline check

CODEOWNERS: @co-cddo/ndx (was @chrisns); broadened patterns to cover the new scripts and the per-scenario smoke.ts files.

Operator note: IsbHubStack was UPDATE_ROLLBACK_FAILED for 12 days; recovered via continue-update-rollback. The 6 orphan StackSets (bops-planning, digital-planning-register, fixmystreet, minute, paperless-ngx, planx) were imported back into IsbHubStack via change-set-type IMPORT; IDs preserved, no ISB lease bindings broken.
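The scope decision that moved into scripts/smoke-*.sh picks a full or scoped run from the changed paths. The exact rules aren't spelled out here, so the TypeScript sketch below is an assumption throughout: a change under a single scenario directory scopes the run to that scenario, all-demo or any shared path forces a full run, and Markdown-only changes skip the run. `decideScope` is hypothetical.

```typescript
type Scope =
  | { mode: "full" }
  | { mode: "scoped"; scenarios: string[] }
  | { mode: "none" };

const SCENARIO_DIR = /^cloudformation\/scenarios\/([^/]+)\//;
// Assumption: Markdown cannot change a deploy. docs/*.yml is deliberately NOT
// ignored, because smoke.yml reads docs/smoke-test-account-config.yml.
const IGNORED = /\.md$/;

function decideScope(changedPaths: string[]): Scope {
  const scenarios = new Set<string>();
  for (const path of changedPaths) {
    if (IGNORED.test(path)) continue;
    const m = SCENARIO_DIR.exec(path);
    // Shared code (runner, scripts, workflows) or the umbrella: test everything.
    if (m === null || m[1] === "all-demo") return { mode: "full" };
    scenarios.add(m[1]);
  }
  if (scenarios.size === 0) return { mode: "none" };
  return { mode: "scoped", scenarios: Array.from(scenarios).sort() };
}

console.log(JSON.stringify(decideScope([
  "cloudformation/scenarios/planx/smoke.ts",
  "cloudformation/scenarios/minute/template.yaml",
])));
// {"mode":"scoped","scenarios":["minute","planx"]}
```

Failing towards "full" whenever a path isn't recognised keeps the cheap path (scoped) from ever skipping a change that could affect every scenario.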

…leanup works

The deploy role's iam:DeleteRolePolicy was gated to arn:aws:iam::*:role/InnovationSandbox-ndx-*, but scenario templates use their own role naming (ndx-try-foi-role-*, etc.). Every smoke teardown left those roles stranded, putting the all-demo stack in DELETE_FAILED and poisoning the next run via stale AppRegistry applications. Resource broadened to arn:aws:iam::*:role/*. SCPs still constrain what the deploy role can actually do in practice; this just makes its identity-based policy permissive enough to manage scenario IAM. Live policy already updated; this commit syncs the runbook.

- smoke-capture-events.sh: nested-stack PhysicalResourceId is a full CloudFormation ARN with colons, which the artefact uploader rejects. Strip it to the short stack name and sanitise the rest. The prior recursion into nested stacks introduced the broken filenames; without this fix, the artefact bundle never uploads when there are nested stacks (= every smoke failure).
- runbook IAM: replace efs:* with elasticfilesystem:* (efs isn't a real IAM action prefix), drop aurora:* (covered by rds:*), and add servicediscovery:* (Minute's PrivateDnsNamespace creation needs it). Live policy already updated.

Pre-existing issue not fixed yet: VPC quota in the smoke account is 5 (AWS default) but all-demo needs ~9 simultaneous VPCs. Service quota increase to 20 requested; AWS-side ticket pending.
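The filename fix in smoke-capture-events.sh amounts to: pull the stack name out of the nested stack's ARN, then replace anything the uploader refuses. This is sketched in TypeScript rather than the script's shell. `eventsFileName` and the `.events.json` suffix are hypothetical, and the rejected character set is an assumption based on what actions/upload-artifact documents as invalid (including the colon), plus path separators so the result stays a single file name.

```typescript
// Characters assumed to be rejected by the artifact uploader, plus "\" and "/"
// so the result is always a single path segment.
const UNSAFE_CHARS = /[":<>|*?\r\n\\\/]/g;

// Nested-stack PhysicalResourceId looks like:
//   arn:aws:cloudformation:<region>:<account>:stack/<stack-name>/<uuid>
const STACK_ARN = /^arn:aws[^:]*:cloudformation:[^:]*:[^:]*:stack\/([^/]+)\//;

function eventsFileName(physicalResourceId: string): string {
  const m = STACK_ARN.exec(physicalResourceId);
  const name = m !== null ? m[1] : physicalResourceId; // fall back to the raw id
  return `${name.replace(UNSAFE_CHARS, "_")}.events.json`;
}

console.log(eventsFileName(
  "arn:aws:cloudformation:eu-west-2:123456789012:stack/all-demo-PlanxStack-1ABC/0f1e2d3c-aaaa-bbbb-cccc-000000000000",
)); // all-demo-PlanxStack-1ABC.events.json
```

Sanitising after extraction, rather than instead of it, keeps the file names readable when browsing the artefact bundle.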

…ck to *

Smoke deploys uncovered missing services after VPC quota cleared:
- wisdom:* (Amazon Q in Connect: ai-contact-centre)
- s3vectors:* (S3 Vector Buckets: council-chatbot, ai-contact-centre KBs)
- cognito-idp:* (simply-readable user pool)
- cognito-identity:* (paired with idp)
- iot:* (smart-car-park IoT Things)
- bedrock:* (was an explicit-action list; Guardrails not covered)

Live policy already updated.

Curating service-by-service IAM allow-lists for the SmokeTestDeployRole proved fragile — every new scenario surfaced another missing permission (wisdom, s3vectors, appsync, s3files, iot, cognito-*, servicediscovery, elasticfilesystem...). The inline policy can't ergonomically keep up. PowerUserAccess covers every AWS service except IAM. The custom inline SmokeTestDeployInline still constrains IAM specifically. The Restrictions SCP attached to the smoke account remains in force as the outer guard. Net effect: same authorisation envelope, far less maintenance. Live role updated.

Connect releases a phone number on stack-delete; the released number is held in a 30-day cooldown and consumes UK DID claim quota during that window. Repeated deploy/teardown cycles in the long-lived smoke account would exhaust the quota in days. ai-contact-centre/template.yaml gains two opt-in parameters:
- ExistingPhoneNumberArn: ARN of a pre-claimed number, OR '' to claim a new DID
- ExistingPhoneNumber: the dialable E.164 string for that ARN

Both default to '' → ClaimNewPhoneNumber=true → behaviour identical to today for ISB pool deploys (StackSet doesn't override the defaults). When set, the GeoNumber resource and the GeoFlowAssoc custom resource are skipped; the ExistingPhoneNumber is surfaced as the PstnNumber output for the smoke regex check. The all-demo umbrella plumbs both values through (AiccExistingPhoneNumberArn, AiccExistingPhoneNumber). smoke.yml reads them from docs/smoke-test-account-config.yml and passes them via --parameter-overrides. The config holds placeholders today; ai-contact-centre's smoke spec is quarantined until the operator completes runbook Step 13 (one-time: create a holder Connect instance, claim a number against it, record the values). That step needs to run as the SmokeTestDeployRole because the Restrictions SCP blocks connect:CreateInstance from non-InnovationSandbox-ndx-* principals.
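Quarantine lives in each scenario's smoke.ts, and smoke.yml carries a quarantine-expiry check. One way to model that, as a hypothetical sketch: the `SmokeConfig` / `Quarantine` shapes and the ISO `until` field are assumptions, not the repo's schema. Here an active quarantine would presumably skip the scenario, while an expired one is reported so it cannot linger unnoticed.

```typescript
// Hypothetical shapes; the real smoke.ts config schema may differ.
interface Quarantine {
  reason: string;
  until: string; // ISO-8601 timestamp
}
interface SmokeConfig {
  scenario: string;
  quarantine?: Quarantine;
}

type QuarantineState = "none" | "active" | "expired";

function quarantineState(cfg: SmokeConfig, now: Date): QuarantineState {
  if (cfg.quarantine === undefined) return "none";
  const until = Date.parse(cfg.quarantine.until);
  if (Number.isNaN(until)) {
    throw new Error(`${cfg.scenario}: unparseable quarantine date "${cfg.quarantine.until}"`);
  }
  return now.getTime() < until ? "active" : "expired";
}

const aicc: SmokeConfig = {
  scenario: "ai-contact-centre",
  quarantine: { reason: "waiting on runbook Step 13", until: "2026-06-01T00:00:00Z" },
};
console.log(quarantineState(aicc, new Date("2026-05-14T00:00:00Z"))); // active
console.log(quarantineState(aicc, new Date("2026-07-01T00:00:00Z"))); // expired
```

Throwing on an unparseable date is a deliberate choice in this sketch: a typo in the expiry should fail loudly rather than quarantine a scenario forever.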

A single flaky scenario (Planx Hasura ECS circuit-breaker, etc.) shouldn't unwind 16 healthy nested stacks — Aurora cold-start, ALB warm-up and similar make full rebuilds slow and expensive. aws cloudformation deploy now passes --disable-rollback. On CREATE failure the umbrella stays in CREATE_FAILED with successful child stacks intact. The next run's pre-deploy state check recognises CREATE_FAILED and proceeds to update-stack against the same name; CFN's update-stack handles CREATE_FAILED stacks (since ~2020) by replacing only the failed resources. Matches the established fix-forward pattern (memory:feedback_cfn_fix_forward_failed_stack).

…d state)

Pairs with --disable-rollback. Previously the teardown step ran on every 'always()' path including failures, which wiped the CREATE_FAILED state we want to keep so the next run can update-stack-fix-forward instead of rebuilding 16 healthy nested stacks. Now: teardown runs when the job succeeded OR when the event is the nightly schedule. PR failures leave the stack in CREATE_FAILED; the next push picks up where the last attempt fell over. Nightly cron still cleans up so the smoke account doesn't accumulate debris over weeks.
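The new teardown condition is a small truth table. As a sketch, assuming the earlier gate on the aws-creds step outcome (from the original smoke.yml description) still sits alongside it: `shouldTeardown` is hypothetical, since the workflow expresses this as an `if:` condition. `"schedule"` is the event name GitHub Actions uses for cron triggers.

```typescript
type JobStatus = "success" | "failure" | "cancelled";

interface TeardownInputs {
  jobStatus: JobStatus; // status of the smoke job so far
  eventName: string; // e.g. "pull_request", "push", "schedule"
  awsCredsOutcome: string; // outcome of the credentials step
}

function shouldTeardown(i: TeardownInputs): boolean {
  // Without credentials every delete call fails, so retrying only burns time.
  if (i.awsCredsOutcome !== "success") return false;
  // Keep failed stacks for fix-forward, except on the nightly clean-up run.
  return i.jobStatus === "success" || i.eventName === "schedule";
}

console.log(shouldTeardown({ jobStatus: "failure", eventName: "pull_request", awsCredsOutcome: "success" })); // false
console.log(shouldTeardown({ jobStatus: "failure", eventName: "schedule", awsCredsOutcome: "success" })); // true
```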

LogGroups with fixed names declared in CFN race the AWS-Lambda implicit LogGroup creator. When the explicit one fails to clean up (rollback gap, race, etc.) the orphan blocks the next deploy with AlreadyExists. With fix-forward (--disable-rollback) we re-attempt against the same name on every run, so the orphans recur. Proactive prune in pre-deploy: best-effort delete of the known set (/ndx-bops/production and the ndx-* Lambda log groups). Failures are tolerated (most runs the LG won't exist). Adds ~3s to pre-deploy on account of the 6 sequential calls.

CREATE_FAILED was the only fix-forward path. Once the umbrella stack exists, subsequent failures land in UPDATE_FAILED, not CREATE_FAILED. CFN's update-stack accepts both as starting states, so we should too.
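With UPDATE_FAILED added, the pre-deploy decision is essentially a mapping from the umbrella stack's status to an action. The sketch below is a hypothetical TypeScript rendering, not scripts/smoke-pre-deploy-state.sh: the action names are invented, and the handling of states not mentioned here (in-progress states, ROLLBACK_*, DELETE_FAILED) is an assumption. It relies on CloudFormation's documented behaviour that a ROLLBACK_COMPLETE stack can only be deleted, and treats ROLLBACK_FAILED the same way.

```typescript
type PreDeployAction =
  | "create"
  | "update"
  | "fix-forward"
  | "wait"
  | "delete-then-create"
  | "continue-rollback"
  | "retry-delete";

// Hypothetical mapping from the umbrella stack's status (undefined = no stack).
function preDeployAction(status: string | undefined): PreDeployAction {
  if (status === undefined || status === "DELETE_COMPLETE") return "create";
  // Both failure states are accepted starting points for update-stack.
  if (status === "CREATE_FAILED" || status === "UPDATE_FAILED") return "fix-forward";
  if (status.endsWith("_IN_PROGRESS")) return "wait";
  // Assumed delete-only states.
  if (status === "ROLLBACK_COMPLETE" || status === "ROLLBACK_FAILED") return "delete-then-create";
  if (status === "UPDATE_ROLLBACK_FAILED") return "continue-rollback";
  if (status === "DELETE_FAILED") return "retry-delete";
  return "update"; // *_COMPLETE states, e.g. UPDATE_COMPLETE
}

for (const s of ["CREATE_FAILED", "UPDATE_FAILED", "ROLLBACK_COMPLETE", "UPDATE_COMPLETE"]) {
  console.log(s, "->", preDeployAction(s));
}
```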

All four planx custom images (hasura, api, editor, sharedb) are amd64-only. Fargate task definitions specifying ARM64 fail with CannotPullContainerError: 'image Manifest does not contain descriptor matching platform linux/arm64 v8'. ARM64 was selected for cost (~20% cheaper) but the image pipeline produces single-arch amd64 builds; the manifest list has no arm64 entry. Restoring ARM64 needs docker buildx multi-arch builds in the planx image CI.

The smoke account's Restrictions SCP blocks connect:CreateInstance from any principal whose name doesn't start with InnovationSandbox-ndx-*, so operator-side AWS sessions (SSO admin etc.) can't run the setup directly. This workflow runs as the SmokeTestDeployRole (same OIDC trust as smoke.yml) and is therefore allowed. The script is idempotent: re-running reuses an existing holder + number. Manually triggered (workflow_dispatch). After it runs, paste the printed values into docs/smoke-test-account-config.yml so every subsequent smoke deploy reuses the same number — avoids the 30-day release-cooldown that exhausts UK DID claim quota.

…umberArn)

UK DID claim quota is 5/30 days. Sustained smoke runs would exhaust it. Pass a non-empty placeholder ExistingPhoneNumberArn so the AICC template's ClaimNewPhoneNumber condition is false: no GeoNumber, no GeoFlowAssoc, no release-on-teardown. AICC's ConnectInstance + Lex + Wisdom + KB + companion UI still deploy for real; smoke checks the companion URL HTTP status + that PstnNumber matches a +44 format string ("+442012345678" satisfies the regex). Holder Connect instance deleted (the 1 quota slot is now AICC's). Reverts the DeployAiContactCentre gate that was a stopgap while the holder existed.
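Two small pieces of logic carry the dummy-number approach: the template condition (an empty ExistingPhoneNumberArn means claim a new DID) and the smoke regex on PstnNumber. Both are sketched below. The condition mirrors the description above; the regex is a hypothetical stand-in for "UK toll-free / landline OR US toll-free", not the repo's actual pattern.

```typescript
// Mirrors the ClaimNewPhoneNumber condition: '' means "claim a new DID".
function claimsNewNumber(existingPhoneNumberArn: string): boolean {
  return existingPhoneNumberArn === "";
}

// Hypothetical pattern: UK freephone (+44 800/808), UK numbers starting
// +44 1/2/3, or US toll-free (+1 800/888/877/866/855/844/833).
const PSTN_PATTERN =
  /^\+44(?:80[08]\d{6,7}|[123]\d{8,9})$|^\+1(?:800|888|877|866|855|844|833)\d{7}$/;

// Smoke passes a non-empty dummy ARN, so no number is claimed...
console.log(claimsNewNumber("DUMMY")); // false
// ...and the placeholder PstnNumber still satisfies the format check.
console.log(PSTN_PATTERN.test("+442012345678")); // true
```

The trade-off is explicit: the format check still runs, but it no longer proves that a real number was claimed and routed.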

UK DID claim quota is 5/30 days, so smoke never claims a real number. The holder/pre-claim flow that smoke-pstn-setup.{yml,sh} implemented is no longer used: a non-empty DUMMY ExistingPhoneNumberArn keeps AICC's ClaimNewPhoneNumber condition false, so the template skips GeoNumber + GeoFlowAssoc entirely. AICC's ConnectInstance + Lex + Wisdom + KB + companion UI still deploy for real and exercise the rest of the scenario.
- delete .github/workflows/smoke-pstn-setup.yml
- delete scripts/smoke-pstn-setup.sh
- runbook Step 13 rewritten to describe the dummy-PSTN approach (no operator setup required)
- config comments rewritten to reflect the new model

S3 templates carry several patches applied by hand during smoke debugging yesterday. Without these in CDK source / committed templates, the next deploy-blueprints run would silently revert every fix for the orphan-name collision modes.

Sources brought in line with S3:
- paperless-ngx: bucket name `ndx-try-paperless-archive-v2-…` (orphan v1 FS in smoke account holds the un-suffixed name hostage)
- bops-planning: 8 fixed names suffixed with `-v2-` (cluster, ALB, services, roles, VPC, SGs, LG, bucket-empty role); orphan stack stuck in UPDATE_ROLLBACK_COMPLETE_CLEANUP holds the un-suffixed set
- council-chatbot, foi-redaction, planning-ai, text-to-speech, quicksight-dashboard, smart-car-park: LogGroupName suffixed with `-${AWS::StackName}` so an orphan LG from a previous rollback doesn't collide on AlreadyExists
- minute, fixmystreet, localgov-drupal, localgov-ims, planx, digital-planning-register, paperless-ngx: same stack-name suffix on the CDK LogGroup name + its console-link CloudWatchLogsUrl output

Stale-comment cleanups:
- scripts/smoke-pre-deploy-state.sh: drop references to --disable-rollback (removed in 553a556)
- cloudformation/scenarios/all-demo/smoke.ts: drop dead Condition-skip peek-loop (every Condition was removed in eab2a02)
- cloudformation/scenarios/{ai-contact-centre,all-demo}/template.yaml: rewrite ExistingPhoneNumberArn param descriptions; drop "holder" sentence on GeoFlowAssoc (holder doesn't exist; smoke passes DUMMY)
- .github/workflows/smoke.yml: tighten the SmokeRun-tag comment
- tests/smoke/fixtures/cfn-outputs.ts: explain why `Login` is in the SENSITIVE_KEY_PATTERN (LoginUrl can carry pre-auth tokens)
- tests/smoke/fixtures/secure-form.ts: drop the 10s consumed-poll (smoke specs always click submit AFTER fillPassword returns, so the original unroute-before-submit semantics couldn't redact anything; leave the routeHandler armed instead so the first post-fill POST is rewritten)

bops-planning-stack.test.ts updated to expect the new -v2 names.

…iants

The .json was an accidental check-in from yesterday's CDK exploration. The deployed template is template.yaml; the .json was dead.

The route handler rewrote the form-encoded POST body to "REDACTED-<hash>" to keep cleartext out of the Playwright trace, but `route.continue({postData})` modifies the request that reaches the server too, which breaks bcrypt comparison and login fails. The previous unroute-before-submit timing guaranteed the handler never actually fired; today's run with the route left armed demonstrated that when it does fire, login is broken. Simplify to a plain `page.fill` wrapper. The SensitiveValue contract via .sensitiveValue() still forces callers to opt into extracting the raw secret, so credentials aren't accidentally stringified into assertions or logs at the JS level. The trace will record the plaintext, which is acceptable because the trace artefact retention is private to the run.
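The SensitiveValue contract referred to here (every implicit string conversion yields a placeholder; only an explicit `.sensitiveValue()` call returns the secret) can be sketched as a small class. Assumptions: the key pattern is illustrative (only `Login` is confirmed to be part of SENSITIVE_KEY_PATTERN), `wrapOutput` is hypothetical, `toJSON` is an extra guard beyond the toString / inspect / Symbol.toPrimitive trio described, and Node's console output is hooked through the `Symbol.for("nodejs.util.inspect.custom")` symbol.

```typescript
// Illustrative key pattern; "Login" is included because a LoginUrl can carry
// pre-auth tokens.
const SENSITIVE_KEY_PATTERN = /Password|Secret|Token|ApiKey|Login/i;

// Node's console.log / util.inspect look this symbol up for custom output.
const INSPECT: unique symbol = Symbol.for("nodejs.util.inspect.custom");
const PLACEHOLDER = "REDACTED";

class SensitiveValue {
  readonly #raw: string;

  constructor(raw: string) {
    this.#raw = raw;
  }

  // The only way to get the secret out: callers must opt in by name.
  sensitiveValue(): string {
    return this.#raw;
  }

  // Every implicit conversion yields the placeholder instead.
  toString(): string {
    return PLACEHOLDER;
  }
  toJSON(): string {
    return PLACEHOLDER;
  }
  [Symbol.toPrimitive](): string {
    return PLACEHOLDER;
  }
  [INSPECT](): string {
    return PLACEHOLDER;
  }
}

// Hypothetical: wrap a stack output only when its key looks sensitive.
function wrapOutput(key: string, value: string): string | SensitiveValue {
  return SENSITIVE_KEY_PATTERN.test(key) ? new SensitiveValue(value) : value;
}

const adminPassword = wrapOutput("AdminPassword", "hunter2");
console.log(`password is ${adminPassword}`); // password is REDACTED
```

This protects against accidental stringification in assertions and logs only; as noted above, it cannot stop the browser-level Playwright trace from recording what was typed into the form.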

…AC5.1)

- tests/smoke/fixtures/assertion-bar.ts: 17 AssertionBarRow entries (one per scenario incl. all-demo umbrella) indexing what each spec asserts and the historical regression that motivated it. Smoke specs remain the source of truth; this is the reviewer-facing index.
- .github/workflows/smoke.yml: scope job emits `override=true` when the PR carries the `smoke-override-emergency` label; smoke job's `if:` skips when override is active, so the gate clears. CODEOWNERS approval is enforced by repo branch-protection (out of band).
- .github/workflows/smoke-override-followup.yml: hourly cron opens a `smoke-override-followup` issue 48h after the merge so the underlying regression doesn't get forgotten. Idempotent on PR number.
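The follow-up workflow's selection rule (48 hours after merge, at most one issue per PR) can be stated as a filter. A hypothetical sketch: `prsNeedingFollowup`, the input shapes and the PR numbers are assumptions, and it leaves out how merged override PRs and existing `smoke-override-followup` issues are actually looked up.

```typescript
interface MergedOverridePr {
  number: number;
  mergedAt: string; // ISO-8601 merge time
}

const FOLLOWUP_DELAY_MS = 48 * 60 * 60 * 1000; // 48 hours

// PRs whose 48h window has passed and that have no follow-up issue yet.
// Excluding already-filed PRs is what makes the hourly re-run safe.
function prsNeedingFollowup(
  prs: MergedOverridePr[],
  alreadyFiled: Set<number>,
  now: Date,
): number[] {
  return prs
    .filter((pr) => !alreadyFiled.has(pr.number))
    .filter((pr) => now.getTime() - Date.parse(pr.mergedAt) >= FOLLOWUP_DELAY_MS)
    .map((pr) => pr.number);
}

const due = prsNeedingFollowup(
  [
    { number: 101, mergedAt: "2026-05-10T00:00:00Z" }, // 4 days ago: due
    { number: 102, mergedAt: "2026-05-13T12:00:00Z" }, // 12 hours ago: not yet
    { number: 103, mergedAt: "2026-05-01T00:00:00Z" }, // already has an issue
  ],
  new Set([103]),
  new Date("2026-05-14T00:00:00Z"),
);
console.log(due.join(",")); // 101
```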

Each scenario now asserts something beyond "landing returned 200 / login form submitted". Specifically, the new assertions surface bug-shaped regressions that the old surface-only probes would have missed.

Scenario depth-adds (one bullet per spec; the spec is the source of truth):
- ai-contact-centre: companion SPA renders the seeded Aldershire welcome + the Ask/Call-via-browser entry points (Connect bootstrap + SPA assets)
- bops-planning: post-login dashboard has the 5-tab nav; /planning_applications/all lists seeded applications by numeric-id href (seed_sample_data.rb regression)
- council-chatbot: parse the JSON response, assert success/citations/model fields, then a 2nd POST with the returned sessionId round-trips (session store regression)
- digital-planning-register: council selector renders > 0 council links; drilling into a council shows "Recently published applications" list with application-ref links (planning-data API regression)
- fixmystreet: /reports dashboard shows > 0 reports across categories (seed/DB/cron); /admin/reports moderation queue has report links
- foi-redaction: POST sample text with full PII set; assert redactionCount > 0 AND original strings are gone AND NAME+EMAIL entities detected
- localgov-drupal: /admin/modules confirms ndx_aws_ai + ndx_council_generator modules enabled (Bedrock IAM regression silently disables them); /admin/content has seeded demo content
- localgov-ims: post-login dashboard shows IIS nav (Dashboard/Transactions/Payment/Users); /Payment/Create renders the GOV.UK Pay payment basket form
- minute: landing shows "AI transcription and drafting service"; /templates lists the seeded Document + Form template types
- paperless-ngx: /api/documents/?page=1 parsed; count > 0, first doc has > 50 chars of OCR content, at least one doc has an "AI summary" note (Bedrock post-consume hook regression)
- planning-ai: POST {useSample:true} triggers full Textract+Bedrock pipeline; assert wordCount > 100, OCR confidence > 80%, AI extraction populates applicationRef + summary + classification
- planx: post-login editor renders team/flow links or > 5 interactive elements (seed migrations + API/Postgres reachability)
- quicksight-dashboard: portal URL returns < 400, body mentions QuickSight/Sign in; data bucket CSV path responds 200 or 403 (existence, not anon access)
- simply-readable: app redirects to Cognito hosted UI with username + password inputs visible; no 5xx on reload (BlueprintsBucketName regression)
- smart-car-park: dashboard shows Total Spaces > 0 (DynamoDB seed + aggregation); zone breakdown renders 3+ zone headings
- text-to-speech: POST {text,voice}, fetch returned signed audioUrl, assert content-type audio/mpeg + body > 1KB + MP3 magic byte at offset 0

Token-redaction in fillPassword stayed disabled (AC3.12 deviation documented earlier). secure-form.ts already simplified.
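The text-to-speech check (content type, body over 1KB, MP3 signature at offset 0) can be sketched as a pure function over the downloaded bytes. It accepts either an ID3v2 tag (`ID3`) or an MPEG frame sync (0xFF followed by a byte whose top three bits are set) at offset 0. Accepting `audio/mp3` alongside `audio/mpeg` is an assumption prompted by the later "TTS audio/mp3" fix, and `looksLikeMp3` is hypothetical.

```typescript
// Hypothetical helper for the TTS assertion: content type, size, signature.
function looksLikeMp3(contentType: string, body: Uint8Array): boolean {
  if (!/^audio\/(mpeg|mp3)\b/i.test(contentType)) return false;
  if (body.length <= 1024) return false; // must be larger than 1KB
  // ID3v2 tag: the ASCII bytes "ID3" at offset 0.
  const id3 = body[0] === 0x49 && body[1] === 0x44 && body[2] === 0x33;
  // MPEG frame sync: 0xFF then a byte whose top three bits are all set.
  const frameSync = body[0] === 0xff && (body[1] & 0xe0) === 0xe0;
  return id3 || frameSync;
}

const frame = new Uint8Array(2048);
frame[0] = 0xff;
frame[1] = 0xfb; // typical MPEG-1 Layer III header byte
console.log(looksLikeMp3("audio/mpeg", frame)); // true
console.log(looksLikeMp3("text/html", frame)); // false
```

Checking the bytes as well as the header catches the case where a signed URL returns an error page with a plausible content type.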

…, Planx SPA-bootstrap

The CI workflow's inline Dockerfile set VITE_APP_AIRBRAKE_PROJECT_ID=0 and VITE_APP_AIRBRAKE_PROJECT_KEY=unused. Both are truthy strings, so upstream's `hasConfig` check passes; then `new Notifier({projectId: 0, projectKey: "unused"})` is called, Airbrake requires projectId to be truthy (0 is falsy), and it throws "projectId and projectKey are required", blanking the editor SPA. Fix: stop passing the env vars, and apply a build-time overlay that replaces airbrake.ts with an unconditional no-op stub so the import path is safe regardless of upstream drift. Same pattern as the existing validateDomain overlay, now also applied in CI (previously only in the local build.sh). Also fixes VITE_APP_HASURA_URL drift between CI (/hasura/v1/graphql) and the local Dockerfile (/v1/graphql): CloudFront routes /v1/* and /console/* directly to Hasura, with no /hasura prefix.
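The failure chain can be reproduced in isolation: the env values are non-empty strings, so a presence check passes, but once the ID becomes the number 0 a truthiness check rejects it. The names below (`hasConfig`, `validateNotifierConfig`) are hypothetical stand-ins for the upstream code, not its actual source.

```typescript
// Values the CI Dockerfile used to inject (both non-empty strings).
const env: Record<string, string> = {
  VITE_APP_AIRBRAKE_PROJECT_ID: "0",
  VITE_APP_AIRBRAKE_PROJECT_KEY: "unused",
};

// Presence-style guard: non-empty strings are truthy, so this is true.
const hasConfig = Boolean(env.VITE_APP_AIRBRAKE_PROJECT_ID && env.VITE_APP_AIRBRAKE_PROJECT_KEY);

// Notifier-style validation on the numeric ID: 0 is falsy, so this throws.
function validateNotifierConfig(projectId: number, projectKey: string): void {
  if (!projectId || !projectKey) {
    throw new Error("projectId and projectKey are required");
  }
}

let bootError = "";
if (hasConfig) {
  try {
    validateNotifierConfig(Number(env.VITE_APP_AIRBRAKE_PROJECT_ID), env.VITE_APP_AIRBRAKE_PROJECT_KEY);
  } catch (e) {
    bootError = (e as Error).message;
  }
}
console.log(hasConfig, bootError); // true projectId and projectKey are required
```

Removing the env vars and stubbing airbrake.ts sidesteps both checks, rather than hunting for a "safe" dummy ID that satisfies one without tripping the other.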

CDK pin update for the new editor image (builds without VITE_APP_AIRBRAKE_* and with the airbrake.ts no-op overlay). Smoke spec tightened: previous version only asserted that the SPA bundle was served by CloudFront because the React tree was crashing in init. Now require the editor dashboard ("Select a team" heading + "My teams" section + at least one team card link) to render, and fail if the Airbrake bootstrap error message appears in the browser console.

chrisns added 2 commits May 13, 2026 09:28

chrisns had a problem deploying to smoke-test-deploy May 13, 2026 16:07 — with GitHub Actions Failure

smoke: recurse into nested stacks when capturing CFN events

4678d93

chrisns had a problem deploying to smoke-test-deploy May 14, 2026 08:34 — with GitHub Actions Failure

chrisns had a problem deploying to smoke-test-deploy May 14, 2026 08:48 — with GitHub Actions Failure

chrisns had a problem deploying to smoke-test-deploy May 14, 2026 09:05 — with GitHub Actions Failure

chrisns had a problem deploying to smoke-test-deploy May 14, 2026 09:25 — with GitHub Actions Failure

chrisns had a problem deploying to smoke-test-deploy May 14, 2026 09:49 — with GitHub Actions Failure

chrisns had a problem deploying to smoke-test-deploy May 14, 2026 10:04 — with GitHub Actions Failure

chrisns had a problem deploying to smoke-test-deploy May 14, 2026 10:55 — with GitHub Actions Failure

chrisns had a problem deploying to smoke-test-deploy May 14, 2026 11:03 — with GitHub Actions Failure

chrisns had a problem deploying to smoke-test-deploy May 14, 2026 11:33 — with GitHub Actions Failure

chrisns had a problem deploying to smoke-test-deploy May 14, 2026 11:48 — with GitHub Actions Failure

chrisns had a problem deploying to smoke-test-deploy May 14, 2026 12:26 — with GitHub Actions Failure

smoke pre-deploy: also fix-forward from UPDATE_FAILED

2ee928f

chrisns had a problem deploying to smoke-test-deploy May 14, 2026 12:43 — with GitHub Actions Failure

chrisns had a problem deploying to smoke-test-deploy May 14, 2026 12:54 — with GitHub Actions Failure

chrisns had a problem deploying to smoke-test-deploy May 14, 2026 13:14 — with GitHub Actions Failure

chrisns had a problem deploying to smoke-test-deploy May 14, 2026 13:53 — with GitHub Actions Failure

chrisns temporarily deployed to smoke-test-deploy May 17, 2026 08:21 — with GitHub Actions Inactive

chrisns temporarily deployed to smoke-test-deploy May 18, 2026 09:44 — with GitHub Actions Inactive

chrisns temporarily deployed to smoke-test-deploy May 18, 2026 10:13 — with GitHub Actions Inactive

chrisns had a problem deploying to smoke-test-deploy May 18, 2026 10:45 — with GitHub Actions Failure

chore: untrack stray localgov-ims/template.json + gitignore .json variants

1e6c8aa

chrisns had a problem deploying to smoke-test-deploy May 18, 2026 10:52 — with GitHub Actions Failure

chrisns temporarily deployed to smoke-test-deploy May 18, 2026 11:01 — with GitHub Actions Inactive

chrisns added 2 commits May 18, 2026 13:48

smoke all-demo: pin DPR default image to sha-3e68d51@sha256: digest (AC5.1)

892b132

chrisns temporarily deployed to smoke-test-deploy May 18, 2026 12:51 — with GitHub Actions Inactive

chrisns had a problem deploying to smoke-test-deploy May 18, 2026 13:40 — with GitHub Actions Failure

smoke fixes: TTS audio/mp3, SR Cognito hidden inputs, BOPS ref format, Planx SPA-bootstrap

edfd9a2

chrisns had a problem deploying to smoke-test-deploy May 18, 2026 13:50 — with GitHub Actions Failure

smoke planx: drop /app URL assertion (login lands on /, not /app)

4e5c53c

chrisns temporarily deployed to smoke-test-deploy May 18, 2026 13:57 — with GitHub Actions Inactive

chrisns temporarily deployed to smoke-test-deploy May 18, 2026 14:22 — with GitHub Actions Inactive

chrisns temporarily deployed to smoke-test-deploy May 18, 2026 14:29 — with GitHub Actions Inactive

chrisns temporarily deployed to smoke-test-deploy May 18, 2026 14:54 — with GitHub Actions Inactive

chrisns temporarily deployed to smoke-test-deploy May 18, 2026 14:58 — with GitHub Actions Inactive

chrisns added this pull request to the merge queue May 18, 2026

Merged via the queue into main with commit 531ac0d May 18, 2026
21 checks passed

chrisns deleted the refactor/smoke-pack-dryrun-and-fixes branch May 18, 2026 15:22

chrisns merged 43 commits into main from refactor/smoke-pack-dryrun-and-fixes