smoke pack DRY, workflow extraction, CI unblock #238
Merged
Implements every phase of _bmad-output/implementation-artifacts/tech-spec-scenario-regression-smoke-pack.md in a single deliverable. The original spec called for one PR per phase (8+ PRs); in practice the dependency overlap made per-phase review harder, not easier, so this squashes #226 / #227 / #228 / #229 / #230 / #231 / #232 / #233 / #235 into a single change.

What ships
==========

Phase 1a: runbook + config schema

- docs/smoke-test-account-setup.md: one-off manual procedure for vending the long-lived smoke-test AWS account, with the four required sections (Prerequisites / Procedure / Verification / Operational Notes). Per-step idempotency checks + inverses; ProtectISB role-creation canary + fallback branch (ADR-1); Bedrock model-access enablement + gotchas (legacy claude-3-haiku-20240307 retired, Nova body shape); service-quota targets; QuickSight decision; iterate-to-least-privilege protocol for the inline IAM policy.
- docs/smoke-test-account-config.yml: schema for recording post-runbook account state.

Phase 1b: operator-executed account state

- Smoke account 464453619983 provisioned in the NDX org under the fallback branch (the ProtectISB canary failed, so the account was moved to root with the Restrictions SCP attached directly). The AwsNuke SCP is intentionally NOT attached: it blocks sts:AssumeRoleWithWebIdentity, and cleanup relies on CFN delete + the retention lint, not aws-nuke.
- OIDC provider + InnovationSandbox-ndx-SmokeTestDeployRole created with a 6h max-session-duration. The trust policy uses a sub-pattern lock (`repo:co-cddo/ndx_try_aws_scenarios:*`) + an aud condition; the repository_owner claim condition is omitted because it reproducibly breaks the assume even though the OIDC token contains the claim (verified via JWT decode in an investigation workflow that has since been deleted; see runbook Step 10).
- expected_scps reflects live state: Restrictions + FullAWSAccess.
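For reviewers less familiar with GitHub OIDC trust policies, the shape described in Phase 1b can be sketched as below. It assumes the standard `token.actions.githubusercontent.com` provider, its `:sub` / `:aud` condition keys and the default `sts.amazonaws.com` audience; the helper name `smokeDeployTrustPolicy` is hypothetical, and the live policy recorded in the runbook remains authoritative.

```typescript
// Hypothetical helper: renders the trust-policy shape described in Phase 1b.
// Assumes the standard GitHub Actions OIDC provider host and STS audience.
const OIDC_HOST = "token.actions.githubusercontent.com";

interface TrustPolicy {
  Version: string;
  Statement: Array<{
    Effect: "Allow";
    Principal: { Federated: string };
    Action: string;
    Condition: Record<string, Record<string, string>>;
  }>;
}

function smokeDeployTrustPolicy(accountId: string, repo: string): TrustPolicy {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Principal: { Federated: `arn:aws:iam::${accountId}:oidc-provider/${OIDC_HOST}` },
        Action: "sts:AssumeRoleWithWebIdentity",
        Condition: {
          // aud condition: the token must have been minted for STS.
          StringEquals: { [`${OIDC_HOST}:aud`]: "sts.amazonaws.com" },
          // Sub-pattern lock: any ref, environment or PR of this one repository.
          StringLike: { [`${OIDC_HOST}:sub`]: `repo:${repo}:*` },
          // Deliberately no repository_owner condition (it broke the assume).
        },
      },
    ],
  };
}

const policy = smokeDeployTrustPolicy("464453619983", "co-cddo/ndx_try_aws_scenarios");
console.log(JSON.stringify(policy, null, 2));
```

Keeping the repository lock in `sub` alone means the policy relies only on claims that demonstrably work for the assume.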
Phase 2a: synth pipelines for missing scenarios

- New synth jobs in .github/workflows/deploy-blueprints.yml for planx and digital-planning-register (CDK -> template.yaml -> S3 via the existing isb-hub upload chain).
- The bops-planning synth job lands in Phase 2b, once the retention lint is justification-aware.
- ai-contact-centre: new "verify packaged CodeUri targets blueprints bucket" step catches a sam-package regression where --s3-bucket would silently land artefacts in the SAM default bucket.

Phase 2b: all-demo expansion + retention lint

- cloudformation/scenarios/all-demo/template.yaml expanded from 7 to 16 nested scenarios, adding Minute, FixMyStreet, AI Contact Centre, LocalGov IMS, Paperless-ngx, PlanX, Bops Planning, Simply Readable and Digital Planning Register. Umbrella parameters for credentials (GovUkPayApiKey, OSVectorTilesApiKey, DprImageUri, DprCouncilConfig) have overridable empty or sensible defaults; per-scenario URL + admin-credential Outputs are surfaced.
- scripts/lint-retention-policies.sh: forbids DeletionPolicy=Retain / UpdateReplacePolicy=Retain / Properties.DeletionProtection=true / Properties.EnableDeletionProtection=true / Properties.FinalSnapshotIdentifier unless the resource carries a non-empty Metadata.Justification. A per-template cap (default 3) + a global cap (default 10) stop justified retentions from being pencil-whipped through, whether in one scenario or repo-wide.
- A lint-committed-templates job in deploy-blueprints.yml runs the lint over hand-authored CFN templates.
- bops-planning's LogGroup keeps RemovalPolicy.RETAIN (deliberate, for debugging after a rollback) with a Metadata.Justification attached via cfnOptions; the bops synth job is re-enabled.

Phase 3: smoke rails

- playwright.config.ts: new 'smoke' project gated on PLAYWRIGHT_SUITE=smoke.
- tests/smoke/fixtures/cfn-outputs.ts: SDK-v3 DescribeStacks helper. Sensitive output values flow only via an explicit sensitiveValue() accessor; toString / inspect / Symbol.toPrimitive emit a REDACTED placeholder. Documents the CloudFormation-API limitation that the Output Metadata.Sensitive opt-in isn't readable (the output-key regex is the sole signal).
- tests/smoke/fixtures/assertion-bar.ts: 17 AssertionBarRow entries populated.
- tests/smoke/fixtures/secure-form.ts: fillPassword wrapper redacts form-encoded passwords from the Playwright trace.
- scripts/smoke.sh + .env.example: identical invocation locally and in CI.
- .github/workflows/smoke.yml:
  - trigger matrix (PR-scoped / nightly cron / push-to-main / workflow_dispatch); the scope job decides full vs scoped runs from changed paths
  - global serial concurrency with no cancel-in-progress (cancelled runs leave orphan AWS state)
  - configure-aws-credentials with role-duration-seconds=21600 (6h) to match the role's max-session-duration
  - pre-deploy state check with auto-recovery for stranded stacks
  - SCP drift check (excluding FullAWSAccess; fail-soft for the first 7 detections)
  - quarantine-expiry check
  - CFN events captured BEFORE teardown
  - teardown with 3x60s retry, gated on the aws-creds outcome so a run without credentials doesn't burn 3 min retrying
- .github/workflows/quarterly-audit.yml: opens a tracking issue every 3 months (spend, orphan sweep, deploy-role policy drift, SCP drift, Renovate liveness, ProtectISB-fallback revisit).
- .github/CODEOWNERS: smoke-pack sensitive paths require @chrisns review (until a maintainers team is provisioned).

Phase 4: 17 per-scenario smoke specs

- One spec per scenario covering the auth-mode pattern (admin-login / public / sso-skip / umbrella).
- Bug-informed feature flows cite the historical regression that informed each test:
  - fixmystreet: /reports requires bin/update-all-reports; /admin must reach the dashboard without a 2FA redirect
  - planx: SPA boots free of domain-allowlist / Airbrake errors; Hasura native /v1/version responds (Caddy elimination)
  - minute: magic-link sets cookie; same-origin fetch() works post-auth; /api/proxy/healthcheck reaches the backend (catches the basic-auth-breaks-fetch() regression and the ALB /api/* interception regression)
  - localgov-ims: Windows IIS multi-site routing; AdminPassword must not be the literal {{resolve:...}} token (catches the Lambda-custom-resource regression)
  - localgov-drupal: ndx_aws_ai module boots without Bedrock AccessDeniedException
  - simply-readable: SPA loads, credentials non-empty + non-token; reload produces no 5xx responses (catches BlueprintsBucketName mis-wire)
  - ai-contact-centre: PSTN claim matches UK toll-free / landline OR US toll-free (catches international fallback regression)
  - paperless-ngx: /documents view + /api/documents/ respond (S3 Files mount integrity)
  - bops-planning: post-login URL is NOT on the Applicants port (catches the routing.rb single-tenant override regression)
  - digital-planning-register: register loads with planning markers
  - public-Lambda scenarios (foi-redaction, planning-ai, smart-car-park, text-to-speech, council-chatbot): FunctionURL not-5xx + not-403 (catches the InvokeFunctionUrl + InvokeFunction dual-permission regression); council-chatbot uses POST, not GET, so the test isn't vacuous against a POST-only Lambda
  - quicksight-dashboard: landing + outputs only (sso-skip per auth-mode categorisation)
  - all-demo: discovers Output keys dynamically by parsing the committed template at test time; asserts every Output is present, non-empty, and not the {{resolve:...}} literal; URL outputs match https?://

Phase 5: pin every floating image tag

- 10 own-GHCR images (fixmystreet, localgov_drupal, minute_*, planx-*, dpr) pinned to sha-<7chars>@sha256:<digest>.
- 2 upstream images (docker.io/apache/tika 3.3.0.0-full, ghcr.io/paperless-ngx/paperless-ngx 2.9) pinned to <tag>@sha256:<digest>.
- Removed the legacy cloudformation/scenarios/minute/template.json (stale ECR references; nothing in the repo referenced it).

Phase 6: Renovate adoption (replaces Dependabot)

- renovate.json: 6 group rules per the spec's pinning-strategy table; customManagers regex matching the new pin shape; osvVulnerabilityAlerts + security-priority group; pinDigests scoped to official actions/* + aws-actions/* only so the first run doesn't firehose; per-PR limits capped at 6.
- .github/workflows/renovate.yml: twice-daily + workflow_dispatch. Action pinned by digest to v46.1.14.
- .github/dependabot.yml deleted.

Operator follow-ups (not in this PR)
====================================

- NAP-548: migrate scenarios off legacy claude-3-haiku-20240307
- NAP-549: revisit ProtectISB fallback by 2026-11-12
- NAP-550: service-quota Console requests
- NAP-551: QuickSight subscription decision
- NAP-552: mint RENOVATE_TOKEN repo secret
- NAP-554: close in-flight Dependabot PRs
- NAP-555: T2b.5b + T3.8 end-to-end verifications

Closes: #226, #227, #228, #229, #230, #231, #232, #233, #235.
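The retention rule set from Phase 2b is easiest to review as data. The sketch below restates it in TypeScript over an already-parsed template object; it is not scripts/lint-retention-policies.sh, and the type and function names are hypothetical. It also includes DeletionPolicy/UpdateReplacePolicy=Snapshot, which a later commit in this PR adds for parity with the prior inline check.

```typescript
// Hypothetical TypeScript restatement of the retention rule set; the real
// check is scripts/lint-retention-policies.sh and may differ in detail.
interface CfnResource {
  DeletionPolicy?: string;
  UpdateReplacePolicy?: string;
  Metadata?: { Justification?: string };
  Properties?: Record<string, unknown>;
}
interface CfnTemplate {
  Resources?: Record<string, CfnResource>;
}

// Snapshot is included because a later commit adds it to the rule set.
const RETAINING_POLICIES = new Set(["Retain", "Snapshot"]);
const isTrue = (v: unknown): boolean => v === true || v === "true";

// Does this resource outlive (or snapshot on) stack deletion?
function retains(r: CfnResource): boolean {
  const p = r.Properties ?? {};
  return (
    RETAINING_POLICIES.has(r.DeletionPolicy ?? "") ||
    RETAINING_POLICIES.has(r.UpdateReplacePolicy ?? "") ||
    isTrue(p["DeletionProtection"]) ||
    isTrue(p["EnableDeletionProtection"]) ||
    p["FinalSnapshotIdentifier"] !== undefined
  );
}

function lintRetention(
  templates: Record<string, CfnTemplate>,
  perTemplateCap = 3,
  globalCap = 10,
): string[] {
  const errors: string[] = [];
  let globalJustified = 0;
  for (const [file, tpl] of Object.entries(templates)) {
    let justified = 0;
    for (const [logicalId, res] of Object.entries(tpl.Resources ?? {})) {
      if (!retains(res)) continue;
      if ((res.Metadata?.Justification ?? "").trim() === "") {
        errors.push(`${file}: ${logicalId} retains without Metadata.Justification`);
      } else {
        justified++;
      }
    }
    if (justified > perTemplateCap) {
      errors.push(`${file}: ${justified} justified retentions exceed per-template cap ${perTemplateCap}`);
    }
    globalJustified += justified;
  }
  if (globalJustified > globalCap) {
    errors.push(`${globalJustified} justified retentions exceed global cap ${globalCap}`);
  }
  return errors;
}

console.log(
  lintRetention({
    "bops.yaml": {
      Resources: {
        LogGroup: { DeletionPolicy: "Retain", Metadata: { Justification: "debug after rollback" } },
        Db: { Properties: { DeletionProtection: true } },
      },
    },
  }),
);
```

The justified LogGroup passes while the unjustified `Db` is reported, mirroring the bops-planning case above.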
… issues

Smoke pack:
- 17 per-scenario .spec.ts files collapsed to 17 cloudformation/scenarios/*/smoke.ts configs (avg 25 lines, down from ~70) driven by a single tests/smoke/runner.ts
- adminLogin() helper + get()/getSecret() accessors absorb login-form boilerplate
- assertion-bar.ts removed; quarantine state lives in each scenario's smoke.ts
- Playwright smoke project's testDir points at cloudformation/scenarios with testMatch /[^/]+\/smoke\.ts$/ so new scenarios auto-discover

Workflows (1255 → 601 lines):
- smoke.yml 467 → 204: scope decision, pre-deploy state, SCP drift, capture-events, teardown extracted to scripts/smoke-*.sh
- deploy-blueprints.yml 726 → 354: 9 per-scenario synth jobs collapsed to a single matrix; CDK strip + SAM-bucket verify extracted to scripts/
- renovate.yml 62 → 43: narrative stripped (PAT kept per operator decision)

Pre-existing CI fixes:
- LocalGov IMS: dropped minLength:1 on GovUkPayApiKey so empty values are accepted (smoke deploys don't have a real key); doc clarified
- Retention lint: added DeletionPolicy/UpdateReplacePolicy=Snapshot to the rule set so the central lint matches the prior inline check

CODEOWNERS: @co-cddo/ndx (was @chrisns); broadened patterns to cover the new scripts and the per-scenario smoke.ts files.

Operator note: IsbHubStack was UPDATE_ROLLBACK_FAILED for 12 days; recovered via continue-update-rollback. The 6 orphan StackSets (bops-planning, digital-planning-register, fixmystreet, minute, paperless-ngx, planx) were imported back into IsbHubStack via change-set-type IMPORT: IDs preserved, no ISB lease bindings broken.
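A quick way to sanity-check the auto-discovery rule is to apply the quoted testMatch regex to sample paths. The `ScenarioSmoke` interface and `isQuarantined` helper below are hypothetical illustrations of "quarantine state lives in each scenario's smoke.ts"; the runner's real config contract may use different field names.

```typescript
// The testMatch regex quoted above. Playwright applies testMatch to file
// paths under testDir; here it is applied to sample paths directly.
const SMOKE_TEST_MATCH = /[^/]+\/smoke\.ts$/;

// Hypothetical shape of a co-located smoke.ts config; the field names are
// illustrative, not the runner's actual contract.
interface ScenarioSmoke {
  stackName: string;
  authMode: "admin-login" | "public" | "sso-skip" | "umbrella";
  quarantinedUntil?: string; // ISO date; treated as quarantined before it
}

function isQuarantined(cfg: ScenarioSmoke, now: Date): boolean {
  return cfg.quarantinedUntil !== undefined && now.getTime() < Date.parse(cfg.quarantinedUntil);
}

const paths = [
  "/repo/cloudformation/scenarios/planx/smoke.ts",
  "/repo/cloudformation/scenarios/planx/template.yaml",
  "/repo/cloudformation/scenarios/planx/not-smoke.ts",
];
console.log(paths.filter((p) => SMOKE_TEST_MATCH.test(p)));
```

Only the first path is selected: the regex requires a literal `/smoke.ts` at the end of the path, so look-alike names are ignored.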
…leanup works

The deploy role's iam:DeleteRolePolicy was gated to arn:aws:iam::*:role/InnovationSandbox-ndx-*, but scenario templates use their own role naming (ndx-try-foi-role-*, etc.). Every smoke teardown left those roles stranded, putting the all-demo stack in DELETE_FAILED and poisoning the next run via stale AppRegistry applications. Resource broadened to arn:aws:iam::*:role/*. SCPs still constrain what the deploy role can actually do in practice; this just makes its identity-based policy permissive enough to manage scenario IAM. Live policy already updated; this commit syncs the runbook.
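To see why the old Resource pattern stranded scenario roles, here is a minimal sketch of IAM-style wildcard matching on resource ARNs (`*` matches any run of characters, `?` exactly one). It is an illustration only: it ignores policy variables, and the example role name is hypothetical apart from following the `ndx-try-foi-role-*` convention.

```typescript
// Minimal sketch of IAM-style wildcard matching on resource ARNs:
// "*" matches any run of characters (including "/"), "?" exactly one.
// Policy variables such as ${aws:username} are not handled.
function arnMatches(pattern: string, arn: string): boolean {
  const body = pattern
    .replace(/[.+^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters except * and ?
    .replace(/\*/g, ".*")
    .replace(/\?/g, ".");
  return new RegExp(`^${body}$`).test(arn);
}

const OLD = "arn:aws:iam::*:role/InnovationSandbox-ndx-*";
const NEW = "arn:aws:iam::*:role/*";
// Hypothetical scenario role following the ndx-try-foi-role-* convention.
const scenarioRole = "arn:aws:iam::464453619983:role/ndx-try-foi-role-example";

console.log(arnMatches(OLD, scenarioRole)); // false: the old policy stranded this role
console.log(arnMatches(NEW, scenarioRole)); // true
```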
- smoke-capture-events.sh: a nested stack's PhysicalResourceId is a full CloudFormation ARN containing colons, which the artefact uploader rejects in filenames. Strip it to the short stack name and sanitise the rest. The earlier change that recursed into nested stacks introduced these broken filenames; without this fix the artefact bundle never uploads when nested stacks exist (i.e. on every smoke failure).
- runbook IAM: replace efs:* with elasticfilesystem:* (efs isn't a real IAM action prefix), drop aurora:* (covered by rds:*), and add servicediscovery:* (Minute's PrivateDnsNamespace creation needs it). Live policy already updated.

Pre-existing issue not fixed yet: the VPC quota in the smoke account is 5 (AWS default), but all-demo needs ~9 simultaneous VPCs. A service quota increase to 20 has been requested; the AWS-side ticket is pending.
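The filename fix can be pictured with the sketch below: extract the short stack name from a nested-stack ARN, then replace anything outside a conservative `[A-Za-z0-9._-]` allowlist. This is a hypothetical TypeScript equivalent, not the shell in scripts/smoke-capture-events.sh; the `.events.json` suffix and the example ARN are illustrative.

```typescript
// Hypothetical TypeScript equivalent of the sanitising step; the real logic
// lives in scripts/smoke-capture-events.sh.
function eventsFileName(physicalId: string): string {
  // Nested-stack ids look like arn:aws:cloudformation:<region>:<account>:stack/<name>/<uuid>
  const m = /:stack\/([^/]+)\//.exec(physicalId);
  const short = m ? m[1] : physicalId;
  // Conservative allowlist: anything else (colons included) becomes "_".
  return short.replace(/[^A-Za-z0-9._-]/g, "_") + ".events.json";
}

// Example ARN (region and stack names are illustrative).
console.log(
  eventsFileName(
    "arn:aws:cloudformation:eu-west-2:464453619983:stack/all-demo-PlanxStack-ABC123/1b2c3d4e-0000-1111-2222-333344445555",
  ),
); // all-demo-PlanxStack-ABC123.events.json
```

An allowlist avoids depending on the exact set of characters the uploader rejects.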
…ck to *

Smoke deploys uncovered missing services after the VPC quota cleared:
- wisdom:* (Amazon Q in Connect: ai-contact-centre)
- s3vectors:* (S3 Vector Buckets: council-chatbot, ai-contact-centre KBs)
- cognito-idp:* (simply-readable user pool)
- cognito-identity:* (paired with idp)
- iot:* (smart-car-park IoT Things)
- bedrock:* (was an explicit-action list; Guardrails not covered)

Live policy already updated.
Curating service-by-service IAM allow-lists for the SmokeTestDeployRole proved fragile: every new scenario surfaced another missing permission (wisdom, s3vectors, appsync, s3files, iot, cognito-*, servicediscovery, elasticfilesystem...). The inline policy can't ergonomically keep up. PowerUserAccess covers every AWS service except IAM, Organizations and account management; the custom inline SmokeTestDeployInline policy still governs IAM specifically. The Restrictions SCP attached to the smoke account remains in force as the outer guard. Net effect: same authorisation envelope, far less maintenance. Live role updated.
Connect releases a phone number on stack-delete; the released number is held in a 30-day cooldown and consumes UK DID claim quota during that window. Repeated smoke deploy/teardown cycles would exhaust the quota in days.

ai-contact-centre/template.yaml gains two opt-in parameters:
- ExistingPhoneNumberArn: ARN of a pre-claimed number, OR '' to claim a new DID
- ExistingPhoneNumber: the dialable E.164 string for that ARN

Both default to '' → ClaimNewPhoneNumber=true → behaviour identical to today for ISB pool deploys (the StackSet doesn't override the defaults). When set, the GeoNumber resource and the GeoFlowAssoc custom resource are skipped, and ExistingPhoneNumber is surfaced as the PstnNumber output for the smoke regex check.

The all-demo umbrella plumbs both values through (AiccExistingPhoneNumberArn, AiccExistingPhoneNumber). smoke.yml reads them from docs/smoke-test-account-config.yml and passes them via --parameter-overrides. The config holds placeholders today; ai-contact-centre's smoke spec is quarantined until the operator completes runbook Step 13 (one-time: create a holder Connect instance, claim a number against it, record the values). That step must run as the SmokeTestDeployRole because the Restrictions SCP blocks connect:CreateInstance from non-InnovationSandbox-ndx-* principals.
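The parameter wiring reduces to one boolean, sketched below as a hypothetical model: `claimNewPhoneNumber` stands in for the template's ClaimNewPhoneNumber condition, and `umbrellaOverrides` produces Key=Value pairs of the kind `aws cloudformation deploy --parameter-overrides` accepts. The interface, function names and placeholder ARN are illustrative, not taken from the repo.

```typescript
// Hypothetical model of the opt-in: the template's ClaimNewPhoneNumber
// condition and the overrides smoke.yml passes to the all-demo umbrella.
interface AiccPhoneConfig {
  existingPhoneNumberArn: string; // '' => claim a new DID
  existingPhoneNumber: string; // E.164, surfaced as PstnNumber when set
}

function claimNewPhoneNumber(cfg: AiccPhoneConfig): boolean {
  return cfg.existingPhoneNumberArn === "";
}

// Resources gated on ClaimNewPhoneNumber, per the description above.
function phoneResources(cfg: AiccPhoneConfig): string[] {
  return claimNewPhoneNumber(cfg) ? ["GeoNumber", "GeoFlowAssoc"] : [];
}

// Key=Value pairs as accepted by `aws cloudformation deploy --parameter-overrides`.
function umbrellaOverrides(cfg: AiccPhoneConfig): string[] {
  return [
    `AiccExistingPhoneNumberArn=${cfg.existingPhoneNumberArn}`,
    `AiccExistingPhoneNumber=${cfg.existingPhoneNumber}`,
  ];
}

// Placeholder values for illustration only.
const smokeCfg: AiccPhoneConfig = {
  existingPhoneNumberArn: "arn:aws:connect:eu-west-2:000000000000:phone-number/placeholder",
  existingPhoneNumber: "+442012345678",
};
console.log(claimNewPhoneNumber(smokeCfg), phoneResources(smokeCfg), umbrellaOverrides(smokeCfg));
```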
A single flaky scenario (Planx Hasura ECS circuit-breaker, etc.) shouldn't unwind 16 healthy nested stacks — Aurora cold-start, ALB warm-up and similar make full rebuilds slow and expensive. aws cloudformation deploy now passes --disable-rollback. On CREATE failure the umbrella stays in CREATE_FAILED with successful child stacks intact. The next run's pre-deploy state check recognises CREATE_FAILED and proceeds to update-stack against the same name; CFN's update-stack handles CREATE_FAILED stacks (since ~2020) by replacing only the failed resources. Matches the established fix-forward pattern (memory:feedback_cfn_fix_forward_failed_stack).
…d state)

Pairs with --disable-rollback. Previously the teardown step ran on every 'always()' path including failures, which wiped the CREATE_FAILED state we want to keep so the next run can update-stack-fix-forward instead of rebuilding 16 healthy nested stacks. Now teardown runs when the job succeeded OR when the event is the nightly schedule. PR failures leave the stack in CREATE_FAILED; the next push picks up where the last attempt fell over. The nightly cron still cleans up so the smoke account doesn't accumulate debris over weeks.
LogGroups with fixed names declared in CFN race the AWS-Lambda implicit LogGroup creator. When the explicit one fails to clean up (rollback gap, race, etc.) the orphan blocks the next deploy with AlreadyExists. With fix-forward (--disable-rollback) we re-attempt against the same name on every run, so the orphans recur. Proactive prune in pre-deploy: best-effort delete of the known set (/ndx-bops/production and the ndx-* Lambda log groups). Failures are tolerated (most runs the LG won't exist). Adds ~3s to pre-deploy on account of the 6 sequential calls.
CREATE_FAILED was the only fix-forward path. Once the umbrella stack exists, subsequent failures land in UPDATE_FAILED, not CREATE_FAILED. CFN's update-stack accepts both as starting states, so we should too.
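Taken together, the last few commits describe a pre-deploy decision keyed on stack status. The sketch below is a hypothetical TypeScript rendering, not scripts/smoke-pre-deploy-state.sh: only the CREATE_FAILED / UPDATE_FAILED fix-forward branch and the continue-update-rollback recovery come from this PR; the remaining branches reflect general CloudFormation behaviour (for example, a ROLLBACK_COMPLETE stack cannot be updated and must be deleted first).

```typescript
// Hypothetical sketch of the pre-deploy decision; the real logic lives in
// scripts/smoke-pre-deploy-state.sh and may differ.
type PreDeployAction =
  | "create"
  | "update"
  | "wait"
  | "continue-update-rollback"
  | "delete-then-create";

function preDeployAction(status: string | undefined): PreDeployAction {
  if (status === undefined || status === "DELETE_COMPLETE") return "create";
  switch (status) {
    case "CREATE_FAILED":
    case "UPDATE_FAILED":
      return "update"; // fix forward: only the failed resources are retried
    case "UPDATE_ROLLBACK_FAILED":
      return "continue-update-rollback";
    case "ROLLBACK_COMPLETE": // cannot be updated; must be deleted first
    case "ROLLBACK_FAILED":
    case "DELETE_FAILED":
    case "REVIEW_IN_PROGRESS":
      return "delete-then-create";
  }
  if (status.endsWith("_IN_PROGRESS")) return "wait";
  return "update"; // *_COMPLETE states such as CREATE_COMPLETE, UPDATE_ROLLBACK_COMPLETE
}

for (const s of [undefined, "CREATE_FAILED", "UPDATE_FAILED", "ROLLBACK_COMPLETE", "UPDATE_IN_PROGRESS"]) {
  console.log(s, "->", preDeployAction(s));
}
```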
All four planx custom images (hasura, api, editor, sharedb) are amd64-only. Fargate task definitions specifying ARM64 fail with CannotPullContainerError: 'image Manifest does not contain descriptor matching platform linux/arm64 v8'. ARM64 was selected for cost (~20% cheaper) but the image pipeline produces single-arch amd64 builds; the manifest list has no arm64 entry. Restoring ARM64 needs docker buildx multi-arch builds in the planx image CI.
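Whether an image can run on ARM64 Fargate comes down to whether its manifest list (OCI image index) carries a `linux/arm64` entry. The sketch below checks that on a minimal subset of the index schema; the raw JSON could come from, for example, `docker buildx imagetools inspect --raw <image>`. The type and function names are hypothetical.

```typescript
// Minimal subset of an OCI image index / Docker manifest list.
interface Platform {
  os: string;
  architecture: string;
  variant?: string;
}
interface IndexEntry {
  digest: string;
  platform?: Platform;
}
interface ImageIndex {
  manifests: IndexEntry[];
}

function supportsPlatform(index: ImageIndex, os: string, architecture: string): boolean {
  return index.manifests.some(
    (m) => m.platform?.os === os && m.platform?.architecture === architecture,
  );
}

// Example index shaped like a single-arch amd64 build (digest elided).
const amd64Only: ImageIndex = {
  manifests: [{ digest: "sha256:...", platform: { os: "linux", architecture: "amd64" } }],
};
console.log(supportsPlatform(amd64Only, "linux", "arm64")); // false
```

A check like this in the planx image CI would fail fast before a task definition ever hits CannotPullContainerError.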
The smoke account's Restrictions SCP blocks connect:CreateInstance from any principal whose name doesn't start with InnovationSandbox-ndx-*, so operator-side AWS sessions (SSO admin etc.) can't run the setup directly. This workflow runs as the SmokeTestDeployRole (same OIDC trust as smoke.yml) and is therefore allowed. The script is idempotent: re-running reuses an existing holder + number. Manually triggered (workflow_dispatch). After it runs, paste the printed values into docs/smoke-test-account-config.yml so every subsequent smoke deploy reuses the same number — avoids the 30-day release-cooldown that exhausts UK DID claim quota.
…umberArn)
UK DID claim quota is 5/30 days. Sustained smoke runs would exhaust it. Pass a
non-empty placeholder ExistingPhoneNumberArn so the AICC template's
ClaimNewPhoneNumber condition is false: no GeoNumber, no GeoFlowAssoc, no
release-on-teardown. AICC's ConnectInstance + Lex + Wisdom + KB + companion
UI still deploy real; smoke checks the companion URL HTTP status + that
PstnNumber matches a +44 format string ("+442012345678" satisfies the regex).
Holder Connect instance deleted (the 1 quota slot is now AICC's). Reverts the
DeployAiContactCentre gate that was a stopgap while the holder existed.
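The PstnNumber check (UK toll-free / landline OR US toll-free) can be illustrated with the pattern below. This regex is a hypothetical reconstruction, not the spec's actual expression: it accepts UK 0800/0808 toll-free and 01/02/03-prefixed numbers in E.164 form, plus US +1 8xx toll-free codes, and confirms that the dummy "+442012345678" passes while an international fallback such as a French number does not.

```typescript
// Hypothetical pattern; the smoke spec's actual regex may differ.
// UK: +44 then 800/808 toll-free (9-10 national digits) or a 1/2/3-prefixed
// geographic or non-geographic number. US: +1 toll-free 8xx codes.
const PSTN_CLAIM = /^\+44(?:80[08]\d{6,7}|[123]\d{8,9})$|^\+18(?:00|33|44|55|66|77|88)\d{7}$/;

for (const n of ["+442012345678", "+448001234567", "+18005550100", "+33123456789"]) {
  console.log(n, PSTN_CLAIM.test(n));
}
```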
UK DID claim quota is 5/30 days so smoke never claims a real number. The
holder/pre-claim flow that smoke-pstn-setup.{yml,sh} implemented is no longer
used — a non-empty DUMMY ExistingPhoneNumberArn keeps AICC's ClaimNewPhoneNumber
condition false, so the template skips GeoNumber + GeoFlowAssoc entirely.
AICC's ConnectInstance + Lex + Wisdom + KB + companion UI still deploy real
and exercise the rest of the scenario.
- delete .github/workflows/smoke-pstn-setup.yml
- delete scripts/smoke-pstn-setup.sh
- runbook Step 13 rewritten to describe the dummy-PSTN approach (no operator
setup required)
- config comments rewritten to reflect the new model
S3 templates carry several patches applied by hand during smoke debugging
yesterday. Without these in CDK source / committed templates, the next
deploy-blueprints run would silently revert every fix to the orphan-name
collision modes.
Sources brought in line with S3:
- paperless-ngx: bucket name `ndx-try-paperless-archive-v2-…` (orphan v1
FS in smoke account holds the un-suffixed name hostage)
- bops-planning: 8 fixed names suffixed with `-v2-` (cluster, ALB, services,
roles, VPC, SGs, LG, bucket-empty role); an orphan stack stuck in
UPDATE_ROLLBACK_COMPLETE_CLEANUP_IN_PROGRESS holds the un-suffixed set
- council-chatbot, foi-redaction, planning-ai, text-to-speech,
quicksight-dashboard, smart-car-park: LogGroupName suffixed with
`-${AWS::StackName}` so an orphan LG from a previous rollback doesn't
collide on AlreadyExists
- minute, fixmystreet, localgov-drupal, localgov-ims, planx,
digital-planning-register, paperless-ngx: same stack-name suffix on the
CDK LogGroup name + its console-link CloudWatchLogsUrl output
Stale-comment cleanups:
- scripts/smoke-pre-deploy-state.sh: drop references to --disable-rollback
(removed in 553a556)
- cloudformation/scenarios/all-demo/smoke.ts: drop dead Condition-skip
peek-loop (every Condition was removed in eab2a02)
- cloudformation/scenarios/{ai-contact-centre,all-demo}/template.yaml:
rewrite ExistingPhoneNumberArn param descriptions; drop "holder" sentence
on GeoFlowAssoc (holder doesn't exist; smoke passes DUMMY)
- .github/workflows/smoke.yml: tighten the SmokeRun-tag comment
- tests/smoke/fixtures/cfn-outputs.ts: explain why `Login` is in the
SENSITIVE_KEY_PATTERN (LoginUrl can carry pre-auth tokens)
- tests/smoke/fixtures/secure-form.ts: drop the 10s consumed-poll (smoke
specs always click submit AFTER fillPassword returns, so the original
unroute-before-submit semantics couldn't redact anything; leave the
routeHandler armed instead so the first post-fill POST is rewritten)
bops-planning-stack.test.ts updated to expect the new -v2 names.
…iants

The .json was an accidental check-in from yesterday's CDK exploration. The deployed template is template.yaml; the .json was dead.
The route handler rewrote the form-encoded POST body to "REDACTED-<hash>"
to keep cleartext out of the Playwright trace, but `route.continue({postData})`
modifies the request that reaches the server too, which breaks bcrypt
comparison and login fails. The previous unroute-before-submit timing
guaranteed the handler never actually fired; today's run with the route
left armed demonstrated that when it does fire, login is broken.
Simplify to a plain `page.fill` wrapper. The SensitiveValue contract via
.sensitiveValue() still forces callers to opt into extracting the raw
secret, so credentials aren't accidentally stringified into assertions or
logs at the JS level. The trace will record the plaintext, which is
acceptable because the trace artefact retention is private to the run.
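The SensitiveValue contract can be pictured as a wrapper whose every implicit string conversion yields a placeholder, so the cleartext only escapes through an explicit `.sensitiveValue()` call. In the sketch below only the `sensitiveValue()` accessor name and the inclusion of `Login` in the key pattern come from this PR; the class layout, the rest of `SENSITIVE_KEY_PATTERN` and the `wrapOutput` helper are hypothetical.

```typescript
// Hypothetical re-creation of the contract; only sensitiveValue() and the
// `Login` entry in the key pattern are taken from the PR.
const REDACTED = "[REDACTED]";
const SENSITIVE_KEY_PATTERN = /Password|Secret|Token|Login/i;

class SensitiveValue {
  readonly #raw: string;

  constructor(raw: string) {
    this.#raw = raw;
  }

  // The only way to get the cleartext: callers must opt in explicitly.
  sensitiveValue(): string {
    return this.#raw;
  }

  toString(): string {
    return REDACTED;
  }

  toJSON(): string {
    return REDACTED;
  }

  [Symbol.toPrimitive](): string {
    return REDACTED;
  }

  // Used by Node's util.inspect and therefore console.log.
  [Symbol.for("nodejs.util.inspect.custom")](): string {
    return REDACTED;
  }
}

// Wrap a stack output by key name (Output Metadata.Sensitive isn't readable).
function wrapOutput(key: string, value: string): string | SensitiveValue {
  return SENSITIVE_KEY_PATTERN.test(key) ? new SensitiveValue(value) : value;
}

const adminPw = wrapOutput("AdminPassword", "example-only");
console.log(`password is ${adminPw}`); // password is [REDACTED]
```

Using a `#private` field keeps the cleartext out of `Object.keys`, spreads and JSON serialisation as well.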
- tests/smoke/fixtures/assertion-bar.ts: 17 AssertionBarRow entries (one per scenario incl. all-demo umbrella) indexing what each spec asserts and the historical regression that motivated it. Smoke specs remain the source of truth; this is the reviewer-facing index.
- .github/workflows/smoke.yml: the scope job emits `override=true` when the PR carries the `smoke-override-emergency` label; the smoke job's `if:` skips when override is active, so the gate clears. CODEOWNERS approval is enforced by repo branch-protection (out of band).
- .github/workflows/smoke-override-followup.yml: an hourly cron opens a `smoke-override-followup` issue 48h after the merge so the underlying regression doesn't get forgotten. Idempotent on PR number.
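The follow-up cron's selection logic is small enough to state directly: pick overridden PRs merged at least 48h ago that don't already have a follow-up issue. Checking for an existing issue per PR number is what makes an hourly schedule idempotent. The interface and function names below are hypothetical; the workflow's actual implementation is not shown in this PR.

```typescript
// Hypothetical sketch of the follow-up selection; names are illustrative.
interface OverriddenPr {
  number: number;
  mergedAt: string; // ISO-8601 merge timestamp
}

const FOLLOWUP_DELAY_MS = 48 * 60 * 60 * 1000;

function prsNeedingFollowup(
  prs: OverriddenPr[],
  prsWithIssue: Set<number>, // PR numbers that already have a follow-up issue
  now: Date,
): number[] {
  return prs
    .filter((pr) => now.getTime() - Date.parse(pr.mergedAt) >= FOLLOWUP_DELAY_MS)
    .filter((pr) => !prsWithIssue.has(pr.number))
    .map((pr) => pr.number);
}

console.log(
  prsNeedingFollowup(
    [
      { number: 1, mergedAt: "2026-05-12T11:00:00Z" },
      { number: 2, mergedAt: "2026-05-13T12:00:00Z" },
    ],
    new Set<number>(),
    new Date("2026-05-14T12:00:00Z"),
  ),
);
```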
Each scenario now asserts something beyond "landing returned 200 / login form
submitted". Specifically the new assertions surface bug-shaped regressions that
the old surface-only probes would have missed.
Scenario depth-adds (one bullet per spec; the spec is the source of truth):
- ai-contact-centre: companion SPA renders the seeded Aldershire welcome +
the Ask/Call-via-browser entry points (Connect bootstrap + SPA assets)
- bops-planning: post-login dashboard has the 5-tab nav; /planning_applications/all
lists seeded applications by numeric-id href (seed_sample_data.rb regression)
- council-chatbot: parse the JSON response, assert success/citations/model
fields, then a 2nd POST with the returned sessionId round-trips (session
store regression)
- digital-planning-register: council selector renders > 0 council links;
drilling into a council shows "Recently published applications" list with
application-ref links (planning-data API regression)
- fixmystreet: /reports dashboard shows > 0 reports across categories
(seed/DB/cron); /admin/reports moderation queue has report links
- foi-redaction: POST sample text with full PII set; assert redactionCount > 0
AND original strings are gone AND NAME+EMAIL entities detected
- localgov-drupal: /admin/modules confirms ndx_aws_ai + ndx_council_generator
modules enabled (Bedrock IAM regression silently disables them); /admin/content
has seeded demo content
- localgov-ims: post-login dashboard shows IIS nav (Dashboard/Transactions/
Payment/Users); /Payment/Create renders the GOV.UK Pay payment basket form
- minute: landing shows "AI transcription and drafting service"; /templates
lists the seeded Document + Form template types
- paperless-ngx: /api/documents/?page=1 parsed; count > 0, first doc has
> 50 chars of OCR content, at least one doc has an "AI summary" note
(Bedrock post-consume hook regression)
- planning-ai: POST {useSample:true} triggers full Textract+Bedrock pipeline;
assert wordCount > 100, OCR confidence > 80%, AI extraction populates
applicationRef + summary + classification
- planx: post-login editor renders team/flow links or > 5 interactive elements
(seed migrations + API/Postgres reachability)
- quicksight-dashboard: portal URL returns < 400, body mentions QuickSight/Sign in;
data bucket CSV path responds 200 or 403 (existence, not anon access)
- simply-readable: app redirects to Cognito hosted UI with username +
password inputs visible; no 5xx on reload (BlueprintsBucketName regression)
- smart-car-park: dashboard shows Total Spaces > 0 (DynamoDB seed + aggregation);
zone breakdown renders 3+ zone headings
- text-to-speech: POST {text,voice}, fetch returned signed audioUrl, assert
content-type audio/mpeg + body > 1KB + MP3 magic byte at offset 0
Token-redaction in fillPassword stayed disabled (AC3.12 deviation documented
earlier). secure-form.ts already simplified.
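The text-to-speech assertion (content-type audio/mpeg, body > 1KB, MP3 magic byte at offset 0) can be expressed as a single predicate over a fetched response. The sketch assumes the magic-byte check accepts either an ID3v2 tag ("ID3") or a raw MPEG frame sync (0xFF followed by a byte with its top three bits set); the spec's exact byte test may differ, and `looksLikeMp3` is a hypothetical name. `contentType` has the `string | null` type of `Response.headers.get()`.

```typescript
// Hypothetical predicate; the spec's exact byte check may differ.
function looksLikeMp3(contentType: string | null, body: Uint8Array): boolean {
  if (contentType === null || !contentType.toLowerCase().startsWith("audio/mpeg")) return false;
  if (body.length <= 1024) return false; // must be > 1KB
  const id3Tag = body[0] === 0x49 && body[1] === 0x44 && body[2] === 0x33; // "ID3"
  const frameSync = body[0] === 0xff && (body[1] & 0xe0) === 0xe0; // MPEG frame header
  return id3Tag || frameSync;
}

// Synthetic 2KB buffer starting with an ID3 tag.
const sample = new Uint8Array(2048);
sample.set([0x49, 0x44, 0x33]);
console.log(looksLikeMp3("audio/mpeg", sample)); // true
```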
…, Planx SPA-bootstrap
The CI workflow's inline Dockerfile set VITE_APP_AIRBRAKE_PROJECT_ID=0
and VITE_APP_AIRBRAKE_PROJECT_KEY=unused. Both are truthy strings, so
upstream's `hasConfig` check passes; then `new Notifier({projectId: 0,
projectKey: "unused"})` is called, Airbrake requires projectId to be truthy
(0 is falsy), and it throws "projectId and projectKey are required",
blanking the editor SPA.
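The failure mode is a truthiness mismatch between the string env var and the number it becomes. The reduction below paraphrases both checks rather than copying upstream or Airbrake code; `createNotifier` is a hypothetical stand-in for the Notifier constructor's guard.

```typescript
// Hypothetical reduction of the failure mode; upstream's hasConfig check and
// Airbrake's Notifier constructor are paraphrased here, not copied.
const env: Record<string, string | undefined> = {
  VITE_APP_AIRBRAKE_PROJECT_ID: "0",
  VITE_APP_AIRBRAKE_PROJECT_KEY: "unused",
};

// Both values are non-empty strings, so a truthiness check passes.
const hasConfig = Boolean(env.VITE_APP_AIRBRAKE_PROJECT_ID && env.VITE_APP_AIRBRAKE_PROJECT_KEY);

function createNotifier(opts: { projectId: number; projectKey: string }): void {
  // Paraphrase of the constructor's guard: projectId must be truthy.
  if (!opts.projectId || !opts.projectKey) {
    throw new Error("projectId and projectKey are required");
  }
}

let bootError: string | undefined;
if (hasConfig) {
  try {
    createNotifier({
      projectId: Number(env.VITE_APP_AIRBRAKE_PROJECT_ID), // "0" -> 0, which is falsy
      projectKey: env.VITE_APP_AIRBRAKE_PROJECT_KEY ?? "",
    });
  } catch (e) {
    bootError = (e as Error).message;
  }
}
console.log({ hasConfig, bootError });
```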
Fix: stop passing the env vars, and apply a build-time overlay that
replaces airbrake.ts with an unconditional no-op stub so the import
path is safe regardless of upstream drift. Same pattern as the existing
validateDomain overlay, now also applied in CI (previously only in the
local build.sh).
Also fixes VITE_APP_HASURA_URL drift between CI (/hasura/v1/graphql) and
the local Dockerfile (/v1/graphql) - CloudFront routes /v1/* and
/console/* directly to Hasura, no /hasura prefix.
CDK pin update for the new editor image (builds without
VITE_APP_AIRBRAKE_* and with the airbrake.ts no-op overlay).
Smoke spec tightened: previous version only asserted that the SPA bundle
was served by CloudFront because the React tree was crashing in init.
Now require the editor dashboard ("Select a team" heading + "My teams"
section + at least one team card link) to render, and fail if the
Airbrake bootstrap error message appears in the browser console.
Summary
Three threads bundled because they share files:
Smoke pack consolidation: per-scenario `tests/smoke/*.spec.ts` files (avg ~70 lines, lots of repetition) collapsed into 17 co-located `cloudformation/scenarios/*/smoke.ts` configs (avg 25 lines). A single `tests/smoke/runner.ts` drives all of them; `tests/smoke/helpers.ts::adminLogin()` plus `get`/`getSecret` accessors on the smoke context absorb the login-form boilerplate. `assertion-bar.ts` is gone; quarantine state moves into each scenario's own `smoke.ts`. Playwright's smoke project is re-targeted at `cloudformation/scenarios` with `testMatch: /[^/]+\/smoke\.ts$/` so new scenarios auto-discover. All 17 tests are still found by `playwright test --list`.

Workflow de-bloat (1255 → 601 lines):

- `smoke.yml` 467 → 204: scope decision, pre-deploy state, SCP drift, capture-events, teardown extracted to `scripts/smoke-*.sh`
- `deploy-blueprints.yml` 726 → 354: 9 per-scenario synth jobs collapsed to a single matrix; CDK strip + SAM-bucket verify pulled into `scripts/`
- `renovate.yml` 62 → 43: narrative stripped (`RENOVATE_TOKEN` PAT retained per operator decision)
- `T*.*` / tech-spec narrative removed throughout

Pre-existing CI unblock:

- LocalGov IMS: drop `minLength: 1` on `GovUkPayApiKey` so smoke deploys (no key) succeed; the payment portal degrades cleanly
- Retention lint: add `DeletionPolicy`/`UpdateReplacePolicy=Snapshot` (parity with the prior inline check)

`CODEOWNERS` switched from `@chrisns` to `@co-cddo/ndx` and broadened to cover the new scripts and per-scenario `smoke.ts` files.

Operator actions taken out-of-band (state already reconciled)
- `IsbHubStack` had been `UPDATE_ROLLBACK_FAILED` for 12 days (transient `AiContactCentreStackSet` issue from May 1); recovered cleanly via `continue-update-rollback`.
- The 6 orphan StackSets (bops-planning, digital-planning-register, fixmystreet, minute, paperless-ngx, planx) were imported back into `IsbHubStack` via `change-set-type IMPORT`. UUIDs preserved → ISB lease bindings intact. The first `cdk deploy` after merge will create the 6 missing `BucketDeployment` resources and reconcile any property drift on the imported StackSets.
- `s3://ndx-try-isb-blueprints-568672915267/scenarios/`. Planx had been at `template.json` (manual May-8 upload); now at the canonical `template.yaml`.

Test plan

- `deploy-blueprints.yml` green (the matrix + the deploy job, including the 6 new `BucketDeployment` creates from the import reconciliation)
- `smoke` workflow green (templates fresh in S3, LocalgovIms accepts empty key, hub stack healthy)
- `playwright test --project=smoke --list` shows 17 tests after checkout
- `git diff main -- .github/workflows/*.yml` shows shrunk YAML with no inline scripts > ~15 lines