Commit cf5e293

feat(infra): move app secrets to sst.Secret, Cloudflare creds to SSM
7 app secrets (OIDC client id, OIDC management id/secret, PostHog, Svix, SSH host/private keys) now resolve from the SST secret store per stage instead of the gitignored .env.

Cloudflare provider creds cannot be sst.Secret because the provider initializes in app() before run() exists, so scripts/sst-with-cloudflare.mjs loads them from AWS SSM SecureString and exports them before invoking sst. The dev/deploy/remove/sst npm scripts route through it.

README gains a "Secrets & credentials" section, and its commands now use npm run.

Also fixes a non-ASCII em-dash in the RunnerSecurityGroup description, which EC2 rejects.
1 parent f394033 commit cf5e293
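
The split described in the commit message can be read as a simple lookup from setting name to storage location. The sketch below is only an illustration: `storeFor` and the labels `'sst-secret'`, `'ssm'`, and `'dotenv'` are hypothetical names, while the setting names themselves are taken from the diff.

```javascript
// Hypothetical lookup: which store holds a given setting after this commit.
// The names come from the diff; the function and labels are illustrative only.
const APP_SECRETS = new Set([
  'OIDC_CLIENT_ID',
  'OIDC_MANAGEMENT_API_CLIENT_ID',
  'OIDC_MANAGEMENT_API_CLIENT_SECRET',
  'POSTHOG_API_KEY',
  'SVIX_AUTH_TOKEN',
  'SSH_PRIVATE_KEY_B64',
  'SSH_HOST_KEY_B64',
])

const CLOUDFLARE_CREDS = new Set([
  'CLOUDFLARE_API_TOKEN',
  'CLOUDFLARE_DEFAULT_ACCOUNT_ID',
])

function storeFor(name) {
  if (APP_SECRETS.has(name)) return 'sst-secret' // set with `sst secret set`
  if (CLOUDFLARE_CREDS.has(name)) return 'ssm' // seeded with `aws ssm put-parameter`
  return 'dotenv' // non-secret config such as STACK_DOMAIN
}
```

Anything not listed in either set is treated as plain config here, which matches the README's description of the local `.env` but is a simplification.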

4 files changed

Lines changed: 197 additions & 36 deletions

apps/infra/README.md

Lines changed: 63 additions & 16 deletions
Original file line number | Diff line number | Diff line change
@@ -24,18 +24,63 @@ EC2 runner with nested KVM, RDS Postgres, ElastiCache Redis, S3, CloudFront.
2424

2525
```bash
2626
cd apps/infra
27-
cp .env.example .env
28-
# Fill in STACK_DOMAIN, CLOUDFLARE_*, OIDC_* (see .env.example comments)
2927
npm install
30-
npx sst deploy --stage dev
28+
cp .env.example .env # non-secret config: STACK_DOMAIN, OIDC_ISSUER_BASE_URL, OIDC_AUDIENCE
29+
30+
# Cloudflare provider credentials live in SSM (per stage) — see "Secrets & credentials":
31+
aws ssm put-parameter --region ap-southeast-1 --type SecureString \
32+
--name /boxlite/dev/cloudflare-api-token --value "<token>"
33+
aws ssm put-parameter --region ap-southeast-1 --type SecureString \
34+
--name /boxlite/dev/cloudflare-account-id --value "<account-id>"
35+
36+
npm run deploy -- --stage dev # the wrapper loads the Cloudflare creds, then runs sst deploy
3137
```
3238

39+
App secrets (SSH keys, Auth0 Management API, Svix, PostHog) are optional and set
40+
per-stage in the SST secret store — see [Secrets & credentials](#secrets--credentials).
41+
3342
First deploy: 10–15 minutes. Output prints service URLs + CloudFront domain.
3443

3544
If the build fails with a transient `auth.docker.io` EOF or Debian mirror
36-
`502 Bad Gateway`, just rerun `npx sst deploy --stage dev` — SST resumes
45+
`502 Bad Gateway`, just rerun `npm run deploy -- --stage dev` — SST resumes
3746
from the failed step.
3847

48+
## Secrets & credentials
49+
50+
Three homes, one access gate — **AWS IAM**. Nothing secret lives in git or a
51+
single laptop's `.env`:
52+
53+
| What | Where | Set with |
54+
|---|---|---|
55+
| **App secrets** — SSH host/private keys, Auth0 Management API id + secret, `SVIX_AUTH_TOKEN`, `POSTHOG_API_KEY`, `OIDC_CLIENT_ID` | SST secret store (encrypted in SST state, per stage) | `sst secret set <NAME> "<value>" --stage <stage>` |
56+
| **Cloudflare provider creds**: `CLOUDFLARE_API_TOKEN`, `CLOUDFLARE_DEFAULT_ACCOUNT_ID` | AWS SSM (`SecureString`, per stage) | `aws ssm put-parameter --type SecureString --name /boxlite/<stage>/cloudflare-…` |
57+
| **Non-secret config**: `STACK_DOMAIN`, `OIDC_ISSUER_BASE_URL`, `OIDC_AUDIENCE`, toggles | local `.env` (gitignored) | edit `.env` |
58+
59+
The Cloudflare creds can't be `sst.Secret`: the provider initializes in `app()`
60+
before `run()` (where secrets exist), so it reads them from the environment.
61+
`scripts/sst-with-cloudflare.mjs` — wired into `npm run dev`/`deploy`/`remove`
62+
and `npm run sst` — fetches them from SSM and exports them before invoking sst.
63+
**Run sst through these npm scripts**, not bare `npx sst`, so the creds load.
64+
65+
### App secrets
66+
67+
```bash
68+
sst secret set SVIX_AUTH_TOKEN "<value>" --stage dev # set one
69+
sst secret load .env --stage dev # bulk-load a dotenv (names match 1:1)
70+
npm run secrets -- --stage dev # list what's set
71+
```
72+
73+
Secret names match the env keys the services expect. Unset optional secrets
74+
resolve to empty (feature off); `OIDC_CLIENT_ID` defaults to `boxlite`. A changed
75+
value takes effect on the next `npm run deploy`.
76+
77+
### Onboarding / offboarding
78+
79+
Access is **AWS IAM only**: anyone who can deploy (read SST state + SSM, run
80+
`sst deploy`) can read every secret. Onboard by granting that AWS access;
81+
offboard by revoking it. There's no secret file or vault to hand over. Secret
82+
values and the SSM params are **per-stage** — seed each stage you run.
83+
3984
## After first deploy
4085

4186
Nothing needs to be fed back into `.env`. The runner EC2 self-registers with the
@@ -50,7 +95,7 @@ count and redeploy:
5095

5196
```bash
5297
echo "RUNNERS=3" >> .env # default runner (#1) + runner-2 + runner-3
53-
npx sst deploy --stage dev
98+
npm run deploy -- --stage dev
5499
```
55100

56101
Each extra runner gets its own EC2 + minted token. Because the API only
@@ -192,20 +237,23 @@ For Auth0 specifically:
192237
| **ClickHouse Cloud** | Managed OTel storage | external service; configured by env |
193238
| **ClickStack** | Logs/traces/metrics explorer | external ClickHouse Cloud UI |
194239

195-
Run `npx sst deploy --stage dev` without changes to reprint all URLs. See
240+
Run `npm run deploy -- --stage dev` without changes to reprint all URLs. See
196241
[Public hostnames](#public-hostnames) below for the rationale behind the
197242
dashboard-vs-API split.
198243

199244
## Common commands
200245

201246
```bash
202-
npx sst deploy --stage dev # deploy / update
203-
npx sst diff --stage dev # preview changes
204-
npx sst unlock --stage dev # recover from "concurrent update detected"
205-
npx sst shell --stage dev # open shell with SST-linked env vars
206-
npx sst remove --stage dev # destroy everything
247+
npm run deploy -- --stage dev # deploy / update
248+
npm run sst -- diff --stage dev # preview changes
249+
npm run sst -- unlock --stage dev # recover from "concurrent update detected"
250+
npm run sst -- shell --stage dev # open shell with SST-linked env vars
251+
npm run remove -- --stage dev # destroy everything
207252
```
208253

254+
> These route through `scripts/sst-with-cloudflare.mjs` so the Cloudflare provider
255+
> creds load from SSM. Bare `npx sst …` skips that and can't reach Cloudflare.
256+
209257
## Runner lifecycle
210258

211259
The Runner EC2 instance (`tag:Name=boxlite-runner`) holds load-bearing state:
@@ -247,7 +295,7 @@ operation by design:
247295
1. Verify no `running` boxes are pinned to this Runner (DB query against
248296
`box.runnerId`).
249297
2. Edit `sst.config.ts`: change `protect: true` to `protect: false` on the
250-
Runner resource. Run `npx sst deploy --stage <stage>`. This only updates
298+
Runner resource. Run `npm run deploy -- --stage <stage>`. This only updates
251299
the resource metadata; the EC2 is not yet touched.
252300
3. Destroy the EC2:
253301

@@ -256,7 +304,7 @@ operation by design:
256304
```
257305

258306
4. Edit `sst.config.ts`: change `protect: false` back to `protect: true`. Run
259-
`npx sst deploy` again — a new Runner is created with fresh state.
307+
`npm run deploy` again — a new Runner is created with fresh state.
260308

261309
This is deliberate by construction: three code edits across two deploys. If
262310
you find yourself doing this often, look at the future drain API (tracked
@@ -307,7 +355,7 @@ Auth: OIDC provider (Auth0/Okta/Keycloak/Dex/…) ← Api validates JWT via JWKS
307355

308356
## Troubleshooting
309357

310-
**"concurrent update detected"** — run `npx sst unlock --stage dev` and retry.
358+
**"concurrent update detected"** — run `npm run sst -- unlock --stage dev` and retry.
311359

312360
**Service stuck at `rolloutState: FAILED` with 1 running task** — stale event
313361
from an earlier failed deploy. If `runningCount == desiredCount` the service
@@ -375,6 +423,5 @@ initial setup: `aws ecs update-service --force-new-deployment --service Proxy`.
375423
| **Total** | **~$570** |
376424

377425
Figures are approximate (ap-southeast-1 on-demand). The **Runner and the load
378-
balancers dominate** — the NAT is ~$16, not a headline cost. `npx sst remove
379-
--stage dev` tears it all down; S3 buckets and RDS snapshots are retained in
426+
balancers dominate** — the NAT is ~$16, not a headline cost. `npm run remove -- --stage dev` tears it all down; S3 buckets and RDS snapshots are retained in
380427
the production stage (`--stage production`) per SST's default.
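
The "empty = off" contract from the README's App secrets section can be modelled in a few lines. This is a sketch of the contract only, not of how SST resolves secrets internally: `resolveSecret` and `isEnabled` are hypothetical helpers, and the placeholder values are the ones declared in `sst.config.ts` in this commit.

```javascript
// Placeholders as declared in sst.config.ts (second argument to sst.Secret).
const PLACEHOLDERS = {
  OIDC_CLIENT_ID: 'boxlite',
  POSTHOG_API_KEY: '',
  SVIX_AUTH_TOKEN: '',
  SSH_PRIVATE_KEY_B64: '',
  SSH_HOST_KEY_B64: '',
}

// Hypothetical model: a value set for the stage wins, otherwise the placeholder.
// Secrets declared without a placeholder are deliberately out of scope here.
function resolveSecret(stageSecrets, name) {
  if (!Object.prototype.hasOwnProperty.call(PLACEHOLDERS, name)) {
    throw new Error(`no placeholder modelled for ${name}`)
  }
  const value = stageSecrets[name]
  return value !== undefined ? value : PLACEHOLDERS[name]
}

// A feature counts as "on" only when its secret resolves to a non-empty string.
function isEnabled(value) {
  return value !== ''
}
```

For example, with nothing set for a stage, `resolveSecret({}, 'SVIX_AUTH_TOKEN')` yields the empty placeholder and `isEnabled` reports the feature as off, while `OIDC_CLIENT_ID` falls back to `boxlite`.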

apps/infra/package.json

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -4,9 +4,11 @@
44
"private": true,
55
"type": "module",
66
"scripts": {
7-
"dev": "sst dev",
8-
"deploy": "sst deploy",
9-
"remove": "sst remove"
7+
"dev": "node scripts/sst-with-cloudflare.mjs dev",
8+
"deploy": "node scripts/sst-with-cloudflare.mjs deploy",
9+
"remove": "node scripts/sst-with-cloudflare.mjs remove",
10+
"sst": "node scripts/sst-with-cloudflare.mjs",
11+
"secrets": "sst secret list"
1012
},
1113
"devDependencies": {
1214
"@pulumi/aws": "^7.24.0",
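
A rough picture of how the new scripts forward arguments: `npm run <name> -- <args>` appends everything after `--` to the script's command string. The sketch below assumes that simple append behaviour and ignores npm's shell quoting; `expandNpmRun` is a hypothetical helper, and the `scripts` object mirrors the entries added in this diff.

```javascript
// Script entries copied from the package.json change above.
const scripts = {
  dev: 'node scripts/sst-with-cloudflare.mjs dev',
  deploy: 'node scripts/sst-with-cloudflare.mjs deploy',
  remove: 'node scripts/sst-with-cloudflare.mjs remove',
  sst: 'node scripts/sst-with-cloudflare.mjs',
  secrets: 'sst secret list',
}

// Hypothetical helper: approximate the command line npm ends up running.
function expandNpmRun(name, extraArgs = []) {
  const base = scripts[name]
  if (!base) throw new Error(`unknown script: ${name}`)
  return [base, ...extraArgs].join(' ')
}

// npm run deploy -- --stage dev
console.log(expandNpmRun('deploy', ['--stage', 'dev']))
// npm run sst -- diff --stage dev
console.log(expandNpmRun('sst', ['diff', '--stage', 'dev']))
```

Note that `secrets` calls `sst secret list` directly rather than going through the wrapper.
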
apps/infra/scripts/sst-with-cloudflare.mjs

Lines changed: 95 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,95 @@
1+
// SPDX-License-Identifier: AGPL-3.0-only
2+
// Copyright (c) 2026 BoxLite AI
3+
4+
/*
5+
* Run an `sst` command with the Cloudflare provider credentials loaded.
6+
*
7+
* The Cloudflare provider initializes inside `app()` (sst.config.ts), before
8+
* `run()` exists — so its credentials can't be `sst.Secret` like the app
9+
* secrets. They live in AWS SSM Parameter Store (SecureString) instead, keyed
10+
* per stage, and this wrapper fetches + exports them just before invoking sst.
11+
* App secrets are NOT handled here; sst resolves those from its own store.
12+
*
13+
* Wired into the dev/deploy/remove npm scripts, plus a passthrough:
14+
* npm run deploy -- --stage dev → node sst-with-cloudflare.mjs deploy --stage dev
15+
* npm run sst -- diff --stage dev → any other subcommand
16+
*
17+
* One access gate: a deployer already needs AWS credentials to deploy, so the
18+
* same credentials fetch the Cloudflare token from SSM — nothing extra to share.
19+
*
20+
* Seed the parameters once per stage:
21+
* aws ssm put-parameter --region ap-southeast-1 --type SecureString \
22+
* --name /boxlite/<stage>/cloudflare-api-token --value "<token>"
23+
* aws ssm put-parameter --region ap-southeast-1 --type SecureString \
24+
* --name /boxlite/<stage>/cloudflare-account-id --value "<account-id>"
25+
*
26+
* A credential already in the environment is used as-is (works offline / before
27+
* the params are seeded). Missing creds are a warning, not a hard stop: commands
28+
* that don't touch Cloudflare (e.g. `unlock`) still run, and sst surfaces its own
29+
* error for one that does.
30+
*/
31+
32+
import { execFileSync, spawnSync } from 'node:child_process'
33+
34+
const REGION = process.env.AWS_REGION || 'ap-southeast-1'
35+
36+
// SSM param consulted only when the matching env var is unset.
37+
const CREDS = [
38+
{ env: 'CLOUDFLARE_API_TOKEN', param: 'cloudflare-api-token' },
39+
{ env: 'CLOUDFLARE_DEFAULT_ACCOUNT_ID', param: 'cloudflare-account-id' },
40+
]
41+
42+
const sstArgs = process.argv.slice(2)
43+
if (sstArgs.length === 0) {
44+
console.error('sst-with-cloudflare: expected an sst subcommand (e.g. "deploy --stage dev")')
45+
process.exit(1)
46+
}
47+
48+
// Resolve the stage from the sst args (--stage x or --stage=x); the SSM path is
49+
// per-stage. Falls back to SST_STAGE then "dev".
50+
function resolveStage(args) {
51+
for (let i = 0; i < args.length; i++) {
52+
if (args[i] === '--stage' && args[i + 1]) return args[i + 1]
53+
const m = args[i].match(/^--stage=(.+)$/)
54+
if (m) return m[1]
55+
}
56+
return process.env.SST_STAGE || 'dev'
57+
}
58+
59+
function fetchFromSsm(name) {
60+
try {
61+
const out = execFileSync(
62+
'aws',
63+
['ssm', 'get-parameter', '--region', REGION, '--name', name, '--with-decryption', '--query', 'Parameter.Value', '--output', 'text'],
64+
{ encoding: 'utf8', stdio: ['ignore', 'pipe', 'pipe'] },
65+
).trim()
66+
return out && out !== 'None' ? out : null
67+
} catch (err) {
68+
if (err.code === 'ENOENT') console.warn('sst-with-cloudflare: `aws` CLI not found; skipping SSM lookup')
69+
return null // ParameterNotFound / auth error → warn below and let sst decide
70+
}
71+
}
72+
73+
const stage = resolveStage(sstArgs)
74+
75+
for (const { env, param } of CREDS) {
76+
if (process.env[env]) continue // already provided — don't touch
77+
const name = `/boxlite/${stage}/${param}`
78+
const value = fetchFromSsm(name)
79+
if (value) {
80+
process.env[env] = value
81+
} else {
82+
console.warn(
83+
`sst-with-cloudflare: ${env} not in env and ${name} not in SSM (${REGION}); ` +
84+
`seed it with: aws ssm put-parameter --region ${REGION} --type SecureString --name ${name} --value <...>`,
85+
)
86+
}
87+
}
88+
89+
// node_modules/.bin is on PATH because this runs via `npm run`, so `sst` resolves.
90+
const result = spawnSync('sst', sstArgs, { stdio: 'inherit', env: process.env })
91+
if (result.error) {
92+
console.error(`sst-with-cloudflare: failed to launch sst: ${result.error.message}`)
93+
process.exit(1)
94+
}
95+
process.exit(result.status ?? 1)
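
To make the wrapper's decision logic easier to follow, here is a side-effect-free sketch of its two rules: resolve the stage from the sst arguments, then consult SSM only for credentials missing from the environment. `resolveStage` follows the script above but takes the environment as a parameter so it can be exercised in isolation; `planLookups` is a hypothetical helper that reports what the wrapper would do without calling `aws` or `sst`.

```javascript
// Credentials the wrapper manages, copied from the script above.
const CREDS = [
  { env: 'CLOUDFLARE_API_TOKEN', param: 'cloudflare-api-token' },
  { env: 'CLOUDFLARE_DEFAULT_ACCOUNT_ID', param: 'cloudflare-account-id' },
]

// Same logic as the wrapper's resolveStage, with the environment passed in
// explicitly instead of read from process.env (an adaptation for this sketch).
function resolveStage(args, env = {}) {
  for (let i = 0; i < args.length; i++) {
    if (args[i] === '--stage' && args[i + 1]) return args[i + 1]
    const m = args[i].match(/^--stage=(.+)$/)
    if (m) return m[1]
  }
  return env.SST_STAGE || 'dev'
}

// Hypothetical helper: list where each credential would come from.
function planLookups(args, env = {}) {
  const stage = resolveStage(args, env)
  return CREDS.map(({ env: key, param }) =>
    env[key]
      ? { key, source: 'env' } // already provided, SSM is not consulted
      : { key, source: 'ssm', name: `/boxlite/${stage}/${param}` },
  )
}

console.log(planLookups(['deploy', '--stage', 'prod'], { CLOUDFLARE_API_TOKEN: 'from-shell' }))
```

In that last call the API token would be taken from the environment as-is, and only the account id would be looked up, under `/boxlite/prod/cloudflare-account-id`.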

apps/infra/sst.config.ts

Lines changed: 34 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -157,6 +157,22 @@ export default $config({
157157
const defaultRunnerApiKey = randomKey('DefaultRunnerApiKey')
158158
const pgAdminPassword = randomKey('PgAdminPassword', 24)
159159

160+
// App secrets — set via `sst secret set <NAME> --stage <stage>` (or bulk
161+
// `sst secret load <dotenv>`); stored encrypted in SST state and shared
162+
// per-stage by anyone with deploy access. Names match the env keys so a
163+
// dotenv `sst secret load` maps 1:1. Optional ones carry an empty-string
164+
// placeholder, so "unset" reads as '' — the same "empty = off" contract the
165+
// SSH keys already relied on. NB: the Cloudflare provider creds can't live
166+
// here (the provider initializes in app() before run() exists); they're
167+
// injected from SSM by scripts/sst-with-cloudflare.mjs.
168+
const oidcClientId = new sst.Secret('OIDC_CLIENT_ID', 'boxlite')
169+
const oidcMgmtClientId = new sst.Secret('OIDC_MANAGEMENT_API_CLIENT_ID')
170+
const oidcMgmtClientSecret = new sst.Secret('OIDC_MANAGEMENT_API_CLIENT_SECRET')
171+
const posthogApiKey = new sst.Secret('POSTHOG_API_KEY', '')
172+
const svixAuthToken = new sst.Secret('SVIX_AUTH_TOKEN', '')
173+
const sshPrivateKey = new sst.Secret('SSH_PRIVATE_KEY_B64', '')
174+
const sshHostKey = new sst.Secret('SSH_HOST_KEY_B64', '')
175+
160176
// ─── 2. PLATFORM ─────────────────────────────────────────────────────────
161177
// Network model + rationale (subnets / NAT / egress-only public IP, AWS citations): ./NETWORKING.md
162178
// NAT instance (fck-nat, ~10× cheaper than a managed NAT Gateway). The Fargate
@@ -487,7 +503,7 @@ export default $config({
487503
ENCRYPTION_SALT: envOr('ENCRYPTION_SALT', encryptionSalt.result),
488504

489505
// OIDC — external provider (Auth0/Okta/etc.)
490-
OIDC_CLIENT_ID: envOr('OIDC_CLIENT_ID', 'boxlite'),
506+
OIDC_CLIENT_ID: oidcClientId.value,
491507
OIDC_AUDIENCE: envOr('OIDC_AUDIENCE', 'boxlite'),
492508
OIDC_ISSUER_BASE_URL: requireOidcIssuer(),
493509
...(process.env.PUBLIC_OIDC_DOMAIN && {
@@ -496,8 +512,12 @@ export default $config({
496512
// Optional: Auth0 Management API (enables account linking etc.)
497513
...(process.env.OIDC_MANAGEMENT_API_ENABLED === 'true' && {
498514
OIDC_MANAGEMENT_API_ENABLED: 'true',
499-
OIDC_MANAGEMENT_API_CLIENT_ID: requireEnv('OIDC_MANAGEMENT_API_CLIENT_ID', 'when OIDC_MANAGEMENT_API_ENABLED=true'),
500-
OIDC_MANAGEMENT_API_CLIENT_SECRET: requireEnv('OIDC_MANAGEMENT_API_CLIENT_SECRET', 'when OIDC_MANAGEMENT_API_ENABLED=true'),
515+
// Client id/secret come from the SST secret store now. If the feature
516+
// is enabled but a secret is unset, the value resolves to '' and the
517+
// Api errors at runtime — instead of the old deploy-time requireEnv
518+
// throw (Output values can't be guarded at config-build time).
519+
OIDC_MANAGEMENT_API_CLIENT_ID: oidcMgmtClientId.value,
520+
OIDC_MANAGEMENT_API_CLIENT_SECRET: oidcMgmtClientSecret.value,
501521
OIDC_MANAGEMENT_API_AUDIENCE: requireEnv('OIDC_MANAGEMENT_API_AUDIENCE', 'when OIDC_MANAGEMENT_API_ENABLED=true'),
502522
}),
503523
// RP-initiated logout fallback. Safe to set unconditionally: the API
@@ -608,17 +628,14 @@ export default $config({
608628
DEFAULT_RUNNER_API_URL: runnerEndpoint('DEFAULT_RUNNER_API_URL', PORTS.RUNNER, 'http://'),
609629
DEFAULT_RUNNER_PROXY_URL: runnerEndpoint('DEFAULT_RUNNER_PROXY_URL', PORTS.PROXY, 'http://'),
610630

611-
// PostHog (enables the dashboard's "Create Box" feature flag)
612-
...(process.env.POSTHOG_API_KEY && {
613-
POSTHOG_API_KEY: process.env.POSTHOG_API_KEY,
614-
POSTHOG_HOST: envOr('POSTHOG_HOST', 'https://us.posthog.com'),
615-
}),
631+
// PostHog (enables the dashboard's "Create Box" feature flag). Token is a
632+
// secret (empty = off); host stays plain config.
633+
POSTHOG_API_KEY: posthogApiKey.value,
634+
POSTHOG_HOST: envOr('POSTHOG_HOST', 'https://us.posthog.com'),
616635

617-
// Svix (webhook delivery; without this dashboard logs cosmetic errors)
618-
...(process.env.SVIX_AUTH_TOKEN && {
619-
SVIX_AUTH_TOKEN: process.env.SVIX_AUTH_TOKEN,
620-
...(process.env.SVIX_SERVER_URL && { SVIX_SERVER_URL: process.env.SVIX_SERVER_URL }),
621-
}),
636+
// Svix (webhook delivery; empty token = off → dashboard logs cosmetic errors)
637+
SVIX_AUTH_TOKEN: svixAuthToken.value,
638+
...(process.env.SVIX_SERVER_URL && { SVIX_SERVER_URL: process.env.SVIX_SERVER_URL }),
622639
},
623640
})
624641

@@ -678,7 +695,7 @@ export default $config({
678695
PROXY_API_KEY: envOr('PROXY_API_KEY', proxyApiKey.result),
679696
// api-client-go appends paths like "/config" directly → include /api suffix
680697
BOXLITE_API_URL: $interpolate`${stripTrailingSlash(api.url)}/api`,
681-
OIDC_CLIENT_ID: envOr('OIDC_CLIENT_ID', 'boxlite'),
698+
OIDC_CLIENT_ID: oidcClientId.value,
682699
OIDC_AUDIENCE: envOr('OIDC_AUDIENCE', 'boxlite'),
683700
OIDC_DOMAIN: requireOidcIssuer(),
684701
},
@@ -698,8 +715,8 @@ export default $config({
698715
// must use the API base path rather than the raw ALB root.
699716
API_URL: $interpolate`${stripTrailingSlash(api.url)}/api`,
700717
API_KEY: envOr('SSH_GATEWAY_API_KEY', sshGatewayApiKey.result), // NB: not SSH_GATEWAY_API_KEY
701-
SSH_PRIVATE_KEY: envOr('SSH_PRIVATE_KEY_B64', ''),
702-
SSH_HOST_KEY: envOr('SSH_HOST_KEY_B64', ''),
718+
SSH_PRIVATE_KEY: sshPrivateKey.value,
719+
SSH_HOST_KEY: sshHostKey.value,
703720
},
704721
})
705722

@@ -830,7 +847,7 @@ export default $config({
830847
// nothing on the internet can reach the runner.
831848
const runnerSecurityGroup = new aws.ec2.SecurityGroup('RunnerSecurityGroup', {
832849
vpcId: vpc.nodes.vpc.id,
833-
description: 'BoxLite runner inbound only on the runner API port from within the VPC',
850+
description: 'BoxLite runner - inbound only on the runner API port from within the VPC',
834851
ingress: [
835852
{
836853
protocol: 'tcp',
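
The security group description fix could be guarded against regressions with a small check before deploy. The sketch below is an assumption-laden illustration: the allow-list reflects the character set the EC2 API documents for `GroupDescription` as far as I recall it (letters, digits, spaces, and `._-:/()#,@[]+=&;{}!$*`), and `invalidDescriptionChars` is a hypothetical helper, not part of this repository.

```javascript
// Assumed allow-list for EC2 security group descriptions; verify against the
// EC2 CreateSecurityGroup API reference before relying on it.
const ALLOWED_CHAR = /^[a-zA-Z0-9 ._\-:\/()#,@\[\]+=&;{}!$*]$/

// Hypothetical helper: return every character EC2 would be expected to reject.
function invalidDescriptionChars(description) {
  return [...description].filter((ch) => !ALLOWED_CHAR.test(ch))
}

// The description as committed uses a plain ASCII hyphen.
const fixed = 'BoxLite runner - inbound only on the runner API port from within the VPC'
console.log(invalidDescriptionChars(fixed))

// A variant with a U+2014 em-dash is flagged.
const withEmDash = 'BoxLite runner \u2014 inbound only on the runner API port from within the VPC'
console.log(invalidDescriptionChars(withEmDash))
```

An empty result means the description uses only characters from the assumed allow-list; length limits are not checked here.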
