Commit 3dc854b

Add dev deploy workflow
1 parent 07fa30f commit 3dc854b

7 files changed

Lines changed: 1090 additions & 2 deletions

.github/workflows/deploy-dev.yml

Lines changed: 155 additions & 0 deletions
@@ -0,0 +1,155 @@
+name: Deploy Dev
+
+on:
+  workflow_dispatch:
+    inputs:
+      deploy_mode:
+        description: 'SST action to run'
+        required: true
+        type: choice
+        default: diff-only
+        options:
+          - diff-only
+          - deploy
+      runner_mode:
+        description: 'Runner rollout strategy'
+        required: true
+        type: choice
+        default: skip
+        options:
+          - skip
+          - existing-release
+          - temporary-build
+          - rollback
+      runner_ref:
+        description: 'Runner release version for existing-release, e.g. 0.9.5'
+        required: false
+        type: string
+      confirm:
+        description: 'Type dev to confirm this targets the dev environment'
+        required: true
+        type: string
+
+permissions:
+  contents: read
+  id-token: write
+
+concurrency:
+  group: deploy-dev
+  cancel-in-progress: false
+
+jobs:
+  config:
+    uses: ./.github/workflows/config.yml
+
+  deploy:
+    name: Deploy dev
+    needs: config
+    runs-on: ubuntu-latest
+    environment: dev
+    env:
+      AWS_REGION: ${{ vars.AWS_REGION }}
+      STACK_DOMAIN: ${{ vars.DEV_STACK_DOMAIN }}
+      RUNNER_ARTIFACT_BUCKET: ${{ vars.DEV_RUNNER_ARTIFACT_BUCKET }}
+      RUNNER_MANIFEST_PREFIX: ${{ vars.DEV_RUNNER_MANIFEST_PREFIX }}
+      ADMIN_API_KEY: ${{ secrets.DEV_ADMIN_API_KEY }}
+      CLOUDFLARE_DEFAULT_ACCOUNT_ID: ${{ vars.CLOUDFLARE_DEFAULT_ACCOUNT_ID }}
+      CLOUDFLARE_API_TOKEN: ${{ secrets.CLOUDFLARE_API_TOKEN }}
+      OIDC_ISSUER_BASE_URL: ${{ vars.DEV_OIDC_ISSUER_BASE_URL }}
+      OIDC_CLIENT_ID: ${{ vars.DEV_OIDC_CLIENT_ID }}
+      OIDC_AUDIENCE: ${{ vars.DEV_OIDC_AUDIENCE }}
+      PUBLIC_OIDC_DOMAIN: ${{ vars.DEV_PUBLIC_OIDC_DOMAIN }}
+      PUBLIC_OIDC_AUDIENCE: ${{ vars.DEV_PUBLIC_OIDC_AUDIENCE }}
+      PUBLIC_OIDC_CLIENT_ID: ${{ vars.DEV_PUBLIC_OIDC_CLIENT_ID }}
+      GHCR_USERNAME: ${{ vars.GHCR_USERNAME }}
+      GHCR_TOKEN: ${{ secrets.GHCR_TOKEN }}
+      POSTHOG_API_KEY: ${{ secrets.DEV_POSTHOG_API_KEY }}
+      POSTHOG_HOST: ${{ vars.DEV_POSTHOG_HOST }}
+      CLICKHOUSE_WRITER_ENDPOINT: ${{ vars.DEV_CLICKHOUSE_WRITER_ENDPOINT }}
+      CLICKHOUSE_WRITER_DATABASE: ${{ vars.DEV_CLICKHOUSE_WRITER_DATABASE }}
+      CLICKHOUSE_WRITER_USERNAME: ${{ vars.DEV_CLICKHOUSE_WRITER_USERNAME }}
+      CLICKHOUSE_WRITER_PASSWORD: ${{ secrets.DEV_CLICKHOUSE_WRITER_PASSWORD }}
+      CLICKHOUSE_CREATE_SCHEMA: ${{ vars.DEV_CLICKHOUSE_CREATE_SCHEMA }}
+      CLICKHOUSE_COMPRESS: ${{ vars.DEV_CLICKHOUSE_COMPRESS }}
+      GH_TOKEN: ${{ github.token }}
+
+    steps:
+      - name: Validate dispatch inputs
+        run: |
+          set -euo pipefail
+          test "${{ inputs.confirm }}" = "dev"
+          if [ "${{ inputs.deploy_mode }}" = "diff-only" ] && [ "${{ inputs.runner_mode }}" != "skip" ]; then
+            echo "runner rollout requires deploy_mode=deploy" >&2
+            exit 1
+          fi
+          if [ "${{ inputs.runner_mode }}" = "existing-release" ] && [ -z "${{ inputs.runner_ref }}" ]; then
+            echo "runner_ref is required for existing-release" >&2
+            exit 1
+          fi
+
+      - name: Checkout
+        uses: actions/checkout@v4
+        with:
+          submodules: recursive
+
+      - name: Set up Node.js
+        uses: actions/setup-node@v4
+        with:
+          node-version: ${{ needs.config.outputs.node-build-version }}
+
+      - name: Set up Go for temporary runner builds
+        if: inputs.runner_mode == 'temporary-build'
+        uses: actions/setup-go@v5
+        with:
+          go-version: ${{ needs.config.outputs.go-version }}
+
+      - name: Set up Rust for temporary runner builds
+        if: inputs.runner_mode == 'temporary-build'
+        uses: actions-rust-lang/setup-rust-toolchain@v1
+        with:
+          toolchain: ${{ needs.config.outputs.rust-toolchain }}
+
+      - name: Install temporary runner build dependencies
+        if: inputs.runner_mode == 'temporary-build'
+        run: sudo apt-get update && sudo apt-get install -y libx11-dev libxtst-dev libxinerama-dev
+
+      - name: Configure AWS credentials
+        uses: aws-actions/configure-aws-credentials@v4
+        with:
+          role-to-assume: ${{ vars.BOXLITE_DEV_DEPLOY_ROLE_ARN }}
+          aws-region: ${{ env.AWS_REGION || 'ap-southeast-1' }}
+          role-session-name: deploy-dev-${{ github.run_id }}
+
+      - name: Install infra dependencies
+        run: npm ci
+        working-directory: apps/infra
+
+      - name: Deploy dev
+        env:
+          DEPLOY_MODE: ${{ inputs.deploy_mode }}
+          RUNNER_MODE: ${{ inputs.runner_mode }}
+          RUNNER_REF: ${{ inputs.runner_ref }}
+          CONFIRM: ${{ inputs.confirm }}
+        run: |
+          set -euo pipefail
+          scripts/deploy/dev-full.sh \
+            --deploy-mode "$DEPLOY_MODE" \
+            --runner-mode "$RUNNER_MODE" \
+            --runner-ref "$RUNNER_REF" \
+            --stage dev \
+            --confirm "$CONFIRM"
+
+      - name: Write summary
+        if: always()
+        run: |
+          {
+            echo "### Deploy Dev"
+            echo
+            echo "- Branch: \`${GITHUB_REF_NAME}\`"
+            echo "- Commit: \`${GITHUB_SHA}\`"
+            echo "- Deploy mode: \`${{ inputs.deploy_mode }}\`"
+            echo "- Runner mode: \`${{ inputs.runner_mode }}\`"
+            echo "- Runner ref: \`${{ inputs.runner_ref }}\`"
+            echo "- Stage: \`dev\`"
+            echo "- Stack domain: \`${STACK_DOMAIN:-unset}\`"
+          } >> "$GITHUB_STEP_SUMMARY"
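
The three checks in the `Validate dispatch inputs` step can be sketched as a standalone shell function. This is a minimal sketch rather than the workflow's actual code: the function name `validate_dispatch_inputs` is hypothetical, and it assumes the inputs arrive as the environment variables `CONFIRM`, `DEPLOY_MODE`, `RUNNER_MODE`, and `RUNNER_REF` (the names the `Deploy dev` step already uses) instead of being interpolated into the script text.

```bash
# Hypothetical helper mirroring the checks in "Validate dispatch inputs".
# Assumption: inputs are provided via environment variables, not ${{ }} interpolation.
validate_dispatch_inputs() {
  local confirm="${CONFIRM:-}"
  local deploy_mode="${DEPLOY_MODE:-}"
  local runner_mode="${RUNNER_MODE:-}"
  local runner_ref="${RUNNER_REF:-}"

  # The confirmation string must be exactly "dev".
  if [ "$confirm" != "dev" ]; then
    echo "confirm must be exactly 'dev'" >&2
    return 1
  fi
  # A diff-only run must not request any runner rollout.
  if [ "$deploy_mode" = "diff-only" ] && [ "$runner_mode" != "skip" ]; then
    echo "runner rollout requires deploy_mode=deploy" >&2
    return 1
  fi
  # existing-release needs a release version to install.
  if [ "$runner_mode" = "existing-release" ] && [ -z "$runner_ref" ]; then
    echo "runner_ref is required for existing-release" >&2
    return 1
  fi
  return 0
}
```

Reading the values from the environment keeps user-supplied text out of the script source, so a value containing quotes or shell metacharacters is compared as plain data.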

apps/infra/README.md

Lines changed: 5 additions & 0 deletions
@@ -208,6 +208,11 @@ npx sst shell --stage dev # open shell with SST-linked env vars
 npx sst remove --stage dev # destroy everything
 ```
 
+For the standard dev release path, prefer the GitHub Actions workflow
+`Deploy Dev`. It wraps SST deploy plus optional runner rollout modes:
+`skip`, `existing-release`, `temporary-build`, and `rollback`. See
+[`docs/deploy/dev-deploy.md`](../../docs/deploy/dev-deploy.md).
+
 ## Runner lifecycle
 
 The Runner EC2 instance (`tag:Name=boxlite-runner`) holds load-bearing state:

apps/infra/sst.config.ts

Lines changed: 21 additions & 2 deletions
@@ -156,6 +156,24 @@ export default $config({
     const db = new sst.aws.Postgres('Database', { vpc, instance: 't4g.micro', storage: '20 GB' })
     const redis = new sst.aws.Redis('Cache', { vpc, cluster: false }) // NestJS uses SELECT (multi-DB)
     const storage = new sst.aws.Bucket('Storage')
+    const deployArtifactsBucketName = $interpolate`${$app.name}-${$app.stage}-deploy-artifacts-${aws.getCallerIdentityOutput().accountId}-${REGION}`
+    const deployArtifacts = new aws.s3.Bucket('DeployArtifacts', {
+      bucket: deployArtifactsBucketName,
+      forceDestroy: input?.stage !== 'production',
+      tags: { App: $app.name, Stage: $app.stage, Role: 'deploy-artifacts' },
+    })
+    new aws.s3.BucketLifecycleConfigurationV2('DeployArtifactsLifecycle', {
+      bucket: deployArtifacts.id,
+      rules: [
+        {
+          id: 'expire-runner-temp-artifacts',
+          status: 'Enabled',
+          filter: { prefix: 'runner-temp/' },
+          expiration: { days: 30 },
+          abortIncompleteMultipartUpload: { daysAfterInitiation: 1 },
+        },
+      ],
+    })
     const cluster = new sst.aws.Cluster('Cluster', { vpc, forceUpgrade: 'v2' })
 
     // ─── 3. IAM ──────────────────────────────────────────────────────────────
@@ -735,6 +753,7 @@ export default $config({
     ]).apply(([apiUrl, token, otelEndpoint, ghcrSecretArn]) =>
       buildRunnerUserData({ apiUrl, token, otelEndpoint, ghcrSecretArn: ghcrSecretArn || undefined, ghcrUsername }),
     )
+    const runnerTags = (name: string) => ({ App: $app.name, Stage: $app.stage, Role: 'runner', Name: name })
 
     // Runner holds load-bearing box state (/var/lib/boxlite + in-memory
     // libkrun VMs). Two Pulumi resource options keep it persistent across
@@ -761,7 +780,7 @@ export default $config({
         associatePublicIpAddress: true,
         userDataBase64: runnerUserData,
         rootBlockDevice: { volumeSize: RUNNER.rootDiskGB },
-        tags: { Name: 'boxlite-runner' },
+        tags: runnerTags('boxlite-runner'),
       },
       {
         ignoreChanges: ['ami', 'userDataBase64'],
@@ -801,7 +820,7 @@ export default $config({
             buildRunnerUserData({ apiUrl, token, otelEndpoint, ghcrSecretArn: ghcrSecretArn || undefined, ghcrUsername }),
           ),
           rootBlockDevice: { volumeSize: RUNNER.rootDiskGB },
-          tags: { Name: `boxlite-runner-${name}` },
+          tags: runnerTags(`boxlite-runner-${name}`),
         },
         {
           ignoreChanges: ['ami', 'userDataBase64'],
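
The `runnerTags` helper adds `App`, `Stage`, and `Role` tags to every runner instance, which makes tag-based discovery possible. The following is a minimal sketch of how such a lookup could be expressed with the AWS CLI. The function name `runner_tag_filters` is hypothetical, the restriction to running instances is an assumption, and the discovery logic actually used by `scripts/deploy/dev-full.sh` is not part of this diff.

```bash
# Hypothetical helper: print EC2 filter arguments, one per line,
# matching the tags that runnerTags() sets on runner instances.
runner_tag_filters() {
  local app="$1"
  local stage="$2"
  printf '%s\n' \
    "Name=tag:App,Values=${app}" \
    "Name=tag:Stage,Values=${stage}" \
    "Name=tag:Role,Values=runner" \
    "Name=instance-state-name,Values=running"  # assumption: only running instances matter
}

# Example usage (needs AWS credentials, so it is left commented out):
# aws ec2 describe-instances \
#   --filters $(runner_tag_filters boxlite dev) \
#   --query 'Reservations[].Instances[].InstanceId' \
#   --output text
```

Filtering on all three tags rather than on `Name` alone means the same lookup covers both the primary runner and any additional `boxlite-runner-<name>` instances.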

docs/deploy/dev-deploy.md

Lines changed: 133 additions & 0 deletions
@@ -0,0 +1,133 @@
+# Dev Deploy Workflow
+
+`Deploy Dev` is the operator entrypoint for deploying the SST dev stage and,
+optionally, rolling out a runner binary to the dev runner EC2 instances.
+
+## User Flow
+
+Open GitHub Actions, select `Deploy Dev`, then click `Run workflow`.
+
+The branch selector is the source checkout. Use `main` for normal dev deploys
+or a PR branch for dev validation.
+
+Inputs:
+
+| Input         | Values                                                    | Meaning                                     |
+| ------------- | --------------------------------------------------------- | ------------------------------------------- |
+| `deploy_mode` | `diff-only`, `deploy`                                     | Preview SST changes or actually deploy dev. |
+| `runner_mode` | `skip`, `existing-release`, `temporary-build`, `rollback` | Runner rollout strategy.                    |
+| `runner_ref`  | `0.9.5`, empty                                            | Required only for `existing-release`.       |
+| `confirm`     | `dev`                                                     | Required safety confirmation.               |
+
+Common runs:
+
+| Goal                                                   | Inputs                                                                       |
+| ------------------------------------------------------ | ---------------------------------------------------------------------------- |
+| Preview dev infra/service changes                      | `deploy_mode=diff-only`, `runner_mode=skip`                                  |
+| Deploy SST dev only                                    | `deploy_mode=deploy`, `runner_mode=skip`                                     |
+| Deploy dev and install a released runner               | `deploy_mode=deploy`, `runner_mode=existing-release`, `runner_ref=<version>` |
+| Deploy dev and test the selected branch's runner       | `deploy_mode=deploy`, `runner_mode=temporary-build`                          |
+| Roll back dev runner to the previous recorded artifact | `deploy_mode=deploy`, `runner_mode=rollback`                                 |
+
+`diff-only` never rolls out a runner.
+
+## Runner Artifact Model
+
+There are three artifact classes:
+
+| Class               | Storage                        | Lifetime                               | Intended use                                             |
+| ------------------- | ------------------------------ | -------------------------------------- | -------------------------------------------------------- |
+| Official release    | GitHub Release asset           | Long-lived                             | Stage/prod and durable rollbacks.                        |
+| Temporary dev build | Dev deploy-artifacts S3 bucket | Short-lived, default lifecycle 30 days | Dev-only branch validation without publishing a release. |
+| Rollout manifest    | Dev deploy-artifacts S3 bucket | Long-lived                             | Current/previous runner deployment record.               |
+
+Temporary builds are versioned as:
+
+```text
+<Cargo.toml version>-dev.<commit-sha>
+```
+
+Dirty local worktree builds get a `-dirty` suffix. The GitHub Action normally
+checks out a clean commit, so dev deploys from Actions should not be dirty.
+
+## What The Workflow Does
+
+1. Validates the dispatch inputs, including `confirm=dev`.
+2. Checks out the selected branch.
+3. Assumes the dev AWS deploy role.
+4. Runs `npx sst diff --stage dev`.
+5. Runs `npx sst deploy --stage dev` when `deploy_mode=deploy`.
+6. Applies the selected runner mode:
+   - `skip`: no runner changes.
+   - `existing-release`: checks the GitHub Release asset and installs it.
+   - `temporary-build`: builds C SDK + daemon + computer-use + runner from the
+     selected commit, uploads the tarball to S3, presigns it, and installs it.
+   - `rollback`: reads the previous runner manifest and reinstalls that source.
+7. Verifies public API health.
+8. If `DEV_ADMIN_API_KEY` is configured, verifies the Admin runner overview.
+
+Runner rollout uses AWS SSM Run Command. It does not SSH into instances and does
+not replace EC2 instances. The runner service is stopped, the binary is replaced,
+and the service is started again. `/var/lib/boxlite` is untouched.
+
+## Required GitHub Configuration
+
+Create a GitHub Environment named `dev`.
+
+Required variables:
+
+| Name                            | Purpose                                   |
+| ------------------------------- | ----------------------------------------- |
+| `BOXLITE_DEV_DEPLOY_ROLE_ARN`   | AWS IAM role assumed by the workflow.     |
+| `DEV_STACK_DOMAIN`              | Dev domain, for example `dev.boxlite.ai`. |
+| `CLOUDFLARE_DEFAULT_ACCOUNT_ID` | Cloudflare account for SST DNS.           |
+| `DEV_OIDC_ISSUER_BASE_URL`      | OIDC issuer URL required by the API.      |
+
+Required secrets:
+
+| Name                   | Purpose                                 |
+| ---------------------- | --------------------------------------- |
+| `CLOUDFLARE_API_TOKEN` | Lets SST manage Cloudflare DNS records. |
+
+Optional variables/secrets:
+
+| Name                                | Purpose                                              |
+| ----------------------------------- | ---------------------------------------------------- |
+| `DEV_RUNNER_ARTIFACT_BUCKET`        | Override the default deploy-artifacts bucket name.   |
+| `DEV_RUNNER_MANIFEST_PREFIX`        | Override the manifest prefix, default `deployments`. |
+| `DEV_ADMIN_API_KEY`                 | Enables Admin runner overview verification.          |
+| `GHCR_USERNAME`, `GHCR_TOKEN`       | Runner image pull credentials when needed.           |
+| `DEV_POSTHOG_*`, `DEV_CLICKHOUSE_*` | Optional runtime integrations.                       |
+
+The default artifact bucket name is:
+
+```text
+boxlite-dev-deploy-artifacts-<aws-account-id>-ap-southeast-1
+```
+
+SST creates this bucket as part of the dev stack. Temporary runner artifacts are
+stored under `runner-temp/`; rollout manifests are stored under
+`deployments/dev/runner/`.
+
+## Local Fallback
+
+The same flow can run from a Linux deploy host:
+
+```bash
+scripts/deploy/dev-full.sh \
+  --deploy-mode deploy \
+  --runner-mode skip \
+  --stage dev \
+  --confirm dev
+```
+
+Use GitHub Actions for `temporary-build`. Local temporary builds require Linux
+amd64 because the runner uses CGO.
+
+## Safety Boundaries
+
+- This workflow only supports `stage=dev`.
+- Runner rollout requires `deploy_mode=deploy`.
+- Runner EC2 discovery requires SST-managed tags:
+  `App=boxlite`, `Stage=dev`, `Role=runner`.
+- Production deploys need a separate workflow with approval and canary rules.
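
The temporary-build version format described under "Runner Artifact Model" can be sketched as a small shell function. This is an illustration under stated assumptions: the function name `temp_build_version` and its argument order are hypothetical, the caller is assumed to supply the `Cargo.toml` version and commit SHA, and the way `scripts/deploy/dev-full.sh` actually derives the string is not shown in this commit.

```bash
# Hypothetical helper: compose "<Cargo.toml version>-dev.<commit-sha>",
# appending "-dirty" when the worktree has uncommitted changes.
#   $1: version from Cargo.toml
#   $2: commit SHA (short or full; the doc does not specify which)
#   $3: "true" if the worktree is dirty (defaults to "false")
temp_build_version() {
  local cargo_version="$1"
  local commit_sha="$2"
  local dirty="${3:-false}"
  local version="${cargo_version}-dev.${commit_sha}"
  if [ "$dirty" = "true" ]; then
    version="${version}-dirty"
  fi
  printf '%s\n' "$version"
}
```

For example, `temp_build_version 0.9.5 3dc854b` prints `0.9.5-dev.3dc854b`, and passing `true` as the third argument prints `0.9.5-dev.3dc854b-dirty`.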
