What: Cross-repo orchestration for the PATCH VMP (Vulnerability
Mitigation Platform) demo. TapirXL, BlueFlow, and Viper are all TA1 components
of the same VMP; this repo is the glue that runs them together. It owns the
compose.yaml, the synthetic PCAP, the replay image, and the demo runbook.
It does not own application code — each TA1 service ships its own
demo-<semver> image that this repo pins exactly.
Source of truth (external):
- TapirXL/docs/ARCHITECTURE.md: log shipper / VRL / wire contract (§12)
- TapirXL/.cursor/context/demo_playbook.md: the canonical compose shape (§4) and runbook (§5–§9)
- TapirXL/.cursor/context/demo_critical_path.md: phased delivery items (A/B/C series)
- TapirXL/.cursor/context/state_of_the_union.md: decisions D1–D7
This file distills what an agent working in the demo repo must know. When
behaviour disagrees with PLAYBOOK.md (local runbook) or TapirXL's
demo_playbook.md (compose shape), the playbook wins; update this file in the
same PR.
demo/
├── compose.yaml # Phase 1 + Phase 2 (replay behind `live` profile)
├── compose.override.yaml # optional dev tweaks (gitignored or local)
├── .env.example # BLUEFLOW_API_TOKEN, image tag versions
├── pcap/
│ └── synthetic_philips_demo.pcap # canonical demo capture (committed)
├── replay/
│ ├── Dockerfile # alpine + tcpreplay
│ └── entrypoint.sh # tcpreplay --intf1=eth0 ...
├── init/ # host + container scripts (see init/README.md)
├── justfile # thin recipes → init/*.sh on host
├── PLAYBOOK.md # operator-facing runbook
└── README.md # quick start (points to PLAYBOOK.md)
The compose file is the canonical reference; anything else is convenience.
# First-time setup
cp .env.example .env # fill image tags + BLUEFLOW_API_TOKEN
docker compose pull # pull all pinned demo-<tag> images
# Phase 1 (engineering smoke; PCAP one-shot)
just parse # optional: PCAP → JSON only (no upload)
just boot # start stack + seed BlueFlow
just capture # one-shot PCAP ingest
# Phase 2 pre-flight (VIPER_API_KEY — manual paste, not a just recipe)
docker compose exec viper npm run db:create-test-api-key
export VIPER_API_KEY=<key printed above>
just integrate # registers BlueFlow ↔ Viper webhook (§5.3)
just demo # live replay + tapirxl listener
# Teardown
just fresh # docker compose --profile live down --volumes
Required tools:
- docker ≥ 24 with Compose v2 plugin
- just, curl, jq (for §5.3 integration registration and healthchecks)
TapirXL and BlueFlow are pulled as published images from virtalabsinc/* on Docker Hub. Pin exact tags in .env; never use latest. Tag scheme is demo-<semver> (e.g. demo-0.3.4 for BlueFlow, demo-0.3.1 for TapirXL). Viper, viper-psql, and inngest are currently built from local source (viper repo); replay is built from this repo.
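The pinning rule above can be checked mechanically before any pull. The following is a minimal sketch, not part of the repo: the function name is_pinned_demo_tag is hypothetical, and the pattern assumes demo tags are plain MAJOR.MINOR.PATCH versions with no pre-release suffix.

```shell
# Hypothetical helper: succeeds only for tags shaped like demo-<MAJOR>.<MINOR>.<PATCH>.
# Assumption: demo tags never carry pre-release or build suffixes.
is_pinned_demo_tag() {
  printf '%s\n' "$1" | grep -Eq '^demo-[0-9]+\.[0-9]+\.[0-9]+$'
}
```

A recipe could call it after loading .env, for example `is_pinned_demo_tag "$BLUEFLOW_TAG" || exit 1`, so that latest or an empty value fails fast.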
| Service | Image / Source | Role | Pinned where |
|---|---|---|---|
| blueflow-psql | postgres:16-alpine | BlueFlow Postgres | upstream tag |
| blueflow-redis | redis:7-alpine | BlueFlow Celery broker + result backend | upstream tag |
| blueflow | virtalabsinc/blueflow:demo-<semver> | Django REST API; mounts /api/assets/upsert/. Auto-creates admin/admin. | .env: BLUEFLOW_TAG |
| blueflow-worker | virtalabsinc/blueflow:demo-<semver> | Celery worker (not started by just boot/just demo; CELERY_TASK_ALWAYS_EAGER=True in dev runs the task inside blueflow) | same as blueflow |
| viper-psql | built from source (viper repo docker/db/) | Viper Postgres | local build |
| viper | built from source (viper repo docker/viper/) | Viper UI (Better-Auth, Next.js) | local build |
| inngest | built from source (viper repo docker/inngest/) | Inngest dev server; drives Viper sync cron and triggered syncs | local build |
| tapirxl | virtalabsinc/tapirxl:demo-<semver> | Parser + Vector shipper. cap_add: [NET_ADMIN] | .env: TAPIRXL_TAG |
| replay | built here (replay/Dockerfile) | tcpreplay sidecar; shares tapirxl's netns. profiles: ["live"] | local build |
Image origins:
- TapirXL: built and pushed by TapirXL/.github/workflows/release.yml on annotated tag demo-v<semver>.
- BlueFlow: other TA1 team; coordinate version bumps in advance.
- Viper / viper-psql / inngest: built from local viper repo source. Coordinate path and tag when a published image ships.
- replay: this repo, replay/Dockerfile. Ship as virtalabsinc/replay:demo-<semver> once stable.
The static demo token is the linchpin between TapirXL and BlueFlow. Both sides must read the same value at startup.
.env                                       .env
BLUEFLOW_API_TOKEN=demo-XXX                BLUEFLOW_API_TOKEN=demo-XXX
        │                                          │
        ▼                                          ▼
blueflow service                           tapirxl service
ENV: API_TOKEN=$BLUEFLOW_API_TOKEN         ENV: BLUEFLOW_TOKEN=$BLUEFLOW_API_TOKEN
        │                                          │
        ▼                                          ▼
bootstrap script:                          Vector http sink:
  if [ -n "$BLUEFLOW_API_TOKEN" ]; then      Authorization: Token $BLUEFLOW_TOKEN
    use it                                   (configs/upload-vector.toml inside image)
  else
    generate via Token.objects.get_or_create
Compose snippet:
blueflow:
  environment:
    API_TOKEN: ${BLUEFLOW_API_TOKEN} # passed to bootstrap.sh
tapirxl:
  environment:
    BLUEFLOW_TOKEN: ${BLUEFLOW_API_TOKEN} # consumed by Vector

N3 below is the binding rule: never split these into two values.
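Because both services interpolate the same variable, a pre-flight check on .env can catch the most common breakage (an empty or duplicated definition). This is only a sketch under assumptions: check_single_token is a hypothetical helper, it expects plain KEY=value lines, and it cannot detect a second token introduced under a different variable name.

```shell
# Hypothetical pre-flight check: the env file must define BLUEFLOW_API_TOKEN
# exactly once, with a non-empty value, so both services see the same string.
check_single_token() {
  env_file="$1"
  # grep -c prints 0 and exits non-zero when nothing matches; keep the count either way.
  count=$(grep -c '^BLUEFLOW_API_TOKEN=' "$env_file" 2>/dev/null || true)
  value=$(grep '^BLUEFLOW_API_TOKEN=' "$env_file" 2>/dev/null | head -n 1 | cut -d= -f2-)
  [ "${count:-0}" -eq 1 ] && [ -n "$value" ]
}
```

Running `check_single_token .env` before `docker compose up` would be one place to use it.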
Single bridge network clinical_demo. Service names resolve via Docker DNS;
no static IPs unless the audience demands them (the reference compose at
TapirXL/.cursor/context/docker-compose.yaml shows the static-IP variant).
┌────────────────── clinical_demo (bridge) ────────────────────┐
│                                                              │
│  replay ─netns──► tapirxl ──http──► blueflow ──celery─►      │
│  (live profile)   (eth0)            :8000     blueflow-      │
│                                        │      worker         │
│                                        ▼         │           │
│                                  blueflow-psql   │           │
│                                  blueflow-redis  │           │
│                                                  ▼           │
│                                   viper :3000 ◄──┘           │
│                                        │                     │
│                                        ▼                     │
│                                   viper-psql                 │
└──────────────────────────────────────────────────────────────┘
host:8000 → blueflow API
host:3000 → viper UI
- replay uses network_mode: "service:tapirxl"; packets land on tapirxl's eth0.
- tapirxl is the only service with cap_add: [NET_ADMIN]. Required for raw-socket capture (Phase 2).
- Host-published ports only on blueflow:8000 and viper:3000.
| Phase | Audience | Mode | Command |
|---|---|---|---|
| 1 | Engineering / QA | TAPIRXL_MODE=pcap (one-shot mounted PCAP) | just capture |
| 2 | Audience | TAPIRXL_MODE=live + replay profile (live netns capture) | just demo |
Phase 1 must pass cleanly before Phase 2 work begins. Phase 1 does not need
the replay service, the §5.3 BlueFlow ↔ Viper webhook, or any Viper UI walk-through.
Phase 2 narrative: PLAYBOOK.md Phase 2. Pre-flight is just integrate
(§5.3 Steps A–C); do not start Phase 2 until those run cleanly on the current
volume set.
| # | Rule |
|---|---|
| N1 | Pin exact demo-<semver> tags in compose.yaml. Never latest. Demo image versions are listed in .env; CI / just targets must read them from there, not hardcode. |
| N2 | TapirXL and BlueFlow are consumed as published images. Do not build TapirXL inside this repo. The only image built from this repo is replay; Viper, viper-psql, and inngest are built from the local viper repo until published images ship. |
| N3 | BLUEFLOW_API_TOKEN from .env is the authentication seam. It populates blueflow.environment.API_TOKEN and tapirxl.environment.BLUEFLOW_TOKEN from the same value. Do not introduce a second token, do not generate one at runtime, do not commit it as a literal. |
| N4 | cap_add: [NET_ADMIN] lives on tapirxl only. Do not add it to replay; the shared netns inherits it. |
| N5 | replay is gated by profiles: ["live"] so docker compose up (default) brings up the Phase 1 stack only. Do not change this without updating just capture. |
| N6 | The PCAP under pcap/synthetic_philips_demo.pcap is the canonical fixture. Replacing it requires regenerating golden_synthetic_philips_assets.jsonl in TapirXL — coordinate the bump there first. |
| N7 | The compose file does not mutate TapirXL behaviour by remounting Vector configs. The tapirxl:demo-<tag> image bakes upload-vector.toml (long-running) and upload-vector.stdin.toml (one-shot, was upload-vector.pcap.toml in demo-0.3.0); rely on TAPIRXL_MODE to select the right one. |
| N8 | tapirxl mounts the PCAP read-only (:ro). The container runs as uid 10001 — never chmod/chown the host directory to fix mount perms; either rely on world-readable bits or fix the file owner. |
| N9 | Healthchecks are required on blueflow-psql, blueflow-redis, blueflow, viper-psql, viper. depends_on: { ...: { condition: service_healthy } } is the merge gate, not optimistic ordering. |
| N10 | Volumes for state are named (tapirxl-spool, blueflow-pgdata, viper-pgdata). Anonymous volumes are forbidden — docker compose down --volumes must reliably reset the demo. |
| N11 | Phase 2 audience demo requires §5.3 (BlueFlow ↔ Viper webhook registration) to run before ingest. Encode this as just integrate; never expect the presenter to run curl by hand. |
| N12 | latest-tagged images may exist on Docker Hub for convenience, but the demo compose.yaml MUST use the immutable demo-<semver> tag. CI guard: grep -E ':latest' compose.yaml must return empty. |
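The N12 guard can be wrapped so CI gets a clear message and a non-zero exit. A minimal sketch follows; the function name guard_no_latest and its exit codes are illustrative choices, while the grep pattern is the one stated in N12.

```shell
# Sketch of the N12 guard: fail if any line of the compose file mentions :latest.
# Assumption: the compose file path is passed as $1 (defaults to compose.yaml).
guard_no_latest() {
  compose_file="${1:-compose.yaml}"
  if [ ! -r "$compose_file" ]; then
    echo "cannot read $compose_file" >&2
    return 2
  fi
  # grep -n prints offending lines with line numbers for the CI log.
  if grep -nE ':latest' "$compose_file"; then
    echo "N12 violation: pin demo-<semver> tags instead of latest" >&2
    return 1
  fi
}
```

Checking readability first matters: without it, a missing file would make grep fail in a way the `if` treats as "no match", and the guard would pass silently.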
These are the binding promises external images make to the demo. Failures here are upstream bugs, not compose tweaks.
- Single ENTRYPOINT switching on $TAPIRXL_MODE:
  - pcap (default) → tapirxl parse $TAPIRXL_PCAP_PATH --json | vector --config-toml /etc/vector/upload-vector.stdin.toml, then exits.
  - live → long-running raw-socket capture on $TAPIRXL_INTERFACE. Verified working in demo-0.3.1.
- Bakes Vector configs in /etc/vector/:
  - upload-vector.toml (compose long-running, file source)
  - upload-vector.stdin.toml (one-shot, stdin source; was upload-vector.pcap.toml in demo-0.3.0)
  - upload-vector.vrl (shared translation; $UPLOAD_VECTOR_VRL_PATH already set)
- Reads from env: BLUEFLOW_URL, BLUEFLOW_TOKEN, TAPIRXL_PCAP_PATH (pcap mode), TAPIRXL_INTERFACE (live mode).
- Non-root user uid 10001.
Authoritative version of this contract: TapirXL/packaging/docker/README.md "Unified demo image" + demo_playbook.md §4.
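To make the mode switch concrete, here is an illustrative dispatcher that only prints what each mode implies. It is not the image's real ENTRYPOINT; describe_mode and the exit code 64 are hypothetical, and the printed strings are taken from the contract bullets in this section.

```shell
# Illustrative only: the real ENTRYPOINT lives inside the tapirxl image.
# describe_mode prints the behaviour each TAPIRXL_MODE implies instead of running it.
describe_mode() {
  case "${1:-pcap}" in   # pcap is the documented default
    pcap)
      echo "tapirxl parse \$TAPIRXL_PCAP_PATH --json | vector --config-toml /etc/vector/upload-vector.stdin.toml"
      ;;
    live)
      echo "long-running raw-socket capture on \$TAPIRXL_INTERFACE"
      ;;
    *)
      echo "unknown TAPIRXL_MODE: $1" >&2
      return 64
      ;;
  esac
}
```

Rejecting unknown values instead of falling back to a default is a design choice of this sketch; whether the real image does the same is not stated in the contract.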
- Auto-creates admin/admin on first boot (DEFAULT_USERNAME + DEFAULT_PASSWORD).
- Runs migrations when RUN_MIGRATIONS=1.
- Honors API_TOKEN env var: if set, the bootstrap installs that exact value as the admin user's DRF token (Token.objects.get_or_create); otherwise generates one and prints to stdout. The demo requires the env-var path so the token is deterministic across container restarts (N3).
- Bundles the redis Python package (since demo-0.3.1; the uv pip install redis shim is removed).
- Exposes PUT /api/assets/upsert/ accepting the AssetUpsertPayload shape from TapirXL/configs/upload-vector.vrl. Must accept Authorization: Token <hex> (DRF, not Bearer) and Content-Type: application/json.
- viper_webhook Celery task (CELERY_TASK_ALWAYS_EAGER=True in dev): serializes since to ISO-8601 for requests.post(json=...) (B4 fixed). Outgoing integrationUpload pages use items[] with camelCase keys (ip, upstreamApi, vendorId, status: Active, …) matching Viper's contract (B5 fixed). Do not bind-mount init/blueflow-patches/tasks.py.
- Healthcheck on /api/.
Older tags (demo-0.3.0–demo-0.3.1) required a tasks.py bind-mount for Phase 2; see .claude/BLUEFLOW_BUGS.md.
- Better-Auth + Next.js UI on port 3000.
- tRPC endpoint /api/trpc/integrations.create for §5.3 Step A.
- REST endpoint /api/v1/assets/integrationUpload for BlueFlow's webhook callback (§5.3 Step B).
- Healthcheck on /api/.
- ~20 MB image: alpine + tcpreplay.
- Runs tcpreplay --intf1=eth0 --loop=$REPLAY_LOOP --multiplier=$REPLAY_RATE $REPLAY_PCAP.
- Because REPLAY_LOOP is passed straight to --loop, tcpreplay's own semantics apply: REPLAY_LOOP=1 is one pass (exits 0), and REPLAY_LOOP=0 loops forever.
- Reads REPLAY_PCAP, REPLAY_RATE, REPLAY_LOOP from env.
- No NET_ADMIN of its own (inherits from shared netns).
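The replay contract above can be sketched as a function that assembles the command line from the environment. This is a guess at the shape of replay/entrypoint.sh, not its actual contents: replay_command is a hypothetical name, the default multiplier of 1 is an assumption, and the function prints the command rather than executing it.

```shell
# Sketch of how the replay entrypoint might assemble its command line.
# Assumption (not from the source): REPLAY_RATE falls back to 1 when unset.
replay_command() {
  # Fail loudly if the required variables are missing.
  : "${REPLAY_PCAP:?REPLAY_PCAP must be set}"
  : "${REPLAY_LOOP:?REPLAY_LOOP must be set}"
  echo "tcpreplay --intf1=eth0 --loop=$REPLAY_LOOP --multiplier=${REPLAY_RATE:-1} $REPLAY_PCAP"
}
```

Printing the command first makes it easy to review in logs; a real entrypoint would exec it so tcpreplay receives signals directly.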
docker compose pull
docker compose up -d \
blueflow-psql blueflow-redis blueflow \
viper-psql viper
# Wait for healthchecks.
docker compose ps # all should show (healthy)
# Sanity-check BlueFlow + auth.
curl -sS -H "Authorization: Token $BLUEFLOW_API_TOKEN" \
http://127.0.0.1:8000/api/assets/ | jq .count # expect 0
# One-shot ingest (TAPIRXL_MODE=pcap is the default in the image).
docker compose run --rm tapirxl
# Verify upsert.
curl -sS -H "Authorization: Token $BLUEFLOW_API_TOKEN" \
http://127.0.0.1:8000/api/assets/ | jq '.count, .results[].display_name'
Pass criteria (matches TapirXL CI smoke):
- 8 × 201 Created on first run, 8 × 200 OK on re-run (idempotent).
- All MAC addresses present in GET /api/assets/, with manufacturer, model, category, open_ports_tcp, external_keys.tapirxl_confidence populated.
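The first pass criterion can be expressed as a small check over a list of response codes. How those codes are collected is left open here (the log format is not specified in this file); all_status is a hypothetical helper that assumes one HTTP status code per line on stdin.

```shell
# Hypothetical checker: reads one HTTP status code per line on stdin and succeeds
# only if there are exactly $2 lines and every one of them equals $1.
all_status() {
  want="$1"
  n="$2"
  awk -v want="$want" -v n="$n" '
    $1 == want { ok++ }
    END { exit !(NR == n && ok == n) }
  '
}
```

On a first run the expectation would be `all_status 201 8`; on a re-run against the same volumes, `all_status 200 8`.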
Pre-flight (§5.3) is mandatory before ingest. After Phase 2 boot:
docker compose --profile live up -d tapirxl replay
docker compose logs -f tapirxl blueflow
Audience-visible state changes:
- BlueFlow GET /api/assets/ populates as TapirXL classifies.
- Viper UI at http://127.0.0.1:3000/ mirrors BlueFlow within seconds (Celery webhook).
Manual sync is forbidden during the talk track — if Viper stays empty, the integration registration (§5.3) is the first place to look, not TapirXL.
| Symptom | Likely cause | Fix |
|---|---|---|
| tapirxl exits with 401 | BLUEFLOW_API_TOKEN mismatch between blueflow and tapirxl | Both must reference ${BLUEFLOW_API_TOKEN} from .env (N3) |
| tapirxl exits with 415 Unsupported Media Type | Stale demo image without explicit Content-Type header | docker compose pull tapirxl |
| Vector logs failed to lookup address information: blueflow | Services not on the same network | docker network inspect tapirxl-demo_default; both must be members |
| Phase 1 container hangs (180s+) | Compose accidentally mounting upload-vector.toml (file source) over the stdin config | N7: never override the image's baked configs in pcap mode |
| just capture exits 78 with no output | init/tapirxl-pretty-ingest.sh is referencing an old config name. In demo-0.3.1 the one-shot config is upload-vector.stdin.toml (renamed from upload-vector.pcap.toml). | Ensure the script references /etc/vector/upload-vector.stdin.toml. |
| BlueFlow has assets, Viper stays empty | §5.3 webhook not registered, or sync never triggered | Re-run just integrate; check just logs |
| TypeError: datetime is not JSON serializable in viper_webhook | BLUEFLOW_TAG < demo-0.3.4 or tasks.py bind-mount removed but image not pulled | BLUEFLOW_TAG=demo-0.3.4, remove tasks.py volume, docker compose pull blueflow && just fresh |
| BlueFlow 8 assets; fewer in Viper with upstreamApi → BlueFlow | Webhook 202 + task succeeded; Viper item handling | Re-run just integrate; compare MACs. Upstream BlueFlow payload is valid in demo-0.3.4, so the cause is likely Viper-side. |
| viper_webhook ~0.05s, nothing in Viper | B3: last_pinged null on all assets | just integrate (runs backfill) or bash init/backfill-last-pinged.sh |
| Celery 4xx to Viper callback | Stale auth_token / wrong webhook_url | Wipe Postgres volumes, re-register from scratch (just fresh && just integrate) |
| Replay starts before tapirxl is listening | depends_on: service_started is too eager for live mode | Add a healthcheck to tapirxl's live mode; increase replay startup sleep |
| WARN ... file too small to fingerprint | Vector file source race (compose long-running mode only) | Benign; ignore. Pcap mode does not have this. |
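The two HTTP-status rows of the table lend themselves to a tiny lookup that a wrapper script could print when tapirxl exits. This is a sketch only: triage_status is a hypothetical helper, and it covers just the 401 and 415 rows, deferring everything else to the table.

```shell
# Hypothetical triage helper: map an HTTP status reported by tapirxl to the
# first fix listed in the failure table. Only the 401 and 415 rows are encoded.
triage_status() {
  case "$1" in
    401) echo "token mismatch: both services must reference \${BLUEFLOW_API_TOKEN} from .env (N3)" ;;
    415) echo "stale tapirxl image: run docker compose pull tapirxl" ;;
    *)   echo "not covered here: consult the failure table" ;;
  esac
}
```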
- Do NOT build tapirxl, blueflow, or viper from source in this repo. Pull pinned images.
- Do NOT use latest tags in compose.yaml. Demo stability depends on immutable demo-<semver> tags.
- Do NOT split BLUEFLOW_API_TOKEN into two distinct values for blueflow and tapirxl. The token is one fact, sourced from .env.
- Do NOT bind-mount Vector configs from this repo over the baked image configs. The image owns its config selection via TAPIRXL_MODE.
- Do NOT use --network=host in Phase 2. Use the dedicated bridge so replay can share tapirxl's netns.
- Do NOT write integration registration steps (§5.3) into the README only. Encode them as just integrate; presenters do not curl by hand.
- Do NOT add cap_add: [NET_ADMIN] to any service other than tapirxl.
- Do NOT commit the .env file. Commit .env.example with placeholder token values; agents and operators copy it to .env locally.
- Do NOT introduce a Phase 1.5 mode. The boundary between Phase 1 (engineering) and Phase 2 (audience) is the live profile + TAPIRXL_MODE.
- Do NOT modify the canonical PCAP without coordinating a TapirXL golden regenerate (N6).
- Do NOT bake credentials, API tokens, or webhook secrets into committed YAML/scripts. Everything sensitive flows through .env.
- Do NOT bind-mount init/blueflow-patches/tasks.py when using virtalabsinc/blueflow:demo-0.3.4+. The image ships the fixed viper_webhook task.
Track these in your repo's issue tracker; they're the demo's open blockers across TA1 teams.
| ID | Item | Owner | Blocking |
|---|---|---|---|
| C2 | virtalabsinc/blueflow:demo-<tag> honors BLUEFLOW_API_TOKEN env var (env-driven, not runtime-generated) | TA1 BlueFlow | Satisfied in demo-0.3.4 |
| C3 | virtalabsinc/viper:demo-<tag> published with stable tRPC + integrationUpload contract | TA1 Viper | Phase 2 |
| C4 | BlueFlow no-op short-circuit on identical-state writes | TA1 BlueFlow | Phase 2 hygiene (eliminates duplicate historicalasset rows) |
| C7 | Talk-track timing pass with stopwatch | Demo presenter | Phase 2 rehearsal |
TapirXL-side items (A1–A3, B1, B2) live in TapirXL/.cursor/context/demo_critical_path.md and are not this repo's concern.
- New service added to compose.yaml → add row to Service Inventory table.
- New BLUEFLOW_* or VIPER_* env var introduced → update Authentication section.
- Image contract changes (e.g. TapirXL adds a new TAPIRXL_MODE value) → mirror it under Image Contracts.
- New common failure → add a row to Common Failure Modes; never remove rows for failures still possible.
When PLAYBOOK.md (local operator runbook) drifts from this file, update this
file in the same PR.