feat(demo): public demo mode for demo.opensop.ai #29

Open
Chosen9115 wants to merge 10 commits into main from feat/demo-app
@Chosen9115

Owner

Summary

Builds a public OpenSOP demo at demo.opensop.ai, gated by DEMO_MODE=true with zero behavioral change when the flag is unset.

  • Sample processes (6 total) under processes/examples/: existing customer-onboarding, lead-qualification plus four new ones — expense-approval, support-ticket-triage, release-deploy, agent-pr-review. Each header comment is honest about which step types are stubbed in v0.1.
  • Opensop::DemoMode is the single source of truth — enabled?, api_token (prefers OPENSOP_API_TOKEN so one secret powers both Sop::ApplicationController auth and the homepage display), reset-schedule constants.
  • Demo::SeedLoader + Demo::ResetJob keep demo state idempotent and reset daily at 3:00 UTC via Solid Queue's recurring config.
  • DemoReadOnly controller concern blocks Sop::ProcessesController#register in demo mode — instance lifecycle (start/submit/cancel) stays fully interactive.
  • rack-attack rules: 60 req/min/IP on /sop/, 10 starts/hr/IP, 120 req/min/IP on UI, 5-min ban on >1000 req/5min. All no-ops outside DEMO_MODE.
  • Ui::DocsController at /docs(/*path) serves project markdown via Commonmarker with strict path-traversal guards. Distinct from the existing /api-docs (API reference).
  • Ui::Demo::HomeController is the front-door homepage — hero, copy-able token, sample-process grid, CTAs to /docs and /api-docs. Routed conditionally so non-demo deploys still land on the existing dashboard.
  • Sticky amber banner rendered in both application and docs layouts when DEMO_MODE is on. Heroicon + i18n key paths per project conventions.
  • fly.demo.toml is a separate Fly config (app opensop-demo, region ord, 256MB shared-cpu-1x, Solid Queue in Puma) with the existing fly.toml untouched.
  • docs/demo-deploy.md runbook walks the operator through Fly app + Postgres provisioning, secrets, cert, and DNS handoff.
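
For readers skimming the summary, the env-driven flag and token lookup can be pictured roughly as follows. This is a minimal sketch, not the code in this PR: the module name `DemoModeSketch`, the strict `"true"` comparison, and the fallback token value are assumptions made for illustration (the fallback string is borrowed from the curl example in the test plan).

```ruby
# Hypothetical sketch of an env-driven demo flag; not the PR's actual module.
module DemoModeSketch
  # Assumed fallback shown on the homepage when no API token is configured.
  DEFAULT_TOKEN = "demo-public-token-resets-daily"

  module_function

  # On only when DEMO_MODE is exactly "true"; unset or any other value is off.
  def enabled?(env = ENV)
    env["DEMO_MODE"] == "true"
  end

  # Prefer OPENSOP_API_TOKEN so auth and the homepage display share one secret.
  def api_token(env = ENV)
    token = env["OPENSOP_API_TOKEN"].to_s
    token.empty? ? DEFAULT_TOKEN : token
  end
end
```

Passing the environment as an argument keeps the sketch easy to exercise in specs without mutating `ENV`.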

Stats

  • 38 files, +2489 lines
  • 50 new specs covering DemoMode, SeedLoader, ResetJob, DemoReadOnly guard, rate-limiting, docs rendering, banner partial, homepage routing
  • Full RSpec suite: 852 examples, 0 failures
  • No Coba data, no secrets, no Coba file paths in the diff

Multi-agent execution (per the request)

Wave 1: 3 parallel Sonnet agents (sample SOPs, Fly config, DemoMode helper).
Wave 2: 4 parallel agents (Sonnet × 3 for SeedLoader/ResetJob, DocsController + layout, DemoReadOnly guard; Haiku × 1 for rack-attack initializer) then a final Sonnet agent for banner + homepage.
Wave 3: rails-tests specialist in green phase added 23 specs.
Audit: Opus.

Test plan

  • flyctl apps create opensop-demo then follow docs/demo-deploy.md
  • After deploy, hit https://demo.opensop.ai/up (200) and / (homepage)
  • Curl with the public token: curl -H "X-SOP-Token: demo-public-token-resets-daily" https://demo.opensop.ai/sop/
  • Browse a process card → confirm "Try it" flow works
  • Visit /docs/architecture and /docs/process-authoring
  • Run a non-DEMO_MODE deploy of this branch (e.g. main app) — confirm zero behavioral difference vs current main
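
The curl check above can also be scripted. The sketch below builds the same request with Ruby's standard `Net::HTTP`; the helper name `build_sop_request` is hypothetical, and the request is only constructed here, not sent.

```ruby
require "net/http"
require "uri"

# Build (but do not send) the authenticated GET from the test plan.
# `build_sop_request` is an illustrative helper, not part of the PR.
def build_sop_request(base_url, token)
  uri = URI.join(base_url, "/sop/")
  request = Net::HTTP::Get.new(uri)
  request["X-SOP-Token"] = token
  request
end

request = build_sop_request("https://demo.opensop.ai", "demo-public-token-resets-daily")

# To actually send it (requires network access):
#   response = Net::HTTP.start(request.uri.host, request.uri.port, use_ssl: true) do |http|
#     http.request(request)
#   end
#   puts response.code
```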

DNS handoff to repo owner

After Fly provisioning, add a CNAME demo → opensop-demo.fly.dev to the opensop.ai DNS zone. Then run flyctl certs create demo.opensop.ai --app opensop-demo and wait ~5 min for cert validation.

Things I deferred

  • @tailwindcss/typography plugin: this app uses Tailwind v4 CSS-only and the plugin needs npm tooling that isn't wired here. Used arbitrary-selector classes ([&>h1]:text-3xl, etc.) on the docs <article> instead. Looks fine; can be upgraded later.
  • Per-visitor sandboxing: shipped a shared instance pool with daily reset (per the spec we agreed on). A future PR can add a cookie-bound demo_session_id.

🤖 Generated with Claude Code

Carlos and others added 10 commits May 6, 2026 19:33
…nfig

Adds the foundation for a public OpenSOP demo at demo.opensop.ai:

- `Opensop::DemoMode` module: env-driven `enabled?` flag, `api_token`
  accessor, and reset-schedule constants used as the single source of
  truth for the demo wiring.
- Four new sample SOPs under `processes/examples/`: expense-approval,
  support-ticket-triage, release-deploy, agent-pr-review. Each exercises
  a different mix of step types (form / automated / judgment / approval /
  webhook / notification) and is honest in its header comment about
  which steps are stubbed in v0.1.
- Three accompanying step scripts (categorize-ticket, stamp-release,
  post-pr-comment) follow the existing stdin/stdout JSON convention.
- `fly.demo.toml` is a separate Fly config for the `opensop-demo` app —
  keeps the existing `fly.toml` untouched and makes
  `flyctl deploy --config fly.demo.toml` the only way to ship the demo.
- `docs/demo-deploy.md` is the runbook for provisioning the demo (Fly
  app + Postgres, secrets, cert, DNS handoff to the user).

…et job

Builds the runtime layer of the demo on top of the Wave 1 scaffold:

- `Demo::SeedLoader` (idempotent) loads every YAML under processes/examples/
  via `Opensop::Registry.load_all`. Validates that `OPENSOP_API_TOKEN`
  matches the homepage-displayed `Opensop::DemoMode.api_token` and warns
  on mismatch — there is no `ApiToken` model; auth is env-based.
- `Demo::ResetJob` runs daily at 3:00 UTC via Solid Queue's recurring
  config. Truncates instance + step + event + callback tables in
  dependency order inside a transaction, then re-runs `Demo::SeedLoader`.
- `DemoReadOnly` controller concern blocks process-definition mutations
  (`Sop::ProcessesController#register`) when DEMO_MODE is on. Instance
  lifecycle (start/submit/cancel) stays interactive — visitors need it.
- rack-attack rules: 60 req/min/IP on `/sop/`, 10 starts/hr/IP on
  `/sop/*/start`, 120 req/min/IP on UI, plus a 5-min IP ban on > 1000
  req in 5 min. All throttles no-op outside DEMO_MODE.
- `Ui::DocsController` serves project docs (architecture, deploy,
  process-authoring, etc.) at `/docs(/*path)` rendered via Commonmarker
  with strict path-traversal protection.
- `Ui::Demo::HomeController` is the front-door homepage for
  demo.opensop.ai — hero, copy-able API token, sample-process grid, CTAs
  to /docs and /api-docs. Routed conditionally so non-demo deploys still
  land on the existing dashboard.
- `_demo_banner` partial — sticky amber banner rendered in both the
  application and docs layouts when DEMO_MODE is on.
- `Opensop::DemoMode.api_token` now prefers `OPENSOP_API_TOKEN` so a
  single secret powers both auth and homepage display.
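
To make "strict path-traversal protection" concrete, one common approach is to whitelist the characters in the requested path and then confirm that the normalized file path still sits under the docs root. The sketch below assumes that approach; `resolve_doc_path` and the `/srv/app/docs` root are hypothetical and may not match what `Ui::DocsController` actually does.

```ruby
require "pathname"

# Hypothetical helper: map a requested docs path to a markdown file under
# docs_root, or return nil if the request would escape that directory.
def resolve_doc_path(docs_root, requested)
  root = Pathname.new(docs_root).expand_path
  # Allow only word characters, dashes and slashes (so no "." and no NUL bytes).
  return nil unless requested.match?(%r{\A[\w\-/]+\z})

  candidate = root.join("#{requested}.md").expand_path
  # Defense in depth: after normalization the file must still live inside root.
  return nil unless candidate.to_s.start_with?("#{root}#{File::SEPARATOR}")

  candidate
end

resolve_doc_path("/srv/app/docs", "architecture")       # a path under the root
resolve_doc_path("/srv/app/docs", "../config/secrets")  # nil, rejected by the whitelist
resolve_doc_path("/srv/app/docs", "/etc/passwd")        # nil, rejected by the root check
```

The second check matters because `Pathname#join` returns an absolute argument as-is, so a request beginning with `/` would otherwise bypass the root.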

27/27 new specs pass.

Adds 23 specs covering the seam between the demo runtime and the
process engine:

- Service spec for `Demo::SeedLoader` (no-op outside DEMO_MODE,
  idempotent re-load, mismatch warning when OPENSOP_API_TOKEN is unset)
- Job spec for `Demo::ResetJob` (no-op outside DEMO_MODE; clears all
  instances and reseeds processes when on)
- Request spec for rack-attack throttling (11th `/sop/*/start` returns
  429 with `rate_limit` body in DEMO_MODE; off-mode passes through).
  Swaps to a fresh `ActiveSupport::Cache::MemoryStore` per example
  because the test env's null cache store would otherwise suppress
  throttle counters.
- View spec for the demo banner partial (renders only in DEMO_MODE,
  contains the localized message and GitHub link)
- System smoke spec walking the homepage and one process card
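
The "11th start returns 429" expectation is easiest to see with a toy counter. The following is a plain-Ruby fixed-window throttle backed by an in-memory Hash; it is a sketch of the general technique, not rack-attack's implementation, and the class name `WindowThrottle` is made up for illustration.

```ruby
# Minimal fixed-window throttle, assuming an in-memory Hash as the counter store.
class WindowThrottle
  def initialize(limit:, period:)
    @limit = limit
    @period = period
    @counts = Hash.new(0)
  end

  # Returns the HTTP status a request from `ip` at epoch second `now` would get.
  def call(ip, now: Time.now.to_i)
    window = now / @period # integer division buckets time into fixed windows
    key = [ip, window]
    @counts[key] += 1
    @counts[key] > @limit ? 429 : 200
  end
end

# 10 starts per hour per IP, as in the demo rules.
starts = WindowThrottle.new(limit: 10, period: 3600)
statuses = Array.new(11) { starts.call("203.0.113.7", now: 1_000) }
# The first ten entries are 200 and the eleventh is 429.
```

A counter store that discards writes (such as a null cache) would never reach the limit, which is why the request spec swaps in a real in-memory store.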

Brings the demo-feature spec total to 50 examples, all passing.

The admin UI is gated by HTTP-basic auth in production via
OPENSOP_UI_USER / OPENSOP_UI_PASSWORD — enforced both at boot
(config/initializers/admin_ui_auth.rb) and on every request
(Ui::ApplicationController#authenticate_admin_ui!).

For the public demo at demo.opensop.ai, the UI is intentionally public:
visitors browse processes and drive instances without credentials.
Mutation surfaces visitors can reach are already locked down by the
DemoReadOnly concern (definition mutations → 403) and rack-attack
throttles (per-IP limits + abuse ban). Adding a credential gate would
force visitors to find published creds in a separate doc — which is
worse than just trusting DemoReadOnly + rack-attack.

Both gates now early-return when Opensop::DemoMode.enabled? is true.
The error message in the boot guard now mentions DEMO_MODE as a valid
deployment path so the next operator hits a helpful nudge.
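
The boot-time half of that change can be sketched as a guard that returns early in demo mode and otherwise insists on both credentials. The method name, the `production:` argument, and the error wording are assumptions; the real initializer may be structured differently.

```ruby
# Hypothetical boot-time guard; the real initializer may differ.
def check_admin_ui_auth!(env, production:)
  return unless production
  return if env["DEMO_MODE"] == "true" # public demo: the UI is intentionally open

  missing = %w[OPENSOP_UI_USER OPENSOP_UI_PASSWORD].select { |key| env[key].to_s.empty? }
  return if missing.empty?

  raise "Missing #{missing.join(', ')}. Set them, or set DEMO_MODE=true for a public demo deploy."
end

# A demo deploy boots without UI credentials:
check_admin_ui_auth!({ "DEMO_MODE" => "true" }, production: true)
```

Without the early return, any command that boots the app in production (including a release command) would fail on a demo deploy that deliberately has no UI credentials.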

Caught when the first deploy of opensop-demo failed its db:prepare
release command.

Lets us curl the deployed demo directly via the Fly hostname for
post-deploy verification, independent of the demo.opensop.ai DNS
+ cert path.

The sidebar's Library section needed three fixes:

1. The template rendered every item as a `<div>` (never a link), even
   though the data layer exposed a `disabled: false` flag. So Templates,
   Webhooks, and API & SDK looked like links but did nothing on click.
   The Library template now mirrors the Workspace pattern — link_to for
   enabled items, div for disabled — so the disabled flag finally has
   teeth.

2. Templates and Webhooks render in the demo but only show empty/
   passive content (the daily reset clears any callback receipts;
   templates is a read-only list of the same processes shown in the
   main grid). They're greyed with the existing "Coming soon" tooltip
   when DEMO_MODE is on.

3. The third Library slot now shows "Docs" → /docs (the project docs
   rendered from docs/*.md by Ui::DocsController) when DEMO_MODE is on,
   instead of "API & SDK" → /api-docs. Demo visitors get a broader
   first surface (architecture, process-authoring, deploy guides); the
   API reference is still one click away from the homepage CTA.

Non-demo deploys keep the same data: all three Library items remain
`disabled: false`. The only visible difference is that they now render as
clickable links to their respective controllers.
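
As a rough picture of the resulting behavior, here is a plain-Ruby stand-in for the Library section. It is a sketch only: the helper names, the `/templates` and `/webhooks` paths, and the markup are hypothetical; only the labels, `/docs`, `/api-docs`, and the "Coming soon" tooltip come from the description above.

```ruby
require "erb"

# Hypothetical stand-in for the sidebar partial: a link for enabled items,
# a greyed-out div with a tooltip for disabled ones.
def render_library_item(label:, path:, disabled:)
  text = ERB::Util.html_escape(label)
  if disabled
    %(<div class="disabled" title="Coming soon">#{text}</div>)
  else
    %(<a href="#{ERB::Util.html_escape(path)}">#{text}</a>)
  end
end

# Data layer: Templates and Webhooks are disabled only in demo mode, and the
# third slot switches between Docs and API & SDK.
def library_items(demo_mode:)
  third = if demo_mode
            { label: "Docs", path: "/docs", disabled: false }
          else
            { label: "API & SDK", path: "/api-docs", disabled: false }
          end
  [
    { label: "Templates", path: "/templates", disabled: demo_mode },
    { label: "Webhooks", path: "/webhooks", disabled: demo_mode },
    third
  ]
end

puts library_items(demo_mode: true).map { |item| render_library_item(**item) }
```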
Demo went down today after a cascade: Postgres flaked → Solid Queue
(in-Puma) lost its DB connection → Solid Queue's graceful-shutdown
handler fired → Puma exited → port 3001 went dark. Fly never
auto-restarted the machine because the combination of
`min_machines_running = 0` and `auto_stop_machines` does not trigger
restart-on-unhealthy on idle machines.

Two changes:
- min_machines_running: 0 → 1, so Fly's standard restart-on-unhealthy
  applies and a flaky-Postgres burst can't park the app indefinitely.
- grace_period: 10s → 30s. Cold boot of Rails 8 + Thruster + Solid
  Queue + bootsnap warm-up routinely exceeded 10s, which produced
  brief 502 bursts as the Fly proxy started routing before Puma was
  ready.

Postgres health remains the underlying weakness (the demo's pg
machine was in role:error / 3-of-3 critical when investigated). That
needs separate follow-up — likely either a memory bump on the pg VM
or a switch to managed-postgres. Tracked as a follow-up; this commit
keeps the demo alive across the next Postgres flake.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…w2Ban

Root cause of 2026-05-08 demo outage: every request was triggering 2-3
writes to Solid Cache (Postgres-backed) because rack-attack's `cache.store`
was `Rails.cache`, AND the `Allow2Ban.filter(...)` block returned `true`
unconditionally — counting every request toward the 1000/5min ban
threshold whether the request was legitimate or not.

That sustained PG-write pressure on a 256MB Postgres VM (free RAM was
~7MB at idle) drove the DB into `role: error`. Investigation memo at
~/Documents/coba-twin/postgres-flake-investigation-2026-05-09.md.

Changes:
- Switch rack-attack store to ActiveSupport::Cache::MemoryStore. Counters
  no longer touch Postgres at all. Reset on Puma restart, which is fine
  for demo abuse prevention (per-minute throttles still hold).
- Remove the Allow2Ban block. The per-minute throttles (60 req/min on /sop/
  + 120 req/min on UI = at most 900 req/5min/IP, even under perfect pacing)
  already cap traffic below the 1000-request threshold the ban was guarding.
  A comment in the file documents how to re-add a properly scoped ban if
  needed.
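
The bound quoted for the removed ban can be checked with a few lines of arithmetic. The method name below is illustrative; the limits are the ones listed in the rack-attack rules.

```ruby
# Per-IP throttle limits from the demo's rack-attack rules (requests per minute).
SOP_PER_MIN   = 60
UI_PER_MIN    = 120
BAN_THRESHOLD = 1000 # requests per 5 minutes that used to trigger the ban

# Largest number of requests a single IP can get through in `minutes` minutes
# if it saturates both throttles the whole time.
def max_requests_per_window(minutes)
  (SOP_PER_MIN + UI_PER_MIN) * minutes
end

ceiling = max_requests_per_window(5)
puts "#{ceiling} < #{BAN_THRESHOLD}: #{ceiling < BAN_THRESHOLD}"
```

Since (60 + 120) × 5 = 900, a client that respects neither throttle is rejected well before it could accumulate 1000 accepted requests in five minutes.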

Pairs with a separate ops change: opensop-demo-db memory bumped from
256MB to 1024MB (`flyctl machine update --memory 1024`). The two fixes
together address the cause (write storm) and the resilience floor
(VM had no headroom for autovacuum or repmgr recovery).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>