Deploy

PDSE ships as a SaaS: a Next.js app on Vercel proxies to this Python backend on Railway. This doc covers the Railway backend (FastAPI web + Celery worker + Redis + Postgres + a persistent volume for the vector DB). The Next.js front end deploys separately on Vercel and is out of scope here.

Status: config + runbook only. No deploy has been run — the Railway CLI is installed but not logged in. The owner must run railway login (interactive, opens a browser) before anything below touches Railway.

Architecture

                        ┌────────────────────────────────────────────┐
   Browser ── HTTPS ──▶ │ Vercel (Next.js)  ── proxy ──▶ Railway      │
                        │                                  backend     │
                        └────────────────────────────────────────────┘
                                                  │
        ┌─────────────────────────────────────────┼───────────────────────────┐
        │ Railway project                          ▼                           │
        │                                                                       │
        │  ┌──────────────────┐   enqueue job   ┌──────────────────┐           │
        │  │ web (FastAPI)    │ ───────────────▶│ worker (Celery)  │  (T1)     │
        │  │ uvicorn          │   via Redis     │ celery worker    │           │
        │  │ /search /health  │◀─── job status ─│ ingest pipeline  │           │
        │  └────────┬─────────┘                 └────────┬─────────┘           │
        │           │ HttpClient (CHROMA_SERVER_URL)      │                     │
        │           └─────────────────┬───────────────────┘                     │
        │                             ▼                                         │
        │   ┌───────────────────────────────────────────────────┐             │
        │   │ chroma  (chromadb/chroma image) — own service      │             │
        │   │ HTTP :8000     Volume → its data dir (the vectors) │             │
        │   └───────────────────────────────────────────────────┘             │
        │                                                                       │
        │   ┌──────────────┐   ┌──────────────┐                                │
        │   │ Redis plugin │   │ Postgres     │  (auth/accounts, usage meter)  │
        │   │ broker+result│   │ plugin       │                                │
        │   └──────────────┘   └──────────────┘                                │
        └───────────────────────────────────────────────────────────────────┘

Both web and worker are built from the same Dockerfile in this repo; they differ only by start command (see below). T1 added the Celery app at src/worker.py, which is what makes the worker deployable alongside web.

Services

Service	Built from	Start command	Notes
web	`Dockerfile`	`uvicorn server:app --app-dir src --host 0.0.0.0 --port $PORT`	Healthcheck `GET /health`. `railway.json` is its config.
worker	`Dockerfile`	`celery -A worker.celery_app worker --loglevel=info --concurrency=2`	T1 (built). `src/worker.py` defines `celery_app`. Set `PYTHONPATH=src`. No healthcheck. Linux prefork is fine.
chroma	`chromadb/chroma` image	(image default; serves HTTP on `:8000`)	Standalone vector DB. Own Railway service from the official image. Attach the persistent volume HERE (its data dir). web + worker connect via `CHROMA_SERVER_URL` (see "Vector DB: standalone Chroma server" and "Environment variables").
Redis	Railway plugin	—	Celery broker + result backend. Injects `REDIS_URL`.
Postgres	Railway plugin	—	Accounts/auth + usage metering. Injects `DATABASE_URL`.

The worker is built: src/worker.py exposes celery_app + the worker.run_ingest task. No Dockerfile change vs web; only the start command differs. worker.py wires the worker_process_init signal to db.reset_engine() so each forked prefork child gets its own SQLAlchemy connection pool (the SQLAlchemy + fork footgun). On Linux (the deploy target) prefork is correct and fast.
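The per-child reset described above can be pictured with a stdlib-only sketch. Everything here is illustrative: `FakePool`, `get_engine`, and this `reset_engine` are hypothetical stand-ins, not the contents of db.py, and the Celery hookup is shown only as a comment so the block runs without Celery installed.

```python
import os
from typing import Optional


class FakePool:
    """Stand-in for a SQLAlchemy engine/pool; records which PID created it."""

    def __init__(self) -> None:
        self.owner_pid = os.getpid()
        self.disposed = False

    def dispose(self) -> None:
        self.disposed = True


_engine: Optional[FakePool] = None


def get_engine() -> FakePool:
    """Build the pool lazily so each process creates its own on first use."""
    global _engine
    if _engine is None:
        _engine = FakePool()
    return _engine


def reset_engine() -> None:
    """Drop the pool inherited from the parent; the next get_engine() rebuilds it."""
    global _engine
    if _engine is not None:
        _engine.dispose()
    _engine = None


# With Celery installed, the hookup would look roughly like this:
#   from celery.signals import worker_process_init
#
#   @worker_process_init.connect
#   def _on_child_start(**_kwargs):
#       reset_engine()
```

The point of the pattern is that a forked child never reuses sockets opened by its parent: the first query after the reset opens fresh connections owned by the child.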

Local macOS dev only: Apple's Objective-C runtime aborts (SIGABRT) if a process forks after touching ObjC-backed libs, which yt-dlp/SSL do. Run the worker with OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES or --pool=solo when developing on macOS. This does NOT affect the Linux container.

Vector DB: standalone Chroma server (critical — read this)

web and worker are separate containers with separate disks. If each opened its own on-disk Chroma (the old PersistentClient(path="data/chroma") default), the worker would ingest into ITS disk and the web tier would search ITS own empty disk — the live index_empty bug (ingest reports chunks_indexed=1, /search returns index_empty). The fix: run Chroma as one standalone service that both connect to over the private network.

Add a chroma service from the official chromadb/chroma image. It serves the Chroma HTTP API on :8000 by default. Pin the image to a version compatible with the client we ship (chromadb 1.5.x — see pyproject.toml). HttpClient validates the tenant/database at connect time against the server's API; a large client↔server version skew surfaces as Could not connect to tenant default_tenant / 404 / 422 at startup (chroma-core/chroma #3410, #1392). Same major.minor is the safe bet.
Startup ordering: unlike the old on-disk client, HttpClient does a real network round-trip when VectorStore is first constructed (the lifespan log line and the first /health or /search request). If chroma isn't reachable yet, web/worker raise at first use; Railway's restart-on-failure recovers once chroma is up. The client never caches a failed connection, so a transient blip self-heals on the next request with no manual restart.
Attach the persistent volume to the chroma service (mounted at the image's data dir — chromadb/chroma defaults to /data; set IS_PERSISTENT=1 / PERSIST_DIRECTORY per the image docs). The vectors live HERE now. This is the only service that needs a volume.
Set CHROMA_SERVER_URL on BOTH web and worker to the service's private URL: CHROMA_SERVER_URL=http://chroma.railway.internal:8000 (Railway private networking; plain http, no TLS). VectorStore (src/rag/vectorstore.py) then opens a chromadb.HttpClient against it instead of local disk — same collections, same cosine config, same per-tenant isolation, just over the wire.
web + worker no longer need the /app/data/chroma volume — remove the old per-service volume. With CHROMA_SERVER_URL set, the data/chroma path is ignored entirely; nothing is written to the web/worker container disk.
Populate the index by running ingest against the SAME server: a user /ingest job (the worker, with CHROMA_SERVER_URL set) writes to the chroma service, and the web tier's /search reads it back immediately.
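The "never caches a failed connection" behaviour from the startup-ordering note can be sketched as below. This is a minimal illustration, not the code in src/rag/vectorstore.py: `LazyClient` and the `connect` factory are hypothetical names, and in the real service the factory would presumably wrap a chromadb.HttpClient call.

```python
from typing import Callable, Optional


class LazyClient:
    """Caches a client only after a successful connect; failures are retried."""

    def __init__(self, connect: Callable[[], object]) -> None:
        # Hypothetical factory supplied by the caller.
        self._connect = connect
        self._client: Optional[object] = None

    def get(self) -> object:
        if self._client is None:
            # If connect() raises, _client stays None, so the next call retries
            # instead of replaying a cached failure.
            self._client = self._connect()
        return self._client
```

With this shape, a request that arrives before chroma is up fails once, and the following request connects normally.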

Local dev: the repo's docker-compose.yml runs this whole topology (postgres + redis + chroma + web + worker) with the containers' CHROMA_SERVER_URL pre-wired to the chroma service — see README "Run it locally". For tests / python src/ingest.py / the pure RAG path: leave CHROMA_SERVER_URL unset. VectorStore falls back to the on-disk PersistentClient at data/chroma exactly as before — no server required. A set-but-blank value is treated as unset. A configured-but-unparseable URL raises (fail loud) rather than silently reading an empty local disk.
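The three CHROMA_SERVER_URL cases above (unset or blank, valid, unparseable) can be sketched as one resolver. This is an assumption-laden illustration: `resolve_chroma_target` is a hypothetical name, the accepted schemes and the default port of 8000 when the URL omits one are guesses, and the real VectorStore may parse the value differently.

```python
from typing import Mapping, Tuple, Union
from urllib.parse import urlparse

LOCAL_PATH = "data/chroma"  # on-disk default named in the text

Target = Union[Tuple[str, str], Tuple[str, str, int]]


def resolve_chroma_target(env: Mapping[str, str]) -> Target:
    """Return ("local", path) or ("http", host, port) from the environment."""
    raw = env.get("CHROMA_SERVER_URL", "").strip()
    if not raw:
        # Unset and set-but-blank both mean "use the local on-disk client".
        return ("local", LOCAL_PATH)
    parsed = urlparse(raw)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        # Fail loud rather than silently reading an empty local disk.
        raise ValueError(f"CHROMA_SERVER_URL is not a usable URL: {raw!r}")
    # Assumption: fall back to Chroma's usual port when none is given.
    port = parsed.port or (443 if parsed.scheme == "https" else 8000)
    return ("http", parsed.hostname, port)
```

Raising on a malformed value is the important design choice: a typo in the variable then shows up as a startup error instead of an index_empty response.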

Captions + metadata backends (critical for the worker on Railway)

YouTube blocks unauthenticated extraction from datacenter IPs — the "Sign in to confirm you're not a bot" wall. This is an IP-reputation block that cookies, proxies, and PO-tokens don't reliably beat. yt-dlp and youtube-transcript-api both hit YouTube's timedtext endpoints, so both fail from Railway where a residential IP succeeds. The ingest worker therefore can't rely on yt-dlp in prod.

src/rag/transcript.py runs an env-driven fallback chain that prefers backends whose egress is from clean infrastructure:

fetch_segments(url)  ── captions ──▶
   1. youtube-transcript-api (free, no key)  ── IP-blocked on Railway ─▶ fall through
   2. Supadata  (SUPADATA_API_KEY)           ── reliable from any IP ──▶ timed segments
   3. yt-dlp    (local dev only)             ── blocked on Railway

fetch_metadata(url) ── title ──▶
   1. YouTube Data API v3 (YOUTUBE_DATA_API) ── 200 from any IP ───────▶ title
   2. yt-dlp --dump-json (local dev only)    ── blocked on Railway

SUPADATA_API_KEY (https://supadata.ai) is the path that actually works from Railway — Supadata owns the YouTube IP battle and returns timed segments (it converts ms offsets to seconds for citations). Long videos (>20 min — every podcast) come back as an async jobId the worker polls. Without this key the worker can only fall to yt-dlp, which YouTube blocks from the datacenter — so set it on the worker service.
YOUTUBE_DATA_API (Google Cloud console → enable "YouTube Data API v3") fetches the video title from any IP, free 10k units/day. Set it on worker.
Both are optional for local dev (unset ⇒ the chain falls back to yt-dlp, which works from a residential IP — the test suite and python src/ingest.py behave unchanged with no keys). Lazy + env-gated: no API is called unless its key is set.
The yt-dlp anti-bot env (YT_DLP_COOKIES_B64 / YT_DLP_COOKIES_FILE / YT_DLP_PROXY) remains as the local fallback's plumbing; in prod prefer the API backends above over fighting the bot wall with cookies.
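The env-gated caption chain can be sketched as follows. This is not the code in src/rag/transcript.py: `caption_chain`, `first_success`, `ms_to_seconds`, and the `impls` mapping are hypothetical, and the backend callables are supplied by the caller so the block makes no real network calls.

```python
from typing import Callable, List, Mapping, Tuple

Segment = Tuple[float, str]                 # (start_seconds, text)
Fetcher = Callable[[str], List[Segment]]


def caption_chain(env: Mapping[str, str],
                  impls: Mapping[str, Fetcher]) -> List[Tuple[str, Fetcher]]:
    """Order the backends as in the diagram; Supadata only when its key is set."""
    order = ["youtube-transcript-api"]
    if env.get("SUPADATA_API_KEY", "").strip():
        order.append("supadata")
    order.append("yt-dlp")
    return [(name, impls[name]) for name in order]


def first_success(url: str,
                  chain: List[Tuple[str, Fetcher]]) -> Tuple[str, List[Segment]]:
    """Try each backend in turn; return the first non-empty result."""
    errors = []
    for name, fetch in chain:
        try:
            segments = fetch(url)
        except Exception as exc:  # any backend failure means "fall through"
            errors.append(f"{name}: {exc}")
            continue
        if segments:
            return name, segments
        errors.append(f"{name}: no segments")
    raise RuntimeError("all caption backends failed: " + "; ".join(errors))


def ms_to_seconds(offset_ms: int) -> float:
    """Convert a millisecond offset to the seconds used for citations."""
    return offset_ms / 1000.0
```

Because an unset key removes the backend from the chain entirely, no API is contacted unless it is configured, which matches the lazy, env-gated behaviour described above.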

Environment variables

Set per-service in the Railway dashboard (or railway variables). Plugin URLs (REDIS_URL, DATABASE_URL) are injected automatically when you add the plugin.

Var	web	worker	Source / notes
`OPENAI_API_KEY`	✅	✅	Embeddings + LLM synthesis. Required.
`PORT`	✅	—	Injected by Railway; uvicorn binds it. Do not hardcode.
`REDIS_URL`	✅	✅	From the Redis plugin. Web enqueues, worker consumes.
`CELERY_BROKER_URL`	✅	✅	T1. Set to `${{Redis.REDIS_URL}}` (broker).
`CELERY_RESULT_BACKEND`	✅	✅	T1. Set to `${{Redis.REDIS_URL}}` (results), or a Postgres URL.
`DATABASE_URL`	✅	✅	From the Postgres plugin. Accounts/auth + usage metering.
`CHROMA_SERVER_URL`	✅	✅	Required in prod. Points web + worker at the standalone `chroma` service so they share ONE index. `http://chroma.railway.internal:8000` (private network, http). Unset ⇒ local on-disk `PersistentClient` (dev/tests only; broken across two containers).
`SUPADATA_API_KEY`	—	✅	Required on worker in prod. Reliable captions from a datacenter IP (yt-dlp/timedtext are bot-walled on Railway). Unset ⇒ worker can only fall to yt-dlp, which YouTube blocks. See "Captions + metadata backends".
`YOUTUBE_DATA_API`	—	✅	Recommended on worker. YouTube Data API v3 key — fetches the video title from any IP (free 10k/day). Unset ⇒ falls back to yt-dlp `--dump-json` (bot-walled on Railway).
`SIM_FLOOR`	◻︎	◻︎	Optional retrieval-gate override (default `0.35`).
`TOP_K`	◻︎	◻︎	Optional (default `5`).
`LLM_MODEL`	◻︎	◻︎	Optional (default `gpt-4o-mini`).
`EMBED_MODEL`	◻︎	◻︎	Optional (default `text-embedding-3-small`).
`LANGFUSE_PUBLIC_KEY`	◻︎	◻︎	Optional tracing. Unset → tracing off, zero cost.
`LANGFUSE_SECRET_KEY`	◻︎	◻︎	Optional tracing.
`LANGFUSE_BASE_URL`	◻︎	◻︎	Optional tracing collector URL.
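A start-up check for the required rows of this table might look like the sketch below. It is a hypothetical helper, not something the repo is known to ship: `REQUIRED` simply transcribes the required entries from the table above (treating YOUTUBE_DATA_API as recommended rather than required), and it does not cover the security variables listed separately.

```python
from typing import Dict, List, Mapping

# Transcribed from the table above; adjust if the table changes.
REQUIRED: Dict[str, List[str]] = {
    "web": [
        "OPENAI_API_KEY", "PORT", "REDIS_URL", "CELERY_BROKER_URL",
        "CELERY_RESULT_BACKEND", "DATABASE_URL", "CHROMA_SERVER_URL",
    ],
    "worker": [
        "OPENAI_API_KEY", "REDIS_URL", "CELERY_BROKER_URL",
        "CELERY_RESULT_BACKEND", "DATABASE_URL", "CHROMA_SERVER_URL",
        "SUPADATA_API_KEY",
    ],
}


def missing_required(service: str, env: Mapping[str, str]) -> List[str]:
    """Return the required variables that are unset or blank for a service."""
    return [name for name in REQUIRED[service] if not env.get(name, "").strip()]
```

Calling `missing_required("worker", os.environ)` at boot and exiting on a non-empty result would surface a missing key before the first job runs.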

Security (T3 — auth, CORS, rate limiting). The web tier verifies Clerk session JWTs on the request path; the worker does no auth (it consumes jobs the web tier already authorized):

Var	web	worker	Notes
`CLERK_JWT_ISSUER`	✅	—	Clerk Frontend API origin. JWKS = `${ISSUER}/.well-known/jwks.json`. Required — unset ⇒ every Bearer token rejected (fail closed). e.g. `https://your-app.clerk.accounts.dev`.
`ALLOWED_ORIGINS`	✅	—	Comma-separated CORS allowlist + Clerk `azp` allowlist. Set to your Vercel frontend origin in prod (e.g. `https://<app>.vercel.app`). NEVER `*`. Defaults to `http://localhost:3000`.
`AUTH_DEV_TRUST_HEADER`	◻︎	—	Dev only. `1` ⇒ also trust an `X-User-Id` header as identity. MUST be `0`/unset in prod (else trivial impersonation). Default OFF.
`RATE_LIMIT_INGEST_PER_HOUR`	◻︎	—	Per-user ingest cap (default `10`). Redis-backed; fails OPEN if Redis is down.
`RATE_LIMIT_SEARCH_PER_MINUTE`	◻︎	—	Per-user search cap for signed-in users (default `30`).
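The "fails OPEN if Redis is down" behaviour of the ingest cap can be illustrated with a fixed-window counter. The windowing scheme, the key format, and the `incr` callable are all assumptions made for this sketch; with Redis, `incr` would typically wrap an INCR plus an expiry on the key.

```python
import time
from typing import Callable, Optional


def allow_request(user_id: str,
                  limit: int,
                  window_seconds: int,
                  incr: Callable[[str], int],
                  now: Optional[float] = None) -> bool:
    """Return True if the request is within the per-user limit for this window."""
    ts = time.time() if now is None else now
    window = int(ts // window_seconds)
    key = f"rl:{user_id}:{window}"   # hypothetical key layout
    try:
        count = incr(key)            # atomically bump and read the counter
    except Exception:
        # Fail open: a counter-store outage must not block users.
        return True
    return count <= limit
```

Failing open trades strict enforcement for availability: during a Redis outage the caps are not applied, but ingest and search keep working.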

Usage metering (internal). The worker records each successful job's embedding tokens + estimated cost to the usage_ledger table (and on the jobs row); no external billing provider is involved and no extra env is required.

Auth (Clerk — Vercel front end). The front end signs requests with the Clerk session JWT (Authorization: Bearer …); the backend (CLERK_JWT_ISSUER, above) verifies it server-side and derives the user from the token sub:

Var	Used by	Notes
`CLERK_SECRET_KEY`	Vercel	Server-side Clerk SDK (`auth()`, `getToken()`).
`NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY`	Vercel	Clerk client.
`CLERK_JWT_ISSUER`	web (backend)	JWKS issuer for backend verification (see Security table above).

✅ required · ◻︎ optional · — not applicable

Identity flow (T3): browser → Vercel Route Handler / server component (web-next/src/lib/backend.ts) attaches the Clerk session JWT as Authorization: Bearer <jwt> → FastAPI require_user/optional_user verifies it (RS256 vs cached JWKS, iss/exp/azp) and resolves sub to the internal user. The browser never sends the identity itself.
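The claim checks in the identity flow (iss, exp, azp, then sub) can be sketched on an already-decoded claims dict. Signature verification (RS256 against the cached JWKS) is deliberately left out and would have to happen first. `parse_allowed_origins` and `check_claims` are hypothetical names, and rejecting a token with no azp is an assumption about policy, not something the text states.

```python
import time
from typing import List, Mapping, Optional


def parse_allowed_origins(raw: str) -> List[str]:
    """Split the comma-separated ALLOWED_ORIGINS value; refuse the wildcard."""
    origins = [o.strip().rstrip("/") for o in raw.split(",") if o.strip()]
    if "*" in origins:
        raise ValueError("ALLOWED_ORIGINS must never contain '*'")
    return origins or ["http://localhost:3000"]  # documented default


def check_claims(claims: Mapping[str, object],
                 issuer: str,
                 allowed_origins: List[str],
                 now: Optional[float] = None) -> str:
    """Validate decoded claims and return the user id from `sub`."""
    ts = time.time() if now is None else now
    if claims.get("iss") != issuer:
        raise PermissionError("issuer mismatch")
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= ts:
        raise PermissionError("token expired")
    if claims.get("azp") not in allowed_origins:
        raise PermissionError("authorized party not in allowlist")
    sub = claims.get("sub")
    if not isinstance(sub, str) or not sub:
        raise PermissionError("missing subject")
    return sub
```

Every failure path raises, so an unexpected token shape is rejected rather than mapped to some default user.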

Deploy runbook (owner runs this)

Prereqs: Railway CLI v5+ (installed), a Railway account.

# 1. Authenticate (INTERACTIVE — opens a browser). Owner must run this; an
#    automated agent cannot. Everything below depends on it.
railway login

# 2. Create / link the project (run from the repo root).
railway init            # or: railway link   (to attach to an existing project)

# 3. Add the managed plugins.
railway add --plugin redis
railway add --plugin postgres

# 4. Set secrets on the web service (repeat --set per var; see the table above).
railway variables --set "OPENAI_API_KEY=sk-..."
#    REDIS_URL / DATABASE_URL are injected by the plugins automatically.

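# 5. Create the persistent volume on the chroma service (add that service first
#    from the chromadb/chroma image) and mount it at the image's data dir,
#    /data by default; see "Vector DB: standalone Chroma server" above for WHY.
#    web and worker need no volume. Set the mount in the dashboard
#    (Service → Settings → Volumes) or, with chroma as the linked service:
railway volume add --mount-path /data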

# 6. Deploy the web service (uses railway.json: Dockerfile build + /health check).
railway up

# 7. Populate the index on the volume (one-off; or let the T1 worker do it).
railway run uv run python src/ingest.py

# 8. (T1) Add the worker service from the SAME repo/image, override its start
#    command to the Celery command in the Services table, and give it the
#    worker-column variables from the table above (OPENAI_API_KEY, REDIS_URL,
#    DATABASE_URL, the Celery URLs, CHROMA_SERVER_URL, SUPADATA_API_KEY).

railway.json configures the web service (Dockerfile builder, start command, /health healthcheck, restart-on-failure). The worker is a second Railway service pointed at this same repo with the start command overridden.

Verify after deploy

curl -fsS "https://<your-web-service>.up.railway.app/health"
# → {"status":"ok","episodes_indexed":N,"chunks_indexed":M}

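If chunks_indexed is 0, web is not reading the index the worker wrote. Check that CHROMA_SERVER_URL is set on both web and worker and that the volume is attached to the chroma service (see "Vector DB: standalone Chroma server"); otherwise the index has not been ingested yet.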
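The same check can be scripted. The sketch below assumes only the response shape shown above; `fetch_health` and `diagnose_health` are hypothetical helper names, and the base URL is whatever your web service exposes.

```python
import json
from urllib.request import urlopen


def fetch_health(base_url: str, timeout: float = 10.0) -> str:
    """GET <base_url>/health and return the raw response body."""
    with urlopen(base_url.rstrip("/") + "/health", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def diagnose_health(body: str) -> str:
    """Classify a /health body as "ok", "index_empty", or "unhealthy"."""
    data = json.loads(body)
    if data.get("status") != "ok":
        return "unhealthy"
    if int(data.get("chunks_indexed", 0)) == 0:
        # The service is up but sees no vectors.
        return "index_empty"
    return "ok"
```

For example, `diagnose_health(fetch_health("https://<your-web-service>.up.railway.app"))` could gate a post-deploy smoke test.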

Local container check (no Railway needed)

Proves the image builds and the web role serves /health:

docker build -t pdse-web .
docker run --rm -e PORT=8000 -e OPENAI_API_KEY=dummy -p 8000:8000 pdse-web
# in another shell:
curl -fsS http://127.0.0.1:8000/health

/health works without a real key or index (it returns counts, runs no search).

Benchmark (latency baseline)

See benchmarks/README.md. One command:

uv run python benchmarks/bench_search.py

Times search() over the golden queries against data/chroma/, writes p50/p95 to benchmarks/baseline.json. Skips cleanly (exit 0) with no key/index.
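For readers unfamiliar with p50/p95, here is a minimal nearest-rank sketch of how such a summary can be computed from raw latencies. The percentile method used by bench_search.py and the layout of baseline.json are not stated in this doc, so the method and the key names below are assumptions.

```python
import math
from typing import Dict, List, Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms: List[float]) -> Dict[str, float]:
    """Reduce a list of per-query latencies to a small summary dict."""
    return {
        "p50_ms": percentile(latencies_ms, 50),
        "p95_ms": percentile(latencies_ms, 95),
        "n": len(latencies_ms),
    }
```

p50 describes the typical query, while p95 captures the slow tail that users notice, which is why both are worth tracking in a baseline.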
