PDSE ships as a SaaS: Next.js on Vercel → proxies to this Python backend on Railway. This doc covers the Railway backend (FastAPI web + Celery worker + Redis + Postgres + a persistent volume for the vector DB). The Next.js front end deploys separately on Vercel and is out of scope here.
Status: config + runbook only. No deploy has been run: the Railway CLI is installed but not logged in. The owner must run `railway login` (interactive, opens a browser) before anything below touches Railway.

                         ┌───────────────────────────────────────┐
    Browser ── HTTPS ──▶ │ Vercel (Next.js) ── proxy ──▶ Railway │
                         │                               backend │
                         └───────────────────────────────────────┘
                                             │
    ┌────────────────────────────────────────┼────────────────────────────┐
    │ Railway project                        ▼                            │
    │                                                                     │
    │  ┌──────────────────┐  enqueue job    ┌──────────────────┐          │
    │  │ web (FastAPI)    │ ───────────────▶│ worker (Celery)  │  (T1)    │
    │  │ uvicorn          │   via Redis     │ celery worker    │          │
    │  │ /search /health  │◀─── job status ─│ ingest pipeline  │          │
    │  └────────┬─────────┘                 └────────┬─────────┘          │
    │           │   HttpClient (CHROMA_SERVER_URL)   │                    │
    │           └─────────────────┬──────────────────┘                    │
    │                             ▼                                       │
    │   ┌───────────────────────────────────────────────────┐             │
    │   │ chroma (chromadb/chroma image), own service       │             │
    │   │ HTTP :8000   Volume → its data dir (the vectors)  │             │
    │   └───────────────────────────────────────────────────┘             │
    │                                                                     │
    │  ┌──────────────┐  ┌──────────────┐                                 │
    │  │ Redis plugin │  │ Postgres     │  (auth/accounts, usage meter)   │
    │  │ broker+result│  │ plugin       │                                 │
    │  └──────────────┘  └──────────────┘                                 │
    └─────────────────────────────────────────────────────────────────────┘

Both web and worker are built from the same `Dockerfile` in this repo;
they differ only by start command (see below). T1 added the Celery app at
`src/worker.py`, so the worker is deployable alongside web.
| Service | Built from | Start command | Notes |
|---|---|---|---|
| web | `Dockerfile` | `uvicorn server:app --app-dir src --host 0.0.0.0 --port $PORT` | Healthcheck `GET /health`. `railway.json` is its config. |
| worker | `Dockerfile` | `celery -A worker.celery_app worker --loglevel=info --concurrency=2` | T1 (built). `src/worker.py` defines `celery_app`. Set `PYTHONPATH=src`. No healthcheck. Linux prefork is fine. |
| chroma | `chromadb/chroma` image | (image default; serves HTTP on `:8000`) | Standalone vector DB. Own Railway service from the official image. Attach the persistent volume HERE (its data dir). web + worker connect via `CHROMA_SERVER_URL` (see Volume + Env). |
| Redis | Railway plugin | — | Celery broker + result backend. Injects `REDIS_URL`. |
| Postgres | Railway plugin | — | Accounts/auth + usage metering. Injects `DATABASE_URL`. |
The worker is built: src/worker.py exposes celery_app + the worker.run_ingest
task. No Dockerfile change vs web; only the start command differs. worker.py
wires the worker_process_init signal to db.reset_engine() so each forked
prefork child gets its own SQLAlchemy connection pool (the SQLAlchemy + fork
footgun). On Linux (the deploy target) prefork is correct and fast.
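
The per-process pool reset described above can be sketched without Celery or SQLAlchemy. This is a minimal illustration under stated assumptions: `FakeEngine`, `get_engine`, and `on_worker_process_init` are hypothetical stand-ins, and the real `db.reset_engine()` is not shown in this document. With Celery, the handler would be connected to the `worker_process_init` signal so it runs once in each forked child.

```python
import os
from typing import Optional


class FakeEngine:
    """Stand-in for a SQLAlchemy engine; records which process created it."""

    def __init__(self) -> None:
        self.owner_pid = os.getpid()


# Module-level cache: one engine (and so one connection pool) per process.
_engine: Optional[FakeEngine] = None


def get_engine() -> FakeEngine:
    """Create the engine lazily on first use and reuse it afterwards."""
    global _engine
    if _engine is None:
        _engine = FakeEngine()
    return _engine


def reset_engine() -> None:
    """Drop the cached engine so the next get_engine() builds a fresh one.

    A forked child calls this before touching the database, so it never
    reuses connections inherited from the parent process.
    """
    global _engine
    _engine = None


def on_worker_process_init(**_kwargs: object) -> None:
    """Hypothetical signal handler; Celery passes keyword arguments."""
    reset_engine()
```

The point of the pattern is that the parent's pool is never shared: each child starts with an empty cache and opens its own connections on first use.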
Local macOS dev only: Apple's Objective-C runtime aborts (SIGABRT) if a process forks after touching ObjC-backed libs, which yt-dlp/SSL do. Run the worker with `OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES` or `--pool=solo` when developing on macOS. This does NOT affect the Linux container.
web and worker are separate containers with separate disks. If each opened
its own on-disk Chroma (the old PersistentClient(path="data/chroma") default),
the worker would ingest into ITS disk and the web tier would search ITS own empty
disk — the live index_empty bug (ingest reports chunks_indexed=1, /search
returns index_empty). The fix: run Chroma as one standalone service that
both connect to over the private network.
- Add a `chroma` service from the official `chromadb/chroma` image. It serves the Chroma HTTP API on `:8000` by default. Pin the image to a version compatible with the client we ship (`chromadb` 1.5.x; see `pyproject.toml`). `HttpClient` validates the tenant/database at connect time against the server's API; a large client↔server version skew surfaces as `Could not connect to tenant default_tenant` / 404 / 422 at startup (chroma-core/chroma #3410, #1392). Same major.minor is the safe bet.
- Startup ordering: unlike the old on-disk client, `HttpClient` does a real network round-trip when `VectorStore` is first constructed (lifespan log + first `/health` / `/search`). If `chroma` isn't reachable yet, web/worker raise at first use; Railway's restart-on-failure recovers once `chroma` is up. The client never caches a failed connection, so a transient blip self-heals on the next request and no manual restart is needed.
- Attach the persistent volume to the `chroma` service, mounted at the image's data dir (`chromadb/chroma` defaults to `/data`; set `IS_PERSISTENT=1` / `PERSIST_DIRECTORY` per the image docs). The vectors live HERE now. This is the only service that needs a volume.
- Set `CHROMA_SERVER_URL` on BOTH web and worker to the service's private URL: `CHROMA_SERVER_URL=http://chroma.railway.internal:8000` (Railway private networking; plain http, no TLS). `VectorStore` (`src/rag/vectorstore.py`) then opens a `chromadb.HttpClient` against it instead of local disk: same collections, same cosine config, same per-tenant isolation, just over the wire.
- web + worker no longer need the `/app/data/chroma` volume, so remove the old per-service volume. With `CHROMA_SERVER_URL` set, the `data/chroma` path is ignored entirely; nothing is written to the web/worker container disk.
- Populate the index by running ingest against the SAME server: a user `/ingest` job (the worker, with `CHROMA_SERVER_URL` set) writes to the `chroma` service, and the web tier's `/search` reads it back immediately.
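
The "same major.minor" guidance in the list above can be turned into a small pre-deploy check. A sketch only: `major_minor` and `same_major_minor` are hypothetical helpers, and the version strings in the usage note are illustrative rather than values read from this repo or from a running server.

```python
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a dotted version string such as '1.5.2'."""
    parts = version.strip().lstrip("v").split(".")
    if len(parts) < 2:
        raise ValueError(f"expected at least major.minor, got {version!r}")
    return int(parts[0]), int(parts[1])


def same_major_minor(client_version: str, server_version: str) -> bool:
    """True when client and server agree on both major and minor."""
    return major_minor(client_version) == major_minor(server_version)
```

For example, `same_major_minor("1.5.0", "1.5.3")` is true while `same_major_minor("1.5.0", "1.4.9")` is false, which is the skew the pinning advice is meant to avoid.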
Local dev: the repo's `docker-compose.yml` runs this whole topology (postgres + redis + chroma + web + worker) with the containers' `CHROMA_SERVER_URL` pre-wired to the chroma service (see README "Run it locally"). For tests / `python src/ingest.py` / the pure RAG path: leave `CHROMA_SERVER_URL` unset. `VectorStore` falls back to the on-disk `PersistentClient` at `data/chroma` exactly as before, with no server required. A set-but-blank value is treated as unset. A configured-but-unparseable URL raises (fail loud) rather than silently reading an empty local disk.
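
The three `CHROMA_SERVER_URL` cases (unset, blank, unparseable) can be sketched as a small resolver. This is an illustration, not the code in `src/rag/vectorstore.py`: `resolve_chroma_target` is a hypothetical name, and falling back to port 8000 when the URL omits a port is an assumption.

```python
from typing import Mapping, Optional, Tuple
from urllib.parse import urlparse


def resolve_chroma_target(env: Mapping[str, str]) -> Optional[Tuple[str, int]]:
    """Return (host, port) for the Chroma server, or None for local disk.

    None means "use the on-disk PersistentClient". A malformed URL raises
    instead of silently falling back to an empty local index.
    """
    raw = env.get("CHROMA_SERVER_URL", "").strip()
    if not raw:
        return None  # unset and set-but-blank are treated the same
    parsed = urlparse(raw)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError(f"CHROMA_SERVER_URL is not a usable URL: {raw!r}")
    # Assumption: default to 8000 when the URL carries no explicit port.
    return parsed.hostname, parsed.port or 8000
```

Raising on a bad value matters here because the silent alternative (reading an empty local directory) looks exactly like the `index_empty` bug described earlier.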
YouTube blocks unauthenticated extraction from datacenter IPs — the "Sign in
to confirm you're not a bot" wall. This is an IP-reputation block that cookies,
proxies, and PO-tokens don't reliably beat. yt-dlp and youtube-transcript-api
both hit YouTube's timedtext endpoints, so both fail from Railway where a
residential IP succeeds. The ingest worker therefore can't rely on yt-dlp in prod.
src/rag/transcript.py runs an env-driven fallback chain that prefers
backends whose egress is from clean infrastructure:

    fetch_segments(url) ── captions ──▶
      1. youtube-transcript-api (free, no key)  ── IP-blocked on Railway ─▶ fall through
      2. Supadata (SUPADATA_API_KEY)            ── reliable from any IP ──▶ timed segments
      3. yt-dlp (local dev only)                ── blocked on Railway

    fetch_metadata(url) ── title ──▶
      1. YouTube Data API v3 (YOUTUBE_DATA_API) ── 200 from any IP ───────▶ title
      2. yt-dlp --dump-json (local dev only)    ── blocked on Railway

- `SUPADATA_API_KEY` (https://supadata.ai) is the path that actually works from Railway: Supadata owns the YouTube IP battle and returns timed segments (it converts ms offsets to seconds for citations). Long videos (>20 min, i.e. every podcast) come back as an async `jobId` the worker polls. Without this key the worker can only fall to yt-dlp, which YouTube blocks from the datacenter, so set it on the `worker` service.
- `YOUTUBE_DATA_API` (Google Cloud console → enable "YouTube Data API v3") fetches the video title from any IP, free 10k units/day. Set it on `worker`.
- Both are optional for local dev (unset ⇒ the chain falls back to yt-dlp, which works from a residential IP; the test suite and `python src/ingest.py` behave unchanged with no keys). Lazy + env-gated: no API is called unless its key is set.
- The yt-dlp anti-bot env (`YT_DLP_COOKIES_B64` / `YT_DLP_COOKIES_FILE` / `YT_DLP_PROXY`) remains as the local fallback's plumbing; in prod prefer the API backends above over fighting the bot wall with cookies.
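
The env-gated fallback chain can be sketched as follows. This is not the code in `src/rag/transcript.py`: the backend callables, the `Segment` shape, and the `build_chain` helper are hypothetical, and real backends would make network calls where this sketch takes injected functions.

```python
from typing import Callable, Dict, List, Mapping, Tuple

Segment = Dict[str, object]  # assumed shape, e.g. {"start": 1.5, "text": "..."}
Backend = Callable[[str], List[Segment]]


def ms_to_seconds(offset_ms: int) -> float:
    """Convert a millisecond offset to seconds for citations."""
    return offset_ms / 1000.0


def build_chain(env: Mapping[str, str],
                backends: Mapping[str, Backend]) -> List[Tuple[str, Backend]]:
    """Order the backends; Supadata joins the chain only if its key is set."""
    chain = [("youtube-transcript-api", backends["youtube-transcript-api"])]
    if env.get("SUPADATA_API_KEY", "").strip():
        chain.append(("supadata", backends["supadata"]))
    chain.append(("yt-dlp", backends["yt-dlp"]))
    return chain


def fetch_segments(url: str, env: Mapping[str, str],
                   backends: Mapping[str, Backend]) -> List[Segment]:
    """Try each backend in order and return the first successful result."""
    errors: List[str] = []
    for name, backend in build_chain(env, backends):
        try:
            return backend(url)
        except Exception as exc:  # any backend failure falls through
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all caption backends failed: " + "; ".join(errors))
```

Collecting the per-backend errors makes the final failure message show which link in the chain was blocked, which is useful when only the datacenter IP is at fault.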
Set per-service in the Railway dashboard (or railway variables). Plugin URLs
(REDIS_URL, DATABASE_URL) are injected automatically when you add the plugin.
| Var | web | worker | Source / notes |
|---|---|---|---|
| `OPENAI_API_KEY` | ✅ | ✅ | Embeddings + LLM synthesis. Required. |
| `PORT` | ✅ | — | Injected by Railway; uvicorn binds it. Do not hardcode. |
| `REDIS_URL` | ✅ | ✅ | From the Redis plugin. Web enqueues, worker consumes. |
| `CELERY_BROKER_URL` | ✅ | ✅ | T1. Set to `${{Redis.REDIS_URL}}` (broker). |
| `CELERY_RESULT_BACKEND` | ✅ | ✅ | T1. Set to `${{Redis.REDIS_URL}}` (results), or a Postgres URL. |
| `DATABASE_URL` | ✅ | ✅ | From the Postgres plugin. Accounts/auth + usage metering. |
| `CHROMA_SERVER_URL` | ✅ | ✅ | Required in prod. Points web + worker at the standalone `chroma` service so they share ONE index. `http://chroma.railway.internal:8000` (private network, http). Unset ⇒ local on-disk `PersistentClient` (dev/tests only; broken across two containers). |
| `SUPADATA_API_KEY` | — | ✅ | Required on worker in prod. Reliable captions from a datacenter IP (yt-dlp/timedtext are bot-walled on Railway). Unset ⇒ worker can only fall to yt-dlp, which YouTube blocks. See "Captions + metadata backends". |
| `YOUTUBE_DATA_API` | — | ✅ | Recommended on worker. YouTube Data API v3 key; fetches the video title from any IP (free 10k/day). Unset ⇒ falls back to `yt-dlp --dump-json` (bot-walled on Railway). |
| `SIM_FLOOR` | ◻︎ | ◻︎ | Optional retrieval-gate override (default 0.35). |
| `TOP_K` | ◻︎ | ◻︎ | Optional (default 5). |
| `LLM_MODEL` | ◻︎ | ◻︎ | Optional (default `gpt-4o-mini`). |
| `EMBED_MODEL` | ◻︎ | ◻︎ | Optional (default `text-embedding-3-small`). |
| `LANGFUSE_PUBLIC_KEY` | ◻︎ | ◻︎ | Optional tracing. Unset → tracing off, zero cost. |
| `LANGFUSE_SECRET_KEY` | ◻︎ | ◻︎ | Optional tracing. |
| `LANGFUSE_BASE_URL` | ◻︎ | ◻︎ | Optional tracing collector URL. |
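
The required columns of the table above can be checked at startup with a few lines. A sketch under assumptions: `REQUIRED_VARS` and `missing_vars` are hypothetical names, and `YOUTUBE_DATA_API` is left out of the worker list because the table describes it as recommended rather than required.

```python
from typing import Dict, List, Mapping

# Transcribed from the table above; optional (◻︎) variables are omitted.
_SHARED = [
    "OPENAI_API_KEY",
    "REDIS_URL",
    "CELERY_BROKER_URL",
    "CELERY_RESULT_BACKEND",
    "DATABASE_URL",
    "CHROMA_SERVER_URL",
]

REQUIRED_VARS: Dict[str, List[str]] = {
    "web": _SHARED + ["PORT"],
    "worker": _SHARED + ["SUPADATA_API_KEY"],
}


def missing_vars(role: str, env: Mapping[str, str]) -> List[str]:
    """Return the required variables that are unset or blank for a role."""
    return [name for name in REQUIRED_VARS[role]
            if not env.get(name, "").strip()]
```

Calling `missing_vars("worker", os.environ)` before the worker starts would surface a missing `SUPADATA_API_KEY` as a configuration error rather than as a failed ingest later.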
Security (T3 — auth, CORS, rate limiting). The web tier verifies Clerk session JWTs on the request path; the worker does no auth (it consumes jobs the web tier already authorized):
| Var | web | worker | Notes |
|---|---|---|---|
| `CLERK_JWT_ISSUER` | ✅ | — | Clerk Frontend API origin. JWKS = `${ISSUER}/.well-known/jwks.json`. Required: unset ⇒ every Bearer token rejected (fail closed). e.g. `https://your-app.clerk.accounts.dev`. |
| `ALLOWED_ORIGINS` | ✅ | — | Comma-separated CORS allowlist + Clerk `azp` allowlist. Set to your Vercel frontend origin in prod (e.g. `https://<app>.vercel.app`). NEVER `*`. Defaults to `http://localhost:3000`. |
| `AUTH_DEV_TRUST_HEADER` | ◻︎ | — | Dev only. `1` ⇒ also trust an `X-User-Id` header as identity. MUST be `0`/unset in prod (else trivial impersonation). Default OFF. |
| `RATE_LIMIT_INGEST_PER_HOUR` | ◻︎ | — | Per-user ingest cap (default 10). Redis-backed; fails OPEN if Redis is down. |
| `RATE_LIMIT_SEARCH_PER_MINUTE` | ◻︎ | — | Per-user search cap for signed-in users (default 30). |
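
The "fails OPEN if Redis is down" behaviour can be illustrated with a fixed-window counter. This is a sketch, not the backend's implementation: the fixed-window scheme, the key format, and the `incr` callable (standing in for an atomic Redis increment) are all assumptions, and the built-in `ConnectionError` stands in for whatever error the Redis client raises.

```python
from typing import Callable


def window_key(user_id: str, now: float, window_seconds: int) -> str:
    """Build a counter key that changes once per window (hypothetical format)."""
    bucket = int(now // window_seconds)
    return f"rl:ingest:{user_id}:{bucket}"


def allow_request(incr: Callable[[str], int], user_id: str, limit: int,
                  now: float, window_seconds: int = 3600) -> bool:
    """Return True if the request is within the per-user cap.

    `incr` increments the counter for a key and returns the new value.
    If the store is unreachable the request is allowed (fail open).
    """
    try:
        count = incr(window_key(user_id, now, window_seconds))
    except ConnectionError:
        return True  # limiter outage must not block users
    return count <= limit
```

Failing open trades strict enforcement for availability: a Redis outage removes the cap temporarily instead of rejecting every ingest request.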
Usage metering (internal). The worker records each successful job's embedding
tokens + estimated cost to the usage_ledger table (and on the jobs row); no
external billing provider is involved and no extra env is required.
Auth (Clerk — Vercel front end). The front end signs requests with the Clerk
session JWT (Authorization: Bearer …); the backend (CLERK_JWT_ISSUER,
above) verifies it server-side and derives the user from the token sub:
| Var | Used by | Notes |
|---|---|---|
| `CLERK_SECRET_KEY` | Vercel | Server-side Clerk SDK (`auth()`, `getToken()`). |
| `NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY` | Vercel | Clerk client. |
| `CLERK_JWT_ISSUER` | web (backend) | JWKS issuer for backend verification (see Security table above). |
✅ required · ◻︎ optional · — not applicable
Identity flow (T3): browser → Vercel Route Handler / server component (`web-next/src/lib/backend.ts`) attaches the Clerk session JWT as `Authorization: Bearer <jwt>` → FastAPI `require_user` / `optional_user` verifies it (RS256 vs cached JWKS, `iss`/`exp`/`azp`) and resolves `sub` to the internal user. The browser never sends the identity itself.
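
The claim checks in the identity flow (`iss`, `exp`, `azp`, then `sub`) can be sketched on an already-decoded payload. This deliberately omits the RS256 signature verification against the JWKS, which must happen first in any real verifier. `check_claims` and `parse_allowed_origins` are hypothetical names, and rejecting a token with no `azp` claim is an assumption of this sketch.

```python
import time
from typing import List, Mapping, Optional


def parse_allowed_origins(raw: Optional[str]) -> List[str]:
    """Split the comma-separated ALLOWED_ORIGINS value, with the documented default."""
    items = [origin.strip() for origin in (raw or "").split(",") if origin.strip()]
    return items or ["http://localhost:3000"]


def check_claims(claims: Mapping[str, object], issuer: str,
                 allowed_origins: List[str],
                 now: Optional[float] = None) -> str:
    """Validate decoded JWT claims and return the user id from `sub`.

    Assumes the signature was already verified elsewhere.
    """
    current = time.time() if now is None else now
    if not issuer or claims.get("iss") != issuer:
        raise PermissionError("issuer mismatch")  # unset issuer fails closed
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= current:
        raise PermissionError("token expired or missing exp")
    if claims.get("azp") not in allowed_origins:
        raise PermissionError("authorized party not allowed")
    sub = claims.get("sub")
    if not isinstance(sub, str) or not sub:
        raise PermissionError("missing sub")
    return sub
```

An empty `issuer` rejects every token, which mirrors the fail-closed behaviour the Security table describes for an unset `CLERK_JWT_ISSUER`.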
Prereqs: Railway CLI v5+ (installed), a Railway account.

    # 1. Authenticate (INTERACTIVE, opens a browser). Owner must run this; an
    #    automated agent cannot. Everything below depends on it.
    railway login

    # 2. Create / link the project (run from the repo root).
    railway init   # or: railway link (to attach to an existing project)

    # 3. Add the managed plugins.
    railway add --plugin redis
    railway add --plugin postgres

    # 4. Set secrets on the web service (repeat --set per var; see the table above).
    railway variables --set "OPENAI_API_KEY=sk-..."
    #    REDIS_URL / DATABASE_URL are injected by the plugins automatically.

    # 5. Create the persistent volume and mount it at /app/data/chroma (see "Volume"
    #    above for WHY this exact path). Volume mount is set in the dashboard
    #    (Service → Settings → Volumes) or:
    railway volume add --mount-path /app/data/chroma

    # 6. Deploy the web service (uses railway.json: Dockerfile build + /health check).
    railway up

    # 7. Populate the index on the volume (one-off; or let the T1 worker do it).
    railway run uv run python src/ingest.py

    # 8. (T1) Add the worker service from the SAME repo/image, override its start
    #    command to the Celery command in the Services table, and give it the same
    #    OPENAI_API_KEY / REDIS_URL / DATABASE_URL.

`railway.json` configures the web service (Dockerfile builder, start command, `/health` healthcheck, restart-on-failure). The worker is a second Railway service pointed at this same repo with the start command overridden.

    curl -fsS "https://<your-web-service>.up.railway.app/health"
    # → {"status":"ok","episodes_indexed":N,"chunks_indexed":M}

If `chunks_indexed` is 0, the volume mount path is wrong (see "Volume") or the index wasn't ingested onto the volume.
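
The same check can be scripted, for example as a post-deploy smoke test. A minimal sketch: `fetch_health` and `diagnose_health` are hypothetical helpers, and the three result labels are chosen for illustration only.

```python
import json
import urllib.request


def fetch_health(base_url: str, timeout: float = 10.0) -> str:
    """GET <base_url>/health and return the response body as text."""
    url = base_url.rstrip("/") + "/health"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def diagnose_health(body: str) -> str:
    """Classify a /health payload as 'ok', 'index_empty', or 'unhealthy'."""
    payload = json.loads(body)
    if payload.get("status") != "ok":
        return "unhealthy"
    if payload.get("chunks_indexed", 0) == 0:
        return "index_empty"  # service is up but nothing has been ingested
    return "ok"
```

Usage would be `diagnose_health(fetch_health("https://<your-web-service>.up.railway.app"))`, with the placeholder host replaced by the real service domain.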
Proves the image builds and the web role serves `/health`:

    docker build -t pdse-web .
    docker run --rm -e PORT=8000 -e OPENAI_API_KEY=dummy -p 8000:8000 pdse-web
    # in another shell:
    curl -fsS http://127.0.0.1:8000/health

`/health` works without a real key or index (it returns counts, runs no search).
See `benchmarks/README.md`. One command:

    uv run python benchmarks/bench_search.py

Times `search()` over the golden queries against `data/chroma/`, writes p50/p95 to `benchmarks/baseline.json`. Skips cleanly (exit 0) with no key/index.
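
For readers unfamiliar with the p50/p95 figures, one way to derive them from a list of latencies is the nearest-rank method. This is a sketch only: the method used by `benchmarks/bench_search.py` is not shown in this document, and `percentile`, `summarize`, and the output key names are hypothetical.

```python
import math
from typing import Dict, Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize(latencies_ms: Sequence[float]) -> Dict[str, float]:
    """Return the two headline latency figures."""
    return {
        "p50_ms": percentile(latencies_ms, 50),
        "p95_ms": percentile(latencies_ms, 95),
    }
```

With ten samples from 10 ms to 100 ms in steps of 10, this yields a p50 of 50 ms and a p95 of 100 ms, which shows how strongly a single slow query can move the tail figure on a small query set.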