fix(fallbacks): T7 — prod fail-fast on missing config + tenant gate safe-by-default#808

Open
beveradb wants to merge 1 commit into main from fix/fallback-t7-env-config

@beveradb

Collaborator

Summary

Theme 7 of the silent-fallback audit (docs/archive/2026-06-09-fallback-audit-plan.md, in #805). Previously, a single unset or rotated env var could silently produce wrong behaviour in prod with only a warning. Production now either fails fast or is safe by default. Verified against Pulumi infra: prod sets ENVIRONMENT=production, CLOUD_RUN_SERVICE_URL, GCS_BUCKET_NAME, GOOGLE_CLOUD_PROJECT, and Postmark (secret); Cloud Run also auto-sets K_SERVICE.

Changes

  • config.is_production() + validate_production_config() (called from main.py lifespan): production refuses to boot if any of GOOGLE_CLOUD_PROJECT, GCS_BUCKET_NAME, or CLOUD_RUN_SERVICE_URL is unset, instead of silently using a dev default (wrong project for secret lookups / wrong bucket / localhost worker URL). This subsumes 7.3 (worker_service localhost fallback). is_production() treats Cloud Run's auto-set K_SERVICE as production, so the gate still applies even if ENVIRONMENT is unset.
  • Tenant middleware (7.4, security): the ?tenant= override (a spoofing vector) is now disabled whenever K_SERVICE is present, not only when ENVIRONMENT=production — production-safe by default.
  • dropbox_service (7.1): GOOGLE_CLOUD_PROJECT default was the wrong literal "karaoke-gen" (real project is nomadkaraoke); replaced with an empty default (prod requires it via the startup gate).
  • email_service (7.5): a missing POSTMARK_SERVER_TOKEN in production used to silently use the console provider (sends "succeed" while emails vanish). Now raises in production; console fallback remains for dev/test.
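
For readers without the diff open, the startup gate and the tenant override rule described above could look roughly like the sketch below. It is not the code in this PR: the `env` parameter, the `REQUIRED_IN_PRODUCTION` tuple, the `RuntimeError` type, the `tenant_override_allowed` / `resolve_tenant` helpers, and the `host_tenant` argument are all hypothetical names chosen for illustration. It also assumes that an empty string counts as unset and that the tenant check is equivalent to `not is_production()`.

```python
import os

# Variables the description says production must have set.
# (The tuple name is hypothetical.)
REQUIRED_IN_PRODUCTION = (
    "GOOGLE_CLOUD_PROJECT",
    "GCS_BUCKET_NAME",
    "CLOUD_RUN_SERVICE_URL",
)


def is_production(env=None):
    """True if ENVIRONMENT is 'production' or Cloud Run's K_SERVICE is set."""
    env = os.environ if env is None else env
    if env.get("ENVIRONMENT", "").strip().lower() == "production":
        return True
    return bool(env.get("K_SERVICE"))


def validate_production_config(env=None):
    """Raise if a required variable is unset or empty; no-op outside production."""
    env = os.environ if env is None else env
    if not is_production(env):
        return
    missing = [name for name in REQUIRED_IN_PRODUCTION if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required production config: " + ", ".join(missing)
        )


def tenant_override_allowed(env=None):
    """Assumption: the ?tenant= override is honoured only outside production."""
    return not is_production(env)


def resolve_tenant(query_params, host_tenant, env=None):
    """Return the ?tenant= value only where overrides are allowed."""
    override = query_params.get("tenant")
    if override and tenant_override_allowed(env):
        return override
    return host_tenant
```

Collecting every missing name before raising means one failed boot reports all gaps at once, rather than one per redeploy.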

Deferred (flagged)

7.2 GCS_TEMP_BUCKET / GCS_OUTPUT_BUCKET — these are not set via env in infra, so their config defaults are load-bearing. Requiring or changing them risks prod, so deferred pending confirmation of whether those buckets are actually used.

Testing

  • New backend/tests/test_fallback_prod_config.py (is_production, validate gate raise/pass/no-op, email prod-raise vs dev-console, tenant K_SERVICE)
  • 111 related tests pass (email, tenant, config) — no regressions, no cross-test pollution from the tenant module reload
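
As an illustration of the "email prod-raise vs dev-console" cases listed above, a test in this style might look like the following sketch. It does not reproduce backend/tests/test_fallback_prod_config.py: `choose_email_provider` is a hypothetical stand-in for the provider selection in email_service, the `"postmark"` / `"console"` return values and the `RuntimeError` type are assumptions, and so is the use of K_SERVICE for production detection here.

```python
import os
import unittest
from unittest import mock


def choose_email_provider():
    """Hypothetical stand-in for email_service's provider selection."""
    if os.environ.get("POSTMARK_SERVER_TOKEN"):
        return "postmark"
    in_prod = (
        os.environ.get("ENVIRONMENT", "").lower() == "production"
        or bool(os.environ.get("K_SERVICE"))
    )
    if in_prod:
        # Fail loudly instead of letting sends "succeed" on the console provider.
        raise RuntimeError("POSTMARK_SERVER_TOKEN is required in production")
    return "console"


class EmailProviderFallbackTest(unittest.TestCase):
    def test_production_without_token_raises(self):
        # clear=True removes the real environment for the duration of the test.
        with mock.patch.dict(os.environ, {"K_SERVICE": "api"}, clear=True):
            with self.assertRaises(RuntimeError):
                choose_email_provider()

    def test_dev_without_token_uses_console(self):
        with mock.patch.dict(os.environ, {}, clear=True):
            self.assertEqual(choose_email_provider(), "console")

    def test_token_selects_postmark(self):
        env = {"ENVIRONMENT": "production", "POSTMARK_SERVER_TOKEN": "t"}
        with mock.patch.dict(os.environ, env, clear=True):
            self.assertEqual(choose_email_provider(), "postmark")
```

`mock.patch.dict` restores `os.environ` when each `with` block exits, which is one way to avoid environment changes leaking between tests.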

Review

  • CodeRabbit CLI review completed locally (0 findings)

@coderabbitai ignore


🤖 Generated with Claude Code

…afe-by-default

Part of the silent-fallback audit (Theme 7). A single unset/rotated env var used
to silently produce wrong behaviour in prod with only a warning. Now:

- config.is_production() + validate_production_config(): at startup (main.py
  lifespan) production refuses to boot if GOOGLE_CLOUD_PROJECT, GCS_BUCKET_NAME,
  or CLOUD_RUN_SERVICE_URL are unset — rather than silently using a dev default
  (wrong project for secret lookups / wrong bucket / localhost worker URL). This
  subsumes 7.3 (worker_service localhost fallback): prod can't boot without
  CLOUD_RUN_SERVICE_URL. is_production() also treats Cloud Run's auto-set
  K_SERVICE as production, so the gate is robust even if ENVIRONMENT were unset.
- tenant middleware (7.4, security): the `?tenant=` override (a spoofing vector)
  is now disabled whenever K_SERVICE is present, not only when ENVIRONMENT is
  explicitly "production" — production-safe by default.
- dropbox_service (7.1): GOOGLE_CLOUD_PROJECT default was the WRONG literal
  "karaoke-gen" (real project is "nomadkaraoke"); replaced with an empty default
  (prod requires it via the startup gate).
- email_service (7.5): a missing POSTMARK_SERVER_TOKEN in production used to
  silently use the console provider (every send "succeeded" while emails
  vanished). Now raises in production; console fallback remains for dev/test.

Deferred: 7.2 GCS_TEMP_BUCKET / GCS_OUTPUT_BUCKET — these are NOT set via env in
infra, so their config defaults are load-bearing; requiring/changing them risks
prod. Flagged for a follow-up after confirming whether those buckets are used.

Tests: new test_fallback_prod_config.py (is_production, validate gate, email prod
raise, tenant K_SERVICE). 111 related tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>