fix(fallbacks): T7 — prod fail-fast on missing config + tenant gate safe-by-default#808
Open
beveradb wants to merge 1 commit into
…afe-by-default

Part of the silent-fallback audit (Theme 7). A single unset/rotated env var used to silently produce wrong behaviour in prod with only a warning. Now:

- config.is_production() + validate_production_config(): at startup (main.py lifespan) production refuses to boot if GOOGLE_CLOUD_PROJECT, GCS_BUCKET_NAME, or CLOUD_RUN_SERVICE_URL are unset, rather than silently using a dev default (wrong project for secret lookups / wrong bucket / localhost worker URL). This subsumes 7.3 (worker_service localhost fallback): prod can't boot without CLOUD_RUN_SERVICE_URL. is_production() also treats Cloud Run's auto-set K_SERVICE as production, so the gate is robust even if ENVIRONMENT were unset.
- tenant middleware (7.4, security): the `?tenant=` override (a spoofing vector) is now disabled whenever K_SERVICE is present, not only when ENVIRONMENT is explicitly "production". Production-safe by default.
- dropbox_service (7.1): GOOGLE_CLOUD_PROJECT default was the WRONG literal "karaoke-gen" (real project is "nomadkaraoke"); replaced with an empty default (prod requires it via the startup gate).
- email_service (7.5): a missing POSTMARK_SERVER_TOKEN in production used to silently use the console provider (every send "succeeded" while emails vanished). Now raises in production; console fallback remains for dev/test.

Deferred: 7.2 GCS_TEMP_BUCKET / GCS_OUTPUT_BUCKET. These are NOT set via env in infra, so their config defaults are load-bearing; requiring/changing them risks prod. Flagged for a follow-up after confirming whether those buckets are used.

Tests: new test_fallback_prod_config.py (is_production, validate gate, email prod raise, tenant K_SERVICE). 111 related tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
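For readers without the diff open, the behaviour described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names `is_production` and `validate_production_config` come from the commit message, but their signatures, the `env` parameter, the `REQUIRED_IN_PRODUCTION` tuple, the helpers `tenant_override_allowed` and `select_email_provider`, and the use of `RuntimeError` are all assumptions made for the example.

```python
import os
from typing import Mapping, Optional

# Variables the commit message says production must have set.
# The constant name is hypothetical.
REQUIRED_IN_PRODUCTION = (
    "GOOGLE_CLOUD_PROJECT",
    "GCS_BUCKET_NAME",
    "CLOUD_RUN_SERVICE_URL",
)


def is_production(env: Optional[Mapping[str, str]] = None) -> bool:
    """True if ENVIRONMENT is "production" or Cloud Run's K_SERVICE is set."""
    env = os.environ if env is None else env
    return env.get("ENVIRONMENT") == "production" or bool(env.get("K_SERVICE"))


def validate_production_config(env: Optional[Mapping[str, str]] = None) -> None:
    """Raise at startup if production is missing a required variable."""
    env = os.environ if env is None else env
    if not is_production(env):
        return  # no-op in dev/test, where defaults are acceptable
    missing = [name for name in REQUIRED_IN_PRODUCTION if not env.get(name)]
    if missing:
        # Exception type is an assumption; the PR only says prod "refuses to boot".
        raise RuntimeError("Missing required production config: " + ", ".join(missing))


def tenant_override_allowed(env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical helper: honour the ?tenant= override only outside production."""
    return not is_production(env)


def select_email_provider(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: console fallback in dev/test, hard failure in prod."""
    env = os.environ if env is None else env
    if env.get("POSTMARK_SERVER_TOKEN"):
        return "postmark"
    if is_production(env):
        raise RuntimeError("POSTMARK_SERVER_TOKEN is required in production")
    return "console"
```

In a setup like the one described, `validate_production_config()` would be called once from the application's startup hook, so that a misconfigured production instance fails before serving traffic. Passing the environment in as a mapping is only a convenience for testing the sketch; the real functions may read `os.environ` or a settings object directly.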
Summary
Theme 7 of the silent-fallback audit (docs/archive/2026-06-09-fallback-audit-plan.md, in #805). A single unset/rotated env var used to silently produce wrong behaviour in prod with only a warning. Now production fails fast / is safe-by-default. Verified against Pulumi infra: prod sets `ENVIRONMENT=production`, `CLOUD_RUN_SERVICE_URL`, `GCS_BUCKET_NAME`, `GOOGLE_CLOUD_PROJECT`, and Postmark (secret); Cloud Run also auto-sets `K_SERVICE`.

Changes

- `config.is_production()` + `validate_production_config()` (called from `main.py` lifespan): production refuses to boot if `GOOGLE_CLOUD_PROJECT`, `GCS_BUCKET_NAME`, or `CLOUD_RUN_SERVICE_URL` are unset, instead of silently using a dev default (wrong project for secret lookups / wrong bucket / localhost worker URL). This subsumes 7.3 (worker_service localhost fallback). `is_production()` treats Cloud Run's auto-set `K_SERVICE` as production, so it's robust even if `ENVIRONMENT` were unset.
- Tenant middleware (7.4, security): the `?tenant=` override (a spoofing vector) is now disabled whenever `K_SERVICE` is present, not only when `ENVIRONMENT=production`. Production-safe by default.
- dropbox_service (7.1): the `GOOGLE_CLOUD_PROJECT` default was the wrong literal `"karaoke-gen"` (real project is `nomadkaraoke`); replaced with an empty default (prod requires it via the startup gate).
- email_service (7.5): a missing `POSTMARK_SERVER_TOKEN` in production used to silently use the console provider (sends "succeed" while emails vanish). Now raises in production; console fallback remains for dev/test.

Deferred (flagged)

7.2 `GCS_TEMP_BUCKET` / `GCS_OUTPUT_BUCKET`: these are not set via env in infra, so their config defaults are load-bearing. Requiring or changing them risks prod, so deferred pending confirmation of whether those buckets are actually used.

Testing

- `backend/tests/test_fallback_prod_config.py` (is_production, validate gate raise/pass/no-op, email prod-raise vs dev-console, tenant K_SERVICE)

Review
@coderabbitai ignore
🤖 Generated with Claude Code