Skip to content

Big 🫏 PR: IAP-gated single-image deploy with a fully mediated data plane#114

Open
christophervoelpel wants to merge 30 commits into
google-marketing-solutions:mainfrom
christophervoelpel:deploy/corp-release
Open

Big 🫏 PR: IAP-gated single-image deploy with a fully mediated data plane#114
christophervoelpel wants to merge 30 commits into
google-marketing-solutions:mainfrom
christophervoelpel:deploy/corp-release

Conversation

@christophervoelpel

@christophervoelpel christophervoelpel commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Big 🫏 PR: IAP-gated single-image deploy with a fully mediated data plane

This PR replaces Scene Machine's deploy and auth architecture with one that fits a corporate Google Cloud environment. The app now runs as a single container image deployed as two Cloud Run services (a public, IAP-gated app and a private worker), the browser never touches Firestore or Cloud Storage directly (everything goes through a same-origin /api control plane), and Firestore and Storage rules deny all direct client access. It also folds in the GA Gemini 3 models and improved outpainting, a reworked deploy.sh, new CI, and a batch of UI fixes.

This is not an additive option. It is the new authoritative deploy and auth model for the project. Conflicts with main should resolve to this branch.

What and why

The previous architecture could not be deployed in our corporate environment:

  • It needs a publicly reachable Cloud Run backend behind an API Gateway, but org policy (Domain Restricted Sharing and IAP-mandatory ingress) blocks public, unauthenticated services.
  • The browser signed in with Firebase Auth and then read and wrote Firestore and Cloud Storage directly from the client. In a locked-down org we cannot hand the browser direct data-plane access, and the public client config is awkward to govern.

This branch solves both:

  • The front door is Identity-Aware Proxy only. There is no public service and no separate sign-in UI; IAP admits the user before any app code runs.
  • The data plane is fully mediated. The browser calls a same-origin /api on the same service that serves the SPA. The server uses the Admin SDK and hands back short-lived, server-signed GET and PUT URLs for media. The browser never holds Firestore or Storage credentials.
  • It is compatible with Domain Restricted Sharing and IAP-mandatory org policies: a single least-privilege runtime service account, service-to-service auth via OIDC, and no public ingress.

Architecture: before and after

flowchart LR
  subgraph BEFORE["BEFORE (main)"]
    direction TB
    b_user["Browser<br/>(Firebase Auth sign-in)"]
    b_gw["API Gateway<br/>(apispec.template.yaml)"]
    b_ui["Angular UI on App Engine<br/>(ui/app.yaml, deploy-ui.sh)"]
    b_run["Backend on Cloud Run"]
    b_fs[("Firestore")]
    b_gcs[("Cloud Storage")]
    b_user --> b_ui
    b_user --> b_gw --> b_run
    b_user -- "direct read/write" --> b_fs
    b_user -- "direct read/write" --> b_gcs
  end

  subgraph AFTER["AFTER (this branch)"]
    direction TB
    a_user["Browser"]
    a_iap{{"Identity-Aware Proxy"}}
    a_app["app service (Cloud Run, public)<br/>serves SPA + same-origin /api"]
    a_worker["worker service (Cloud Run, private)"]
    a_tasks["Cloud Tasks (OIDC)"]
    a_fs[("Firestore<br/>rules: deny all client")]
    a_gcs[("Cloud Storage<br/>rules: deny all client")]
    a_user --> a_iap --> a_app
    a_app -- "Admin SDK" --> a_fs
    a_app -- "server-signed GET/PUT URLs" --> a_gcs
    a_app --> a_tasks -- "OIDC invoke" --> a_worker
    a_worker -- "Admin SDK" --> a_fs
    a_worker -- "Admin SDK" --> a_gcs
  end
Loading

Before: three deployables (API Gateway, App Engine UI, Cloud Run backend), Firebase Auth in the browser, and a client that reads and writes Firestore and Storage directly.

After: one container image built once and run as two Cloud Run services selected by environment (ROLE, AUTH_MODE, WORKER_URL). The app service is public but IAP-gated and serves both the built SPA and a same-origin /api control plane. The worker service is private and is only ever invoked by Cloud Tasks using OIDC. No API Gateway, no App Engine, no separate UI deploy. All data access is mediated by the server.

Major changes

Front-door topology (single image, two services). One image (Dockerfile) runs as app or worker based on ROLE, with AUTH_MODE and WORKER_URL selecting behavior. orch.py validates these at startup (rejects an unknown ROLE, requires the IAP audience when ROLE=app and AUTH_MODE=iap, requires WORKER_URL when ROLE=app). The app service serves the built SPA and a same-origin /api; the worker service is private.

IAP front door and Firebase Auth removal. The Firebase Auth sign-in path is fully removed: no @angular/fire, no firebase-admin, no custom-token bridge. IAP is the only admission path. The browser-side sign-in UI is gone; the SPA assumes the request already passed IAP.

Mediated data plane and storyboard split. The browser never touches Firestore or Storage directly. Uploads go through POST /api/uploadUrl (server-signed PUT URLs) and reads through GET /api/signUrl (server-signed GET URLs); config is read through GET /api/config. firebase/firestore.rules and firebase/storage.rules both deny all direct client access (allow read, write: if false). Because the server signs GET URLs for any object in GCS_BUCKET, the deploy treats it as a dedicated bucket: it creates and labels the bucket (app=scene-machine), reuses or auto-adopts the default-named bucket on redeploy, and refuses to adopt a foreign or shared bucket unless the deployer explicitly opts in. Storyboards are split into a per-scene Firestore subcollection (projects/<id>/scenes/<index> in orch.py) so the 1 MiB per-document limit applies per scene instead of per project.

Deploy, infra, and CI. deploy.sh is reworked into a single-image Cloud Build deploy that is idempotent and least-privilege: the runtime service account self-scopes token-creator and act-as, the worker invoker is service-scoped, REST calls use ADC tokens (safe under corporate Certificate-Based Access), and there are opt-in fast-deploy flags (--app-only, --skip-ui-build, --no-build-cache) plus a local no-deploy dev loop. Supporting scripts live in deploy/libs.sh and deploy/grant-access.sh. Build context and image are slimmed (.dockerignore, .gcloudignore, cloudbuild.yaml layer caching).

GA Gemini 3 models and outpainting (folds in upstream PR #104). Defaults move to GA Gemini 3 models (config.template.txt: GEMINI_MODEL=gemini-3.5-flash, IMAGE_MODEL=gemini-3-pro-image) with improved outpainting (actions/outpaint_image.py). Action signatures and example workflows are brought into parity with the new model parameters.

UI features and fixes. Inline project-name editing on the storyboard, composition, and output headers; per-scene failure display; stuck-upload fix; position-based scene labels; and an image-import feature (paste, pasted URL lists, base64, and drag-from-another-tab across setup, storyboard reference, and the overlay dialog, with a browser-side fetch that has a timeout and size cap).

Tests and docs. UI vitest suite at 197 tests; Python suite at 146 tests. Two obsolete UI specs removed (config.spec.ts, remix-engine.spec.ts). README.md and DEVELOPING.md document the IAP-only deploy, the one-time console steps, the local dev loop, and the fast-deploy flags.

Security and review hardening. Deny-all client rules, server-only config and access allow-list reads, an ALLOWED_DOMAINS charset guard in deploy.sh, and static rule tests in CI. Several correctness and security edges found in review since this PR opened were also closed:

  • a failed project save is no longer silently treated as saved, so unsaved edits are not lost,
  • a finished render is no longer discarded when its output URL cannot be signed for a moment (it is retried and kept),
  • the generation paths degrade gracefully when /api/config has not loaded instead of marking a scene failed,
  • POST /api/projects is create-only (returns 409 on an existing id), so it cannot overwrite a project or reassign its owner,
  • GET /api/signUrl caps the number of paths per request, preventing signing-quota exhaustion by a single caller,
  • the project list reads only the first scene of each project (for its thumbnail) instead of every scene,
  • GCS_BUCKET must be a dedicated bucket that the deploy creates, labels, and owns,
  • CI now type-checks the UI test files (the fast test runner compiled but did not type-check them).

Breaking changes and migration

  • This PR replaces main's deploy and auth architecture. The API Gateway, the App Engine UI, and the separate UI deploy are removed (apispec.template.yaml, ui/app.yaml, deploy-ui.sh are deleted). Firebase Auth sign-in is removed.
  • There is no data migration. Existing deployments do a fresh redeploy on the new topology. Old client behavior (direct Firestore and Storage access) stops working by design once the deny-all rules are in place.
  • Merge conflicts should resolve to this branch. Do not regress to the gateway, App Engine, or client-Firebase architecture.

Security model

  • Admission: Identity-Aware Proxy gates the public app service. No request reaches app code without passing IAP.
  • Data access: only the app and worker service accounts read or write Firestore and Storage, using the Admin SDK. Media reaches the browser only through short-lived, server-signed URLs.
  • Client rules: Firestore and Storage rules deny all direct client access, so a leaked or replayed client token cannot read or write data directly.
  • Dedicated storage bucket: signed GET URLs are issued for any object in GCS_BUCKET, so the deploy requires a bucket dedicated to Scene Machine. It creates and labels the bucket, auto-adopts the default-named one on redeploy, and refuses a foreign or shared bucket unless the deployer explicitly opts in.
  • Service-to-service: the worker service is private and only invokable by Cloud Tasks via OIDC; the invoker permission is scoped to that service.
  • Shared-team CRUD (intentional): every IAP-admitted user can create, read, update, and delete every project. createdBy is a display label, not an access gate. This is a deliberate small-team design choice, not an oversight.

Testing and CI

All four GitHub Actions workflows are green on Linux runners:

  • ui-tests.yml: UI vitest suite (197 tests) plus a spec type-check and a production ng build.
  • python-tests.yml: Python tests on 3.11, 3.12, and 3.13 (146 tests) plus the action-signature check.
  • deploy-checks.yml: ShellCheck and static safety checks on the deploy scripts.
  • firebase-rules.yml: static and behavioral tests of the Firestore and Storage rules.

The full flow has also been deploy-validated end to end on a live GCP project using ./deploy.sh.

Deploy

One command does the full, safe deploy. There is no auth-mode flag: the deploy is always IAP-gated (the public Firebase sign-in mode was removed), so a plain run is all there is.

./deploy.sh

A first-time deploy on a brand-new project needs a few one-time console steps:

  • Configure the OAuth consent screen first. The custom OAuth client step below refuses to be created until the consent screen exists.
  • On a project without a Google Cloud organization, create a custom OAuth client once for IAP (a project inside an organization gets the managed OAuth client automatically). Skipping this on an org-less project surfaces as 502 "Empty OAuth client".
  • After the IAP deploy, nobody can open the app (not even the deployer) until they are admitted through IAP. Use deploy/grant-access.sh to admit a user.

The script prints these remaining steps verbatim at the end of a run, and they are documented in README.md.

Deploy options

A plain ./deploy.sh always runs the full, safe, end-to-end deploy. The flags below are opt-in, and each one logs what it skipped:

  • --non-interactive: for headless or agent-driven runs. It auto-confirms the "deploy to this project?" prompt, and instead of pausing at a human-only console step it fails fast, prints the exact console URL and the command to re-run, then exits so an automation can surface it and resume once the step is done.
  • --app-only: rebuild the image and redeploy only the app service, reusing the live worker (use when the worker did not change).
  • --skip-ui-build (alias --use-existing-ui-dist): reuse the existing ui/dist instead of rebuilding the Angular app. The deploy refuses a ui/dist that was built for local dev (sign-in disabled).
  • --no-build-cache: force a clean, cold image build, for a release or a dependency refresh.

Local development (UI)

You do not run a full deploy to work on the UI. A local loop edits and hot-reloads the Angular app in seconds while still calling the same /api endpoints the deployed app uses. People using Scene Machine should use a deployed instance; this path is for people building it.

One-time setup renders the two gitignored files the UI needs (ui/src/env.ts, ui/definitions/config.json) in the dev front-door mode, and points the local backend at a dev project through your Application Default Credentials:

./scripts/dev-setup.sh
gcloud auth application-default login

Then run two terminals:

# Terminal 1 - the local /api backend
ROLE=app AUTH_MODE=none LOCAL_WORKER=1 FIRESTORE_DB_UI=<your-dev-db> PORT=8080 python3 orch.py

# Terminal 2 - the Angular dev server with the /api proxy
cd ui && npm run dev

Open the printed http://localhost:4200; edits under ui/src hot-reload. A proxy (ui/proxy.conf.json) forwards every /api call to the local backend, so the browser uses the same relative /api/... URLs as production with no CORS setup. On a fresh dev database, seed the config/global document once (DEVELOPING.md gives the command) so /api/config does not return 404.

  • AUTH_MODE=none (and the UI's controlPlaneMode: 'none') removes the sign-in gate. This is dev only: fine on localhost, unacceptable in production, and deploy.sh refuses to build or ship a none UI (including via --skip-ui-build).
  • LOCAL_WORKER=1 runs workflow actions in-process instead of scheduling Cloud Tasks, so no separate worker is needed. It is dev-only (no retries or backoff) and is honored only when AUTH_MODE=none, so it can never turn on in a deployed service.

The full local-dev guide, including running an end-to-end workflow locally, is in DEVELOPING.md.

Known trade-offs and notes

  • Shared-team delete is intentional. Any admitted user can delete any project. For a small trusted team this is the desired behavior; if per-user ownership is ever needed it would be a follow-on.
  • Storyboard multi-batch write atomicity. A project that fits in one Firestore batch commits atomically. A larger project spans multiple batches committed in sequence, so a failure partway through can leave the project partially written until the next save. This is a documented trade-off of raising the per-project size ceiling via the per-scene subcollection.
  • No data migration. Old deployments redeploy fresh; there is no in-place upgrade path from the previous architecture for Firestore project documents. Existing storage buckets are kept and reused: the deploy adds an ownership label to the bucket and moves no media.

…eway and App Engine

Replace the API Gateway and App Engine UI hosting with a single container image
that runs as the app or the worker based on ROLE/AUTH_MODE/WORKER_URL. Drop the
gateway config fields and point the status viewer at the same origin.
The app role serves the SPA plus a same-origin /api control plane gated by IAP;
the worker role is private and Cloud-Tasks-only. Firestore and Storage access is
mediated through /api with server-signed URLs, storyboards are split into a
per-scene subcollection, and the client security rules deny all direct access.
Remove @angular/fire from the UI and firebase-admin from the Python
requirements; the deployed path no longer uses Firebase Auth.
Behind IAP the user is already authenticated, so the SPA no longer initializes
Firebase or signs in with a custom token. The /api auth-header interceptor
handles the iap and local none modes.
Replace direct client Firestore/Storage with mediated /api calls and
server-signed media URLs, re-signing on read so long-lived bindings never serve
an expired URL.
Reorganize into a deploy/ folder, make IAM grants check-before-grant,
authenticate REST calls with ADC tokens, scope the runtime service account to
least privilege, and add a grant-access helper.
Cache Docker layers via cloudbuild.yaml, shrink the build context and runtime
image, and add a no-deploy local UI dev loop with an /api proxy.
Adopt the GA Gemini model ids with a separate image-model region, add
usage-tracking headers, and outpaint via a blank canvas with adaptive output
size. Sync the action definitions and example workflows to the required params.
… fixes

Import images by paste, drag, remote link, or base64 across setup, storyboard
and the overlay dialog; rename a project inline from the page headers; surface
per-scene generation failures by position; and steady playback, unstick uploads,
and recover interrupted renders.
…nput

Update the README, DEVELOPING guide and walkthrough for the IAP-only deployment,
the three sign-ins, the fast-deploy flags, the local dev loop, the shared-team
access model, and the new paste/links/base64 image input.
Add vitest UI specs, Python front-door/engine/rules tests, and GitHub workflows
that build and unit-test the UI, run the Python tests and action contract, run
the Firebase rule tests, and ShellCheck the deploy scripts.
@gps-readability-bot

Copy link
Copy Markdown

@gps-readability-bot

Copy link
Copy Markdown

@gps-readability-bot gps-readability-bot Bot requested a review from echom June 17, 2026 13:07
@gps-readability-bot

Copy link
Copy Markdown

Still need readability approvals from:

@gps-readability-bot

Copy link
Copy Markdown

Still need readability approvals from:

@gps-readability-bot

Copy link
Copy Markdown

Still need readability approvals from:

1 similar comment
@gps-readability-bot

Copy link
Copy Markdown

Still need readability approvals from:

…oogle-marketing-solutions#113 Cloud Tasks lock

Pre-merge review and hardening:
- Split the build and runtime service accounts. The app and worker now run as a
  dedicated runtime service account that does not hold artifactregistry.writer,
  so a compromised app cannot push or overwrite container images.
- Enforce a content-type allow-list on the signed-upload endpoint.
- Remove the unnecessary storage target-apply from the firestore-only deploy
  phase so it cannot abort and leave the deny-all rules undeployed.
- Route the raw Cloud Run and service-account IAM grants through the retry
  wrapper, and give the worker a finite gunicorn timeout.
- Fix the generation flow in the UI: do not resume until the global config has
  loaded, keep the in-flight marker when only URL signing fails, make
  render-resume per-execution, guard a malformed signed-URL expiry, and tolerate
  a single missing URL in a batch sign request.
- Reject an empty template update and surface an expired status-viewer session.
- Update the storage-rules test to match the deny-all rules, remove README
  trailing whitespace, and add a diff-check CI step.
- Add regression tests for the above.

Keep the PR google-marketing-solutions#113 fix from regressing:
- Port the Cloud Tasks duplicate-execution lock. The worker acquires and
  releases a per-task Firestore lock and the deploy enables its TTL, so this
  rewrite of orch.py and deploy.sh does not drop the fix when it lands after
  PR google-marketing-solutions#113.
…ions#113 Cloud Tasks lock

The lock added in PR google-marketing-solutions#113 reads executionId, nodeId, groupId and
workflowDefinition from the task payload, so the older test that posted the
pre-lock payload shape raised a KeyError and returned 500 (red Python CI).

- Update test_worker_url_overrides_host_for_trigger_action to send a valid
  Cloud Tasks payload and acquire the lock.
- Add tests for the lock behavior: a duplicate delivery is skipped with
  'Already Triggered', a retryable failure releases the lock and returns 429,
  and a non-retryable failure keeps the lock.
- Validate the task payload in the handler so a malformed body returns a clean
  400 instead of a 500, and add a test for it.

This keeps the PR google-marketing-solutions#113 duplicate-execution protection as the source of truth
rather than weakening it to satisfy the stale test.
Direct signed PUT uploads go from the browser to GCS and bypass Flask's
request-size limit, so the server could not previously reject an oversized
upload. The client now declares sizeBytes on the /api/uploadUrl request and the
server rejects an oversized or invalid declaration before signing, using a
per-prefix limit (1 GiB for remix-input, 50 MiB for thumbnail).

This is a pre-sign bound for the honest and accidental cases. A custom client
could still understate or omit sizeBytes; a bypass-proof bound would sign an
x-goog-content-length-range into the PUT URL, tracked as a follow-up. Content
type is already allow-listed.

- orch.py: validate sizeBytes against a per-prefix maximum.
- media.ts: send sizeBytes (the file size, or the byte length for text).
- Add tests for oversized, invalid, and within-limit declarations.
@christophervoelpel

Copy link
Copy Markdown
Contributor Author

Thanks for the v2 review — all three blockers are addressed on this branch (now at 35b4ad6):

P1 · Python CI (stale /triggerAction test). Updated test_worker_url_overrides_host_for_trigger_action to send a valid Cloud Tasks payload, and added tests for the PR #113 lock behavior: duplicate delivery → Already Triggered (no re-run), retryable failure → release lock + 429, non-retryable → keep lock. Also added handler validation so a malformed payload returns 400 instead of 500. PR #113's duplicate-execution lock is kept as the source of truth, not weakened. Full Python suite now passes (143).

P1 · deploy.sh merge with PR #113. This branch already carries the PR #113 lock behavior (orch.py, util/database.py, and the cloudTasks TTL step in deploy.sh). The remaining conflict is only that both PRs touch deploy.sh; when PR #113 lands first I'll rebase and keep this branch's deploy structure plus the PR #113 TTL step (guarded by test_cloudtasks_lock_ttl_step_present).

P2 · signed-upload size. The client now sends sizeBytes and the server rejects an oversized or invalid declaration before signing, using a per-prefix limit (1 GiB for remix-input, 50 MiB for thumbnail). This is a pre-sign bound for the honest/accidental case; the bypass-proof version (signing an x-goog-content-length-range into the PUT URL) is tracked as a follow-up. Content type is already allow-listed.

Separately, I validated the IAP-only deploy end-to-end on a fresh GCP project and the app is running.

@christophervoelpel christophervoelpel marked this pull request as ready for review June 17, 2026 17:42
@christophervoelpel christophervoelpel changed the title Corp release: IAP-gated single-image deploy with a fully mediated data plane Big 🫏 PR: IAP-gated single-image deploy with a fully mediated data plane Jun 17, 2026
- deploy.sh: when this CLI's 'gcloud run deploy' lacks the --iap flag, enable IAP
  with 'gcloud run services update app --iap' automatically instead of only
  printing it as a manual step, and drop that manual step from the summary when
  the auto-enable succeeds.
- deploy.sh: point the OAuth consent step at /auth/overview (the 'Get started'
  entry for a fresh project) and state it must be done before the custom OAuth
  client, which otherwise refuses with "you need to configure your OAuth consent
  screen".
The deploy only ever runs in IAP mode, so --auth-mode had a single valid
value and the deployed services already set AUTH_MODE=iap themselves.
Remove the flag from the script and the docs. Keep AUTH_MODE as a fixed
value inside the script so the banners and the IAP post-deploy steps
still read clearly.
The --yes flag only skipped the deploy-target confirmation, which is
exactly what --non-interactive already does. Drop --yes and its -y alias
and have the auto-confirm check key off --non-interactive, so there is
one way to skip the prompt. Update the usage text and the README.
The global config resource had no error handling, so a failed /api/config
request (a brief server error, a network blip, or an expired IAP session)
put the resource in its error state. Reading its value then threw, which
broke the home and setup pages instead of falling back to defaults.

Catch the error in the loader and return nothing, so the existing
fall-back-to-defaults code keeps working.
When a render finished, the code asked the backend for a signed link to the
output video with no retry. If that one request failed for a moment, the
render was recorded as a failure and the finished video was lost, even
though the per-scene generation path already retries and keeps the work.

Retry the signing the same way the generation path does, and if it still
fails, keep the in-flight marker so reopening the project picks the
finished video back up. Add a test for this case.
Two backend fixes:

- The sign-URL endpoint signed every path in the request, and each path is
  one signing call to Google. With no limit, one user could send thousands
  of paths and use up the project's signing quota for everyone. Cap a
  single request at 100 paths.

- Listing projects rebuilt the full storyboard of every project by reading
  all of its scenes, but the project list only shows the first scene as a
  thumbnail. Read just the first scene when listing, so opening the home
  page does far fewer reads.
The static rules tests already checked that no signed-in user can write
Firestore directly, but there was no matching check for reads. Add one so a
change that re-opened reads to any signed-in user would fail the tests, in
every pull request, with no test project needed.
The fast unit-test runner compiles each test file but never reports type
errors, so the tests passed even though seven test files had real type
errors (they only showed up in the slower local test command). CI never
caught them.

Add a step that type-checks the test files and fix the seven errors it
surfaces: drop references to env fields that no longer exist, complete a
test object that was missing required fields, and guard an optional value.
The per-item commits added the code but left a few gaps, and an
adversarial re-review of the change set surfaced more. This fills them:

- Sign each URL-signing path once and bound the distinct-path count
  (orch.py). Repeated 'path' params no longer each spend a signBlob RPC,
  and the cap measures distinct paths (the real RPC cost) by deduping
  before the cap check, not raw params.
- Guard the generation paths against an unloaded global config
  (remix-engine.ts). generateStoryboard and combineScenes now fail fast
  with a clear, recoverable message instead of a vague error from a
  non-null assertion on an undefined config, covering every caller that
  reads globalConfig fields.
- Make the global-config degradation testable: extract the resource
  loader into loadGlobalConfig() and test that a failed /api/config
  resolves to undefined and that the default project config still builds.
- Update the project-list test to the first-scene-only summary.
- Test the signing limits (cap on distinct paths, duplicates signed once)
  and the resumed render retry-success path.
- Match a parenthesized condition in the static no-broad-read rule test.
The autosave dedupe marked a config as saved the moment its POST/PATCH was
sent, before the server responded. If that save failed and the user did not
click the Retry snackbar, a later flush or save of the same unchanged project
skipped it as already-saved and the work was lost on navigation.

Split the two states: advance lastSavedConfig only on a successful response,
and track the in-flight object separately so concurrent saves are still
deduped. On failure the in-flight marker is cleared so a later flush, save, or
the next autosave re-attempts the unsaved work. Add tests for the retry and
the success dedupe.
Storyboard and combine generation already fail fast with a recoverable message
when /api/config has not loaded, but candidate generation still read the config
with a non-null assertion. An unloaded config turned into a vague 'Failed to
start workflow' and then marked the scene permanently failed.

Add the same 'configuration not loaded yet' guard before starting candidate
generation: show a recoverable message and leave the scene unflagged. Also add
a test for the live combine-render path when its output URL cannot be signed
(marker kept, no error run, button reset).
POST wrote the project document directly even when its id already existed,
overwriting it and re-stamping createdBy to the poster, which reassigned the
owner shown in 'My projects'. PATCH already preserves the owner; POST did not.

Return 409 Conflict when the id already exists, so updates go through PATCH and
a repeated create cannot clobber an existing project or its owner. Add a test.
Signed download URLs are issued for every object in GCS_BUCKET, which is safe
only if the bucket is dedicated to Scene Machine. The deploy already creates the
default bucket, but if the name already existed it silently reused it, so a
shared or unrelated bucket could have its contents exposed.

Label buckets the deploy creates with app=scene-machine. On a later run a
labeled bucket is reused; an unlabeled bucket with the default
${PROJECT}-scene-machine name (an upgrade of an existing deployment) is adopted
in place by adding the label, with no data moved; any other pre-existing or
custom-named bucket is refused unless the deployer sets ADOPT_EXISTING_BUCKET=1.
No new deployer task or role: roles/storage.admin already covers bucket creation
and label updates.
…-deploy scope

Follow-ups from the review:

- README: state that Scene Machine needs a dedicated bucket it creates and
  owns, not to point GCS_BUCKET at shared data, and how redeploys reuse it (the
  default-named bucket is adopted automatically; a custom existing name needs
  ADOPT_EXISTING_BUCKET=1).
- README: recommend deploy/grant-access.sh for admitting users, keeping the raw
  gcloud command as an alternative.
- README: note this release targets fresh deployments and does not migrate
  existing Firestore project documents.
- DEVELOPING.md: document the one-time config/global seed for local dev, since
  /api/config returns 404 on a fresh dev database.
After POST /api/projects became create-only, a POST whose id already exists
returns 409. The client did not recognise it: it showed the generic
save-failed snackbar, and Retry re-POSTed and hit 409 again, so a project the
server already had could get stuck in a failing-save loop.

On a 409, mark the id as already-persisted and retry the save once as a PATCH,
which updates the existing project and keeps the user's edits. The retry goes
through the normal save path with the latest project state, so it is tracked
for de-duplication and cannot send stale data over newer edits.
A new project is built from the global config defaults (aspect ratio,
resolution, model). When /api/config has not loaded yet, or failed to load,
those defaults are empty, so a project created in that window was saved with
missing settings.

Guard the new-project route: if the config is not loaded, show a recoverable
'try again in a moment' message and return to the home page without creating
anything.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant