Skip to content

Post-deploy fixes: pin search_vec, add /office/policy to sidebar, doc push hazard#9

Merged
xiaolai merged 1 commit into
mainfrom
post-deploy-fixes
May 6, 2026
Merged

Post-deploy fixes: pin search_vec, add /office/policy to sidebar, doc push hazard#9
xiaolai merged 1 commit into
mainfrom
post-deploy-fixes

Conversation

@xiaolai

@xiaolai xiaolai commented May 6, 2026

Copy link
Copy Markdown
Owner

Summary

Three corrections after the moderator slice deploy.

1. submissions.search_vec got dropped from prod

drizzle-kit push was used to apply migrations 0018–0021 (because migrate hangs over the HTTP-only Neon driver). Push diffs against web/src/db/schema/*.ts, sees no search_vec declaration there, and drops the column. The column was tsvector GENERATED ALWAYS AS (...) STORED from migration 0003_fts.sql — Drizzle's schema DSL can't fully express generated columns.

Restore SQL (run via Neon dashboard SQL editor against prod):

ALTER TABLE submissions
  ADD COLUMN IF NOT EXISTS search_vec tsvector
  GENERATED ALWAYS AS (
    to_tsvector('english',
      coalesce(title, '') || ' ' || coalesce(text, '') || ' ' || coalesce(url, ''))
  ) STORED;
CREATE INDEX IF NOT EXISTS idx_submissions_search_vec ON submissions USING GIN (search_vec);

This PR adds the schema-side declaration so a future push sees the column as expected and won't drop it again. The GENERATED clause stays owned by migration 0003.

2. /office/policy was orphaned

The page rendered, but used a standalone breadcrumb instead of OfficeSidebar — readers navigating from /office couldn't find it from the left rail. Added "policy" to OfficeSidebarPage, wired a Shield-icon entry, and rewrapped the page in the proto-page-aside shell.

3. New project rule: never drizzle-kit push against prod

.claude/rules/db-migrations.md documents the search_vec incident, lists what else push will silently drop (triggers, generated columns, custom constraint names), and prescribes the safer path: explicit SQL migrations in web/src/db/migrations/<NNNN>_<slug>.sql applied via psql or the Neon dashboard.

Test plan

  • tsc clean
  • Apply the restore SQL above to prod via Neon dashboard → curl https://claudepot.com/search?q=test returns 200 (was likely 500)
  • After merge, /office left rail shows the new "Policy moderation" link → click navigates to /office/policy → sidebar persists with active highlight on the policy entry

…ar; doc push hazard

Three corrections after the post-deploy state of the moderator slice:

1. submissions.search_vec was dropped from prod when drizzle-kit push
   was used to apply migrations 0018-0021 (push hangs on migrate over
   the HTTP-only Neon driver). The column is `tsvector GENERATED ALWAYS
   AS (...) STORED` per migration 0003_fts.sql; Drizzle's schema DSL
   can't fully express it, so push saw a phantom column and dropped
   it. Now declared via customType<string>("tsvector") so push sees
   a match and leaves it alone. The actual column DDL still lives in
   migration 0003 — schema declaration is just the "preserve this"
   signal for future pushes.

   Re-creating search_vec in the live DB needs to be done out of
   band (Neon dashboard SQL editor or psql) since drizzle-kit push
   cannot author the GENERATED clause. The required SQL is in
   migration 0003_fts.sql verbatim, idempotent (IF NOT EXISTS).

2. /office/policy now uses the OfficeSidebar layout instead of a
   standalone breadcrumb. The page was orphaned — visible to readers
   navigating directly but invisible from /office's left rail. Added
   "policy" to OfficeSidebarPage union, wired a new entry with
   the Shield icon, and rewrapped the page in the proto-page-aside
   shell so navigation between Office sections works consistently.

3. New rule: .claude/rules/db-migrations.md — never use `drizzle-kit
   push` against prod. Documents the search_vec incident, what other
   DB features push will want to drop (triggers, generated columns,
   non-Drizzle constraint names), and the safer path (psql or Neon
   dashboard with the explicit migration files in
   web/src/db/migrations/*.sql).
@vercel

vercel Bot commented May 6, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
claudepot-com Error Error May 6, 2026 0:40am

Request Review

@xiaolai xiaolai merged commit 4870a33 into main May 6, 2026
7 of 8 checks passed
@xiaolai xiaolai deleted the post-deploy-fixes branch May 6, 2026 12:43
xiaolai added a commit that referenced this pull request May 8, 2026
HIGH fixes:
- register_from_browser: rollback private blob + cleanup temp dir
  on store.insert() failure (#4)
- Orphan detection: roundtrip check prevents lossy unsanitize_path
  from causing false-positive deletions (#1, #10)
- --from-token reads from stdin when value is "-" to avoid shell
  history exposure (#9)

MEDIUM fixes:
- swap switch(): rollback CC credentials on set_active_cli failure (#11)
- register_account_from_profile: delete orphaned private blob on
  insert failure (#17)
- remove_account: reorder to clear pointers + remove DB row before
  irreversible file deletions (#18)
- desktop switch(): propagate DB update failures instead of ignoring (#22)
- account.rs: propagate DB permission-setting failures (#20)
- project.rs: post-move failures become warnings instead of errors
  when directory already moved (#16)
- account remove --json: emit structured JSON response (#28)
- doctor summary: count expired accounts as warnings (#26)

LOW fixes:
- doctor: surface store.list() failures via db_error field (#29)
- keychain status: distinguish "no credential" from "access error" (#30)
xiaolai added a commit that referenced this pull request May 8, 2026
Independent Codex audit (5-dim mini, threadId 019de662) flagged
2 High + 8 Medium + 1 Low across the templates feature surface.
Independent verification (threadId 019de792) confirmed 10 fixes
landed; #11 is deferred with rationale.

### High

#1 prerun probe ignored route auth — `probe_sync` in
`automations/prerun.rs` now attaches `Authorization: Bearer
<key>` (or `Basic <key>` when `auth_scheme` is set) from the
route's `GatewayConfig.api_key` when non-empty. LiteLLM /
Cloudflare AI Gateway endpoints behind auth no longer false-
fail with 401.

#2 sidecar swap in `templates_apply_pending` —
`commands_templates.rs` now rejects early when
`pending.automation_id != automation_id`. Without this, an
attacker (or a buggy caller) could pair automation A's pending-
changes.json with automation B's broader apply policy.

### Medium

#3 Ollama probe detection too broad — was `cfg.base_url.contains
("/api/")` which false-positives on `/api/v1` (OpenRouter,
LiteLLM). Now narrowed to `:11434` port OR explicit `/api/tags`
path.

#4 hardcoded `~/.claudepot` — both `templates_pending_changes`
and `templates_read_report` now use `claudepot_data_dir()` from
claudepot-core, honoring `CLAUDEPOT_DATA_DIR` overrides
correctly.

#5 `supported_platforms` was only enforced in `templates_list`
— `templates_install` now ALSO rejects when
`!bp.supports(HostPlatform::current())`. The contract is
symmetric: a direct IPC bypass of the gallery filter no longer
lets a macOS-only template instantiate on Linux/Windows.

#6 Windows ACL trusted `USERNAME` — `tighten_consent_acl` now
resolves the current user SID via `whoami /user /fo csv /nh`,
grants `*<SID>:F` in icacls (well-known SID notation), grants
SYSTEM via the literal `S-1-5-18` SID, and surfaces every
failure path through `tracing::warn` instead of swallowing
silently.

#7 cron test wrote `CLAUDE_OAUTH_TOKEN` to disk via
`automation.extra_env` (which `install_shim` renders into
`run.sh`). Token now lives in a 0600 sibling file
(`<tmp>/.oauth-token`); a 0700 wrapper script
(`<tmp>/claude-with-token.sh`) reads the file at fire time and
exec's the real claude. The shim sees only the wrapper path,
never the token.

#8 cron test scheduled `now + 1 minute`, which races the case
where `install_shim + scheduler.register` span past the next
minute boundary — launchd would defer to the same time
tomorrow and the test would hang. Now schedules `now + 2
minutes`; deadline extended to 180s.

#9 cron test passed on a `"result":` substring match, which is
too permissive — error rows mention `result` too. Now parses
`stdout.log` as a JSON event array, finds the terminal
`result` event, asserts `is_error == false`. Verified locally:
the cron run produced `is_error=false result_first_80=
CRON_TEMPLATE_OK`.

#10 `CronCleanupGuard` did not restore the previous
`CLAUDEPOT_DATA_DIR` — risk of polluting subsequent tests in
the same process. Now captures the prior value as
`Option<OsString>` at construction and restores it on Drop.
Also explicitly zero-fills + unlinks the token file so a
tempdir-removal race doesn't leave the token on disk.

### Low (deferred)

#11 validator rebuilds globset matcher per call. Apply is
interactive (~5-50 items × 1-3 globs each, user reviews + clicks
per run); not a hot loop. Caching would require an
`ApplyConfig`-level matcher cache and is not worth the
complexity. Documented in the audit-fix verification report.

### Verification

Independent Codex re-audit on a fresh thread:
| Status | Count |
|---|---:|
| FIXED | 10 |
| NOT FIXED | 0 |
| PARTIAL | 0 |
| REGRESSED | 0 |
| DEFERRED | 1 |

Local validation:
- `cargo test --workspace` — all green
- `cargo test -p claudepot-core --test templates_real_llm
   cron_schedule_fires -- --ignored` — passes with new
  is_error=false assertion (110s wall, ~$0.30)

Audit threadId: 019de662-808c-7b62-80b3-e85d373f92b5
Verify threadId: 019de792-3515-7533-aac0-0203a96388be
xiaolai added a commit that referenced this pull request May 8, 2026
Post-deploy fixes: pin search_vec, add /office/policy to sidebar, doc push hazard
xiaolai added a commit that referenced this pull request May 8, 2026
Independent Codex audit (5-dim mini, threadId 019de662) flagged
2 High + 8 Medium + 1 Low across the templates feature surface.
Independent verification (threadId 019de792) confirmed 10 fixes
landed; #11 is deferred with rationale.

### High

#1 prerun probe ignored route auth — `probe_sync` in
`automations/prerun.rs` now attaches `Authorization: Bearer
<key>` (or `Basic <key>` when `auth_scheme` is set) from the
route's `GatewayConfig.api_key` when non-empty. LiteLLM /
Cloudflare AI Gateway endpoints behind auth no longer false-
fail with 401.

#2 sidecar swap in `templates_apply_pending` —
`commands_templates.rs` now rejects early when
`pending.automation_id != automation_id`. Without this, an
attacker (or a buggy caller) could pair automation A's pending-
changes.json with automation B's broader apply policy.

### Medium

#3 Ollama probe detection too broad — was `cfg.base_url.contains
("/api/")` which false-positives on `/api/v1` (OpenRouter,
LiteLLM). Now narrowed to `:11434` port OR explicit `/api/tags`
path.

#4 hardcoded `~/.claudepot` — both `templates_pending_changes`
and `templates_read_report` now use `claudepot_data_dir()` from
claudepot-core, honoring `CLAUDEPOT_DATA_DIR` overrides
correctly.

#5 `supported_platforms` was only enforced in `templates_list`
— `templates_install` now ALSO rejects when
`!bp.supports(HostPlatform::current())`. The contract is
symmetric: a direct IPC bypass of the gallery filter no longer
lets a macOS-only template instantiate on Linux/Windows.

#6 Windows ACL trusted `USERNAME` — `tighten_consent_acl` now
resolves the current user SID via `whoami /user /fo csv /nh`,
grants `*<SID>:F` in icacls (well-known SID notation), grants
SYSTEM via the literal `S-1-5-18` SID, and surfaces every
failure path through `tracing::warn` instead of swallowing
silently.

#7 cron test wrote `CLAUDE_OAUTH_TOKEN` to disk via
`automation.extra_env` (which `install_shim` renders into
`run.sh`). Token now lives in a 0600 sibling file
(`<tmp>/.oauth-token`); a 0700 wrapper script
(`<tmp>/claude-with-token.sh`) reads the file at fire time and
exec's the real claude. The shim sees only the wrapper path,
never the token.

#8 cron test scheduled `now + 1 minute`, which races the case
where `install_shim + scheduler.register` span past the next
minute boundary — launchd would defer to the same time
tomorrow and the test would hang. Now schedules `now + 2
minutes`; deadline extended to 180s.

#9 cron test passed on a `"result":` substring match, which is
too permissive — error rows mention `result` too. Now parses
`stdout.log` as a JSON event array, finds the terminal
`result` event, asserts `is_error == false`. Verified locally:
the cron run produced `is_error=false result_first_80=
CRON_TEMPLATE_OK`.

#10 `CronCleanupGuard` did not restore the previous
`CLAUDEPOT_DATA_DIR` — risk of polluting subsequent tests in
the same process. Now captures the prior value as
`Option<OsString>` at construction and restores it on Drop.
Also explicitly zero-fills + unlinks the token file so a
tempdir-removal race doesn't leave the token on disk.

### Low (deferred)

#11 validator rebuilds globset matcher per call. Apply is
interactive (~5-50 items × 1-3 globs each, user reviews + clicks
per run); not a hot loop. Caching would require an
`ApplyConfig`-level matcher cache and is not worth the
complexity. Documented in the audit-fix verification report.

### Verification

Independent Codex re-audit on a fresh thread:
| Status | Count |
|---|---:|
| FIXED | 10 |
| NOT FIXED | 0 |
| PARTIAL | 0 |
| REGRESSED | 0 |
| DEFERRED | 1 |

Local validation:
- `cargo test --workspace` — all green
- `cargo test -p claudepot-core --test templates_real_llm
   cron_schedule_fires -- --ignored` — passes with new
  is_error=false assertion (110s wall, ~$0.30)

Audit threadId: 019de662-808c-7b62-80b3-e85d373f92b5
Verify threadId: 019de792-3515-7533-aac0-0203a96388be
xiaolai added a commit that referenced this pull request May 8, 2026
Post-deploy fixes: pin search_vec, add /office/policy to sidebar, doc push hazard
xiaolai added a commit that referenced this pull request May 8, 2026
HIGH fixes:
- register_from_browser: rollback private blob + cleanup temp dir
  on store.insert() failure (#4)
- Orphan detection: roundtrip check prevents lossy unsanitize_path
  from causing false-positive deletions (#1, #10)
- --from-token reads from stdin when value is "-" to avoid shell
  history exposure (#9)

MEDIUM fixes:
- swap switch(): rollback CC credentials on set_active_cli failure (#11)
- register_account_from_profile: delete orphaned private blob on
  insert failure (#17)
- remove_account: reorder to clear pointers + remove DB row before
  irreversible file deletions (#18)
- desktop switch(): propagate DB update failures instead of ignoring (#22)
- account.rs: propagate DB permission-setting failures (#20)
- project.rs: post-move failures become warnings instead of errors
  when directory already moved (#16)
- account remove --json: emit structured JSON response (#28)
- doctor summary: count expired accounts as warnings (#26)

LOW fixes:
- doctor: surface store.list() failures via db_error field (#29)
- keychain status: distinguish "no credential" from "access error" (#30)
xiaolai added a commit that referenced this pull request May 8, 2026
Independent Codex audit (5-dim mini, threadId 019de662) flagged
2 High + 8 Medium + 1 Low across the templates feature surface.
Independent verification (threadId 019de792) confirmed 10 fixes
landed; #11 is deferred with rationale.

### High

#1 prerun probe ignored route auth — `probe_sync` in
`automations/prerun.rs` now attaches `Authorization: Bearer
<key>` (or `Basic <key>` when `auth_scheme` is set) from the
route's `GatewayConfig.api_key` when non-empty. LiteLLM /
Cloudflare AI Gateway endpoints behind auth no longer false-
fail with 401.

#2 sidecar swap in `templates_apply_pending` —
`commands_templates.rs` now rejects early when
`pending.automation_id != automation_id`. Without this, an
attacker (or a buggy caller) could pair automation A's pending-
changes.json with automation B's broader apply policy.

### Medium

#3 Ollama probe detection too broad — was `cfg.base_url.contains
("/api/")` which false-positives on `/api/v1` (OpenRouter,
LiteLLM). Now narrowed to `:11434` port OR explicit `/api/tags`
path.

#4 hardcoded `~/.claudepot` — both `templates_pending_changes`
and `templates_read_report` now use `claudepot_data_dir()` from
claudepot-core, honoring `CLAUDEPOT_DATA_DIR` overrides
correctly.

#5 `supported_platforms` was only enforced in `templates_list`
— `templates_install` now ALSO rejects when
`!bp.supports(HostPlatform::current())`. The contract is
symmetric: a direct IPC bypass of the gallery filter no longer
lets a macOS-only template instantiate on Linux/Windows.

#6 Windows ACL trusted `USERNAME` — `tighten_consent_acl` now
resolves the current user SID via `whoami /user /fo csv /nh`,
grants `*<SID>:F` in icacls (well-known SID notation), grants
SYSTEM via the literal `S-1-5-18` SID, and surfaces every
failure path through `tracing::warn` instead of swallowing
silently.

#7 cron test wrote `CLAUDE_OAUTH_TOKEN` to disk via
`automation.extra_env` (which `install_shim` renders into
`run.sh`). Token now lives in a 0600 sibling file
(`<tmp>/.oauth-token`); a 0700 wrapper script
(`<tmp>/claude-with-token.sh`) reads the file at fire time and
exec's the real claude. The shim sees only the wrapper path,
never the token.

#8 cron test scheduled `now + 1 minute`, which races the case
where `install_shim + scheduler.register` span past the next
minute boundary — launchd would defer to the same time
tomorrow and the test would hang. Now schedules `now + 2
minutes`; deadline extended to 180s.

#9 cron test passed on a `"result":` substring match, which is
too permissive — error rows mention `result` too. Now parses
`stdout.log` as a JSON event array, finds the terminal
`result` event, asserts `is_error == false`. Verified locally:
the cron run produced `is_error=false result_first_80=
CRON_TEMPLATE_OK`.

#10 `CronCleanupGuard` did not restore the previous
`CLAUDEPOT_DATA_DIR` — risk of polluting subsequent tests in
the same process. Now captures the prior value as
`Option<OsString>` at construction and restores it on Drop.
Also explicitly zero-fills + unlinks the token file so a
tempdir-removal race doesn't leave the token on disk.

### Low (deferred)

#11 validator rebuilds globset matcher per call. Apply is
interactive (~5-50 items × 1-3 globs each, user reviews + clicks
per run); not a hot loop. Caching would require an
`ApplyConfig`-level matcher cache and is not worth the
complexity. Documented in the audit-fix verification report.

### Verification

Independent Codex re-audit on a fresh thread:
| Status | Count |
|---|---:|
| FIXED | 10 |
| NOT FIXED | 0 |
| PARTIAL | 0 |
| REGRESSED | 0 |
| DEFERRED | 1 |

Local validation:
- `cargo test --workspace` — all green
- `cargo test -p claudepot-core --test templates_real_llm
   cron_schedule_fires -- --ignored` — passes with new
  is_error=false assertion (110s wall, ~$0.30)

Audit threadId: 019de662-808c-7b62-80b3-e85d373f92b5
Verify threadId: 019de792-3515-7533-aac0-0203a96388be
xiaolai added a commit that referenced this pull request May 8, 2026
Post-deploy fixes: pin search_vec, add /office/policy to sidebar, doc push hazard
xiaolai added a commit that referenced this pull request May 18, 2026
Eight findings collapsed into one refactor cycle plus two small
cleanups. The grill report (in conversation; not on disk) flagged
0 critical / 0 high / 3 medium / 8 low — this addresses every
actionable item.

Core reshape (#1 + #6 + #7):
  - detect_pr returns DetectOutcome { branch, pr } so the
    orchestrator keys the cache directly. Halves the per-tick git
    invocation count (no more standalone current_branch call).
  - PrCache keyed by repo_root only; insert() overwrites
    unconditionally so a branch flip is absorbed by the next tick
    without explicit detection. Drops get_any_for (was O(n)),
    drops the dead Entry.branch field, drops refresh_one's
    early-return guard.

Orchestrator parallelism + gh latching (#2 + #9):
  - tick_all fans out via tokio Semaphore + JoinSet with
    MAX_PARALLEL=4 cap on simultaneous gh invocations. 30-project
    tick now completes in ~8 batches instead of 30 sequential
    calls.
  - First MissingCli("gh") flips an AtomicBool that
    short-circuits every subsequent tick. gh-less users pay one
    detect attempt per process lifetime, then nothing.

Test coverage (#8):
  - New PrDetector trait + RealDetector wrapper so the orchestrator
    is testable against a FakeDetector. Four new tokio tests cover
    cache round-trip, negative caching, gh-absent latch, and the
    bounded-parallel fan-out (12 roots vs MAX_PARALLEL=4).
  - DTO tests: PrInfoDto serializes PrState lowercase;
    ProjectInfoDto omits the pr field when None.
  - PrCache test for explicit overwrite semantics.

UI hygiene (#3, #4, #5, #11):
  - Extract liveDotTitle to src/components/primitives so
    ProjectDetail and ProjectsTable share one implementation.
  - LiveStatusDot.title is now required in the prop type — the
    "mandatory by comment" contract was decay-prone.
  - BrandGithubMark license comment now correctly says the
    silhouette is a GitHub Inc. trademark used under the
    design.md brand-mark exception (was claiming CC-BY 4.0 on
    GitHub's mark, which doesn't apply).
  - cwdMatchesProject now normalizes both inputs to forward slash
    before prefix-checking, so a mixed-separator Windows project
    path matches its live cwd. Three new regression tests cover
    the cases.

Tests: 81 + 613 green. Clippy --all-targets clean. Fmt clean.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant