
feat: deploy status feature (per-service deployment lag)#314

Open
golgoth31 wants to merge 37 commits into main from feat/deploystatus-feature


@golgoth31 (Owner)

Summary

Adds a deployStatus portal feature: for each running service, it shows the commits on the service's default branch that are not yet deployed (the deployment lag), plus a best-effort link to the deploy-gate workflow run. The lag is computed from cluster truth (imageInventory observations) joined to the git source via OCI image labels.

Unlike the existing push-based release log, this reports pull-based current state; and unlike a standalone tag-proxy generator, it derives the deployed version from what is actually running and discovers the service list from the cluster (no hardcoded repo list).

Design + plan: docs/superpowers/specs/2026-06-18-deploystatus-feature-design.md, docs/superpowers/plans/2026-06-18-deploystatus-feature.md.

What's included (end-to-end)

  • CRD DeployStatus (controller-managed). Spec.Services holds the input (workload, image, sourceRepo, deployedRef); Status.Services holds the observed lag (state, aheadBy, pendingCommits, lastCheckedAt) and is written only via Status().Update to avoid a reconcile loop.
  • deployStatus portal feature flag (defaults true, opt-out) + nil-safe accessor; emitted in the portal proto so the UI gates correctly.
  • Operator config DeployStatusConfig/ForgeConfig with a forge endpoint list matched by OCI source-URL host; two auth modes per forge — fine-grained PAT (auth.tokenEnv) or GitHub App (auth.app: appID/installationID/privateKeyEnv). Secret values are read from named env vars via os.Getenv, never stored in config/CR.
  • Forge-agnostic port + GitHub REST client with no third-party dependencies: retry with backoff on 429/5xx, no retry on other 4xx, URL-escaped refs. The GitHub App token source signs an RS256 App JWT with the Go stdlib and mints, caches, and refreshes the short-lived installation token.
  • Controller chain: select-due (paced) → resolve OCI source (host match) → forge compare (per-entry error isolation) → best-effort deploy-run link → project to read store → update status.
  • Projection from imageInventory: reads each observed image's OCI labels (source/revision, semver-tag fallback) and upserts DeployStatus CRs for first-party (source-labeled) images only; prunes stale namespaces; carries forward entries on transient registry errors.
  • Connect/gRPC DeployStatusService + MCP server /mcp/deploystatus, feature-gated.
  • Validating webhook for the controller-managed invariants.
  • Remote federation via a shadow remote-<portal> CR (IsRemote) mirroring imageInventory; remote entries projected into a sentinel read-store bucket.
  • React "Deploy Status" page + sidebar nav, feature-gated, mirroring the image-inventory page.
  • Helm: deployStatus operator config block + secretEnv → secretKeyRef wiring, surviving make helm regeneration.

Review & verification

A review panel (security / execution-trace / silent-failure) ran on the full diff. Findings handled:

  • 🔴 Blocking: the read store was wiped on every "nothing due" reconcile (missing GenerationChangedPredicate + unguarded empty ReplaceForNamespace). Fixed: predicate added; the read store now projects the complete Status.Services set (chain reordered so the status update precedes the store projection).
  • 🟠 Projection never pruned stale entries → fixed (orphan deletion, mirroring sync_registry_crs.go).
  • 🟠 Transient registry error silently dropped a service → fixed (Error-level log + carry-forward of the prior entry).
  • Hardening: URL-escape compare refs; startup warning on empty PAT env.
  • Security: clean. No token/key leakage; correct RS256 JWT; no SSRF (the untrusted OCI source-label host only selects a configured forge, never the request target).

Known limitations (documented in the spec, deferred): remote-fetch staleness and feature-disabled state are not yet surfaced on the read API; minor pacing edge; GitLab subgroup parsing (GitHub-only in v1).

Gates: make lint = 0 · make test = 0 fail (full envtest suite) · go build ./... clean · make helm idempotent · web tsc + Vitest green. New-package coverage: read store 97.6%, forge client 83.9%, chain handlers 93.1%, webhook 82.6%.

🤖 Generated with Claude Code

golgoth31 and others added 30 commits June 22, 2026 14:01
- DeployStatus CRD types (controller-managed, derived from imageInventory)
- deployStatus portal feature flag + IsDeployStatusEnabled accessor + test
- DeployStatusService proto + generated Go/TS bindings
- kubebuilder scaffolding (PROJECT, RBAC, samples, scheme markers)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
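A nil-safe, opt-out accessor of the kind described (feature defaults to true) is usually built on a `*bool`, so "unset" and "explicitly false" stay distinguishable. A minimal sketch; only the name IsDeployStatusEnabled comes from this PR, and the `PortalFeatures` shape shown is a hypothetical simplification.

```go
package main

// PortalFeatures is a trimmed, hypothetical stand-in for the portal feature
// block. A *bool separates "unset" (nil) from an explicit false.
type PortalFeatures struct {
	DeployStatus *bool
}

// IsDeployStatusEnabled is nil-safe: a nil receiver or an unset flag both
// mean "enabled", which makes the feature opt-out.
func (f *PortalFeatures) IsDeployStatusEnabled() bool {
	if f == nil || f.DeployStatus == nil {
		return true
	}
	return *f.DeployStatus
}
```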
Add DeployStatusConfig, ForgeConfig, ForgeAuthConfig, GitHubAppConfig
types to OperatorConfig. Token VALUES are never stored in config —
only the env var name (TokenEnv / PrivateKeyEnv) following the
established SlackEmojiConfig convention.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add forge validation to OperatorConfig.Validate(): enforces non-empty
host, supported kind, and exactly one auth mode (PAT XOR GitHub App
with all required App fields). Add 6 TDD tests covering all reject and
accept paths.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
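The "exactly one auth mode" rule reduces to an XOR check plus completeness of the App fields. A minimal sketch: the struct shapes are trimmed, hypothetical versions of the config types named above, and the `"github"` kind string is an assumption (v1 is GitHub-only).

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical, trimmed config shapes mirroring the names in the commit.
type GitHubAppConfig struct {
	AppID          int64
	InstallationID int64
	PrivateKeyEnv  string // env var NAME holding the PEM key, never the key
}

type ForgeAuthConfig struct {
	TokenEnv string           // PAT mode: env var NAME holding the token
	App      *GitHubAppConfig // GitHub App mode
}

type ForgeConfig struct {
	Host string
	Kind string
	Auth ForgeAuthConfig
}

// validateForge enforces a non-empty host, a supported kind, and exactly one
// auth mode (PAT XOR GitHub App, with every App field set).
func validateForge(f ForgeConfig) error {
	if f.Host == "" {
		return errors.New("forge host must not be empty")
	}
	if f.Kind != "github" {
		return fmt.Errorf("unsupported forge kind %q", f.Kind)
	}
	hasPAT := f.Auth.TokenEnv != ""
	hasApp := f.Auth.App != nil
	if hasPAT == hasApp {
		return errors.New("exactly one of auth.tokenEnv or auth.app must be set")
	}
	if hasApp {
		a := f.Auth.App
		if a.AppID == 0 || a.InstallationID == 0 || a.PrivateKeyEnv == "" {
			return errors.New("auth.app requires appID, installationID and privateKeyEnv")
		}
	}
	return nil
}
```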
- internal/domain/forge/port.go: Client interface, RepoRef, Commit, CompareResult
- internal/domain/forge/parse.go: ParseSourceURL supporting https and scp-style ssh
- internal/domain/forge/parse_test.go: 10 table-driven cases (all pass)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
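Parsing both source-URL forms into a host/owner/repo triple could look like the sketch below. It is not the PR's ParseSourceURL: `RepoRef` is a local stand-in, and rejecting paths deeper than owner/repo is an assumption consistent with the GitHub-only v1 scope (GitLab subgroups are a known limitation).

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// RepoRef is a hypothetical stand-in for forge.RepoRef.
type RepoRef struct {
	Host, Owner, Repo string
}

// parseSourceURL accepts "https://host/owner/repo[.git]" and scp-style ssh
// "[user@]host:owner/repo[.git]". Only two-segment paths are accepted.
func parseSourceURL(raw string) (RepoRef, error) {
	var host, path string
	if strings.Contains(raw, "://") {
		u, err := url.Parse(raw)
		if err != nil {
			return RepoRef{}, err
		}
		if u.Scheme != "https" {
			return RepoRef{}, fmt.Errorf("unsupported scheme %q in %q", u.Scheme, raw)
		}
		host, path = u.Hostname(), u.Path
	} else {
		// scp-style ssh: [user@]host:owner/repo
		hostPart, p, ok := strings.Cut(raw, ":")
		if !ok {
			return RepoRef{}, fmt.Errorf("unrecognized source URL %q", raw)
		}
		if _, h, found := strings.Cut(hostPart, "@"); found {
			hostPart = h
		}
		host, path = hostPart, p
	}
	path = strings.TrimSuffix(strings.Trim(path, "/"), ".git")
	owner, repo, ok := strings.Cut(path, "/")
	if host == "" || !ok || owner == "" || repo == "" || strings.Contains(repo, "/") {
		return RepoRef{}, fmt.Errorf("unsupported source URL %q", raw)
	}
	return RepoRef{Host: strings.ToLower(host), Owner: owner, Repo: repo}, nil
}
```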
- internal/forgeclient/github/token.go: TokenSource interface, PATTokenSource,
  AppTokenSource (RS256 JWT via stdlib only, installation token cache with 1-min
  early-refresh, PKCS1+PKCS8 PEM support)
- internal/forgeclient/github/token_test.go: PAT, mint-and-cache, refresh-expired,
  PKCS8 (all pass)
- internal/forgeclient/github/client.go: Client implementing forge.Client,
  getJSON with retry/backoff on 429/5xx/network, no-retry on 4xx
  (nonRetryableError), DefaultBranch + Compare (merge flag via parent count)
- internal/forgeclient/github/client_test.go: parse, merge flag, 429-retry,
  4xx-no-retry, truncated (all pass)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… injectable http client

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds Entry, WorkloadRef, Commit read-model types and the Reader/Writer
interfaces for the deploystatus bounded context.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements Store satisfying both Reader and Writer interfaces with
per-(portalRef,namespace) scoping, flat aggregation, and subscriber
broadcast (close-on-write pattern). All tests pass under -race.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
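The close-on-write broadcast works by handing every subscriber the same channel and, on each write, closing it (which wakes all waiters at once) and installing a fresh one. A minimal sketch of that pattern with per-(portalRef, namespace) buckets; `Entry`, `Store` and the method set are simplified, hypothetical stand-ins for the read store.

```go
package main

import "sync"

// Entry is a trimmed, hypothetical stand-in for the read-model entry.
type Entry struct {
	Workload string
	State    string
}

type bucketKey struct{ portal, namespace string }

// Store scopes entries per (portalRef, namespace) and notifies subscribers
// with the close-on-write pattern.
type Store struct {
	mu      sync.RWMutex
	buckets map[bucketKey][]Entry
	changed chan struct{}
}

func NewStore() *Store {
	return &Store{buckets: map[bucketKey][]Entry{}, changed: make(chan struct{})}
}

// ReplaceForNamespace swaps one bucket wholesale, then wakes every subscriber
// by closing the current channel and installing a new one.
func (s *Store) ReplaceForNamespace(portal, ns string, entries []Entry) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.buckets[bucketKey{portal, ns}] = append([]Entry(nil), entries...)
	close(s.changed)
	s.changed = make(chan struct{})
}

// Subscribe returns a channel that is closed on the next write; callers
// re-subscribe after each wake-up.
func (s *Store) Subscribe() <-chan struct{} {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.changed
}

// List flattens all namespace buckets of one portal (order not guaranteed).
func (s *Store) List(portal string) []Entry {
	s.mu.RLock()
	defer s.mu.RUnlock()
	var out []Entry
	for k, es := range s.buckets {
		if k.portal == portal {
			out = append(out, es...)
		}
	}
	return out
}
```

Because ReplaceForNamespace replaces a whole bucket, passing an empty slice clears it, which is why an unguarded empty replace could wipe a namespace.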
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Adds handlers.go (ChainData, WorkItem, ComputedEntry) and compute.go
(ComputeLag: merge-commit filter + 50-commit cap; StateFor: ok/behind), with tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
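One reading of "merge-filter + 50-cap" is: drop merge commits, count the rest, keep at most 50 for display, and flag truncation (matching the PendingTruncated field tested later). A sketch under that assumption; whether the PR's aheadBy counts merges, or uses the forge's own ahead count, is not stated, and `computeLag`/`stateFor` here are hypothetical.

```go
package main

// Commit is a hypothetical stand-in for forge.Commit; Merge would be derived
// from the parent count (more than one parent), as the client commit says.
type Commit struct {
	SHA   string
	Merge bool
}

const maxPending = 50

// computeLag skips merge commits, caps the pending list at maxPending and
// reports truncation. aheadBy counts every non-merge commit (assumption).
func computeLag(commits []Commit) (pending []Commit, aheadBy int, truncated bool) {
	for _, c := range commits {
		if c.Merge {
			continue
		}
		aheadBy++
		if len(pending) < maxPending {
			pending = append(pending, c)
		} else {
			truncated = true
		}
	}
	return pending, aheadBy, truncated
}

// stateFor maps lag to the two states this step can produce.
func stateFor(aheadBy int) string {
	if aheadBy == 0 {
		return "ok"
	}
	return "behind"
}
```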
ResolveOCISourceHandler parses each WorkItem's OCI source label,
matches the host against configured forges, and marks unmatched /
unparseable items as unresolved ComputedEntries (removed from Due).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
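The host match is also what backs the "no SSRF" finding: the untrusted label only selects an entry from operator config, and the request target always comes from that entry. A sketch with a hypothetical `Forge` shape; the case-insensitive comparison is an assumption (DNS names are case-insensitive).

```go
package main

import "strings"

// Forge is a hypothetical, trimmed configured forge entry.
type Forge struct {
	Host    string
	BaseURL string // API endpoint from operator config, never from the label
}

// matchForge returns the configured forge whose host equals the host parsed
// from the OCI source label. No match means the item becomes "unresolved".
func matchForge(forges []Forge, labelHost string) (Forge, bool) {
	for _, f := range forges {
		if strings.EqualFold(f.Host, labelHost) {
			return f, true
		}
	}
	return Forge{}, false
}
```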
ForgeCompareHandler calls DefaultBranch then Compare per Due item using
the real RepoRef-based forge.Client API. Per-entry errors set state=error
and never fail the chain. ResolveDeployRunHandler enriches ok/behind
entries with a best-effort LatestWorkflowRun URL, swallowing errors.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
SelectDueHandler builds WorkItems from Spec.Services entries last checked more than
refreshInterval ago (default 5m). UpdateReadStoreHandler maps ComputedEntries
into domain entries (forge.SHA->dom.Sha) and calls ReplaceForNamespace.
UpdateStatusHandler stamps LastCheckedAt/state/aheadBy onto each processed
Spec.Services entry and updates ServiceCount/ObservedGeneration in Status.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…oncile loop)

Per K8s convention, computed lag/LastCheckedAt now live in Status.Services and are
written only via Status().Update; Spec.Services holds controller-managed input only.
SelectDue reads LastCheckedAt from Status.Services. Prevents generation bumps that
would re-trigger the controller.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add a ProjectDeployStatusHandler to the imageInventory chain that upserts one
per-(portal, namespace) DeployStatus CR, populating Spec.Services from observed
workload-images that carry an org.opencontainers.image.source label. DeployedRef
comes from the org.opencontainers.image.revision label, falling back to the image
tag when it is a semver tag. Gated on the portal deployStatus feature; wired with
the existing CraneClient as the OCI label reader.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
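The label precedence above (source label gates first-party, revision wins, semver tag is the fallback) can be sketched as follows. The two label keys are the standard OCI annotation names; the semver regex and the tag-splitting rule are simplified assumptions, and `deployedRef` is hypothetical.

```go
package main

import (
	"regexp"
	"strings"
)

const (
	labelSource   = "org.opencontainers.image.source"
	labelRevision = "org.opencontainers.image.revision"
)

// semverTag is a simplified matcher (optional "v" prefix, optional
// pre-release/build suffix); an assumption, not the PR's exact rule.
var semverTag = regexp.MustCompile(`^v?\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$`)

// deployedRef reports the git ref an image was built from and whether the
// image is first-party (carries a source label at all).
func deployedRef(labels map[string]string, image string) (ref string, firstParty bool) {
	if labels[labelSource] == "" {
		return "", false
	}
	if rev := labels[labelRevision]; rev != "" {
		return rev, true
	}
	// Tag fallback: the text after the last ":" is a tag only if it holds no
	// "/" (otherwise that colon belonged to a registry port).
	if i := strings.LastIndex(image, ":"); i >= 0 && !strings.Contains(image[i:], "/") {
		if tag := image[i+1:]; semverTag.MatchString(tag) {
			return tag, true
		}
	}
	return "", true
}
```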
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Add DeployStatus bool to domain PortalFeatures and PortalFeaturesStatus API type
- Add CheckDeployStatus FeatureChecker to feature_gate.go
- Implement DeployStatusService (mirrors ImageService): feature gate, List with Search/StateFilter, full entry+commit mapping with timestamppb
- Register handler on the Connect mux in webserver.Config (guarded by nil reader)
- Wire deployStatusStore reader into webCfg in cmd/main.go
- Propagate DeployStatus feature flag through portal controller (local + remote AND logic)
- 5 tests covering full mapping, search filter, state filter, feature-disabled path, and portal-default-to-main

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Exposes the deploy-status read store via a Streamable HTTP MCP server
mirroring the existing image_server.go pattern: same hooks, same
withToolMetrics wrapper, same Handler() shape.

Tool: list_deploy_status
- params: portal (default "main"), state filter (ok|behind|unresolved|error)
- returns JSON array of deploy-status entries with workload ref, image,
  source repo, deployed ref, ahead-by count, pending commits, and state

Mounted at /mcp/deploystatus in cmd/main.go wired to the shared
deployStatusStore reader (same instance used by gRPC).

Tests: 8 Ginkgo specs in mcp_test.go cover creation, list all,
default portal, state filtering (behind/ok), invalid state error,
pending commits in JSON output, and empty store.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implement DeployStatusCustomValidator enforcing controller-managed
invariants: spec.portalRef non-empty, spec.namespace non-empty,
every spec.services[i].key non-empty, and spec.services[i].state
(when set) must be one of ok|behind|unresolved|error.

Add plain stdlib tests mirroring imageregistry_webhook_test.go style.
Regenerate config/webhook/manifests.yaml via make manifests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
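Stripped of the controller-runtime plumbing, the invariants listed above reduce to a pure function that a CustomValidator can call. A sketch with hypothetical trimmed spec types; it collects every violation with `errors.Join` (Go 1.20+) rather than stopping at the first, which is a design choice, not necessarily the PR's.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical trimmed spec shapes matching the fields named in the commit.
type ServiceSpec struct {
	Key   string
	State string
}

type DeployStatusSpec struct {
	PortalRef string
	Namespace string
	Services  []ServiceSpec
}

var validStates = map[string]bool{"ok": true, "behind": true, "unresolved": true, "error": true}

// validateSpec returns all invariant violations joined, or nil.
func validateSpec(s DeployStatusSpec) error {
	var errs []error
	if s.PortalRef == "" {
		errs = append(errs, errors.New("spec.portalRef must not be empty"))
	}
	if s.Namespace == "" {
		errs = append(errs, errors.New("spec.namespace must not be empty"))
	}
	for i, svc := range s.Services {
		if svc.Key == "" {
			errs = append(errs, fmt.Errorf("spec.services[%d].key must not be empty", i))
		}
		if svc.State != "" && !validStates[svc.State] {
			errs = append(errs, fmt.Errorf("spec.services[%d].state %q is not one of ok|behind|unresolved|error", i, svc.State))
		}
	}
	return errors.Join(errs...)
}
```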
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Phase 9 wires two data-federation pieces for deploy-status, mirroring the
imageInventory remote pattern:

- SyncRemoteDeployStatusHandler creates/updates a shadow DeployStatus CR
  (remote-<portal>, IsRemote=true) per remote portal, gated on the
  deployStatus feature; registered in the portal controller chain next to
  the image-inventory sync handler. CleanupDisabledFeaturesHandler deletes
  the shadow CR when the feature is disabled or the portal is no longer remote.
- The DeployStatus controller IsRemote branch now fetches entries from the
  remote portal's DeployStatusService via the shared remoteclient cache
  (TLS from spec.remote.tls), maps proto entries to domain entries, and
  projects them into the local readstore under a single sentinel-namespace
  bucket. Remote fetch errors are best-effort: surfaced on status.lastError
  and requeued, never crashing the reconcile.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Add deploy_status field 7 to PortalFeatures proto and regenerate Go + TS bindings
- Create features/deploystatus feature with domain types, infrastructure API client,
  TanStack Query hook, badge utilities, card and list UI components
- Add DeployStatusPage page component mirroring ImagesPage structure
- Register :portalName/deploystatus route in router.tsx (lazy-loaded)
- Add Deploy Status nav item to PortalSidebar gated on features.deployStatus === true
- Extend MSW test helpers (connectJson, handlers) with deploy status fixtures
- Add Vitest hook test covering behind (with pending commits) and ok states

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Wire the deploy-status feature into the generated Helm chart:

- Fix two malformed kustomization entries (config/crd, config/rbac) left by
  the Phase 0 scaffold that broke `make helm` (kustomize parse errors). This
  unblocks regeneration of the deploy-status CRD, RBAC, validating webhook
  and the portal deployStatus feature flag into helm/templates.
- Operator config: add a documented `deployStatus:` block (PAT and GitHub App
  variants) to config/manager/configmap.yaml so helmify renders it into
  config.configYaml (survives `make helm`).
- Secret env injection: add a `deployStatus.secretEnv` values block and a
  hack/helmify patch that renders each `envName -> {name,key}` entry as a
  `valueFrom.secretKeyRef` on the controller-manager container, so forge
  tokens (auth.tokenEnv / app.privateKeyEnv) can be sourced from a Secret.

`make helm` is idempotent; `helm template` renders the ConfigMap deployStatus
config and the secretKeyRef env on the manager container.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Fix all 22 remaining lint issues so make lint exits 0 with 0 issues:
- gofmt: reformat 3 files with misaligned whitespace
- modernize: use for-range-N and strings.Cut patterns
- lll: wrap 133-char line in cmd/main.go
- goconst: extract repeated string literals into named constants
  across config, deploystatus chain, portal chain, source, forge,
  forgeclient/github, grpc, and mcp packages

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
golgoth31 and others added 7 commits June 22, 2026 14:02
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…andler, ResolveDeployRunHandler

Cover the three untested chain handlers with focused fake-client unit tests:

- update_status_test.go (4 tests): verifies Status.Services is written with
  correct state/aheadBy/lastCheckedAt; Spec.Services is never mutated; prior
  status is preserved for services not computed this cycle; no-op on empty
  Computed; only client.Status().Update is called (re-fetched from fake store
  to assert persisted state vs in-memory mutation).

- update_readstore_test.go (3 tests): verifies ComputedEntry → dom.Entry field
  mapping including forge.Commit.SHA → dom.Commit.Sha; PendingTruncated;
  ReplaceForNamespace is called with correct (portalRef, namespace) args; nil
  commits produce nil PendingCommits.

- resolve_deploy_run_test.go (4 tests): verifies ok/behind entries are enriched
  with DeployRunURL; error/unresolved entries skip the forge client call
  entirely; LatestWorkflowRun errors are swallowed (handle returns nil);
  mixed-state scenario exercises all four state values.

Item #4 (IsRemote happy-path): remoteclient.Client is a concrete struct with
no injectable interface, making fake injection impossible without a live HTTP
server or production-code seam. The mapRemoteEntry proto→domain mapping is
already fully covered in remote_map_test.go (2 tests), satisfying the fallback.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- prevent readstore wipe on no-op reconciles: add GenerationChangedPredicate
  to the DeployStatus watch and project the readstore from the full,
  just-updated Status.Services (UpdateStatus now runs before UpdateReadStore)
- prune stale DeployStatus CRs whose namespace no longer carries a first-party
  image (mirrors sync_registry_crs orphan deletion)
- carry forward existing entries and log at Error level when an OCI label read
  fails transiently, instead of silently dropping the workload
- url-escape owner/repo/base/head/workflow/branch path segments in the GitHub
  forge client to prevent path injection
- warn on empty PAT env at startup and log deploy-run resolution failures at V(1)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
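Escaping each path segment is what stops a ref like `feature/x` or `../../admin` from changing the request path. A sketch against GitHub's compare endpoint (`/repos/{owner}/{repo}/compare/{base}...{head}`); `compareURL` is a hypothetical helper, and `url.PathEscape` encodes `/` because it targets a single segment.

```go
package main

import (
	"fmt"
	"net/url"
)

// compareURL builds a compare endpoint with every path segment escaped, so
// untrusted refs cannot inject extra path components.
func compareURL(apiBase, owner, repo, base, head string) string {
	return fmt.Sprintf("%s/repos/%s/%s/compare/%s...%s",
		apiBase,
		url.PathEscape(owner), url.PathEscape(repo),
		url.PathEscape(base), url.PathEscape(head))
}
```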
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>