feat: deploy status feature (per-service deployment lag) #314
Open
golgoth31 wants to merge 37 commits into
- DeployStatus CRD types (controller-managed, derived from imageInventory)
- deployStatus portal feature flag + IsDeployStatusEnabled accessor + test
- DeployStatusService proto + generated Go/TS bindings
- kubebuilder scaffolding (PROJECT, RBAC, samples, scheme markers)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
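A minimal sketch of a nil-safe, default-true accessor, assuming the flag is modeled as a `*bool` on a hypothetical `PortalFeatures` struct (the real field type in this PR may differ):

```go
package main

import "fmt"

// PortalFeatures is a hypothetical stand-in for the portal feature block.
// A nil DeployStatus pointer means "not set", which is treated as enabled
// (opt-out semantics).
type PortalFeatures struct {
	DeployStatus *bool
}

// IsDeployStatusEnabled is safe to call on a nil receiver and returns true
// unless the flag is explicitly set to false.
func (f *PortalFeatures) IsDeployStatusEnabled() bool {
	if f == nil || f.DeployStatus == nil {
		return true
	}
	return *f.DeployStatus
}

func main() {
	var unset *PortalFeatures
	off := false
	fmt.Println(unset.IsDeployStatusEnabled())
	fmt.Println((&PortalFeatures{}).IsDeployStatusEnabled())
	fmt.Println((&PortalFeatures{DeployStatus: &off}).IsDeployStatusEnabled())
}
```

Treating `nil` as enabled is what makes the flag opt-out: a portal that never mentions `deployStatus` still gets the feature.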
Add DeployStatusConfig, ForgeConfig, ForgeAuthConfig, and GitHubAppConfig types to OperatorConfig. Token VALUES are never stored in config; only the env var name (TokenEnv / PrivateKeyEnv) is stored, following the established SlackEmojiConfig convention.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add forge validation to OperatorConfig.Validate(): it enforces a non-empty host, a supported kind, and exactly one auth mode (PAT XOR GitHub App, with all required App fields). Add 6 TDD tests covering all reject and accept paths.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
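The auth-mode rule can be expressed as a boolean XOR. A minimal sketch with hypothetical, trimmed-down types (field names borrowed from the commit; `"github"` as the only supported kind is an assumption based on the v1 GitHub-only limitation):

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical, trimmed-down config types mirroring the names in the commit.
type GitHubAppConfig struct {
	AppID          int64
	InstallationID int64
	PrivateKeyEnv  string
}

type ForgeAuthConfig struct {
	TokenEnv string           // PAT mode
	App      *GitHubAppConfig // GitHub App mode
}

type ForgeConfig struct {
	Host string
	Kind string
	Auth ForgeAuthConfig
}

// validateForge enforces: non-empty host, a supported kind, and exactly one
// auth mode (PAT XOR GitHub App, with every App field present).
func validateForge(f ForgeConfig) error {
	if f.Host == "" {
		return errors.New("forge host must not be empty")
	}
	if f.Kind != "github" { // assumption: GitHub is the only kind in v1
		return fmt.Errorf("unsupported forge kind %q", f.Kind)
	}
	hasPAT := f.Auth.TokenEnv != ""
	hasApp := f.Auth.App != nil
	if hasPAT == hasApp { // both set or neither set
		return errors.New("exactly one of auth.tokenEnv or auth.app must be set")
	}
	if hasApp {
		a := f.Auth.App
		if a.AppID == 0 || a.InstallationID == 0 || a.PrivateKeyEnv == "" {
			return errors.New("auth.app requires appID, installationID and privateKeyEnv")
		}
	}
	return nil
}

func main() {
	fmt.Println(validateForge(ForgeConfig{Host: "github.com", Kind: "github",
		Auth: ForgeAuthConfig{TokenEnv: "GITHUB_PAT"}}))
}
```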
- internal/domain/forge/port.go: Client interface, RepoRef, Commit, CompareResult
- internal/domain/forge/parse.go: ParseSourceURL supporting https and scp-style ssh
- internal/domain/forge/parse_test.go: 10 table-driven cases (all pass)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
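A hedged sketch of parsing both source-URL forms. `RepoRef` and `parseSourceURL` here are illustrative stand-ins, not the package's actual `ParseSourceURL` code, and this version accepts exactly two path segments (owner/repo):

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// RepoRef is a hypothetical stand-in for forge.RepoRef.
type RepoRef struct {
	Host, Owner, Repo string
}

// parseSourceURL accepts https URLs (https://host/owner/repo[.git]) and
// scp-style ssh (git@host:owner/repo[.git]).
func parseSourceURL(raw string) (RepoRef, error) {
	var host, path string
	if strings.HasPrefix(raw, "https://") || strings.HasPrefix(raw, "http://") {
		u, err := url.Parse(raw)
		if err != nil {
			return RepoRef{}, err
		}
		host, path = u.Hostname(), u.Path
	} else if at := strings.Index(raw, "@"); at >= 0 && strings.Contains(raw[at:], ":") {
		// scp-style: user@host:owner/repo.git
		host, path, _ = strings.Cut(raw[at+1:], ":")
	} else {
		return RepoRef{}, fmt.Errorf("unsupported source URL %q", raw)
	}
	path = strings.TrimSuffix(strings.Trim(path, "/"), ".git")
	parts := strings.Split(path, "/")
	if host == "" || len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return RepoRef{}, fmt.Errorf("cannot parse owner/repo from %q", raw)
	}
	return RepoRef{Host: strings.ToLower(host), Owner: parts[0], Repo: parts[1]}, nil
}

func main() {
	for _, s := range []string{
		"https://github.com/acme/payments.git",
		"git@github.com:acme/payments.git",
	} {
		ref, err := parseSourceURL(s)
		fmt.Println(ref, err)
	}
}
```

Lower-casing the host makes the later match against configured forge hosts case-insensitive.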
- internal/forgeclient/github/token.go: TokenSource interface, PATTokenSource, AppTokenSource (RS256 JWT via stdlib only, installation token cache with 1-min early-refresh, PKCS1+PKCS8 PEM support)
- internal/forgeclient/github/token_test.go: PAT, mint-and-cache, refresh-expired, PKCS8 (all pass)
- internal/forgeclient/github/client.go: Client implementing forge.Client, getJSON with retry/backoff on 429/5xx/network, no-retry on 4xx (nonRetryableError), DefaultBranch + Compare (merge flag via parent count)
- internal/forgeclient/github/client_test.go: parse, merge flag, 429-retry, 4xx-no-retry, truncated (all pass)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… injectable http client
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds Entry, WorkloadRef, Commit read-model types and the Reader/Writer interfaces for the deploystatus bounded context.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements Store satisfying both Reader and Writer interfaces with per-(portalRef, namespace) scoping, flat aggregation, and subscriber broadcast (close-on-write pattern). All tests pass under -race.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
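The close-on-write broadcast relies on a Go idiom: closing a channel wakes every receiver at once. A minimal sketch, with entries simplified to string slices and a hypothetical `"portalRef/namespace"` map key:

```go
package main

import (
	"fmt"
	"sync"
)

// store hands every subscriber the current "changed" channel; a write closes
// it (waking all waiters) and installs a fresh channel for the next round.
type store struct {
	mu      sync.RWMutex
	entries map[string][]string // key: "portalRef/namespace" (hypothetical)
	changed chan struct{}
}

func newStore() *store {
	return &store{entries: map[string][]string{}, changed: make(chan struct{})}
}

// Subscribe returns a channel that is closed on the next write.
func (s *store) Subscribe() <-chan struct{} {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.changed
}

// ReplaceForNamespace swaps the entries for one (portalRef, namespace) scope
// and broadcasts the change.
func (s *store) ReplaceForNamespace(portalRef, namespace string, entries []string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.entries[portalRef+"/"+namespace] = entries
	close(s.changed)
	s.changed = make(chan struct{})
}

func main() {
	s := newStore()
	ch := s.Subscribe()
	s.ReplaceForNamespace("main", "payments", []string{"api"})
	<-ch // does not block: the write closed the channel
	fmt.Println("subscriber woken")
}
```

Swapping the channel under the same lock as the write means a subscriber never misses a change: it either holds the old channel (now closed) or the new one.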
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Adds handlers.go (ChainData, WorkItem, ComputedEntry) and compute.go (ComputeLag merge-filter + 50-cap, StateFor ok/behind), with tests.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ResolveOCISourceHandler parses each WorkItem's OCI source label, matches the host against configured forges, and marks unmatched / unparseable items as unresolved ComputedEntries (removed from Due).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ForgeCompareHandler calls DefaultBranch then Compare per Due item using the real RepoRef-based forge.Client API. Per-entry errors set state=error and never fail the chain. ResolveDeployRunHandler enriches ok/behind entries with a best-effort LatestWorkflowRun URL, swallowing errors.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
SelectDueHandler builds WorkItems from Spec.Services entries whose last check is older than refreshInterval (default 5m). UpdateReadStoreHandler maps ComputedEntries into domain entries (forge.SHA->dom.Sha) and calls ReplaceForNamespace. UpdateStatusHandler stamps LastCheckedAt/state/aheadBy onto each processed Spec.Services entry and updates ServiceCount/ObservedGeneration in Status.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…oncile loop)
Per K8s convention, computed lag/LastCheckedAt now live in Status.Services and are written only via Status().Update; Spec.Services holds controller-managed input only. SelectDue reads LastCheckedAt from Status.Services. Prevents generation bumps that would re-trigger the controller.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add a ProjectDeployStatusHandler to the imageInventory chain that upserts one DeployStatus CR per (portal, namespace), populating Spec.Services from observed workload images that carry an org.opencontainers.image.source label. DeployedRef comes from the org.opencontainers.image.revision label, falling back to the image tag when it is a semver tag. Gated on the portal deployStatus feature; wired with the existing CraneClient as the OCI label reader.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
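A sketch of the `DeployedRef` selection, assuming images without a source label are skipped and that a non-semver tag with no revision label yields no ref. The regex is a simplified semver matcher, not a full SemVer 2.0.0 validator, and `deployedRef` is a hypothetical helper:

```go
package main

import (
	"fmt"
	"regexp"
)

const (
	labelSource   = "org.opencontainers.image.source"
	labelRevision = "org.opencontainers.image.revision"
)

// semverTag: optional leading "v", MAJOR.MINOR.PATCH, optional
// pre-release/build suffix (simplified).
var semverTag = regexp.MustCompile(`^v?\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$`)

// deployedRef prefers the revision label and falls back to the image tag only
// when the tag looks like semver. ok is false when the image has no source
// label (not first-party) or no usable ref can be derived.
func deployedRef(labels map[string]string, tag string) (ref string, ok bool) {
	if labels[labelSource] == "" {
		return "", false
	}
	if rev := labels[labelRevision]; rev != "" {
		return rev, true
	}
	if semverTag.MatchString(tag) {
		return tag, true
	}
	return "", false
}

func main() {
	src := map[string]string{labelSource: "https://github.com/acme/api"}
	withRev := map[string]string{labelSource: "https://github.com/acme/api", labelRevision: "3f2c1ab"}
	fmt.Println(deployedRef(withRev, "latest"))
	fmt.Println(deployedRef(src, "v1.4.2"))
	fmt.Println(deployedRef(src, "latest"))
}
```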
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Add DeployStatus bool to domain PortalFeatures and PortalFeaturesStatus API type
- Add CheckDeployStatus FeatureChecker to feature_gate.go
- Implement DeployStatusService (mirrors ImageService): feature gate, List with Search/StateFilter, full entry+commit mapping with timestamppb
- Register handler on the Connect mux in webserver.Config (guarded by nil reader)
- Wire deployStatusStore reader into webCfg in cmd/main.go
- Propagate DeployStatus feature flag through portal controller (local + remote AND logic)
- 5 tests covering full mapping, search filter, state filter, feature-disabled path, and portal-default-to-main
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Exposes the deploy-status read store via a Streamable HTTP MCP server mirroring the existing image_server.go pattern: same hooks, same withToolMetrics wrapper, same Handler() shape.
Tool: list_deploy_status
- params: portal (default "main"), state filter (ok|behind|unresolved|error)
- returns JSON array of deploy-status entries with workload ref, image, source repo, deployed ref, ahead-by count, pending commits, and state
Mounted at /mcp/deploystatus in cmd/main.go, wired to the shared deployStatusStore reader (same instance used by gRPC).
Tests: 8 Ginkgo specs in mcp_test.go cover creation, list all, default portal, state filtering (behind/ok), invalid state error, pending commits in JSON output, and empty store.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
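A hedged sketch of the tool's parameter handling (default portal `"main"`, optional state filter, rejection of unknown states). `Entry` and `listDeployStatus` are hypothetical simplifications of the read model and the tool handler:

```go
package main

import "fmt"

// Entry is a hypothetical, trimmed-down read-model entry.
type Entry struct {
	Portal, Workload, State string
	AheadBy                 int
}

var validStates = map[string]bool{"ok": true, "behind": true, "unresolved": true, "error": true}

// listDeployStatus applies the tool's parameters: portal defaults to "main",
// an empty state means "all", and any other value must be a known state.
func listDeployStatus(all []Entry, portal, state string) ([]Entry, error) {
	if portal == "" {
		portal = "main"
	}
	if state != "" && !validStates[state] {
		return nil, fmt.Errorf("invalid state %q: want ok|behind|unresolved|error", state)
	}
	out := []Entry{} // non-nil, so JSON encodes [] rather than null
	for _, e := range all {
		if e.Portal == portal && (state == "" || e.State == state) {
			out = append(out, e)
		}
	}
	return out, nil
}

func main() {
	data := []Entry{
		{Portal: "main", Workload: "api", State: "behind", AheadBy: 2},
		{Portal: "main", Workload: "web", State: "ok"},
	}
	behind, err := listDeployStatus(data, "", "behind")
	fmt.Println(behind, err)
}
```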
Implement DeployStatusCustomValidator enforcing controller-managed invariants: spec.portalRef non-empty, spec.namespace non-empty, every spec.services[i].key non-empty, and spec.services[i].state (when set) must be one of ok|behind|unresolved|error. Add plain stdlib tests mirroring imageregistry_webhook_test.go style. Regenerate config/webhook/manifests.yaml via make manifests.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Phase 9 wires two data-federation pieces for deploy-status, mirroring the imageInventory remote pattern:
- SyncRemoteDeployStatusHandler creates/updates a shadow DeployStatus CR (remote-<portal>, IsRemote=true) per remote portal, gated on the deployStatus feature; registered in the portal controller chain next to the image-inventory sync handler. CleanupDisabledFeaturesHandler deletes the shadow CR when the feature is disabled or the portal is no longer remote.
- The DeployStatus controller IsRemote branch now fetches entries from the remote portal's DeployStatusService via the shared remoteclient cache (TLS from spec.remote.tls), maps proto entries to domain entries, and projects them into the local readstore under a single sentinel-namespace bucket. Remote fetch errors are best-effort: surfaced on status.lastError and requeued, never crashing the reconcile.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Add deploy_status field 7 to PortalFeatures proto and regenerate Go + TS bindings
- Create features/deploystatus feature with domain types, infrastructure API client, TanStack Query hook, badge utilities, card and list UI components
- Add DeployStatusPage page component mirroring ImagesPage structure
- Register :portalName/deploystatus route in router.tsx (lazy-loaded)
- Add Deploy Status nav item to PortalSidebar gated on features.deployStatus === true
- Extend MSW test helpers (connectJson, handlers) with deploy status fixtures
- Add Vitest hook test covering behind (with pending commits) and ok states
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Wire the deploy-status feature into the generated Helm chart:
- Fix two malformed kustomization entries (config/crd, config/rbac) left by
the Phase 0 scaffold that broke `make helm` (kustomize parse errors). This
unblocks regeneration of the deploy-status CRD, RBAC, validating webhook
and the portal deployStatus feature flag into helm/templates.
- Operator config: add a documented `deployStatus:` block (PAT and GitHub App
variants) to config/manager/configmap.yaml so helmify renders it into
config.configYaml (survives `make helm`).
- Secret env injection: add a `deployStatus.secretEnv` values block and a
hack/helmify patch that renders each `envName -> {name,key}` entry as a
`valueFrom.secretKeyRef` on the controller-manager container, so forge
tokens (auth.tokenEnv / app.privateKeyEnv) can be sourced from a Secret.
`make helm` is idempotent; `helm template` renders the ConfigMap deployStatus
config and the secretKeyRef env on the manager container.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Fix all 22 remaining lint issues so make lint exits 0 with 0 issues:
- gofmt: reformat 3 files with misaligned whitespace
- modernize: use for-range-N and strings.Cut patterns
- lll: wrap 133-char line in cmd/main.go
- goconst: extract repeated string literals into named constants across config, deploystatus chain, portal chain, source, forge, forgeclient/github, grpc, and mcp packages
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…andler, ResolveDeployRunHandler
Cover the three untested chain handlers with focused fake-client unit tests:
- update_status_test.go (4 tests): verifies Status.Services is written with correct state/aheadBy/lastCheckedAt; Spec.Services is never mutated; prior status is preserved for services not computed this cycle; no-op on empty Computed; only client.Status().Update is called (re-fetched from fake store to assert persisted state vs in-memory mutation).
- update_readstore_test.go (3 tests): verifies ComputedEntry → dom.Entry field mapping including forge.Commit.SHA → dom.Commit.Sha; PendingTruncated; ReplaceForNamespace is called with correct (portalRef, namespace) args; nil commits produce nil PendingCommits.
- resolve_deploy_run_test.go (4 tests): verifies ok/behind entries are enriched with DeployRunURL; error/unresolved entries skip the forge client call entirely; LatestWorkflowRun errors are swallowed (handle returns nil); mixed-state scenario exercises all four state values.
Item #4 (IsRemote happy-path): remoteclient.Client is a concrete struct with no injectable interface, making fake injection impossible without a live HTTP server or production-code seam. The mapRemoteEntry proto→domain mapping is already fully covered in remote_map_test.go (2 tests), satisfying the fallback.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- prevent readstore wipe on no-op reconciles: add GenerationChangedPredicate to the DeployStatus watch and project the readstore from the full, just-updated Status.Services (UpdateStatus now runs before UpdateReadStore)
- prune stale DeployStatus CRs whose namespace no longer carries a first-party image (mirrors sync_registry_crs orphan deletion)
- carry forward existing entries and log at Error level when an OCI label read fails transiently, instead of silently dropping the workload
- url-escape owner/repo/base/head/workflow/branch path segments in the GitHub forge client to prevent path injection
- warn on empty PAT env at startup and log deploy-run resolution failures at V(1)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
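The path-injection fix can be illustrated with the standard library's `url.PathEscape`, which escapes `/` and `?` inside a path segment. `compareURL` is a hypothetical helper; the path shape follows GitHub's REST "compare two commits" endpoint (`/repos/{owner}/{repo}/compare/{base}...{head}`):

```go
package main

import (
	"fmt"
	"net/url"
)

// compareURL escapes every caller-supplied segment so a "/" or "?" in a
// branch or repo name cannot change the request path. apiBase is assumed to
// have no trailing slash.
func compareURL(apiBase, owner, repo, base, head string) string {
	return fmt.Sprintf("%s/repos/%s/%s/compare/%s...%s",
		apiBase,
		url.PathEscape(owner), url.PathEscape(repo),
		url.PathEscape(base), url.PathEscape(head))
}

func main() {
	fmt.Println(compareURL("https://api.github.com", "acme", "api", "main", "feature/x"))
	fmt.Println(compareURL("https://api.github.com", "acme", "../../orgs", "main", "x"))
}
```

Without escaping, a repo value such as `../../orgs` would add path segments to the request; escaped, it stays a single (invalid) segment that the API simply rejects.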
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Summary
Adds a `deployStatus` portal feature: for each running service, it shows the commits on the service's default branch that are not yet deployed (deployment lag), plus a best-effort link to the deploy-gate workflow run. Lag is computed from cluster truth (imageInventory observations) joined to the git source via OCI image labels.
Unlike the existing push-based `releaselog`, this is pull-based current state, and unlike a standalone tag-proxy generator it derives the deployed version from what is actually running and discovers the service list from the cluster (no hardcoded repo list).
Design + plan: `docs/superpowers/specs/2026-06-18-deploystatus-feature-design.md`, `docs/superpowers/plans/2026-06-18-deploystatus-feature.md`.
What's included (end-to-end)
- `DeployStatus` (controller-managed). `Spec.Services` = input (workload, image, sourceRepo, deployedRef); `Status.Services` = observed lag (state, aheadBy, pendingCommits, lastCheckedAt), written only via `Status().Update` to avoid a reconcile loop.
- `deployStatus` portal feature flag (defaults true, opt-out) + nil-safe accessor; emitted in the portal proto so the UI gates correctly.
- `DeployStatusConfig`/`ForgeConfig` with a forge endpoint list matched by OCI source-URL host; two auth modes per forge: fine-grained PAT (`auth.tokenEnv`) or GitHub App (`auth.app`: appID/installationID/privateKeyEnv). Secret values are read from named env vars via `os.Getenv`, never stored in config/CR.
- … `source`/`revision`, semver-tag fallback) and upserts `DeployStatus` CRs for first-party (source-labeled) images only; prunes stale namespaces; carries forward entries on transient registry errors.
- `DeployStatusService` + MCP server `/mcp/deploystatus`, feature-gated.
- `remote-<portal>` CR (`IsRemote`) mirroring imageInventory; remote entries projected into a sentinel read-store bucket.
- `deployStatus` operator config block + `secretEnv` → `secretKeyRef` wiring, surviving `make helm` regeneration.
Review & verification
A review panel (security / execution-trace / silent-failure) ran on the full diff. Findings handled:
- … `GenerationChangedPredicate` + unguarded empty `ReplaceForNamespace`). Fixed: predicate added; read store now projects the complete `Status.Services` set (chain reordered so status update precedes the store projection).
- … `sync_registry_crs.go`).
Known limitations (documented in the spec, deferred): remote-fetch staleness and feature-disabled state are not yet surfaced on the read API; minor pacing edge; GitLab subgroup parsing (GitHub-only in v1).
Gates:
`make lint` = 0 · `make test` = 0 fail (full envtest suite) · `go build ./...` clean · `make helm` idempotent · web `tsc` + Vitest green. New-package coverage: read store 97.6%, forge client 83.9%, chain handlers 93.1%, webhook 82.6%.
🤖 Generated with Claude Code