# Epic 4 Retrospective: Auth Engine Integration Tests

**Date:** 2026-04-23
**Facilitator:** Bob (Scrum Master)
**Participants:** Raffa (Project Lead), Alice (Product Owner), Charlie (Senior Dev), Dana (QA Engineer), Amelia (Developer Agent)

---

## Epic Summary

| Metric | Value |
|--------|-------|
| Epic | 4: Auth Engine Integration Tests |
| Stories | 3 of 3 completed (100%) |
| Duration | ~1 day (April 23, 2026) |
| Scope | Integration tests for 6 auth engine types across 3 auth methods |
| Debug failures | 7 total (2 + 2 + 3), all environmental or external-service quirks; zero operator code bugs |
| Code review findings | 4 patches across 3 stories (0 + 3 + 1) |
| New infrastructure | OpenLDAP wired into `make integration` (`deploy-ldap`), Keycloak 26.2 deployed to Kind (`deploy-keycloak`) |
| Production code fixes | 1 (`ldapauthengineconfig_types.go`: `omitempty` on TLS version fields, later identified as an incorrect workaround) |
| New test patterns | Auth engine dependency chain (mount → config → role/group), non-deletable config persistence verification |
| Technical debt documented | 1 new (Story 4.2 `omitempty` workaround violates CRD rules; tracked in Epic 7.5 Story 7.5.1) |
| Regressions | 0 |
| Coverage delta | 38.8% → 42.0% (+3.2 percentage points) |

### AI Models Used

| Story | Model |
|-------|-------|
| 4.1: KubernetesAuthEngineConfig/Role | Claude Opus 4.6 |
| 4.2: LDAPAuthEngineConfig/Group | Claude Opus 4.6 |
| 4.3: JWTOIDCAuthEngineConfig/Role | Claude Opus 4 |

---

## Epic 3 Retrospective Follow-Through

| Action Item | Status |
|-------------|--------|
| Fix `CreateOrUpdateConfig` dual bug in PKI engine (carried from Epic 2) | ❌ Not addressed; downgraded from "must fix before Epic 5" to "fix in Epic 7" after confirming no Epic 5 types use `VaultPKIEngineResource` |
| AC#4 extra-field handling → Story 7-4 | ⏳ In backlog; Epic 4 confirmed 6 more auth engine types with this pattern |
| Policy `ConditionsAware` copy-paste bug | ✅ Fixed during this retrospective (which also found a second copy-paste bug in `rabbitmqsecretenginerole_types.go`) |
| Add checked type assertion rule to project-context.md | ✅ Applied; all 3 stories used checked type assertions consistently |
| Continue using Opus 4.6 exclusively | ⚠️ Mostly; Stories 4.1/4.2 used Opus 4.6, Story 4.3 used Claude Opus 4 |
| Story specs must flag infrastructure scope | ✅ Applied; all 3 stories explicitly classified their infrastructure tier |

Completed 2/6, in progress 1/6, not addressed 3/6 (during the epic itself, the Policy fix, which was made only during this retrospective, and the partially followed model agreement count as not addressed). Key correction: the PKI bug was incorrectly flagged as blocking Epic 5 in previous retros. The `VaultPKIEngineResource` reconciler is used exclusively by `PKISecretEngineConfig`, and none of Epic 5's types (Database, RabbitMQ, GitHub, Azure, Quay, Kubernetes) use it.
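
The checked type assertion rule in the table above matters most for fields like `policies`, which come back from Vault as `[]interface{}` once the JSON response is decoded into `map[string]interface{}`. Below is a minimal Go sketch of the pattern. `toStringSlice` is a hypothetical helper, not code from the operator. Every assertion uses the two-value form, so an unexpected type becomes an error instead of a panic.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toStringSlice is a hypothetical helper: it converts a decoded JSON array
// into []string using only checked (two-value) type assertions.
func toStringSlice(v interface{}) ([]string, error) {
	raw, ok := v.([]interface{})
	if !ok {
		return nil, fmt.Errorf("expected []interface{}, got %T", v)
	}
	out := make([]string, 0, len(raw))
	for i, item := range raw {
		s, ok := item.(string)
		if !ok {
			return nil, fmt.Errorf("element %d: expected string, got %T", i, item)
		}
		out = append(out, s)
	}
	return out, nil
}

func main() {
	// encoding/json decodes a JSON array inside map[string]interface{} as []interface{}.
	var data map[string]interface{}
	if err := json.Unmarshal([]byte(`{"policies":["default","admin"]}`), &data); err != nil {
		panic(err)
	}

	// An unchecked data["policies"].([]string) would panic on this input.
	policies, err := toStringSlice(data["policies"])
	if err != nil {
		panic(err)
	}
	fmt.Println(policies) // [default admin]
}
```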

---

## Successes

1. **Real infrastructure over mocks.** All 3 stories ran against real services in Kind: the Kubernetes API (already available), OpenLDAP (manifests wired into `make integration`), and Keycloak 26.2 (a full OIDC provider with realm import). Vault validated each of these connections end-to-end.

2. **Infrastructure template compounding.** The `deploy-*` Makefile target pattern (create namespace → apply manifests → wait for ready) was established in Story 4.2 and reused unchanged in Story 4.3. The same template can be reused directly for RabbitMQ in Epic 5.

3. **Story ordering by infrastructure complexity worked well.** 4.1 (no new infra) → 4.2 (LDAP, medium) → 4.3 (Keycloak, high). Each story built on patterns from the previous one.

4. **Zero operator code bugs behind the debug failures.** All 7 debug failures were either environmental (port conflicts, stale CRs, Keycloak filename conventions, health port, kubeconfig context) or Vault API surprises (Vault rejecting an empty `tls_min_version`, `policies` returned as `[]interface{}`). The operator code itself was solid.

5. **Coverage grew 3.2 percentage points** (38.8% → 42.0%), the largest single-epic coverage gain since Epic 2.
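
The `deploy-*` template from success 2 is simple enough to express as data. What follows is a Go sketch that rests on several assumptions. The namespace, manifest directory, and deployment names are hypothetical. The service is assumed to run as a Deployment, so that `kubectl wait --for=condition=Available` applies. The code only builds the `kubectl` argument lists and does not run them. The real targets live in the project Makefile and may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// Component describes one external service deployed into Kind.
// All values used below are hypothetical examples.
type Component struct {
	Namespace    string // namespace created for the service
	ManifestsDir string // path passed to kubectl apply -f
	Deployment   string // Deployment to wait on
	Timeout      string // kubectl wait timeout, e.g. "300s"
}

// DeploySteps returns the kubectl invocations for the template:
// create namespace -> apply manifests -> wait for ready.
func DeploySteps(c Component) [][]string {
	return [][]string{
		{"kubectl", "create", "namespace", c.Namespace},
		{"kubectl", "apply", "-n", c.Namespace, "-f", c.ManifestsDir},
		{
			"kubectl", "wait", "-n", c.Namespace,
			"--for=condition=Available", "deployment/" + c.Deployment,
			"--timeout=" + c.Timeout,
		},
	}
}

func main() {
	rabbitmq := Component{
		Namespace:    "rabbitmq",
		ManifestsDir: "integration/rabbitmq",
		Deployment:   "rabbitmq",
		Timeout:      "300s",
	}
	for _, step := range DeploySteps(rabbitmq) {
		fmt.Println(strings.Join(step, " "))
	}
}
```

`kubectl create namespace` fails when the namespace already exists. A target that needs to be re-runnable can instead pipe `kubectl create namespace <ns> --dry-run=client -o yaml` into `kubectl apply -f -`.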

---

## Challenges

1. **Non-deletable config persistence verification missed twice.** Code review caught the same gap in both Stories 4.2 and 4.3: when deleting an `IsDeletable=false` type, the test verified the K8s deletion but did not prove that the Vault config persisted. This is now codified as a rule in project-context.md.

2. **Story 4.2 `omitempty` workaround violated CRD rules.** Vault 1.19.0 rejected empty `tls_min_version`/`tls_max_version` strings. The fix (adding `omitempty` to the JSON tags) was a workaround that violates Rule 2 of the CRD Field Default & Validation Rules: fields with non-zero defaults (`tls12`) must NOT have `omitempty`. The correct fix belongs in the API interaction layer. The revert is tracked in Epic 7.5 Story 7.5.1.

3. **Keycloak external-service quirks caused most of the debug time.** All 3 of Story 4.3's debug failures were tied to the Keycloak setup: the realm import filename convention (`test-realm-realm.json`, not `test-realm.json`), the health endpoint living on management port 9000 rather than app port 8080, and the wrong kubeconfig context. These are one-time learning costs that won't recur.
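
Two of the Keycloak quirks are naming and addressing conventions that only need to be encoded once. Below is a minimal Go sketch with hypothetical helper names. `realmImportFile` derives the `<realm>-realm.json` filename the import expects. `healthURL` points at the management port 9000. `waitForReady` polls that URL until it returns HTTP 200 or a deadline passes. The in-cluster host name is an assumption, and Keycloak only serves health endpoints when health checks are enabled. The demo polls a local stub server rather than a real pod.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// realmImportFile returns the filename the realm import expects:
// "<realm>-realm.json", not "<realm>.json".
func realmImportFile(realm string) string {
	return realm + "-realm.json"
}

// healthURL targets the management port (9000), not the app port (8080).
func healthURL(host string) string {
	return "http://" + host + ":9000/health/ready"
}

// waitForReady polls url until it answers HTTP 200 or ctx expires.
func waitForReady(ctx context.Context, url string, interval time.Duration) error {
	client := &http.Client{Timeout: 5 * time.Second}
	for {
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
		if err != nil {
			return err
		}
		if resp, err := client.Do(req); err == nil {
			resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				return nil
			}
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("%s not ready: %w", url, ctx.Err())
		case <-time.After(interval):
		}
	}
}

func main() {
	fmt.Println(realmImportFile("test-realm")) // test-realm-realm.json
	// "keycloak.keycloak.svc" is a hypothetical in-cluster host name.
	fmt.Println(healthURL("keycloak.keycloak.svc"))

	// Exercise the poller against a local stub instead of a real Keycloak pod.
	stub := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	defer stub.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	fmt.Println(waitForReady(ctx, stub.URL+"/health/ready", 500*time.Millisecond)) // <nil>
}
```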

---

## Key Insights

1. **Real infrastructure beats mocks for integration test confidence.** When Vault writes OIDC config and actually fetches Keycloak's discovery endpoint, or writes LDAP config and actually binds to OpenLDAP, that proves the operator works, not just that it generates correct JSON.

2. **Never change CRD annotations as a workaround for Vault API issues.** When Vault rejects a field value, fix the operator's API interaction (`toMap()`, `IsEquivalentToDesiredState`, or webhook `Default()`), not the CRD schema annotations. CRD annotations define the Kubernetes API contract; Vault compatibility is a separate concern.

3. **PKI bug urgency was overstated for two retros.** The "must fix before Epic 5" claim from Epic 2 was never re-validated against Epic 5's actual scope. Lesson: always check whether a carried action item is truly blocking before escalating its priority.
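
Insight 2 can be made concrete with `encoding/json`. The sketch below uses a hypothetical, trimmed-down spec type, not the real `LDAPAuthEngineConfig`. The CRD-facing struct keeps plain JSON tags on the defaulted TLS fields, as Rule 2 requires. A hypothetical `toMap()` then leaves empty TLS version strings out of the Vault payload, so the schema contract and Vault compatibility are handled in separate places.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// LDAPConfigSpec is a hypothetical, trimmed-down spec. Per Rule 2, fields
// with a non-zero default (tls12) keep a plain json tag, without omitempty.
type LDAPConfigSpec struct {
	URL           string `json:"url"`
	TLSMinVersion string `json:"tlsMinVersion"`
	TLSMaxVersion string `json:"tlsMaxVersion"`
}

// toMap builds the Vault payload. Empty TLS versions are dropped here, in
// the API interaction layer, because Vault rejects "" for these keys.
func (s LDAPConfigSpec) toMap() map[string]interface{} {
	payload := map[string]interface{}{"url": s.URL}
	if s.TLSMinVersion != "" {
		payload["tls_min_version"] = s.TLSMinVersion
	}
	if s.TLSMaxVersion != "" {
		payload["tls_max_version"] = s.TLSMaxVersion
	}
	return payload
}

func main() {
	spec := LDAPConfigSpec{URL: "ldap://openldap:389", TLSMinVersion: "tls12"}

	// The CRD representation keeps every field, including the empty one.
	crdJSON, _ := json.Marshal(spec)
	fmt.Println(string(crdJSON))
	// {"url":"ldap://openldap:389","tlsMinVersion":"tls12","tlsMaxVersion":""}

	// The Vault payload omits the empty tls_max_version (map keys are sorted).
	vaultJSON, _ := json.Marshal(spec.toMap())
	fmt.Println(string(vaultJSON))
	// {"tls_min_version":"tls12","url":"ldap://openldap:389"}
}
```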

---

## Action Items

### Process Improvements

1. **Non-deletable config persistence verification rule added to project-context.md**
   - Status: ✅ Done during retrospective
   - Description: When `IsDeletable=false`, delete tests must verify that the Vault config persists after CR deletion

2. **CRD annotation workaround rule** (to be added)
   - Owner: Bob (Scrum Master)
   - Description: Never change CRD annotations (`omitempty`, `kubebuilder:default`) as a workaround for Vault API compatibility. Fix the API interaction layer instead.
   - Success criteria: Rule added to the Critical Don't-Miss Rules section of project-context.md
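
Process item 1 reduces to two assertions after the CR is deleted: the Kubernetes object is gone, and the Vault config is still there. The Go sketch below shows that shape. It uses hypothetical in-memory fakes (`fakeCluster`, `fakeVault`, `deleteCR`) in place of the real Kubernetes client, Vault client, and operator finalizer. `deleteCR` only models the assumed behavior that `IsDeletable=false` skips the Vault delete.

```go
package main

import (
	"errors"
	"fmt"
)

// fakeCluster stands in for the Kubernetes API: CR name -> Vault path it manages.
type fakeCluster map[string]string

// fakeVault stands in for Vault: path -> stored config.
type fakeVault map[string]map[string]interface{}

// deleteCR is a hypothetical model of CR deletion: the Vault config is
// removed only for deletable types; the CR itself is always removed.
func deleteCR(k8s fakeCluster, vault fakeVault, name string, isDeletable bool) {
	if isDeletable {
		delete(vault, k8s[name])
	}
	delete(k8s, name)
}

// verifyNonDeletableDelete performs both checks the rule requires.
func verifyNonDeletableDelete(k8s fakeCluster, vault fakeVault, name, path string) error {
	if _, exists := k8s[name]; exists {
		return errors.New("CR still present in Kubernetes")
	}
	if _, exists := vault[path]; !exists {
		return fmt.Errorf("vault config at %s was deleted, expected it to persist", path)
	}
	return nil
}

func main() {
	const path = "auth/ldap/config" // hypothetical mount path
	k8s := fakeCluster{"ldap-config": path}
	vault := fakeVault{path: {"url": "ldap://openldap:389"}}

	deleteCR(k8s, vault, "ldap-config", false) // IsDeletable=false
	fmt.Println(verifyNonDeletableDelete(k8s, vault, "ldap-config", path)) // <nil>
}
```

Checking only the first condition is exactly the gap code review caught twice. A test that stops after confirming the CR is gone would also pass for a deletable type.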

### Technical Debt

3. **Fix `CreateOrUpdateConfig` dual bug in PKI engine** (CARRIED from Epic 2, priority downgraded)
   - Owner: Dev Agent (Epic 7)
   - Priority: Medium; confirmed NOT blocking Epic 5
   - Description: Two bugs in `vaultpkiengineobject.go:105-118`: (a) the write path is hardcoded to CRL, (b) the equivalence check uses the wrong payload

4. **AC#4 extra-field handling → Story 7-4** (CARRIED from Epic 1)
   - Owner: Story 7-4 (Epic 7)
   - Priority: Medium; scope is growing, with all auth engine types confirmed to follow this pattern
   - Status: Properly deferred

5. **Revert Story 4.2 `omitempty` workaround on `TLSMinVersion`/`TLSMaxVersion`**
   - Owner: Epic 7.5, Story 7.5.1 (Task 1.2)
   - Priority: Low; the workaround is functional and the proper fix is tracked
   - Status: Already planned

### Items Resolved During Retrospective

6. **Policy and RabbitMQSecretEngineRole `ConditionsAware` copy-paste bugs: FIXED**
   - `policy_types.go:37`: `&PKISecretEngineRole{}` → `&Policy{}`
   - `rabbitmqsecretenginerole_types.go:145`: `&RabbitMQSecretEngineConfig{}` → `&RabbitMQSecretEngineRole{}`

7. **Epic 7.5 (CRD Field Annotation Refactor) added to sprint-status.yaml**
   - 7 stories, positioned before Epic D1
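
A copy-paste slip like the one in `policy_types.go` compiles cleanly when the line is a compile-time interface assertion, because the wrong type usually satisfies the interface too. The sketch below assumes the buggy line had the form `var _ ConditionsAware = &PKISecretEngineRole{}`. It uses a simplified, hypothetical `ConditionsAware` interface and `Condition` type rather than the operator's real ones. The buggy assertion compiles but never checks `*Policy`; the fixed one does.

```go
package main

import "fmt"

// Condition is a hypothetical stand-in for metav1.Condition.
type Condition struct {
	Type   string
	Status string
}

// ConditionsAware is a simplified, hypothetical version of the interface.
type ConditionsAware interface {
	GetConditions() []Condition
	SetConditions(conditions []Condition)
}

type Policy struct{ conditions []Condition }

func (p *Policy) GetConditions() []Condition  { return p.conditions }
func (p *Policy) SetConditions(c []Condition) { p.conditions = c }

type PKISecretEngineRole struct{ conditions []Condition }

func (r *PKISecretEngineRole) GetConditions() []Condition  { return r.conditions }
func (r *PKISecretEngineRole) SetConditions(c []Condition) { r.conditions = c }

// Buggy copy-paste (assumed form): compiles, but asserts nothing about Policy.
// var _ ConditionsAware = &PKISecretEngineRole{}

// Fixed: the build now fails if *Policy stops satisfying ConditionsAware.
var _ ConditionsAware = &Policy{}

func main() {
	var obj ConditionsAware = &Policy{}
	obj.SetConditions([]Condition{{Type: "Ready", Status: "True"}})
	fmt.Println(len(obj.GetConditions())) // 1
}
```

Because both versions compile, the wrong type in the assertion can only be caught by review or by a test that exercises `*Policy` through the interface.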

### Team Agreements

- Continue using Opus-class models for integration test stories; validated across 12 consecutive stories (Epics 2–4)
- Continue the code review process; it catches real issues consistently
- The "install in Kind" infrastructure template is now standard; reuse it for RabbitMQ in Epic 5
- Order stories by ascending infrastructure complexity (proven in Epics 3 and 4)

---

## Epic 5 Preparation

### Dependencies on Epic 4

- VaultResource integration test pattern (established in Epics 2–4): direct reuse
- `deploy-*` Makefile target template: reuse for RabbitMQ
- Auth engine dependency chain pattern (mount → config → role): analogous for secret engines
- Non-deletable config persistence rule: ready to apply

### Infrastructure Requirements by Story

| Story | External Dependency | Classification | Infrastructure Scope |
|-------|---------------------|----------------|----------------------|
| 5.1: DatabaseSecretEngineConfig/Role | PostgreSQL | Already deployed (`deploy-postgresql`) | Low: no new infra |
| 5.2: RabbitMQ secret engine types | RabbitMQ | Install in Kind | Medium: new `deploy-rabbitmq` target |
| 5.3: Remaining secret engine types | Mixed | See below | Mixed |

### Story 5.3 Type Classification

| Type | Dependency | Classification | Rationale |
|------|------------|----------------|-----------|
| KubernetesSecretEngineConfig/Role | Kubernetes API | Tier 1: Already available | The Kind cluster IS the Kubernetes instance |
| GitHubSecretEngineConfig/Role | GitHub App + custom Vault plugin | Tier 3: Skip | Cloud service; the custom plugin is already registered, but there is no GitHub App |
| AzureSecretEngineConfig/Role | Azure cloud | Tier 3: Skip | Cloud provider |
| QuaySecretEngineConfig/Role/StaticRole | Quay registry + custom Vault plugin | Tier 3: Skip | Heavy deployment (PostgreSQL + Redis + Quay), and the custom Vault plugin is not installed |

### Readiness Assessment

- Testing & Quality: All tests passing, coverage at 42.0%
- Technical Health: PKI bug confirmed non-blocking, copy-paste bugs fixed
- Infrastructure: PostgreSQL already deployed, RabbitMQ needs a new `deploy-rabbitmq` target
- Unresolved Blockers: None

### Verdict

**Ready to proceed with Epic 5.** Story ordering: 5.1 (PostgreSQL, already deployed) → 5.2 (RabbitMQ, medium infra) → 5.3 (Kubernetes secret engine against the Kind cluster; cloud and Quay types skipped).

---

## Team Performance

Epic 4 delivered 3 stories covering integration tests for 6 auth engine types (KubernetesAuthEngineConfig, KubernetesAuthEngineRole, LDAPAuthEngineConfig, LDAPAuthEngineGroup, JWTOIDCAuthEngineConfig, JWTOIDCAuthEngineRole), with real infrastructure deployments (OpenLDAP, Keycloak), zero operator code bugs, zero regressions, and 3.2 percentage points of coverage growth. The retrospective fixed 2 copy-paste bugs, added a testing rule to project-context.md, added Epic 7.5 to sprint-status.yaml, corrected the PKI bug priority, and identified the Story 4.2 CRD annotation workaround as a mistake to revert. The team is well-positioned for Epic 5.