Commit e3379a2

Pattern A: Flux HelmRelease GitOps migration (closes GH013) (#1089)
Closes #1088.

WHY
---
The deploy-azd.yml `commit-rendered-manifests` job pushed bot-generated
manifests directly to refs/heads/main. The `main-governance-baseline`
ruleset (id 14638366) rejects this push with GH013, blocking every deploy
after PRs merge. Bypass-actor and orphan-branch alternatives were rejected
as anti-patterns.

WHAT
----
Roll out Pattern A (Flux HelmRelease + in-cluster Helm rendering) to all
27 AKS services. The helm-controller renders the chart in-cluster on every
reconciliation, so no rendered YAML lives in git and no workflow ever
pushes back to main.

CHANGES
-------
- 24 new HelmRelease YAMLs in .kubernetes/releases/agents/ + 1 in
  .kubernetes/releases/crud/. Generator preserves every env var, resource
  limit, AGC route, command/args override, and UAMI binding from the
  previously deployed cluster state. Skips the 3 already-migrated services
  (ecommerce-catalog-search, truth-enrichment, truth-hitl).
- .kubernetes/releases/agents/kustomization.yaml updated to list all 26
  agent HelmReleases.
- .kubernetes/releases/crud/kustomization.yaml created listing the
  crud-service HelmRelease.
- Bicep fluxConfig switched from `.kubernetes/rendered/{crud,agents}` to
  `.kubernetes/releases/{crud,agents}`. CRUD reconciles first; agents
  depend on it.
- deploy-azd.yml: removed `commit-rendered-manifests` job and rewired
  `wait-flux-reconciliation` to depend on deploy-crud / deploy-agents
  directly. Renamed misleading "for Flux commit" artifact step labels to
  "for verification" since these artifacts now feed only the
  prompt-verification flow.
- ADR-017 amended with Phase 2 completion notes and Phase 2b plan (Flux
  ImageUpdateAutomation with PR bridge for image tag updates).

VALIDATION
----------
- All 27 HelmReleases pass `helm template` against .kubernetes/chart
  (validate_helmreleases.py)
- `kubectl kustomize .kubernetes/releases/agents` resolves 26
  HelmReleases; `.kubernetes/releases/crud` resolves 1
- scripts/ci/validate_k8s_name_length.py: passes
- Workflow YAML parses cleanly (yaml.safe_load)

KNOWN GAP (Phase 2b, separate epic)
------------------------------------
After merge, `azd deploy` continues to kubectl-apply new image tags.
Within 5 min Flux can revert to the older tag committed in the HelmRelease
YAML. HelmRelease tags here reflect the currently-deployed images at PR
time, so the first reconciliation after Bicep redeploy is a no-op. Phase 2b
closes the loop with Flux ImageRepository/ImagePolicy/ImageUpdateAutomation
and a PR-bridge for protected branches.
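The known gap can be made concrete with a small sketch. It assumes two hypothetical mappings from service name to image tag: one read from the HelmRelease YAMLs in git and one read from the live cluster. The helper name and the data shape are illustrative and do not come from this commit. Any service whose live tag differs from the committed tag is a candidate for Flux to revert on its next reconciliation.

```python
def find_tag_drift(declared: dict[str, str], live: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return {service: (declared_tag, live_tag)} for every mismatch.

    Services missing from `live` are skipped: nothing is deployed,
    so there is nothing for Flux to revert.
    """
    drift = {}
    for service, declared_tag in sorted(declared.items()):
        live_tag = live.get(service)
        if live_tag is not None and live_tag != declared_tag:
            drift[service] = (declared_tag, live_tag)
    return drift


if __name__ == "__main__":
    # Hypothetical live tag; the declared tag is the one committed here.
    declared = {"crm-campaign-intelligence": "azd-deploy-1775344528"}
    live = {"crm-campaign-intelligence": "azd-deploy-1775350000"}
    for service, (want, have) in find_tag_drift(declared, live).items():
        print(f"{service}: git declares {want}, cluster runs {have}")
```

An empty result corresponds to the no-op first reconciliation described in the commit message.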
1 parent 811fdbe commit e3379a2

29 files changed

Lines changed: 2649 additions & 87 deletions

.github/workflows/deploy-azd.yml

Lines changed: 12 additions & 76 deletions
@@ -2057,7 +2057,7 @@ jobs:
           POSTGRES_USER: ${{ needs.provision.outputs.POSTGRES_AUTH_MODE == 'password' && needs.provision.outputs.POSTGRES_ADMIN_USER || needs.provision.outputs.POSTGRES_USER }}
           POSTGRES_DATABASE: ${{ needs.provision.outputs.POSTGRES_DATABASE }}
 
-      - name: Upload rendered CRUD manifest for Flux commit
+      - name: Upload rendered CRUD manifest for verification
         uses: actions/upload-artifact@v4
         with:
           name: rendered-manifest-crud-service
@@ -2710,91 +2710,29 @@ jobs:
           POSTGRES_USER: ${{ needs.provision.outputs.POSTGRES_AUTH_MODE == 'password' && needs.provision.outputs.POSTGRES_ADMIN_USER || needs.provision.outputs.POSTGRES_USER }}
           POSTGRES_DATABASE: ${{ needs.provision.outputs.POSTGRES_DATABASE }}
 
-      - name: Upload rendered agent manifest for Flux commit
+      - name: Upload rendered agent manifest for verification
         if: ${{ success() }}
         uses: actions/upload-artifact@v4
         with:
           name: rendered-manifest-${{ matrix.service }}
           path: .kubernetes/rendered/${{ matrix.service }}/all.yaml
           retention-days: 1
 
-  commit-rendered-manifests:
+  # NOTE (ADR-017 amendment, Pattern A): the previous `commit-rendered-manifests`
+  # job was deleted because it pushed bot-generated manifests directly to
+  # `refs/heads/main`, which is rejected by the `main-governance-baseline`
+  # ruleset (GH013). Flux now reconciles HelmRelease CRDs from
+  # `.kubernetes/releases/{crud,agents}` and the helm-controller renders the
+  # chart in-cluster on every reconciliation, so no workflow push to main is
+  # ever required. Image tag updates remain a follow-up (planned: Flux
+  # ImageUpdateAutomation with PR bridge).
+
+  wait-flux-reconciliation:
     runs-on: ubuntu-latest
     if: ${{ always() && !cancelled() && !inputs.uiOnly && (needs.deploy-crud.result == 'success' || needs.deploy-agents.result == 'success') }}
     needs:
-      - detect-changes
-      - provision
       - deploy-crud
       - deploy-agents
-      - build-aks-images
-    permissions:
-      contents: write
-    steps:
-      - name: Resolve publication branch
-        id: publication-branch
-        shell: bash
-        run: |
-          set -euo pipefail
-
-          SOURCE_REF="${DEPLOY_SOURCE_REF}"
-
-          if [[ "$SOURCE_REF" != refs/heads/* ]]; then
-            echo "::error::Rendered manifest publication requires a pushable branch ref, but DEPLOY_SOURCE_REF='$SOURCE_REF'."
-            exit 1
-          fi
-
-          BRANCH_NAME="${SOURCE_REF#refs/heads/}"
-          if [ -z "$BRANCH_NAME" ]; then
-            echo "::error::Rendered manifest publication could not resolve a branch name from DEPLOY_SOURCE_REF='$SOURCE_REF'."
-            exit 1
-          fi
-
-          echo "branch_ref=$SOURCE_REF" >> "$GITHUB_OUTPUT"
-          echo "branch_name=$BRANCH_NAME" >> "$GITHUB_OUTPUT"
-
-      - uses: actions/checkout@v4
-        with:
-          ref: ${{ steps.publication-branch.outputs.branch_ref }}
-          fetch-depth: 1
-          token: ${{ github.token }}
-
-      - name: Download rendered manifests
-        uses: actions/download-artifact@v4
-        with:
-          pattern: rendered-manifest-*
-          path: ${{ runner.temp }}/rendered-artifacts
-
-      - name: Copy rendered manifests into repo
-        shell: bash
-        run: |
-          set -euo pipefail
-          for artifact_dir in "${RUNNER_TEMP}"/rendered-artifacts/rendered-manifest-*; do
-            [ -d "$artifact_dir" ] || continue
-            svc_name=$(basename "$artifact_dir" | sed 's/^rendered-manifest-//')
-            mkdir -p ".kubernetes/rendered/${svc_name}"
-            cp "${artifact_dir}/all.yaml" ".kubernetes/rendered/${svc_name}/all.yaml"
-            echo "Copied rendered manifest for ${svc_name}"
-          done
-
-      - name: Commit rendered manifests for Flux reconciliation
-        shell: bash
-        run: |
-          set -euo pipefail
-          git config user.name "github-actions[bot]"
-          git config user.email "github-actions[bot]@users.noreply.github.com"
-          git add .kubernetes/rendered/ || true
-          if git diff --cached --quiet; then
-            echo "No rendered manifest changes to commit."
-            exit 0
-          fi
-          git commit -m "deploy: update rendered manifests [skip ci]"
-          git push origin "HEAD:refs/heads/${{ steps.publication-branch.outputs.branch_name }}"
-
-  wait-flux-reconciliation:
-    runs-on: ubuntu-latest
-    if: ${{ !inputs.uiOnly && (needs.commit-rendered-manifests.result == 'success') }}
-    needs:
-      - commit-rendered-manifests
       - detect-changes
     environment: ${{ inputs.githubEnvironment }}
     env:
@@ -4203,7 +4141,6 @@ jobs:
       - detect-changes
       - deploy-crud
       - deploy-agents
-      - commit-rendered-manifests
       - wait-flux-reconciliation
       - sync-apim
       - sync-apic
@@ -4299,7 +4236,6 @@ jobs:
     runs-on: ubuntu-latest
     if: ${{ always() && !cancelled() && !inputs.uiOnly }}
     needs:
-      - commit-rendered-manifests
       - wait-flux-reconciliation
       - validate-agc-readiness
       - sync-apim
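The rewired `wait-flux-reconciliation` gate in deploy-azd.yml can be read as a plain boolean. The sketch below models it with a hypothetical `should_wait_for_flux` helper that is not part of the workflow. `always()` is left out of the return expression because it is always true; its only role in the workflow is to stop GitHub Actions from applying the implicit `success()` check.

```python
def should_wait_for_flux(cancelled: bool, ui_only: bool,
                         deploy_crud: str, deploy_agents: str) -> bool:
    """Model of the new job condition:

    always() && !cancelled() && !inputs.uiOnly &&
    (needs.deploy-crud.result == 'success' ||
     needs.deploy-agents.result == 'success')
    """
    any_deploy_succeeded = "success" in (deploy_crud, deploy_agents)
    return (not cancelled) and (not ui_only) and any_deploy_succeeded


if __name__ == "__main__":
    # One deploy job skipped, the other succeeded: the gate still opens.
    print(should_wait_for_flux(False, False, "success", "skipped"))
```

Under the previous condition the job ran only when `commit-rendered-manifests` succeeded, so a rejected push to main also blocked the reconciliation wait.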

.infra/modules/shared-infrastructure/shared-infrastructure.bicep

Lines changed: 5 additions & 3 deletions
@@ -1218,7 +1218,9 @@ resource fluxExtension 'Microsoft.KubernetesConfiguration/extensions@2023-05-01'
   }
 }
 
-// Flux GitOps configuration — reconciles rendered manifests from the repository.
+// Flux GitOps configuration — reconciles HelmRelease CRDs from the repository.
+// Pattern A (ADR-017 amended): the helm-controller renders the chart in-cluster on
+// every reconciliation, so no workflow ever pushes rendered YAML back to main.
 resource fluxConfig 'Microsoft.KubernetesConfiguration/fluxConfigurations@2024-04-01-preview' = {
   name: 'holiday-peak-gitops'
   scope: aksClusterResource
@@ -1239,15 +1241,15 @@ resource fluxConfig 'Microsoft.KubernetesConfiguration/fluxConfigurations@2024-0
     }
     kustomizations: {
       'holiday-peak-crud': {
-        path: '.kubernetes/rendered/crud'
+        path: '.kubernetes/releases/crud'
         syncIntervalInSeconds: 300
         timeoutInSeconds: 600
         prune: true
         force: false
         dependsOn: []
       }
       'holiday-peak-agents': {
-        path: '.kubernetes/rendered/agents'
+        path: '.kubernetes/releases/agents'
         syncIntervalInSeconds: 300
         timeoutInSeconds: 600
         prune: true
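The commit message states that CRUD reconciles first and agents depend on it. The sketch below shows how such `dependsOn` edges resolve into an order, using the standard-library `graphlib`. The `dependsOn` value for `holiday-peak-agents` is an assumption taken from the commit message, since the Bicep hunk ends before that line, and `reconcile_order` is a hypothetical helper.

```python
from graphlib import TopologicalSorter


def reconcile_order(depends_on: dict[str, list[str]]) -> list[str]:
    """Return kustomization names with every dependency before its dependents.

    `depends_on` maps each kustomization to the names it depends on.
    Raises graphlib.CycleError if the dependencies form a cycle.
    """
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    kustomizations = {
        "holiday-peak-crud": [],
        # Assumed from the commit message; not visible in the hunk.
        "holiday-peak-agents": ["holiday-peak-crud"],
    }
    print(reconcile_order(kustomizations))
```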
Lines changed: 105 additions & 0 deletions
@@ -0,0 +1,105 @@
+apiVersion: helm.toolkit.fluxcd.io/v2
+kind: HelmRelease
+metadata:
+  name: crm-campaign-intelligence
+  namespace: flux-system
+spec:
+  targetNamespace: holiday-peak-agents
+  releaseName: crm-campaign-intelligence
+  interval: 5m
+  timeout: 10m
+  chart:
+    spec:
+      chart: .kubernetes/chart
+      sourceRef:
+        kind: GitRepository
+        name: holiday-peak-gitops
+        namespace: flux-system
+      interval: 5m
+  upgrade:
+    remediation:
+      retries: 3
+  install:
+    createNamespace: false
+    remediation:
+      retries: 3
+  values:
+    serviceName: crm-campaign-intelligence
+    serviceAccount:
+      create: true
+      clientId: 036f4fd2-9093-4948-b0dd-beb5c87aa6dc
+    image:
+      repository: holidaypeakhub405devacr.azurecr.io/holiday-peak-hub/crm-campaign-intelligence-dev
+      tag: azd-deploy-1775344528
+    replicaCount: 2
+    resources:
+      limits:
+        cpu: 500m
+        memory: 512Mi
+      requests:
+        cpu: 250m
+        memory: 256Mi
+    nodeSelector:
+      agentpool: agents
+    tolerations:
+    - effect: NoSchedule
+      key: workload
+      operator: Equal
+      value: agents
+    availability:
+      strategy:
+        type: RollingUpdate
+        rollingUpdate:
+          maxUnavailable: 25%
+          maxSurge: 25%
+      pdb:
+        enabled: false
+    keda:
+      enabled: false
+    agc:
+      enabled: true
+      gatewayClassName: azure-alb-external
+      hostnames:
+      - esbcc8bcfyazbbdg.fz03.alb.azure.com
+      parentRefs:
+      - name: holiday-peak-agc
+        namespace: holiday-peak-crud
+      paths:
+      - path: /crm-campaign-intelligence
+        pathType: PathPrefix
+        rewriteTo: /
+    env:
+      AI_SEARCH_AUTH_MODE: managed_identity
+      AI_SEARCH_ENDPOINT: https://holidaypeakhub405devsearch.search.windows.net
+      AI_SEARCH_INDEX: catalog-products
+      AI_SEARCH_INDEXER_NAME: search-enriched-products-indexer
+      AI_SEARCH_VECTOR_INDEX: product_search_index
+      APPLICATIONINSIGHTS_CONNECTION_STRING: InstrumentationKey=d2fb45bc-e648-4da2-9345-d278636aede5;IngestionEndpoint=https://centralus-2.in.applicationinsights.azure.com/;LiveEndpoint=https://centralus.livediagnostics.monitor.azure.com/;ApplicationId=d8eb64c4-956d-46ab-a02e-d481deadaa0b
+      AZURE_CLIENT_ID: 036f4fd2-9093-4948-b0dd-beb5c87aa6dc
+      AZURE_TENANT_ID: 16b3c013-d300-468d-ac64-7eda0820b6d3
+      BLOB_ACCOUNT_URL: https://holidaypeakhub405devstor.blob.core.windows.net
+      BLOB_CONTAINER: agent-memory
+      CATALOG_SEARCH_REQUIRE_AI_SEARCH: 'true'
+      COSMOS_ACCOUNT_URI: https://holidaypeakhub405-dev-cosmos.documents.azure.com:443/
+      COSMOS_CONTAINER: agent-memory
+      COSMOS_DATABASE: holiday-peak-db
+      EMBEDDING_DEPLOYMENT_NAME: text-embedding-3-large
+      EVENT_HUB_NAMESPACE: holidaypeakhub405-dev-eventhub
+      FOUNDRY_AGENT_NAME_FAST: crm-campaign-intelligence-fast
+      FOUNDRY_AGENT_NAME_RICH: crm-campaign-intelligence-rich
+      FOUNDRY_AUTO_ENSURE_ON_STARTUP: 'true'
+      FOUNDRY_STREAM: 'true'
+      AGENT_FOUNDRY_INVOKE_TIMEOUT_SECONDS: '60'
+      FOUNDRY_STRICT_ENFORCEMENT: 'false'
+      KEY_VAULT_URI: https://holidaypeakhub405-dev-kv.vault.azure.net/
+      MODEL_DEPLOYMENT_NAME_FAST: gpt-5-nano
+      MODEL_DEPLOYMENT_NAME_RICH: gpt-5
+      POSTGRES_AUTH_MODE: entra
+      POSTGRES_DATABASE: holiday_peak_crud
+      POSTGRES_HOST: holidaypeakhub405-dev-postgres.postgres.database.azure.com
+      POSTGRES_SSL: 'true'
+      POSTGRES_USER: holidaypeakhub405-dev-crud-identity
+      PROJECT_ENDPOINT: https://holidaypeakhub405devais.services.ai.azure.com/api/projects/aipholidaris
+      PROJECT_NAME: aipholidaris
+      REDIS_HOST: holidaypeakhub405-dev-redis.redis.cache.windows.net
+      REDIS_PASSWORD_SECRET_NAME: redis-primary-key
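The HelmRelease for crm-campaign-intelligence pairs a fixed Flux skeleton with service-specific chart values. The sketch below shows how a generator might assemble that skeleton. `build_helmrelease` is a hypothetical helper; the generator used for this commit is not shown on this page and may work differently. The output is JSON, which YAML parsers also accept, so the sketch needs no third-party YAML library.

```python
import json


def build_helmrelease(service: str, values: dict,
                      namespace: str = "holiday-peak-agents") -> dict:
    """Assemble a HelmRelease document for one service as a plain dict."""
    return {
        "apiVersion": "helm.toolkit.fluxcd.io/v2",
        "kind": "HelmRelease",
        "metadata": {"name": service, "namespace": "flux-system"},
        "spec": {
            "targetNamespace": namespace,
            "releaseName": service,
            "interval": "5m",
            "timeout": "10m",
            "chart": {
                "spec": {
                    "chart": ".kubernetes/chart",
                    "sourceRef": {
                        "kind": "GitRepository",
                        "name": "holiday-peak-gitops",
                        "namespace": "flux-system",
                    },
                    "interval": "5m",
                }
            },
            "upgrade": {"remediation": {"retries": 3}},
            "install": {
                "createNamespace": False,
                "remediation": {"retries": 3},
            },
            # Service-specific values are merged after the service name.
            "values": {"serviceName": service, **values},
        },
    }


if __name__ == "__main__":
    release = build_helmrelease(
        "crm-campaign-intelligence",
        {"replicaCount": 2, "image": {"tag": "azd-deploy-1775344528"}},
    )
    print(json.dumps(release, indent=2))
```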
Lines changed: 105 additions & 0 deletions
@@ -0,0 +1,105 @@
+apiVersion: helm.toolkit.fluxcd.io/v2
+kind: HelmRelease
+metadata:
+  name: crm-profile-aggregation
+  namespace: flux-system
+spec:
+  targetNamespace: holiday-peak-agents
+  releaseName: crm-profile-aggregation
+  interval: 5m
+  timeout: 10m
+  chart:
+    spec:
+      chart: .kubernetes/chart
+      sourceRef:
+        kind: GitRepository
+        name: holiday-peak-gitops
+        namespace: flux-system
+      interval: 5m
+  upgrade:
+    remediation:
+      retries: 3
+  install:
+    createNamespace: false
+    remediation:
+      retries: 3
+  values:
+    serviceName: crm-profile-aggregation
+    serviceAccount:
+      create: true
+      clientId: 036f4fd2-9093-4948-b0dd-beb5c87aa6dc
+    image:
+      repository: holidaypeakhub405devacr.azurecr.io/holiday-peak-hub/crm-profile-aggregation-dev
+      tag: azd-deploy-1775344626
+    replicaCount: 2
+    resources:
+      limits:
+        cpu: 500m
+        memory: 512Mi
+      requests:
+        cpu: 250m
+        memory: 256Mi
+    nodeSelector:
+      agentpool: agents
+    tolerations:
+    - effect: NoSchedule
+      key: workload
+      operator: Equal
+      value: agents
+    availability:
+      strategy:
+        type: RollingUpdate
+        rollingUpdate:
+          maxUnavailable: 25%
+          maxSurge: 25%
+      pdb:
+        enabled: false
+    keda:
+      enabled: false
+    agc:
+      enabled: true
+      gatewayClassName: azure-alb-external
+      hostnames:
+      - esbcc8bcfyazbbdg.fz03.alb.azure.com
+      parentRefs:
+      - name: holiday-peak-agc
+        namespace: holiday-peak-crud
+      paths:
+      - path: /crm-profile-aggregation
+        pathType: PathPrefix
+        rewriteTo: /
+    env:
+      AI_SEARCH_AUTH_MODE: managed_identity
+      AI_SEARCH_ENDPOINT: https://holidaypeakhub405devsearch.search.windows.net
+      AI_SEARCH_INDEX: catalog-products
+      AI_SEARCH_INDEXER_NAME: search-enriched-products-indexer
+      AI_SEARCH_VECTOR_INDEX: product_search_index
+      APPLICATIONINSIGHTS_CONNECTION_STRING: InstrumentationKey=d2fb45bc-e648-4da2-9345-d278636aede5;IngestionEndpoint=https://centralus-2.in.applicationinsights.azure.com/;LiveEndpoint=https://centralus.livediagnostics.monitor.azure.com/;ApplicationId=d8eb64c4-956d-46ab-a02e-d481deadaa0b
+      AZURE_CLIENT_ID: 036f4fd2-9093-4948-b0dd-beb5c87aa6dc
+      AZURE_TENANT_ID: 16b3c013-d300-468d-ac64-7eda0820b6d3
+      BLOB_ACCOUNT_URL: https://holidaypeakhub405devstor.blob.core.windows.net
+      BLOB_CONTAINER: agent-memory
+      CATALOG_SEARCH_REQUIRE_AI_SEARCH: 'true'
+      COSMOS_ACCOUNT_URI: https://holidaypeakhub405-dev-cosmos.documents.azure.com:443/
+      COSMOS_CONTAINER: agent-memory
+      COSMOS_DATABASE: holiday-peak-db
+      EMBEDDING_DEPLOYMENT_NAME: text-embedding-3-large
+      EVENT_HUB_NAMESPACE: holidaypeakhub405-dev-eventhub
+      FOUNDRY_AGENT_NAME_FAST: crm-profile-aggregation-fast
+      FOUNDRY_AGENT_NAME_RICH: crm-profile-aggregation-rich
+      FOUNDRY_AUTO_ENSURE_ON_STARTUP: 'true'
+      FOUNDRY_STREAM: 'true'
+      AGENT_FOUNDRY_INVOKE_TIMEOUT_SECONDS: '60'
+      FOUNDRY_STRICT_ENFORCEMENT: 'false'
+      KEY_VAULT_URI: https://holidaypeakhub405-dev-kv.vault.azure.net/
+      MODEL_DEPLOYMENT_NAME_FAST: gpt-5-nano
+      MODEL_DEPLOYMENT_NAME_RICH: gpt-5
+      POSTGRES_AUTH_MODE: entra
+      POSTGRES_DATABASE: holiday_peak_crud
+      POSTGRES_HOST: holidaypeakhub405-dev-postgres.postgres.database.azure.com
+      POSTGRES_SSL: 'true'
+      POSTGRES_USER: holidaypeakhub405-dev-crud-identity
+      PROJECT_ENDPOINT: https://holidaypeakhub405devais.services.ai.azure.com/api/projects/aipholidaris
+      PROJECT_NAME: aipholidaris
+      REDIS_HOST: holidaypeakhub405-dev-redis.redis.cache.windows.net
+      REDIS_PASSWORD_SECRET_NAME: redis-primary-key
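The commit message lists a name-length check (scripts/ci/validate_k8s_name_length.py) among its validations. That script is not part of this diff, so the sketch below is only an assumption about what such a check could cover: Kubernetes DNS-1123 label rules (at most 63 characters; lowercase alphanumerics and `-`; alphanumeric at both ends) applied to a release name plus an optional suffix. `name_problems` and its `suffix` parameter are hypothetical.

```python
import re

# DNS-1123 label: lowercase alphanumerics or '-', alphanumeric at both ends.
DNS_1123_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")
MAX_LABEL_LENGTH = 63


def name_problems(name: str, suffix: str = "") -> list[str]:
    """Return a list of human-readable problems; empty means the name is fine."""
    full = f"{name}{suffix}"
    problems = []
    if len(full) > MAX_LABEL_LENGTH:
        problems.append(f"{full!r} is {len(full)} chars, limit is {MAX_LABEL_LENGTH}")
    if not DNS_1123_LABEL.fullmatch(full):
        problems.append(f"{full!r} is not a valid DNS-1123 label")
    return problems


if __name__ == "__main__":
    for release in ("crm-campaign-intelligence", "crm-profile-aggregation"):
        print(release, name_problems(release) or "ok")
```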
