Commit 39be499

fix(netpol): record temporary allow-all for rig-prd-operations in git
Incident 2026-06-10: the first real application of PR #64 (scoped allow-argo) revealed that no other workload in rig-prd-operations had a complete NetworkPolicy. The original allow-argo was a podSelector:{} allow-all and had masked every restrictive policy since day one. Keycloak had no policy at all and lost database and GitHub egress; OPI lost database egress; external-dns crash-looped. This policy reproduces the pre-incident network state and is deliberately TEMPORARY: the removal plan is in the file. The kind sandbox does not enforce NetworkPolicies (kindnet), so per-component policies must be tested on an enforcing cluster before removal.
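One way to get an enforcing test cluster, sketched here as an assumption rather than the author's setup, is a kind cluster with the default CNI (kindnet) disabled so that a policy-enforcing CNI can be installed instead. The choice of CNI, the node layout, and the pod subnet below are illustrative; `disableDefaultCNI` and `podSubnet` are fields of the kind cluster configuration.

```yaml
# Hypothetical kind cluster config for testing NetworkPolicies.
# Assumption: a policy-enforcing CNI (for example Calico or Cilium)
# is installed after the cluster is created; kind will not install one.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  # Skip kindnet so the replacement CNI owns pod networking.
  disableDefaultCNI: true
  # Assumed subnet; match it to whatever the chosen CNI expects.
  podSubnet: "192.168.0.0/16"
nodes:
  - role: control-plane
  - role: worker
```

Nodes stay NotReady until the replacement CNI is installed, so the CNI manifests have to be applied before any workload or policy test.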
1 parent d30f6d8 commit 39be499

2 files changed

Lines changed: 33 additions & 0 deletions

bootstrap/rig-system/kustomize/overlays/odcn-production/kustomization.yaml

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ resources:
   - namespace.yaml
   - argocd-deployment.yaml
   - network-policies/argocd-network-policy.yaml
+  - network-policies/emergency-restore-allow-all.yaml
   - ../../operations-manager/overlays/odcn-production
   - argocd-application-production-infrastructure.yaml
   - argocd-application-user-applications.yaml
bootstrap/rig-system/kustomize/overlays/odcn-production/network-policies/emergency-restore-allow-all.yaml

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+# TEMPORARY — DO NOT REMOVE WITHOUT READING THIS
+#
+# Allow-all policy over every pod in rig-prd-operations. This recreates the
+# effective network state production ran under from day one until 2026-06-10:
+# the original `allow-argo` was podSelector:{} allow-all and masked every
+# restrictive policy in the namespace (NetworkPolicies are additive).
+#
+# PR #64 scoped allow-argo down (correct security fix), but the first apply
+# of it (2026-06-10 bootstrap run) revealed that no other workload had a
+# complete policy — Keycloak had none at all — and took production down
+# (incident 2026-06-10: OPI/Keycloak/external-dns lost database and egress).
+#
+# Removal plan (in this order, nothing skipped):
+#   1. Write per-component NetworkPolicies for EVERY workload in this
+#      namespace (keycloak, external-dns, minio, redis, rig-db, OPI, argocd).
+#   2. Test them on a cluster that actually ENFORCES policies — the kind
+#      sandbox runs kindnet, which ignores NetworkPolicies entirely.
+#   3. Apply the per-component policies to production.
+#   4. Only then delete this file and the live object.
+apiVersion: networking.k8s.io/v1
+kind: NetworkPolicy
+metadata:
+  name: emergency-restore-allow-all
+spec:
+  podSelector: {}
+  policyTypes:
+    - Ingress
+    - Egress
+  ingress:
+    - {}
+  egress:
+    - {}
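
Step 1 of the removal plan could start from something like the following sketch for Keycloak, which per the incident notes needs database and GitHub egress. This is not taken from the repository: the policy name, both pod labels, the kube-system DNS rule, and the PostgreSQL port are assumptions to check against the real workloads.

```yaml
# Hypothetical per-component policy; name, labels, and ports are assumptions.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: keycloak-egress
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: keycloak   # assumed label on the Keycloak pods
  policyTypes:
    - Egress
  egress:
    # DNS lookups (assumes cluster DNS runs in kube-system).
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # Database in the same namespace (assumed label, assumed PostgreSQL port).
    - to:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: rig-db
      ports:
        - protocol: TCP
          port: 5432
    # HTTPS to external hosts such as GitHub; plain NetworkPolicy
    # cannot match on hostnames, so this allows any destination on 443.
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
      ports:
        - protocol: TCP
          port: 443
```

Because only `Egress` is listed in `policyTypes`, this sketch leaves Keycloak's ingress untouched; a complete per-component policy would also need ingress rules. While `emergency-restore-allow-all` is still applied, its allow-all rules continue to take effect alongside this policy, so the restriction only becomes observable once the temporary policy is deleted or on a separate enforcing test cluster.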
