---
name: production-deployment-runbook
description: Step-by-step guide for deploying Meridian to a production Kubernetes cluster, including infrastructure prerequisites, service startup order, health verification, and rollback procedures
triggers:
  - Initial production deployment
  - New environment provisioning
  - Major version upgrade deployment
  - Disaster recovery re-deployment
instructions: Follow infrastructure prerequisites before deploying services. Apply database migrations per service before starting applications. Deploy services in dependency order (platform first, then core, then orchestration). Verify health checks at each stage before proceeding. Use rollback procedures if any stage fails.
---

Production Deployment Guide

When to use this runbook: Initial production deployment, new environment provisioning, major version upgrades, or re-deployment after disaster recovery.

Customization Note: This guide documents Meridian's architecture and deployment order. You must customize infrastructure details (cloud provider, DNS, TLS certificates, backup locations) for your specific environment.

1. Infrastructure Prerequisites

All infrastructure components must be provisioned and healthy before deploying Meridian services.

1.1 CockroachDB Cluster

CockroachDB is the primary database for all Meridian services.

Requirements:

  • CockroachDB v23.1+ (PostgreSQL wire-compatible)
  • Minimum 3-node cluster for production (multi-region recommended)
  • TLS/SSL enabled with valid certificates
  • Authentication enabled (username/password or client certificates)
  • Port 26257 (SQL) and 8080 (Admin UI) accessible from the Kubernetes cluster

Database-per-service architecture -- each service uses a dedicated database:

| Database Name | Service |
| --- | --- |
| meridian_platform | Shared platform (tenant provisioning) |
| meridian_tenant | Tenant service |
| meridian_current_account | Current Account service |
| meridian_financial_accounting | Financial Accounting service |
| meridian_position_keeping | Position Keeping service |
| meridian_payment_order | Payment Order service |
| meridian_party | Party service |
| meridian_reference_data | Reference Data service |
| meridian_market_information | Market Information service |
| meridian_internal_account | Internal Account service |

# Create databases (requires the cockroach CLI and client certs on the host)
cockroach sql --certs-dir=/certs --host=<cockroachdb-host>:26257 <<'SQL'
CREATE DATABASE IF NOT EXISTS meridian_platform;
CREATE DATABASE IF NOT EXISTS meridian_tenant;
CREATE DATABASE IF NOT EXISTS meridian_current_account;
CREATE DATABASE IF NOT EXISTS meridian_financial_accounting;
CREATE DATABASE IF NOT EXISTS meridian_position_keeping;
CREATE DATABASE IF NOT EXISTS meridian_payment_order;
CREATE DATABASE IF NOT EXISTS meridian_party;
CREATE DATABASE IF NOT EXISTS meridian_reference_data;
CREATE DATABASE IF NOT EXISTS meridian_market_information;
CREATE DATABASE IF NOT EXISTS meridian_internal_account;
SQL

CockroachDB limitations (see CLAUDE.md for details):

  • No LISTEN/NOTIFY -- Meridian uses polling and outbox patterns instead
  • No range types (TSTZRANGE) -- services use separate start/end columns
  • Schema changes cannot run inside transactions
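
The range-type limitation is typically worked around with paired timestamp columns plus a CHECK constraint. A minimal sketch (the table and column names here are hypothetical, not Meridian's actual schema); the DDL is printed to stdout so you can pipe it into `cockroach sql`:

```shell
# Print illustrative DDL for the start/end-column pattern that replaces TSTZRANGE.
range_workaround_sql() {
  cat <<'SQL'
CREATE TABLE IF NOT EXISTS rate_periods (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  rate DECIMAL NOT NULL,
  -- two columns stand in for a TSTZRANGE; a CHECK enforces ordering
  valid_from TIMESTAMPTZ NOT NULL,
  valid_to TIMESTAMPTZ,
  CHECK (valid_to IS NULL OR valid_from < valid_to)
);
SQL
}
range_workaround_sql
# Apply with:
# range_workaround_sql | cockroach sql --certs-dir=/certs --host=<cockroachdb-host>:26257
```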

1.2 Apache Kafka Cluster

Kafka provides event streaming for asynchronous workflows and audit event delivery.

Requirements:

  • Apache Kafka 3.9+ with KRaft mode (no ZooKeeper)
  • Minimum 3-broker cluster for production
  • TLS/SASL authentication configured
  • Port 9092 (broker) accessible from the Kubernetes cluster

Required topics (create before deploying services):

# Audit event topics (one per domain service)
kafka-topics.sh --create --bootstrap-server <kafka-host>:9092 \
  --topic audit.events.current-account --partitions 12 --replication-factor 3
kafka-topics.sh --create --bootstrap-server <kafka-host>:9092 \
  --topic audit.events.financial-accounting --partitions 6 --replication-factor 3
kafka-topics.sh --create --bootstrap-server <kafka-host>:9092 \
  --topic audit.events.position-keeping --partitions 9 --replication-factor 3
kafka-topics.sh --create --bootstrap-server <kafka-host>:9092 \
  --topic audit.events.party --partitions 6 --replication-factor 3
kafka-topics.sh --create --bootstrap-server <kafka-host>:9092 \
  --topic audit.events.payment-order --partitions 12 --replication-factor 3
kafka-topics.sh --create --bootstrap-server <kafka-host>:9092 \
  --topic audit.events.tenant --partitions 6 --replication-factor 3

# Dead letter queue for failed audit events
kafka-topics.sh --create --bootstrap-server <kafka-host>:9092 \
  --topic audit.events.dlq --partitions 3 --replication-factor 3

# Position keeping transaction events
kafka-topics.sh --create --bootstrap-server <kafka-host>:9092 \
  --topic position-keeping.transaction-captured.v1 --partitions 12 --replication-factor 3
kafka-topics.sh --create --bootstrap-server <kafka-host>:9092 \
  --topic position-keeping.transaction-amended.v1 --partitions 6 --replication-factor 3
kafka-topics.sh --create --bootstrap-server <kafka-host>:9092 \
  --topic position-keeping.transaction-reconciled.v1 --partitions 6 --replication-factor 3

# Verify topics
kafka-topics.sh --list --bootstrap-server <kafka-host>:9092

Partition counts should be tuned based on expected throughput. Current Account and Payment Order handle the highest transaction volumes.
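
A rough sizing heuristic, assuming roughly 1,000 msgs/s of sustained capacity per partition (that figure is an assumption; benchmark your own brokers before relying on it). Note that Kafka partition counts can only be increased after creation, never decreased:

```shell
# Ceiling-divide target throughput by per-partition capacity to size a topic.
partitions_for() {
  local target_mps=$1 per_partition=${2:-1000}
  echo $(( (target_mps + per_partition - 1) / per_partition ))
}

partitions_for 10500   # -> 11

# To grow an existing topic later:
# kafka-topics.sh --alter --bootstrap-server <kafka-host>:9092 \
#   --topic audit.events.current-account --partitions 16
```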

1.3 Redis

Redis provides optional distributed idempotency and caching.

Requirements:

  • Redis 7+ with authentication enabled
  • TLS recommended for production
  • Port 6379 accessible from the Kubernetes cluster
  • Persistence enabled (appendonly yes)

Services that use Redis:

| Service | Purpose | Required |
| --- | --- | --- |
| Gateway | Tenant slug cache | Optional (degrades gracefully) |
| Tenant | Slug cache | Optional |
| Current Account | Idempotency keys | Recommended for multi-replica |
| Position Keeping | Idempotency keys | Recommended for multi-replica |
| Financial Accounting | Idempotency keys | Recommended for multi-replica |
| Payment Order | Idempotency keys | Recommended for multi-replica |
| Reference Data | Instrument cache | Optional |

Redis is not strictly required. Services degrade gracefully when Redis is unavailable, falling back to per-instance idempotency.
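
A small helper for building the Redis connection string (the `REDIS_URL` shape shown is an assumption; match it to the config keys your services actually read). With TLS enabled, the scheme switches to `rediss://`:

```shell
# Build a Redis connection URL; TLS on by default.
redis_url() {
  local host=$1 password=$2 tls=${3:-true}
  local scheme=redis
  [ "$tls" = true ] && scheme=rediss
  printf '%s://:%s@%s:6379/0\n' "$scheme" "$password" "$host"
}

redis_url redis.production.svc.cluster.local 's3cret'
# -> rediss://:s3cret@redis.production.svc.cluster.local:6379/0

# Liveness check from a debug pod:
# redis-cli -h redis.production.svc.cluster.local -a 's3cret' --tls ping   # expect PONG
```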

1.4 Keycloak (Identity Provider)

Keycloak provides OAuth 2.0 / OpenID Connect authentication.

Requirements:

  • Keycloak 26+ (or any OIDC-compliant provider)
  • Production database backend (not embedded H2)
  • HTTPS enabled with valid TLS certificate
  • JWKS endpoint accessible from the Kubernetes cluster

Keycloak configuration:

  1. Create realm: meridian
  2. Create client: meridian-service (confidential, service account enabled)
  3. Configure JWKS endpoint: https://<keycloak-host>/realms/meridian/protocol/openid-connect/certs
  4. Define roles and scopes per your authorization requirements
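
The JWKS URL in step 3 follows the standard Keycloak realm layout; a small helper to construct it, with a hedged reachability check (the hostname below is a placeholder, and `jq` is assumed to be installed for the optional check):

```shell
# Build the Keycloak JWKS endpoint URL for a given host and realm.
jwks_url() {
  printf 'https://%s/realms/%s/protocol/openid-connect/certs\n' "$1" "$2"
}

jwks_url keycloak.example.com meridian
# -> https://keycloak.example.com/realms/meridian/protocol/openid-connect/certs

# Verify reachability from inside the cluster (lists key IDs on success):
# curl -fsS "$(jwks_url keycloak.example.com meridian)" | jq -r '.keys[].kid'
```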

1.5 Kubernetes Cluster

Requirements:

  • Kubernetes 1.23+
  • kubectl configured with cluster access
  • Kustomize (built into kubectl 1.14+)
  • Container registry accessible from the cluster
  • Ingress controller for external traffic (NGINX, Traefik, etc.)
  • TLS certificates for external endpoints

Namespace setup:

kubectl create namespace production
kubectl label namespace production name=production

1.6 Container Images

Build and push all service images to your container registry:

# Build all service images
# Each Dockerfile is at services/<service>/cmd/Dockerfile
# Images use multi-stage builds with distroless base (~2MB)

REGISTRY="your-registry.io/meridian"
VERSION="1.0.0"
COMMIT=$(git rev-parse --short HEAD)
BUILD_DATE=$(date -u +"%Y-%m-%dT%H:%M:%SZ")

SERVICES=(
  "current-account"
  "financial-accounting"
  "position-keeping"
  "payment-order"
  "party"
  "tenant"
  "gateway"
  "internal-account"
  "market-information"
  "event-router"
)

# Note: reference-data is a library embedded in other services, not a standalone deployment.
# Note: audit-worker does not have a Dockerfile yet -- build via a root-level Dockerfile
# or add one before deploying.

for svc in "${SERVICES[@]}"; do
  docker build \
    --build-arg VERSION="${VERSION}" \
    --build-arg COMMIT="${COMMIT}" \
    --build-arg BUILD_DATE="${BUILD_DATE}" \
    -t "${REGISTRY}/${svc}:${VERSION}" \
    -f "services/${svc}/cmd/Dockerfile" .

  docker push "${REGISTRY}/${svc}:${VERSION}"
done

2. Initial Setup

2.1 Database Schema Migration

Each service manages its own schema using Atlas migrations. Migrations must be applied before starting services.

Migration directory structure:

services/<service>/migrations/     # Service-specific migrations
shared/migrations/                 # Shared infrastructure (audit tables)
services/<service>/atlas/atlas.hcl # Atlas configuration per service
shared/atlas/atlas.hcl             # Shared Atlas configuration

Apply shared migrations first, then per-service migrations:

# 1. Apply shared migrations (audit infrastructure)
atlas migrate apply --env production \
  --config file://shared/atlas/atlas.hcl \
  --url "postgres://user:pass@<cockroachdb-host>:26257/meridian_platform?sslmode=verify-full"

# 2. Apply per-service migrations
SERVICES_WITH_DB=(
  "tenant"
  "party"
  "reference-data"
  "current-account"
  "financial-accounting"
  "position-keeping"
  "payment-order"
  "internal-account"
  "market-information"
)

for svc in "${SERVICES_WITH_DB[@]}"; do
  DB_NAME="meridian_$(echo $svc | tr '-' '_')"
  atlas migrate apply --env production \
    --config "file://services/${svc}/atlas/atlas.hcl" \
    --url "postgres://user:pass@<cockroachdb-host>:26257/${DB_NAME}?sslmode=verify-full"
done

Verify migrations applied:

for svc in "${SERVICES_WITH_DB[@]}"; do
  DB_NAME="meridian_$(echo $svc | tr '-' '_')"
  echo "=== ${svc} ==="
  atlas migrate status --env production \
    --config "file://services/${svc}/atlas/atlas.hcl" \
    --url "postgres://user:pass@<cockroachdb-host>:26257/${DB_NAME}?sslmode=verify-full"
done
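
The `DB_NAME` derivation used in the loops above (service name with hyphens mapped to underscores, prefixed with `meridian_`) can be isolated as a helper so every script uses the same convention:

```shell
# Derive a service's dedicated database name from its service name.
db_name() {
  echo "meridian_$(echo "$1" | tr '-' '_')"
}

db_name current-account   # -> meridian_current_account
db_name tenant            # -> meridian_tenant
```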

2.2 Kubernetes Secrets

Create secrets for each service. Use External Secrets Operator or Sealed Secrets in production -- never commit real credentials to version control.

# Per-service database secrets
# Each service needs a DATABASE_URL pointing to its dedicated database
SERVICES_WITH_DB=(
  "tenant"
  "party"
  "reference-data"
  "current-account"
  "financial-accounting"
  "position-keeping"
  "payment-order"
  "internal-account"
  "market-information"
  "audit-worker"
)

for svc in "${SERVICES_WITH_DB[@]}"; do
  DB_NAME="meridian_$(echo $svc | tr '-' '_')"
  kubectl create secret generic "${svc}-db" \
    --namespace=production \
    --from-literal=DATABASE_URL="postgres://user:pass@cockroachdb:26257/${DB_NAME}?sslmode=verify-full"
done

# Gateway database secret (uses platform database for tenant lookups)
kubectl create secret generic gateway-db \
  --namespace=production \
  --from-literal=DATABASE_URL="postgres://user:pass@cockroachdb:26257/meridian_platform?sslmode=verify-full"

# Audit consumer secrets (one per domain service, each pointing to that service's database)
for svc in current-account financial-accounting position-keeping party payment-order tenant; do
  DB_NAME="meridian_$(echo $svc | tr '-' '_')"
  kubectl create secret generic "${svc}-audit-consumer-db" \
    --namespace=production \
    --from-literal=DATABASE_URL="postgres://user:pass@cockroachdb:26257/${DB_NAME}?sslmode=verify-full"
done

2.3 Kubernetes ConfigMaps

Service configuration is managed via Kustomize overlays. The production overlay at deployments/k8s/overlays/production/kustomization.yaml sets production-appropriate values:

  • log_level=warn (reduce log volume)
  • log_format=json (structured logging for aggregation)
  • metrics_enabled=true
  • ENVIRONMENT=production
  • DEBUG_MODE=false

Additional environment-specific configuration should be added to the production Kustomize overlay.
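
A sketch of what the production overlay might look like; the ConfigMap name (`meridian-config`) and resource layout are assumptions, so match them to the repository's actual `deployments/k8s/overlays/production/kustomization.yaml`:

```yaml
# deployments/k8s/overlays/production/kustomization.yaml (illustrative sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: production
resources:
  - ../../base
configMapGenerator:
  - name: meridian-config   # hypothetical name; use your overlay's actual ConfigMap
    behavior: merge
    literals:
      - log_level=warn
      - log_format=json
      - metrics_enabled=true
      - ENVIRONMENT=production
      - DEBUG_MODE=false
```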

3. Service Deployment Order

Services must be deployed in dependency order. Wait for each tier's health checks to pass before proceeding to the next tier.

Service Dependency Map

flowchart TD
    subgraph Tier1["Tier 1: Platform Services (deploy first)"]
        Tenant["Tenant :50056"]
        Party["Party :50055"]
        RD["ReferenceData :50059"]
    end

    subgraph Tier2["Tier 2: Core Domain Services"]
        MI["MarketInformation :50058"]
        IBA["InternalAccount :50057"]
        PK["PositionKeeping :50053"]
        FA["FinancialAccounting :50052"]
    end

    subgraph Tier3["Tier 3: Orchestration Services"]
        CA["CurrentAccount :50051"]
        PO["PaymentOrder :50054"]
    end

    subgraph Tier4["Tier 4: Edge and Supporting Services"]
        GW["Gateway :8080"]
        AW["AuditWorker :8080"]
        ER["EventRouter :8080"]
        AC["AuditConsumers :8080"]
    end

    Tenant --> Party
    Tenant --> RD
    CA --> Party
    CA --> PK
    CA --> FA
    CA --> IBA
    CA --> RD
    PO --> CA
    GW --> CA
    GW --> PO
    GW --> Party
    GW --> MI
    ER --> PK

Tier 1: Platform Services

These services have no dependencies on other Meridian services. They depend only on CockroachDB.

# Deploy Tenant service
kubectl apply -k deployments/k8s/overlays/production/
# Note: The base kustomization deploys the audit-worker.
# Service-specific kustomizations should be created or adjusted for each service.

# Verify Tenant is healthy before proceeding
kubectl wait --for=condition=Available deployment/tenant -n production --timeout=120s

Deploy in this order:

  1. Tenant -- Manages multi-tenant isolation, provisions schemas for other services
  2. Party -- Party reference data directory (Tenant optionally calls Party to register org parties)
  3. Reference Data -- Instrument definitions (Tenant seeds system instruments during provisioning). Note: Reference Data does not yet have a standalone main.go or Dockerfile. Its handlers are currently embedded in other services. Deploy when a standalone entry point is added.
# Verify Tier 1 health
for svc in tenant party reference-data; do
  echo "=== ${svc} ==="
  kubectl get pods -n production -l app=${svc}
  # gRPC health check via grpcurl (install from https://github.com/fullstorydev/grpcurl)
  # grpcurl -plaintext <pod-ip>:<grpc-port> grpc.health.v1.Health/Check
done

Tier 2: Core Domain Services

These services depend on Tier 1 services and CockroachDB. Some optionally depend on Kafka and Redis.

Deploy in this order:

  1. Market Information (:50058) -- Price benchmarks and market data. No upstream service dependencies.
  2. Internal Account (:50057) -- Counterparty and operational accounts. No upstream service dependencies.
  3. Position Keeping (:50053) -- Financial position tracking. Optionally calls Reference Data for instrument lookup. Publishes transaction events to Kafka.
  4. Financial Accounting (:50052) -- Ledger postings. Consumes transaction events from Position Keeping via Kafka.
# Deploy Tier 2 services
for svc in market-information internal-account position-keeping financial-accounting; do
  kubectl apply -k deployments/k8s/overlays/production/
  kubectl wait --for=condition=Available deployment/${svc} -n production --timeout=120s
done

Tier 3: Orchestration Services

These services orchestrate workflows across multiple Tier 1 and Tier 2 services.

  1. Current Account (:50051) -- The most connected service. Depends on:

    • Party (validate party exists)
    • Position Keeping (create position logs)
    • Financial Accounting (record ledger postings)
    • Internal Account (resolve clearing accounts, optional)
    • Reference Data (fungibility validation, optional)
    • Redis (idempotency, optional)
    • Kafka (outbox worker for audit events, optional)
  2. Payment Order (:50054) -- Payment processing. Depends on:

    • Current Account (initiate liens for fund reservation)
    • Redis (idempotency, optional)
    • Kafka (outbox worker, optional)
# Deploy Tier 3 services
for svc in current-account payment-order; do
  kubectl apply -k deployments/k8s/overlays/production/
  kubectl wait --for=condition=Available deployment/${svc} -n production --timeout=120s
done

Tier 4: Edge and Supporting Services

  1. Gateway (:8080) -- API gateway. Routes external HTTP requests to backend gRPC services. Depends on CockroachDB (tenant lookups) and optionally Redis (tenant slug cache).

  2. Audit Worker (:8080) -- Processes audit outbox entries when Kafka is unavailable (fallback path). Depends on CockroachDB only.

  3. Audit Consumers -- Per-service Kafka consumers that write audit events to audit_log tables. One deployment per domain service.

  4. Event Router (:8080) -- CEL-filtered saga dispatcher. Consumes domain events from Kafka and triggers saga workflows. Also handles platform billing via utilization metering.

# Deploy Gateway
kubectl apply -k deployments/k8s/overlays/production/
kubectl wait --for=condition=Available deployment/gateway -n production --timeout=120s

# Deploy Audit Worker
kubectl apply -k deployments/k8s/base/
kubectl wait --for=condition=Available deployment/audit-worker -n production --timeout=120s

# Deploy Audit Consumers (one per domain service)
for svc in current-account financial-accounting position-keeping party payment-order tenant; do
  kubectl apply -k deployments/k8s/audit-consumer/overlays/${svc}
done

# Deploy Event Router
kubectl apply -k services/event-router/k8s/
kubectl wait --for=condition=Available deployment/event-router -n production --timeout=120s

4. Health Check Verification

4.1 gRPC Health Checks

All gRPC services implement the standard grpc.health.v1.Health service.

# Install grpcurl if not available
# https://github.com/fullstorydev/grpcurl

# Check gRPC service health (from within the cluster)
SERVICES_GRPC=(
  "tenant:50056"
  "party:50055"
  "reference-data:50059"
  "market-information:50058"
  "internal-account:50057"
  "position-keeping:50053"
  "financial-accounting:50052"
  "current-account:50051"
  "payment-order:50054"
)

for entry in "${SERVICES_GRPC[@]}"; do
  SVC="${entry%%:*}"
  PORT="${entry##*:}"
  echo "=== ${SVC} ==="
  # From a debug pod within the cluster:
  grpcurl -plaintext "${SVC}.production.svc.cluster.local:${PORT}" \
    grpc.health.v1.Health/Check
done

Expected output:

{
  "status": "SERVING"
}

4.2 HTTP Health Checks

HTTP services (Gateway, Audit Worker, Audit Consumers) expose health endpoints:

| Endpoint | Purpose |
| --- | --- |
| /health/live | Liveness probe -- is the process alive? |
| /health/ready | Readiness probe -- is the service ready to accept traffic? |
| /health/startup | Startup probe -- has the service finished initializing? |

# Check HTTP service health
HTTP_SERVICES=("gateway" "audit-worker")

for svc in "${HTTP_SERVICES[@]}"; do
  POD=$(kubectl get pod -n production -l app=${svc} -o jsonpath='{.items[0].metadata.name}')
  echo "=== ${svc} ==="
  kubectl exec -n production "${POD}" -- wget -qO- http://localhost:8080/health/ready 2>/dev/null \
    || echo "Health check via exec not available (distroless image)"
  # Alternative: port-forward and curl
  # kubectl port-forward -n production "${POD}" 8080:8080 &
  # curl http://localhost:8080/health/ready
done

4.3 Database Connectivity

Verify each service can reach its database:

# Check pod logs for successful database connection
for svc in tenant party reference-data current-account financial-accounting \
           position-keeping payment-order internal-account market-information; do
  echo "=== ${svc} ==="
  kubectl logs -n production -l app=${svc} --tail=20 | grep -i "database"
done

Expected log message:

{"level":"INFO","msg":"database connection established"}

4.4 Kafka Consumer Lag

Monitor Kafka consumer lag to ensure audit consumers are keeping up:

# Check consumer group lag
kafka-consumer-groups.sh --bootstrap-server <kafka-host>:9092 \
  --describe --all-groups

# Per-service consumer group check
for svc in current-account financial-accounting position-keeping party payment-order tenant; do
  echo "=== ${svc}-audit-consumer ==="
  kafka-consumer-groups.sh --bootstrap-server <kafka-host>:9092 \
    --describe --group "${svc}-audit-consumer"
done

Healthy state: LAG values should be 0 or near 0 under normal load.
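
For alerting, the per-partition output can be reduced to a single worst-case number. A sketch that extracts the maximum LAG from `kafka-consumer-groups.sh --describe` output (LAG in field 6 matches Kafka 3.x output; verify the column order against your version):

```shell
# Read describe output on stdin, print the largest LAG value seen.
max_lag() {
  awk 'NR > 1 && $6 ~ /^[0-9]+$/ { if ($6 > max) max = $6 } END { print max + 0 }'
}

# Example against sample output (illustrative, not live data):
max_lag <<'EOF'
GROUP                          TOPIC                        PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST      CLIENT-ID
current-account-audit-consumer audit.events.current-account 0         1500           1500           0   c1          /10.0.0.1 client-1
current-account-audit-consumer audit.events.current-account 1         1480           1495           15  c2          /10.0.0.2 client-2
EOF
# -> 15

# Live usage:
# kafka-consumer-groups.sh --bootstrap-server <kafka-host>:9092 \
#   --describe --group current-account-audit-consumer | max_lag
```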

4.5 Smoke Test

Run a basic end-to-end test after deployment:

# 1. Register a test tenant via tenantctl
go build -o tenantctl ./cmd/tenantctl
./tenantctl register \
  --id=smoke_test_tenant \
  --name="Smoke Test" \
  --settlement-asset=GBP \
  --service-url=tenant.production.svc.cluster.local:50056

# 2. Verify tenant was created
./tenantctl get smoke_test_tenant \
  --service-url=tenant.production.svc.cluster.local:50056

# 3. Provision default internal accounts via ibactl
go build -o ibactl ./cmd/ibactl
./ibactl provision-defaults smoke_test_tenant \
  --service-url=internal-account.production.svc.cluster.local:50057 \
  --tenant-service-url=tenant.production.svc.cluster.local:50056

# 4. Test Gateway health endpoint (external)
curl -s https://<gateway-external-url>/health/ready

# 5. Clean up test tenant
./tenantctl deprovision smoke_test_tenant --confirm \
  --service-url=tenant.production.svc.cluster.local:50056

5. Rollback Procedure

5.1 Service Rollback

Kubernetes Deployments support rolling back to previous revisions:

# View deployment history
kubectl rollout history deployment/<service-name> -n production

# Rollback to previous version
kubectl rollout undo deployment/<service-name> -n production

# Rollback to specific revision
kubectl rollout undo deployment/<service-name> -n production --to-revision=<N>

# Verify rollback
kubectl rollout status deployment/<service-name> -n production

Rollback order (reverse of deployment order):

  1. Tier 4: Gateway, Audit Consumers, Audit Worker, Event Router
  2. Tier 3: Payment Order, Current Account
  3. Tier 2: Financial Accounting, Position Keeping, Internal Account, Market Information
  4. Tier 1: Reference Data, Party, Tenant

5.2 Database Migration Rollback

Atlas migrations support downgrade operations:

# View migration status
atlas migrate status --env production \
  --config "file://services/<service>/atlas/atlas.hcl" \
  --url "<DATABASE_URL>"

# Rollback last migration
atlas migrate down --env production \
  --config "file://services/<service>/atlas/atlas.hcl" \
  --url "<DATABASE_URL>"

Precautions:

  • Always back up the database before rolling back migrations
  • Some migrations (data migrations, column drops) are not reversible
  • Atlas lint rules (destructive, data_depend, incompatible) catch dangerous changes in CI
  • Test rollback procedures in staging before applying to production
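
The pre-rollback backup can use CockroachDB's `BACKUP DATABASE ... INTO` statement. A sketch that prints the statement (the storage URI is a placeholder; substitute your bucket and auth scheme):

```shell
# Print a BACKUP statement for one service database; pipe into `cockroach sql` to run.
backup_sql() {
  cat <<SQL
BACKUP DATABASE $1 INTO 's3://<backup-bucket>/meridian/$1?AUTH=implicit';
SQL
}

backup_sql meridian_payment_order

# Apply with:
# backup_sql meridian_payment_order | \
#   cockroach sql --certs-dir=/certs --host=<cockroachdb-host>:26257
```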

5.3 Kafka Consumer Offset Reset

If audit consumers need to reprocess events (after a bug fix or data corruption):

# Stop the consumer deployment first
kubectl scale deployment <service>-audit-consumer -n production --replicas=0

# Reset to earliest offset (reprocess all retained messages)
kafka-consumer-groups.sh --bootstrap-server <kafka-host>:9092 \
  --group <service>-audit-consumer \
  --topic audit.events.<service> \
  --reset-offsets --to-earliest --execute

# Or reset to specific timestamp
kafka-consumer-groups.sh --bootstrap-server <kafka-host>:9092 \
  --group <service>-audit-consumer \
  --topic audit.events.<service> \
  --reset-offsets --to-datetime 2025-01-15T00:00:00.000 --execute

# Restart the consumer
kubectl scale deployment <service>-audit-consumer -n production --replicas=<previous-count>

6. Service Port Reference

| Service | gRPC Port | HTTP Port | Metrics Port | Kubernetes Service Name |
| --- | --- | --- | --- | --- |
| Tenant | 50056 | - | 9090 | tenant |
| Party | 50055 | - | 9090 | party |
| Reference Data | 50059 | - | 9090 | reference-data |
| Market Information | 50058 | - | 9090 | market-information |
| Internal Account | 50057 | - | 9090 | internal-account |
| Position Keeping | 50053 | - | 9090 | position-keeping |
| Financial Accounting | 50052 | - | 9090 | financial-accounting |
| Current Account | 50051 | - | 9090 | current-account |
| Payment Order | 50054 | 8080 | 9090 | payment-order |
| Gateway | - | 8080 | 8080 | gateway |
| Audit Worker | - | 8080 | 8080 | audit-worker |
| Event Router | - | 8080 | 8080 | event-router |

All gRPC services use Kubernetes DNS for service discovery: <service-name>.<namespace>.svc.cluster.local:<port>
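
The DNS convention can be captured in one helper so scripts build addresses consistently:

```shell
# Build the in-cluster address for a service: <name>.<namespace>.svc.cluster.local:<port>
svc_addr() {
  printf '%s.%s.svc.cluster.local:%s\n' "$1" "${2:-production}" "$3"
}

svc_addr current-account production 50051
# -> current-account.production.svc.cluster.local:50051
```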

7. Security Checklist

Before accepting production traffic, verify:

  • All containers run as non-root (UID 65532, distroless base)
  • Read-only root filesystem enabled on all pods
  • All Linux capabilities dropped (DROP ALL)
  • Privilege escalation blocked (allowPrivilegeEscalation: false)
  • Network policies applied (restrict ingress/egress per service)
  • TLS enabled on CockroachDB connections (sslmode=verify-full)
  • Kafka SASL/TLS authentication configured
  • Redis authentication enabled (requirepass)
  • Keycloak HTTPS configured with valid certificates
  • Gateway AUTH_ENABLED=true with valid JWKS URL
  • Gateway LOCAL_DEV_MODE=false (enforced by OPA policy in production namespace)
  • Kubernetes secrets managed via External Secrets Operator or Sealed Secrets
  • Pod security standards enforced (restricted profile)
  • Ingress TLS termination configured
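
A rough spot-check for the pod security items, driven by a rendered manifest rather than a live cluster (a simple text match, not a full policy engine; use Pod Security Admission or OPA for real enforcement). Feed it `kubectl get deploy <name> -n production -o yaml`:

```shell
# Report which of the checklist's securityContext fields appear in a manifest on stdin.
check_security() {
  local manifest
  manifest=$(cat)
  local field
  for field in "runAsNonRoot: true" \
               "allowPrivilegeEscalation: false" \
               "readOnlyRootFilesystem: true"; do
    if printf '%s' "$manifest" | grep -q "$field"; then
      echo "ok: $field"
    else
      echo "MISSING: $field"
    fi
  done
}

# Usage:
# kubectl get deploy gateway -n production -o yaml | check_security
```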

Related Documentation