= ADR 0043 - Managed OpenBao Service Implementation
:adr_author: Yannik Dällenbach
:adr_owner: Schedar/bespinian
:adr_reviewers: Schedar
:adr_date: 2025-01-13
:adr_upd_date: 2025-01-13
:adr_status: draft
:adr_tags: service,openbao,secret-management

include::partial$adr-meta.adoc[]

[NOTE]
.Summary
====
This ADR outlines the implementation of a managed OpenBao service on the AppCat platform that provides secret management capabilities to customers.
It builds upon the suggestion of xref:adr/0024-product-choice-for-secret-management.adoc[] to use OpenBao as the secret and PKI management solution.
====

== Context

Following the suggestion in xref:adr/0024-product-choice-for-secret-management.adoc[] to use OpenBao for secret management, we need to implement it as a managed service within the AppCat ecosystem. OpenBao provides:

- Secret storage with a REST API
- Vault API compatibility
- Open-source license with Linux Foundation backing
- Self-hostable deployment model

The service must integrate with the existing AppCat patterns, including:

- Crossplane-based provisioning
- Managed namespace deployment model
- User-workload monitoring integration
- Backup and maintenance automation
- SLA monitoring and reporting

== Requirements

=== Functional Requirements

* **Secret Management**: Store, retrieve, and manage secrets via REST API
* **API Compatibility**: Maintain Vault API compatibility for existing tooling
* **High Availability**: Support clustered deployment for production workloads
* **Authentication**: Integration with OIDC

=== Operational Requirements

* **Backup & Recovery**: Automated backup of secret data
* **Monitoring**: SLA metrics, capacity alerts, and operational dashboards
* **Maintenance**: Automated security updates and version upgrades
* **Scaling**: Horizontal scaling capabilities for high-throughput scenarios
* **Security**: Encryption at rest, TLS in transit, audit logging

== Proposals

=== Proposal 1: Helm Chart with External Storage

Deploy OpenBao using the official Helm chart with an external storage backend (PostgreSQL).

Implementation::

- Use `provider-helm` to deploy the OpenBao Helm chart
- PostgreSQL backend for secret storage (leveraging the existing `VSHNPostgreSQL` service)
- Initialization through composition functions

Advantages::

- Leverages existing PostgreSQL infrastructure
- Official Helm chart provides production-ready deployment
- Separation of compute and storage for better scaling
- Familiar AppCat deployment patterns

Disadvantages::

- Additional complexity in managing external dependencies
- Potential performance overhead with external storage

=== Proposal 2: Helm Chart with Internal Storage

Deploy OpenBao using the official Helm chart with integrated storage using Raft consensus.

Implementation::

- Use `provider-helm` to deploy the OpenBao Helm chart
- Raft storage backend for simplicity
- Built-in clustering for high availability

Advantages::

- Simplified deployment with fewer external dependencies
- Built-in consensus and replication

Disadvantages::

- Raft cluster management overhead

=== Proposal 3: Operator-Based Deployment

Develop or adopt an OpenBao operator for Kubernetes-native management.

Implementation::

- Custom operator following AppCat patterns
- CRDs for OpenBao configuration and policies
- Automated lifecycle management
- Native Kubernetes integration

Advantages::

- Full Kubernetes-native experience
- Automated day-2 operations
- Extensible for future features

Disadvantages::

- High development effort
- Additional operational complexity
- Maintenance burden for custom operator

== Decision

**Proposal 2: Helm Chart with Internal Storage**

We choose to implement OpenBao using the official Helm chart with integrated Raft storage.
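
As a rough sketch of what the composition could hand to `provider-helm`, the values below enable the chart's HA mode with integrated Raft storage and map the claim's sizing parameters onto it. The key names assume that the OpenBao chart keeps the layout of the upstream Vault chart it was forked from. The file paths and the TLS Secret name `openbao-server-tls` are hypothetical. The `retry_join` blocks needed for automatic Raft cluster formation are omitted for brevity.

```yaml
# Hypothetical Helm values for the OpenBao chart (HA mode, integrated Raft storage).
# Key names assume the upstream Vault chart layout; verify against the chart version in use.
server:
  ha:
    enabled: true
    replicas: 3                  # spec.parameters.instances
    raft:
      enabled: true
      setNodeId: true            # use the pod name as the Raft node ID
      config: |
        listener "tcp" {
          address         = "[::]:8200"
          cluster_address = "[::]:8201"
          # Hypothetical mount path of the TLS Secret inside the pod
          tls_cert_file   = "/openbao/userconfig/openbao-server-tls/tls.crt"
          tls_key_file    = "/openbao/userconfig/openbao-server-tls/tls.key"
        }

        storage "raft" {
          path = "/openbao/data"   # assumed data volume mount path
        }

        service_registration "kubernetes" {}
  dataStorage:
    enabled: true
    size: 20Gi                   # spec.parameters.size.disk
    storageClass: null           # spec.parameters.size.storageClass, if set
  resources:
    requests:
      cpu: "2"                   # spec.parameters.size.requests.cpu
      memory: 4Gi                # spec.parameters.size.requests.memory
```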

=== Implementation Details

Storage Backend::

- Primary: Raft consensus storage for built-in clustering
- Backups through the existing AppCat backup mechanisms (K8up)
- Self-contained storage eliminates external dependencies
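
Below is a minimal sketch of the K8up side. It assumes one K8up `Schedule` per instance, with retention mirroring `spec.parameters.backup.retention` of the claim. The namespace, bucket endpoint and Secret names are hypothetical. This sketch does not settle how the Raft snapshot (for example the output of `bao operator raft snapshot save`) is handed to K8up.

```yaml
apiVersion: k8up.io/v1
kind: Schedule
metadata:
  name: openbao-backup
  namespace: vshn-openbao-my-openbao        # hypothetical instance namespace
spec:
  backend:
    repoPasswordSecretRef:                  # restic repository password
      name: k8up-repository-password        # hypothetical Secret
      key: password
    s3:
      endpoint: https://objects.example.com # hypothetical, taken from the ObjectBucket
      bucket: my-openbao-backup             # hypothetical bucket name
      accessKeyIDSecretRef:
        name: backup-bucket-credentials     # hypothetical Secret
        key: AWS_ACCESS_KEY_ID
      secretAccessKeySecretRef:
        name: backup-bucket-credentials
        key: AWS_SECRET_ACCESS_KEY
  backup:
    schedule: "0 2 * * *"                   # spec.parameters.backup.schedule
  prune:
    schedule: "0 4 * * *"                   # hypothetical prune schedule
    retention:                              # spec.parameters.backup.retention
      keepLast: 2
      keepHourly: 2
      keepDaily: 7
      keepWeekly: 4
      keepMonthly: 3
```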

**API Specification:**

The `VSHNOpenBao` CRD follows the AppCat conventions (xref:adr/0016-service-api-design.adoc[]) with parameter groups for service configuration, sizing, backup, monitoring, and maintenance.

```yaml
apiVersion: vshn.appcat.vshn.io/v1
kind: VSHNOpenBao
metadata:
  name: my-openbao
  namespace: my-namespace
spec:
  parameters:
    # Service configuration
    service:
      version: "2.1.0" # OpenBao version (enum of supported versions)
      fqdn: "openbao.example.com" # Fully qualified domain name
      serviceLevel: guaranteed # besteffort or guaranteed
      openBaoSettings:
        # Auto-unseal configuration (optional)
        # Enables automatic unsealing using external key management systems
        # Only one provider can be configured at a time
        autoUnseal:
          awsKmsSecretRef: "" # Reference to a secret containing AWS KMS credentials and configuration
          azureKeyVaultSecretRef: "" # Reference to a secret containing Azure Key Vault credentials and configuration
          gcpKmsSecretRef: "" # Reference to a secret containing GCP Cloud KMS credentials and configuration
          transitSecretRef: "" # Reference to a secret containing connection details to another Vault/OpenBao instance

    # Number of OpenBao instances
    # For guaranteed serviceLevel: must be 3
    # For besteffort serviceLevel: can be 1 or 3
    instances: 3

    # Sizing
    size:
      plan: standard-8 # Resource plan, see the plans table (standard-2, standard-4, standard-8)
      requests:
        cpu: "2"
        memory: "4Gi"
      disk: 20Gi # Raft storage volume size per replica
      storageClass: "" # Optional storage class override

    # Backup and restore configuration (using K8up)
    backup:
      enabled: true
      schedule: "0 2 * * *" # Cron schedule for Raft snapshots
      retention:
        keepLast: 2
        keepHourly: 2
        keepDaily: 7
        keepWeekly: 4
        keepMonthly: 3
    restore:
      claimName: ""
      backupName: ""

    # Maintenance window
    maintenance:
      dayOfWeek: Tuesday # enum: Monday-Sunday
      timeOfDay: "22:00" # HH:MM format in UTC

    # Monitoring
    monitoring:
      alertmanagerConfigRef: ""
      alertmanagerConfigSecretRef: {}
      alertmanagerConfigTemplate: {}
      email: ""

  # Unseal keys and root token secret reference
  # This secret will contain the unseal keys and root token generated during initialization
  writeConnectionSecretToRef:
    name: openbao-unseal-keys
```
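
The comments in the spec above state several constraints: 1 or 3 instances, 3 for `guaranteed`, and at most one auto-unseal provider. The API server could enforce these instead of leaving them to the composition function alone. The fragment below sketches this with CEL validation rules (`x-kubernetes-validations`) in the XRD's OpenAPI schema for `spec.parameters`. It assumes that the Kubernetes and Crossplane versions in use accept CEL rules in XRD schemas. The messages are illustrative.

```yaml
# Hypothetical fragment of the VSHNOpenBao XRD schema for spec.parameters.
parameters:
  type: object
  properties:
    instances:
      type: integer
      enum: [1, 3]                       # only 1 or 3 instances are allowed
    service:
      type: object
      properties:
        serviceLevel:
          type: string
          enum: [besteffort, guaranteed]
        openBaoSettings:
          type: object
          properties:
            autoUnseal:
              type: object
              properties:
                awsKmsSecretRef: {type: string}
                azureKeyVaultSecretRef: {type: string}
                gcpKmsSecretRef: {type: string}
                transitSecretRef: {type: string}
              x-kubernetes-validations:
                # Build a list of "is this provider set?" flags and allow at most one true entry.
                - rule: >-
                    [has(self.awsKmsSecretRef) && self.awsKmsSecretRef != '',
                    has(self.azureKeyVaultSecretRef) && self.azureKeyVaultSecretRef != '',
                    has(self.gcpKmsSecretRef) && self.gcpKmsSecretRef != '',
                    has(self.transitSecretRef) && self.transitSecretRef != ''].filter(b, b).size() <= 1
                  message: "Only one auto-unseal provider can be configured at a time."
  x-kubernetes-validations:
    # The guaranteed service level requires a 3-node Raft cluster.
    - rule: >-
        !has(self.service) || !has(self.service.serviceLevel) ||
        self.service.serviceLevel != 'guaranteed' ||
        (has(self.instances) && self.instances == 3)
      message: "serviceLevel 'guaranteed' requires instances: 3."
```

The `besteffort` restriction for customer-provided auto-unseal providers could be added the same way, as a further rule on `spec.parameters`.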

**Unseal Keys Secret:**

The `writeConnectionSecretToRef` secret contains the unseal keys and the root token:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: openbao-unseal-keys
data:
  UNSEAL_KEY_1: <base64-encoded-key>
  UNSEAL_KEY_2: <base64-encoded-key>
  UNSEAL_KEY_3: <base64-encoded-key>
  UNSEAL_KEY_4: <base64-encoded-key>
  UNSEAL_KEY_5: <base64-encoded-key>
  ROOT_TOKEN: <base64-encoded-root-token>
```

**Auto-unseal:**

Auto-unseal lets OpenBao unseal itself without manual intervention by delegating the protection of its root key to an external key management system. This is essential for automated recovery after pod restarts and reduces the operational burden.

By default, OpenBao instances are configured to auto-unseal against a central, internal, VSHN-managed Vault or OpenBao instance.
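
Auto-unsealing against another OpenBao or Vault instance uses the transit seal, which asks the central server to encrypt and decrypt the instance's root key with a named transit key. The sketch below shows the seal stanza the composition might render into the server configuration, assuming one transit key per customer instance. The address, key name and token placeholder are hypothetical. The token needs a policy that grants `update` on `transit/encrypt/<key_name>` and `transit/decrypt/<key_name>`.

```yaml
# Hypothetical excerpt of the Helm values: only the seal stanza of the server
# configuration is shown; the listener and storage stanzas are omitted.
server:
  ha:
    raft:
      config: |
        seal "transit" {
          address    = "https://unseal.vshn.example:8200"  # hypothetical central instance
          mount_path = "transit/"
          key_name   = "my-openbao"                        # hypothetical per-instance key
          token      = "<token with encrypt/decrypt rights on this key>"
        }
```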

WARNING: If a customer configures their own auto-unseal provider, only the `besteffort` service level is available, because the availability of a customer-managed key management system is outside VSHN's control.

Supported auto-unseal providers:

AWS KMS:: Configure using `awsKmsSecretRef`, pointing to a secret containing AWS credentials and the KMS key configuration
Azure Key Vault:: Configure using `azureKeyVaultSecretRef`, pointing to a secret containing Azure credentials and Key Vault details
GCP Cloud KMS:: Configure using `gcpKmsSecretRef`, pointing to a secret containing GCP credentials and the Cloud KMS configuration
Transit (Vault/OpenBao):: Configure using `transitSecretRef`, pointing to a secret containing connection details to another Vault/OpenBao instance

Each referenced secret must contain the credentials and configuration required by the respective provider. When auto-unseal is configured, OpenBao unseals automatically after restarts without requiring the unseal keys from `writeConnectionSecretToRef`.

If no auto-unseal mechanism is active (neither the VSHN-managed default nor a customer-provided one), every restarted pod stays sealed until it is unsealed manually with the unseal keys.

Example AWS KMS auto-unseal secret:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: openbao-awskms-config
type: Opaque
stringData:
  region: "us-east-1"
  access_key: "AKIAIOSFODNN7EXAMPLE"
  secret_key: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
  kms_key_id: "19ec80b0-dfdd-4d97-8164-c6examplekey"
  endpoint: "https://vpce-0e1bb1852241f8cc6-pzi0do8n.kms.us-east-1.vpce.amazonaws.com"
```
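
For illustration, a claim that opts into customer-managed AWS KMS auto-unseal might reference the secret above as shown below. The example assumes that the `*SecretRef` fields take the name of a Secret in the claim's namespace, which this ADR does not yet specify. It sets `besteffort` because a customer-configured provider rules out the guaranteed service level.

```yaml
apiVersion: vshn.appcat.vshn.io/v1
kind: VSHNOpenBao
metadata:
  name: my-openbao
  namespace: my-namespace
spec:
  parameters:
    service:
      serviceLevel: besteffort   # customer-managed auto-unseal is besteffort only
      openBaoSettings:
        autoUnseal:
          # Assumed to be the name of a Secret in my-namespace
          awsKmsSecretRef: openbao-awskms-config
    instances: 3
  writeConnectionSecretToRef:
    name: openbao-unseal-keys
```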

**Service Levels:**

besteffort::
- 1 or 3 instances
- Standard resource guarantees
- Best-effort availability

guaranteed::
- Requires 3 instances (HA deployment)
- Resource guarantees with pod anti-affinity
- Higher availability SLA
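
Three is the smallest Raft cluster that survives the loss of a node. Raft needs a majority of floor(n/2) + 1 voters, so three nodes keep a quorum of two after losing one, while two nodes would lose quorum. The pod anti-affinity of the guaranteed level ensures that the failure of a single Kubernetes node takes down at most one replica. Below is a sketch of the corresponding Helm values. It assumes that the chart accepts `server.affinity` as a YAML map, as the upstream Vault chart does, and that its server pods carry the labels shown.

```yaml
# Hypothetical Helm values fragment for serviceLevel: guaranteed
server:
  ha:
    replicas: 3
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app.kubernetes.io/name: openbao         # assumed chart pod labels
              app.kubernetes.io/instance: my-openbao
              component: server
          topologyKey: kubernetes.io/hostname         # at most one replica per node
```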

**Plans:**

By default, the following plans are available on every cluster:

[cols="25a,15,15,15", options="header"]
|===
| Plan | CPU | Memory | Disk
| standard-2 | 500m | 2Gi | 16Gi
| standard-4 | 1 | 4Gi | 16Gi
| standard-8 | 2 | 8Gi | 16Gi
|===

Key Components::

1. **OpenBao Cluster**: 3-node HA deployment with Raft consensus (a single node for `besteffort` instances with `instances: 1`)
2. **Raft Storage**: Built-in distributed storage backend
3. **Backup Storage**: `ObjectBucket` for Raft snapshots using K8up
4. **Monitoring**: Custom SLI exporter and Prometheus integration
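
Like Vault, OpenBao exposes Prometheus metrics at `/v1/sys/metrics?format=prometheus` once a `telemetry` stanza with `prometheus_retention_time` is configured. Scraping without a token also requires `unauthenticated_metrics_access = true` in the listener's `telemetry` block. A sketch of a matching `ServiceMonitor` for user-workload monitoring follows. The namespace, Service labels, port name and CA Secret are assumptions.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: openbao
  namespace: vshn-openbao-my-openbao          # hypothetical instance namespace
spec:
  selector:
    # Assumed Service labels; the selector should match exactly one Service
    # per set of pods to avoid duplicate scrapes.
    matchLabels:
      app.kubernetes.io/name: openbao
      app.kubernetes.io/instance: my-openbao
  endpoints:
    - port: http                               # assumed name of the 8200 service port
      scheme: https
      path: /v1/sys/metrics
      params:
        format:
          - prometheus
      interval: 30s
      tlsConfig:
        serverName: my-openbao.vshn-openbao-my-openbao.svc
        ca:
          secret:
            name: openbao-server-tls           # hypothetical Secret holding the CA
            key: ca.crt
```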

Security Model::

- TLS encryption for all communications
- Access control policies managed through OpenBao
- Audit logging to persistent storage
- Auto-unseal configuration so that OpenBao starts and restarts without manual unsealing
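
This ADR does not prescribe a tool for issuing the server certificates. If cert-manager is used, a per-instance `Certificate` could cover both the public FQDN and the in-cluster names used for Raft traffic, as sketched below. The issuer, namespace and Service names are hypothetical. Like Vault, OpenBao reloads listener certificates on `SIGHUP`, so a renewal still has to be followed by a reload or a rolling restart.

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: openbao-server-tls
  namespace: vshn-openbao-my-openbao        # hypothetical instance namespace
spec:
  secretName: openbao-server-tls            # Secret mounted into the OpenBao pods
  duration: 2160h                           # 90 days
  renewBefore: 360h                         # renew 15 days before expiry
  dnsNames:
    - openbao.example.com                   # spec.parameters.service.fqdn
    - my-openbao.vshn-openbao-my-openbao.svc
    # Assumed headless Service used by the chart for per-pod Raft addresses
    - "*.my-openbao-internal.vshn-openbao-my-openbao.svc"
  usages:
    - server auth
    - client auth
  issuerRef:
    name: openbao-ca-issuer                 # hypothetical per-instance CA issuer
    kind: Issuer
```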

== Consequences

Positive::

- Simplified deployment with fewer external dependencies
- Built-in consensus and replication reduces operational complexity
- Self-contained backup mechanisms using Raft snapshots
- Leverages official OpenBao Helm chart for production readiness
- Eliminates external storage dependency management

Negative::

- Raft cluster management requires specialized knowledge
- Limited to OpenBao's built-in storage capabilities
- Potential storage scaling limitations compared to external databases
- No feature parity with HashiCorp Vault Enterprise

Operational Impact::

- Simplified service deployment with reduced external dependencies
- Raft snapshot management and restoration procedures
- Need for OpenBao and Raft consensus expertise in operations team
- Integration testing with existing AppCat services
- TLS certificate lifecycle management (renewal, rotation)
- Auto-unseal configuration and cluster bootstrap management
- Raft cluster health monitoring and node management
- Audit log management and compliance reporting
- ServiceMonitor configuration for Prometheus integration
- Snapshot-based backup validation and testing

Customer Benefits::

- Self-hosted alternative to cloud secret management services
- Vault API compatibility for existing applications and tooling
- Compliance with data sovereignty requirements