| 1 | +# SAML SP Key Rotation Runbook |
| 2 | + |
| 3 | +Zero-downtime rotation of the SAML Service Provider signing (and optional encryption) key. |
| 4 | + |
| 5 | +--- |
| 6 | + |
| 7 | +## Overview |
| 8 | + |
| 9 | +GoTrue advertises its SP public key inside SAML metadata. Identity Providers (IdPs) cache this |
| 10 | +metadata and use the embedded certificate to verify SP-signed AuthnRequests and (when encryption |
| 11 | +is enabled) to encrypt assertions sent back to the SP. |
| 12 | + |
| 13 | +A rotation therefore has two concerns: |
| 14 | + |
| 15 | +1. **Signing** — IdPs must trust the new certificate before GoTrue starts signing with it. |
| 16 | +2. **Encryption** (only when `GOTRUE_SAML_ALLOW_ENCRYPTED_ASSERTIONS=true`) — GoTrue must be |
| 17 | + able to decrypt assertions that were encrypted with the *old* certificate while the IdP's |
| 18 | + cache still points to it. |
| 19 | + |
| 20 | +Both concerns are handled automatically once you follow the steps below. |
| 21 | + |
| 22 | +--- |
| 23 | + |
| 24 | +## Prerequisites |
| 25 | + |
| 26 | +- Access to the GoTrue environment variables / secrets store. |
| 27 | +- Ability to trigger a rolling restart or redeploy of GoTrue. |
| 28 | +- `openssl` available locally (or equivalent). |
| 29 | + |
| 30 | +--- |
| 31 | + |
| 32 | +## Step 1 — Generate the new key |
| 33 | + |
| 34 | +```bash |
| 35 | +# Produces a PKCS#1 DER key encoded as standard Base64 (no line breaks). On OpenSSL 3.x, add -traditional to `openssl rsa`, which otherwise writes PKCS#8. |
| 36 | +openssl genrsa 2048 | openssl rsa -outform DER | base64 | tr -d '\n' |
| 37 | +``` |
| 38 | + |
| 39 | +Store the output somewhere safe (secret manager, vault). This is the **new key** value. |
| 40 | + |
| 41 | +> **Requirement:** RSA 2048 or larger, public exponent 65537 (the `openssl genrsa` default). |
| 42 | + |
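
Before storing the value, it can help to confirm that it decodes to a key meeting the requirement above. The following is a minimal sketch, assuming only that `openssl` is on the `PATH`; `check_saml_key` is a hypothetical helper name, not something GoTrue ships, and it checks key size and exponent only, not the PKCS#1 vs PKCS#8 container.

```bash
#!/usr/bin/env bash
# check_saml_key BASE64_DER (hypothetical helper)
# Succeeds if the argument decodes to an RSA private key of at least 2048 bits
# with public exponent 65537. Prints nothing; use the exit status.
check_saml_key() {
  local der text modulus bits
  der="$(mktemp)" || return 1
  # openssl's Base64 decoder sidesteps the GNU/BSD `base64 -d` vs `-D` difference.
  printf '%s\n' "$1" | openssl base64 -d -A > "$der" 2>/dev/null
  if ! text="$(openssl rsa -inform DER -in "$der" -noout -text 2>/dev/null)"; then
    rm -f "$der"
    return 1
  fi
  modulus="$(openssl rsa -inform DER -in "$der" -noout -modulus 2>/dev/null)"
  rm -f "$der"
  # The modulus is printed as "Modulus=<hex>"; each hex digit carries 4 bits.
  modulus="${modulus#Modulus=}"
  bits=$(( ${#modulus} * 4 ))
  [ "$bits" -ge 2048 ] && printf '%s\n' "$text" | grep -q 'publicExponent: 65537 '
}

# Example: check_saml_key "$NEW_KEY" && echo "key looks usable"
```

The temporary file holds private key material for a moment, so run this only on a trusted machine.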
| 43 | +--- |
| 44 | + |
| 45 | +## Step 2 — Announce the new certificate (dual-key window) |
| 46 | + |
| 47 | +Set the new key as the *next* key **without** touching the primary key: |
| 48 | + |
| 49 | +``` |
| 50 | +GOTRUE_SAML_PRIVATE_KEY=<current key — unchanged> |
| 51 | +GOTRUE_SAML_PRIVATE_KEY_NEXT=<new key from Step 1> |
| 52 | +``` |
| 53 | + |
| 54 | +Redeploy / restart GoTrue. |
| 55 | + |
| 56 | +**What happens:** |
| 57 | + |
| 58 | +- Both certificates appear in SP metadata under `<md:KeyDescriptor use="signing">`. |
| 59 | +- The primary certificate remains first, so IdPs that already trust it continue to work. |
| 60 | +- `Cache-Control` drops to `max-age=60` and the XML `cacheDuration` is set to `PT1H` so IdPs |
| 61 | + re-fetch metadata sooner. |
| 62 | +- The `/settings` endpoint returns `"saml_private_key_next_configured": true`. |
| 63 | +- If encrypted assertions are enabled, both certificates also appear as `use="encryption"` |
| 64 | + descriptors, and GoTrue will automatically retry decryption with the next (new) key if |
| 65 | + decryption with the primary key fails. |
| 66 | + |
| 67 | +**Verify:** |
| 68 | + |
| 69 | +```bash |
| 70 | +curl -s https://<your-domain>/auth/v1/sso/saml/metadata \ |
| 71 | + | xmllint --xpath 'count(//*[local-name()="KeyDescriptor"][@use="signing"])' - |
| 72 | +# local-name() is used because xmllint --xpath cannot resolve the md: prefix. |
| 73 | +# Expected: 2 |
| 74 | +``` |
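
The `Cache-Control` change listed under "What happens" can be confirmed from the response headers as well. This is a minimal sketch, assuming the metadata endpoint answers a plain GET and includes a `max-age` directive; `max_age_from_headers` is a hypothetical helper name, not part of GoTrue.

```bash
#!/usr/bin/env bash
# max_age_from_headers (hypothetical helper)
# Reads raw HTTP response headers on stdin and prints the max-age value of the
# first Cache-Control header found. Prints nothing if there is none.
max_age_from_headers() {
  tr -d '\r' \
    | grep -i '^cache-control:' \
    | sed -n 's/.*max-age=\([0-9][0-9]*\).*/\1/p' \
    | head -n 1
}

# Example (placeholder domain, as elsewhere in this runbook):
#   curl -s -D - -o /dev/null 'https://<your-domain>/auth/v1/sso/saml/metadata' \
#     | max_age_from_headers
```

Per this runbook the value should be `60` during the dual-key window and `600` outside it.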
| 75 | + |
| 76 | +--- |
| 77 | + |
| 78 | +## Step 3 — Wait for IdP caches to drain |
| 79 | + |
| 80 | +IdPs must re-fetch metadata and import the new certificate before you promote it. The safe |
| 81 | +window is determined by the *largest* cache TTL among your IdPs. |
| 82 | + |
| 83 | +**Minimum wait:** 1 hour (the `cacheDuration=PT1H` advertised in metadata). |
| 84 | + |
| 85 | +For IdPs with longer cache windows or manual metadata import workflows, trigger a metadata |
| 86 | +refresh in their admin console before proceeding, or contact the IdP admin to confirm the new |
| 87 | +certificate is imported. |
| 88 | + |
| 89 | +Confirm the new certificate is trusted by performing a test login with an affected IdP if |
| 90 | +possible. |
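
The wait rule in this step amounts to simple arithmetic: the earliest promotion time is the dual-key deploy time plus the larger of one hour and the longest IdP cache TTL. Here is a sketch under that reading; `earliest_promotion_epoch` is a hypothetical helper, and the TTL values are ones you would have to collect from your own IdPs.

```bash
#!/usr/bin/env bash
# earliest_promotion_epoch DEPLOY_EPOCH [IDP_TTL_SECONDS...] (hypothetical helper)
# Prints the Unix timestamp before which Step 4 should not be attempted:
# deploy time + max(3600, largest IdP cache TTL in seconds).
earliest_promotion_epoch() {
  local deployed="$1" wait=3600 ttl
  shift
  for ttl in "$@"; do
    if [ "$ttl" -gt "$wait" ]; then
      wait="$ttl"
    fi
  done
  printf '%s\n' "$(( deployed + wait ))"
}

# Example: dual-key deploy at epoch 1700000000, IdP TTLs of 10 minutes and 6 hours.
#   earliest_promotion_epoch 1700000000 600 21600
```

IdPs with manual import workflows have no TTL to plug in here; for those, the confirmation from the IdP admin described above is still required.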
| 91 | + |
| 92 | +--- |
| 93 | + |
| 94 | +## Step 4 — Promote the new key |
| 95 | + |
| 96 | +Move the new key into `GOTRUE_SAML_PRIVATE_KEY` and clear `GOTRUE_SAML_PRIVATE_KEY_NEXT`: |
| 97 | + |
| 98 | +``` |
| 99 | +GOTRUE_SAML_PRIVATE_KEY=<new key from Step 1> |
| 100 | +GOTRUE_SAML_PRIVATE_KEY_NEXT= # remove / clear |
| 101 | +``` |
| 102 | + |
| 103 | +Redeploy / restart GoTrue. |
| 104 | + |
| 105 | +**What happens:** |
| 106 | + |
| 107 | +- Metadata now advertises only the new certificate. |
| 108 | +- `Cache-Control` returns to `max-age=600`. |
| 109 | +- Signing switches to the new key immediately. |
| 110 | +- If encrypted assertions are enabled, GoTrue no longer has a fallback decryption key. The old |
| 111 | + key is no longer configured, so assertions encrypted with the old certificate fail to decrypt. |
| 112 | + |
| 113 | +**Verify:** |
| 114 | + |
| 115 | +```bash |
| 116 | +curl -s https://<your-domain>/auth/v1/sso/saml/metadata \ |
| 117 | + | xmllint --xpath 'count(//*[local-name()="KeyDescriptor"][@use="signing"])' - |
| 118 | +# local-name() is used because xmllint --xpath cannot resolve the md: prefix. |
| 119 | +# Expected: 1 |
| 120 | + |
| 121 | +curl -s https://<your-domain>/auth/v1/settings \ |
| 122 | + | jq '.saml_private_key_next_configured' |
| 123 | +# Expected: false |
| 124 | +``` |
| 125 | + |
| 126 | +Perform a test login to confirm end-to-end flow. |
| 127 | + |
| 128 | +--- |
| 129 | + |
| 130 | +## Encrypted assertions — additional notes |
| 131 | + |
| 132 | +When `GOTRUE_SAML_ALLOW_ENCRYPTED_ASSERTIONS=true`: |
| 133 | + |
| 134 | +- During the dual-key window (Step 2), GoTrue accepts assertions encrypted with **either** the |
| 135 | + primary (old) or the next (new) certificate. No action needed. |
| 136 | +- After Step 4 the old key is no longer configured, so GoTrue cannot decrypt assertions that |
| 137 | + are encrypted with the old certificate. This is only safe if every IdP switched to the new |
| 138 | + certificate during the wait in Step 3. If any IdP still sends assertions encrypted with |
| 139 | + the old certificate after promotion, those assertions will fail. Contact the IdP admin to |
| 140 | + force a metadata refresh. |
| 141 | + |
| 142 | +--- |
| 143 | + |
| 144 | +## Rollback |
| 145 | + |
| 146 | +| Phase | How to rollback | |
| 147 | +|-------|----------------| |
| 148 | +| After Step 2 (dual-key deployed, not yet promoted) | Clear `GOTRUE_SAML_PRIVATE_KEY_NEXT` and redeploy. No key material was changed at IdPs. | |
| 149 | +| After Step 4 (new key promoted) | Restore the old key to `GOTRUE_SAML_PRIVATE_KEY`, set the new key in `GOTRUE_SAML_PRIVATE_KEY_NEXT`, redeploy. You are back to the dual-key window. Wait for IdPs to re-import the old certificate before relying on it. | |
| 150 | + |
| 151 | +> Avoid skipping the dual-key window. Promoting a new key before IdPs have cached the new |
| 152 | +> certificate will break SP-initiated flows for the cache window duration. |
| 153 | + |
| 154 | +--- |
| 155 | + |
| 156 | +## Quick reference |
| 157 | + |
| 158 | +| Variable | Purpose | |
| 159 | +|----------|---------| |
| 160 | +| `GOTRUE_SAML_PRIVATE_KEY` | Active signing (and decryption) key. PKCS#1 DER, Base64-encoded. | |
| 161 | +| `GOTRUE_SAML_PRIVATE_KEY_NEXT` | Incoming key during rotation. Advertised in metadata; used as decryption fallback. Clear after promotion. | |
| 162 | +| `GOTRUE_SAML_ALLOW_ENCRYPTED_ASSERTIONS` | Enable encrypted assertion support. Both keys appear as `use="encryption"` descriptors when rotation is active. | |