Commit 5b95ff8

feat(saml): support zero-downtime SP key rotation (#2575)
## What kind of change does this PR introduce?

Feature

## Summary

Adds `GOTRUE_SAML_PRIVATE_KEY_NEXT` to support safe rotation of the SAML SP signing key without interrupting active IdP integrations.

- Config: PrivateKeyNext parsed, validated (same rules as primary), and derived into RSAPrivateKeyNext / CertificateNext; explicit nil-reset guards against envconfig zero-value allocation
- Metadata: next cert injected as a second KeyDescriptor; Cache-Control drops to max-age=60 and XML cacheDuration is set to PT1H during the rotation window
- Encrypted assertions: when AllowEncryptedAssertions=true, a failed ParseResponse is retried with the old key in primary position; original error returned if both fail
- Settings: saml_private_key_next_configured boolean added to /settings
- Runbook: docs/saml_key_rotation.md covers key generation, the dual-key window, promotion, rollback, and encrypted assertion considerations

1 parent 4536aa9 commit 5b95ff8

10 files changed

Lines changed: 623 additions & 48 deletions

docs/saml_key_rotation.md

Lines changed: 162 additions & 0 deletions
@@ -0,0 +1,162 @@
# SAML SP Key Rotation Runbook

Zero-downtime rotation of the SAML Service Provider signing (and optional encryption) key.

---

## Overview

GoTrue advertises its SP public key inside SAML metadata. Identity Providers (IdPs) cache this
metadata and use the embedded certificate to verify SP-signed AuthnRequests and (when encryption
is enabled) to encrypt assertions sent back to the SP.

A rotation therefore has two concerns:

1. **Signing**: IdPs must trust the new certificate before GoTrue starts signing with it.
2. **Encryption** (only when `GOTRUE_SAML_ALLOW_ENCRYPTED_ASSERTIONS=true`): GoTrue must be
   able to decrypt assertions that were encrypted with the *old* certificate while the IdP's
   cache still points to it.

Both concerns are handled automatically once you follow the steps below.

---

## Prerequisites

- Access to the GoTrue environment variables / secrets store.
- Ability to trigger a rolling restart or redeploy of GoTrue.
- `openssl` available locally (or equivalent).

---

## Step 1: Generate the new key

```bash
# Produces a PKCS#1 DER key encoded as standard Base64 (no line breaks).
# On OpenSSL 3.x, `openssl rsa` may write PKCS#8 by default; if GoTrue rejects
# the key, add `-traditional` to the `openssl rsa` command to force PKCS#1.
openssl genrsa 2048 | openssl rsa -outform DER | base64 | tr -d '\n'
```

Store the output somewhere safe (secret manager, vault). This is the **new key** value.

> **Requirement:** RSA 2048 or larger, public exponent 65537 (the `openssl genrsa` default).

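
Where `openssl` is not available, the same value can be produced and checked with the Go standard library. The following is a minimal sketch, not part of GoTrue: `generateSAMLKey` and `checkSAMLKey` are hypothetical helper names, and the checks only mirror the requirement stated above (standard Base64, PKCS#1 DER, RSA 2048 or larger, exponent 65537). GoTrue's own validation may apply further rules.

```go
package main

import (
	"crypto/rand"
	"crypto/rsa"
	"crypto/x509"
	"encoding/base64"
	"errors"
	"fmt"
)

// generateSAMLKey (hypothetical helper) returns a fresh RSA key as
// Base64-encoded PKCS#1 DER, the format the runbook asks for.
func generateSAMLKey(bits int) (string, error) {
	key, err := rsa.GenerateKey(rand.Reader, bits)
	if err != nil {
		return "", err
	}
	der := x509.MarshalPKCS1PrivateKey(key)
	return base64.StdEncoding.EncodeToString(der), nil
}

// checkSAMLKey (hypothetical helper) decodes a candidate value and checks it
// against the requirement stated in the runbook.
func checkSAMLKey(encoded string) error {
	der, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return fmt.Errorf("not standard Base64: %w", err)
	}
	key, err := x509.ParsePKCS1PrivateKey(der)
	if err != nil {
		return fmt.Errorf("not PKCS#1 DER: %w", err)
	}
	if key.N.BitLen() < 2048 {
		return errors.New("RSA key is smaller than 2048 bits")
	}
	if key.E != 65537 {
		return errors.New("public exponent is not 65537")
	}
	return nil
}

func main() {
	encoded, err := generateSAMLKey(2048)
	if err != nil {
		panic(err)
	}
	if err := checkSAMLKey(encoded); err != nil {
		panic(err)
	}
	// Treat this output as a secret: it is the private key itself.
	fmt.Println(encoded)
}
```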
---

## Step 2: Announce the new certificate (dual-key window)

Set the new key as the *next* key **without** touching the primary key:

```
GOTRUE_SAML_PRIVATE_KEY=<current key, unchanged>
GOTRUE_SAML_PRIVATE_KEY_NEXT=<new key from Step 1>
```

Redeploy / restart GoTrue.

**What happens:**

- Both certificates appear in SP metadata under `<md:KeyDescriptor use="signing">`.
- The primary certificate remains first, so IdPs that already trust it continue to work.
- `Cache-Control` drops to `max-age=60` and the XML `cacheDuration` is set to `PT1H` so IdPs
  re-fetch metadata sooner.
- The `/settings` endpoint returns `"saml_private_key_next_configured": true`.
- If encrypted assertions are enabled, both certificates also appear as `use="encryption"`
  descriptors, and GoTrue automatically retries decryption with the next key if decryption
  with the primary key fails.

**Verify:**

```bash
# xmllint --xpath cannot resolve the md: prefix, so match on local-name() instead.
curl -s https://<your-domain>/auth/v1/sso/saml/metadata \
  | xmllint --xpath 'count(//*[local-name()="KeyDescriptor"][@use="signing"])' -
# Expected: 2
```

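
The same check can be scripted without `xmllint`. The sketch below counts `KeyDescriptor` elements by local name with Go's `encoding/xml`, so it does not depend on the namespace prefix used in the metadata. `countKeyDescriptors` and `sampleMetadata` are illustrative names; the sample document is a trimmed, hypothetical stand-in for real SP metadata with the certificate bodies omitted.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// sampleMetadata is a trimmed, hypothetical SP metadata document as it might
// look during the dual-key window (certificate bodies omitted).
const sampleMetadata = `<md:EntityDescriptor xmlns:md="urn:oasis:names:tc:SAML:2.0:metadata" entityID="https://example.com/sso/saml/metadata">
  <md:SPSSODescriptor protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol">
    <md:KeyDescriptor use="signing"/>
    <md:KeyDescriptor use="signing"/>
    <md:KeyDescriptor use="encryption"/>
  </md:SPSSODescriptor>
</md:EntityDescriptor>`

// countKeyDescriptors counts KeyDescriptor elements whose "use" attribute
// equals use, regardless of the namespace prefix chosen by the serializer.
func countKeyDescriptors(metadataXML, use string) (int, error) {
	dec := xml.NewDecoder(strings.NewReader(metadataXML))
	count := 0
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return count, nil
		}
		if err != nil {
			return 0, err
		}
		start, ok := tok.(xml.StartElement)
		if !ok || start.Name.Local != "KeyDescriptor" {
			continue
		}
		for _, attr := range start.Attr {
			if attr.Name.Local == "use" && attr.Value == use {
				count++
			}
		}
	}
}

func main() {
	n, err := countKeyDescriptors(sampleMetadata, "signing")
	if err != nil {
		panic(err)
	}
	fmt.Println(n) // 2 for the sample document above
}
```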
---

## Step 3: Wait for IdP caches to drain

IdPs must re-fetch metadata and import the new certificate before you promote it. The safe
window is determined by the *largest* cache TTL among your IdPs.

**Minimum wait:** 1 hour (the `cacheDuration=PT1H` advertised in metadata).

For IdPs with longer cache windows or manual metadata import workflows, trigger a metadata
refresh in their admin console before proceeding, or contact the IdP admin to confirm the new
certificate is imported.

If possible, perform a test login with an affected IdP to confirm that the dual-key metadata
has not disrupted the integration. GoTrue still signs with the primary key at this stage, so
a successful login alone does not prove that the IdP has imported the new certificate.

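
The wait rule above amounts to taking the larger of the advertised one-hour `cacheDuration` and the longest cache TTL among the IdPs. A small sketch of that arithmetic follows; `drainWindow` is a hypothetical helper and `exampleTTLs` holds made-up values, since real TTLs have to come from each IdP's configuration or administrator.

```go
package main

import (
	"fmt"
	"time"
)

// advertisedCacheDuration mirrors the cacheDuration=PT1H value from the runbook.
const advertisedCacheDuration = time.Hour

// exampleTTLs are hypothetical per-IdP metadata cache lifetimes.
var exampleTTLs = []time.Duration{30 * time.Minute, 24 * time.Hour, 4 * time.Hour}

// drainWindow returns how long to wait before promoting the next key: the
// largest IdP cache TTL, but never less than the advertised cacheDuration.
func drainWindow(idpCacheTTLs []time.Duration) time.Duration {
	wait := advertisedCacheDuration
	for _, ttl := range idpCacheTTLs {
		if ttl > wait {
			wait = ttl
		}
	}
	return wait
}

func main() {
	fmt.Println(drainWindow(exampleTTLs)) // 24h0m0s for the example values
}
```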
---

## Step 4: Promote the new key

Move the new key into `GOTRUE_SAML_PRIVATE_KEY` and clear `GOTRUE_SAML_PRIVATE_KEY_NEXT`:

```
GOTRUE_SAML_PRIVATE_KEY=<new key from Step 1>
GOTRUE_SAML_PRIVATE_KEY_NEXT=           # remove / clear
```

Redeploy / restart GoTrue.

**What happens:**

- Metadata now advertises only the new certificate.
- `Cache-Control` returns to `max-age=600`.
- Signing switches to the new key immediately.
- If encrypted assertions are enabled, GoTrue no longer attempts the fallback decryption with
  the old key (it is no longer configured).

**Verify:**

```bash
# xmllint --xpath cannot resolve the md: prefix, so match on local-name() instead.
curl -s https://<your-domain>/auth/v1/sso/saml/metadata \
  | xmllint --xpath 'count(//*[local-name()="KeyDescriptor"][@use="signing"])' -
# Expected: 1

curl -s https://<your-domain>/auth/v1/settings \
  | jq '.saml_private_key_next_configured'
# Expected: false
```

Perform a test login to confirm end-to-end flow.

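
The two verification signals (the number of signing descriptors and the `/settings` flag) can be combined into a single check. The sketch below assumes only the `saml_private_key_next_configured` field documented above. `nextKeyConfigured` and `rotationPhase` are hypothetical helpers, and the phase names are illustrative labels, not values GoTrue returns.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// settingsResponse models only the one /settings field this runbook relies on.
type settingsResponse struct {
	SAMLPrivateKeyNextConfigured bool `json:"saml_private_key_next_configured"`
}

// nextKeyConfigured extracts the rotation flag from a /settings response body.
func nextKeyConfigured(body []byte) (bool, error) {
	var s settingsResponse
	if err := json.Unmarshal(body, &s); err != nil {
		return false, err
	}
	return s.SAMLPrivateKeyNextConfigured, nil
}

// rotationPhase (hypothetical helper) maps the two verification signals to
// the phase they indicate; any other combination deserves a closer look.
func rotationPhase(signingDescriptors int, nextConfigured bool) string {
	switch {
	case signingDescriptors == 2 && nextConfigured:
		return "dual-key window"
	case signingDescriptors == 1 && !nextConfigured:
		return "steady state"
	default:
		return "inconsistent"
	}
}

func main() {
	// Hypothetical response body as seen after Step 4.
	body := []byte(`{"saml_private_key_next_configured": false}`)
	configured, err := nextKeyConfigured(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(rotationPhase(1, configured))
}
```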
---

## Encrypted assertions: additional notes

When `GOTRUE_SAML_ALLOW_ENCRYPTED_ASSERTIONS=true`:

- During the dual-key window (Step 2), GoTrue accepts assertions encrypted with **either** the
  primary (current) or the next (new) certificate. No action needed.
- After Step 4, the old key is no longer configured, so GoTrue can decrypt only assertions
  encrypted with the new certificate. This is safe as long as every IdP switched to the new
  certificate during Step 3. If any IdP still sends assertions encrypted with the old
  certificate after promotion, those assertions will fail. Contact the IdP admin to force a
  metadata refresh.

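
The fallback behaviour during the dual-key window can be pictured as "try the primary key, retry once with the other key, and report the original error if both fail". The sketch below shows that pattern on raw RSA-OAEP ciphertext. It is a simplification and an assumption about the shape of the logic: GoTrue works with full SAML responses rather than bare ciphertext, and `decryptWithFallback`, `mustKey`, and `demo` are hypothetical names.

```go
package main

import (
	"crypto/rand"
	"crypto/rsa"
	"crypto/sha256"
	"fmt"
)

// decryptWithFallback tries the primary key first and, if that fails and a
// fallback key is configured, retries once with the fallback. When both
// attempts fail, the error from the primary attempt is returned.
func decryptWithFallback(ciphertext []byte, primary, fallback *rsa.PrivateKey) ([]byte, error) {
	plain, err := rsa.DecryptOAEP(sha256.New(), rand.Reader, primary, ciphertext, nil)
	if err == nil {
		return plain, nil
	}
	if fallback != nil {
		if plain, retryErr := rsa.DecryptOAEP(sha256.New(), rand.Reader, fallback, ciphertext, nil); retryErr == nil {
			return plain, nil
		}
	}
	return nil, err
}

// mustKey generates a throwaway 2048-bit key for the demonstration.
func mustKey() *rsa.PrivateKey {
	key, err := rsa.GenerateKey(rand.Reader, 2048)
	if err != nil {
		panic(err)
	}
	return key
}

// demo encrypts a message for the fallback key and decrypts it through the
// primary-then-fallback path.
func demo() (string, error) {
	primary, fallback := mustKey(), mustKey()
	ciphertext, err := rsa.EncryptOAEP(sha256.New(), rand.Reader, &fallback.PublicKey, []byte("assertion"), nil)
	if err != nil {
		return "", err
	}
	plain, err := decryptWithFallback(ciphertext, primary, fallback)
	if err != nil {
		return "", err
	}
	return string(plain), nil
}

func main() {
	got, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(got)
}
```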
---

## Rollback

| Phase | How to roll back |
|-------|------------------|
| After Step 2 (dual-key deployed, not yet promoted) | Clear `GOTRUE_SAML_PRIVATE_KEY_NEXT` and redeploy. No key material was changed at IdPs. |
| After Step 4 (new key promoted) | Restore the old key to `GOTRUE_SAML_PRIVATE_KEY`, set the new key in `GOTRUE_SAML_PRIVATE_KEY_NEXT`, redeploy. You are back to the dual-key window. Wait for IdPs to re-import the old certificate before relying on it. |

> Avoid skipping the dual-key window. Promoting a new key before IdPs have cached the new
> certificate will break SP-initiated flows for the cache window duration.

---

## Quick reference

| Variable | Purpose |
|----------|---------|
| `GOTRUE_SAML_PRIVATE_KEY` | Active signing (and decryption) key. PKCS#1 DER, Base64-encoded. |
| `GOTRUE_SAML_PRIVATE_KEY_NEXT` | Incoming key during rotation. Advertised in metadata; used as decryption fallback. Clear after promotion. |
| `GOTRUE_SAML_ALLOW_ENCRYPTED_ASSERTIONS` | Enable encrypted assertion support. Both keys appear as `use="encryption"` descriptors when rotation is active. |

internal/api/saml.go

Lines changed: 52 additions & 31 deletions
@@ -1,6 +1,9 @@
 package api

 import (
+	"crypto/rsa"
+	"crypto/x509"
+	"encoding/base64"
 	"encoding/xml"
 	"net/http"
 	"net/url"
@@ -11,39 +14,27 @@ import (
 	"github.com/crewjam/saml/samlsp"
 )

-// getSAMLServiceProvider generates a new service provider object with the
-// (optionally) provided descriptor (metadata) for the identity provider.
-func (a *API) getSAMLServiceProvider(identityProvider *saml.EntityDescriptor, idpInitiated bool) *saml.ServiceProvider {
-	var externalURL *url.URL
-
-	if a.config.SAML.ExternalURL != "" {
-		url, err := url.ParseRequestURI(a.config.SAML.ExternalURL)
-		if err != nil {
-			// this should not fail as a.config should have been validated using #Validate()
-			panic(err)
-		}
-
-		externalURL = url
-	} else {
-		url, err := url.ParseRequestURI(a.config.API.ExternalURL)
-		if err != nil {
-			// this should not fail as a.config should have been validated using #Validate()
-			panic(err)
-		}
-
-		externalURL = url
+// newSAMLServiceProvider constructs a ServiceProvider for the given IdP
+// metadata, using the provided key/cert pair. Callers are responsible for
+// passing the correct pair (primary or rotation fallback).
+func (a *API) newSAMLServiceProvider(identityProvider *saml.EntityDescriptor, idpInitiated bool, key *rsa.PrivateKey, cert *x509.Certificate) *saml.ServiceProvider {
+	raw := a.config.SAML.ExternalURL
+	if raw == "" {
+		raw = a.config.API.ExternalURL
 	}
-
-	if !strings.HasSuffix(externalURL.Path, "/") {
-		externalURL.Path += "/"
+	u, err := url.ParseRequestURI(raw)
+	if err != nil {
+		panic(err)
 	}
-
-	externalURL.Path += "sso/"
+	if !strings.HasSuffix(u.Path, "/") {
+		u.Path += "/"
+	}
+	u.Path += "sso/"

 	provider := samlsp.DefaultServiceProvider(samlsp.Options{
-		URL:               *externalURL,
-		Key:               a.config.SAML.RSAPrivateKey,
-		Certificate:       a.config.SAML.Certificate,
+		URL:               *u,
+		Key:               key,
+		Certificate:       cert,
 		SignRequest:       true,
 		AllowIDPInitiated: idpInitiated,
 		IDPMetadata:       identityProvider,
@@ -56,7 +47,7 @@ func (a *API) getSAMLServiceProvider(identityProvider *saml.EntityDescriptor, id

 // SAMLMetadata serves GoTrue's SAML Service Provider metadata file.
 func (a *API) SAMLMetadata(w http.ResponseWriter, r *http.Request) error {
-	serviceProvider := a.getSAMLServiceProvider(nil, true)
+	serviceProvider := a.newSAMLServiceProvider(nil, true, a.config.SAML.RSAPrivateKey, a.config.SAML.Certificate)

 	metadata := serviceProvider.Metadata()

@@ -92,16 +83,46 @@ func (a *API) SAMLMetadata(w http.ResponseWriter, r *http.Request) error {
 			}
 		}

+		// During key rotation, advertise the next certificate so IdPs can
+		// cache it before we promote it to primary.
+		if a.config.SAML.CertificateNext != nil {
+			nextCertData := base64.StdEncoding.EncodeToString(a.config.SAML.CertificateNext.Raw)
+			nextKD := saml.KeyDescriptor{
+				Use: "signing",
+				KeyInfo: saml.KeyInfo{
+					X509Data: saml.X509Data{
+						X509Certificates: []saml.X509Certificate{
+							{Data: nextCertData},
+						},
+					},
+				},
+			}
+			keyDescriptors = append(keyDescriptors, nextKD)
+			if a.config.SAML.AllowEncryptedAssertions {
+				encKD := nextKD
+				encKD.Use = "encryption"
+				keyDescriptors = append(keyDescriptors, encKD)
+			}
+		}
+
 		spd.KeyDescriptors = keyDescriptors
 	}

+	// Reduce cache aggressiveness during key rotation so IdPs and CDNs pick
+	// up the updated metadata before we promote the next key.
+	cacheControl := "public, max-age=600"
+	if a.config.SAML.CertificateNext != nil {
+		metadata.CacheDuration = time.Hour
+		cacheControl = "public, max-age=60"
+	}
+
 	metadataXML, err := xml.Marshal(metadata)
 	if err != nil {
 		return err
 	}

 	w.Header().Set("Content-Type", "application/xml")
-	w.Header().Set("Cache-Control", "public, max-age=600") // cache at CDN for 10 minutes
+	w.Header().Set("Cache-Control", cacheControl)

 	if r.FormValue("download") == "true" {
 		w.Header().Set("Content-Disposition", "attachment; filename=\"metadata.xml\"")
