docs: add design document for JWT authentication implementation #34518
jasonhernandez
left a comment
I've added a few discussion points. We can agree to leave the catalog as an open discussion / draft.
| ### Solution proposal: The user should be disabled from logging in when a user is de-provisioned. However, the database level role should still exist. | ||
|
|
||
| When doing pgwire JWT authentication, we can accept a cleartext password of the form `access=<ACCESS_TOKEN>&refresh=<REFRESH_TOKEN>`, where `&` is a delimiter and `refresh=<REFRESH_TOKEN>` is optional. When the access token is close to expiration, the JWT authenticator will fetch a new access token using the refresh token (via the token API URL in the spec above) and authenticate again. If no refresh token was provided, the session will be invalidated when the access token expires. The implementation will be very similar to how we refresh tokens for the Frontegg authenticator. This would require users to have their IdP client generate `refresh` tokens. | ||
|
|
||
| Suggesting a short time to live for access tokens accomplishes invalidating sessions on deprovisioning of a user. When admins deprovision a user, the next time the user tries to authenticate or refresh their access token, the token API will not allow the user to log in, but the role will remain in the database. | ||
|
|
||
| **Alternative: Use SASL Authentication using the OAUTHBEARER mechanism rather than a cleartext password** | ||
|
|
||
| This would be the most Postgres-compatible way of doing this and is what Postgres uses for its `oauth` authentication method. However, it may run into compatibility issues with clients. For example, in `psql`, there’s no obvious way of sending the bearer token directly without going through libpq's device-grant flow. Furthermore, assuming access tokens are short-lived, this could lead to poor UX given there’s no native way to re-authenticate a pgwire session. Finally, our HTTP endpoints wouldn’t be able to support this given they don’t support SASL auth. | ||
|
|
||
| OAUTHBEARER reference: [https://www.postgresql.org/docs/18/sasl-authentication.html#SASL-OAUTHBEARER](https://www.postgresql.org/docs/18/sasl-authentication.html#SASL-OAUTHBEARER) |
I agree with this approach. I would roughly coin this as "bring your own JWT issuer / JWK"
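For illustration, the cleartext password format in the proposal quoted above (`access=<ACCESS_TOKEN>&refresh=<REFRESH_TOKEN>`, with `refresh` optional) could be parsed roughly as follows. This is a minimal sketch, not the PR's implementation: `JwtCredentials` and `parse_jwt_password` are hypothetical names, and it assumes neither token contains `&`, which holds for compact JWTs but is only an assumption for opaque refresh tokens.

```rust
/// Tokens extracted from the cleartext password field (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct JwtCredentials {
    pub access_token: String,
    pub refresh_token: Option<String>,
}

/// Hypothetical parser for `access=<ACCESS_TOKEN>&refresh=<REFRESH_TOKEN>`.
/// Returns `None` if `access` is missing, a value is empty, or a field is
/// not one of the two known keys.
pub fn parse_jwt_password(password: &str) -> Option<JwtCredentials> {
    let mut access_token = None;
    let mut refresh_token = None;
    // Assumption: tokens never contain `&`, so it is safe as a delimiter.
    for field in password.split('&') {
        // Split on the first `=` only, so `=` inside a token value survives.
        match field.split_once('=') {
            Some(("access", v)) if !v.is_empty() => access_token = Some(v.to_string()),
            Some(("refresh", v)) if !v.is_empty() => refresh_token = Some(v.to_string()),
            _ => return None,
        }
    }
    Some(JwtCredentials {
        access_token: access_token?,
        refresh_token,
    })
}
```

Rejecting unknown keys outright is a design choice made here for strictness; a real implementation might prefer to ignore them for forward compatibility.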
| ### Solution proposal: The end user is able to create a token to connect to Materialize via psql / Postgres clients | ||
|
|
||
| Unfortunately, to provide a nice flow to generate the necessary access token and refresh token, we’d need to control the client. Thus we’ll leave the retrieval of the access token/refresh token to the user, similar to CockroachDB. | ||
|
|
||
| **Alternative: Revive the mz CLI** | ||
|
|
||
| We have an `mz` CLI that’s catered to Cloud and no longer supported. We can potentially bring this back. | ||
|
|
||
| **Open question:** Is there anything we can do on our side to easily provide access tokens / refresh tokens to the user without controlling the client? This feels like the missing piece between JWT authentication and something like `aws sso login` in the AWS CLI. | ||
|
|
||
| ### Solution proposal: The end user is able to visit the Materialize console, and sign in with their IdP | ||
|
|
||
| A generic frontend SSO redirect flow would need to be implemented to retrieve an access token and refresh token. However, once retrieved, the SQL HTTP / WS API endpoints can use bearer authorization like Cloud and accept the access token. The Console would be in charge of refreshing the access token. The Console work is out of scope for this design document. |
I think we can exclude this from scope and let customers use existing tooling to get JWTs. I'm not sure this is necessary or particularly useful for customers. We might need something for internal testing, but I would start with just meeting that need for now.
| ### Tests: | ||
|
|
||
| - Successful login (e2e mzcompose) | ||
| - Invalidating the session on access token expiration and no refresh token (Rust unit test) | ||
| - A token should successfully refresh if the access token and refresh token are valid (Rust unit test) | ||
| - Session should error if access token is invalid (Rust unit test) | ||
| - Session should error if refresh token is invalid (Rust unit test) | ||
| - De-provisioning a user should invalidate the refresh token (e2e mzcompose) | ||
| - Platform-check simple login check (platform-check framework) |
| ### Tests: | |
| - Successful login (e2e mzcompose) | |
| - Invalidating the session on access token expiration and no refresh token (Rust unit test) | |
| - A token should successfully refresh if the access token and refresh token are valid (Rust unit test) | |
| - Session should error if access token is invalid (Rust unit test) | |
| - Session should error if refresh token is invalid (Rust unit test) | |
| - De-provisioning a user should invalidate the refresh token (e2e mzcompose) | |
| - Platform-check simple login check (platform-check framework) | |
| - JWTs should only be accepted when a valid JWK is set (we do not want to accept JWTs that are not signed with a real, cryptographically sound key) |
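The token-lifecycle cases in the test list (refresh when both tokens are valid, invalidate on expiration without a refresh token) can be phrased as a small decision function that is easy to unit test. The sketch below is an assumption about how such logic might be structured, not code from the PR: `SessionAction`, `next_action`, and the `refresh_window` parameter are hypothetical names.

```rust
use std::time::Duration;

/// What the authenticator should do with a session (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum SessionAction {
    /// The access token is still good; nothing to do.
    Keep,
    /// The access token is close to (or past) expiration; use the refresh token.
    Refresh,
    /// The access token has expired and cannot be refreshed.
    Invalidate,
}

/// Decide the next step from the time left on the access token.
/// `refresh_window` is an assumed tunable: how long before expiration
/// a refresh should be attempted.
pub fn next_action(
    time_to_expiry: Duration,
    refresh_window: Duration,
    has_refresh_token: bool,
) -> SessionAction {
    if time_to_expiry > refresh_window {
        SessionAction::Keep
    } else if has_refresh_token {
        SessionAction::Refresh
    } else if time_to_expiry.is_zero() {
        // No refresh token: the session ends once the access token expires.
        SessionAction::Invalidate
    } else {
        SessionAction::Keep
    }
}
```

A failed refresh (for example, after the user is de-provisioned) would also have to end the session; that path depends on the token API response and is left out of this sketch.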
| ## Phase 2: Map the `admin` claim to a user’s superuser attribute | ||
|
|
||
| Based on the `admin` claim, we can set the `superuser` attribute we store in the catalog for password authentication. We do this by doing the following: | ||
|
|
||
| - First, in our authenticator, save `admin` inside the user’s `external_metadata` | ||
| - Next, in `handle_startup_inner()` we diff them with the user’s current superuser status and if there’s a difference, apply the changes with an `ALTER` operation. We can use `catalog_transact_with_context()` for this. | ||
| - On error (e.g. if the ALTER isn’t allowed), we’ll end the connection with a descriptive error message. This is similar to the pattern we use for initializing network policies. | ||
|
|
||
| This is similar to how we identify superusers in Frontegg auth, except we also treat it as an operation to update the catalog. | ||
| - We can keep using the session’s metadata as the source of truth to keep parity with Cloud, but eventually we’ll want to use the catalog as the source of truth for all. We can call this **out of scope.** | ||
|
|
||
| Prototype: [https://github.com/MaterializeInc/materialize/pull/34372/commits](https://github.com/MaterializeInc/materialize/pull/34372/commits) |
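The diffing step described in the quoted Phase 2 proposal can be sketched as a pure function that compares the `admin` claim with the catalog's current `superuser` attribute and reports which `ALTER`, if any, is needed. This is a simplified illustration, not the prototype's code: `SuperuserChange` and `superuser_change` are hypothetical names, and the actual catalog update (for example via `catalog_transact_with_context()`) is not shown.

```rust
/// The catalog change implied by the JWT `admin` claim (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum SuperuserChange {
    /// Claim and catalog agree; no ALTER is needed.
    Unchanged,
    /// Claim says admin, catalog says not superuser.
    Grant,
    /// Claim says not admin, catalog says superuser.
    Revoke,
}

/// Compare the `admin` claim saved in the session's external metadata
/// with the superuser attribute currently stored in the catalog.
pub fn superuser_change(admin_claim: bool, catalog_superuser: bool) -> SuperuserChange {
    match (admin_claim, catalog_superuser) {
        (true, false) => SuperuserChange::Grant,
        (false, true) => SuperuserChange::Revoke,
        _ => SuperuserChange::Unchanged,
    }
}
```

Note that this only runs when a JWT session starts, so the catalog value reflects the claim as of the last JWT login.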
We need to be careful about cases like this:
1. User authenticates with a JWT as an admin.
2. User has admin permissions revoked in the service issuing JWTs.
3. User logs in with a password.

How will we know that they had their admin permissions revoked? How will a customer confidently ensure that they're able to revoke admin access?
There are a few solutions:
- bind auth methods to user ids (i.e. they can't set / use a password after authenticating with a JWT)
- periodic sync (rate limits could be a concern)
- live validation to some external endpoint (rate limits could be a concern)
- don't support any heterogeneity in auth methods at all (i.e. if JWT is enabled, only accept JWTs, rely on the issuer / expiration time)
In general, I want to avoid any temptation or risk of confusion where we might rely on stale data in the catalog when another source of truth exists.
Rendered version: https://github.com/MaterializeInc/materialize/blob/52a83bcf8c3186c527dad2cf15876b46ce06fd5d/doc/developer/design/20251215_jwt_authentication.md
Motivation
https://linear.app/materializeinc/issue/SQL-9/auth-determine-how-sql-team-fits-into-delivering-sso-for-self-managed