KEP-6060: API Server Authentication to Webhooks #6156
| is distinct from the service account that the principal requesting the token | ||
| might be using to authenticate itself to the Kubernetes API Server. The | ||
| Token Acquisition Service Account must have `attest` permissions on the | ||
| `APIService` object named in the `TokenRequest`. |
There was a problem hiding this comment.
Historically, tokens only identified, and authorization checks were done by whatever received the token. This is proposing doing the authorization check at token issuance time so the webhook merely has to look for a particular claim in the token?
If we're doing an authorization check on the service account at the time of issuance, would making the bound object and authorization check be related to the Service / ValidatingWebhookConfiguration / MutatingWebhookConfiguration / CustomResourceDefinition the token is going to be used against make more sense than an APIService? That would let us verify the requested audience was coherent with it as well.
There was a problem hiding this comment.
> This is proposing doing the authorization check at token issuance time so the webhook merely has to look for a particular claim in the token?
Yes. In order to avoid the webhook talking back to kube-apiserver for TokenReview or SubjectAccessReview. This is all done to prevent aggregated API servers from probing policy information by asking questions about resources it does not control.
The purpose of using the APIService as the bound object is to make it possible for the Kubernetes API Server to check whether the webhook token acquisition service account has permission to talk to the webhook, and whether it may do so in order to ask questions about this APIService.
With the {Validating,Mutating}WebhookConfiguration as the bound object, I can't think of a way to perform the same authorization check as with the APIService, without the webhook having to call back to kube-apiserver. At token issuance, kube-apiserver does not know what question is being asked; it knows the audience and the bound object.
Because it is proposed that the audience be derived from the webhook config, the audience from the TokenRequest is used to correlate the APIService (from the bound object) and the webhook config corresponding to the audience. From there, it can be determined whether or not the webhook is even relevant to that APIService. Furthermore, kube-apiserver can check whether or not the token acquisition service account has the requisite permissions to ask questions about resources in that APIService. Without the bound APIService, kube-apiserver cannot do that check at credential issuance (it doesn't know the question being asked). We can't assume that the webhook author will have knowledge of which API Services it must support, so the webhook must instead answer the question (of whether or not the principal may ask questions about resources in a particular APIService) by making a SAR request to kube-apiserver. We want to avoid the webhook calling back to the Kubernetes API server for SAR and TokenReview, for obvious reasons.
There was a problem hiding this comment.
After discussing further with @enj, we propose the following:
Support 3 types of bindings:
- MutatingWebhookConfiguration
- ValidatingWebhookConfiguration
- APIService
When either of the webhook configurations is used as the bound object, the permissions required for token issuance are "attest" on "*", because in essence that is what is being permitted by such a token.
Tokens bound to APIService won't require such broad permission, and will be narrowly scoped to that APIService.
How does this strike you?
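A minimal sketch of the issuance-time rule described above, in Python for brevity. The `attest` verb and the two scopes come from this thread; the `AttestCheck` type, the function name, and the choice of `apiregistration.k8s.io`/`apiservices` as the checked resource are assumptions made for illustration, not part of the KEP text.

```python
from typing import NamedTuple


class AttestCheck(NamedTuple):
    """Authorization check to run before issuing a token (hypothetical type)."""
    verb: str
    api_group: str
    resource: str
    name: str  # "*" stands for every APIService


WEBHOOK_CONFIG_KINDS = {
    "MutatingWebhookConfiguration",
    "ValidatingWebhookConfiguration",
}


def required_attest_check(bound_kind: str, bound_name: str) -> AttestCheck:
    """Map the bound object of a TokenRequest to the permission it requires."""
    if bound_kind in WEBHOOK_CONFIG_KINDS:
        # A token bound to a webhook configuration can accompany any resource
        # the webhook intercepts, so the broad permission is required.
        return AttestCheck("attest", "apiregistration.k8s.io", "apiservices", "*")
    if bound_kind == "APIService":
        # A token bound to one APIService only needs permission on that object.
        return AttestCheck("attest", "apiregistration.k8s.io", "apiservices", bound_name)
    raise ValueError(f"unsupported bound object kind: {bound_kind!r}")
```

Under these assumptions, a request bound to `v1.example.com` needs `attest` on that one APIService, while a request bound to either webhook configuration type needs `attest` on `*`.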
| 1. Verify the token's signature via the OIDC discovery endpoint. | ||
| 1. Verify that the token's audience matches the expected audience. This audience | ||
| is derived deterministically from the webhook name, and is in the format is | ||
| in the format `k8s.io:admission:<webhook-name>`, where `<webhook-name>` |
There was a problem hiding this comment.
why the unusual prefix? Isn't audience typically the host / URL where the token will be presented?
Is there a reason not to use https://$url/with/path or https://$servicename.$servicenamespace.svc:port/with/path as the audience? Webhooks already are required to have valid serving certificates for those hosts, so no new knowledge should be required to make the webhooks able to validate a similar audience in the token.
There was a problem hiding this comment.
The reason for this is that the webhook author may not know the URL where the webhook is deployed, and there will be more plumbing needed on the webhook side to make that information available to the token verification routine.
That said, we don't want this to be a sticking point, so we'll accept your suggestion if you don't agree with the above reasoning.
There was a problem hiding this comment.
Audience as a URL is more typical ... and since webhooks already are required to have valid serving certificates for their host, it seems more likely to me they can know or be told the host/URL more easily
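To make the URL-style audience concrete, here is a rough Python sketch that derives it from a webhook's `clientConfig`. The `url` and `service` fields (with `name`, `namespace`, `path`, `port`, and a default port of 443) follow the existing webhook configuration API; the function name and the decision to use the URL verbatim are assumptions, since the KEP has not settled the exact format.

```python
def derive_audience(client_config: dict) -> str:
    """Derive a token audience from a webhook clientConfig (illustrative only)."""
    url = client_config.get("url")
    if url:
        # URL-based webhooks: use the configured URL as the audience.
        return url
    # Service-based webhooks: build the in-cluster URL from the service reference.
    svc = client_config["service"]
    port = svc.get("port", 443)  # the API defaults the service port to 443
    path = svc.get("path") or ""
    return f"https://{svc['name']}.{svc['namespace']}.svc:{port}{path}"
```

For example, a service reference with name `policy`, namespace `webhooks`, and path `/validate` would yield an audience on the same host the serving certificate must already cover.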
| 1. Verify that the `APIGroup` and `APIVersion` encoded in the token's bound | ||
| APIService match the `APIGroup` and `Version` of the resource in the body | ||
| of the `AdmissionReview` request. |
There was a problem hiding this comment.
Does this mean a server that serves lots of different API groups/versions (like kube-apiserver) combined with a webhook that intercepts lots of different groups/versions (as a generic label protection or something) would need a distinct token for every API group/version type it sent to the webhook? That's not ideal.
There was a problem hiding this comment.
That is an unfortunate consequence.
In practice, aggregated API Servers will typically have one, and almost always fewer than five, relevant APIService objects. kube-apiserver is a distinct case (could have thousands of APIServices), and I share your concern about the explosion of tokens there.
We could permit kube-apiserver to use a "wildcard" APIService as the bound object. During bootstrapping, the requisite permissions (i.e. "attest" on "*" or an equivalent rule) would be set up for kube-apiserver's token acquisition service account. Then kube-apiserver only has to maintain one token per webhook, while aggregated API servers should be able to manage with one per webhook/apiservice combination.
There was a problem hiding this comment.
Note, rather than using the wildcard APIService, we think it's better to allow bindings to either an APIService or one of the two WebhookConfiguration types (see comment linked below):
| in the format `k8s.io:admission:<webhook-name>`, where `<webhook-name>` | ||
| is the "inner" name of the webhook (i.e. the name in the inner list | ||
| of webhooks). | ||
| 1. Verify that the `APIGroup` and `APIVersion` encoded in the token's bound |
There was a problem hiding this comment.
A request to $group/v2 of an API can be converted to $group/v1 and sent to a webhook that asked to intercept $group/v1. See https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#matching-requests-matchpolicy
The object sent to the webhook would be in $group/v1, the kind and resource in the AdmissionReview would be $group/v1 and the requestKind and requestResource in the AdmissionReview would be $group/v2.
What APIService / version would you expect to be in the token sent to a webhook for that?
(I commented elsewhere about the implications of tying the token to a particular presented $group/$version ... we may want to tie it to the target webhook or service)
There was a problem hiding this comment.
In this case:
- Aggregated API Servers (likely) don't need to worry about this, since they don't serve custom resources. Any conversion (or request to convert) will have to be done by the aggregated API server itself, so it can just request the token post-conversion.
- kube-apiserver won't need to worry about this either, if it can request (as proposed in my comment above) tokens bound to one of the `WebhookConfiguration` types.
Alternatively, we can ignore the APIVersion and decide that we only care about the APIGroup. I don't think it's ever particularly relevant.
If we do decide we care about the version, then this is covered by the webhook: it will check whether or not the (APIGroup, APIVersion) match between those stated in the token and those from the AdmissionReview request body.
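The two options can be compared with a small Python sketch of the webhook-side check. The `kind` and `requestKind` fields are the existing AdmissionReview request fields; the function name, its parameters, and the `compare_version` switch are hypothetical and only illustrate the trade-off discussed here.

```python
def token_matches_review(bound_group: str, bound_version: str,
                         review_request: dict,
                         compare_version: bool = True) -> bool:
    """Compare the token's bound APIService with the AdmissionReview request.

    `review_request["kind"]` describes the object as sent to the webhook,
    that is, after any conversion triggered by matchPolicy.
    """
    kind = review_request["kind"]
    if kind["group"] != bound_group:
        return False
    # Optionally ignore the version and match on the API group alone.
    return (not compare_version) or kind["version"] == bound_version


# A request made to example.com/v2 that was converted to v1 for the webhook.
converted_request = {
    "kind": {"group": "example.com", "version": "v1", "kind": "Widget"},
    "requestKind": {"group": "example.com", "version": "v2", "kind": "Widget"},
}
```

With a token bound to `v2.example.com`, the strict comparison rejects `converted_request` because the object arrives as `v1`, whereas the group-only comparison accepts it. A token requested after conversion (bound to `v1.example.com`) passes both.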
| for this use. To distinguish between ServiceAccount tokens used for other | ||
| purposes, the term **Webhook Authentication Token (WAT)** will be used. However, | ||
| it is important to understand that these are ServiceAccount tokens in every | ||
| sense; but their use is constrained by newly added private claims. |
There was a problem hiding this comment.
Maybe it's just me, but I found defining new terms / acronyms for things we already have a bit confusing.
Also, be careful with language like "constrained" unless you really mean these tokens would be invalid to use as service account tokens in other ways because of these claims (which would make them sort of not normal service account tokens)
There was a problem hiding this comment.
> Maybe it's just me, but I found defining new terms / acronyms for things we already have a bit confusing.
Not my intention to confuse. It helped with the writing to have those terms laid out, so that I have a consistent way to refer to a specific use of something in a particular context. It also helped with conciseness (ex. "token acquisition service account", vs "the service account token named in the TokenRequest for a token to authenticate to webhooks"). The latter is hard to work into more complex sentences.
> Also, be careful with language like "constrained" [...]
Noted. I'll give this another pass.
| When a [webhook authentication client](#webhook-authentication-client) needs | ||
| to call an admission webhook about a given resource, it issues a `TokenRequest` | ||
| for its [webhook token acquisition service account](#webhook-token-acquisition-service-account) | ||
| to the Kubernetes API Server. The request includes: | ||
|
|
||
| 1. A `BoundObjectRef` pointing to the APIService corresponding to the resource | ||
| being admitted (e.g., `v1.networking.k8s.io`). |
There was a problem hiding this comment.
We don't want to have to mint a token for every webhook invocation, a token should be able to be retained and renewed/rotated after some percentage of its lifetime is used. I was expecting each client to only have to maintain one token per webhook, not one token per resource $group/$version per webhook.
There was a problem hiding this comment.
> We don't want to have to mint a token for every webhook invocation, a token should be able to be retained and renewed/rotated after some percentage of its lifetime is used.
Yes, the tokens will be cached until their expiration, at which time they will be refreshed. We're in agreement on that. I'll make sure it gets mentioned here.
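A minimal sketch of that caching behavior, in Python. Everything here is an assumption for illustration: the `TokenCache` class, the `fetch` callback (standing in for a TokenRequest call and returning a token plus its lifetime in seconds), the cache key, and the 80% refresh threshold. The KEP text does not specify any of them.

```python
import time
from typing import Callable, Dict, Tuple


class TokenCache:
    """Cache tokens per (service account, audience) and refresh them early."""

    def __init__(self,
                 fetch: Callable[[str, str], Tuple[str, float]],
                 refresh_fraction: float = 0.8,
                 now: Callable[[], float] = time.time) -> None:
        self._fetch = fetch                      # returns (token, ttl_seconds)
        self._refresh_fraction = refresh_fraction
        self._now = now
        self._entries: Dict[Tuple[str, str], Tuple[str, float]] = {}

    def get(self, service_account: str, audience: str) -> str:
        key = (service_account, audience)
        entry = self._entries.get(key)
        now = self._now()
        if entry is None or now >= entry[1]:
            # Missing, or past the refresh point: request a new token.
            token, ttl = self._fetch(service_account, audience)
            entry = (token, now + ttl * self._refresh_fraction)
            self._entries[key] = entry
        return entry[0]
```

Keying on the audience alone gives one token per client per webhook; adding the bound APIService to the key would give one token per webhook and group/version pair, which is the growth the review comment is concerned about.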
| In the case of `kube-apiserver`, the [webhook token acquisition service | ||
| account](#webhook-token-acquisition-service-account) will be a discoverable service | ||
| account automatically created in the bootstrapping process. The name will be | ||
| randomized to discourage its abuse by other webhook authentication clients. |
There was a problem hiding this comment.
We don't randomize built-in service account names for other things ... this seems more confusing than helpful
Rather than randomizing the kube-apiserver serviceaccount, let's just make sure we have workable user stories / examples / documentation / defaults for how:
- aggregated servers configure their own distinct service account
- permissions are granted to the server's service account (what does the server admin do, what does the webhook configuration author do, what knowledge does each of those actors have, how does that combine to route permissions to the right spots?)
There was a problem hiding this comment.
Agreed. I don't think we can really do anything to prevent people from abusing it, so there's no need to complicate things. I'll update the user stories to include these.
|
|
||
| A user creates a Pod. The kube-apiserver needs to consult a validating | ||
| admission webhook. It requests a WAT from itself for its dedicated service | ||
| account, bound to APIService `v1.networking.k8s.io` with an audience derived from the |
There was a problem hiding this comment.
if it's presenting a pod (Group: "", Version: "v1"), why is it getting a token bound to APIService v1.networking.k8s.io?
There was a problem hiding this comment.
Mistake, thanks for catching. I meant to use a different resource (to avoid the weirdness of having an empty string APIGroup in the example).
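For a consistent version of the example, here is a hedged Python sketch of the TokenRequest body for a resource in `networking.k8s.io/v1` (an Ingress, say, which is my choice of example and not the author's). The `audiences`, `expirationSeconds`, and `boundObjectRef` fields exist in the current TokenRequest API; binding to an `APIService` is what this KEP proposes and is not, as far as I know, accepted today. The helper names and the one-hour expiration are assumptions.

```python
def apiservice_name(group: str, version: str) -> str:
    """APIService objects are named <version>.<group>."""
    return f"{version}.{group}"


def build_token_request(audience: str, group: str, version: str,
                        expiration_seconds: int = 3600) -> dict:
    """Build a TokenRequest body bound to the APIService of the admitted resource."""
    return {
        "apiVersion": "authentication.k8s.io/v1",
        "kind": "TokenRequest",
        "spec": {
            "audiences": [audience],
            "expirationSeconds": expiration_seconds,
            # Proposed by the KEP: bind the token to an APIService.
            "boundObjectRef": {
                "kind": "APIService",
                "apiVersion": "apiregistration.k8s.io/v1",
                "name": apiservice_name(group, version),
            },
        },
    }
```

Called with group `networking.k8s.io` and version `v1`, this binds the token to `v1.networking.k8s.io`, which then matches the resource being admitted.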
| When an aggregated API server needs to call an admission webhook, it requests | ||
| a WAT from the Kubernetes API Server. Each aggregated API server should | ||
| have a dedicated service account for this purpose, as it must be named in | ||
| the token request. The request flow is: | ||
|
|
||
| 1. The aggregated API server authenticates to the kube-apiserver using | ||
| whatever credential it is configured with. That principal must be authorized | ||
| to `create serviceaccount/token` in the relevant namespace. | ||
| 2. It sends a `TokenRequest` for its dedicated service account, with a | ||
| `BoundObjectRef` pointing to the APIService it serves (e.g., | ||
| `v1.example.com`) and the appropriate audience. |
There was a problem hiding this comment.
similar comment about token reuse and wanting to maintain one reusable token per client per webhook, not per resource $group/$version
There was a problem hiding this comment.
This will be resolved if we are in agreement on this:
| ClusterTrustBundle signer attestation). To illustrate the permission model, | ||
| the following RBAC configuration is given as an example. To paraphrase Donald |
There was a problem hiding this comment.
let's make sure it's possible to set up permissions correctly given the knowledge of these two personas:
- webhook author / webhook config manifest author who knows the things they are intercepting, but not server identities
- aggregated server admin who knows the identity they are giving the server, but not the webhooks that will need to be called by the server
There was a problem hiding this comment.
This is covered by:
#6156 (comment)
Basically, webhook config manifest authors can set up a service account with "attest" on the "*" APIService (or equivalent authz check); that allows TokenRequests for that service account to be bound to a MutatingWebhookConfiguration or a ValidatingWebhookConfiguration.
Aggregated API Server admins will set up a service account with "attest" on the relevant APIService (or equivalent).
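The two personas' grants could look roughly like the RBAC-style rules below, shown as Python dictionaries with a simplified matcher. This is a sketch under assumptions: the `attest` verb on `apiservices` is the proposal in this thread, the matcher only approximates RBAC evaluation, and modeling "attest on `*`" as a rule without `resourceNames` is my guess at an "equivalent authz check".

```python
def rule_allows(rule: dict, verb: str, api_group: str,
                resource: str, name: str) -> bool:
    """Simplified RBAC-style match of one rule against one request."""
    def matches(values, wanted):
        return "*" in values or wanted in values

    if not (matches(rule["verbs"], verb)
            and matches(rule["apiGroups"], api_group)
            and matches(rule["resources"], resource)):
        return False
    # An empty or missing resourceNames list places no restriction on names.
    names = rule.get("resourceNames", [])
    return not names or name in names


# Webhook config author: broad grant, usable for tokens bound to a webhook configuration.
webhook_sa_rule = {
    "apiGroups": ["apiregistration.k8s.io"],
    "resources": ["apiservices"],
    "verbs": ["attest"],
}

# Aggregated API server admin: grant scoped to the one APIService the server serves.
aggregated_sa_rule = {
    "apiGroups": ["apiregistration.k8s.io"],
    "resources": ["apiservices"],
    "verbs": ["attest"],
    "resourceNames": ["v1.example.com"],
}
```

In this model the webhook author's service account passes the broad check without knowing any server identity, and the aggregated server's service account passes only for `v1.example.com` without knowing which webhooks exist.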
/hold @pmengelbert lost me in the commit history 😄
- Also did a cursory filling out of kep.yaml Signed-off-by: Peter Engelbert <pmengelbert@gmail.com>
Signed-off-by: Peter Engelbert <pmengelbert@gmail.com>
These updates are still WIP, to be completed shortly. Signed-off-by: Peter Engelbert <pmengelbert@gmail.com>
Signed-off-by: Ben Petersen <admin@benjaminapetersen.me>
|
|
||
| 1. Verify the token's signature via the OIDC discovery endpoint. | ||
| 1. Verify that the token's audience matches the expected audience. This audience | ||
| is derived deterministically from the webhook url, and is in the format |
There was a problem hiding this comment.
Nit --- this is doubled up "is in the format is in the format"
There was a problem hiding this comment.
Thanks. This and other things have since been updated. The version I just pushed is much closer to final.
| 1. Verify the token's signature via the OIDC discovery endpoint. | ||
| 1. Verify that the token's audience matches the expected audience. This audience | ||
| is derived deterministically from the webhook url, and is in the format | ||
| is in the format `https://<url>/with/path`, where `<url>` matches that |
There was a problem hiding this comment.
While this is a good default, I think there are probably setups where the webhook backend expects a different audience? It might need to be configurable in the Mutating/ValidatingAdmissionWebhook object.
In general, it's the relying party that's in control of which audience value(s) are expected.
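A short Python sketch of the relying-party side of this point: the webhook decides which audiences it accepts, and the derived URL is only one candidate. The `aud` claim may be a single string or a list of strings per the JWT specification; the function name and the idea of passing a configured set of expected audiences are assumptions. Signature verification is deliberately left out.

```python
from typing import Iterable


def audience_accepted(claims: dict, expected_audiences: Iterable[str]) -> bool:
    """Return True if any audience in the token is one the webhook expects."""
    aud = claims.get("aud", [])
    if isinstance(aud, str):
        # The "aud" claim may be a single string instead of a list.
        aud = [aud]
    expected = set(expected_audiences)
    return any(a in expected for a in aud)
```

A webhook backend that expects a different audience than the default would simply be configured with its own expected set, and the client would need to request a token for one of those values.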
| #### `kube-apiserver`: | ||
| In the case of `kube-apiserver`, the [token acquisition service | ||
| account](#token-acquisition-service-account) will be a service account with a | ||
| well-known name, `kube-system:webhook-auth`, which is automatically created |
There was a problem hiding this comment.
I would prefer if we give kube-apiserver one identity that it can use for all purposes (contacting webhooks, contacting peer replicas, uploading metrics, etc).
If that needs to be a service account, then that's fine, but it should probably be something like system:serviceaccount:kube-system:kube-apiserver.
Signed-off-by: Peter Engelbert <pmengelbert@gmail.com>