-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
✨ Check known required permissions for install before installing with the helm applier #1858
base: main
Are you sure you want to change the base?
Conversation
✅ Deploy Preview for olmv1 ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
2991d5d
to
65ef8a2
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1858 +/- ##
==========================================
- Coverage 68.94% 64.76% -4.19%
==========================================
Files 66 68 +2
Lines 5236 5883 +647
==========================================
+ Hits 3610 3810 +200
- Misses 1394 1821 +427
- Partials 232 252 +20
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, will you be adding testing in the authorization package?
} | ||
} | ||
|
||
func (a *rbacPreAuthorizer) PreAuthorize(ctx context.Context, manifestManager user.Info, manifestReader io.Reader) (map[string][]rbacv1.PolicyRule, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a short doc comment would be nice, especially about the return value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, adding in next commit
gvk := uObj.GroupVersionKind() | ||
restMapping, err := a.restMapper.RESTMapping(gvk.GroupKind(), gvk.Version) | ||
if err != nil { | ||
errs = append(errs, fmt.Errorf("could not get REST mapping for object %d in manifest with GVK %s: %w", i, gvk, err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it make sense to also try and add name of the object if possible via uObj.GetName
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, adding the object’s name should make debugging easier. Adding next commit.
} | ||
|
||
func (a *rbacPreAuthorizer) attributesAllowed(ctx context.Context, attributesRecord authorizer.AttributesRecord) (bool, error) { | ||
decision, _, err := a.authorizer.Authorize(ctx, attributesRecord) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
second returned value is a 'reason' string. Would that be included in error? If not, maybe it would make sense to either return it or wrap the returned error with it for better context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep we should return it, done in next commit
} | ||
|
||
for ns, nsMissingRules := range missingRules { | ||
if compactMissingRules, err := validation.CompactRules(nsMissingRules); err == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see that CompactRules cannot return a non nil error right now, but a comment about why we're not handling error might be a good idea for the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding in next commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason why we shouldn't just go ahead and account for that possibility for future-proofing reasons?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I guess the way this works is that if there is an error, then we just stick with the original non-compacted set of missing rules, which seems like a reasonable fallback. I think that is actually the important detail to call out in the comment.
} | ||
|
||
func hasAggregationRule(clusterRole *rbacv1.ClusterRole) bool { | ||
return clusterRole.AggregationRule != nil && len(clusterRole.AggregationRule.ClusterRoleSelectors) > 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if len(clusterRole.AggregationRule.ClusterRoleSelectors)
is 0 would it not mean "match all" in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a comment to try to clarify, does this help? see next commit for comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is all mostly copied code from kubernetes (e.g. https://github.com/kubernetes/kubernetes/blob/fd0a3484176f8f29f038b4b28132dd3c7583eb00/pkg/registry/rbac/clusterrole/policybased/storage.go#L124-L126)
Looking at this again, I almost wonder if we could have the escalationChecker simply call this code with a fake storage backend. That way we'd pick up any changes that are made here and avoid maintaining our own copy of it.
missingRules, err := h.PreAuthorizer.PreAuthorize(ctx, &ceServiceAccount, strings.NewReader(tmplRel.Manifest)) | ||
|
||
var preAuthErrors []error | ||
if len(missingRules) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: missingRules iteration below would not fire if it has no items, so we should be ok with skipping this if
missingRuleDescriptions = append(missingRuleDescriptions, ruleDescription(ns, rule)) | ||
} | ||
} | ||
slices.Sort(missingRuleDescriptions) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
if len(preAuthErrors) > 0 { | ||
return nil, "", fmt.Errorf("pre-authorization failed: %v", preAuthErrors) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since this is a fairly self-contained logic, how about extracting a helper function to enclose it and make Apply
easier to follow?
Something like:
if features.OperatorControllerFeatureGate.Enabled(features.PreflightPermissions) {
if _, err := preAuthPreflight(...); err != nil {...}
}
@@ -164,6 +197,34 @@ func (h *Helm) buildHelmChart(bundleFS fs.FS, ext *ocv1.ClusterExtension) (*char | |||
return h.BundleToHelmChartFn(bundleFS, ext.Spec.Namespace, watchNamespace) | |||
} | |||
|
|||
func (h *Helm) template(ctx context.Context, ext *ocv1.ClusterExtension, chrt *chart.Chart, values chartutil.Values, post postrender.PostRenderer) (*release.Release, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it make sense to merge it with getReleaseState
somehow? I guess the main thing is i.ClientOnly and the fact that we need to instantiate a separate client for that.
If it doesn't and ends up clumsy, I'd still probably adjust the name to be closer togetReleaseState
, since logically it's almost the same
"sigs.k8s.io/controller-runtime/pkg/client" | ||
) | ||
|
||
type PreAuthorizer interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could declare this interface on the applier/helm side to decouple it completely from this package as a dependency, but I saw that we're usually declaring interfaces on the producer side, probably because those are then reused, so seems fine as is now too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're going to add unit tests based on your general comment I think. If so it might one more reason to change where the interface is made, could help with testing...
I added some tests but they still need to be tweaked/finalized. I noticed while writing them up that due to the order of the logic where missing rules are checked before escalation, if bind/escalate are in play but we're missing the explicit permissions that bind/escalate would give us we end up with a result where there's no error but we do have missing rules. @joelanford is that what we would want? I would think if we can bind or escalate that we would not return that we're missing those rules since the SA can grant them. |
hack/tools/k8sMaintainer.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: move to hack/tools/k8smaintainer/main.go
(and move the doc to hack/tools/k8smaintainer/README.md
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, make it a separate go module to avoid the golang.org/x/mod
dependencies being added to our main module?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the move and naming makes sense, but we already have an indirect usage of x/mod
in our go.mod on main: golang.org/x/mod v0.24.0 // indirect
and we have that after go mod tidy
even with the creation of a separate module:
from a run on main
:
operator-controller main $ go mod why -m -vendor golang.org/x/mod
# golang.org/x/mod
github.com/operator-framework/operator-controller/hack/ci/custom-linters/analyzers
github.com/operator-framework/operator-controller/hack/ci/custom-linters/analyzers.test
golang.org/x/tools/go/analysis/analysistest
golang.org/x/tools/internal/testenv
golang.org/x/mod/modfile
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
without a new go module, the changes are made in 7a6a943
// - nil: indicates that the authorization check passed and the operation is permitted. | ||
// - non-nil error: indicates that the authorization failed (either due to insufficient permissions | ||
// or an error encountered during the check), the error provides a slice of several failures at once. | ||
func (a *rbacPreAuthorizer) PreAuthorize(ctx context.Context, manifestManager user.Info, manifestReader io.Reader) (map[string][]rbacv1.PolicyRule, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd recommend avoiding returning a map because of the issues it causes with random iteration order.
WDYT about defining type ScopedPolicyRules struct
that includes the namespace and list of policy rules, and then returning []ScopedPolicyRules
instead?
// - non-nil error: indicates that the authorization failed (either due to insufficient permissions | ||
// or an error encountered during the check), the error provides a slice of several failures at once. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-nil errors are just when there were issues evaluating the permissions, right? If we successfully evaluate permissions and determine that some rules are missing, then we'll return a nil
error with some missing rules.
I would also say that the returned error may include multiple failures encountered during rule evaluation (but it also may not, for example if there was a failure decoding the manifest).
Signed-off-by: Joe Lanford <[email protected]>
Signed-off-by: Joe Lanford <[email protected]>
Signed-off-by: Joe Lanford <[email protected]>
Signed-off-by: Joe Lanford <[email protected]>
This is the manual version. Needed to change rbac.go a bit to allow for v1.32.2 code changes, but basically as-was Signed-off-by: Brett Tofel <[email protected]>
go.mod made in the current form the tool generates Signed-off-by: Brett Tofel <[email protected]>
Signed-off-by: Brett Tofel <[email protected]>
Signed-off-by: Brett Tofel <[email protected]>
Signed-off-by: Brett Tofel <[email protected]>
Signed-off-by: Brett Tofel <[email protected]>
Signed-off-by: Brett Tofel <[email protected]>
Signed-off-by: Brett Tofel <[email protected]>
Signed-off-by: Brett Tofel <[email protected]>
rbac_test.go likely coming soon Signed-off-by: Brett Tofel <[email protected]>
7a6a943
to
e974006
Compare
Signed-off-by: Tayler Geiger <[email protected]>
Signed-off-by: Brett Tofel <[email protected]>
Signed-off-by: Brett Tofel <[email protected]>
Signed-off-by: Brett Tofel <[email protected]>
e974006
to
8f76fa8
Compare
Signed-off-by: Tayler Geiger <[email protected]>
Signed-off-by: Brett Tofel <[email protected]>
Use []ScopedPolicyRules struct for first return value in PreAuthorize() to avoid issues with random iteration order in previous map return value. Signed-off-by: Tayler Geiger <[email protected]>
Signed-off-by: Brett Tofel <[email protected]>
Signed-off-by: Brett Tofel <[email protected]>
Name: fmt.Sprintf("system:serviceaccount:%s:%s", ns, saName), | ||
} | ||
// Ensure the manifest contains only allowed rules so that the escalation check succeeds with no missing rules | ||
modifiedManifest := strings.Replace(testManifest, `- apiGroups: ["*"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this necessary? The escalation check should pass with the create certificates bit included, that's the verb that we're testing we can escalate for, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is meant to be a happy path test, right? It seems like the dry-run w/ fake storage we're now using is correctly catching that the manifest includes a certificates rule that the service account w/ escalatingClusterRole
doesn’t have permission to escalate to create the certificate. I think we were missing what should have been an escalation error before?
This version of the test: uses privilegedClusterRole
, new test name, passes:
func TestPreAuthorize_SuccessWithFullPermissions(t *testing.T) {
t.Run("preauthorize succeeds with full permissions using unchanged manifest", func(t *testing.T) {
featuregatetesting.SetFeatureGateDuringTest(t, features.OperatorControllerFeatureGate, features.PreflightPermissions, true)
// Use a client configured with a privileged cluster role that grants all permissions,
// including certificate creation.
fakeClient := setupFakeClient(privilegedClusterRole)
preAuth := NewRBACPreAuthorizer(fakeClient)
testServiceAccount := user.DefaultInfo{
Name: fmt.Sprintf("system:serviceaccount:%s:%s", ns, saName),
}
// Use the unchanged testManifest which includes the certificates rule.
missingRules, err := preAuth.PreAuthorize(context.TODO(), &testServiceAccount, strings.NewReader(testManifest))
require.NoError(t, err)
require.Equal(t, []ScopedPolicyRules{}, missingRules)
})
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I must still be misunderstanding how the escalation works or something because I wrote the test the way it was originally, where the SA does not have the "create" on "certificates" explicitly granted, so we can test if the permissions checking logic results in no missing rules and passes, because as I understood it, if the SA has all the verbs on RBAC objects and is also given bind/escalate, then additional rules like "create" on "certificates" that it doesn't explicitly start with should not cause a failure of the permissions check because it can bind or escalate those. When I ran this test locally in its original form where the SA has bind/escalate but not "create certificates," it passes with no missing rules as I expected.
` | ||
missingRules, err := preAuth.PreAuthorize(context.TODO(), &testServiceAccount, strings.NewReader(manifest)) | ||
require.Error(t, err) | ||
// Instead of calling it opaque, we check that the error contains "forbidden:" to indicate it's from storage-layer checks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this not just a failed escalation check? How is that the storage layer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably redo the comments, I just made them quickly to keep it straight in my head where the logic that created the error came from, but that is from the Create with fakestorage, coming from this code, for instance:
@@ -255,16 +257,20 @@ type decodedManifest struct { | |||
func (dm *decodedManifest) rbacObjects() []client.Object { | |||
objects := make([]client.Object, 0, len(dm.clusterRoles)+len(dm.roles)+len(dm.clusterRoleBindings)+len(dm.roleBindings)) | |||
for obj := range maps.Values(dm.clusterRoles) { | |||
objects = append(objects, &obj) | |||
o := obj // avoid aliasing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is no longer a problem in 1.22+
// If there are errors here, you might choose to log them or collect them. | ||
// We'll still try to run the upstream escalation check. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still need to collect and return these errors. This is the equivalent of the SelfSubjectAccessReview. That the check itself fails to even tell us whether we had missing rules, we need to tell the user what happened.
for _, obj := range dm.rbacObjects() { | ||
if err := ec.checkEscalation(ctx, manifestManager, obj); err != nil { | ||
// In Kubernetes 1.32.2 the specialized PrivilegeEscalationError is gone. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the error type that I added in my copied code so that we could extract the missing rules and separate out the rule evaluation errors.
When we get to phase 2 of this work, we're going to have to do something like that regardless.
// If escalation check fails, return the detailed missing rules along with the error. | ||
if len(escalationErrors) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The missingRules
are separate from the escalation check. If there are missingRules
, but there are no escalation errors, we still need to report the missing rules.
For example, consider a manifest where the only contents are a secret and configmap. There will never be an escalation there because only RBAC objects can fail escalation checks. If a user lacks permissions on either secrets or configmaps, we need to fail return those and fail the preflight check.
return allMissingPolicyRules, fmt.Errorf("authorization evaluation errors: %w", errors.Join(preAuthEvaluationErrors...)) | ||
} | ||
return allMissingPolicyRules, nil | ||
// If the escalation check passed, override any computed missing rules (since the final decision is allowed). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my other comment. This is incorrect. The missing rules from authorizeAttributesRecords
need to be unioned with the missing rules from the escalation checks.
The fact that all we're getting now is the opaque string-based errors from the escalation check is problematic even now because we can't compute this union, which means we are very likely going to repeat missing rules and make the condition messages longer and more confusing.
// Merge extra Role rules if present. | ||
key := types.NamespacedName{Namespace: v.Namespace, Name: v.Name} | ||
if extra, ok := ec.extraRoles[key]; ok { | ||
v.Rules = append(v.Rules, extra.Rules...) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not the right place to merge extra rules. The extraRoles
and extraClusterRoles
need to be used to augment rules returned by the rules resolver when evaluating a role binding or cluster role binding.
The rules resolver only knows about roles and cluster roles that exist in our cache (via KAS and etcd). However, we may have role bindings and cluster role bindings in our manifest that reference other roles and cluster roles in our manifest.
So when ConfirmNoEscalation
eventually executes, the rules passed to it need to include the rules that would be bound if the referenced roles or cluster roles did exist.
featuregatetesting.SetFeatureGateDuringTest(t, features.OperatorControllerFeatureGate, features.PreflightPermissions, true) | ||
fakeClient := setupFakeClient(privilegedClusterRole) | ||
preAuth := NewRBACPreAuthorizer(fakeClient) | ||
testServiceAccount := user.DefaultInfo{Name: fmt.Sprintf("system:serviceaccount:%s:%s", ns, saName)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can DRY up these testServiceAccount initializations by making it a global since I think we use the exact same value every time
This reverts commit 2681194.
Opened kubernetes/kubernetes#130955 to hopefully get the PrivilegedEscalationError upstreamed in k8s |
Description
This is a successor PR to #1716 and is primarily the contributions of @trgeiger and @joelanford .
Goal and title, remain the same. Approach is a bit modified:
Pulls in RBAC authorization code from
k8s.is/kubernetes
, uses that code to check GET and other verb permissions as prelude to and as response from a Helm dry-runTo pull in the RBAC auth code concisely, repeatably and with warnings if the used code changes, we add a maintenance utility that adds the needed
replace
directives for all related staging modules (e.g.,k8s.io/api
,k8s.io/apimachinery
, etc.) and they are automatically pinned to the corresponding published version.All this code is initially called at
in
internal/operator-controller/applier/helm.go
Reviewer Checklist