
✨ Add support for installing bundles with webhooks #1914


Open
wants to merge 11 commits into
base: main
from

Conversation

perdasilva
Contributor

@perdasilva perdasilva commented Apr 11, 2025

Description

Depends on #1893

Adds webhook support to bundle renderer with certificate management. Based on Joe's PoC #1506

Only the cert-manager provisioner is implemented in this PR. In a follow up, we'll add the openshift-serviceca provisioner and a way to configure OLM for one or the other.

Reviewer Checklist

  • API Go Documentation
  • Tests: Unit Tests (and E2E Tests, if appropriate)
  • Comprehensive Commit Messages
  • Links to related GitHub Issue(s)

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 11, 2025

netlify bot commented Apr 11, 2025

Deploy Preview for olmv1 ready!

Name Link
🔨 Latest commit 5e3f09c
🔍 Latest deploy log https://app.netlify.com/sites/olmv1/deploys/68231debb6df6900087233ea
😎 Deploy Preview https://deploy-preview-1914--olmv1.netlify.app

@perdasilva perdasilva changed the title Add wh support ✨ Add support for bundles with webhooks Apr 11, 2025
@perdasilva perdasilva changed the title ✨ Add support for bundles with webhooks ✨ Add support for installing bundles with webhooks Apr 11, 2025

codecov bot commented Apr 11, 2025

Codecov Report

Attention: Patch coverage is 92.07773% with 53 lines in your changes missing coverage. Please review.

Project coverage is 68.92%. Comparing base (540804c) to head (d663ee0).
Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...-controller/rukpak/render/generators/generators.go | 87.83% | 17 Missing and 10 partials ⚠️ |
| ...operator-controller/rukpak/util/testing/testing.go | 0.00% | 14 Missing ⚠️ |
| ...troller/rukpak/render/certproviders/certmanager.go | 94.00% | 4 Missing and 5 partials ⚠️ |
| internal/operator-controller/rukpak/util/util.go | 85.71% | 2 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1914      +/-   ##
==========================================
+ Coverage   66.78%   68.92%   +2.13%     
==========================================
  Files          75       76       +1     
  Lines        6337     6931     +594     
==========================================
+ Hits         4232     4777     +545     
- Misses       1841     1874      +33     
- Partials      264      280      +16     
| Flag | Coverage Δ | |
|---|---|---|
| e2e | 42.08% <18.32%> (-3.11%) | ⬇️ |
| unit | 59.38% <91.77%> (+3.00%) | ⬆️ |


@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 17, 2025
@perdasilva perdasilva marked this pull request as ready for review April 28, 2025 13:12
@perdasilva perdasilva requested a review from a team as a code owner April 28, 2025 13:12
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 28, 2025
@openshift-ci openshift-ci bot requested review from grokspawn and OchiengEd April 28, 2025 13:13
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 28, 2025
APIVersion: admissionregistrationv1.SchemeGroupVersion.String(),
},
ObjectMeta: metav1.ObjectMeta{
GenerateName: fmt.Sprintf("%s-", generateName),
Member

Using GenerateName is what OLMv0 does, correct? I think it does this so that multiple validating/mutating webhook configuration objects can be created when own/single namespace bundles are installed multiple times.

That is an anti-goal for OLMv1, so perhaps it makes sense to deviate from OLMv0's use of GenerateName in this case.

Unless there is a really strong reason not to, I'd prefer to use a deterministic name here instead, as that would ultimately make the set of on-cluster objects deterministic as well. I'm sure there are many benefits, but one that comes to mind is that ClusterExtension SAs would be able to specify the deterministic name in advance when setting up RBAC for get/update/patch/delete verbs with the correct resourceName.

Member

Another benefit: using a deterministic name here means that this webhook can only be installed once, regardless of the install modes supported by the extension. That is directly in line with one of the stated invariants of OLMv1 (no two cluster extensions can manage the same object)

Contributor Author

I've changed it to use Name instead. I'm trying to think if there's foot gun potential. I might be a little more defensive here and ensure that any - suffix is removed from what is defined in the CSV to avoid any bad name errors.

We might already, but I'll double check and ensure that all webhook names defined in the CSV (for a particular type of webhook) are unique.

I guess if there is a naming clash, i.e. two different bundles call their webhook "my-webhook", the user can just install the bundle in another namespace as an escape hatch...wdyt?

Contributor

So, now I understand better the @joelanford recommendation/idea: https://redhat-internal.slack.com/archives/C0881N26CGP/p1745960594662879?thread_ts=1745942335.360109&cid=C0881N26CGP

I am OK with this, since in OLMv1 we should have only one instance of each project on the cluster.

👍
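
A minimal sketch of the defensive naming discussed in this thread, assuming the webhook names are taken directly from the CSV. The helper names `webhookConfigName` and `checkUniqueNames` are hypothetical and are not taken from this PR; the sketch only illustrates trimming a trailing `-` and rejecting duplicate names within one webhook type.

```go
package main

import (
	"fmt"
	"strings"
)

// webhookConfigName is a hypothetical helper. It derives a deterministic
// object name from the webhook name declared in the CSV by trimming any
// trailing "-" characters, which would otherwise produce an invalid name.
func webhookConfigName(csvWebhookName string) string {
	return strings.TrimRight(csvWebhookName, "-")
}

// checkUniqueNames is a hypothetical helper. It returns an error if a name
// is empty after trimming, or if two webhooks of the same type resolve to
// the same deterministic name.
func checkUniqueNames(csvWebhookNames []string) error {
	seen := make(map[string]struct{}, len(csvWebhookNames))
	for _, raw := range csvWebhookNames {
		name := webhookConfigName(raw)
		if name == "" {
			return fmt.Errorf("webhook name %q is empty after trimming", raw)
		}
		if _, dup := seen[name]; dup {
			return fmt.Errorf("duplicate webhook name %q", name)
		}
		seen[name] = struct{}{}
	}
	return nil
}

func main() {
	fmt.Println(webhookConfigName("my-webhook-"))
	fmt.Println(checkUniqueNames([]string{"my-webhook", "my-webhook-"}))
}
```

With a deterministic name, the ClusterExtension service account's RBAC can list that name under `resourceNames` ahead of time, which is the benefit raised earlier in the thread.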

@@ -17,6 +19,21 @@ func ObjectNameForBaseAndSuffix(base string, suffix string) string {
return fmt.Sprintf("%s-%s", base, suffix)
}

func ToUnstructured(obj client.Object) (*unstructured.Unstructured, error) {
gvk := obj.GetObjectKind().GroupVersionKind()
Member

A client.Object is not guaranteed to have a populated GVK. If we need guarantees about setting the GVK on the unstructured type, we should also pass in a scheme so that we can lookup the GVK based on the object type.


Contributor Author

For now, I've just hardened it by checking if version and kind are empty. I don't think we need to add the complexity of passing a scheme. At least not for now =D

Contributor

@camilamacedo86 camilamacedo86 Apr 30, 2025

If we want the GVK, and relying on it is the right approach, then I think we should use the controller-runtime implementation (GVKForObject): https://github.com/kubernetes-sigs/controller-runtime/blob/main/pkg/client/apiutil/apimachinery.go#L95C6-L95C18

Contributor Author

@perdasilva perdasilva Apr 30, 2025

I don't think we need it right now. The precondition for the function is that the client.Object will carry the GVK. If not, then error. But, I'd say check how we're using it in this PR and let me know if that's not the right way to create an object with GVK.

Contributor

Do you think that's acceptable since we only use it with CertManager? That way, we'll have the GVK. Is that what you were thinking?

Contributor Author

I just don't get what it buys us apart from allowing the function to take client.Objects that don't have a gvk set.

I guess my question is, if we do:

ToUnstructured(&certmanagerv1.Certificate{
	TypeMeta: metav1.TypeMeta{
		APIVersion: certmanagerv1.SchemeGroupVersion.String(),
		Kind:       "Certificate",
	},
	...
})

do we get anything out of adding the scheme? Or would adding it just enable us to do something like:

ToUnstructured(&certmanagerv1.Certificate{}, scheme)

or take arbitrary client.Objects at the cost of creating a new scheme and registering the certmanagerv1 scheme to it?

Since we don't have a need for handling arbitrary client.Objects, is there any point in making the method more complex right now?

Member

Yeah, if we check/fail if GVK is unset, then I think that's sufficient.
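
The check agreed on here could look roughly like the sketch below. It is not the code from this PR: `gvk` is a local stand-in for `schema.GroupVersionKind` from k8s.io/apimachinery so that the example runs with the standard library only, and `validateGVK` is a hypothetical helper name. The group is deliberately not checked, because core Kubernetes types legitimately have an empty group.

```go
package main

import (
	"errors"
	"fmt"
)

// gvk is a local stand-in for schema.GroupVersionKind (k8s.io/apimachinery).
type gvk struct {
	Group   string
	Version string
	Kind    string
}

// validateGVK is a hypothetical helper that fails fast when the object does
// not carry a usable GVK. Only version and kind are required; an empty group
// is valid for core API types.
func validateGVK(g gvk) error {
	if g.Version == "" {
		return errors.New("object version is not set")
	}
	if g.Kind == "" {
		return errors.New("object kind is not set")
	}
	return nil
}

func main() {
	// TypeMeta populated by the caller, as in the Certificate example above.
	fmt.Println(validateGVK(gvk{Group: "cert-manager.io", Version: "v1", Kind: "Certificate"}))
	// A zero-value typed object carries no GVK and is rejected.
	fmt.Println(validateGVK(gvk{}))
}
```

This keeps the function's precondition explicit without requiring callers to build and pass a scheme.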

@@ -13,6 +13,7 @@ const (
// Ex: SomeFeature featuregate.Feature = "SomeFeature"
PreflightPermissions featuregate.Feature = "PreflightPermissions"
SingleOwnNamespaceInstallSupport featuregate.Feature = "SingleOwnNamespaceInstallSupport"
WebhookSupport featuregate.Feature = "WebhookSupport"
Member

If we want to add support for new providers later, how do you envision that looking from a feature gate standpoint?

Contributor

From what I understand, downstream will use a different provider, but the implementation remains the same.

I don’t see a strong need to support multiple providers upstream for now — downstream will likely stick with one as well. Since this is under an alpha flag, we have flexibility to adjust later.

So I think this is fine as-is and no need to optimize prematurely. We can revisit if we decide to support more providers in the future.

Member

I think you are missing my point (which I should have explained a bit better 😅 ). Let's say we implement webhook support now with cert-manager, but then later decide we want to add a separate implementation (e.g. BuiltIn or OPA's cert-controller)?

My point is, I think there is a difference between "we support webhooks" and "we support a certain certificate provider".

I would suggest calling this WebhookProviderCertManager so that if/when we introduce other cert providers upstream, we don't have a strange naming convention where WebhookSupport actually means cert-manager webhook support and WebhookSupportFooBar means "webhook support with the foobar provider".

And then once at least one provider's feature gate graduates to GA, then we can say that we support bundles with webhooks.

I also think we should explore implementing OCP's Service CA provider in our upstream.

That would benefit Red Hat: if Red Hat implements it only downstream, they (we) would have to carry patches in the downstream codebase that could invite conflicts when syncing. There is no precedent yet for carrying a patch in the downstream Go source code, and I'm not convinced it makes sense to change that for this feature.

It would also benefit other users of OCP. For example, if an OCP user wants to disable OCP's OLM and use upstream OLM instead, they would still be able to use the service-ca provider. We've talked about this possibility with SRE organizations that provide managed OpenShift and that want a consistent OLM surface across a fleet of OCP clusters with varying OCP versions.

Member

Another general benefit: If we use a naming convention like WebhookProvider<ProviderName> for the feature gates, we can directly translate the set of enabled webhook providers to a --webhook-provider flag's presence and the set of allowed values (i.e. the set of <ProviderName> suffixes).

Then, distro maintainers retain the choice of:

  • no webhook support (don't set the flag)
  • webhook support with whichever provider they prefer (they set the flag to that provider)

All of this would be driven by the feature gate naming convention.
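
As a rough illustration of that idea, the sketch below derives the allowed `--webhook-provider` values from the set of enabled gates. It is an assumption-laden example, not part of this PR: the gates are modelled as a plain `map[string]bool` instead of the real feature gate type, `allowedProviders` and `validateProviderFlag` are hypothetical helpers, and `WebhookProviderOpenshiftServiceCA` is a made-up gate name used only to show a disabled provider.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// providerGatePrefix follows the WebhookProvider<ProviderName> convention
// proposed in this thread.
const providerGatePrefix = "WebhookProvider"

// allowedProviders returns the sorted <ProviderName> suffixes of all enabled
// gates that follow the convention. Gates without the prefix are ignored.
func allowedProviders(gates map[string]bool) []string {
	providers := []string{}
	for name, enabled := range gates {
		if !enabled || !strings.HasPrefix(name, providerGatePrefix) {
			continue
		}
		if suffix := strings.TrimPrefix(name, providerGatePrefix); suffix != "" {
			providers = append(providers, suffix)
		}
	}
	sort.Strings(providers)
	return providers
}

// validateProviderFlag checks a hypothetical --webhook-provider value.
// An empty value means the flag was not set, i.e. no webhook support.
func validateProviderFlag(value string, allowed []string) error {
	if value == "" {
		return nil
	}
	for _, p := range allowed {
		if p == value {
			return nil
		}
	}
	return fmt.Errorf("unsupported webhook provider %q (allowed: %s)",
		value, strings.Join(allowed, ", "))
}

func main() {
	gates := map[string]bool{
		"WebhookProviderCertManager":        true,
		"WebhookProviderOpenshiftServiceCA": false, // hypothetical gate name
		"PreflightPermissions":              true,  // unrelated gate, ignored
	}
	allowed := allowedProviders(gates)
	fmt.Println(allowed)
	fmt.Println(validateProviderFlag("CertManager", allowed))
	fmt.Println(validateProviderFlag("FooBar", allowed))
}
```

Under this scheme, enabling another provider's gate would add its suffix to the allowed flag values without any further wiring.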

Contributor Author

I've updated the featuregate name

joelanford
joelanford previously approved these changes Apr 29, 2025
Member

@joelanford joelanford left a comment

This is great! My only open question (that I already left in a code comment) is how we would introduce more certificate providers in the future?

Presumably we'd want a feature gate per provider, so that each provider could mature independently? If so, should we have CertManagerWebhookSupport for the current feature gate name?

I don't think we have to answer that question or change what we've got in the PR right now, so approving!

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 30, 2025
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 30, 2025
@perdasilva perdasilva enabled auto-merge April 30, 2025 08:46
camilamacedo86

This comment was marked as off-topic.

camilamacedo86
camilamacedo86 previously approved these changes May 1, 2025
Contributor

@camilamacedo86 camilamacedo86 left a comment

@perdasilva

Everything looks great to me! 👍
Awesome work 🥇

Just one thought: I think it would be valuable to have some e2e tests covering install/upgrade scenarios with webhooks.

Do you think we could add those before merging? Or, since the feature is behind a flag, maybe we could make sure to cover that in a follow-up?

Other small nits (nice-to-haves): #1914 (comment)

Aside from that, I'm all good with it.

/lgtm
/approved

@perdasilva perdasilva added this pull request to the merge queue May 1, 2025
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 1, 2025
@camilamacedo86 camilamacedo86 removed this pull request from the merge queue due to a manual request May 1, 2025
@camilamacedo86 camilamacedo86 requested a review from joelanford May 1, 2025 11:24
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label May 5, 2025

openshift-ci bot commented May 5, 2025

New changes are detected. LGTM label has been removed.

@perdasilva perdasilva force-pushed the add-wh-support branch 2 times, most recently from 8338dc5 to 7d0dce0 Compare May 5, 2025 10:20
@perdasilva
Contributor Author

> Just one thought: I think it would be valuable to have some e2e tests covering install/upgrade scenarios with webhooks.
>
> Do you think we could add those before merging? Or, since the feature is behind a flag, maybe we could make sure to cover that in a follow-up?

I think this falls under the discussion around having e2e tests for feature-gated code. I don't know that we've landed on a solid answer yet, and I think we may want a separate e2e suite (maybe?) for FGs? Because of this, I think it will come in a follow-up.


openshift-ci bot commented May 7, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: camilamacedo86

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 7, 2025
Per Goncalves da Silva added 7 commits May 12, 2025 09:59
Signed-off-by: Per Goncalves da Silva <[email protected]>
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files.

4 participants