Skip to content

Support v1alpha1.ScyllaDBDatacenter registration with global ScyllaDB Manager instance #2590

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

rzetelskik
Copy link
Member

@rzetelskik rzetelskik commented Apr 14, 2025

Description of your changes: This PR adds integration between ScyllaDB clusters defined by v1alpha1.ScyllaDBDatacenters and the unmanaged, global ScyllaDB Manager instance. ScyllaDBDatacenters can now be automatically created and deleted from state of the global ScyllaDB Manager instance.

Which issue is resolved by this Pull Request:
Resolves #2570

/kind feature
/priority important-soon
/cc

@scylla-operator-bot scylla-operator-bot bot added kind/feature Categorizes issue or PR as related to a new feature. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Apr 14, 2025
Copy link
Contributor

@rzetelskik: GitHub didn't allow me to request PR reviews from the following users: rzetelskik.

Note that only scylladb members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

Description of your changes: wip

Which issue is resolved by this Pull Request:
Resolves #2570

/kind feature
/priority important-soon
/cc

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@scylla-operator-bot scylla-operator-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Apr 14, 2025
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rzetelskik

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@scylla-operator-bot scylla-operator-bot bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 14, 2025
@rzetelskik rzetelskik force-pushed the manager-sdc-registration branch 7 times, most recently from c069b48 to badb7c7 Compare April 15, 2025 11:44
@rzetelskik rzetelskik changed the title [WIP] Support v1alpha1.ScyllaDBDatacenter registration with global ScyllaDB Manager instance Support v1alpha1.ScyllaDBDatacenter registration with global ScyllaDB Manager instance Apr 15, 2025
@scylla-operator-bot scylla-operator-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 15, 2025
@rzetelskik
Copy link
Member Author

/cc zimnx mflendrich
Ready to review. Please pay special attention to differences between this implementation and the existing manager-controller.

@rzetelskik rzetelskik force-pushed the manager-sdc-registration branch from badb7c7 to 557bb7c Compare April 15, 2025 11:59
@rzetelskik
Copy link
Member Author

@zimnx one thing I'm unsure about and so I'd like to hear your opinion. This PR introduces verification utils for v1alpha1.ScyllaDBDatacenters, most importantly the Verify function which is supposed to mirror the existing logic for v1.ScyllaClusters. The existing one checks for the upgrade status (

o.Expect(sc.Status.Upgrade.FromVersion).To(o.Equal(sc.Status.Upgrade.ToVersion))
), which, for ScyllaDBDatacenters is reflected in a conditionally existing ConfigMap. Should the upgrade context CM be verified as part of this method then? To me it seems like only verifying it in upgrade-related tests would suffice.

@rzetelskik rzetelskik force-pushed the manager-sdc-registration branch from 557bb7c to d485a02 Compare April 15, 2025 12:14
@rzetelskik
Copy link
Member Author

@rzetelskik: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-gke-parallel d485a02 link true /test e2e-gke-parallel
Full PR test history. Your PR dashboard.

#2096 (comment)
known flake
/retest

@rzetelskik
Copy link
Member Author

@rzetelskik: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-gke-parallel-clusterip d485a02 link true /test e2e-gke-parallel-clusterip
Full PR test history. Your PR dashboard.

timed out
/retest

@zimnx
Copy link
Collaborator

zimnx commented Apr 15, 2025

@zimnx one thing I'm unsure about and so I'd like to hear your opinion. This PR introduces verification utils for v1alpha1.ScyllaDBDatacenters, most importantly the Verify function which is supposed to mirror the existing logic for v1.ScyllaClusters. The existing one checks for the upgrade status (

o.Expect(sc.Status.Upgrade.FromVersion).To(o.Equal(sc.Status.Upgrade.ToVersion))

), which, for ScyllaDBDatacenters is reflected in a conditionally existing ConfigMap. Should the upgrade context CM be verified as part of this method then? To me it seems like only verifying it in upgrade-related tests would suffice.

Verify is executed post rollout. It should validate whether main and dependant objects and status are in expected state.
Upgrade context ConfigMap shouldn't exist at all at this point. We could validate it's not existing, although it's a implementation detail so maybe black box approach would be better in the long term. I would prefer the latter.

@rzetelskik rzetelskik force-pushed the manager-sdc-registration branch from d485a02 to 77c829a Compare April 16, 2025 10:02
@rzetelskik
Copy link
Member Author

  • Modified GlobalScyllaDBManager controller to unit test makeScyllaDBManagerClusterRegistrationForScyllaDBDatacenter func

@rzetelskik rzetelskik force-pushed the manager-sdc-registration branch from 77c829a to 206fad0 Compare April 16, 2025 13:31
@rzetelskik
Copy link
Member Author

  • Updated condition names to reflect sync func name

@rzetelskik rzetelskik force-pushed the manager-sdc-registration branch 2 times, most recently from 3213446 to 9432c1f Compare April 16, 2025 21:41
@rzetelskik rzetelskik force-pushed the manager-sdc-registration branch 2 times, most recently from c637f08 to b0fd082 Compare April 16, 2025 21:49
@rzetelskik rzetelskik force-pushed the manager-sdc-registration branch from b0fd082 to 64def13 Compare April 16, 2025 22:14
Copy link
Contributor

@rzetelskik: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-gke-parallel 64def13 link true /test e2e-gke-parallel

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Copy link
Collaborator

@zimnx zimnx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

first batch

type LocalScyllaDBReference struct {
// kind specifies the type of the resource.
Kind string `json:"kind"`
// name specifies the name of the resource.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be worth mentioning that the name references object in the same namespace.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the Local in the struct name is a conventional way of saying this (by similarity to LocalObjectReference) - I could add a comment to the entire struct though, if it's not clear enough?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

struct name isn't visible in kubectl explain, while comments are.

@@ -82,6 +82,10 @@ func (o *Observer) GetGenericHandlers() cache.ResourceEventHandlerFuncs {
}
}

func (o *Observer) EventRecorder() record.EventRecorder {
return o.eventRecorder
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: It's not visible why this getter is necessary from this commit point of view. Either squash it into where it's being used, or add usage to commit introducing it.

globalScyllaDBManagerSelector = labels.SelectorFromSet(labels.Set{
naming.GlobalScyllaDBManagerRegistrationLabel: naming.LabelValueTrue,
})
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we keep selectors in naming package, this could be moved there as well

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we keep selectors in naming package

only some of them, see e.g.

ncSelector := labels.SelectorFromSet(labels.Set{

scSelector := labels.SelectorFromSet(labels.Set{

Is there a benefit to making it public?

func (gsmc *Controller) getScyllaDBManagerClusterRegistrations() (map[string]*scyllav1alpha1.ScyllaDBManagerClusterRegistration, error) {
smcrs, err := gsmc.scyllaDBManagerClusterRegistrationLister.List(labels.SelectorFromSet(labels.Set{
naming.GlobalScyllaDBManagerLabel: naming.LabelValueTrue,
}))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto, move selector to naming package

}

smcrMap := map[string]*scyllav1alpha1.ScyllaDBManagerClusterRegistration{}
for i := range smcrs {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

preallocate space for len(smcrs) keys


func (gsmc *Controller) getScyllaDBManagerClusterRegistrations() (map[string]*scyllav1alpha1.ScyllaDBManagerClusterRegistration, error) {
smcrs, err := gsmc.scyllaDBManagerClusterRegistrationLister.List(labels.SelectorFromSet(labels.Set{
naming.GlobalScyllaDBManagerLabel: naming.LabelValueTrue,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

explicitly use .Namespace(corev1.NamespaceAll) if you're listing in all namespaces

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? Both end up calling ListAll, it's just syntax sugar?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it's explicit about intention, this looks buggy. I had to dive into logic to understand whether it's desired or not.

}()

scyllaDBDatacenters, err := gsmc.scyllaDBDatacenterLister.List(globalScyllaDBManagerSelector)
if err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

explicitly use .Namespace(corev1.NamespaceAll) if you're listing in all namespaces

smcrMap := map[string]*scyllav1alpha1.ScyllaDBManagerClusterRegistration{}
for i := range smcrs {
smcrMap[smcrs[i].Name] = smcrs[i]
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scmrs having the same name in different namespaces would collide here. It feels like a map[string]map[string]* should be used across this controller to distinguish different namespaces and preserve ability to use our controllerhelper functions suitable for per namespace reconcilation.

old.(*corev1.Namespace),
cur.(*corev1.Namespace),
smcrc.handlers.EnqueueAllWithUntypedFilterFunc(hasGlobalScyllaDBManagerNamespaceName),
smcrc.deleteNamespace,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EnqueueAll enqueues all ScyllaDBManagerClusterRegistration objects from untypedObj namespace. But this is Namespace so it has empty .Namespace field, which value turns out to be equal to special corev1.NamespaceAll, meaninig it will enqueue all ScyllaDBManagerClusterRegistration from all namespaces. Is this expected behavior?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, e.g. deletion of scylla-manager namespace should unblock finalization of all existing SMCRs having the global-manager label. Although I can write a dedicated method if you think the route to this behaviour is hacky

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's very much hacky, lets have dedicated method expressing what is desired behavior

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/feature Categorizes issue or PR as related to a new feature. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support v1alpha1.ScyllaDBDatacenter registration with global ScyllaDB Manager instance
3 participants