Skip to content

Mirror ConfigMaps and Secrets referenced in ScyllaDBCluster into remote datacenters #2524

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 9, 2025

Conversation

zimnx
Copy link
Collaborator

@zimnx zimnx commented Feb 28, 2025

The ScyllaDBCluster controller has been extended with a new capability of mirroring the ConfigMaps and Secrets referenced by the ScyllaDBCluster, into remote datacenters.
This improvement provides users with the convenience of managing the configuration of an entire cluster from a single centralized location.

Note: any modifications made to the ConfigMaps or Secrets do not automatically initiate a rollout to apply those changes.
A manual trigger is required to implement the updates - for example using forceRedeploymentReason.

Requires:

Fixes #2255

/cc

@zimnx zimnx added kind/feature Categorizes issue or PR as related to a new feature. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Feb 28, 2025
@scylla-operator-bot scylla-operator-bot bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 28, 2025
Copy link
Contributor

@zimnx: GitHub didn't allow me to request PR reviews from the following users: zimnx.

Note that only scylladb members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

The ScyllaDBCluster controller has been extended with a new capability of mirroring the ConfigMaps and Secrets referenced by the ScyllaDBCluster, into remote datacenters.
This improvement provides users with the convenience of managing the configuration of an entire cluster from a single centralized location.

Note: any modifications made to the ConfigMaps or Secrets do not automatically initiate a rollout to apply those changes.
A manual trigger is required to implement the updates - for example using forceRedeploymentReason.

Requires:

Fixes #2255

/cc

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@scylla-operator-bot scylla-operator-bot bot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Feb 28, 2025
@zimnx zimnx changed the title [WIP] Mirror ConfigMaps and Secrets referenced in ScyllaDBCluster into remete datacenters [WIP] Mirror ConfigMaps and Secrets referenced in ScyllaDBCluster into remote datacenters Feb 28, 2025
@scylla-operator-bot scylla-operator-bot bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 5, 2025
@zimnx zimnx force-pushed the remote-configmaps branch from 2f439a4 to 39594e3 Compare March 17, 2025 12:01
@scylla-operator-bot scylla-operator-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Mar 17, 2025
@zimnx zimnx force-pushed the remote-configmaps branch from 39594e3 to f3429ba Compare March 18, 2025 14:25
@zimnx zimnx changed the title [WIP] Mirror ConfigMaps and Secrets referenced in ScyllaDBCluster into remote datacenters Mirror ConfigMaps and Secrets referenced in ScyllaDBCluster into remote datacenters Mar 24, 2025
@scylla-operator-bot scylla-operator-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 24, 2025
@zimnx
Copy link
Collaborator Author

zimnx commented Mar 24, 2025

ready to review
/cc rzetelskik
/cc mflendrich

Copy link
Member

@rzetelskik rzetelskik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments, rest lgtm

@zimnx zimnx force-pushed the remote-configmaps branch 2 times, most recently from 0ff4bf6 to 30fc7f0 Compare March 25, 2025 13:20
@rzetelskik rzetelskik self-requested a review March 26, 2025 09:35
Copy link
Member

@rzetelskik rzetelskik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm
/assign mflendrich

Copy link
Collaborator

@mflendrich mflendrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we're granting too wide RBAC access and should design for the necessary minimum, not for catch-all.

@zimnx zimnx force-pushed the remote-configmaps branch from 30fc7f0 to 6a3bedf Compare March 27, 2025 11:47
@zimnx
Copy link
Collaborator Author

zimnx commented Mar 27, 2025

clean rebase to pick up e2e timeout fix

@zimnx zimnx requested a review from mflendrich March 27, 2025 14:30
Copy link
Collaborator

@mflendrich mflendrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We decided offline that, in order to allow the remote SA access only to a predetermined (by the user) list of namespaces (as opposed to access to CMs/secrets across the entire cluster), we can introduce a possibility for the user to specify the set of namespaces that the remote operator will have access to.

But over the course of this review I came to a realization that this is not enough, and is making another problem more prominent -- the possibility to hijack other cms/secrets - as described in one of the comments attached.

Also attached two less important maintainability comments, but the remote resource overwrite issue seems critical to me.

Added `app.kubernetes.io/managed-by: remote.scylla-operator.scylladb.com` to all remotely managed resources.
It helps with setting up remote Informers caching only resources created by Operator to limit memory footprint.
zimnx added 2 commits March 28, 2025 15:02
…te datacenters

The ScyllaDBCluster controller has been extended with a new capability of mirroring the ConfigMaps and Secrets referenced by the ScyllaDBCluster, into remote datacenters.
This improvement provides users with the convenience of managing the configuration of an entire cluster from a single centralized location.

Note: any modifications made to the ConfigMaps or Secrets do not automatically initiate a rollout to apply those changes.
A manual trigger is required to implement the updates - for example using `forceRedeploymentReason`.
Copy link
Collaborator

@mflendrich mflendrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline (at length), this implementation comes with an unconditional requirement that remote kubernetes cluster access must have very broad permissions effectively permitting reading of secrets, including SA tokens, across all namespaces, which paves a theoretically exploitable way towards privilege escalation (by reading 3rd party access tokens) or other attacks (by overwriting 3rd party configuration stored in secrets/CMs). While mechanisms in the operator exist that will likely (in the typical case) make it very hard to overwrite or leak 3rd party secrets, direct access to the remote kubernetes cluster kubeconfig opens the floodgates.

My first and foremost recommendation is to adjust the design so that

  • remote operator access is restricted to a user-defined set of namespaces from day 1
  • RBAC-controlled access to all secrets/configmaps is not necessary (e.g. by using deterministically named secrets/CMs in the remote cluster)

but the approach predates my engagement with the project, so I'm not directly questioning it, assuming that these limitations are well-known.

To unblock possible future improvement here, over the course of review we've made adjustments give the user control over the remote namespace name (created by operator in the remote cluster), making it possible to pave a future way (see #2577) to incrementally implement the two traits expressed in the bullet points above.

Therefore I'm conditionally approving this PR with an expectation that it's decided whether those limitations (materialized by #2577) should be a prerequisite for release or not.

Also created #2576 for a code health issue (not directly introduced by this PR but somewhat exacerbated) to be addressed at a later time.

/lgtm
(with the above caveat)

/hold for acknowledgement

@scylla-operator-bot scylla-operator-bot bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 7, 2025
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mflendrich, rzetelskik, zimnx

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [mflendrich,rzetelskik,zimnx]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mflendrich
Copy link
Collaborator

/lgtm
(it was not picked up from #2524 (review))

@scylla-operator-bot scylla-operator-bot bot added the lgtm Indicates that a PR is ready to be merged. label Apr 7, 2025
@zimnx
Copy link
Collaborator Author

zimnx commented Apr 8, 2025

ack
/hold cancel

@scylla-operator-bot scylla-operator-bot bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 8, 2025
@scylla-operator-bot scylla-operator-bot bot merged commit 48e7d16 into scylladb:master Apr 9, 2025
13 checks passed
@zimnx zimnx deleted the remote-configmaps branch April 9, 2025 13:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Multi-DC] Configuration Synchronization Across Multiple Datacenters
3 participants