KEP-4958: CSI Sidecars All In One

Release Signoff Checklist

Items marked with (R) are required prior to targeting to a milestone / release.

  • [ ] (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
  • [ ] (R) KEP approvers have approved the KEP status as implementable
  • [ ] (R) Design details are appropriately documented
  • [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
    • [ ] e2e Tests for all Beta API Operations (endpoints)
    • [ ] (R) Ensure GA e2e tests meet requirements for Conformance Tests
    • [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
  • [ ] (R) Graduation criteria is in place
  • [ ] (R) Production readiness review completed
  • [ ] (R) Production readiness review approved
  • [ ] "Implementation History" section is up-to-date for milestone
  • [ ] User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
  • [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

Summary

We propose to combine the source code of the CSI Sidecars in a monorepo. Instead of just putting the code repositories together, the program entry points of all the sidecars are consolidated, therefore we can:

  • Improve the CSI Sidecar release process by reducing the number of components released
  • Decrease the maintenance tasks the SIG Storage community maintainers do to maintain the Sidecars
  • Propagate changes in common libraries used by the CSI Sidecars immediately instead of through additional PRs
  • Reduce the number of components CSI Driver authors and cluster administrators need to keep up to date in k8s clusters

As a side effect of combining the CSI Sidecars into a single component we also:

  • Reduce the memory usage and the number of API server calls made by the CSI Sidecars through the use of a shared informer (a sketch follows this list).
  • Reduce the cluster resources needed to run the CSI Sidecars.
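
The shared informer gain comes from wiring every in-process controller to a single client-go informer factory, so each resource type is listed and watched once instead of once per sidecar process. A minimal Go sketch; the wiring and names here are illustrative assumptions, not the monorepo's actual code:

package main

import (
    "context"
    "fmt"
    "time"

    "k8s.io/client-go/informers"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/rest"
)

func main() {
    // In-cluster config; a kubeconfig-based config would work as well.
    cfg, err := rest.InClusterConfig()
    if err != nil {
        panic(err)
    }
    client := kubernetes.NewForConfigOrDie(cfg)

    // One factory shared by every in-process controller: a single list/watch
    // per resource type instead of one per sidecar process.
    factory := informers.NewSharedInformerFactory(client, 10*time.Minute)
    pvInformer := factory.Core().V1().PersistentVolumes().Informer()
    vaInformer := factory.Storage().V1().VolumeAttachments().Informer()
    _, _ = pvInformer, vaInformer // each controller adds its own event handlers

    ctx := context.Background()
    factory.Start(ctx.Done())
    fmt.Println("caches synced:", factory.WaitForCacheSync(ctx.Done()))
}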

Motivation

Increased maintenance tasks on components maintained by the SIG Storage community

The SIG Storage community maintains many storage related projects, each in its own git repo, including:

  • CSI Drivers - SMB CSI Driver, NFS CSI Driver, Hostpath CSI Driver, iSCSI CSI Driver, NVMf CSI Driver
  • CSI Sidecars
    • Typically deployed with the controller component of the CSI Driver: external-attacher, external-provisioner, external-resizer, external-snapshotter, external-health-monitor (alpha), livenessprobe
    • Typically deployed with the node component of the CSI Driver: node-driver-registrar, livenessprobe
  • Controllers
    • snapshot-controller, volume-data-source-validator (beta)
  • Webhooks
    • csi-snapshot-validation-webhook
  • CSI libraries and utilities
    • csi-lib-utils, csi-release-tools, csi-test, lib-volume-populator (beta)
  • Host binaries
    • CSI Proxy

As part of the maintenance work on these components the SIG Storage community:
  1. Bumps the Go runtime, which usually fixes vulnerabilities; the application binary is then rebuilt and a new image is released. This is done in csi-release-tools and propagated to the other repos (example). The effort is part of point #3 below.

  2. Updates the dependencies to their latest versions, which usually include fixes for vulnerabilities. The SIG Storage community reviewers/approvers look at every PR generated by a bot and LGTM/approve it. Because we have different repos the human effort is multiplied, e.g. reviews = # dependencies * # CSI Sidecar PRs (example).

  3. Propagates changes in CSI related dependencies across all the CSI Sidecars and CSI Drivers that need them. csi-release-tools has common build utilities used across all the repos; whenever there's a change in this component it needs to be propagated across all the repos (example). Because we have different repos the human effort is multiplied, e.g. work = (# updates in csi-release-tools + # new changes in csi-lib-utils) * # CSI Sidecars.

To keep dependencies up to date the SIG Storage community uses Dependabot (https://github.com/dependabot), a bot that automatically creates a PR whenever a dependency publishes a new release. As a side effect, after enabling the bot the number of PRs increased. Also note that because each component is in its own repo, a bump in a shared dependency is multiplied across all the CSI Sidecars that use it.

Stats for dependency/vulnerability update PRs reviewed & merged across the CSI Sidecars as of Aug 11th, 2023:

| CSI Sidecar | Dependabot dependency updates | csi-release-tools propagation | csi-lib-utils |
| --- | --- | --- | --- |
| external-attacher | 14 (unreleased), 12 (release 4.3.0), 8 (release 4.2.0) | 2 (unreleased), ~71 (lifetime) | ~15 (lifetime) |
| external-provisioner | 36 (unreleased), 30 (release 3.5.0), 11 (release 3.4.0) | 2 (unreleased), ~75 (lifetime) | ~19 (lifetime) |
| external-resizer | 5 (release 1.8.0), 5 (release 1.7.0) | 2 (unreleased), ~62 (lifetime) | ~10 (lifetime) |
| external-snapshotter | 14 (unreleased) | ~90 (lifetime) | ~19 (lifetime) |
| node-driver-registrar | 13 (unreleased), 8 (release 2.8.0), 2 (release 2.7.0), 3 (release 2.6.0) | ~70 (lifetime) | ~7 (lifetime) |
| livenessprobe | 9 (unreleased) | ~41 (lifetime) | ~9 (lifetime) |

Table: PRs to CSI Sidecars related to vulnerability fixes and library propagation

CSI Sidecars releases

The CSI Drivers/CSI Sidecars have an indirect dependency on the k8s version. This can happen because of:

  • A new CSI feature that touches both CSI Sidecars and k8s components - for example, the ReadWriteOncePod feature needed changes in k8s components (kube-apiserver, kube-scheduler, kubelet) and in the CSI Sidecars

Because of this indirect dependency the SIG Storage community creates a minor release of each CSI Sidecar for every k8s minor release. We use csi-hostpath (a CSI Driver used for testing purposes) to test the compatibility of the new releases with the latest k8s version.

We follow the instructions in SIDECAR_RELEASE_PROCESS.md in every CSI Sidecar repo to create a minor release.

Maintenance tasks by CSI Driver authors and cluster administrators

Kubernetes and CSI are constantly evolving (see the section above on how the CSI Sidecars evolve) and so are CSI Drivers: CSI Driver authors must keep their drivers up to date with the new features in k8s and CSI. A CSI Driver implementing most of the CSI features includes the following components:

(Figure: CSI Driver basic structure)

Keeping up with vulnerability fixes

In addition to keeping up with the latest k8s and CSI features, a cluster administrator might also need to manage other aspects of the integration, like security. The CSI Sidecars have multiple dependencies which might be susceptible to vulnerabilities. When such a vulnerability is fixed in a new release of a dependency, the fix must be propagated all the way to the CSI Sidecar repository.

The above might be enough for the latest release; however, the vulnerability might also affect older releases of the CSI Sidecars, so the fix needs to be applied to older CSI Sidecar releases too.

(Figure: sidecar version bumps)

The above increases the work not only for the SIG Storage community, which has to cherry-pick the fix, but also for cluster administrators, who have to update existing CSI Driver integrations in previous k8s releases by bumping the CSI Sidecars.

To avoid this propagation issue, cluster administrators have the following options:

  • Use the same version of CSI Sidecars in previous k8s integrations

(Figure: sidecar version strategies of GKE)

Resource utilization by the CSI Sidecar components

In some CSI Driver control plane deployment setups each sidecar is configured with a minimum memory request. Some examples of resource allocations in OSS CSI Driver deployments:

  • Memory request
    • EBS CSI Driver
      • In a CP node, sets a 40Mi memory request for each CSI Sidecar (5 sidecars), a total of 200Mi per node.
      • In a worker node, sets a 40Mi memory request for each CSI Sidecar (2 sidecars), a total of 80Mi per node.
    • Azuredisk
      • In a CP node, sets a 20Mi memory request for each CSI Sidecar (5 sidecars), a total of 100Mi per node.
      • In a worker node, sets a 20Mi memory request for each CSI Sidecar (2 sidecars), a total of 40Mi per node.
    • AlibabaCloud Disk
      • In a CP node, sets a 16Mi memory request for each CSI Sidecar (4 sidecars on average), a total of 64Mi per node.
      • In a worker node, sets a 16Mi memory request for each CSI Sidecar (1 sidecar), a total of 16Mi per node.

These per-sidecar memory requests are additional overhead: 5x in the control plane nodes and 2x in the worker nodes.

Goals

Non-Goals

  • The sidecars do not include sig-storage-lib-external-provisioner.
    • Because it doesn't depend on release-tools or csi-lib-utils.
  • release-tools and csi-lib-utils are not included in the monorepo.
    • We can start with the sidecars only and no utility libraries; after we see that it works in CI we can consider moving the utilities to the monorepo. We will open another KEP if we need to move them.

Proposal

Overview

The proposal consists of creating a monorepo that produces a single artifact with the common sidecars combined in one binary:

  • Combine the source code of all common CSI sidecars (external-attacher, external-provisioner, external-resizer, external-snapshotter, livenessprobe, node-driver-registrar), controllers (snapshot-controller, volume-health-monitor controller) and webhooks (csi-snapshot-validation-webhook) in a single repository. A total of 7 repositories covering 6 sidecars, 2 controllers and 1 webhook.
  • Include the source code of the helper utilities (csi-release-tools, csi-lib-utils) in the same repository; sidecars/apps use the local modules through go workspaces. A total of 1 release helper and 1 go module.
  • Create a new cmd/ entrypoint that enables sidecars selectively, similar to kube-controller-manager and its --controllers flag (a minimal sketch follows this list).
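
A minimal Go sketch of such an entrypoint, with hypothetical stub run functions standing in for the real sidecar packages (illustrative only, not the monorepo's actual code):

package main

import (
    "flag"
    "log"
    "strings"
    "sync"
)

// runners maps a controller name to a stub run function; in the monorepo each
// entry would call into the corresponding sidecar's package.
var runners = map[string]func(){
    "attacher":    func() { log.Println("attacher running") },
    "provisioner": func() { log.Println("provisioner running") },
    "resizer":     func() { log.Println("resizer running") },
    "snapshotter": func() { log.Println("snapshotter running") },
}

func main() {
    controllers := flag.String("controllers", "", "comma-separated list of sidecars to enable")
    flag.Parse()
    if *controllers == "" {
        log.Fatal("--controllers is required")
    }

    // Start only the selected sidecars, each in its own goroutine.
    var wg sync.WaitGroup
    for _, name := range strings.Split(*controllers, ",") {
        run, ok := runners[strings.TrimSpace(name)]
        if !ok {
            log.Fatalf("unknown controller %q", name)
        }
        wg.Add(1)
        go func() { defer wg.Done(); run() }()
    }
    wg.Wait()
}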

(Figure: CSI AIO structure)

CSI Driver authors would include a single sidecar container in their deployments (in both the control plane and the node pools). While the artifact version is the same, the command/arguments will differ.

(Figure: desired AIO component structure)

The CSI Driver deployment manifest would look like this in the control plane:

kind: Deployment
apiVersion: apps/v1
metadata:
  name: csi-driver-deployment
spec:
  replicas: 1
  template:
    spec:
      containers:
        - name: csi-driver
          args:
            - "--v=5"
            - "--endpoint=unix:/csi/csi.sock"
        - name: csi-sidecars
          command:
            - csi-sidecars
            - "--csi-address=unix:/csi/csi.sock"
            # similar style as kube-controller-manager
            - "--controllers=attacher,provisioner,resizer,snapshotter"
            - "--feature-gates=Topology=true"
            # leader election flags for all the components as one
            - "--leader-election"
            - "--leader-election-namespace=kube-system"
            # global timeouts
            - "--timeout=30s"
            # per controller specific flags are prefixed with the component name
            - "--attacher-timeout=30s"
            - "--attacher-worker-thread=100"
            - "--provisioner-timeout=30s"
          volumeMounts:
            - mountPath: /csi
              name: socket-dir

The CSI Driver DaemonSet manifest would look like this on the worker nodes:

kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: csi-driver-deployment
spec:
  template:
    spec:
      containers:
        - name: csi-driver
          args:
            - "--v=5"
            - "--endpoint=unix:/csi/csi.sock"
        - name: csi-sidecars
          command:
            - csi-sidecars
            - "--csi-address=unix:/csi/csi.sock"
            # similar style as kube-controller-manager
            - "--controllers=node-driver-registrar"
            - "--kubelet-registration-path=/var/lib/kubelet/plugins/<csi-driver>/csi.sock"
          volumeMounts:
            - name: registration-dir
              mountPath: /registration
            - name: plugin-dir
              mountPath: /csi
      volumes:
        - name: registration-dir
          hostPath:
            path: /var/lib/kubelet/plugins_registry/
            type: Directory
        - name: plugin-dir
          hostPath:
            path: /var/lib/kubelet/plugins/<csi-driver>/
            type: DirectoryOrCreate

Quantifiable characteristics of the current state and of the proposed state

| Characteristic / State | Current state of CSI Sidecars (let #csi-sidecars = 6) | CSI Sidecars in a single component |
| --- | --- | --- |
| Human effort of propagating csi-release-tools | #csi-release-tools changes * #csi-sidecars | 0 (csi-release-tools is part of the repo) |
| Human effort of propagating csi-lib-utils | #csi-lib-utils changes * #csi-sidecars | 0 (csi-lib-utils is part of the repo) |
| go mod dependency bumps | (#dependency changes * #csi-sidecars) * #CSI releases supported (unknown) | #dependency changes * #releases supported (follows the k8s release cadence) |
| Go runtime updates | #csi-release-tools changes related to go runtime updates * #csi-sidecars | #go runtime updates |
| Number of CSI releases per k8s minor release | #csi-sidecars | 1 |

Additional properties of a single CSI Sidecar component without a quantifiable benefit:

| Dimension | Pros | Cons |
| --- | --- | --- |
| Releases | Easier releases. Better definition of which sidecar releases are supported for CVE fixes, i.e. if our support model is similar to k8s (last 3 releases) then the same applies to the CSI Sidecar releases. Release notes in csi-release-tools become part of the release if csi-release-tools is part of the repo; currently, commits in csi-release-tools with release notes get lost because the git subtree command replays commits but loses the PR release note | No longer able to do single releases per component. More frequent major version bumps: currently we increase the major version of a sidecar when we remove a command line parameter or require new RBAC rules, and we ended up with provisioner v5, attacher v4, and snapshotter v8; with a common repo we would end up with 5+4+8=v17 in the worst case |
| Testability | Easier testing. Features that span multiple components, e.g. the RWOP feature, can be tested as a whole. @pohly | |
| Performance & Reliability | Can use a shared informer, decreasing the load on the API server. @msau42 | A container getting OOMKilled kills the entire CSI machinery, not just a single component (in HA, another replica would take over in a few seconds) |
| Simplicity | Consolidation of common parameters like leader election and structured logging. Combination of metrics/health ports. @msau42 Enables using additional sidecars that aren't used today because of the additional build pipelines needed to support another component | Logs would be interleaved, making it harder to trace what happened for a request. CSI utility libraries are used not only by the CSI Sidecars but by other projects (mitigation: make an external repo automatically synchronized from the internal csi-release-tools, a similar analogy to k/k/staging/lib -> k/lib) |
| Integration with CSI Drivers | Less config in the controller/node yaml manifests. Less confusion for CSI Driver authors about which CSI Sidecar versions to use. @msau42 | Complex configuration for the single CSI Sidecar component. Difficulty expressing per-sidecar configuration, e.g. kube-api-qps, kube-api-burst (mitigation: a global flag with a per-sidecar override, e.g. kube-api-qps -> attacher-kube-api-qps) |

    User Stories (Optional)

    Notes/Constraints/Caveats (Optional)

    Design Details

    Glossary

    • Individual repository - An existing repository in the kubernetes-csi/ org in Github e.g. the external-attacher repository.
    • Individual component - An existing component of the CSI Sidecars, built from an individual repository.
    • AIO monorepo or monorepo - The monolithic repository where most of the code of the CSI Sidecars will be migrated.
    • Monorepo component - The source code of an individual repository that is currently being migrated or already migrated to the monorepo.

    AIO Monorepo

    Release Management

    We considered switching from semantic versioning to k8s versioning; there are some pros and cons.

    Pros:

    • We don't need to reinvent the wheel for our dev process; we follow the same docs as k8s (https://kubernetes.io/releases/release/), which have been tried and tested over many releases.
    • Cluster administrators would know which version to use to match their CSI Driver deployment, e.g. for a k8s 1.27 cluster they'd use the 1.27 release of the CSI Sidecars.

    Cons:

    • Breaking changes might happen in a minor release; cluster administrators MUST read the sidecar release notes for breaking changes before doing a big upgrade.
    • The version skew scenario becomes confusing for the cluster administrator, e.g. they deploy the CSI Sidecars at v1.x, the cluster is upgraded to v1.{x+3} (CP upgraded first, NP later), and nodepools would run the CSI Sidecars at v1.{x+3} with the kubelet at v1.x.
    • Patch versions wouldn't map cleanly either, e.g. k/k at 1.27.5 vs CSI at 1.27.0 (a different mapping would still be needed).

    After investigation we found that there isn't a clear advantage to switching to k8s versioning, so we chose to keep semantic versioning in the monorepo.

    RBAC policy

    We designed the AIO repo's RBAC policy to mirror that of the individual repos, where each controller maintains its own policy. Driver maintainers should apply the proper RBAC rules when enabling specific controllers in the AIO component; more discussion here.

    We plan to combine the informer caches of the different controllers in the future.
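
    For example, a deployment that enables only the attacher binds only the attacher's rules. The snippet below is abridged from the external-attacher's upstream rbac.yaml and is only illustrative; the authoritative rules are the ones shipped by each enabled controller:

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: external-attacher-runner
rules:
  # The attacher watches VolumeAttachment objects and patches their status.
  - apiGroups: ["storage.k8s.io"]
    resources: ["volumeattachments"]
    verbs: ["get", "list", "watch", "patch"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["volumeattachments/status"]
    verbs: ["patch"]
  # It also reads PersistentVolumes and CSINodes to resolve attachments.
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "patch"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["csinodes"]
    verbs: ["get", "list", "watch"]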

    Command Line

    The command line flags are divided into two types: generic flags whose configuration is common to all controllers and is configured only once, and per-controller flags whose configuration differs per controller. Each per-controller flag gets a new unique name, prefixed with the controller name, as in the snippet below and the Go sketch that follows it.

            - name: csi-sidecars
              command:
                - csi-sidecars
                - "--csi-address=unix:/csi/csi.sock"
                # similar style as kube-controller-manager
                - "--controllers=attacher,provisioner,resizer,snapshotter"
                - "--feature-gates=Topology=true"
                # leader election flags for all the components as one
                - "--leader-election"
                - "--leader-election-namespace=kube-system"
                # global timeouts
                - "--timeout=30s"
                # per controller specific flags are prefixed with the component name
                - "--attacher-timeout=30s"
                - "--attacher-worker-thread=100"
                - "--provisioner-timeout=30s"

    example PR: kubernetes-csi/external-attacher#620
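
    A minimal Go sketch of the override semantics, assuming a per-controller flag such as --attacher-timeout falls back to the global --timeout when not explicitly set (hypothetical helper code, not the monorepo's actual implementation):

package main

import (
    "flag"
    "fmt"
    "time"
)

func main() {
    fs := flag.NewFlagSet("csi-sidecars", flag.ExitOnError)
    global := fs.Duration("timeout", 15*time.Second, "global timeout for all controllers")
    attacher := fs.Duration("attacher-timeout", 0, "timeout for the attacher; overrides --timeout")

    fs.Parse([]string{"--timeout=30s", "--attacher-timeout=60s"})

    // Record which flags were explicitly set so an unset per-controller flag
    // falls back to the global value.
    set := map[string]bool{}
    fs.Visit(func(f *flag.Flag) { set[f.Name] = true })

    effective := *global
    if set["attacher-timeout"] {
        effective = *attacher
    }
    fmt.Println("attacher timeout:", effective) // prints "attacher timeout: 1m0s"
}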

    Monorepo component

    poc version: https://github.com/mauriciopoppe/csi-sidecars

    monorepo attacher: https://github.com/mauriciopoppe/csi-sidecars/tree/main/pkg/attacher

    Development workflow

    (Figure: development workflow overview)

    After we see the monorepo component running fine in integration/e2e tests in k8s, we need to perform a hard cut so that new development goes into the monorepo component only.

    AIO MonoRepo state definition

    • Design: current state of the AIO MonoRepo
    • Alpha: all six sidecar repos have been integrated into the monorepo; all the e2e tests pass
    • Beta (production-verified): the six sidecars work through the CSI hostpath driver; three cloud vendors use it in their production environments
    • GA (released): officially released; ready to accept PRs from SIG Storage developers
    • Standalone: no longer needs to sync code from the individual repos; the AIO MonoRepo becomes the source of truth

    Individual repository state definition

    • Released: current state of the individual repos
    • FeatureFreeze:
      • New feature PRs are not allowed to be filed against the master branch or release-X branches (controlled by the individual repo maintainers, who categorize each PR and reject it if it's a feature)
      • SIG Storage developers file feature PRs against the AIO MonoRepo
      • Exception: serious bugfix or CVE fix PRs (only from individual repo maintainers), which can be merged in master and backported to the other release-X branches
    • Deprecated:
      • The repository is no longer maintained
      • Eventually the image for the individual repo goes away (although this isn't possible unless we migrate ALL the sidecars)
      • (future) Archive the repo, but not at the same time as deprecation; archiving is a terminal state, so we can't undo it

    (Figure: state change)

    Migration Process

    (Figure: migration process)

    Risks And Mitigations

    • Breaking changes in one component force the single release to be a breaking change

    • A vulnerability that affects one component affects all the other components

    See details in: https://docs.google.com/document/d/1SD4YRas_qXMP363L4j3WBTV_F9anq-5FM5gdGmJq7h0/edit?usp=sharing

    • A panic in one component restarts the entire sidecar

    For each sidecar, define where in the stack a panic should be caught to possibly restart the controller in-process (a sketch follows).

    List of fixed issues related to panics:

    • kubernetes-csi/external-provisioner#839
    • kubernetes-csi/external-provisioner#582
    • kubernetes-csi/external-attacher#502

    A panic caused by OOM doesn't fall into this category (there is perhaps no good way to reduce its blast radius).
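
    A minimal Go sketch of per-controller panic isolation, assuming each sidecar exposes a run function that can be supervised in-process (illustrative only, not the monorepo's actual code):

package main

import (
    "log"
    "time"
)

// runWithRecovery restarts a controller after a panic instead of letting the
// panic crash the whole csi-sidecars process.
func runWithRecovery(name string, run func()) {
    for {
        func() {
            defer func() {
                if r := recover(); r != nil {
                    log.Printf("controller %s panicked: %v; restarting", name, r)
                }
            }()
            run()
        }()
        time.Sleep(time.Second) // naive backoff before restarting
    }
}

func main() {
    go runWithRecovery("attacher", func() { panic("example panic") })
    time.Sleep(3 * time.Second)
}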

    • Keeping the monorepo and the existing sidecar repos up to date after the migration for X releases

    Milestones

    Milestone (completed):

    Develop a minimal proof of concept

    POC: https://github.com/mauriciopoppe/csi-sidecars-aio-poc

    Milestone-setup-a-repository-inside-kubernetes-csi

    Design phase

    Milestone-Build-the-project-using-a-modified-copy-of-release-tools

    Design phase

    Milestone-set-up-new-test-infra-jobs-to-test-the-project-through-the-hostpath-CSI-Driver

    Design phase

    Milestone-mirroring-of-nested-directories-to-repos-in-kubernetes-csi

    Design phase

    Milestone-definition-of-the-development-workflow

    Design phase

    Milestone-migration-of-CSI-Drivers-to-the-new-model

    Design phase

    Milestone-all-six-sidecar-repo-had-been-integrated-into-monorepo

    Alpha phase

    Milestone-be-ready-to-accept-PR-from-community

    Beta phase

    Milestone-six-sidecars-working-through-CSI-hostpath

    Beta phase

    Milestone-three-cloud-vendors-start-using-the-monorepo-component

    GA phase

    Milestone-all-individual-repo-has-been-into-deprecated-state

    Standalone phase

    Milestone-merge-sidecar-informer-caches

    Standalone phase

    Test Plan

    Prerequisite testing updates
    Unit tests
    Integration tests
    e2e tests

    Graduation Criteria

    Upgrade / Downgrade Strategy

    Version Skew Strategy

    Production Readiness Review Questionnaire

    Feature Enablement and Rollback

    How can this feature be enabled / disabled in a live cluster?

    It's not gated by a feature gate; it can be enabled by deploying the new version of the CSI Driver and disabled by deleting the new version and redeploying the old version.

    Does enabling the feature change any default behavior?

    This won't make any changes to the default behavior of Kubernetes.

    Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

    It's not a feature but an architectural change, so users can deploy the old version of the CSI Driver to disable it.

    What happens if we reenable the feature if it was previously rolled back?

    Nothing happens; it will behave as usual.

    Are there any tests for feature enablement/disablement?

    Yes. We will add unit tests with and without the feature gate enabled.

    Rollout, Upgrade and Rollback Planning

    How can a rollout or rollback fail? Can it impact already running workloads?
    What specific metrics should inform a rollback?
    Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
    Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

    Monitoring Requirements

    How can an operator determine if the feature is in use by workloads?
    How can someone using this feature know that it is working for their instance?
    • Events
      • Event Reason:
    • API .status
      • Condition name:
      • Other field:
    • Other (treat as last resort)
      • Details:
    What are the reasonable SLOs (Service Level Objectives) for the enhancement?
    What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
    • Metrics
      • Metric name: plugin_execution_duration_seconds{plugin="VolumeBinding",extension_point="Score"}
      • [Optional] Aggregation method:
      • Components exposing the metric:
    • Other (treat as last resort)
      • Details:
    Are there any missing metrics that would be useful to have to improve observability of this feature?

    Nothing in particular.

    Dependencies

    Does this feature depend on any specific services running in the cluster?

    No.

    Scalability

    Will enabling / using this feature result in any new API calls?
    Will enabling / using this feature result in introducing new API types?
    Will enabling / using this feature result in any new calls to the cloud provider?
    Will enabling / using this feature result in increasing size or count of the existing API objects?
    Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
    Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
    Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?

    Troubleshooting

    How does this feature react if the API server and/or etcd is unavailable?
    What are other known failure modes?
    What steps should be taken if SLOs are not being met to determine the problem?

    Implementation History

    Drawbacks

    Alternatives

    Infrastructure Needed (Optional)