improvement: Bump Salt to 3006.24 #4898

Draft
TeddyAndrieux wants to merge 44 commits into development/133.0 from improvement/bump-salt-3006

TeddyAndrieux (Collaborator) commented Apr 30, 2026

Salt 3006.24 upgrade

  • Bump Salt to 3006.24 (master image now uses the onedir-based RPM)
  • Drop the m2crypto package and the m2crypto Salt state — Salt 3006 ships cryptography-backed x509_v2, enabled via features: { x509_v2: true } on master and minion configs
  • Remove deprecated verbose parameter from x509 module calls
  • Remove no-longer-needed use_superseded on module.run states
  • Replace deprecated module.wait + watch pattern with module.run + onchanges
  • Drop the module.run Salt state where it has no remaining purpose

Salt master/minion compatibility during upgrade

  • Add minimum_auth_version: 2 on the master to accept v2-protocol minions during the rolling upgrade (to be removed in development/135)
  • Add reload_modules: False override on the salt-minion package install to skip the post-install module_refresh that fails when the running 3002 minion's files are replaced by the onedir layout
  • Add a dedicated salt-ssh ssh_pre_flight script (ssh-preflight.sh) that installs python3.12 and switches the python3 alternative on each target — Salt 3006 thin requires Python >= 3.7, RHEL/Rocky 8 ships 3.6
  • Switch the bootstrap script to install python3.12 and set the python3 alternative
  • Add dnf to the salt-master image so the yumpkg Salt module loads (its __virtual__ rejects microdnf-only environments and would otherwise disable pkg.* including pkg.version_cmp)

Certificate authority handling (upgrade path from old x509/m2crypto-generated certs)

  • Add a shared preserved_ski Jinja macro that pins subjectKeyIdentifier on x509.certificate_managed to the value of the existing CA cert (or hash on first install) — prevents x509_v2 from regenerating CAs with a new SKI that would invalidate every leaf cert's AKI, since cryptography and m2crypto compute the SKI differently for the same public key. Critical when upgrading clusters whose CAs were originally generated by the old x509 (m2crypto) module.
  • Apply the macro to all six CAs: kubernetes, etcd, front-proxy, dex, nginx-ingress, backup-server
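The selection logic described for `preserved_ski` can be sketched in Python as follows. This is only an illustration of the "reuse the existing SKI, otherwise fall back to `hash`" rule. The real macro is Jinja, and how it reads the SKI from the existing CA certificate is not shown in this description. The function name, its argument, and the colon-separated hex normalisation are assumptions made for the example.

```python
from typing import Optional


def preserved_ski(existing_ski: Optional[str]) -> str:
    """Pick the subjectKeyIdentifier value for a CA certificate state.

    Hypothetical helper: `existing_ski` stands for the SKI already present
    in the CA certificate on disk, or None on a first install.
    """
    if existing_ski:
        # Assumption: normalise to upper-case, colon-separated hex pairs so
        # the pinned value has one stable textual form.
        raw = existing_ski.replace(":", "").strip().upper()
        return ":".join(raw[i:i + 2] for i in range(0, len(raw), 2))
    # No certificate yet: let the x509 module derive the SKI itself.
    return "hash"
```

Pinning the value this way means a CA renewed under the new module keeps the identifier that existing leaf certificates already reference in their AKI.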

Salt orchestration fixes

  • Replace state.orchestrate_single with state.orchestrate + a dedicated patch_kubesystem_namespace.sls — the _single runner has long tripped a ReferenceError: weakly-referenced object no longer exists on the post-state event firing, but in 3006 it now propagates as a non-zero exit code (was previously swallowed silently)
  • Use a dedicated state to patch the kube-system namespace cluster-version annotation
  • Create the salt-master kubeconfig after the highstate completes, not during it (kube-apiserver isn't ready yet during the bootstrap highstate)
  • Fix invalid require_in in deploy_node orchestrate
  • Add missing dependency on deploy core component objects in the upgrade orchestrate

Python and OS toolchain

  • Drop Python 3.6 support across the codebase: remove python36-rpm, python36-pyOpenSSL, python36-psutil package dependencies
  • Drop the Rocky Linux grains workaround that was needed for older Salt versions
  • Drop the no-longer-needed RedHat 7 logic in Salt states
  • Bump the buildchain to require Python 3.10+
  • Bump lib-alert-tree to Python 3.10
  • Bump devcontainer Python to 3.10.13; drop Python 2.7 and Python 3.6 from the devcontainer
  • Bump Python dependencies for tests, docs, salt tests
  • Remove the virtualenv < 20.22.0 pin in tox.ini (was needed for Python 3.6)

Build / CI / lint

  • Fix doit >= 0.37 compatibility: doit 0.37 dropped cloudpickle and uses stdlib pickle, which rejects local closures — refactor on_failure and title_* helpers in buildchain/buildchain/iso.py and buildchain/buildchain/utils.py to module-level functions (using functools.partial where they capture an argument)
  • Fix the docs build: change git config --global to git config --system in docs/entrypoint.sh so the tempuser the build sudo's to inherits the safe.directory setting and git describe no longer fails (which had silently set release = None and broken Sphinx)
  • Move docs / tests / salt-tests / buildchain pip-compile invocations from tox.ini to pre-commit hooks
  • Bump black to 26.3.1 (and apply formatting)
  • Bump pylint to v4.0.5 with associated dependencies
  • Bump mypy to v1.20.2 with associated dependencies; add a mypy config file
  • Bump yamllint to v1.38.0
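The doit compatibility fix can be illustrated with a small, self-contained sketch. The helper names below are hypothetical and are not the actual functions in `buildchain/buildchain/iso.py` or `buildchain/buildchain/utils.py`. The sketch only shows why a local closure cannot go through stdlib `pickle` while a `functools.partial` over a module-level function can.

```python
import functools
import pickle


def make_title_closure(prefix):
    """Old pattern: return a local closure capturing `prefix`."""
    def title(task_name):
        return f"{prefix}: {task_name}"
    return title


def _title_with_prefix(prefix, task_name):
    """Module-level helper: the captured value becomes an explicit argument."""
    return f"{prefix}: {task_name}"


def make_title_partial(prefix):
    """New pattern: bind `prefix` with functools.partial instead of a closure."""
    return functools.partial(_title_with_prefix, prefix)


def is_picklable(obj):
    """Return True if stdlib pickle can serialize `obj`."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


if __name__ == "__main__":
    # When run as a script, the closure is rejected and the partial is accepted.
    print("closure picklable:", is_picklable(make_title_closure("ISO")))
    print("partial picklable:", is_picklable(make_title_partial("ISO")))
```

Both factories return callables with the same behavior, so call sites do not need to change. Only the way the argument is captured differs.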

Patch

Fixes: #3436 MK8S-251


TODO:

bert-e (Contributor) commented Apr 30, 2026

Hello teddyandrieux,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Available options
name description privileged authored
/after_pull_request Wait for the given pull request id to be merged before continuing with the current one.
/bypass_author_approval Bypass the pull request author's approval
/bypass_build_status Bypass the build and test status
/bypass_commit_size Bypass the check on the size of the changeset TBA
/bypass_incompatible_branch Bypass the check on the source branch prefix
/bypass_jira_check Bypass the Jira issue check
/bypass_peer_approval Bypass the pull request peers' approval
/bypass_leader_approval Bypass the pull request leaders' approval
/approve Instruct Bert-E that the author has approved the pull request. ✍️
/create_pull_requests Allow the creation of integration pull requests.
/create_integration_branches Allow the creation of integration branches.
/no_octopus Prevent Wall-E from doing any octopus merge and use multiple consecutive merge instead
/unanimity Change review acceptance criteria from one reviewer at least to all reviewers
/wait Instruct Bert-E not to run until further notice.
Available commands
name description privileged
/help Print Bert-E's manual in the pull request.
/status Print Bert-E's current status in the pull request TBA
/clear Remove all comments from Bert-E from the history TBA
/retry Re-start a fresh build TBA
/build Re-start a fresh build TBA
/force_reset Delete integration branches & pull requests, and restart merge process from the beginning.
/reset Try to remove integration branches unless there are commits on them which do not appear on the source branch.

Status report is not available.

bert-e (Contributor) commented Apr 30, 2026

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

Peer approvals must include at least 1 approval from the following list:

bert-e (Contributor) commented May 4, 2026

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

Peer approvals must include at least 1 approval from the following list:

TeddyAndrieux force-pushed the improvement/bump-salt-3006 branch from b8b4c73 to cc0d369 on May 5, 2026 09:39

bert-e (Contributor) commented May 6, 2026

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

Peer approvals must include at least 1 approval from the following list:

TeddyAndrieux force-pushed the improvement/bump-salt-3006 branch from 0ae2995 to 0cc092e on May 7, 2026 08:34

bert-e (Contributor) commented May 7, 2026

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

Peer approvals must include at least 1 approval from the following list:

TeddyAndrieux force-pushed the improvement/bump-salt-3006 branch from 0cc092e to d401278 on May 7, 2026 13:39
…changes`

Both have the same effect, but `module.wait` + `watch` does not seem to
work well with newer Salt versions.

During the bootstrap process, we skip the creation of
the salt-master kubeconfig during the highstate, since
at this stage the kube-apiserver is not ready yet.

During upgrade and downgrade, we patch the kube-system namespace
annotation with the new cluster version.

Previously we were using `state.orchestrate_single`, but it
always returns an error about
`weakly-referenced object no longer exists`. To avoid this,
we switch to `state.orchestrate` instead.

Notable changes:
- Switch from the Scality SaltStack repository to the official SaltStack repository
- Use classic `six` instead of the removed `salt.ext.six`
- We no longer use `ipaddress` from `salt._compat`: since Salt 3003 it
  does not behave the same as `ipaddress` from Python 3, and we want
  to keep the Python 3 behavior
- Mount the Salt cache directory in the salt-api container
- Rework the salt-master image to install Python dependencies in
  the Salt onedir
- Salt `random.get_str` now adds punctuation by default (we disable it)
- Enable the x509_v2 feature in salt-master and salt-minion
  (otherwise we would need to install m2crypto in the Salt onedir Python)
  NOTE: In some places we have to add `if` blocks to support older
  Salt versions during upgrade
  NOTE: We also have to retrieve the old SKI to avoid CA changes
  when upgrading
- Add the `-f` flag to the pgrep command in the salt-master manifest:
  with the new Salt Python onedir the command is too long
  and `pgrep salt-master` does not work without it
- Starting with Salt 3006, salt-api is disabled by default; we enable it
- Switch to python39 during expansion since salt-ssh does not support
  python <= 3.7; we also move to an ssh_preflight script
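
As an illustration of the `random.get_str` item above, the sketch below generates a random string with punctuation left out unless explicitly requested. It uses only the Python standard library and is not Salt's implementation. The function name, default length, and parameter name are assumptions chosen for the example.

```python
import secrets
import string


def random_str(length=20, punctuation=False):
    """Return a random string of letters and digits.

    Punctuation is only included when explicitly requested, mirroring the
    "disable punctuation" choice described in the commit message.
    """
    alphabet = string.ascii_letters + string.digits
    if punctuation:
        alphabet += string.punctuation
    # secrets.choice draws from a cryptographically strong source.
    return "".join(secrets.choice(alphabet) for _ in range(length))
```

Keeping punctuation out avoids generated values that would need quoting or escaping when they end up in YAML, shell commands, or URLs.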

It's the default behavior since Salt 3005.

This rpm is no longer needed with the new Salt version
relying on onedir.

This rpm is no longer needed with the new Salt version
relying on onedir.

This rpm is no longer needed with the new Salt version
relying on onedir.

This workaround is no longer needed with the new Salt version,
which supports Rocky Linux by default.

We also fix some new pylint warnings, and regenerate
fresh pylintrc files.

We also fix some new mypy warnings, and add
a mypy configuration file.
…commit

We also remove the platform-requirements.txt file, as it is no longer needed.

We also fix tests and tox configuration to work
with newer dependencies.

Plus we force color in CI runs.

We also remove the deprecated assertDictContainsSubset call.

This one was needed to build the containerd rpm, but we
no longer build it ourselves.

salt-ssh's `SSH.__init__` calls `_expand_target`, which when given a
single non-glob target that resolves to a reachable host calls
`_get_roster` -> `salt.roster.get_roster_file`. That helper requires the
configured roster file (default `/etc/salt/roster`) to exist on disk
and be readable, even when a non-flat backend like `kubernetes` is
selected via `roster: kubernetes` - the file is only stat'd, never
read.

When no roster file is present, `salt-ssh --roster=kubernetes
<single-host> ...` fails with `OSError: Roster file "/etc/salt/roster"
not found`. Most invocations don't trip this because they use a glob
(`salt-ssh '*' ...`) or target an unreachable host - both bypass the
expansion path that hits `_get_roster`.

Drop a zero-byte `/etc/salt/roster` into the salt config so the check
passes regardless of how operators invoke salt-ssh against the cluster.
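
The placeholder approach can be sketched in Python as below. This is an illustrative helper, not the mechanism the change itself uses to ship the file. The function name is hypothetical, and the path is passed in by the caller so the sketch can be tried against a temporary directory.

```python
from pathlib import Path


def ensure_roster_placeholder(roster_path):
    """Create an empty roster file if none exists, and return its path.

    An existing file is left untouched: touch() only updates its
    modification time and never truncates the content.
    """
    path = Path(roster_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.touch(exist_ok=True)
    return path
```

Calling it with `/etc/salt/roster` would satisfy the existence check described above while leaving the `kubernetes` roster backend as the actual source of targets.
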
NOTE: Due to a bug in salt 3006.24, we need to patch the salt code to
fix salt-ssh compatibility with Python 3.12.

TeddyAndrieux force-pushed the improvement/bump-salt-3006 branch from d401278 to 486d1a9 on May 7, 2026 15:26

Development

Successfully merging this pull request may close these issues.

Upgrade Salt to 3003.1