Add Kubernetes mitigation manifest #29
Open
ClemDNL wants to merge 1 commit into
Adds a self-contained DaemonSet manifest under k8s/ that applies the
mitigation from the README (modprobe blacklist of esp4/esp6/rxrpc +
page-cache flush) to every Linux node in a Kubernetes cluster, and
re-applies it automatically on any new node that joins the cluster
(autoscaling, node-image upgrade, scale-set rolling update).
- k8s/dirtyfrag-mitigation.yaml — single-file manifest applyable with
kubectl apply -f. Uses an init container that nsenter's into PID 1
to write /etc/modprobe.d/disable-dirtyfrag.conf, modprobe -r each
module that has refcnt=0, and echo 3 > /proc/sys/vm/drop_caches.
For any module that remains loaded with refcnt > 0, emits a single
aggregated Warning Kubernetes Event on the Node (no auto-cordon).
A long-running pause container keeps the pod Running so the init
container is only re-executed on pod recreation.
- k8s/README.md — apply / verify / revert instructions and
compatibility notes (esp4/esp6 = IPsec, rxrpc = AFS).
- README.md — short Kubernetes section in Mitigation pointing to k8s/.
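The host-side logic described above (render the blacklist, unload only modules with refcnt=0, report the rest) can be sketched roughly as follows. This is a minimal illustration, not the script from the manifest: SYSFS_ROOT, DRY_RUN and the function names are hypothetical, and the real drop-in may contain different directives.

```shell
#!/bin/sh
# Hypothetical sketch of the init container's module handling.
# SYSFS_ROOT is overridable so the logic can be exercised against a fake tree.
MODULES="${MODULES:-esp4 esp6 rxrpc}"
SYSFS_ROOT="${SYSFS_ROOT:-/sys/module}"

# Print one "blacklist <module>" line per module, i.e. the assumed content of
# /etc/modprobe.d/disable-dirtyfrag.conf.
render_blacklist() {
    for m in $MODULES; do
        printf 'blacklist %s\n' "$m"
    done
}

# Print "absent" (no refcnt file: not loaded, or built in),
# "idle" (loaded, refcnt 0) or "in-use" (refcnt > 0).
module_state() {
    refcnt_file="$SYSFS_ROOT/$1/refcnt"
    if [ ! -r "$refcnt_file" ]; then
        echo absent
    elif [ "$(cat "$refcnt_file")" -eq 0 ]; then
        echo idle
    else
        echo in-use
    fi
}

# Unload idle modules and print the space-separated list of modules that are
# still in use. With DRY_RUN=1 the modprobe call is only printed to stderr.
unload_idle_modules() {
    in_use=""
    for m in $MODULES; do
        case "$(module_state "$m")" in
            idle)
                if [ "${DRY_RUN:-0}" = 1 ]; then
                    echo "would run: modprobe -r $m" >&2
                else
                    modprobe -r "$m" || in_use="$in_use $m"
                fi
                ;;
            in-use)
                in_use="$in_use $m"
                ;;
        esac
    done
    echo "${in_use# }"
}
```

A non-empty result from unload_idle_modules corresponds to the case where the manifest emits the Warning Event on the Node.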
Tested on AKS (Azure) running Kubernetes 1.30, on both staging and
production clusters.
02ca786 to 44af5b1
if [ "${CLEANUP_MODE}" = "true" ]; then
  echo "[dirtyfrag] CLEANUP mode on node ${NODE_NAME}: removing mitigation"
  nsenter -t 1 -m -u -i -n -p -- sh -c "rm -f ${MODPROBE_FILE}; depmod -a 2>/dev/null || true; for m in ${MODULES}; do modprobe -r \$m 2>/dev/null || true; done; true"
What is the reason for entering all host namespaces here?
Writing configuration for modprobe could be achieved via a hostPath mount of /etc/modprobe.d, and module management can be achieved by granting the container the SYS_MODULE capability.
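For illustration, the file-writing half of that suggestion might look like the sketch below. It assumes the host's /etc/modprobe.d is mounted into the container at a hypothetical path (HOST_MODPROBE_DIR) and that the drop-in consists of plain blacklist lines; neither detail comes from the manifest under review.

```shell
#!/bin/sh
# Hypothetical variant without nsenter: the drop-in is written through a
# hostPath mount instead of from inside the host's mount namespace.
HOST_MODPROBE_DIR="${HOST_MODPROBE_DIR:-/host/etc/modprobe.d}"
MODULES="${MODULES:-esp4 esp6 rxrpc}"
CONF_NAME="disable-dirtyfrag.conf"

# Build the file next to its final location, then rename it into place so a
# concurrent modprobe never reads a half-written drop-in.
write_dropin() {
    tmp_file="$HOST_MODPROBE_DIR/.$CONF_NAME.tmp"
    : > "$tmp_file" || return 1
    for m in $MODULES; do
        printf 'blacklist %s\n' "$m" >> "$tmp_file"
    done
    mv "$tmp_file" "$HOST_MODPROBE_DIR/$CONF_NAME"
}

# Counterpart for CLEANUP_MODE: remove the drop-in again.
remove_dropin() {
    rm -f "$HOST_MODPROBE_DIR/$CONF_NAME"
}
```

Unloading modules with only SYS_MODULE should work the same way from inside the container. The page-cache flush is a separate question: /proc/sys is typically mounted read-only in a non-privileged container, so writing to /proc/sys/vm/drop_caches may still need additional access.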
Summary
Adds a self-contained Kubernetes DaemonSet manifest under k8s/ that applies the mitigation from the README to every Linux node in a cluster, and re-applies it automatically on any new node that joins (autoscaling, node-image upgrade, scale-set rolling update).
This is helpful because the disclosure's recommended mitigation must run on every node, and on managed Kubernetes platforms (AKS / EKS / GKE) operators typically don't have direct shell access to nodes. A DaemonSet is the cleanest way to roll a host-level fix across the fleet.
Files
- k8s/dirtyfrag-mitigation.yaml: single-file manifest (ServiceAccount + ClusterRole + ClusterRoleBinding + DaemonSet) that can be applied in one shot with kubectl apply -f. Uses a privileged init container that uses nsenter to enter PID 1's namespaces to:
  - Write /etc/modprobe.d/disable-dirtyfrag.conf blacklisting esp4, esp6, rxrpc.
  - For each module with refcnt=0, run modprobe -r to unload it from the live kernel.
  - Run sync; echo 3 > /proc/sys/vm/drop_caches (per the README's mitigation guidance; gated on DROP_CACHES, default true).
  - For any module that remains loaded with refcnt > 0, emit a single aggregated Warning Kubernetes Event (reason=DirtyFragModulesInUse) on the affected Node listing the in-use modules so operators can drain & reboot. No auto-cordon: operators decide when to recycle nodes.
  A long-running pause container keeps the pod in Running so the init container is only re-executed on pod recreation (i.e. on each new node).
- k8s/README.md: apply / verify / revert instructions, plus the compatibility note that esp4/esp6 = IPsec ESP and rxrpc = AFS RxRPC sockets, and pointers for clusters that need to keep one of these modules.
- README.md: adds a short Kubernetes subsection inside Mitigation pointing to k8s/.
Apply
Revert (when upstream patches roll out)
The modprobe drop-in persists for the lifetime of each node, so a cleanup pass is provided:
kubectl -n kube-system set env ds/dirtyfrag-mitigation CLEANUP_MODE=true
kubectl -n kube-system rollout restart ds/dirtyfrag-mitigation
kubectl -n kube-system rollout status ds/dirtyfrag-mitigation
kubectl delete -f https://raw.githubusercontent.com/V4bel/dirtyfrag/master/k8s/dirtyfrag-mitigation.yaml
Validation
kubectl apply --dry-run=server -f k8s/dirtyfrag-mitigation.yaml against a real cluster: all four resources (ServiceAccount, ClusterRole, ClusterRoleBinding, DaemonSet) accepted.
Compatibility note for reviewers
esp4/esp6 provide IPsec ESP transforms; rxrpc provides the RxRPC socket family. None of these is required by a typical workload-only Kubernetes cluster, but operators with node-level IPsec or AFS clients should edit the MODULES env var (or label-exclude the affected node pool) before applying. This is documented in k8s/README.md.
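As a small illustration of trimming the list, the helper below drops one module from a whitespace-separated MODULES value, which is the format implied by the for m in ${MODULES} loop in the script. The function name and the default list order are assumptions.

```shell
#!/bin/sh
# without_module LIST NAME: print LIST with every occurrence of NAME removed.
# Hypothetical helper; LIST is whitespace-separated, like MODULES.
without_module() {
    result=""
    for m in $1; do
        [ "$m" = "$2" ] || result="$result $m"
    done
    echo "${result# }"
}

# Example: a cluster with AFS clients keeps rxrpc and blocks only the ESP modules.
# without_module "esp4 esp6 rxrpc" rxrpc
```

The resulting value would then go into the MODULES env var in the manifest before running kubectl apply.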