-
Notifications
You must be signed in to change notification settings - Fork 12
Add secret watch and automatic pod restart for PTP security key rotation. #123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
b958e6d to
1954555
Compare
edcdavid
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First batch of comment, location of changes is correct but need to improve logic to support one secret and sa_file per profile
| "net/url" | ||
| "os" | ||
| "path/filepath" | ||
| "sort" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revert these changes
controllers/ptpconfig_controller.go
Outdated
|
|
||
| // List all PtpConfigs to find which ones reference this secret | ||
| ptpConfigList := &ptpv1.PtpConfigList{} | ||
| if err := r.List(ctx, ptpConfigList); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
search only in the openshift-ptp namespace (names.namespace)
controllers/ptpconfig_controller.go
Outdated
| if profile.PtpSecretName != nil && *profile.PtpSecretName != "" { | ||
| secretName := *profile.PtpSecretName | ||
| glog.Infof("Found PtpSecretName in profile %s: %s", profileName, secretName) | ||
| if chosenSecretName == "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code picks the fist secret name from the list of secret names across the all ptpconfig->profiles in the openshift ptp namespace. Instead, the code should get a pair of {secret name, sa_file} for each ptpconfig->profile. Each profile of each ptp config can configure a different sa file and secret.
controllers/ptpconfig_controller.go
Outdated
| chosenSecretKeys = map[string]struct{}{k: {}} | ||
| glog.Infof("Found PTP security secret '%s' with key '%s'", chosenSecretName, k) | ||
| } | ||
| } else if len(sec.Data) > 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if multiple keys exist in the secret, print a warning that the additional keys will be ignored, no need to process them.
Also rename "chosenSecretKeys" to secretFileContent
controllers/ptpconfig_controller.go
Outdated
|
|
||
| // extractSaFileFromPtp4lConf scans the [global] section for a line starting with 'sa_file' | ||
| // and returns the remainder of the line as the path. | ||
| func extractSaFileFromPtp4lConf(conf string) (string, bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is redundant, use populatePtp4lConf to parse the ptp4l config instead
controllers/ptpconfig_controller.go
Outdated
| } | ||
|
|
||
| // Extract filename from the full path - this will be used as subPath | ||
| filename := path.Base(fullPath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The subpath is just the filename in sa_path
pkg/apply/merge.go
Outdated
| } | ||
|
|
||
| // MergeDaemonSetForUpdate preserves ptp-security-volume added by ptpconfig_controller. | ||
| // This volume is dynamically managed based on PtpConfig CRs and should not be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here multiple ptp-security volumes are added to the daemonset template, not just one
| return ctrl.NewControllerManagedBy(mgr). | ||
| For(&ptpv1.PtpConfig{}). | ||
| Watches( | ||
| &corev1.Secret{}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
watch only the openshift-ptp namespace
pkg/apply/merge.go
Outdated
| // MergeDaemonSetForUpdate preserves ptp-security-volume added by ptpconfig_controller. | ||
| // This volume is dynamically managed based on PtpConfig CRs and should not be | ||
| // overwritten when PtpOperatorConfig reconciles the DaemonSet template. | ||
| func MergeDaemonSetForUpdate(current, updated *uns.Unstructured) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function should simply merge the 2 daemonset templates, specifically the volumes.
controllers/ptpconfig_controller.go
Outdated
| injectPtpSecurityVolume(daemonSet, chosenSecretName, chosenSecretKey, chosenSecretKeys, saFilePaths, secretHash) | ||
|
|
||
| // 7. Update the DaemonSet | ||
| if err := r.Update(ctx, daemonSet); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of updating the daemonset directly, use the merge function MergeObjectForUpdate which is called by apply.ApplyObject as in func (r *PtpOperatorConfigReconciler) syncLinuxptpDaemon(ctx context.Context, defaultCfg *ptpv1.PtpOperatorConfig, nodeList *corev1.NodeList) error {:
link
After your change, that function is merging its updated template content with the current template. Now this function needs to do the same. This way both updates are preserved.
This function should first remove old security volumes (because of changing the sa_file name or secret name), then add the new ones, otherwise there will be leftovers.
controllers/ptpconfig_controller.go
Outdated
| //+kubebuilder:rbac:groups=ptp.openshift.io,resources=ptpconfigs/status,verbs=get;update;patch | ||
| //+kubebuilder:rbac:groups=ptp.openshift.io,resources=ptpconfigs/finalizers,verbs=update | ||
| //+kubebuilder:rbac:groups=config.openshift.io,resources=infrastructures,verbs=get;list;watch | ||
| //+kubebuilder:rbac:groups="",resources=secrets,verbs=get;list;watch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Warning: this adds secrets list/get/watch to the controller's ClusterRole that grants the controller access to all the secrets in any namespace of the cluster. I think maybe we should create a separate role and rolebinding in config/rbac so that the controller can only be granted access to secrets in the operator's namespace.
Makefile
Outdated
| ## Tool Versions | ||
| KUSTOMIZE_VERSION ?= v4.5.7 | ||
| CONTROLLER_TOOLS_VERSION ?= v0.15.0 | ||
| CONTROLLER_TOOLS_VERSION ?= latest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might work for you in your arm64 VM, but I'm not sure this is a safe move, as a new version might break the controller build process due to dependencies mismatch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reverted to original version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we can test it with v0.19.0. This is the latest and seems to work fine with arm64
ea8d467 to
bb504e8
Compare
edcdavid
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some more comments!
controllers/ptpconfig_controller.go
Outdated
| chosenSecretKeys[k] = struct{}{} | ||
| } | ||
|
|
||
| // 2. Deduplicate mounts (multiple profiles might use same secret and sa_file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this section since the validation webhook should not allow this to happen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I removed this code, and left the validation of the ptpconfig in the webhook. in case there is a collision once updating/creating a ptpconfig it will log an error.
| } | ||
|
|
||
| // Determine which security volumes to use | ||
| var securityVolumesToUse []interface{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic seems too complicated for the merge, try the following
- remove all the security volumes from the original (before update) template
- calculates the updated volumes/annotations/mounts for all ptpconfigs in the updated template
- insert the new volumes/annotations/mounts list in the original linuxptp-daemon template
This way will support adding and deleting any sa_file or secrets and will keep non security changes from original template.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
controllers/ptpconfig_controller.go
Outdated
| Scheme *runtime.Scheme | ||
| Log logr.Logger | ||
| Scheme *runtime.Scheme | ||
| APIReader client.Reader |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no need for this APIReader field. The embedded interface client.Client already has the Reader already.
controllers/ptpconfig_controller.go
Outdated
|
|
||
| // Load secret and compute hash | ||
| sec := &corev1.Secret{} | ||
| if err := r.APIReader.Get(ctx, types.NamespacedName{Namespace: names.Namespace, Name: secretName}, sec); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use r.Get(...) directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I got rid of APIReader
fa26d30 to
eb90b39
Compare
Add validateSecretExists() to verify ptpSecretName references exist Add validateSppInSecret() to validate SPP numbers against secret data Implement context-based merge to distinguish controller sources PtpConfig controller: use security volumes from updated (supports deletion) PtpOperatorConfig controller: preserve security volumes from current Update merge logic for volumes, annotations, and volume mounts Add controller source constants to apply package Update all tests to pass context parameter Add PTP authentication testing support with conditional enablement Add configure-switch-ptp-security.sh script to configure external switch Add ptp-security.yaml template with security associations (SPP 0 and 1) Add GetPtp4lConfigWithAuth() to conditionally inject auth settings Add PtpSecretName field conditionally based on PTP_AUTH_ENABLED Add negative test for SPP mismatch validation (2-minute check) Update run-ci-github.sh to configure switch and run auth-enabled tests Apply auth settings to all config creation functions (GM, slave, BC, etc) Add PTP authentication security tests for attack scenarios Add rogue client injection test: verifies unauthenticated clients are blocked Add MITM protection test: verifies tampered messages with wrong keys are dropped Add replay attack test: verifies seqid_window is configured for anti-replay Implement robust cleanup with delete and recreate for test-slave1 Add pod stabilization waits between tests to prevent race conditions Include optional log validation for authentication failure messages
eb90b39 to
8b14b1d
Compare
edcdavid
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some new comments
ptp-tools/Makefile
Outdated
| $(KUSTOMIZE) build ../config/default | kubectl apply -f - | ||
| $(KUSTOMIZE) build ../config/custom | kubectl apply -f - | ||
| kubectl patch ptpoperatorconfig default -nopenshift-ptp --type=merge --patch '{"spec": {"ptpEventConfig": {"enableEventPublisher": true, "transportHost": "http://ptp-event-publisher-service-NODE_NAME.openshift-ptp.svc.cluster.local:9043", "storageType": "local-sc"}, "daemonNodeSelector": {"node-role.kubernetes.io/worker": ""}}}' | ||
| rebuild-ptpop: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to add these changes, these are already covered by make podman-build-ptpop and make-podman-push-ptpop
| @@ -0,0 +1,174 @@ | |||
| #!/bin/bash | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This command is too complex for a test. I would follow a similar pattern as for the original switch configuration but with a new configuration including authentication(ptpswitchconfig_auth.cfg and the sa_file maybe derived from the secret ptp-security.yaml):
podman cp ptpswitchconfig_auth.cfg switch1:/etc/ptp4l.conf
$(podman exec switch1 systemctl enable --now ptp4l) || {
status=$?
echo "❌ command failed with code $status"
podman exec switch1 systemctl start ptp4l || true
podman exec switch1 systemctl status ptp4l
podman exec switch1 journalctl -u ptp4l
exit $status
}
In real operation, the switch config will be likely duplicated and not copied from the cluster secret anyways, so not to far from e2e workflow.
| PTP_TEST_MODE=dualfollower ginkgo --skip=".*The interfaces supporting ptp can be discovered correctly.*" --skip="Negative - run pmc in a new unprivileged pod on the slave node.*" -v --keep-going --output-dir=$JUNIT_OUTPUT_DIR --junit-report=$JUNIT_OUTPUT_FILE -v "$SUITE"/serial | ||
|
|
||
| # Configure switch1 for authentication testing | ||
| kubectl apply -f ptp-security.yaml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
optional: add all code responsible for enabling authentication in the switch in a function called enable_switch_auth
scripts/run-ci-github.sh
Outdated
| export CNF_TESTS_IMAGE=test:lptpd | ||
|
|
||
| # Configure switch1 for authentication testing | ||
| podman cp ptpswitchconfig.cfg switch1:/etc/ptp4l.conf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
optional: add all code responsible for disabling authentication in the switch in a function called disable_switch_auth
api/v1/ptpconfig_webhook.go
Outdated
| validSpps, err := parseValidSppsFromSecret(secret) | ||
| if err != nil { | ||
| ptpconfiglog.Error(err, "failed to parse SPPs from secret", "secret", secret.Name, "profile", profileName) | ||
| // Fail open - don't block if we can't parse |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can't parse the secret to extract the spp, we should fail in my opinion. The goal of this function is to validate the spp, if no spp is found, we should not succeed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if no spp is found in the ptpconfig, then the configuration of the ptpconfig should fail?
api/v1/ptpconfig_webhook.go
Outdated
| // parseValidSppsFromSecret extracts all valid SPP numbers from a PTP security secret | ||
| // It looks for any line starting with "spp <number>" regardless of structure or sections | ||
| func parseValidSppsFromSecret(secret *corev1.Secret) ([]string, error) { | ||
| var validSpps []string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The format of the secret seems to be the same as the format of the ptp4l config file. Shouldn't we reuse this function to parse it : populatePtp4lConf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PopulatePtp4lConf parses the ptp4lconf in a ptpconfig and returns a map of sections which points to a map of options inside a specific section..
for instance ptp4lconf has a section global which maps every line to a key, value.
in our test case, the secret has the same section twice [security_association] which includes the spp values, so if we use PopulatePtp4lConf it will duplicate the key.. only the first section will be considered, hence only one spp value will be checked ! (the first one)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah I see. We can keep the current code then.
| return uns.SetNestedSlice(updated.Object, mergedVolumes, "spec", "template", "spec", "volumes") | ||
| } | ||
|
|
||
| // mergeSecurityAnnotations implements simplified merge logic for annotations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update the comments to make sure they are descriptive of the logic in the function (not just this comment). Explain that annotations are set to detect changes in the secret content.
| // - PtpConfig controller: use all volumes/annotations/mounts from updated (even if empty) | ||
| // - PtpOperatorConfig controller: use base from updated + preserve security from current | ||
| // This supports adding/deleting sa_file or secrets while preserving non-security changes. | ||
| func MergeDaemonSetForUpdate(ctx context.Context, current, updated *uns.Unstructured) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The following functions are doing mostly the same thing but slightly differently which make them nt good candidate to share more code using template or similar. However , we should at least split them into logical pieces to help with understanding.
mergeSecurityVolumes
mergeSecurityAnnotations
mergeSecurityVolumeMounts
This is for volumes for instance, similar workflow for annotations and volumen mount. I would created helper function to abstract each logical step:
get volumes from current
get volumes from updated
if ptpoperatorconfig -> keep security changes
if ptpconfig -> ovverwrite security changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will push a new commit soon with more helper functions for each piece of logic
| metadata: | ||
| annotations: | ||
| controller-gen.kubebuilder.io/version: v0.15.0 | ||
| controller-gen.kubebuilder.io/version: v0.19.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a couple of problems to build your project because you used controller-gen v0.19.0.
I also think it might be the reason for discrepancies on make manifests because the new version seems to "optimize" the roles by combining several API versions into one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted to the previous version.
so that issue should be fixed
ccea7ba to
65cd51c
Compare
greyerof
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the new functions validateSecretConflicts() and validateSppInSecret() are iterating throught the ptpconfig profiles and using the populatePtp4lConf to get the ptp config to validate things. Why not doing the secret and spp param validation inside the r.validate(), as it's already iteraring through profiles and populating the ptpconf that you need to review?
api/v1/ptpconfig_webhook.go
Outdated
| } | ||
|
|
||
| // contains checks if a string slice contains a specific string | ||
| func contains(slice []string, item string) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need this func, just use golang's standard slices' package func "slices.Contains()" as in line 192.
api/v1/ptpconfig_webhook.go
Outdated
| return nil | ||
| } | ||
|
|
||
| for _, profile := range r.Spec.Profile { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original r.validate() func that is called from ValidateCreate() and ValidateUpdate() already iterates over the Profiles. Isn't it possible to check the PtpSecretName there?
|
|
||
| // Parse ptp4lConf to get spp value from [global] section | ||
| conf := &ptp4lConf{} | ||
| if err := conf.populatePtp4lConf(profile.Ptp4lConf, profile.Ptp4lOpts); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The r.validate() func is already populating the ptp4lConf when it iterates over the profiles. Isn't it possible to reuse that one?
Add helper functions for security item detection (isSecurityItem, isSecurityAnnotation) Extract filter/extract operations into reusable functions for volumes, annotations, and mounts Improve SPP validation to fail-closed (reject on parse errors instead of allowing) Refactor parseValidSppsFromSecret to reuse ptp4lConf parser for consistency Add detailed comments explaining secret hash change detection mechanism Downgrade controller-gen from v0.19.0 to v0.15.0 for compatibility Add kubebuilder marker to skip auto-generation for Ptp4lConf type Regenerate CRDs and RBAC with ptpSecretName field support
65cd51c to
481eed1
Compare
Export ptp4lConfSection to Ptp4lConfSection (public) Remove wrapper pattern: Ptp4lConf now directly contains sections field Update all references from p.conf.X to p.X (direct access) Change populatePtp4lConf receiver from private to public type Remove manual DeepCopy implementations (now auto-generated) Regenerate DeepCopy methods for public Ptp4lConf and Ptp4lConfSection
- Move Secret and DaemonSet RBAC from ClusterRole to namespace-scoped Roles - Add secrets_role.yaml and daemonsets_role.yaml with RoleBindings - Configure manager cache to watch Secrets only in openshift-ptp namespace - Remove cluster-scoped kubebuilder RBAC markers from controller - Update CSV manifests to reflect new RBAC structure This fixes RBAC errors where the controller attempted cluster-wide Secret watching but only had namespace-scoped permissions.
| secretName, profileName) | ||
| } | ||
| // For other errors (like permission issues), log but don't block | ||
| ptpconfiglog.Error(err, "failed to verify secret existence", "secret", secretName, "profile", profileName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need to return nil here? Shouldn't this be an error too?
Added a code to the ptpconfig_controller, which watches the ptpconfig CRs and checks for the field Ptp4lConf under PtpProfile, it parses this field and checks if there it includes sa_file and extracts it. The operator automatically mounts PTP security secrets at the path specified in the
sa_filedirective, ensuring ptp4l can access them without manual configuration.Added a condition checking for a change in the sa_file path,. Changes to
sa_filepath trigger pod restart to apply new mounts and update the linuxptp template with new mount configuration.Added a field in ptpconfig_types.go to the PtpProfile Struct “PtpSecretName”, so the user can add the secret name to the ptpconfig and then the operator will watch this field to add it to the secret volume, and then create the subPath in the volumeMounts out of that secret.
Added a function the merge.go, “MergeDaemonSetForUpdate” which preserves the ptp-security-volume added to the ptpconfig_controller, this volume is dynamically managed based on the PtpConfig CRs and should not be overwritten when PtpOperatorConfig reconciles the Daemonset template.