Skip to content

Adding validating admission webhooks for NIMService/NIMCache with Helm deployment updates#581

Merged
shivamerla merged 1 commit intoNVIDIA:mainfrom
aryangorwade:validating-webhook-nimservice-nimcache-helm
Aug 1, 2025
Merged

Adding validating admission webhooks for NIMService/NIMCache with Helm deployment updates#581
shivamerla merged 1 commit intoNVIDIA:mainfrom
aryangorwade:validating-webhook-nimservice-nimcache-helm

Conversation

@aryangorwade
Copy link
Copy Markdown
Collaborator

Adds validating admission webhooks (ValidateCreate and ValidateUpdate) with cert-manager TLS integration for both NIMService and NIMCache CRDs and wires them into the Helm chart so they are deployed automatically with the operator.

Webhook and Helm deployment follows the style of https://github.com/Mellanox/network-operator.

Specification document: https://docs.google.com/document/d/11pir7oqXmDNUB_BrfCnbj7wa8VNbsos43a-jif01C98/edit?usp=sharing

Key Changes

  1. Webhook Implementations (each file has a helper file)

    • internal/webhook/apps/v1alpha1/nimservice_webhook.go
    • internal/webhook/apps/v1alpha1/nimcache_webhook.go
      Each file defines:
    • Detailed validation logic in ValidateCreate and ValidateUpdate.
    • A no-op ValidateDelete stub (required by admission.CustomValidator, but never invoked because the webhook is registered for create and update only).
  2. Controller Wiring

    • SetupNIMServiceWebhookWithManager and SetupNIMCacheWebhookWithManager register the validators with the controller-runtime manager.
    • main.go now invokes both Setup*WebhookWithManager helpers so the webhooks start with the operator.
  3. Generated Manifests

    • Running make manifests now produces the corresponding ValidatingWebhookConfiguration YAML under config/webhook/.
    • Each rule includes verbs=create;update; deletion is intentionally excluded.
    • Paths follow the required pattern (e.g., /validate-apps-nvidia-com-v1alpha1-nimservice).
  4. Helm Chart Updates (TLS + Webhook Packaging)

  • Align the NIM Operator chart (deployments/helm/k8s-nim-operator/) with the Network Operator’s admission-controller pattern.
  • New template: templates/admission_controller.yaml
    • Generates the webhook Service, ValidatingWebhookConfiguration, and TLS assets.
    • Supports two modes, driven by values:
      • useCertManager: true – create Issuer and Certificate resources for self-signed certs.
      • useCertManager: false – load a user-supplied Secret containing caCrt, tlsCrt, tlsKey.
    • Entire block gated by operator.admissionController.enabled.
  • values.yaml additions
    operator:
      admissionController:
        enabled: true          # enable webhooks by default
        useCertManager: true   # toggle between cert-manager or custom certs
        # certificate:
        #   caCrt: |-
        #   tlsCrt: |-
        #   tlsKey: |-
  • deployment.yaml tweaks
    • Conditionally expose webhook port 9443.
    • Mount the webhook-server-cert secret at /tmp/k8s-webhook-server/serving-certs.
    • Add ENABLE_WEBHOOKS=true env var when admissionController.enabled is set.
  • Static Kubebuilder manifests in config/certmanager and config/webhook remain for local dev (make manifests && make deploy).
  1. RBAC Adjustments

    • ClusterRole/Role expanded to allow:
      • Reading secrets for TLS
      • Reading/writing validatingwebhookconfigurations to enable certificate patching
  2. Testing & CI

    • Manual testing with a variety of YAMLS for every specification in the document has been done

@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot bot commented Jul 16, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@aryangorwade aryangorwade force-pushed the validating-webhook-nimservice-nimcache-helm branch from d101583 to 69d026f Compare July 16, 2025 23:23
Copy link
Copy Markdown
Collaborator

@visheshtanksale visheshtanksale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Fix the linter error.
  • Take latest updates from main and rebase your changes on top of it.
  • Its generally a good practice to start PR with a single commit.

@visheshtanksale visheshtanksale self-requested a review July 17, 2025 00:44
@aryangorwade aryangorwade force-pushed the validating-webhook-nimservice-nimcache-helm branch 10 times, most recently from 9d88736 to 03326b0 Compare July 17, 2025 19:55
@aryangorwade
Copy link
Copy Markdown
Collaborator Author

Addressed all concerns-edited code to pass linting tests as well. Linting caught NIMCache.Spec.Storage.HostPath being deprecated, so it was removed from the code.

@aryangorwade
Copy link
Copy Markdown
Collaborator Author

aryangorwade commented Jul 17, 2025

All concerns addressed.

There are these lines in config/manager/kustomization.yaml:

images: 
- name: controller
    newName: localhost:5000/k8s-nim-operator
    newTag: dev

Also, in deployments/helm/k8s-nim-operator/values.yaml:

image: 
    repository: localhost:5000/k8s-nim-operator
    tag: dev

These were changed for local development and testing. Should they be changed back to their original values of

images: 
- name: controller
     newName: nvcr.io/nvidia/cloud-native/nim-operator
     newTag: v1.0.0
image: 
    repository: ghcr.io/nvidia/k8s-nim-operator
    tag: main

Additionally, Kustomization files were changed. Does anything need to be done about those changes?

@shivamerla
Copy link
Copy Markdown
Collaborator

All concerns addressed.

There are these lines in config/manager/kustomization.yaml:

images: 
- name: controller
    newName: localhost:5000/k8s-nim-operator
    newTag: dev

Also, in deployments/helm/k8s-nim-operator/values.yaml:

image: 
    repository: localhost:5000/k8s-nim-operator
    tag: dev

These were changed for local development and testing. Should they be changed back to their original values of

images: 
- name: controller
     newName: nvcr.io/nvidia/cloud-native/nim-operator
     newTag: v1.0.0
image: 
    repository: ghcr.io/nvidia/k8s-nim-operator
    tag: main

Additionally, Kustomization files were changed. Does anything need to be done about those changes?

Yes, we should change these to original values.

@aryangorwade aryangorwade force-pushed the validating-webhook-nimservice-nimcache-helm branch 4 times, most recently from a90d526 to 42416ae Compare July 18, 2025 18:12
@aryangorwade
Copy link
Copy Markdown
Collaborator Author

Addressed these concerns @shivamerla, please review

@aryangorwade aryangorwade force-pushed the validating-webhook-nimservice-nimcache-helm branch 5 times, most recently from 5cf860a to 72ef39c Compare July 22, 2025 21:28
@aryangorwade aryangorwade force-pushed the validating-webhook-nimservice-nimcache-helm branch 4 times, most recently from 948a393 to 7623902 Compare July 26, 2025 03:42
@aryangorwade aryangorwade force-pushed the validating-webhook-nimservice-nimcache-helm branch 6 times, most recently from 16c3b62 to 3ce054e Compare July 30, 2025 20:19
shivamerla
shivamerla previously approved these changes Jul 31, 2025
@shivamerla shivamerla requested a review from varunrsekar July 31, 2025 05:33
@aryangorwade
Copy link
Copy Markdown
Collaborator Author

Code coverage is 64%. Should I aim for 100%?

ok  	github.com/NVIDIA/k8s-nim-operator/internal/webhook/apps/v1alpha1	0.043s	coverage: 64.0% of statements

@aryangorwade aryangorwade force-pushed the validating-webhook-nimservice-nimcache-helm branch from ee301cf to 4e1e763 Compare July 31, 2025 22:53
Copy link
Copy Markdown
Collaborator

@varunrsekar varunrsekar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for the change!

Reg the DCO check failure, please sign-off on all your commits

@aryangorwade aryangorwade force-pushed the validating-webhook-nimservice-nimcache-helm branch from 4e1e763 to 18e1a2f Compare August 1, 2025 17:21
…te and ValidateCreate. Helm deployment configured as well. Addressed linting, public functions, and git issues.

Signed-off-by: Aryan <[email protected]>
@aryangorwade aryangorwade force-pushed the validating-webhook-nimservice-nimcache-helm branch from 18e1a2f to c0632a8 Compare August 1, 2025 17:22
@shivamerla shivamerla merged commit 581daa0 into NVIDIA:main Aug 1, 2025
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants