
Add support to install by kustomize #179

Open
avinashsingh77 wants to merge 5 commits into llm-d-incubation:main from avinashsingh77:helm-to-kustomize

Conversation

@avinashsingh77

This PR introduces Kustomize as an alternative installation method for llm-d-modelservice, providing users with a declarative, composable deployment approach alongside the existing Helm charts.

  • Multi-accelerator support: Nvidia, Intel (XE/i915/Gaudi), AMD, and Google TPU configurations
  • Composable components: 6 optional features including multinode (LeaderWorkerSet), monitoring (Prometheus), P/D disaggregation, DRA, and FMA
  • 8 ready-to-use examples: From basic single-node to advanced multi-node and disaggregated deployments
  • Full feature parity with Helm: all existing capabilities are available, with documentation.
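As an illustration of the composable approach, an overlay can compose the shared base with opt-in components and an accelerator patch. The directory and file names below are illustrative assumptions, not necessarily this PR's actual layout:

```yaml
# kustomization.yaml for a hypothetical overlay; all paths are illustrative
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

# shared base manifests
resources:
  - ../../base

# opt-in features, packaged as Kustomize components
components:
  - ../../components/multinode   # LeaderWorkerSet
  - ../../components/monitoring  # Prometheus PodMonitors

# accelerator-specific resource requests/limits
patches:
  - path: nvidia-resources.yaml
    target:
      kind: Deployment
```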

@Gregory-Pereira
Contributor

For context for maintainers: this PR exists to aggregate feedback on the potential migration and to resolve questions about whether and how it should work, rather than as code to be merged in this repo -- it will eventually land in the main repo.

@Gregory-Pereira (Contributor) left a comment

I don't have time to do the full review right now, but I've called out some things to start. My overall objection at this point is that there are too many configuration overlays. Another pattern we could consider is having one modelserver directory per guide, and then varying only by hardware accelerator. We could move monitoring into base as described below; that would also let us get rid of the single- vs. multi-node split, because each guide explicitly has a multi-node or non-multi-node deployment for its pattern. We could also make DRA depend on the accelerator: for Nvidia or AMD GPUs we can typically use the k8s device plugin system, and I've only really seen Intel devices go through DRA. Take that point with a grain of salt, though, because I am definitely no DRA expert.

The point I'm making is that we need to aggregate more of these overlays into more "whole" deployments. We can try to group things per guide -- their purpose is to demonstrate patterns within inference, which I don't think is being shown here. The project is aimed at providing "guides" / "well-lit paths": fleshed-out examples, pre-tuned to work in production; it is the user's responsibility to walk back up the path and build their own variant to suit their use case. I hope this context framing helps inform the design here.
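Concretely, the guide-per-directory pattern I have in mind might look roughly like this (names are hypothetical):

```
overlays/
├── inference-scheduling/        # one directory per guide
│   ├── base/                    # guide-specific base, monitoring included
│   ├── nvidia/                  # variation only by hardware accelerator
│   ├── amd/
│   └── intel-gaudi/
└── pd-disaggregation/
    ├── base/
    └── nvidia/
```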

A contributor left a comment:

Very cool! I haven't seen FMA used yet.

@github-actions

This PR is marked as stale after 21d of inactivity. After an additional 14d of inactivity (7d to become rotten, then 7d more), it will be closed. To prevent this PR from being closed, add a comment or remove the lifecycle/stale label.

Co-authored-by: Greg Pereira <grpereir@redhat.com>
Signed-off-by: Avinash Singh <avinashsingh.rcoem@gmail.com>
@avinashsingh77
Author

avinashsingh77 commented Mar 2, 2026

@Gregory-Pereira Does it look better now?

Changes:

  1. Base layer — Monitoring (PodMonitors) moved into base/monitoring/, always included. Added env: [] to all base resource templates for accelerator patch compatibility.
  2. Accelerators — DRA moved from components/dra/ to sibling directories accelerators/nvidia-dra/ and accelerators/amd-dra/, each referencing its parent accelerator.
  3. Guide-aligned overlays — 8 overlays created, mapping 1:1 with llm-d guides:

| Overlay | Accelerator Variants |
| --- | --- |
| inference-scheduling/ | nvidia, amd, intel-xpu, intel-gaudi, google-tpu |
| pd-disaggregation/ | nvidia, google-tpu, intel-xpu |
| wide-ep-lws/ | nvidia |
| workload-autoscaling/ | nvidia |
| simulated-accelerators/ | (CPU only) |
| tiered-prefix-cache/ | nvidia |
| precise-prefix-cache-aware/ | nvidia, intel-xpu |
| predicted-latency-based-scheduling/ | nvidia (placeholder) |
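To illustrate why the env: [] addition in the base matters: a JSON 6902 "add" to a container's env list fails when the key is absent, so with the empty list present in base, an accelerator directory can append variables with a patch along these lines (resource and variable names here are illustrative, not taken from the PR):

```yaml
# accelerators/nvidia/kustomization.yaml fragment (illustrative names)
patches:
  - target:
      kind: Deployment
      name: modelservice
    patch: |-
      - op: add
        path: /spec/template/spec/containers/0/env/-
        value:
          name: NVIDIA_VISIBLE_DEVICES
          value: "all"
```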

Questions/Notes:

  1. For now I have added a placeholder with an empty overlay directory for predicted-latency-based-scheduling.
