Update requester template to work with latest FMA #216
manoelmarques wants to merge 1 commit into llm-d-incubation:main
Conversation
Force-pushed from 1051709 to 24c0a8c
rubambiza
left a comment
LGTM. Leaving some general comments without explicit approval, to make sure others get a chance to weigh in.
VLLM_LOGGING_LEVEL: DEBUG
VLLM_NIXL_SIDE_CHANNEL_PORT: "5600"
VLLM_SERVER_DEV_MODE: "1"
VLLM_USE_V1: "1"
I presume the requester will always be launched on GPU-enabled nodes. However, in case it is launched on CPU, I wanted to flag that there is an upcoming PR from Jun that will set the VLLM_CPU_KV_CACHE_SPACE variable to keep the launcher pod from being OOM-killed. This is just an FYI in case we need to make changes in the near future.
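For context, a minimal sketch of what that upcoming change might add to the launcher pod's environment. The value is illustrative, not taken from Jun's PR; VLLM_CPU_KV_CACHE_SPACE caps the CPU KV cache size in GiB:

```yaml
env:
  # Cap vLLM's CPU KV cache so the pod stays under its memory
  # limit and is not OOM-killed. Value is in GiB (illustrative).
  - name: VLLM_CPU_KV_CACHE_SPACE
    value: "4"
```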
command:
  - /bin/bash
  - -c
image: ghcr.io/llm-d-incubation/llm-d-fast-model-actuation/launcher:latest
@aavarghese Just a general pondering: do I understand correctly that the tag the launcher images use follows whatever semver we are using for a (test) release that is made available to llmd-benchmark? 👇
If so, then I think whatever is output here should not necessarily be latest, right?
It would be great to make sure that all these discussions we are going through process-wise are actually useful.
CC: @diegocastanibm
I think latest is the best default right now?! And when Manoel runs the benchmark with a stable FMA release or release candidate, he can use values-requester.yaml to specify the launcher/requester tag, which will override latest...
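A hedged sketch of what such an override in values-requester.yaml could look like. The key names and the tag below are assumptions for illustration, not taken from the chart:

```yaml
# Hypothetical values-requester.yaml override pinning a release tag
# instead of the default "latest". Key names are assumed, not
# confirmed against the actual chart.
launcher:
  image:
    repository: ghcr.io/llm-d-incubation/llm-d-fast-model-actuation/launcher
    tag: v0.2.0  # example release/RC tag
```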
I lack understanding of the FMA approach. It appears to me that the modelServerConfig is similar in many ways to a prefill or decode pod; I see these defined as well. Could you explain the relationship, or point me to a document I could look at?
@kalantar We are working on updating the documentation for FMA as it evolves. For now, you can get an overview in this open PR: https://github.com/rubambiza/llm-d-fast-model-actuation/blob/202dc3691615f9677de9578d11e0b470815ee33d/README.md
Force-pushed from a7176d5 to f2b4ca8
Co-authored-by: aavarghese <avarghese@us.ibm.com>
Co-authored-by: manoelmarques <manoel.marques@ibm.com>
Signed-off-by: manoelmarques <manoel.marques@ibm.com>
Signed-off-by: aavarghese <aavarghese@us.ibm.com>
Added new resources to the requester template. They use custom CRDs that need to be pre-installed separately:
- InferenceServerConfig: https://raw.githubusercontent.com/llm-d-incubation/llm-d-fast-model-actuation/main/config/crd/fma.llm-d.ai_inferenceserverconfigs.yaml
- LauncherConfig: https://raw.githubusercontent.com/llm-d-incubation/llm-d-fast-model-actuation/main/config/crd/fma.llm-d.ai_launcherconfigs.yaml
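Assuming a standard kubectl workflow (the exact install procedure is not specified in this PR), the CRDs could be pre-installed with something like:

```
# Install the FMA CRDs before deploying the requester template
kubectl apply -f https://raw.githubusercontent.com/llm-d-incubation/llm-d-fast-model-actuation/main/config/crd/fma.llm-d.ai_inferenceserverconfigs.yaml
kubectl apply -f https://raw.githubusercontent.com/llm-d-incubation/llm-d-fast-model-actuation/main/config/crd/fma.llm-d.ai_launcherconfigs.yaml
```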