
Update requester template to work with latest FMA#216

Draft
manoelmarques wants to merge 1 commit into llm-d-incubation:main from manoelmarques:fma

Conversation


@manoelmarques manoelmarques force-pushed the fma branch 3 times, most recently from 1051709 to 24c0a8c Compare February 17, 2026 18:54

@rubambiza left a comment


LGTM. Leaving some general comments without explicit approval to make sure others get a chance to weigh in.

VLLM_LOGGING_LEVEL: DEBUG
VLLM_NIXL_SIDE_CHANNEL_PORT: "5600"
VLLM_SERVER_DEV_MODE: "1"
VLLM_USE_V1: "1"


I presume the requester will always be launched on GPU-enabled nodes. However, if it is launched on CPU, I wanted to flag that there is an upcoming PR from Jun that will set the VLLM_CPU_KV_CACHE_SPACE variable to avoid an OOM kill of the launcher pod. This is just an FYI in case we need to make changes in the near future.
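For illustration, if CPU-only launches do need to be supported, the env block above might be extended roughly like this (the 40 GiB value and the exact placement are assumptions; the real change will come from the upcoming PR):

```yaml
env:
  VLLM_LOGGING_LEVEL: DEBUG
  VLLM_NIXL_SIDE_CHANNEL_PORT: "5600"
  VLLM_SERVER_DEV_MODE: "1"
  VLLM_USE_V1: "1"
  # Hypothetical addition for CPU-only nodes: caps the CPU KV-cache
  # space (in GiB) so the launcher pod is not OOM-killed.
  VLLM_CPU_KV_CACHE_SPACE: "40"
```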

command:
- /bin/bash
- -c
image: ghcr.io/llm-d-incubation/llm-d-fast-model-actuation/launcher:latest


@aavarghese Just a general question: do I understand correctly that the tag the launcher images use follows whatever semver we are using for a (test) release that is made available to llmd-benchmark? 👇

https://github.com/aavarghese/llm-d-fast-model-actuation/blob/5d92470760aa3825cab8dc1624fee6f33b65228d/.github/workflows/publish-release.yaml#L132

If so, then I think whatever is output here should not necessarily be latest, right?
It would be great to make sure that all these discussions we are going through process-wise are actually useful.

CC: @diegocastanibm


I think latest is the best default right now. And when Manoel runs the benchmark with a stable FMA release or release candidate, he will use values-requester.yaml to specify the launcher/requester tag, which will override latest...
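As a sketch, such a values-requester.yaml override might look like the following (the key names are hypothetical and depend on what the chart actually exposes; the tag shown is illustrative):

```yaml
# values-requester.yaml (illustrative field names)
launcher:
  image:
    repository: ghcr.io/llm-d-incubation/llm-d-fast-model-actuation/launcher
    tag: v0.2.0  # pins a stable FMA release instead of "latest"
```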

@kalantar
Collaborator

I lack understanding of the FMA approach. It appears to me that the modelServerConfig is similar in many ways to a prefill or decode pod, and I see these defined as well. Could you explain the relationship, or point me to a document I could look at?

@rubambiza

rubambiza commented Feb 20, 2026

I lack understanding of the FMA approach. It appears to me that the modelServerConfig is similar in many ways to a prefill or decode pod, and I see these defined as well. Could you explain the relationship, or point me to a document I could look at?

@kalantar We are working on updating the documentation for FMA as it evolves. For now, you can get an overview in this open PR: https://github.com/rubambiza/llm-d-fast-model-actuation/blob/202dc3691615f9677de9578d11e0b470815ee33d/README.md

@manoelmarques manoelmarques force-pushed the fma branch 4 times, most recently from a7176d5 to f2b4ca8 Compare February 23, 2026 14:46
Co-authored-by: aavarghese <avarghese@us.ibm.com>
Co-authored-by: manoelmarques <manoel.marques@ibm.com>
Signed-off-by: manoelmarques <manoel.marques@ibm.com>
Signed-off-by: aavarghese <avarghese@us.ibm.com>